Naming

Naming is a surprisingly deep subject in language design. Many programming languages have sophisticated systems for managing names.

We have seen some of the issues that arise with naming in the previous lecture, when we discussed variable capture. In this lecture, we will give a high level view of the design space for naming systems in programming languages.

The simplest solution to naming is to use a global, flat namespace, where every variable has a distinct name. This strategy comes with a variety of problems. As programs grow larger, they tend to accumulate new functions, values, and other named entities. A single global namespace is difficult for programmers to manage for a few reasons:

Therefore, most languages provide some facilities for organizing these names and controlling which names are in scope at a particular location in the program.

Static and dynamic scope

We introduced scopes in our discussion of FL. The scope of a variable is the part of the program where the variable can be referenced. The scoping rules of the language determine what this region is. Scoping rules can be either static or dynamic.

Static scope

In static scoping, it is always possible to determine the scope of a variable from the program's AST. Because scopes can be determined from syntax, static scoping is also called lexical scoping.

We've seen static scoping before: lambda parameters in FL are statically scoped. Here's another example, written in C:

void func(int x) {
  // in scope: x
  int y = x + 1;
  // in scope: x, y
  for (int i = y; i > 0; i--) {
    x++; // in scope: x, y, i
  }
  // in scope: x, y
  return x;
}

The scope of local variables and function arguments in C extends from the declaration of the variable to the end of the block. This is clearly a syntactic property of the program. Lexical scopes can usually be nested, and when looking up an identifier, the most enclosing scope that contains a binding of that identifier is chosen.

Many languages have multiple forms of static scope. For example, C offers block scope for local variables, file scope for global variables, function scope for labels (that is, the targets for goto), and function prototype scope (for the variable names inside function prototypes). The scoping rules for C have actually evolved over time; older versions of C did not introduce new scopes for the bodies of loops and conditionals. Notice how different scopes are used for different entities: function scope for labels vs block scope for variables.

Dynamic scope

In contrast, in dynamic scoping the scope of a variable can depend on the execution path of the program. Dynamic scope is much less common than static scope in modern programming languages, although it was used more extensively in older languages. In particular, early Lisp languages used dynamic scoping by default. Some lisps, like Emacs Lisp, still default to dynamic scope.

Here's an example from the Emacs manual:

(defvar x -99)    ; x receives an initial value of -99.

(defun getx () x) ; x is used free in this function.

(let ((x 1))      ; x is dynamically bound.
  (getx))
⇒ 1

;; After the let form finishes, x reverts to its previous value, which is -99.
(getx)
⇒ -99

As this example shows, looking up x from within getx can return different values depending on where it is called. If getx is called from a context that binds x, then getx sees that value of that binding. Otherwise, it sees the initial value.

One way to think about dynamic scope is that when we look up a variable, we walk up the call stack until we find a frame that contains a binding for that variable.

Dynamic scope vs global variables

Another way to think about dynamic scope is that it is a disciplined way to create overridable global variables. In fact, if we have mutable global variables we can implement dynamic scoping as a desugaring:

(dlet var value body)
=>
(let oldvalue var
     (progn
       (set var value)
       body
       (set var oldvalue)))

We haven't seen progn before, but it evaluates a list of expressions in order. set sets a global variable to a new value.

Of course, this desugaring works fine for sequential code, but falls apart entirely when dealing with concurrency, where we might want two concurrent executions to have different values of var.

Design discussion

There are good reasons why most languages have abandoned dynamic scoping:

  • It's anti-modular. Programmers have to reason about dynamically scoped names globally—calling a library function safely requires the programmer to know what dynamically scoped names it uses, and ensuring that those names are set to values that the library can handle. In contrast, static scoping allows names to be hidden from the outside world, which allows different parts of a codebase to reuse the same name without worrying about semantic mismatches.
  • It's generally easier to implement static scoping efficiently. Dynamic scoping requires a lookup in an associative container to find the current binding for a name, where static scoping can often be as simple as reading a location in memory.

Despite its drawbacks, dynamic scoping can be quite powerful.

  • It gives programmers a simple, powerful way to thread implicit parameters into functions.

    For example, you might want to write a library that lets the caller decide whether to write output to stdout or to a file. Callers might want to make this decision on a per-call basis. One way you could implement this feature is to add a parameter to every library function that determines where output goes. This approach works, but it adds a parameter that has to be threaded through calls inside the library, and it doesn't scale well if you have many such parameters. It also makes the library quite difficult to extend, because any new parameter must be threaded through all the existing functions.

    Instead, you could add a dynamic variable output that controls where the library writes its output. Callers could dynamically bind output before calling into your library, and get the same level of control as they would with a dedicated parameter.

    Implicit parameters are alive and well in modern programming languages (e.g. Scala), although they are much more restrictive than full dynamic scoping.

Namespaces

Many languages have more than one namespace for a given scope. This is particularly common in languages where some entities are not first class. It's also common for dynamically scoped variables to exist in a separate namespace from lexical variables.

Digression: first class entities

We often distinguish between first and second class entities in a language. The idea of a first class citizen comes from Strachey, and it refers to entities that act like values (Christopher S. Strachey, 2000). Strachey uses Algol as an example; in Algol, numbers are first class, but procedures are not. This is because numbers have the properties of values: they can be assigned to variables, passed to functions, and they have operations defined on them, and they can appear as the result of evaluating an expression. Procedures cannot be assigned to variables, passed to other procedures, or appear as the result of evaluation; they can only be called and must be called directly by name.

Modern languages have tended to make more entities first class, since this makes the language more expressive. First class functions are common; some languages offer more exotic constructs like first class types.

Namespaces in C

For example, in C we can use the same name (x) to refer to a variable, a struct field, and a type:

#include <stdio.h>

typedef struct {
  int x;
} x;

int f(x x) {
  return x.x + x.x;
}

int main() {
  x x = { 2 };
  printf("%d\n", f(x));
}
4

All the references to x appear in overlapping scopes; C can disambiguate them because they appear in different syntactic positions. Types and struct fields are not first class in C—they cannot be passed as arguments or stored in variables. The contents of a field can be, but the field itself cannot. We can't write a function that takes a list of struct fields and returns a new struct. When a variable and a type or field name can never appear in the same syntactic position, there's no need to use a shared namespace for them, and we can give the programmer more freedom to choose names by separating their namespaces.

Modern languages have tended to make more entities first class than historical languages. For example, even C treats functions as first class objects (this is why we can't rename f to x in our example).

A historical example: Lisp-1 and Lisp-2

The number of namespaces was a source of significant debate in the Lisp community. Traditionally, Lisp had separate namespaces for functions and variables. Scheme introduced several new ideas, including lexical scoping, and also eliminated the split between functions and variables. Some of these ideas, like lexical scoping, were added to Common Lisp, but the single namespace was not.

It's interesting to look back at the debate about this choice, because it's a great example of a programming language design dilemma that appears to have an obvious answer today, but was much less clear at the time.

A Lisp language with a single namespace is called a Lisp-1; two namespaces is a Lisp-2. Gabriel & Pitman lay out the arguments for and against Lisp-1 in "Technical Issues of Separation in Function Cells and Value Cells".

Arguments for Lisp-1:

  • Syntactic simplicity: functions are first class in Common Lisp, so having separate namespaces means that additional syntax (FUNCTION) is needed to turn functions into values and to call values as functions (FUNCALL).
  • Compiler simplicity: the compiler only needs to maintain a single namespace, which might make it slightly simpler.
  • Expression evaluation would no longer depend on context: x in (x ...) would be evaluated in the same way as in (... x ...).
  • Multiple meanings for a single name can be confusing.

Arguments against:

  • Allowing names to take on multiple meanings depending on context gives the programmer more flexibility and may allow them to write more understandable code.
  • Macros are slightly worse, because there are more names available to be captured.

Conclusion

Ultimately the designers of Common Lisp decided that turning it into a Lisp-1 would be too drastic a change, so kept the Lisp-2 design. Most modern languages have made the opposite decision, and separate namespaces persist mostly for non first class parts.

Last updated: 2023-03-07 Tue 18:59