Syntax 2

Extensibility

Many languages offer features for extending their own syntax. These features vary widely in expressiveness. Expressive metaprogramming was a significant area of interest in the 1960s, but it proved difficult to reason about. In particular, a significant extension is often as difficult to build with metaprogramming as it is to write directly in the base language.

However, some forms of extensibility have persisted, and old designs are re-emerging. For instance, Rust brought back many expressive features such as macros.

We'll discuss some common syntax extension features, from least to most expressive:

Extended operators

Some languages, such as OCaml and Haskell, allow the user to extend their set of infix operators. To be clear, this functionality is different from the ability to redefine existing operators, which many other languages have. Here we're introducing entirely new operators.

As discussed above, infix operators can lead to ambiguity if not handled carefully. In particular, there needs to be a way to declare the precedence and associativity of any new operators.
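To see why both pieces of information are needed, consider a minimal precedence-climbing sketch in Java. The class FixityDemo, its declare method, and the operator -? are hypothetical names introduced only for illustration, and the sketch assumes space-separated integer expressions; it is not how any of the languages below implement their parsers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: an evaluator whose infix operators live in a
// user-extensible table, each with a declared precedence and associativity.
class FixityDemo {
    // One table entry: binding strength, associativity, and meaning.
    static final class Op {
        final int precedence;
        final boolean leftAssoc;
        final IntBinaryOperator apply;

        Op(int precedence, boolean leftAssoc, IntBinaryOperator apply) {
            this.precedence = precedence;
            this.leftAssoc = leftAssoc;
            this.apply = apply;
        }
    }

    private final Map<String, Op> ops = new HashMap<>();
    private String[] tokens;
    private int pos;

    // Registers a new operator together with its precedence and associativity.
    void declare(String symbol, int precedence, boolean leftAssoc, IntBinaryOperator apply) {
        ops.put(symbol, new Op(precedence, leftAssoc, apply));
    }

    // Evaluates a space-separated expression such as "1 -? 2 * 3".
    int eval(String source) {
        tokens = source.trim().split("\\s+");
        pos = 0;
        return parse(0);
    }

    // Precedence climbing: consume only operators that bind at least as
    // tightly as minPrec, recursing for the right-hand operand.
    private int parse(int minPrec) {
        int left = Integer.parseInt(tokens[pos++]);
        while (pos < tokens.length) {
            Op op = ops.get(tokens[pos]);
            if (op == null) {
                throw new IllegalArgumentException("undeclared operator: " + tokens[pos]);
            }
            if (op.precedence < minPrec) {
                break;
            }
            pos++;
            // A left-associative operator must not reappear in its own right operand.
            int nextMin = op.leftAssoc ? op.precedence + 1 : op.precedence;
            int right = parse(nextMin);
            left = op.apply.applyAsInt(left, right);
        }
        return left;
    }

    public static void main(String[] args) {
        FixityDemo d = new FixityDemo();
        d.declare("*", 7, true, (a, b) -> a * b);
        d.declare("-?", 5, true, (a, b) -> a - b);               // new, left-associative
        d.declare("^", 8, false, (a, b) -> (int) Math.pow(a, b)); // new, right-associative
        System.out.println(d.eval("10 -? 3 -? 2")); // (10 - 3) - 2 = 5
        System.out.println(d.eval("2 ^ 3 ^ 2"));    // 2 ^ (3 ^ 2) = 512
        System.out.println(d.eval("1 -? 2 * 3"));   // 1 - (2 * 3) = -5
    }
}
```

Without the declared associativity, "10 -? 3 -? 2" could equally be read as 10 - (3 - 2), which gives a different answer; without the declared precedence, the parser could not decide how -? interacts with *.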

OCaml

OCaml solves this problem by determining precedence and associativity from the first character of the operator. For example, we could add a new kind of addition operator for some new type with the following code:

let (+?) x y = ... in
x +? y

The +? operator inherits the precedence and associativity from +.
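The first-character rule can be sketched as a simple lookup. The Java helper below is hypothetical, covers only part of the operator table in the OCaml manual, and rejects a few tokens that OCaml treats specially; it is meant to illustrate the idea rather than reproduce OCaml's lexer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: approximates how OCaml derives the fixity of a
// user-defined infix operator from its leading characters.
class OcamlFixity {
    // Tokens with their own meaning or level in OCaml; not handled here.
    private static final List<String> SPECIAL =
        Arrays.asList("&", "&&", "||", "|", "->", "<-");

    static String fixityOf(String op) {
        if (op.isEmpty() || SPECIAL.contains(op)) {
            throw new IllegalArgumentException("not covered by this sketch: " + op);
        }
        // "**" must be checked before the single-character '*' case.
        if (op.startsWith("**")) {
            return "right, like **";
        }
        switch (op.charAt(0)) {
            case '*': case '/': case '%':
                return "left, like *";
            case '+': case '-':
                return "left, like +";
            case '@': case '^':
                return "right, like @";
            case '=': case '<': case '>': case '|': case '&': case '$':
                return "left, like =";
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(fixityOf("+?"));  // left, like +
        System.out.println(fixityOf("**.")); // right, like **
    }
}
```

The appeal of this design is that a reader can predict how an unfamiliar operator parses just by looking at it, at the cost of not being able to choose a different precedence.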

Haskell

Haskell has a more general system. It allows new operators to be defined using a syntax similar to OCaml's, but users can also specify their precedence and associativity directly using additional syntax:

(+?) x y = ...
infixl 5 +?

Haskell also allows arbitrary two-argument functions to be used infix: e.g. a `plus` b.

Agda

Agda further extends Haskell's custom operator system to allow what they call mixfix operators, or operators that take an arbitrary number of arguments. For example, if-then-else can be defined as an operator as follows [1]:

if_then_else_ : {A : Set} → Bool → A → A → A
if true then x else y = x
if false then x else y = y

The underscores denote the places where arguments should appear.

Design considerations

The ability to freely define new operators can be a blessing and a curse. It gives programmers more freedom, but overuse of novel operators can make programs entirely unreadable. Haskell in particular has gained a reputation for overuse of symbols. The Rust community discusses some of the design issues here.

Annotations

Annotations are a slightly more expressive syntax extension feature, and are offered by a wide variety of languages from Java to OCaml. Syntax annotations are a way to attach unstructured data to a program's AST. Language features are sometimes built using annotations, but generally they will be ignored by the compiler or interpreter. Instead, users are expected to write external tools that process the AST and annotations.

In Java, annotations are used both by the compiler and by external tools. For example, a method in a child class that overrides a parent method can be marked with the @Override annotation:

@Override
void myMethod() { ... }

The Java compiler will check that there is actually a method in the parent class to override, and issue an error if not.

Annotations are also used by external tools like Project Lombok, which generates boilerplate code such as toString or hashCode methods. From the Project Lombok documentation, the @ToString annotation on the following class:

@ToString(includeFieldNames=true)
public static class Square {
    private final int width, height;
}

generates code equivalent to:

public static class Square {
    private final int width, height;

    @Override public String toString() {
        return "Square(width=" + this.width + ", height=" + this.height + ")";
    }
}

We can see that Java annotations can take parameters that are accessible to these external tools.
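As a rough illustration of a tool reading such a parameter, here is a minimal Java sketch. The annotation Describe and the classes Tile, Label, and AnnotationReader are hypothetical names, not part of Java or Lombok. Lombok itself works at compile time; this sketch uses runtime reflection only because it is the shortest way to show the parameter being read.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation with one parameter and a default value.
// RUNTIME retention keeps it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Describe {
    boolean includeFieldNames() default true;
}

// Overrides the parameter explicitly.
@Describe(includeFieldNames = false)
class Tile {
    final int width = 2, height = 3;
}

// Relies on the parameter's default value.
@Describe
class Label {
    final String text = "hello";
}

// Stand-in for an external tool that inspects annotations.
class AnnotationReader {
    static boolean includesFieldNames(Class<?> type) {
        Describe d = type.getAnnotation(Describe.class);
        if (d == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated with @Describe");
        }
        return d.includeFieldNames();
    }

    public static void main(String[] args) {
        System.out.println(includesFieldNames(Tile.class));  // false
        System.out.println(includesFieldNames(Label.class)); // true
    }
}
```

The compiler does nothing with @Describe beyond recording it; all of the behavior lives in the tool that looks it up.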

Magic comments

Some languages, like Go and Fortran, rely on special syntax in comments to provide language extensions. Go uses comments for platform-specific compilation and for code generation. Fortran uses them to specify parallelism through OpenMP. Arguably, this use of comments is a misfeature that could be avoided by providing proper annotations. It makes preprocessors and other tools harder to write and compose, because each tool must implement its own comment parser. For example, the first line of the following Go file is a comment that tells go generate to run the stringer tool:

//go:generate stringer -type=Pill
package painkiller

type Pill int

const (
    Placebo Pill = iota
    Aspirin
    Ibuprofen
    Paracetamol
    Acetaminophen = Paracetamol
)
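To give a sense of the burden this places on tools, here is a minimal Java sketch of a scanner that looks for go:generate directives. The class DirectiveScanner and its method are hypothetical. The sketch assumes a directive must begin at the start of a line with no space after the slashes, and it makes no attempt to tell real comments apart from the same text inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tool fragment: because the directive is hidden in a comment,
// the tool cannot rely on the language's AST and scans raw lines instead.
class DirectiveScanner {
    private static final String PREFIX = "//go:generate ";

    // Returns the command text of every line that starts with the directive prefix.
    static List<String> findGenerateCommands(String source) {
        List<String> commands = new ArrayList<>();
        for (String line : source.split("\n")) {
            if (line.startsWith(PREFIX)) {
                commands.add(line.substring(PREFIX.length()).trim());
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        String source = "//go:generate stringer -type=Pill\n"
                      + "package painkiller\n"
                      + "// go:generate not a directive\n";
        System.out.println(findGenerateCommands(source)); // [stringer -type=Pill]
    }
}
```

Every tool that wants to understand the directive has to repeat this kind of ad hoc matching, whereas an annotation would arrive already parsed and attached to the declaration it describes.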

Design considerations

Annotations are a powerful language feature, and they have both pros and cons from a design perspective.

  1. They significantly lower the difficulty of writing a language extension, because all language extensions can share the parser of the host language.
  2. It's easier for other tooling (IDEs, syntax highlighting, syntax-based navigation) to provide a reasonable experience in the presence of annotations than for more expressive extensions like macros, because the extensions are part of the language syntax.
  3. Language extensions provided through annotations are limited in their ability to blend in with the host language. They're relatively verbose, and they are limited to annotating existing constructs.

Footnotes:

[1] See here.

Last updated: 2023-02-23 Thu 10:16