6.S050: Syntax

Created: 2023-05-11 Thu 11:06

What is syntax?

  • Rules that define which strings correspond to programs.
  • A description of the structure of programs.

What we will cover

  • Existing syntax designs—what options do you have as a designer?
  • How should we specify syntax?
  • Pitfalls to avoid

Syntax design in the wild: Python

def factorial(n):
  if n == 0:
    return 1
  return n * factorial(n - 1)

Syntax design in the wild: C

int factorial(int n) {
  return n == 0 ? 1 : n * factorial(n - 1);
}

Syntax design in the wild

https://tinyurl.com/bde2fzsc

Syntax design principles

  • Consistency
  • Compositionality
  • Concision

Consistency

  • External: when possible, use syntax that users already know
    • Example: use x[0], not x.(0) for indexing strings
    • Copying other languages reduces surprises
  • Internal: similar operations should look similar
    • Counterexample: OCaml has two assignment operators, chosen by the kind of left-hand side: x := 1 for references and x.f <- 1 for mutable record fields

Compositionality

  • The meaning of a phrase should be determined by the meaning of the individual parts and the method of composition
    • Frege's principle
  • C++
    • type<arg>
    • type<type<arg>>
    • also has the right-shift operator >>
    • so is >> an operator, or does it close two template argument lists?
      • before C++11, the template case required a space: > >
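
The >> clash comes from the lexer, which usually takes the longest token it can match (maximal munch). Below is a minimal Python sketch of that behavior; the token set and the tokenize function are illustrative assumptions, not how any real C++ compiler is written.

```python
import re

# Hypothetical token set: just enough to show the `>>` problem.
# `>>` is listed before `>` so the longer token wins (maximal munch).
TOKEN_RE = re.compile(r"\s*(>>|[<>]|\w+)")

def tokenize(text):
    """Split text into tokens, always preferring the longest operator."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

print(tokenize("vector<vector<int>>"))   # the two closing brackets fuse into `>>`
print(tokenize("vector<vector<int> >"))  # the space keeps them separate
```

A lexer like this cannot tell the two uses apart, because the answer depends on the surrounding phrase: the meaning of >> is not determined by its parts alone.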

Concision

  • Shorter programs are (often) better than longer ones
  • Make common operations easy to write
    • Unwrap operator (Rust's ?)
    • maybe_fails(34)? is roughly shorthand for:
match maybe_fails(34) {
  Ok(x) => x,
  Err(e) => return Err(e),
}

Why not "Simplicity"

  • Languages should have few "basic concepts"
  • Lisp
    • (let ((a (+ 1 (* 2 3)))) (/ a 2))
    • let a = 1 + 2 * 3; a / 2
  • Not clear that this kind of simplicity is helpful

Design principles in practice

https://tinyurl.com/bde2fzsc

Problem of syntax

Text -> Lexical structure -> Syntactic structure -> Abstract syntax

Concrete & abstract syntax

  • Concrete: reflects the representation of the program as text
  • Abstract: throws away textual details; represents only the structure of the program
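
One way to see the distinction is to use Python's standard ast module as a stand-in for "abstract syntax". In this sketch, the helper name abstract is made up for illustration. Two strings that differ only in spacing and redundant parentheses produce the same tree, while moving the parentheses changes it.

```python
import ast

def abstract(source):
    """Return a textual dump of the abstract syntax tree of one expression."""
    return ast.dump(ast.parse(source, mode="eval"))

# Different concrete syntax: spacing and a redundant pair of parentheses.
a = abstract("1 + 2 * 3")
b = abstract("1+(2*3)")
# Different structure: these parentheses change the grouping.
c = abstract("(1 + 2) * 3")

print(a == b)  # same abstract syntax
print(a == c)  # different abstract syntax
```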

Kinds of syntax specification

  • Natural language
  • Context-free grammar (variety of notations)
  • Implementation

Grammars by example

Consider a language that has:

  • Functions
  • If-then-else
  • Arithmetic

def factorial(n) {
  if (n == 0) then
    return 1;
  else
    return n * factorial(n - 1);
}

print(factorial(5));

Let's walk through a grammar for this language.

Terminology

  • Identifier: a name defined in the program
  • Keyword: a word used to define program structure
  • Statement: performs an action with a side effect
  • Expression: computes a value
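
These four terms can be checked against a real language. The sketch below uses Python's own definitions (the keyword and ast modules from the standard library), which are an assumption standing in for the example language above; its keyword set and statement forms would differ.

```python
import ast
import keyword

# Identifier vs. keyword, according to Python's own rules.
print("factorial".isidentifier(), keyword.iskeyword("factorial"))
print("return".isidentifier(), keyword.iskeyword("return"))

# Statement vs. expression: the parser puts them in different node classes.
assignment = ast.parse("x = 1").body[0]
addition = ast.parse("x + 1", mode="eval").body

print(isinstance(assignment, ast.stmt))  # an assignment is a statement
print(isinstance(addition, ast.expr))    # an addition is an expression
```

Note that a keyword still looks like an identifier lexically; it is the language definition that reserves it.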

Lexing

  • Earlier we said that languages have lexical and syntactic structure
  • This grammar mixes both
  • Split grammar into
    • Lexical rules
    • Phrase rules
  • Practical benefits
    • Simplicity (our grammar ignores whitespace!)
    • Performance
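
The lexical half of that split can be sketched as a small regular-expression tokenizer for the example language. The token kinds, the keyword set, and the function name lex are assumptions made for illustration. Whitespace is dropped here, so the phrase rules never have to mention it.

```python
import re

# Assumed keyword set for the example language.
KEYWORDS = {"def", "if", "then", "else", "return"}

# Lexical rules (assumed): one named regular expression per token kind.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"==|[-+*/(){};,]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))

def lex(text):
    """Turn program text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "NAME" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # keywords are names that the language reserves
        tokens.append((kind, lexeme))
    return tokens

print(lex("if (n == 0) then return 1;"))
```

The phrase rules can then be written over token kinds such as NAME and NUMBER instead of over individual characters.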

Pitfalls

Ambiguity

  • Multiple parse trees for a string
  • "John saw the man on the mountain with a telescope."
  • BBC headline: "Knife crime: St John Ambulance to teach teens to help stab victims"
  • Atlantic headline: "Susan Collins Unveils a Gun-Control Compromise: It would restrict sales to individuals on two terrorist watch lists"


Ambiguity

How should we parse 2 * x + y?

Ambiguity

  • No general strategy for eliminating ambiguity
    • Even detecting it is undecidable for context-free grammars in general
  • Can cover some common cases
    • Precedence
    • Associativity
    • Dangling else

Precedence & associativity

  • Common problem in expression grammars
<expr> ::= <id> | <expr> ("-" | "*") <expr> | "(" <expr> ")"

Precedence & associativity

How to parse x - y - z * z?
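
To see how ambiguous the grammar is, the sketch below enumerates every parse tree it allows for a flat token list. The function name parses is hypothetical, and parenthesized input is left out to keep the code short. Each tree is printed fully parenthesized.

```python
def parses(tokens):
    """Return every parse of tokens under <expr> ::= <id> | <expr> op <expr>.

    Parenthesized input is not handled, to keep the sketch short.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Any operator (the odd positions) may be the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append(f"({left} {tokens[i]} {right})")
    return trees

for tree in parses("x - y - z * z".split()):
    print(tree)
```

The grammar accepts all of these groupings, and nothing in it says which one is intended.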

Precedence & associativity

[Figure: ambig.svg]

Precedence & associativity

[Figure: ambig2.svg]

Precedence

How can we force x - y - z * z to parse as:

(x - y) - (z * z)

Precedence

<expr> ::= <expr> "-" <expr> | <expr1>
<expr1> ::= <expr1> "*" <expr1> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"

Associativity

<expr> ::= <expr> "-" <expr1> | <expr1>
<expr1> ::= <expr1> "*" <expr2> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"
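
A hand-written parser for this final grammar might look like the sketch below. It is one possible implementation under stated assumptions: the function names are made up, and because a recursive-descent parser cannot call itself on a left-recursive rule, each left-recursive rule is written as a loop that folds operands to the left, which accepts the same strings and builds the same trees.

```python
import re

def parse(text):
    """Parse text with the stratified grammar; return a fully parenthesized string."""
    # Characters outside this small token set are silently skipped in this sketch.
    tokens = re.findall(r"\w+|[-*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():    # <expr> ::= <expr> "-" <expr1> | <expr1>
        tree = expr1()
        while peek() == "-":
            advance()
            tree = f"({tree} - {expr1()})"  # fold to the left
        return tree

    def expr1():   # <expr1> ::= <expr1> "*" <expr2> | <expr2>
        tree = expr2()
        while peek() == "*":
            advance()
            tree = f"({tree} * {expr2()})"  # fold to the left
        return tree

    def expr2():   # <expr2> ::= <id> | "(" <expr> ")"
        token = peek()
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token is None or not token.isidentifier():
            raise SyntaxError(f"expected identifier, got {token!r}")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("x - y - z * z"))  # ((x - y) - (z * z))
```

Multiplication binds tighter because it lives one level deeper in the grammar, and both operators group to the left because only the left operand of each rule may recurse at the same level.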