6.S050: Syntax Continued

Created: 2023-05-11 Thu 11:06

Precedence (recap)

<expr> ::= <expr> "-" <expr> | <expr1>
<expr1> ::= <expr1> "*" <expr1> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"
  • Forbids parsing x - y * z as (x - y) * z
  • Still allows parsing x - y - z as x - (y - z), since "-" is ambiguous here

Associativity (recap)

<expr> ::= <expr> "-" <expr1> | <expr1>
<expr1> ::= <expr2> "*" <expr1> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"
  • Forbids both readings: x - y * z as (x - y) * z, and x - y - z as x - (y - z)

Exercise

Extend the expression grammar with exponentiation "^" and unary negation "~" operators.

<expr> ::= <expr> "-" <expr1> | <expr1>
<expr1> ::= <expr2> "*" <expr1> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"
  • Precedence (highest to lowest): parens/ids, "~", "^", "*", "-"
  • "^" is right-associative so a^b^c*d should parse as (a^(b^c))*d

Solution

<expr> ::= <expr> "-" <expr1> | <expr1>
<expr1> ::= <expr2> "*" <expr1> | <expr2>
<expr2> ::= <expr3> "^" <expr2> | <expr3>
<expr3> ::= "~" <expr4> | <expr4>
<expr4> ::= <id> | "(" <expr> ")"

Dangling else

Many languages have an if-then-else construct with an optional else clause, e.g.:

if A then B else C
if A then B

Dangling else

How should we read the following code?

if A then if B then C else D

Either

if A then (if B then C) else D

or

if A then (if B then C else D)

Avoiding dangling else

  • Add another rule to determine which parse is chosen
    • For example, else is attached to closest if
    • C, Java, OCaml
  • Require braces around blocks

Ambiguity in C

  • There are real languages with ambiguous grammars
  • C example
    • 1 - 2
    • What about (int) -2?
    • Or (x) -2?
      • Is this a cast? Or subtraction?
      • Depends on whether x names a type or a variable
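
A minimal sketch of how a parser might resolve this, assuming it keeps a set of names currently declared as types. The node types Cast and Sub and the function parse_paren_minus are hypothetical and handle only this one token pattern; the point is that the same tokens yield different trees depending on context outside the grammar.

```python
from dataclasses import dataclass


@dataclass
class Cast:
    type_name: str
    operand: int


@dataclass
class Sub:
    lhs: str
    rhs: int


def parse_paren_minus(name, number, type_names):
    """Interpret the token sequence "(" name ")" "-" number.

    type_names is the set of identifiers currently known to be types.
    """
    if name in type_names:
        # (T) -2 : cast the negated literal to type T
        return Cast(name, -number)
    # (x) - 2 : subtract 2 from the variable x
    return Sub(name, number)


print(parse_paren_minus("x", 2, {"int", "x"}))  # x is a type: cast
print(parse_paren_minus("x", 2, {"int"}))       # x is a variable: subtraction
```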

Concrete & Abstract Syntax

[Figure: concrete.svg]

<expr> ::= <expr> "-" <expr1> | <expr1>
<expr1> ::= <expr1> "*" <expr2> | <expr2>
<expr2> ::= <id> | "(" <expr> ")"

Concrete & Abstract Syntax

[Figure: abstract.svg]

<expr> ::= <id> | <expr> ("-" | "*") <expr>

Abstract Syntax in Python

from dataclasses import dataclass

@dataclass
class Expr:
    pass

@dataclass
class Id(Expr):
    name: str

@dataclass
class Binop(Expr):
    op: str
    lhs: Expr
    rhs: Expr

ast = Binop('-', Binop('-', Id('x'), Id('y')), Binop('*', Id('z'), Id('z')))
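
One way to see the relationship between the two syntaxes is to go in the other direction, from the tree back to text. The sketch below repeats the classes above so that it runs on its own; to_source is a hypothetical helper that fully parenthesizes every binary operation, which is more than the concrete grammar requires but always unambiguous.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    pass


@dataclass
class Id(Expr):
    name: str


@dataclass
class Binop(Expr):
    op: str
    lhs: Expr
    rhs: Expr


def to_source(expr):
    """Render an abstract syntax tree as fully parenthesized concrete syntax."""
    if isinstance(expr, Id):
        return expr.name
    if isinstance(expr, Binop):
        return f"({to_source(expr.lhs)} {expr.op} {to_source(expr.rhs)})"
    raise TypeError(f"not an expression: {expr!r}")


tree = Binop('-', Binop('-', Id('x'), Id('y')), Binop('*', Id('z'), Id('z')))
print(to_source(tree))  # ((x - y) - (z * z))
```

The tree itself stores no parentheses; grouping is carried entirely by its shape.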

Dataclasses?

  • New(ish) feature, added in Python 3.7
  • Automatically generates
    • Constructors
    • Value equality (and ordering comparisons with order=True)
    • Pretty printing
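
A short demonstration of the generated methods. Id mirrors the class above; Point is a made-up class used only to show order=True.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass(order=True)
class Point:
    # Hypothetical class; order=True adds <, <=, >, >= (fields compared in order)
    x: int
    y: int


print(Id("x") == Id("x"))        # True: equality compares field values
print(Id("x"))                   # Id(name='x'): generated __repr__
print(Point(1, 2) < Point(1, 10))  # True: compared like the tuples (1, 2) and (1, 10)
```

Without order=True, comparing two instances with < raises TypeError.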

Parsing

  • How do we get from syntax to abstract syntax?
  • Lots of options
    • Parser generators (ANTLR, Bison, etc.)
    • Recursive descent

Recursive-descent

  • Pros:
    • Used by many real compilers (e.g. Clang)
    • Relatively simple to write
    • Often easier to handle syntax errors by hand
  • Cons:
    • Strict recursive-descent has issues with left-recursion

Parts of a Recursive-descent Parser

  1. List or stream of tokens.
  2. One function per grammar rule.
    • Responsible for parsing anything that matches that rule.
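
The first part, the token stream, might look like the sketch below. The names tokenize and TokenStream and the methods peek, advance and expect are assumptions made for illustration; the token set is limited to identifiers and the symbols used by the expression grammar in these notes.

```python
import re

# Identifiers, or one of the single-character symbols - * ( )
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[-*()]")


def tokenize(text):
    """Split source text into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the next token, or None at end of input."""
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok

    def expect(self, expected):
        """Consume the next token and check that it is the expected one."""
        tok = self.advance()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        return tok


print(tokenize("x - (y*z)"))  # ['x', '-', '(', 'y', '*', 'z', ')']
```

Each rule function then only needs peek to decide which alternative applies and advance or expect to consume what it matched.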

Recursive-descent Example

class Parser:
    def expr(self):
        # Parse <expr> ::= <expr> "-" <expr1> | <expr1>
        pass

    def expr1(self):
        # Parse <expr1> ::= <expr2> "*" <expr1> | <expr2>
        pass

    def expr2(self):
        # Parse <expr2> ::= <id> | "(" <expr> ")"
        pass

Recursive-descent Example

class Parser:
    def expr2(self):
        # Parse <expr2> ::= <id> | "(" <expr> ")"
        if next_token_is_id():
            return Id(token)
        elif next_token_is_lparen():
            # Parse "(" <expr> ")"
            pass
        else:
            error()

Recursive-descent Example

class Parser:
    def expr2(self):
        # Parse <expr2> ::= <id> | "(" <expr> ")"
        if next_token_is_id():
            return Id(token)
        elif next_token_is_lparen():
            expr = self.expr()
            assert_next_token_is_rparen()
            return expr
        else:
            error()

Recursive-descent Example

class Parser:
    def expr1(self):
        # Parse <expr1> ::= <expr2> "*" <expr1> | <expr2>
        lhs = self.expr2()
        if next_token_is_star():
            return Binop("*", lhs, self.expr1())
        else:
            return lhs

Recursive-descent Example

class Parser:
    def expr1(self):
        # Parse <expr1> ::= <expr2> "*" <expr1> | <expr2>
        lhs = self.expr2()
        if next_token_is_star():
            return Binop("*", lhs, self.expr1())
        else:
            return lhs

The code follows the left-factored form of the rule:

<expr1> ::= <expr2> <expr1-rest>
<expr1-rest> ::= "*" <expr1> | ε

Recursive-descent Example

class Parser:
    def expr(self):
        # Parse <expr> ::= <expr> "-" <expr1> | <expr1>
        lhs = self.expr()  # ??? recurses forever without consuming a token

Left-recursion

Direct:

<expr> ::= <expr> "-" <expr1> | <expr1>

Indirect:

<expr1> ::= <expr2> "*" <expr1> | <expr2>
<expr2> ::= <id> | <expr1>

Handling Left-recursion

Convert to iteration:

<expr> ::= <expr1> ("-" <expr1>)*

Recursive-descent Example

class Parser:
    def expr(self):
        # Parse <expr> ::= <expr1> ("-" <expr1>)*
        exprs = [self.expr1()]
        while next_token_is_dash():
            exprs.append(self.expr1())

        pass # Process expressions

Processing Expression List

[Id("a")] -> Id("a")

[Id("a"), Id("b")] -> Binop("-", Id("a"), Id("b"))

[Id("a"), Id("b"), Id("c")] -> Binop("-",
                                     Binop("-", Id("a"), Id("b")),
                                     Id("c"))
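
Putting the pieces together, here is one possible complete parser for the three-rule grammar. It is a sketch under some assumptions: the input is already a list of token strings, the helpers peek and advance stand in for the next_token_is_...() calls used on the earlier slides, and the expression list is processed with a left fold (functools.reduce), which produces exactly the nesting shown above.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Expr:
    pass


@dataclass
class Id(Expr):
    name: str


@dataclass
class Binop(Expr):
    op: str
    lhs: Expr
    rhs: Expr


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # <expr> ::= <expr1> ("-" <expr1>)*
        exprs = [self.expr1()]
        while self.peek() == "-":
            self.advance()
            exprs.append(self.expr1())
        # Left fold: [a, b, c] -> Binop("-", Binop("-", a, b), c)
        return reduce(lambda lhs, rhs: Binop("-", lhs, rhs), exprs)

    def expr1(self):
        # <expr1> ::= <expr2> "*" <expr1> | <expr2>
        lhs = self.expr2()
        if self.peek() == "*":
            self.advance()
            return Binop("*", lhs, self.expr1())
        return lhs

    def expr2(self):
        # <expr2> ::= <id> | "(" <expr> ")"
        tok = self.advance()
        if tok == "(":
            inner = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier or '(', got {tok!r}")
        return Id(tok)


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


tree = parse("x - y - z * z".split())
print(tree == Binop("-", Binop("-", Id("x"), Id("y")),
                    Binop("*", Id("z"), Id("z"))))  # True
```

The loop in expr keeps "-" left-associative without left recursion, while the recursive call in expr1 makes "*" right-associative, matching the grammar.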

Extensibility

What if you want users to be able to extend your language syntax?

Why Add Extensibility?

  • Allow users to write domain-specific languages
  • Add abstractions that can't be written as functions
  • Easier to integrate other languages (e.g. SQL)

Extension Features

From least to most powerful:

  • User-defined operators
  • Annotations
  • Macros

User-defined Operators

  • Extend the language with new infix (or prefix) operators
  • Allows user to make up compact notation for new domains

OCaml

let (+?) x y = ... in
x +? y

Inherits precedence/associativity from +

Haskell

(+?) x y = ...
infixl 5 +?

Precedence/associativity specified directly

Agda

if_then_else_ : {A : Set} → Bool → A → A → A
if true then x else y = x
if false then x else y = y

Mixfix operators: underscores in the name mark where the arguments go

Design considerations

  • Balance between concision and readability

Annotations

@Override
void myMethod() { ... }

Allows users to attach unstructured data to the AST

Java Example

@ToString(includeFieldNames=true)
public static class Square {
    private final int width, height;
}

Java Example

@ToString(includeFieldNames=true)
public static class Square {
    private final int width, height;
}

An annotation processor (here, Lombok's @ToString) generates the equivalent of:

public static class Square {
    private final int width, height;

    @Override public String toString() {
        return "Square(width=" + this.width +
            ", height=" + this.height + ")";
    }
}

Magic Comments

//go:generate stringer -type=Pill
package painkiller

type Pill int

const (
    Placebo Pill = iota
    Aspirin
    Ibuprofen
    Paracetamol
    Acetaminophen = Paracetamol
)

Magic Comments

  • If you don't provide annotations, users might find workarounds
    • Example: OpenMP integration in Fortran

Design Considerations

  • Reduces overhead of writing a language extension (i.e. no parser)
  • Other tooling (IDEs, linters, etc.) can parse and understand (or at least ignore) annotations
    • Much harder with custom syntax
  • Annotations are verbose
  • Anything is better than magic comments

Macros

  • Enormous design space; far more than we can cover
  • A few key axes:
    • Lexical vs syntactic
    • Hygienic vs unhygienic

Lexical vs Syntactic

  • Lexical (e.g. C preprocessor): can easily generate invalid source
  • Syntactic (e.g. Rust): works on syntax tree; may still generate ill-typed code

Macro Hygiene

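  • Example: #define INC(x) { int a = 0; x++; }
  • int a = 1; INC(a);
    • Expands to { int a = 0; a++; }, so the macro's own a captures the argument and the caller's a is still 1
  • A hygienic macro system keeps macro-introduced names from capturing (or being captured by) names at the use site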
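
The capture can be reproduced with plain string substitution. The sketch below models the INC macro as a Python function that builds the expanded C text; both function names are hypothetical. The second version renames the macro's local variable to a fresh name on each expansion, which is one simple way to avoid the capture. Real hygienic macro systems do this kind of renaming automatically and more robustly.

```python
import itertools

_fresh = itertools.count()


def expand_inc_unhygienic(arg):
    """Textual expansion of: #define INC(x) { int a = 0; x++; }"""
    return "{ int a = 0; " + arg + "++; }"


def expand_inc_renamed(arg):
    """Same macro, but its local variable gets a fresh name on each expansion."""
    local = f"a__{next(_fresh)}"
    return "{ int " + local + " = 0; " + arg + "++; }"


# INC(a): the argument now refers to the macro's local a, not the caller's.
print(expand_inc_unhygienic("a"))  # { int a = 0; a++; }

# With renaming, a++ still refers to the caller's a.
print(expand_inc_renamed("a"))
```

The fresh names here could still collide with a user variable that happens to be called a__0; a real implementation would need names the user cannot write.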