6.S050 Problem Set 1—Syntax

Released: Thu Feb 9 2023

Due: Thu Feb 23 2023 @ 11:59pm

Collaboration policy

Before starting this problem set, please review the collaboration policy.

Problem 1: Syntax design analysis (25 points)

For each of the following examples of syntax design, write a short (3-5 sentence) response explaining the advantages and disadvantages of the design choice. Frame your critique in terms of the syntax design principles that we discussed in lecture (consistency, compositionality, concision).

In Python, a <= b and b < c can be written a <= b < c.
In Ruby, function calls can be performed either with parentheses:
```
foo(1, "test")
```
or without:
```
foo 1, "test"
```
Functions that take no arguments may be called as noargs or noargs().
In Python (and some other languages, e.g. Haskell) list comprehensions allow for the concise construction of lists:
```
[x * 2 for x in l if x > 0]
```
Comprehensions also work for dictionaries and generators.
In C, switch statement cases implicitly fall through, leading to the following behavior:
```
int input = 2;

switch(input) {
case 2:
  printf("got 2!\n");
case 1:
  printf("got 1!\n");
default:
  printf("got something else!\n");
}
```
```
got 2!
got 1!
got something else!
```
Pascal, a language that was designed at approximately the same time, offers a case statement that behaves differently. In Pascal, at most one arm of the case will be executed. If none of the cases match, then control resumes after the case statement.
```
input := 2;
case input of
  2: writeln('got 2!\n');
  1: writeln('got 1!\n');
else
   writeln('got something else!\n');
end;
```
```
got 2!
```
Discuss the design of the C switch statement. Is it more expressive than the Pascal version? Is there a potential for programmers to introduce errors, particularly when moving from Pascal to C?
In C#, which was designed well after C but takes some inspiration from it, switch statement cases do not implicitly fall through, but the programmer is required to end every case with break anyway.

Problem 2: Domain-specific syntax design (35 points)

In this problem, you'll be designing a query language for an object language similar to JSON. We'll assume that objects follow this grammar:

\begin{align*} \newcommand{\nonterm}[1]{\texttt{<#1>}} \nonterm{obj} &::=\ \nonterm{num} ~|~ \nonterm{str} ~|~ \nonterm{dict} ~|~ \nonterm{list} \\ \nonterm{num} &::=\ -?\ [0-9]+\ (.\ [0-9]+)?\ \\ \nonterm{str} &::=\ "\ [\text{^}"]*\ " \\ \nonterm{dict} &::=\ \{ \nonterm{key-values} \} \\ \nonterm{key-values} &::=\ \nonterm{str}\ :\ \nonterm{obj} ~|~ \nonterm{str}\ :\ \nonterm{obj}\ ,\ \nonterm{key-values} \\ \nonterm{list} &::=\ [\ \nonterm{values}\ ] \\ \nonterm{values} &::=\ \nonterm{obj} ~|~ \nonterm{obj}\ ,\ \nonterm{values} \end{align*}

Like JSON, our language has lists, dictionaries, strings, and numbers. We use regular expression notation in the grammar to specify that:

Numbers consist of some decimal digits, followed by an optional decimal point and more digits and preceded by an optional negative sign.
Strings consist of alphanumeric/ characters between double quotes.

Your job is to propose a query language for these objects. Your query language should support each of the following queries, and it should follow the syntax design principles discussed in lecture.

We're only considering syntax in this problem, but you should have a rough idea of the semantics. We'll assume that queries take an object as input and return an object.

Given an object like the following, compute the total price (i.e. price * (1 + tax rate)).
```
{
  "model": "Macbook Pro",
  "year": 2022,
  "price": 1500,
  "tax-rate": 0.065
}
```
This query requires your language to include the following operations:
- Get a field from a dictionary.
- Perform arithmetic. As you are designing the arithmetic syntax, pay attention to operator precedence and ambiguity.

Given a list of apartment objects like the following, return the objects that have at least two bedrooms, sorted by rent from lowest to highest.

[
  {
    "address": "55 Somewhere Rd. #2",
    "rent": 1500,
    "rooms": [
      {
	"kind": "bedroom",
	"area": 400
      },
      {
	"kind": "bathroom",
	"area": 100
      }
    ]
  },
  {
    "address": "1 Nowhere Pl.",
    "rent": 1000,
    "rooms": [
      {
	"kind": "kitchen",
	"area": 150
      },
      {
	"kind": "bedroom",
	"area": 100
      }
    ]
  }
]

This query might require your language to be able to:

Find the objects in a list that have some property.
Determine the length of a list.
Sort a list of objects according to some property.

Using the apartment data, return the addresses of apartments that have below-average rent.

This query might require your language to be able to:
- Store the result of a query and refer to it later.
- Compute the sum and/or average of a list of numbers.
- Perform an operation on every element of a list.

If you need inspiration for your language design, you can look at Jq (a real-world JSON query language) or at SQL (you might find the way it expresses grouping and aggregation useful). If you draw ideas from these or other languages, be sure to cite your sources.

Deliverables

Give a grammar for your language in Backus-Naur form (BNF). BNF is a notation for communicating syntax to other people, not machines, so strive for clarity over formal precision. Your grammar should not be ambiguous, and you should avoid left-recursion.

You will be implementing a parser for your syntax in the next problem. You should expect to revise your language as you work on your parser, and you discover what is and isn't easy to parse.
For each of the example queries, rewrite the query into your language. Give a brief explanation of how your syntax corresponds to the query specification.
Give 4-5 sentence justification of your design choices. A few questions that you can consider:
- Did you choose to trade clarity for concision? If so, why? Does your language make common patterns concise?
- Did you use ideas from other query languages (this might increase consistency), or did you choose novel syntax?

Write your response to this problem in a file called problem2.pdf. You do not need to spend time using LaTeX to format your grammar or queries, but please do type your response and ensure that it is clear.

Important: Your response to this question will receive peer feedback, and submissions are double-blind, so do not write your name on your response.

We think that giving and receiving feedback on your designs will improve them and sharpen your design sense.

You will be turning in feedback on another student's submission as a part of problem set 2. We will include instructions for writing feedback in the problem set 2 handout.

Problem 3: Implementing a parser (40 points)

In this problem, you'll be implementing a recursive-descent parser for the language that you specified in problem 2.

You have quite a bit of freedom in how you implement this parser, but this is how we would suggest you proceed:

Write down the abstract syntax of your language. We recommend using Python's dataclasses to define the AST. If dataclasses are new to you, check out our tutorial.
Write a tokenizer. The tokenizer should be a function that takes a string as input and produces a list of tokens. A token is an individual unit of syntax, like an identifier, parenthesis, or number. We've provided our tokenizer for reference. You're welcome to reuse or modify that code in any way you like.
Eliminate left-recursion from your grammar (if necessary).
Write a recursive-descent parser. Your parser should have one function or method for each nonterminal in the grammar. Calling a method should parse the corresponding nonterminal, removing the parsed tokens from the list.

You can download our code for parsing Imp here, and we've written a short tutorial/explanation that walks through the code.

Deliverables

Write your parser in a file called parser.py. You are welcome to reuse or adapt any of the provided code.

You must include one test for each of the queries in problem 2 that shows that your parser can correctly construct an AST from an input string. Tests should follow the Pytest convention; each test should be a function whose name starts with test. Refer to the tests in imp_parser.py for examples.

You may include any other tests that you like, but you will only be graded on whether your parser can correctly parse the three example queries.

Resources

You might find the following resources helpful as you work on your parser:

This chapter on parsing using recursive descent: https://craftinginterpreters.com/parsing-expressions.html
This tool for evaluating grammars in BNF: https://bnfplayground.pauliankline.com/

Collaboration policy#

Problem 1: Syntax design analysis (25 points)#

Problem 2: Domain-specific syntax design (35 points)#

Deliverables#

Problem 3: Implementing a parser (40 points)#

Deliverables#

Resources#

Collaboration policy

Problem 1: Syntax design analysis (25 points)

Problem 2: Domain-specific syntax design (35 points)

Deliverables

Problem 3: Implementing a parser (40 points)

Deliverables

Resources