Modularity & Objects

\[ \newcommand{\target}[2]{\class{mathjax-symbol mathjax-symbol-#1}{#2}} \newcommand{\type}[3]{#1 \target{type}{\vdash} #2 \target{type}{:} #3} \newcommand{\abs}[3]{\target{abs}{\lambda} #1 \target{abs}{:} #2\target{abs}{.} #3} \newcommand{\tabs}[2]{\target{tabs}{\Lambda} #1 \target{tabs}{.} #2} \newcommand{\arrow}{\target{arrowT}{\rightarrow}} \newcommand{\subt}{\target{subT}{<:}} \newcommand{\Int}{\mathsf{Int}} \newcommand{\Bool}{\mathsf{Bool}} \newcommand{\lett}[3]{\mathsf{let}\ \mathit{#1} : #2 = #3\ \mathsf{in}} \newcommand{\letu}[2]{\mathsf{let}\ \mathit{#1} = #2\ \mathsf{in}\ } \newcommand{\ite}[3]{\mathsf{if}\ #1\ \mathsf{then}\ #2\ \mathsf{else}\ #3} \DeclareMathOperator{\proj}{.} \newcommand{\object}[1]{\mathsf{object}\ #1\ \mathsf{end}} \newcommand{\call}[2]{#1 \# \mathit{#2}} \newcommand{\method}[2]{\mathsf{method}\ \mathit{#1} = #2} \newcommand{\self}{\mathit{self}} \newcommand{\super}{\mathit{super}} \newcommand{\refc}{\mathsf{ref}\ } \newcommand{\inherit}[2]{\mathsf{inherit}\ #1\ \mathsf{as}\ \mathit{#2}} \newcommand{\pack}[3]{\mathsf{pack}\ (#1, #2)\ \mathsf{as}\ #3} \newcommand{\unpack}[4]{\mathsf{unpack}\ (#1, #2) = #3\ \mathsf{in}\ #4} \newcommand{\override}{\ \mathsf{override}\ } \newcommand{\loc}[1]{l_{#1}} \]

What is modularity?

what is modularity in software development?
- recursively breaking a large program into small pieces that can be understood separately
two important questions:
- how should we break software into reusable pieces/modules?
  - what mechanisms of division/composition should we provide? how granular should those mechanisms be?
    - objects, ml style modules, type classes, packages, namespaces, procedures, functions, etc.
- what can we do to help programmers understand their modules / write understandable modules?
  - modules should encapsulate complexity—their interfaces should be simpler than their implementations
  - how can we help programmers encapsulate?
    - interfaces, abstract types, information hiding, existential types

Records as a unit of modularity

we've already seen a simple modularity feature: records
so far we've treated records as a plain-old-data type, but if we put functions in them, we get something that looks a bit like an object or package
we can use records to structure our libraries into packages, and we can even pass them to functions or create them dynamically
here's an example in Python
- (Python has much better ways to do this, but this example only uses functions and records)

def sum_of_squares(num, a, b):
    return num['add'](num['mul'](a, a), num['mul'](b, b))

def sum_of_squares(num, a, b):
    return num['add'](num['mul'](a, a), num['mul'](b, b))

integer = {
    'zero': 0,
    'add': lambda x, y: x + y,
    'mul': lambda x, y: x * y
}

print(sum_of_squares(integer, 3, 4))

we can define a new record for rational numbers

def sum_of_squares(num, a, b):
    return num['add'](num['mul'](a, a), num['mul'](b, b))

def gcd(a, b):
    return a if b == 0 else gcd(b, a % b)

def rat(a, b):
    c = gcd(a, b)
    return (a // c, b // c)

rational = {
    'zero': (0, 1),
    'add': lambda x, y: rat(x[0] * y[1] + y[0] * x[1], x[1] * y[1]),
    'mul': lambda x, y: rat(x[0] * y[0], x[1] * y[1])
}

print(sum_of_squares(rational, (3, 1), (4, 1)))

we can also create modules out of other modules
this example uses a numeric module to define matrices
- (matrices are also numeric modules, so we can even create blocked matrices)

integer = {
    'zero': 0,
    'add': lambda x, y: x + y,
    'mul': lambda x, y: x * y
}

def square_matrix(num, n):
    def sum(l):
	s = num['zero']
	for x in l:
	    s = num['add'](s, x)
	return s

    def add(m1, m2):
	return [[num['add'](v1, v2) \
		 for (v1, v2) in zip(r1, r2)] \
		for (r1, r2) in zip(m1, m2)]

    def mul(m1, m2):
	return [[sum([num['mul'](m1[i][k], m2[k][j]) \
		     for k in range(n)]) \
		for j in range(n)] \
		for i in range(n)]

    return {
	'zero' : [[num['zero']] * n] * n,
	'add' : add,
	'mul' : mul
    }

print(square_matrix(integer, 3)['zero'])

m = [[1, 2], [3, 4]]
print(square_matrix(integer, 2)['mul'](m, m))

From records to objects

what is an object?
- a package of methods and internal state
- methods manipulate the state and provide an interface with the rest of the program
- objects may be composed out of other objects or inherit behavior from other objects
what differentiates records from objects?
- lots of things—objects are conflated with lots of language features
- self-reference is a big aspect
- objects have both functions (methods) and internal state

Extending `FL` with objects

We'll extend FL with a simple object system. We add objects to the syntax as follows (new constructs are boxed):

\begin{align*} \oplus ::=&\quad = ~|~ < ~|~ > ~|~ + ~|~ - ~|~ * ~|~ / ~|~ \mathsf{and} ~|~ \mathsf{or} ~|~ ** && \text{operators} \\ e ::=&\quad x && \text{variable} \\ &|\quad n &&\text{integer} \\ &|\quad b &&\text{boolean} \\ &|\quad e\ e &&\text{application} \\ &|\quad \lambda x. e &&\text{function abstraction} \\ &|\quad \mathsf{if}\ e\ \mathsf{then}\ e\ \mathsf{else}\ e &&\text{conditional} \\ &|\quad e \oplus e &&\text{operator application} \\ &|\quad \{ {x_{i} = e_{i}}^{i \in 1 \dots n} \}&&\text{record} \\ &|\quad e.x&&\text{field access} \\ &|\quad \mathsf{let}\ x = e\ \mathsf{in}\ e &&\text{let binding}\\ &|\quad \boxed{e\ \mathsf{override}\ e} &&\text{record field override}\\ &|\quad \boxed{\object{(\inherit{e}{x})?\ m_{1} \dots m_{k}}}&&\text{object} \\ &|\quad \boxed{\call{e}{x}} &&\text{method call} \\ m ::= &\quad \boxed{\method{x}{e}} &&\text{method definition} \end{align*}

We add object literals with the object ... end syntax and method calls. Objects are quite similar in appearance to records, and we'll see that we can describe the semantics of objects with a lightweight desugaring into records. In particular, the object literal syntax will desugar into record literals and method calls will desugar into record field accesses.

Here's an example program that creates a 2D point object for the point \((1, 2)\).

The dist method returns the squared Euclidean distance between this point and another. The semantics of self are similar to Python—each method takes self as its first parameter, and self is bound to the object. These objects only have methods—no fields—but that won't turn out to be a major limitation. The methods of the current object can be accessed using the # syntax on self.

We illustrate the semantics of objects by desugaring this example into a regular record:

\begin{align*} &\mathit{point} = \{\\ &\quad x = \lambda \self. 1,\\ &\quad y = \lambda \self. 2,\\ &\quad \mathit{dist} = \lambda \self. \lambda p. {((\self.x\ \self) - (p.x\ p))}^{2} + {((\self.y\ \self) - (p.y\ p))}^{2} \\ &\} \end{align*}

In this desugaring:

An object is just a record where each field in the record is a function that takes the object instance as its first parameter.
Method calls are syntactic sugar for field access, as follows: \[ \call{x}{f} \rightsquigarrow x \proj f\ x \]
Self-reference is facilitated by passing the object itself as the first parameter to every method call.

Creating new Objects

what if we want to change the object?
- return a new object that reflects the changes
- (we can also add & use ref cells, but it's useful to think about the pure functional case first)
we need a new record primitive
- we'll need it later for inheritance anyway
\(e\ \mathsf{override}\ e'\) — if e and e' both produce a record, then the overridden record prefers the fields in e to the fields in e'
using override, we can write a method that returns a new object, modifying some of its fields

\begin{align*} &\mathit{point} = \object{\\ &\quad\method{x}{\lambda \self. 1}\\ &\quad\method{y}{\lambda \self. 2}\\ &\quad\method{dist}{\lambda \self. \lambda p. {(\call{\self}{x} - \call{p}{x})}^{2} + {(\call{\self}{y} - \call{p}{y})}^{2}} \\ &\quad\method{shift}{\lambda \self. \lambda dx. \lambda dy. \{ x = \lambda \_. \self\#x + dx, y = \lambda \_. \self\#y + dy \}\ \mathsf{override}\ \self}\\ &} \end{align*}

We make override more object-flavored by calling it inherit and including it in the object syntax.

\begin{align*} &\object{\inherit{e}{\super}\ \method{x_{1}}{e_{1}} \dots \method{x_{k}}{e_{k}}} \rightsquigarrow \\ &\quad\letu{super}{e}\{x_{1} = e_{1}, \dots, x_{k} = e_{k}\} \override \super \end{align*}

Using this syntax, we can rewrite the previous program as:

\begin{align*} &\mathsf{let}\ point = \mathsf{object}\\ &\quad\mathsf{method}\ x = \lambda \self. 1\\ &\quad\mathsf{method}\ y = \lambda \self. 2\\ &\quad\mathsf{method}\ dist = \lambda \self. \lambda p. {(\self\#x - p\#x)}^{2} + {(\self\#y - p\#y)}^{2} \\ &\quad\mathsf{method}\ shift = \begin{aligned}[t] &\lambda \self. \lambda dx. \lambda dy.\\ &\quad\mathsf{let}\ x' = \self\#x + dx\ \mathsf{in}\\ &\quad\mathsf{let}\ y' = \self\#y + dy\ \mathsf{in}\\ &\quad\begin{aligned}[t] &\mathsf{object}\\ &\quad\mathsf{inherit}\ \self\\ &\quad x = \lambda \self. x'\\ &\quad y = \lambda \self. y'\\ &\mathsf{end} \end{aligned}\\ \end{aligned}\\ &\mathsf{end} \end{align*}

Python implementation:

def call(obj, method, *args):
    return obj[method](obj, *args)

def override(r1, r2):
    r3 = r2.copy()
    r3.update(r1)
    return r3

point = {
    'x': lambda self: 0,
    'y': lambda self: 1,
    'shift': lambda self, dx, dy: override({ 'x': lambda _: call(self, 'x') + dx,
					'y': lambda _: call(self, 'y') + dy },
				      self),
    'dist': lambda self, p: (call(self, 'x') - call(p, 'x')) ** 2 + (call(self, 'y') - call(p, 'y')) ** 2
}

print(call(point, 'dist', call(point, 'shift', 4, 5)))

Open recursion

we described the 'self' parameter as a way to access the methods of the current object, but that undersells its power
we can modify methods that are defined in a superclass and have the modifications used by other methods that we didn't touch

def call(obj, method, *args):
    return obj[method](obj, *args)

def override(r1, r2):
    r3 = r2.copy()
    r3.update(r1)
    return r3

point = {
    'x': lambda self: 0,
    'y': lambda self: 1,
    'shift': lambda self, dx, dy: override({ 'x': lambda _: call(self, 'x') + dx,
					'y': lambda _: call(self, 'y') + dy },
				      self),
    'dist': lambda self, p: (call(self, 'x') - call(p, 'x')) ** 2 + (call(self, 'y') - call(p, 'y')) ** 2
}

x_counter = 0
def access_x():
    global x_counter
    x_counter += 1
    return 0

access_point = override({ 'x': lambda self: access_x() }, point)

call(access_point, 'dist', call(access_point, 'shift', 4, 5))
print(x_counter)

Inheritance

a key feature of object-oriented languages is the ability to share code between different objects
- programming by differences—describe one class of objects in terms of the difference from another class
what does inheritance look like in real languages?
- two broad categories
- class-based inheritance
  - so far we've only looked at the behavior of bare objects
  - most oop languages actually don't have bare objects like this
  - instead, objects are instances of a class of objects
    - the class determines which methods the individual objects can have
    - inheritance operates on a class level
  - in our setting, classes are really just templates for generating objects—i.e. functions that return objects
- prototype inheritance
  - create new objects by copying and modifying an existing object
  - this is essentially how our encoding already works
  - we can write "constructor functions" that work a bit like a class, but we can also inherit directly from other objects
- why is class-based inheritance used at all?
  - this record override construct is actually quite difficult to implement efficiently
  - we can end up with objects that have a wide variety of shapes at runtime
  - looking up a method in an object of arbitrary shape might require a hash table or similar
  - if we know the shape of the object, we can just look at a specific offset
    - but this gets complicated with inheritance, so there still ends up being a hashtable lookup or similar to deal with looking up methods on parent objects

This roughly encodes a 2D point class:

\begin{align*} &\mathsf{let}\ \mathit{mkpoint} = \lambda x. \lambda y. \mathsf{object}\\ &\quad\mathsf{method}\ x = \lambda \self. x\\ &\quad\mathsf{method}\ y = \lambda \self. y\\ &\quad\mathsf{method}\ dist = \lambda \self. \lambda p. {(\self\#x - p\#x)}^{2} + {(\self\#y - p\#y)}^{2} \\ &\quad\mathsf{method}\ shift = \begin{aligned}[t] &\lambda \self. \lambda dx. \lambda dy.\\ &\quad\mathsf{let}\ x' = \self\#x + dx\ \mathsf{in}\\ &\quad\mathsf{let}\ y' = \self\#y + dy\ \mathsf{in}\\ &\quad\begin{aligned}[t] &\mathsf{object}\\ &\quad\mathsf{inherit}\ \self\\ &\quad x = \lambda \self. x'\\ &\quad y = \lambda \self. y'\\ &\mathsf{end} \end{aligned}\\ \end{aligned}\\ &\mathsf{end} \end{align*}

All it does is create point objects given some \(x, y\). Inheritance in this encoding looks like a function call:

\begin{align*} &\mathsf{let}\ \mathit{mksizedpoint} = \lambda x. \lambda y. \lambda \mathit{size}. \mathsf{object}\\ &\quad\mathsf{inherit}\ \mathit{mkpoint}\ x\ y\\ &\quad\mathsf{method}\ \mathit{size} = \lambda \self. \mathit{size}\\ &\mathsf{end} \end{align*}

Formalization

We formalize the semantics of objects by giving a desugaring transformation \(\rightsquigarrow\) from objects to records. \[ \object{\method{x_{1}}{e_{1}} \dots \method{x_{k}}{e_{k}}} \rightsquigarrow \{x_{1} = e_{1}, \dots, x_{k} = e_{k}\} \]

Method calls: \[ \call{x}{f} \rightsquigarrow x \proj f\ x \] If the left-hand-side of a method call is not an identifier, we need to evaluate it; we introduce a let so that the evaluation only happens once. \[ \call{e}{f} \rightsquigarrow \letu{obj}{e}\ \mathit{obj} \proj f\ \mathit{obj} \]

Beyond objects

objects and object-oriented programming is hugely popular, so attracts many complaints
efficiency
- the encoding we discuss is not very efficient
- there are better ways to encode objects, but some level of inefficiency always remains
  - dynamic dispatch is just fundamentally more expensive than static dispatch
complexity
- we didn't discuss type systems for objects, but they tend to be complex
- type systems on a simplicity <-> complexity, inexpressive <-> expressive axis
- type systems for objects need to combine subtyping & parametric polymorphism
  - type inference becomes a real problem
expressiveness
- objects package together methods that work on a single type
- difficult to describe interfaces that relate multiple types
- many oo languages have added generics, which adds much needed expressiveness

References

These notes are based on material from section 7.3 of "Design Concepts in Programming Languages".

What is modularity?#

Records as a unit of modularity#

From records to objects#

Extending FL with objects#

Creating new Objects#

Open recursion#

Inheritance#

Formalization#

Beyond objects#

References#