Recitation 20

Today: Type Checking

Optimizing: A Discussion of Pre-execution Analysis, Constant folding, etc
Type Checking
-- Why? Rationale
-- What? Form of types
-- How? Mechanism of a type checker

Why do Type Checking?

If type errors are allowed to occur at runtime and are not detected, the program may exhibit very strange behaviour. Some memory might be overwritten in a way that does not cause an immediate error, but results in a hard-to-trace error later. This makes bugs hard to track down, and makes programs unreliable and dangerous.

Even better than catching type errors at runtime (like Scheme does) is to catch them before the program is ever run. This is called static type checking. It has been done for a long time (eg in Algol and Pascal) but only recently became popular in industrial settings with the advent of Java.

The basic principle is that a tool called a type checker is run on the program before execution. If the type checker finds no errors, the program is guaranteed to run without any type errors happening during execution. This is called soundness.

Unfortunately, type checker aren't complete: they can't show that every type-safe program will execute OK. So there's a tradeoff. Statically typed languages must to some extent rule out programs that would otherwise be legal. Most of these are programs that you wouldn't want to write anyway (eg, the type error is in dead code). How many reasonable programs are ruled out depends on how sophisticated the type checker is. Research in type checking focuses on trying to allow more programs while still preventing errors. The checker you'll work with in the problem set is very limited and rules out lots of reasonable programs. Java's type checker rules out very few.

We discussed a variety of expressions:
-- Expressions that would fail that are caught :-)
(if #t (+ 1 #f) 3)
-- Expressions that would fail that aren't caught :-(
(if #t (/ 1 0) 3)
-- Expressions that wouldn't fail but are stil rejected :-(
(if #t 1 (+ 1 #f))
(if #t 2 #f)
(cons 1 2)
(cons #t nil)


Recall the basic notions of typing that we've discussed before:

Primitive Types
-- Symbol
-- Number
-- String
-- Boolean

Type constructors
-- Procedure arrow
-- List
-- Pair

Polymorphic Types

Polymorphic means "many shapes". A polymorphic procedure can handle arguments of many types. For example,

(lambda (a) a)

Don't confuse with overloading! In many languages + can be used on integers and strings, but these are really different procedures.

In the problem set type checker there's no polymorphism. It's much harder to deal with. So all our lists will be ists of numbers, and cons will have the type:

    cons : Number, List<Number> -> List<Number>

rather than

    cons : Number, List<A> -> List<A>

Opaque Types

A type checker can prevent violation of a data abstraction boundary. Here's the key idea. We associate two types with an ADT: the abstract type and the rep type. When analyzing the code of the ADT itself, these are known to be equal. When analyzing clients of the ADT, there is no relationship between them, and the abstract type is treated like a primitive type -- as opaque.

Draw a picture illustrating this.

How To Build a Type Checker

For each kind of expression, the type checker uses a rule.

For combinations:
-- find the type of the operator; say it's A -> B
-- find the type of the operand; it'll have to be A
-- the expression as a whole has type B

-- (+ 1 (+ 2 3))

The whole strategy is recursive
-- to find the type of an expression, find the types of its parts
-- put them together using the typing rules

What about an if-statement?
-- predicate must have type Bool
-- each branch must have some type T
-- then expression has type T

-- (if (< 3 4) 'a 'b)
-- (if (< 3 'a) 'a 'b)
-- (if #t 'a 3)

But how do we get going?
-- need to know types of primitives, eg +
-- have a type environment that binds names to types

What about lambda?
-- to make life easier, we'll change the syntax
-- programmer must tell us the types:

PARAM has form (NAME TYPE)

for example: (lambda number ((x number) (y number)) (+ x y))

Now the rule is
-- find the type of the body, using the types of the params
-- check that it's the result type
-- type of expression is argtypes -> result type

-- (lambda number ((x number)) x)
-- (lambda (listof number) ((x number)) (cons x nil))
-- (lambda (-> (number number) number) () +)

How do we "use the types of the params"?
-- binds names to types
-- at any point, we have a TE; start of with a global TE
-- to analyze body of lambda, extend TE with bindings of params

-- (lambda boolean ((x number)) (lambda boolean ((y number)) (< x y)) 3))

Type checking is a strange form of execution!
-- binds names to types, not values
-- "executes" all paths: eg, both branches of an if-statement
-- always terminates

A puzzle
-- Why does the type checker not need to form double bubbles?
-- how is lexical scoping reflected?

What about defines?
-- must declare type of variable
-- allows recursive definitions

What about abstract types?
-- problem set has clever way to handle
-- involves representing TE with two tables

Daniel Jackson
November 11, 1999