
To: <address@hidden>
Subject: RE: dynamic vs. static typing
From: Shriram Krishnamurthi <address@hidden>
Date: Sun, 30 Nov 2003 23:40:05 -0500
In-reply-to: <016001c3b441$0eacdeb0$0a00a8c0@femto.appsolutions.com>
References: <brqz893m.fsf@ccs.neu.edu> <016001c3b441$0eacdeb0$0a00a8c0@femto.appsolutions.com>
Sender: address@hidden

Anton van Straaten wrote:

> Perhaps this would be a good time for Shriram to expand on an earlier
> comment he made (on Tue 11/18/2003 at 4:58 PM):
>
> > type systems that work by inference, rather than annotation
> > checking, often do not explicitly "check".  This phenomenon
> > is why they sometimes provide incomprehensible type errors.
> > (I can explain this in a bit more detail if anyone cares.)
>
> I understand the general principle that the more abstract a piece of
> code is, the harder it becomes to produce errors (and other
> information) that are meaningful to the domain to which the code is
> being applied.  Is there anything more specific at work in the type
> inferencing case that makes more meaningful errors more difficult (or
> impossible)?

I'm afraid I don't understand Dan Sugalski's response to this, so here's
my version of what I meant.

For background, it helps to understand what a type system is and how
such systems work in principle.  Rather than repeat myself, I'll point
you to the section of my textbook that introduces semantics and types:

  http://www.cs.brown.edu/courses/cs173/2003/Textbook/2003-10-27.pdf

In particular, read section 3, starting on page 7.  The third example is
the most important, because it illustrates what it means to have a type
error.  What this example demonstrates is that a type system really
needs to be a collection of three things:

1. a grammar of types
2. a set of type judgments
3. an algorithm that ascribes types (1) to programs in accordance with
   the judgments (2)

Lacking an algorithm, we're forced to resort to meta-mathematical
arguments, as example 3 illustrates.  Fortunately, for the type system
presented in that example (one with explicit type declarations and
monomorphism), there is an easy algorithm at hand: a simple top-down
pass suffices to ascribe types.

As far as type errors go, explaining a type error generally means
describing two things:

1. what went wrong, and
2. where it went wrong.
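As an aside, the "simple top-down algorithm" for the annotated,
monomorphic case is easy to make concrete.  Here is a minimal sketch in
Python (the representation and names are invented here, not taken from
the textbook): because every function parameter carries a written type,
a single recursive pass ascribes types, and every failure can name both
a specific cause and a specific expression.

```python
# A minimal sketch (invented names, not from the textbook): a top-down
# checker for an explicitly annotated, monomorphic language.

class TypeError_(Exception):    # carries both "what" and "where" in its message
    pass

# Types: "num", "bool", or ("->", dom, rng)
# Expressions (tagged tuples):
#   ("num", n)  ("bool", b)  ("var", x)
#   ("lam", x, param_type, body)  ("app", f, a)  ("add1", e)

def check(env, e):
    tag = e[0]
    if tag == "num":
        return "num"
    if tag == "bool":
        return "bool"
    if tag == "var":
        return env[e[1]]
    if tag == "lam":
        _, x, t, body = e
        # the parameter type is written down, so no guessing is needed
        return ("->", t, check({**env, x: t}, body))
    if tag == "add1":
        if check(env, e[1]) != "num":
            raise TypeError_("add1 applied to a non-number: blame the argument")
        return "num"
    if tag == "app":
        _, f, a = e
        ft = check(env, f)
        if not (isinstance(ft, tuple) and ft[0] == "->"):
            raise TypeError_("applied a non-function: blame the operator")
        if check(env, a) != ft[1]:
            raise TypeError_("argument/parameter mismatch: blame the application")
        return ft[2]
```

Checking the application of (lam x:num. add1 x) to true fails at the
`app` case, so the report can blame the invocation itself, exactly as
described below.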
Observe that the latter usually implies the former, but not vice versa.
We will return to this later.

In the type language of that section of my text, both of these are easy
to explain because of the explicit type annotations.  For instance, if
you invoke a function expecting type A with a value of type B, it's
clearly the fault of the invocation.  If, however, having consumed an A,
the function treats it like a B, then it is the function's fault (e.g.,
A = boolean, B = number, and the function passes the value of type A to
an arithmetic primitive like +).

Now let's shift to a language with type inference (and polymorphism,
though that's not necessary).  Again, the formal description of the type
system is given in terms of type judgments.  When the program can
successfully be ascribed a type, no problem.  When it can't, we are
again in the realm of trying to reason about type judgment trees.  The
situation now is much more complex.  The Hindley-Milner system works by
essentially deriving a bag of constraints from the program and solving
them through unification; for details, see:

  http://www.cs.brown.edu/courses/cs173/2003/Textbook/2003-11-05.pdf
  http://www.cs.brown.edu/courses/cs173/2003/Textbook/2003-11-10.pdf
  http://www.cs.brown.edu/courses/cs173/2003/Textbook/2003-11-12.pdf

Now, what's an error?  This is crucial: *an error results when the
unification process is unable to resolve all the generated constraints*.
That's it.  See section 5, starting on page 4, of the first of these
documents for an illustration of how this works in practice.

This is somewhat reminiscent of an error in the case of simple types,
where it resulted from being unable to construct a valid type judgment
tree.  There is a subtle but significant difference, however.  There,
the computationally vague notion of an error lay in the formal
description of the type system, not in the algorithm.  Here, it arises
from the very algorithm devised to ascribe types.
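To make the "bag of constraints plus unification" picture concrete, here
is a toy sketch in Python (again with invented names, and with the
occurs check and polymorphic generalization omitted for brevity; lambdas
here carry no annotation, so the generator invents type variables).
Note that the only complaint the solver can possibly utter is "couldn't
unify X with Y".

```python
# A toy sketch (invented representation) of inference by constraint
# solving: walk the program collecting equality constraints, then unify.
# Lambdas are unannotated, so parameters get fresh type variables.

def gen(env, e, fresh=iter(range(10**6))):
    """Return (type, constraints) for a tagged-tuple expression e."""
    tag = e[0]
    if tag == "num":
        return "num", []
    if tag == "bool":
        return "bool", []
    if tag == "var":
        return env[e[1]], []
    if tag == "lam":                      # ("lam", x, body): invent a variable
        _, x, body = e
        tv = ("tv", next(fresh))
        bt, cs = gen({**env, x: tv}, body)
        return ("->", tv, bt), cs
    if tag == "app":                      # ("app", f, a)
        _, f, a = e
        ft, cs1 = gen(env, f)
        at, cs2 = gen(env, a)
        rv = ("tv", next(fresh))
        # the application contributes: type-of-f = type-of-a -> result
        return rv, cs1 + cs2 + [(ft, ("->", at, rv))]
    if tag == "add1":                     # ("add1", e)
        t, cs = gen(env, e[1])
        return "num", cs + [(t, "num")]

def resolve(s, t):
    """Chase a type variable through the substitution s."""
    while isinstance(t, tuple) and t[0] == "tv" and t in s:
        t = s[t]
    return t

def unify(constraints):
    """Solve the bag of equalities; occurs check omitted for brevity."""
    s = {}                                # substitution: tv -> type
    work = list(constraints)
    while work:
        a, b = work.pop()
        a, b = resolve(s, a), resolve(s, b)
        if a == b:
            continue
        if isinstance(a, tuple) and a[0] == "tv":
            s[a] = b
        elif isinstance(b, tuple) and b[0] == "tv":
            s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) \
                and a[0] == b[0] == "->":
            work += [(a[1], b[1]), (a[2], b[2])]
        else:
            raise Exception(f"couldn't unify {a} with {b}")   # well, duh
    return s
```

When unification fails, the exception above is literally all the
information the algorithm has at hand, which is the point of what
follows.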
The problem, then, is to reconstruct the error from (a) this unification
failure and (b) this unordered bag of constraints.  This has proven to
be rather hard, for a variety of technical reasons.  Recall that
explaining a type error means describing two things:

1. what went wrong, and
2. where it went wrong.

At the point of failure, we have very little information to provide.
What went wrong is often something simplistic like "couldn't unify
number with boolean" (well, duh).  Where?  Well, who knows?  Perhaps
it's possible to extract a trace through the constraints by exploiting
the equations that have unified thus far, and so forth.  But it's not
easy to do this.

Indeed, a variety of circumstances (lack of explicit type annotations,
unification, etc., and often the combination of these) lead to a few odd
phenomena:

1. Suppose you have a function F that is "error-free", i.e., there are
   arguments (indeed, an infinite number of them) for which F works
   without getting stuck.  Similarly, you might have an argument G that
   can be supplied to an infinite number of functions.  Yet these two
   can participate in a type error because F was applied to G.  In
   simple cases, this is easy to track (simplest example: F is add1 and
   G is true).  In principle, the type error report should focus on
   neither F nor G but on the application itself.  But as the examples
   grow more complex, the type system might assign fault to something in
   the internals of F or of G (the latter can happen especially if F is
   higher-order and G is itself a function).

2. Sometimes an error can involve a long chain of constraints.  This
   commonly happens when programmers write several layers of
   abstraction.  Each layer contributes another set of constraints, with
   the result that the actual mismatched types are a long way apart.  In
   these cases, reporting any one point along that chain is virtually
   useless, yet the chain itself can be quite long, even linear in the
   size of the program!

3. The presence of higher-order control flow makes it hard to understand
   why some errors occur.  Even if the number of program points and
   constraints involved in an error is small, the *reason* for some of
   the identities in the constraint set may not be easy to understand:
   in particular, seeing why two expressions unify may require an
   understanding of the program's control-flow behavior.

4. In particularly strange circumstances, the error report can appear to
   run counter to the program's control flow.  This happens because the
   constraints are equalities between terms, and from the perspective of
   the unification algorithm, there is no reason to favor one direction
   of an equality over the other.  A value flowing from A to B can just
   as easily be seen as a value flowing from B to A.  As a result, the
   error can appear to "backflow" from, say, a function to its
   (non-higher-order) argument and be reported at a location in the
   argument, even though visual inspection reveals no mistake in the
   argument and the user knows the argument flows into the function.

In general, each of these problems is greatly reduced in the presence of
explicit type annotations.  People who program in ML variants are
accustomed to manually debugging type errors by introducing annotations
and "pushing the blame".  Some recent research has focused on
algorithmic techniques for automating what programmers have been doing
by hand; there have been several papers in ICFP, ESOP and other
conferences lately on this and related techniques.

Another way to dance around the problem is to replace equational
constraints and unification with directed constraints and transitive
closure.  PLT's MrSpidey did this, and MrFlow continues in this
tradition.
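Two of these points can be made concrete in a few lines.  The following
Python fragment (a toy illustration with an invented representation, not
MrSpidey's or MrFlow's actual machinery) shows, first, that an equality
constraint is symmetric, which is why blame can run against control
flow; and second, that when layers of abstraction chain equalities
together, the two types that actually clash sit at the far ends of the
chain, so no single link is a useful error location.

```python
# Toy illustration (representation invented here) of two phenomena above.

# (4) An equality constraint {A = B} is unordered: the solver cannot
# tell whether the value flowed from A to B or from B to A.
def same_constraint(c1, c2):
    return frozenset(c1) == frozenset(c2)

# (2) Layered abstractions chain equalities: bool = a, a = b, ...,
# c = num.  The concrete types that actually disagree are only the
# endpoints of the chain; every intermediate link is uninformative.
def clash_endpoints(chain):
    """Given equalities linking t0=t1, t1=t2, ..., return the two
    endpoint types (the ones that genuinely clash)."""
    return chain[0][0], chain[-1][1]
```

For instance, `same_constraint(("num", "bool"), ("bool", "num"))` holds,
and for the chain `[("bool", "a"), ("a", "b"), ("b", "num")]` the
endpoints are `("bool", "num")`, even though the reported failure may
mention only the link where unification happened to give up.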
This approach gets rid of many of the problems ascribed above to
traditional type inference, but the resulting system has a very
different flavor from Hindley-Milner (and is technically also quite
different), so it doesn't appeal to everyone's aesthetic, and it may
have a different set of theoretical properties than the ones one wants.

At Brown, my student Guillaume Marceau has been developing a visual type
error inspector within DrScheme that uses overlay arrows on the program
to examine type errors.  This is an area of current research in my
group.

Shriram

PS: Responding explicitly to something Anton said: please note that
explicit annotation does not preclude polymorphism.

Follow-Ups:
  Re: dynamic vs. static typing
    From: Neelakantan Krishnaswami <neelk@gs3106.sp.cs.cmu.edu>

References:
  Re: dynamic vs. static typing
    From: Joe Marshall <jrm@ccs.neu.edu>
  RE: dynamic vs. static typing
    From: "Anton van Straaten" <anton@appsolutions.com>
