Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code
November 27, 2001

This recent paper from a group of researchers at Stanford University presents a new method for detecting and inferring bugs in code. Previous research from this group had established a method for rule-based checking of code -- that is, by specifying a rule (:a: must follow :b:), the checker could verify that every occurrence of :a: was preceded by an occurrence of :b:. This paper extends that work with a method for inferring these correctness rules from the source code itself, removing programmers' lossy memories from the rule-creation process and allowing any program to be checked without a priori knowledge of its correctness rules.

The main intuition behind this work is that the beliefs of the programmers can be inferred from common coding practice and common sense. For instance, dereferencing a pointer implies a belief that the pointer is non-null. The checker derives what it takes to be the programmers' beliefs; if it encounters a contradiction, it suspects an error and flags the site for the user to examine. As the paper emphasizes, the key point is that contradictory beliefs can be found without any knowledge of which belief is correct.

The paper is motivated by the authors' self-professed desire to "find as many serious bugs as possible." Six checkers of the sort described above are implemented, covering a variety of error classes. These checkers consider two kinds of programmer beliefs -- MAY beliefs and MUST beliefs. MUST beliefs are fairly cut and dried -- if a programmer dereferences a pointer, he must believe it to be non-null. MAY beliefs are more ambiguous -- the fact that foo() always follows bar() implies only that the programmer may believe the two go together.

MUST beliefs are generally checked by internal consistency analysis. That is, a belief inferred at one place in the code is propagated to all related locations; if two beliefs conflict at any location, the conflict signals a possible bug. MAY beliefs lend themselves to a technique the authors term statistical analysis: statistical trends are derived across the codebase, and where code deviates from a strong trend, it is flagged as a possible error. As one might expect, there needs to be a ranking metric for errors and some threshold cutoff, lest all lines of code be flagged as buggy. The chosen metric compares the number of examples supporting a belief to the number of counterexamples, and ranks beliefs by how far the observed ratio departs from an expected one. Again, as one would expect, the larger the codebase, the more reliable this statistical analysis becomes.

Four of the checkers are examined in depth. The first checks internal null consistency, looking for three types of errors: check-then-use (a pointer is tested against null, implying it may be null, yet is dereferenced on a path where it still may be), use-then-check (a pointer is dereferenced and only afterwards tested against null), and redundant checks (a pointer is uselessly tested when its value is already known). The second, a security checker, looks for places where the kernel may unsafely dereference a pointer supplied by user space. The third attempts to infer failure conditions, using statistical analysis to find places where possible error returns are not appropriately handled. The fourth attempts to derive temporal rules of the form no :a: after :b:, which capture relative temporal relationships in code (e.g., no use of a pointer after it has been freed). Sketches of the ranking computation and of the kind of code each checker flags appear below.
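To make the ranking concrete, here is a minimal sketch of a deviance score, assuming the standard z statistic for a proportion (which I believe is what the paper uses); the function name, the expected ratio p0, and the example counts below are my own choices, not the paper's.

    #include <math.h>
    #include <stdio.h>

    /* Score a candidate belief by how far the observed ratio of
     * supporting examples departs from an expected ratio p0.
     * n = examples + counterexamples observed in the codebase;
     * e = observations that support the belief.  Higher scores mean
     * stronger evidence that the counterexamples are real bugs. */
    static double deviance_score(int e, int n, double p0)
    {
        double observed = (double)e / (double)n;
        return (observed - p0) / sqrt(p0 * (1.0 - p0) / (double)n);
    }

    int main(void)
    {
        /* Suppose 998 of 1000 call sites check a function's return
         * value: the belief "callers must check" ranks very high,
         * and the 2 deviant sites are worth auditing first. */
        printf("z = %.2f\n", deviance_score(998, 1000, 0.9));
        return 0;
    }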
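The null-consistency checker is easiest to see in code. The fragment below is my own contrived example (the paper's examples come from real kernels), packing all three error patterns into one function; it compiles, but every flagged line is a deliberate bug.

    #include <stdio.h>

    struct req { int len; };

    static void log_error(void) { fprintf(stderr, "bad request\n"); }

    int handle(struct req *a, struct req *b)
    {
        /* check-then-use: the test implies a belief that a MAY be
         * null; the dereference just below implies a MUST NOT be
         * null.  The contradiction is flagged without knowing which
         * belief is the right one. */
        if (a == NULL)
            log_error();          /* bug: forgot to return here */
        int len = a->len;         /* flagged: possible null dereference */

        /* use-then-check: b is dereferenced before it is tested, so
         * the test comes too late to prevent a crash. */
        len += b->len;
        if (b == NULL)            /* flagged */
            return -1;

        /* redundant check: a was already dereferenced above, so it is
         * known non-null on every path that reaches this test. */
        if (a == NULL)            /* flagged: dead test */
            return -1;

        return len;
    }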
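The security checker targets code like the following. This is a userland-compilable sketch, not real kernel code: copy_from_user() is the genuine Linux interface for moving data across the user/kernel boundary, but the stub, struct params, and dev_ioctl here are invented for illustration.

    #include <string.h>

    struct params { int op; };
    #define EFAULT 14

    /* Stub standing in for the kernel's copy_from_user(), which
     * safely copies n bytes from a user-supplied address and returns
     * nonzero if the address is bad. */
    static unsigned long copy_from_user(void *to, const void *from,
                                        unsigned long n)
    {
        memcpy(to, from, n);   /* the real one validates `from` first */
        return 0;
    }

    long dev_ioctl(unsigned int cmd, unsigned long arg)
    {
        struct params p;

        /* flagged: direct dereference of a pointer handed in by an
         * untrusted user process -- the user controls this address,
         * so the kernel can be made to read arbitrary memory or crash. */
        p = *(struct params *)arg;

        /* the consistent, safe form the checker expects: */
        if (copy_from_user(&p, (const void *)arg, sizeof(p)))
            return -EFAULT;

        (void)cmd;              /* command dispatch omitted */
        return 0;
    }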
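The failure checker's inference is purely statistical: if nearly every call site guards a function's result, the unguarded sites are suspect. A contrived userland analogue, with names of my own choosing:

    #include <stdlib.h>
    #include <string.h>

    /* If hundreds of call sites across the codebase test malloc()'s
     * result against NULL, the checker infers the MAY belief "this
     * function can fail" and ranks the deviant, unchecked sites as
     * likely bugs. */
    int save_name(const char *name, char **out)
    {
        char *buf = malloc(strlen(name) + 1);
        strcpy(buf, name);    /* flagged: buf is used unchecked, so the
                                 program crashes if malloc() failed */
        *out = buf;
        return 0;
    }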
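Finally, a sketch of what the temporal checker catches, again with invented names; the rule "no use of p after free(p)" is the classic instance of no :a: after :b:.

    #include <stdio.h>
    #include <stdlib.h>

    struct conn { int fd; };

    /* If free() is the last touch of its argument at almost every
     * call site, the checker infers the rule "no use of p after
     * free(p)" and flags the rare violations. */
    void close_conn(struct conn *c)
    {
        free(c);
        printf("closed fd %d\n", c->fd);   /* flagged: use after free */
    }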
This paper seemed incredibly successful -- one of the authors' goals was to find as many serious bugs as possible, and they proved the approach by finding a number of bugs in the OpenBSD and Linux kernels. Their work also seemed largely successful in minimizing the false positive rate. For statistical analysis of the type employed in this paper, specifying appropriate metrics and thresholds would seem to be the most difficult part, but this work succeeded admirably. That the method requires no a priori knowledge of the system's semantics is what makes it brilliant; most user-specifiable rules about correctness seem derivable from this type of analysis.

One of the cutest and most interesting parts of this work is that it can identify places in the code where a programmer might not fully understand the interface, yet has managed to remove all compiler errors and warnings and get the program to work in the common case. By discovering the inconsistent (but syntactically correct) use of APIs, the tool flags suspicious code for audit. In general, this seems to be one of those great-ideas papers -- the ideas are so obvious in retrospect that I find it hard to believe the work had not already been done. I anticipate the day when gcc, Project Builder for Mac OS X, or Visual Studio for Windows has this feature.