[ CS 199 Project Plan ] 0. Author Josh Hyman SID: 802-971-681 Spring 2004 1. Goals There are a few well known classes of bugs which are quite common in C code. These include dereferencing a NULL pointer and accessing arrays out of bounds. The goal of this project is to apply compile time checks for these otherwise runtime errors. Further, to be able to apply these checks to arbitrary code written in C. The eventual direct application will be to check the GNU C Library for such errors. Any errors found will be fixed, and patches will be contributed back the GNU C Library project. 2. Tools This project will be making use of two syntax analysis tools: BLAST (Berkeley Lazy Abstraction Software Verification Tool) out of UC Berkley, and Simplify from Compaq Research Labs. These tools analyze preprocessed C source code, generating an finite state machine representing all possible states of the program. A reachability algorithm may then be applied to determine if undesirable states are reachable. If such a case exists, it represents a bug in the program's definition. 3. Method In order to test for undesired states, these states must be well defined. The tools employed specify these definitions in the form of function annotations separate from the original source code. Many undesirable states can be found by simply defining the pre- and post-conditions of a given function. If these conditions do not hold for any reason at any time, the function or its caller must be incorrectly defined. This is especially useful to ensure that error codes returned from functions are checked and handled. This style of annotation can potentially find a large number of common bugs in source code where such definitions are available. However, it would be more useful if these definitions could be applied to arbitrary C source code to find similar errors. To this end, the project will pursue two distinct possibilities. First, there will be an attempt to generalize existing annotations and build new general descriptions for arbitrary application. Second, an attempt will be made to automatically generate adequate annotations given C source code as input. To reach the goal of truly arbitrary application, a hybrid approach will likely emerge. Once a method is devised to analyze any given fragment of C source code, then this method will be applied to a large body of code. For this project, the GNU C Library will represent such a large body of code. Our generalization and generation methods will be applied to a small fraction of the library. From the length of time needed to find and fix errors in the fraction, the amount of time to check and repair the entire library can be extrapolated. 4. Proof of Concept To test these ideas in practice, a small portable section of the GNU C Library has been chosen as a test bed. The getopt routines have been refined and debugged over the last number of years. It is expected that few if any errors will be found in this code. Regardless, precise annotations will be generated by hand to become familiar with BLAST and Simplify as well as verify the correctness of getopt. Then, errors will intentionally be introduced into getopt; the existence of these errors will then be verified by rechecking the source using the previously defined annotations. 5. The Next Step Once complete annotations for the getopt routines have been devised, the concept will be expanded to a larger set of functions. To simplify the problem, a small class of errors will be defined. In all likelihood, the class of errors defined by dereferencing NULL pointers will be chosen. The focus will then shift from defining getopt specific function descriptions to defining general annotations to find the chosen class of errors in any function implementation. 6. Timeline April 21 Complete proof of concept May 10 Progress report June 11 Final report [ Original at http://www.joshhyman.com/opensource/bugs/project-plan.txt ]