[ CS 199 Project Plan ]


0. Author

   Josh Hyman
   SID: 802-971-681
   Spring 2004


1. Goals

  There are a few well known classes of bugs which are quite common in C code.
These include dereferencing a NULL pointer and accessing arrays out of bounds.
The goal of this project is to apply compile time checks for these otherwise
runtime errors.  Further, to be able to apply these checks to arbitrary code
written in C.  The eventual direct application will be to check the GNU C
Library for such errors.  Any errors found will be fixed, and patches will be
contributed back the GNU C Library project. 
   

2. Tools

  This project will be making use of two syntax analysis tools: BLAST (Berkeley
Lazy Abstraction Software Verification Tool) out of UC Berkley, and Simplify
from Compaq Research Labs.  These tools analyze preprocessed C source code,
generating an finite state machine representing all possible states of the
program.  A reachability algorithm may then be applied to determine if
undesirable states are reachable.  If such a case exists, it represents a bug
in the program's definition.


3. Method

  In order to test for undesired states, these states must be well defined.
The tools employed specify these definitions in the form of function
annotations separate from the original source code.  Many undesirable states
can be found by simply defining the pre- and post-conditions of a given
function.  If these conditions do not hold for any reason at any time, the
function or its caller must be incorrectly defined.  This is especially useful
to ensure that error codes returned from functions are checked and handled.

  This style of annotation can potentially find a large number of common bugs
in source code where such definitions are available.  However, it would be more
useful if these definitions could be applied to arbitrary C source code to find
similar errors.  To this end, the project will pursue two distinct
possibilities.  First, there will be an attempt to generalize existing
annotations and build new general descriptions for arbitrary application.
Second, an attempt will be made to automatically generate adequate annotations
given C source code as input.  To reach the goal of truly arbitrary
application, a hybrid approach will likely emerge.

  Once a method is devised to analyze any given fragment of C source code, then
this method will be applied to a large body of code.  For this project, the GNU
C Library will represent such a large body of code.  Our generalization and
generation methods will be applied to a small fraction of the library.  From
the length of time needed to find and fix errors in the fraction, the amount of
time to check and repair the entire library can be extrapolated.


4. Proof of Concept

  To test these ideas in practice, a small portable section of the GNU C
Library has been chosen as a test bed.  The getopt routines have been refined
and debugged over the last number of years.  It is expected that few if any
errors will be found in this code.  Regardless, precise annotations will be
generated by hand to become familiar with BLAST and Simplify as well as verify
the correctness of getopt.  Then, errors will intentionally be introduced into
getopt; the existence of these errors will then be verified by rechecking the
source using the previously defined annotations.


5. The Next Step

  Once complete annotations for the getopt routines have been devised, the
concept will be expanded to a larger set of functions.  To simplify the
problem, a small class of errors will be defined.  In all likelihood, the class
of errors defined by dereferencing NULL pointers will be chosen.  The focus
will then shift from defining getopt specific function descriptions to defining
general annotations to find the chosen class of errors in any function
implementation.


6. Timeline

  April 21    Complete proof of concept 
  May 10      Progress report 
  June 11     Final report


[ Original at http://www.joshhyman.com/opensource/bugs/project-plan.txt ]