[ CS 199 Final Paper ]

0. Author

  Josh Hyman
  SID: 802-971-681
  Spring 2004


1. Goals

There are a few well known classes of bugs which are quite common in C code.
These include dereferencing a NULL pointer and accessing arrays out of bounds.
The goal of this project is to apply compile time checks for these otherwise
runtime errors.  Further, to be able to apply these checks to arbitrary code
written in C.  The eventual direct application will be to check the GNU C
Library for such errors.  Any errors found will be fixed, and patches will be
contributed back the GNU C Library project. 


2. Similar projects

A project called Smatch (http://smatch.sf.net) provides similar compile-time
capabilities.  Smatch is based on the Stanford checker and attempts to provide
similar functionality at the compiler level.  Smatch provides a modified
compiler which generates annotated source code.  This annotated code can then
be checked for various unwanted properties.  Examples are checking for
unreachable code, simple for loop bounds checking, and simple pointer
dereferencing checks.  The majority of these techniques lean heavily on
specific syntax rather than on predicates in a control-flow-graph.  This makes
this approach more brittle and far less general than what we propose.


3. Methodology

This project took advantage of two projects out of Berkeley.  First, to provide 
source code annotations we used Ccured (http://manju.cs.berkeley.edu/ccured/).  
These annotations had the effect of adding a predicate for each property we
wished to verify.  Ccured takes preprocessed source as input and is able to
reason about the application using CIL (http://manju.cs.berkeley.edu/cil/).
It then inserts various run-time checks when it can not deduce that such a
check had been performed from the context.  Ccured can insert a variety of
run-time checks into arbitrary source code.  Of particular interest are NULL
checks for what Ccured calls SAFE pointers and bounds check for what Ccured
calls SEQ and FSEQ pointers.  I modified Ccured to correctly to insert unique
predicates for each run-time check.  Previously, Ccured would insert a macro
and call a function within a small library it included.  The benefit of
replicating the function to check a particular property will be discussed
later.

Second, to provide reachability analysis within the control-flow-graph we used
BLAST (http://www-cad.eecs.berkeley.edu/~rupak/blast/).  BLAST also uses CIL
to process C source code and creates a state space containing all possible
states of the program.  BLAST can then, given a particular state, check to see
if that state is reachable from any other given state (generally the state
assigned to the main function).  This allows us to determine if any of the
states added by Ccured provide a edge leading to a live failure state.  If
this is the case, it is a bug in the program and must be fixed.  Otherwise,
the state is safe and its addition can be attributed to Ccured being too
conservative.  However, our test programs contained many predicates that we
wanted to check.  Rupak modified BLAST to find all possible error states
within a given program, iterating over them and running the model checker for
each predicate.

An important problem with this approach is that BLAST's complexity is
exponential in the number of error states it's tracking.  To this end, Ccured
was modified to add an additional unique error state.  Then, BLAST was modified
to loop through all of the possible error states, replacing each in turn with
the unique error state, and then checking that unique state's reachability.
This would then allow BLAST to only track a single state at a time, a huge
performance win.  This procedure also lends itself to parallelism.  Since each
execution of the BLAST model checker is independent, a number of threads (on
an SMP machine or on a cluster) could be utilized to distribute the work.


4. Successes

When initially testing our technique, we used very simple code samples to
verify the correct insertions of NULL checks by Ccured and reachability
analysis by BLAST.  This served to show that Ccured conservatively inserts
checks, preferring to insert a check when the context is unclear even when
operating in optimized mode.  However, for this purpose, unneeded insertions
actually proved that BLAST was correctly analyzing the program's state space.

Further testing showed that the combination of Ccured and BLAST was able to
prove the correctness of trivial programs even with convoluted control-flow-
graphs.  Then we tested GNU getopt(3) using the same procedure.  This code
was intended to represents typical source code.  It also has the property
that it has been thoroughly tested and is heavily used.  Curiously, Ccured
inserted two NULL checks into the source code.  Upon inspection, it became
obvious that pointers in question would not be NULL; however, we proceeded to
check their reachability using BLAST.

This test unearthed a problem in BLAST regarding loops which do not have any
effect on the predicate in question.  A resolution to this problem is
introduced later.  However, after correcting this problem, BLAST was able to
verify that one of the two NULL checks that Ccured had inserted was
superfluous.  Upon further inspection, the second NULL check was also verified
to be unneeded.  I created the test cases based on getopt(3) as well as wrote
a set of other test cases to exercise particular problems within BLAST.


5. Failures

Unfortunately, BLAST was not able to verify that both NULL checks inserted
into getopt(3) were unneeded.  This is the case because getopt(3) initially
establishes properties of its input, but BLAST can not follow the procedure.
The problem arises when getopt(3) loops through all elements of argv[] until
it finds a NULL element.  It then deduces that all previous elements are
non-NULL.  From this construct, BLAST is only able to determine that the last
element is NULL but does not "remember" that all previous elements were
non-NULL.  This poses a serious problem as such techniques are used very
regularly when dealing with NULL terminated strings.  Currently, this is an
area of active research which may produce valuable results in the near future.

Another pitfall in BLAST is the fact that it attempts to traverse loops
within the control-flow-graph even if they do not modify the state of the
variable in question.  This leads to an exponential blow up in the number of
tracked predicates, all of which are effectively useless.  Rupak was able to
solve this particular problem (described in more detail below).


6. Improvements

There are several improvements that can be made to greatly increase the speed
of this analysis.  First, we can ignore loops which do not affect the
predicate in question.  This is complicated by the fact that the value of the
predicate may depend indirectly indirectly on the loop, but if independence
can be verified, then the loop can be ignored completely.  Otherwise, we can
assume pseudo-random values for the variables affected by the loop.  This
would effectively enlarge the space containing all states of the program, an
acceptable trade-off because this will dramatically reduce computation time.

Another major improvement would be to collapse all states and edges in the
control-flow-graph which have no effect on predicate being checked.  This
method, called slicing, would act in the following way.  First, it can
collapse all states "after" the current state.  These are unneeded because they
would not execute until after the current predicate is evaluated and therefore
have no effect on its value.  Second, it could traverse the control-flow-
graph backwards keeping track of variable that directly or indirectly affect
the value of the predicate of interest, replacing all other states and
transitions with no-ops.  The result is a slice of the program containing only
the states needed to verify the current predicate.  Not only will this reduce
overall execution time, but it will also help BLAST avoid potentially complex
areas within the code which are irrelevant for the current goal.  I will be
looking at this particular problem over the summer with Rupak's assistance.

The third improvement would be to add a fixed, user specified bound on the
computation time for any given predicate.  In truth, not all predicates can be
verified in a "reasonable" amount of time.  This is generally due to the
overall complexity of the code, but could also be BLAST getting caught in a
complex looping construct.  In either case, the goal would be to exit
gracefully and provide enough information to a human to essentially pick up
where the model checker left off.  A timeout facility already exists within
BLAST and I will port it to the iterative model checker. 

All of the improvements mentioned here are currently being implemented and
tested.  They are particularly necessary when attempting to verify any large,
non-trivial piece of source code.


7. Future

The aforementioned modifications are not the only needed changes to make this
process generally applicable; they constitute the first step.  One
particularly important and specifically difficult addition would be support
for quantified predicates.  This is the notion that a single predicate can
provide information about more than one element of a data structure.  The
canonical example of this would be NULL terminated strings.  If a program
were to traverse a string searching for the NULL character, terminating the
loop when the NULL character is found, the model checker should understand
that all previous elements are non-NULL.  Currently, BLAST will not reach this
conclusion without user intervention.  Instead, it will only note the fact
that the last element is NULL, "forgetting" that all previous elements are
non-NULL.  This is an important oversight because NULL terminated structures
are very common in C code.

Another, even more complex, addition would be to add ability for CIL to parse
C++ syntax; allowing BLAST to reason about programs written in C++ as well as
C.  This is not a trivial change because C++ syntax is significantly more
complex.  A similar, less ambitious change, would be to add complete support
for C syntax, reducing some of the syntactic sugar to its canonical form.
This could be implemented by further preprocessing the source before CIL
parses it or it could be implemented within CIL itself.

Lastly, we would like to devise a comprehensive test suite composed of
recently found bugs in open source software.  These test cases will be small
test cases which exhibit the unwanted trait.  We will attempt to show that our
tool would have been able to catch these bugs.  This will provide us with both
a rich variety of common C constructs as well as a sampling of real bugs in
the wild.  This will help us refine our techniques and provide reproducible
results for others.


[ Original at http://www.joshhyman.com/opensource/bugs/final.txt ]