[ CS 199 Final Paper ] 0. Author Josh Hyman SID: 802-971-681 Spring 2004 1. Goals There are a few well known classes of bugs which are quite common in C code. These include dereferencing a NULL pointer and accessing arrays out of bounds. The goal of this project is to apply compile time checks for these otherwise runtime errors. Further, to be able to apply these checks to arbitrary code written in C. The eventual direct application will be to check the GNU C Library for such errors. Any errors found will be fixed, and patches will be contributed back the GNU C Library project. 2. Similar projects A project called Smatch (http://smatch.sf.net) provides similar compile-time capabilities. Smatch is based on the Stanford checker and attempts to provide similar functionality at the compiler level. Smatch provides a modified compiler which generates annotated source code. This annotated code can then be checked for various unwanted properties. Examples are checking for unreachable code, simple for loop bounds checking, and simple pointer dereferencing checks. The majority of these techniques lean heavily on specific syntax rather than on predicates in a control-flow-graph. This makes this approach more brittle and far less general than what we propose. 3. Methodology This project took advantage of two projects out of Berkeley. First, to provide source code annotations we used Ccured (http://manju.cs.berkeley.edu/ccured/). These annotations had the effect of adding a predicate for each property we wished to verify. Ccured takes preprocessed source as input and is able to reason about the application using CIL (http://manju.cs.berkeley.edu/cil/). It then inserts various run-time checks when it can not deduce that such a check had been performed from the context. Ccured can insert a variety of run-time checks into arbitrary source code. Of particular interest are NULL checks for what Ccured calls SAFE pointers and bounds check for what Ccured calls SEQ and FSEQ pointers. I modified Ccured to correctly to insert unique predicates for each run-time check. Previously, Ccured would insert a macro and call a function within a small library it included. The benefit of replicating the function to check a particular property will be discussed later. Second, to provide reachability analysis within the control-flow-graph we used BLAST (http://www-cad.eecs.berkeley.edu/~rupak/blast/). BLAST also uses CIL to process C source code and creates a state space containing all possible states of the program. BLAST can then, given a particular state, check to see if that state is reachable from any other given state (generally the state assigned to the main function). This allows us to determine if any of the states added by Ccured provide a edge leading to a live failure state. If this is the case, it is a bug in the program and must be fixed. Otherwise, the state is safe and its addition can be attributed to Ccured being too conservative. However, our test programs contained many predicates that we wanted to check. Rupak modified BLAST to find all possible error states within a given program, iterating over them and running the model checker for each predicate. An important problem with this approach is that BLAST's complexity is exponential in the number of error states it's tracking. To this end, Ccured was modified to add an additional unique error state. Then, BLAST was modified to loop through all of the possible error states, replacing each in turn with the unique error state, and then checking that unique state's reachability. This would then allow BLAST to only track a single state at a time, a huge performance win. This procedure also lends itself to parallelism. Since each execution of the BLAST model checker is independent, a number of threads (on an SMP machine or on a cluster) could be utilized to distribute the work. 4. Successes When initially testing our technique, we used very simple code samples to verify the correct insertions of NULL checks by Ccured and reachability analysis by BLAST. This served to show that Ccured conservatively inserts checks, preferring to insert a check when the context is unclear even when operating in optimized mode. However, for this purpose, unneeded insertions actually proved that BLAST was correctly analyzing the program's state space. Further testing showed that the combination of Ccured and BLAST was able to prove the correctness of trivial programs even with convoluted control-flow- graphs. Then we tested GNU getopt(3) using the same procedure. This code was intended to represents typical source code. It also has the property that it has been thoroughly tested and is heavily used. Curiously, Ccured inserted two NULL checks into the source code. Upon inspection, it became obvious that pointers in question would not be NULL; however, we proceeded to check their reachability using BLAST. This test unearthed a problem in BLAST regarding loops which do not have any effect on the predicate in question. A resolution to this problem is introduced later. However, after correcting this problem, BLAST was able to verify that one of the two NULL checks that Ccured had inserted was superfluous. Upon further inspection, the second NULL check was also verified to be unneeded. I created the test cases based on getopt(3) as well as wrote a set of other test cases to exercise particular problems within BLAST. 5. Failures Unfortunately, BLAST was not able to verify that both NULL checks inserted into getopt(3) were unneeded. This is the case because getopt(3) initially establishes properties of its input, but BLAST can not follow the procedure. The problem arises when getopt(3) loops through all elements of argv[] until it finds a NULL element. It then deduces that all previous elements are non-NULL. From this construct, BLAST is only able to determine that the last element is NULL but does not "remember" that all previous elements were non-NULL. This poses a serious problem as such techniques are used very regularly when dealing with NULL terminated strings. Currently, this is an area of active research which may produce valuable results in the near future. Another pitfall in BLAST is the fact that it attempts to traverse loops within the control-flow-graph even if they do not modify the state of the variable in question. This leads to an exponential blow up in the number of tracked predicates, all of which are effectively useless. Rupak was able to solve this particular problem (described in more detail below). 6. Improvements There are several improvements that can be made to greatly increase the speed of this analysis. First, we can ignore loops which do not affect the predicate in question. This is complicated by the fact that the value of the predicate may depend indirectly indirectly on the loop, but if independence can be verified, then the loop can be ignored completely. Otherwise, we can assume pseudo-random values for the variables affected by the loop. This would effectively enlarge the space containing all states of the program, an acceptable trade-off because this will dramatically reduce computation time. Another major improvement would be to collapse all states and edges in the control-flow-graph which have no effect on predicate being checked. This method, called slicing, would act in the following way. First, it can collapse all states "after" the current state. These are unneeded because they would not execute until after the current predicate is evaluated and therefore have no effect on its value. Second, it could traverse the control-flow- graph backwards keeping track of variable that directly or indirectly affect the value of the predicate of interest, replacing all other states and transitions with no-ops. The result is a slice of the program containing only the states needed to verify the current predicate. Not only will this reduce overall execution time, but it will also help BLAST avoid potentially complex areas within the code which are irrelevant for the current goal. I will be looking at this particular problem over the summer with Rupak's assistance. The third improvement would be to add a fixed, user specified bound on the computation time for any given predicate. In truth, not all predicates can be verified in a "reasonable" amount of time. This is generally due to the overall complexity of the code, but could also be BLAST getting caught in a complex looping construct. In either case, the goal would be to exit gracefully and provide enough information to a human to essentially pick up where the model checker left off. A timeout facility already exists within BLAST and I will port it to the iterative model checker. All of the improvements mentioned here are currently being implemented and tested. They are particularly necessary when attempting to verify any large, non-trivial piece of source code. 7. Future The aforementioned modifications are not the only needed changes to make this process generally applicable; they constitute the first step. One particularly important and specifically difficult addition would be support for quantified predicates. This is the notion that a single predicate can provide information about more than one element of a data structure. The canonical example of this would be NULL terminated strings. If a program were to traverse a string searching for the NULL character, terminating the loop when the NULL character is found, the model checker should understand that all previous elements are non-NULL. Currently, BLAST will not reach this conclusion without user intervention. Instead, it will only note the fact that the last element is NULL, "forgetting" that all previous elements are non-NULL. This is an important oversight because NULL terminated structures are very common in C code. Another, even more complex, addition would be to add ability for CIL to parse C++ syntax; allowing BLAST to reason about programs written in C++ as well as C. This is not a trivial change because C++ syntax is significantly more complex. A similar, less ambitious change, would be to add complete support for C syntax, reducing some of the syntactic sugar to its canonical form. This could be implemented by further preprocessing the source before CIL parses it or it could be implemented within CIL itself. Lastly, we would like to devise a comprehensive test suite composed of recently found bugs in open source software. These test cases will be small test cases which exhibit the unwanted trait. We will attempt to show that our tool would have been able to catch these bugs. This will provide us with both a rich variety of common C constructs as well as a sampling of real bugs in the wild. This will help us refine our techniques and provide reproducible results for others. [ Original at http://www.joshhyman.com/opensource/bugs/final.txt ]