CodeXM: Awesome code checker power (itty-bitty learning curve!)

What you need to know, and (more importantly) what you don’t, about the CodeXM checkers.

When you develop your software, you may not be aware of what the compiler is doing to transform source into an executable. The neat thing is you don’t need to. Just know things like what a variable declaration is, what a for loop consists of, how to call a function—and the compiler does the rest.

That’s the “smoothly functioning technology” Arthur C. Clarke once said is indistinguishable from magic.

As I mentioned previously, there’s a lot going on unseen when Coverity is checking your code. The folks in the Software Integrity Group eat, sleep, and drink all this behind-the-curtains magic to help you find those logical errors and security flaws in your code. When it came time to give you a tool to write your own checkers, though, we reasoned that just as compilers shield you from all that cryptic arcana, so too should CodeXM.

The upshot: You don’t need a Ph.D. in software verification or compiler theory to write a code checker using CodeXM.

What to know before you write a code checker

But what do you need to know, at least to start out?

You need to know your own programming language, whether it’s C or C++, Java, JavaScript, or any of the other languages CodeXM supports. You’ll be using CodeXM to write code checkers that examine code in that language, so you’ll need to know what things are called. But this is knowledge you likely already bring to the table. CodeXM provides “language libraries” that find things in your language of choice, expressed in the vocabulary of that language—constructs such as for loops, variable declarations, and the like.

You sort of need to know that Coverity parses your code and produces an internal representation. A CodeXM code checker will examine this internal representation to look for the things you identify as problematic. I say “sort of need to know” because the details of this internal representation are no more crucial to writing a code checker than to writing code that gets compiled later. Just understand that it happens, not so much how it happens.

You need to know that the primary tool of the trade in writing a code checker is a thing called a pattern. Patterns are said to match specific bits of your code. We provide some predefined patterns, but you can easily write your own. We’ll get into that in later posts.

You need to know that most analysis happens at the function-by-function level. That is, a checker looks at one function at a time, examining the code within that function before moving to the next function. Formal computer science types would use terms like “iterate over,” “enumerate,” “traverse,” or “visit” to describe the process, but really, you can just think of it as “uses a for loop.”
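To make the “for loop” picture concrete, here is a rough analogy in Python rather than CodeXM. Python’s standard ast module stands in for Coverity’s internal representation, and the helper name nodes_by_function and the sample source are made up for illustration. Treat it as a sketch of the visiting order only, not as a description of how Coverity works inside.

```python
import ast

# Hypothetical sample input: two small functions to analyze.
SAMPLE = '''
def first():
    x = 1
    return x

def second():
    for i in range(3):
        print(i)
'''

def nodes_by_function(source):
    """Yield (function name, nodes inside it), one top-level function at a time."""
    tree = ast.parse(source)
    for func in tree.body:
        if isinstance(func, ast.FunctionDef):
            # ast.walk yields the function node and every node nested under it.
            yield func.name, list(ast.walk(func))

# The outer loop moves function by function; the inner work looks at
# the parsed pieces of that one function before moving on.
for name, nodes in nodes_by_function(SAMPLE):
    kinds = sorted({type(n).__name__ for n in nodes})
    print(name, kinds)
```

The point of the sketch is the shape: one loop over functions, and inside it a walk over the parsed pieces of each function.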

Examining the “business part” of the code checker

Let’s put this together and look at that example from the previous blog post—in particular, the “business part” of the checker, repeated here:

     reports =
       for c in globalset allFunctionCode
         where c matches gotoStatement:  { 
            events = [

This should seem familiar since nearly every programming language has the concept of a for loop. In CodeXM, the for loop examines that internal representation I mentioned earlier, one part at a time. In this example, it examines all the code in every function that you provided to the analysis.

The next line should be familiar to anybody who writes structured queries: The where clause indicates we’re only interested in goto statements within the code. We’re basically saying, “In any code of any function, if that code is a goto statement, then…”

A quick note here: Since CodeXM is examining parsed code, it won’t be fooled by the word “goto” in a comment, or a variable with the letters “goto” in it, or any other thing. It’ll recognize only real goto statements. Nice, right?
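Here is a small Python analogy of why matching on parsed code beats searching text. Python has no goto, so the global statement stands in for it. The helper names find_by_text and find_by_parse are hypothetical, and the ast module is again only a stand-in for Coverity’s internal representation.

```python
import ast

# Hypothetical sample: the word "global" appears in a variable name,
# in a comment, and in one real `global` statement.
SAMPLE = '''
counter = 0
global_total = 10      # a variable whose name contains "global"

def bump():
    # do not use global here unless you must
    global counter
    counter += 1
'''

def find_by_text(source, word):
    """Naive search: line numbers of every line that contains the word."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if word in line]

def find_by_parse(source):
    """Parsed search: line numbers of real `global` statements only."""
    return [n.lineno for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Global)]

print(find_by_text(SAMPLE, "global"))
print(find_by_parse(SAMPLE))
```

The text search flags three lines (the variable name, the comment, and the statement), while the parsed search reports only the one real statement.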

The “then” part of the query describes what the checker should do with each goto statement it finds. (In the example in the previous blog post, the checker creates a warning at the location of the goto statement.) If you’re familiar with Coverity’s GUI, Coverity Integrity Manager, you’ll be happy to know your results will appear there when you add this code checker to your analysis.

So while writing a checker is cool, it’s hardly magic. We’ll continue in the next post with a slightly more powerful code checker (and they can get pretty powerful), revealing a few more features along the way. Stay tuned.

Talk to us about CodeXM:
Learn more and ask questions in the Software Integrity Community

