Software Integrity

Let’s write more CodeXM checkers (second-stage ignition)

If you read the previous installment, you’ll recall that we boosted ourselves to low Earth orbit by using CodeXM to write a Coverity checker that helps enforce a naming convention (which, of course, you can tweak to suit your needs).

Our progress so far: local variables and function names (including method names).

Now we’ll push higher up, getting to global variables, classes, and their member variables. That should be enough of a platform for you to explore deeper into the CodeXM space.

Ready?

Global variables

It should come as no surprise that the global variable definitions are just another set to traverse, which means we can use a for loop to examine them one by one.

for glob in globalset allGlobalVariableDefinitions % nonconformingGlobalName

As before, we use the set-filtering (a.k.a. which-are) operator, %, to narrow all the global variable definitions down to just those we deem nonconforming. All that remains is to define what “nonconforming” means. Let’s look at the whole checker, plus the pattern it relies on:
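If it helps to see the which-are operator in more familiar terms, here’s a rough Python analogy. It is not CodeXM and not how Coverity evaluates anything internally; the GlobalDef class and the which_are helper are hypothetical stand-ins, and only the regular expression comes from the Take 1 pattern.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalDef:
    """Hypothetical stand-in for a global variable definition."""
    name: str


# Regular expression from the Take 1 pattern: "g_" followed by camelCase.
G_CAMEL = re.compile(r"^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def nonconforming_global_name(d: GlobalDef) -> bool:
    """Plays the role of the nonconformingGlobalName pattern."""
    return G_CAMEL.search(d.name) is None


def which_are(items, predicate):
    """Rough analog of `set % pattern`: keep only the items that match."""
    return [item for item in items if predicate(item)]


all_globals = [GlobalDef("g_counter"), GlobalDef("g_maxRetryCount"),
               GlobalDef("counter"), GlobalDef("g_Total")]

# Loop only over the filtered definitions, as the checker's for loop does.
for glob in which_are(all_globals, nonconforming_global_name):
    print(f"Variable {glob.name} does not follow naming convention for global variables.")
```

Here counter and g_Total are reported; the two g_ camelCase names pass.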

pattern nonconformingGlobalName { // Take 1: good
  globalVariableDefinition {
    .variable.identifier != Regex("^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
  }
};

checker {
  name = "GLOBAL_NAMING_VIOLATION";
  reports = for glob in globalset allGlobalVariableDefinitions 
                                  % nonconformingGlobalName
    :
    {
      events = [
        { description = "Variable "
                      + (glob.variable.identifier ?? "(shown here)")
                      + " does not follow naming convention for global variables.";
          location = glob.location;
        }
      ];
    };
};

Here we simply require the global variable’s name to start with the letter g and an underscore (g_), and otherwise to follow the camelCase naming convention.

This works, but it may fire more often than you want, because of definitions like this:

const int MY_OPTION_A = 1;
const int MY_OPTION_B = 2;

Technically, these are globals (that is, globally scoped definitions) too. The compiler may optimize them away, but Coverity’s analysis doesn’t make that optimization: it isn’t executing the code, only trying to understand it, so it still sees these definitions.

We can fix this in our Coverity checker by constraining nonconformance further, so that it applies only to definitions that are not const:

pattern nonconformingGlobalName { // Take 2: better
  globalVariableDefinition {
    .variable.type       != typeQualifierConst; // ignore const-qualified globals
    .variable.identifier != Regex("^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
  }
};

You’ll remember from the last post that a decomposition with two or more constraints, like the one above, matches only when all of them are met. So this is essentially a logical AND of the two constraints: not being const-qualified, and not having a name that matches the g-underscore naming convention.
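To make the logical AND concrete, here’s a Python sketch comparing Take 1 with Take 2. GlobalDef, take1, and take2 are hypothetical names invented for illustration, with an is_const flag standing in for the const-qualifier check; only the regular expression comes from the patterns above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalDef:
    """Hypothetical stand-in for a global variable definition."""
    name: str
    is_const: bool = False


G_CAMEL = re.compile(r"^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def take1(d: GlobalDef) -> bool:
    # One constraint: the name does not follow the g_ convention.
    return G_CAMEL.search(d.name) is None


def take2(d: GlobalDef) -> bool:
    # Two constraints in one decomposition: both must hold (logical AND).
    return (not d.is_const) and G_CAMEL.search(d.name) is None


defs = [GlobalDef("MY_OPTION_A", is_const=True),
        GlobalDef("counter"),
        GlobalDef("g_counter")]

print([d.name for d in defs if take1(d)])  # ['MY_OPTION_A', 'counter']
print([d.name for d in defs if take2(d)])  # ['counter']
```

Take 1 reports the const MY_OPTION_A along with counter; Take 2 drops it because its first constraint (not const) fails.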

We could now consider the pattern done, but it gives those const globals a free pass. Let’s add another condition so that they follow some rules too. Since they usually stand in for preprocessor definitions, we’ll require them to be in all capitals, with underscores allowed. This takes the form of an alternative decomposition:

pattern nonconformingGlobalName { // Take 3: best
  globalVariableDefinition {
    .variable.type       != typeQualifierConst; // ignore const-qualified globals
    .variable.identifier != Regex("^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
  }
| globalVariableDefinition {
    .variable.type       == typeQualifierConst;
    .variable.identifier != Regex("^[A-Z][A-Z0-9]*(_[A-Z0-9]*)*$")
  }
};

What we’re saying here is that nonconformance is defined to be either a global variable definition that is not const-qualified and doesn’t follow the g-prefix naming convention, or a global variable definition that is const-qualified and doesn’t follow the all-caps-and-underscore naming convention.
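The same kind of Python sketch can express Take 3: each alternative decomposition becomes one AND-ed clause, and the alternatives are OR-ed together. As before, GlobalDef and take3 are hypothetical, and the two regular expressions are copied from the pattern.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalDef:
    """Hypothetical stand-in for a global variable definition."""
    name: str
    is_const: bool = False


G_CAMEL = re.compile(r"^g_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
ALL_CAPS = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]*)*$")


def take3(d: GlobalDef) -> bool:
    # First alternative: non-const AND not g_ camelCase.
    non_const_violation = (not d.is_const) and G_CAMEL.search(d.name) is None
    # Second alternative: const AND not all caps with underscores.
    const_violation = d.is_const and ALL_CAPS.search(d.name) is None
    # Alternatives separated by | combine as a logical OR.
    return non_const_violation or const_violation


defs = [GlobalDef("MY_OPTION_A", is_const=True),
        GlobalDef("myOption", is_const=True),
        GlobalDef("g_counter"),
        GlobalDef("counter")]

print([d.name for d in defs if take3(d)])  # ['myOption', 'counter']
```

Note that a const global named g_counter would now be reported too, since const definitions must be all caps regardless of prefix.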

Class names, and their members too!

By now you’ve probably spotted the pattern. Following the structure we’ve become familiar with, you would enumerate all the classes defined in your codebase like this:

for c in globalset allClasses % nonconformingClassName

where a custom pattern of our own defines what counts as a nonconforming class name. In full, the checker looks like this:

pattern nonconformingClassName {
  classDefinition {
    .declaredType.identifier != myProperCaseName;
  }
};

checker {
  name = "CLASS_NAMING_VIOLATION";
  reports = for c in globalset allClasses % nonconformingClassName
    :
    {
      events = [
        { description = "Class "
                      + (c.declaredType.identifier ?? "(shown here)")
                      + " does not follow naming convention for class names.";
          location = c.location;
        }
      ];
    };
};

For simplicity, we’ve just assumed that class names should be in ProperCase form. At this point you should feel comfortable replacing myProperCaseName, a placeholder this post doesn’t define, with whatever pattern suits your own naming convention.
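As one possible starting point, here’s a ProperCase regular expression exercised in Python. PROPER_CASE is our own assumption, modeled on the camelCase expressions used elsewhere in this post; in the pattern itself you would presumably write it as a Regex(...) in place of myProperCaseName, the same way the other patterns do.

```python
import re

# Hypothetical ProperCase rule: every word starts with an uppercase letter,
# followed by lowercase letters or digits. Adjust to your own convention.
PROPER_CASE = re.compile(r"^[A-Z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def is_proper_case(name: str) -> bool:
    return PROPER_CASE.search(name) is not None


for name in ["Widget", "HttpServer", "widget", "Http_Server", "HTTPServer"]:
    print(name, is_proper_case(name))
```

Be aware that this expression accepts runs of capitals such as HTTPServer, because each capital letter may start a one-letter word; tighten it if your convention forbids that.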

This takes care of class names. Next up: data members. Let’s look at how we do that:

for c in globalset allClasses:
  for mem in c.fieldList % nonconformingMemberName

That’s right—we’ve stepped up the game, going from a simple single for loop to where we now nest an inner loop (examining the field members) within the outer loop that examines all the classes.

And yes, you likely noticed that the inner one does not use the globalset keyword.

What gives?

Well, CodeXM, like Coverity in general, is designed to scale. It must be able to examine projects with a few dozen functions or a few million, and handle either with equal ease. That means dealing with some sets that can grow very large (such as every class in a project) alongside others that can safely be expected to stay small (such as the fields of a single class). Knowing which is which lets CodeXM handle each kind more efficiently, and having to state it yourself keeps you from accidentally writing a Coverity checker that won’t scale. Hence the keyword.

The chief takeaway is that you can nest loops, but only the outermost should be examining the contents of a globalset. There are ways to have a Coverity checker examine two global sets, but let’s leave that subject for another day. If you’re familiar with big-O notation or time complexity, things that are O(n²) generally don’t have great performance characteristics. CodeXM may be powerful stuff, but it can’t defy that reality.
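To put rough numbers on that, here’s a Python analogy that simply counts loop iterations. Nesting a small per-class list inside the loop over all classes costs about as many steps as there are fields in total, while nesting one large set inside another costs the product of their sizes. The project table and both functions are hypothetical illustrations, not a model of CodeXM’s internals.

```python
def nested_visits(classes: dict) -> int:
    """Outer loop over every class, inner loop over only that class's fields."""
    visits = 0
    for fields in classes.values():
        for _field in fields:
            visits += 1
    return visits


def pairwise_visits(set_a, set_b) -> int:
    """Outer loop over one large set, inner loop over another large set."""
    visits = 0
    for _a in set_a:
        for _b in set_b:
            visits += 1
    return visits


# Hypothetical project: 1,000 classes with 5 fields each.
project = {f"Class{i}": [f"m_field{j}" for j in range(5)] for i in range(1_000)}

print(nested_visits(project))                       # 5000: grows with total fields
print(pairwise_visits(range(1_000), range(1_000)))  # 1000000: grows as n * m
```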

But let’s get back to examining data members. The logic of the nested loops really just amounts to examining all the fields (nonstatic data members) of all the classes in your project. The set-filtering which-are operator, %, then picks out any fields that don’t conform to the established naming convention.
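Here’s the same nested-loop-plus-filter shape as a Python sketch. ClassDef, its fields list, and nonconforming_member_name are hypothetical stand-ins for a class definition, its fieldList, and the nonconformingMemberName pattern; the m_ regular expression is the one nonconformingMemberName uses in the complete listing.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Hypothetical stand-in for a class definition."""
    name: str
    fields: list = field(default_factory=list)  # nonstatic data members


# m_ followed by camelCase, as in nonconformingMemberName.
MEMBER_NAME = re.compile(r"^m_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def nonconforming_member_name(name: str) -> bool:
    return MEMBER_NAME.search(name) is None


all_classes = [
    ClassDef("Widget", fields=["m_size", "color"]),
    ClassDef("Gadget", fields=["m_serialNumber", "m_Id"]),
]

# Outer loop over every class, inner loop over that class's fields,
# keeping only the fields that fail the naming rule.
reports = [
    (c.name, mem)
    for c in all_classes
    for mem in c.fields
    if nonconforming_member_name(mem)
]
print(reports)  # [('Widget', 'color'), ('Gadget', 'm_Id')]
```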

But wait, what about static data members?

No surprises here. It would be a separate loop, like this:

for c in globalset allClasses:
  for mem in c.staticFieldList % nonconformingStaticMemberName

Couldn’t that be folded into the same checker? Maybe, but that could complicate things if you have separate rules to apply to each, or want to tailor the resulting event messages. In general, it’s frequently more efficient (and productive) to define stand-alone checkers with minimal decision logic.
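Here’s a small Python illustration of why separate rules pay off. It assumes static member names are ms_ followed by camelCase, as the comment in nonconformingStaticMemberName describes; the MEMBER and STATIC_MEMBER names are ours. Reusing the nonstatic rule for statics would report every correctly named static member.

```python
import re

# Rule for nonstatic members, as in nonconformingMemberName: m_ + camelCase.
MEMBER = re.compile(r"^m_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
# Assumed rule for static members, per the pattern's comment: ms_ + camelCase.
STATIC_MEMBER = re.compile(r"^ms_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")

static_names = ["ms_instanceCount", "ms_cache"]

# Applying the nonstatic rule flags every correctly named static member...
wrongly_flagged = [n for n in static_names if not MEMBER.search(n)]
# ...whereas the dedicated static rule flags none of them.
correctly_flagged = [n for n in static_names if not STATIC_MEMBER.search(n)]

print(wrongly_flagged)    # ['ms_instanceCount', 'ms_cache']
print(correctly_flagged)  # []
```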

In the big reveal, here are the checkers we described above, plus the patterns we defined to support them.

pattern nonconformingClassName {
  classDefinition {
    .declaredType.identifier != myProperCaseName;
  }
};

checker {
  name = "CLASS_NAMING_VIOLATION";
  reports = for c in globalset allClasses % nonconformingClassName
    :
    {
      events = [
        { description = "Class "
                      + (c.declaredType.identifier ?? "(shown here)")
                      + " does not follow naming convention for class names.";
          location = c.location;
        }
      ];
    };
};

pattern nonconformingMemberName
{
  fieldSymbol {     // expect members to be prefixed with m_ and be camelCase
    .identifier     != Regex("^m_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
  }
};

pattern nonconformingStaticMemberName 
{
  fieldSymbol {     // as nonstatic members, but prefixed with ms_
    .identifier     != Regex("^ms_[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
  }
};

checker {
  name = "CLASS_MEMBER_NAMING_VIOLATION";
  reports = for c in globalset allClasses:
              for mem in c.fieldList % nonconformingMemberName
    :
    {
      events = [
        { description = "Class member "
                      + (mem.identifier ?? "(shown here)")
                      + " does not follow naming convention for class members.";
          location = mem.location;
        }
      ];
    };
};

checker {
  name = "CLASS_STATIC_MEMBER_NAMING_VIOLATION";
  reports = for c in globalset allClasses:
              for mem in c.staticFieldList % nonconformingStaticMemberName
    :
    {
      events = [
        { description = "Static class member "
                      + (mem.identifier ?? "(shown here)")
                      + " does not follow naming convention for static members.";
          location = mem.location;
        }
      ];
    };
};

Orbit achieved. We’ll consider this assignment complete; where you take it next is up to you. As we’ve seen, most of these checkers look remarkably similar, varying only in what they examine and the messages they report when they find something of interest. (To be honest, most simple checkers will look very much like this.) More sophisticated checkers are possible, using elements of the language we haven’t covered yet. There is more to learn about CodeXM, but as you’ve seen, even these few building blocks make quite a lot possible.

If all this naming-convention stuff is too pie-in-the-sky for you, maybe you’d like something a little more down-to-earth.

Pasta, anyone?
