Getting started with writing checkers using CodeXM

Jul 08, 2021

Table of Contents

First sight on CodeXM
Easy to understand, easy to use
Get started today

Static application security testing (SAST) is best described as a method of debugging by automatically examining the source code before the application is deployed. It provides an understanding of the code structure, finds quality and security flaws present in the code, and helps ensure adherence to secure coding standards.

Most open source or commercial SAST tools come with a set of checkers or rules that evaluate the code structure and help identify specific patterns or structure in the program that may lead to quality or security problems. Advanced SAST tools, including Black Duck Coverity®, provide additional capabilities that enable users to write customized rules or checkers. This greatly enhances users’ ability to identify specific patterns that aren’t covered by existing checkers and rules.

Coverity provides a domain-specific language, CodeXM (pronounced "code exam"), that allows users to create customized checkers. Writing a good checker can take a lot of effort, but CodeXM makes certain types of checkers easier to write. It's easy to understand, easy to use, and powerful enough to address hard problems. In this series, we will walk through some examples of how to create a custom checker with CodeXM.

First sight on CodeXM

To get a feel for CodeXM, let's look at a small checker written in it.


```
include `C/C++`;

checker {
  name = "Hello_Function";
  reports =
    for fd in globalset allFunctionDefinitions:
      {
        events = [
          {
            description = "Hello " + fd.functionSymbol + ".";
            location = fd.location
          }
        ];
      }
};
```

Can you figure out what it does from its descriptive field names, even if you've never seen the syntax before? The checker visits every function definition and reports an event at its location with the message "Hello", followed by the function's name and a period. Let's go through each line of code with a brief explanation.


```
// Load the C/C++ library and limit this checker to C/C++ code.
include `C/C++`;

// Main entry point, similar to the main function in other languages.
checker {
  // Checker name.
  name = "Hello_Function";
  // Where to report a defect and what information to report.
  reports =
    // Use the variable 'fd' to traverse every function definition in "allFunctionDefinitions".
    for fd in globalset allFunctionDefinitions:
      {
        // What information to report when a defect is found.
        events = [
          {
            // The defect description: "Hello ", then the function's name, then ".".
            description = "Hello " + fd.functionSymbol + ".";
            // Where this event is displayed.
            location = fd.location
          }
        ];
      }
};
```
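To make this concrete, here is a small, hypothetical C file that Hello_Function could be run against. As the set's name suggests, only definitions (functions with a body) are in `allFunctionDefinitions`; a prototype on its own is not. Assuming `fd.functionSymbol` is rendered as the function's name in the message, the checker would report two events here, one per definition.

```c
/* Hypothetical input file for the Hello_Function checker. */

/* A prototype only: it has no body, so it is a declaration,
   not a function definition, and produces no event by itself. */
int square(int x);

/* A definition: expected event "Hello add." at this location. */
int add(int a, int b)
{
    return a + b;
}

/* The definition of the function declared above:
   expected event "Hello square." at this location. */
int square(int x)
{
    return x * x;
}
```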

Easy to understand, easy to use

CodeXM's syntax is much more approachable than the macro-based API of the Extend SDK, and it is easy to learn if you already have programming experience.

From the code segment, you can see that a basic checker consists of an `include` directive for the target language (here, C/C++) and a checker definition (`name`, `reports`, and so on). The body of the checker definition contains several components, including a `for` expression that selects what the checker searches for and the event records that specify what to report for each match.
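If it helps to map this structure onto familiar code, the plain C sketch below is one way to think about it: a loop over function definitions that emits one event record per element. It is only an analogy. The `FunctionDef` and `Event` types and `run_hello_checker` are hypothetical names, and this is not how Coverity evaluates CodeXM internally.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one function definition in the code base. */
typedef struct {
    const char *name; /* function name */
    const char *file; /* source file containing the definition */
    int line;         /* line where the definition starts */
} FunctionDef;

/* Hypothetical stand-in for one CodeXM event record. */
typedef struct {
    char description[128];
    const char *file;
    int line;
} Event;

/* Rough analogy for
 *   reports = for fd in globalset allFunctionDefinitions:
 *             { events = [ { description = ...; location = ... } ]; }
 * Writes one event per definition into out (at most max events)
 * and returns the number of events written. */
size_t run_hello_checker(const FunctionDef *defs, size_t n,
                         Event *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max; i++) {
        /* description = "Hello " + name + "." */
        snprintf(out[count].description, sizeof out[count].description,
                 "Hello %s.", defs[i].name);
        /* location = where the definition is */
        out[count].file = defs[i].file;
        out[count].line = defs[i].line;
        count++;
    }
    return count;
}
```

Each loop iteration corresponds to one element of the set, and each `Event` plays the role of one entry in the `events` list.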

For example, you can create a checker that finds all function definitions whose names begin with an underscore and consist only of uppercase letters and underscores. Identifiers like these are reserved for the C/C++ implementation, so user code should not define them, and a regular expression is a natural way to describe the naming pattern.
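As a reference point, the naming rule just described can also be written as an ordinary C predicate without regular expressions. `is_reserved_func_name` is a hypothetical helper for illustration. Note that the C standard's reserved-identifier rules (C11, section 7.1.3) are broader than this rule: for example, any identifier that begins with an underscore is reserved at file scope.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if name starts with '_' and every remaining character
   is an uppercase ASCII letter or '_', e.g. "_FOO" or "__INIT".
   This mirrors the rule described in the text, not the full set of
   identifiers the C standard reserves. */
bool is_reserved_func_name(const char *name)
{
    if (name == NULL || name[0] != '_')
        return false;
    for (const char *p = name + 1; *p != '\0'; p++) {
        /* Cast to unsigned char: passing a negative char to isupper is undefined. */
        if (*p != '_' && !isupper((unsigned char)*p))
            return false;
    }
    return true;
}
```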

All function definitions, including those with reserved names, are contained in the global set `allFunctionDefinitions`, so the checker can start from that set and filter it:

```
for fd in globalset allFunctionDefinitions % reservedFuncId
```

Compared with the previous example, this line adds the filter operator `%`: the loop now visits only the function definitions that match the `reservedFuncId` pattern.
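Conceptually, `set % pattern` behaves like filtering a collection with a predicate. The following C analogy keeps only the names that satisfy a predicate; `filter_names`, `NamePredicate`, and `starts_with_underscore` are hypothetical names used for illustration and are not part of CodeXM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type standing in for a CodeXM pattern. */
typedef bool (*NamePredicate)(const char *name);

/* Analogy for "set % pattern": copies the elements of names that satisfy
   matches into out (at most max of them) and returns how many were kept.
   Elements that do not match are simply skipped. */
size_t filter_names(const char *const *names, size_t n,
                    NamePredicate matches,
                    const char **out, size_t max)
{
    size_t kept = 0;
    for (size_t i = 0; i < n && kept < max; i++) {
        if (matches(names[i]))
            out[kept++] = names[i];
    }
    return kept;
}

/* Example predicate: names that start with an underscore. */
bool starts_with_underscore(const char *name)
{
    return name[0] == '_';
}
```

Swapping in a different predicate changes what the loop visits without touching the loop itself, which is the role the pattern plays after `%`.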

Then we need to define a `reservedFuncId` pattern to match those functions with reserved identifiers.

```
pattern reservedFuncId {
  functionDefinition {
    .functionSymbol.identifier == Regex("^_[A-Z_]*")
  }
};
```
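One detail worth checking is how the regular expression is anchored. This sketch does not reproduce CodeXM's `Regex` semantics; it uses POSIX `regcomp`/`regexec`, which search for a match anywhere in the string, to show why anchors matter. Under search semantics, `^_[A-Z_]*` accepts any name that merely begins with `_` (the `[A-Z_]*` part can match zero characters), so a name such as `_foo` also matches, while appending `$` requires the entire name to consist of uppercase letters and underscores. `regex_matches` is a hypothetical helper.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the POSIX extended regex `pattern` matches somewhere
   in `text`. regexec() searches rather than requiring a full match,
   so ^ and $ decide how much of the string must fit the pattern.
   Returns false if the pattern fails to compile. */
bool regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

If CodeXM's `Regex` behaves like a search, as POSIX `regexec` does, `^_[A-Z_]*$` would express the naming rule more precisely; check the CodeXM documentation for its exact matching behavior.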

Now let’s look at the whole checker.

```
include `C/C++`;

pattern reservedFuncId {
  functionDefinition {
    .functionSymbol.identifier == Regex("^_[A-Z_]*")
  }
};

checker {
  name = "USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME";
  reports =
    for fd in globalset allFunctionDefinitions % reservedFuncId:
      {
        events = [
          {
            description = "The reserved identifier " + fd.functionSymbol.identifier + " is used.";
            location = fd.location
          }
        ];
      }
};
```
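For illustration, here is a hypothetical C file this checker could be run against. `_INIT_TABLE` begins with an underscore and contains only uppercase letters and underscores, so the checker is expected to report "The reserved identifier _INIT_TABLE is used." at its definition, while `init_table` is not reported. Both function names are made up.

```c
/* Hypothetical input for USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME. */

/* Violation: the name matches the reserved-identifier pattern,
   so an event is expected at this definition. */
int _INIT_TABLE(void)
{
    return 0;
}

/* Ordinary identifier: no event expected. */
int init_table(void)
{
    return 0;
}
```

The usual fix is simply to rename the function, as `init_table` shows.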

This checker finds a real coding-standard violation, yet it is still only a few lines long. Not too difficult, right?

Get started today

In short, CodeXM is a functional programming language, but writing a checker feels more like defining formulas in a spreadsheet: all of the logic is expressed as expressions, not statements. You specify what you're looking for and where to report violations.

CodeXM is designed to be language-independent, which means you can write checkers for several languages. CodeXM is also expressive, self-describing, and powerful enough to enable you to define your own checkers without knowing arcane concepts in software verification.

We’re just getting started with CodeXM. Stay tuned for more details in our upcoming series.