Software Integrity Blog

 

How to run your CodeXM checker

In part two of our series on writing checkers with CodeXM, we explore how to run your CodeXM checker with Coverity using a command line interface.

In the last post, we discussed how to write a simple checker using CodeXM. But writing the checker is not the end goal; we want to run that checker on our own code. In this post, we look at how to run your CodeXM checker with Coverity® using a command line interface.

The checker code and the target code

In the previous post, we wrote a CodeXM checker named “USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME” to find all function definitions whose names begin with an underscore followed by an uppercase letter or another underscore. The checker code is shown below. We’ll put this code in a file named mychecker.cxm.

```
include `C/C++`;

pattern reservedFuncId {
    functionDefinition {
        .functionSymbol.identifier == Regex("^_[A-Z_].*")
    }
};

checker{
    name = "USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME";
    reports =
        for fd in globalset allFunctionDefinitions % reservedFuncId:
        {
            events = [
                {
                    description = "The reserved identifier is used.";
                    location = fd.location
                }
            ];
        }
};
```
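
To get a feel for which function names the pattern in `reservedFuncId` accepts, the same regular expression can be tried outside Coverity. The following is a minimal Python sketch, assuming that CodeXM's `Regex` treats this simple expression the same way Python's `re` module does; `is_reserved_function_name` is a hypothetical helper, not part of CodeXM.

```python
import re

# Same expression as in reservedFuncId. Assumption: CodeXM's Regex and
# Python's re agree on this simple pattern.
RESERVED_FUNC_ID = re.compile(r"^_[A-Z_].*")


def is_reserved_function_name(name: str) -> bool:
    """True if name starts with '_' followed by an uppercase letter or '_'."""
    return RESERVED_FUNC_ID.match(name) is not None


# Try a few candidate function names.
for candidate in ["_Func", "__init", "_func", "myFunc"]:
    print(candidate, is_reserved_function_name(candidate))
```

Names such as `_Func` and `__init` are matched, while `_func` (underscore followed by a lowercase letter) and `myFunc` are not.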

To see some results from the checker, we need a simple code example that uses a reserved identifier as a function name. Your codebase may not contain such a function name, so this checker may not report any findings against your current codebase. We don’t suggest adding problematic code to your codebase just to test the checker. Below is a simple code example for testing the checker.

```c
int _Func(int i) // function name starts with '_' followed by an uppercase 'F'
{
    return i;
}
```

Save this code to “mytarget.c” and put this file together with “mychecker.cxm” in the same directory. Please note that Coverity doesn’t require you to put the checker file and test case file in the same directory; we put them in the same directory just for simplicity’s sake. You can put them anywhere you want.

Run a checker in the command line

The first step is to build the target code:

```
cov-build --dir idir gcc -c mytarget.c -o mytarget.o
```

For command cov-build:

  • `--dir idir` specifies the intermediate directory, which is used to keep the build results
  • `gcc -c mytarget.c -o mytarget.o` is the build command of the native compiler; the `-c` option compiles without linking, because mytarget.c has no `main` function

Not familiar with this command line? Refer to the Coverity user guides. Coverity also supports compilers other than gcc, and the guides describe how to use them with cov-build.

After the first step is complete, you get the intermediate directory containing build results. Now you can use cov-analyze to analyze your target code.

```
cov-analyze --dir idir --disable-default --codexm mychecker.cxm
```
  • `--dir idir` specifies the intermediate directory that contains the build results
  • `--disable-default` disables the default checkers in the cov-analyze command, so we can focus on the results from our checker
  • `--codexm mychecker.cxm` specifies the checker file you want to use
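
The build and analysis steps can also be driven from a small script, which is handy when you rerun them after every change to the checker. The following is a minimal Python sketch, assuming the Coverity tools are on your `PATH`; `coverity_commands` and `run_pipeline` are hypothetical helper names, and only the options shown in this post are used.

```python
import subprocess


def coverity_commands(idir, checker, compile_cmd):
    """Return the cov-build and cov-analyze command lines used in this post."""
    build = ["cov-build", "--dir", idir, *compile_cmd]
    analyze = [
        "cov-analyze", "--dir", idir,
        "--disable-default",   # run only our own checker
        "--codexm", checker,
    ]
    return [build, analyze]


def run_pipeline(idir, checker, compile_cmd):
    """Run each step in order and stop at the first failing command."""
    for cmd in coverity_commands(idir, checker, compile_cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

A call might look like `run_pipeline("idir", "mychecker.cxm", ["gcc", "-c", "mytarget.c", "-o", "mytarget.o"])`. The `-c` option makes gcc compile without linking, which suits a test file that has no `main` function.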

Once the analysis completes, the output should look something like this:

```
Coverity Static Analysis version 2021.06 on Linux 4.15.0-123-generic x86_64
Internal version numbers: 8edcdb2edc p-2021.06-push-503

Using 32 workers as limited by license
Looking for translation units
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing links for 1 translation unit
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing virtual overrides
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing callgraph
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Topologically sorting 1 function
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Computing node costs
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Running analysis
|0----------25-----------50----------75---------100|
****************************************************
[STATUS] Exporting summaries
|0----------25-----------50----------75---------100|
****************************************************
Analysis summary report:
------------------------
Files analyzed                 : 1 Total
    C                          : 1
Total LoC input to cov-analyze : 4
Functions analyzed             : 1
Paths analyzed                 : 1
Time taken by analysis         : 00:00:04
Defect occurrences found       : 1 USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME
```

The last line of the summary report shows that the analysis found one defect of type `USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME`.

Viewing and managing the results

Typically, we would use `cov-commit-defects --dir intermediate-dir` to commit results to Coverity Connect®, a platform to view and manage defects. But if you want to see your results immediately, there is a more convenient way:

```
cov-format-errors --dir idir --html-output errs
```

When this command finishes, you should find a folder, `errs`, that contains HTML files, notably `index.html`. To view the defects found by the analysis, use a browser to open `index.html`.

You can also print the analysis results to the terminal. The command is:

```
cov-format-errors --dir idir --emacs-style
```

The output shows detailed information about what the checker found.

```
/<localdir>/test/codexm/mytarget.c:1
  Type: Other violation (USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME)

/<localdir>/test/codexm/mytarget.c:1:
  1. use_reserved_identifier_in_function_name: The reserved identifier is used.
```
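
If you want to post-process this text output, for example to count findings per checker, a few lines of scripting are enough. The following is a minimal Python sketch, assuming the output keeps the shape of the sample above (a `path:line` header followed by an indented `Type:` line); `parse_defects` is a hypothetical helper, and the real output of your Coverity version may differ.

```python
import re

# "path:line" header, with an optional trailing colon as in the sample.
LOCATION_LINE = re.compile(r"^(?P<path>\S.*?):(?P<line>\d+):?\s*$")
# Indented line such as "  Type: Other violation (CHECKER_NAME)".
TYPE_LINE = re.compile(r"^\s*Type:\s*(?P<kind>.*?)\s*\((?P<checker>[A-Z0-9_.]+)\)\s*$")


def parse_defects(text):
    """Return one dict per 'Type:' line, tagged with the preceding location."""
    defects = []
    location = None
    for raw in text.splitlines():
        loc = LOCATION_LINE.match(raw)
        if loc:
            location = (loc.group("path"), int(loc.group("line")))
            continue
        typ = TYPE_LINE.match(raw)
        if typ and location is not None:
            defects.append({
                "path": location[0],
                "line": location[1],
                "checker": typ.group("checker"),
            })
    return defects


# Sample copied from the output shown in this post.
SAMPLE = """\
/<localdir>/test/codexm/mytarget.c:1
  Type: Other violation (USE_RESERVED_IDENTIFIER_IN_FUNCTION_NAME)

/<localdir>/test/codexm/mytarget.c:1:
  1. use_reserved_identifier_in_function_name: The reserved identifier is used.
"""

for defect in parse_defects(SAMPLE):
    print(defect["checker"], defect["path"], defect["line"])
```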

Improve the checker

This is a very simple checker that finds function definitions using reserved identifiers, but it covers only one situation in which reserved identifiers occur. All identifiers that begin with an underscore followed by either an uppercase letter or another underscore are always reserved for any use.

We created a pattern, `reservedFuncId`, to find any function names that follow the pattern of reserved identifiers. This pattern is expressed as a simple regular expression derived directly from the description of one kind of reserved identifier. But reserved identifiers may appear in more complex forms. What if we need to cover other uses of reserved identifiers with more complicated patterns?

Let’s take this code snippet for example:

```c
int myFunc(int arg1, int _Arg)
{
    return arg1;
}
```

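The parameter `_Arg` is a reserved identifier: it begins with an underscore followed by an uppercase letter, so it is reserved for any use. (A second rule reserves all other identifiers that begin with an underscore for use as identifiers with file scope in both the ordinary and tag name spaces, but that rule does not apply to a function parameter.)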

Unfortunately, our checker can’t cover this scenario. Our checker traverses all the function definitions in the globalset `allFunctionDefinitions` and matches each function name against the given regular expression. Function parameters are not elements of `allFunctionDefinitions`, so the pattern `reservedFuncId` cannot match the parameter `_Arg`. The main problem, then, is how to match the parameters in a function definition.

It’s not easy to find the patterns that can match a function parameter from the plain text code, but we have a way to dump out the AST (abstract syntax tree) of the code, and that can help us get more useful information from the code. The command is:

```
cov-manage-emit --dir idir find myFunc --print-codexm > myFunc-AST.json
```
  • `--dir idir` specifies the directory containing the build results
  • `find myFunc` selects the function whose AST you want to see; you can also use `find` to dump the ASTs of all functions
  • `myFunc-AST.json` is the output file containing the AST of the specified function
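
A dump like this can be long, so it helps to search it for a value you already know, such as the name `_Arg`, and see where in the tree it lives. The following is a minimal Python sketch, assuming the dump is valid JSON that `json.load` can read; `find_value_paths` and `paths_in_file` are hypothetical helpers, and the small dictionary in the example is an invented stand-in, not the real layout of a Coverity AST dump.

```python
import json


def find_value_paths(node, value, path="$"):
    """Yield a path for every place where `value` occurs in nested dicts/lists."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_value_paths(child, value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_value_paths(child, value, f"{path}[{index}]")
    elif node == value:
        yield path


def paths_in_file(filename, value):
    """Load a JSON file and list every path at which `value` occurs."""
    with open(filename, encoding="utf-8") as handle:
        return list(find_value_paths(json.load(handle), value))


# Hypothetical, simplified stand-in for a dumped function definition.
toy_ast = {
    "functionDefinition": {
        "functionSymbol": {"identifier": "myFunc"},
        "formalParameterList": [
            {"identifier": "arg1"},
            {"identifier": "_Arg"},
        ],
    }
}

for found in find_value_paths(toy_ast, "_Arg"):
    print(found)
```

With a real dump, `paths_in_file("myFunc-AST.json", "_Arg")` would list the property names that lead to the parameter, which is the information needed to write a matching pattern.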

Reading the AST tells us that function parameters are in the `formalParameterList` of a function definition, so we can write code to match function parameters like this:

```
include `C/C++`;

pattern reservedIdForAnyUse {
    Regex("^_([A-Z]|_).*")
};

checker{
    name = "USE_RESERVED_IDENTIFIER_IN_FUNCTION_PARAMETER";
    reports =
        for fnc in globalset allFunctionDefinitions:
            for fms in fnc.formalParameterList where
                fms.identifier matches reservedIdForAnyUse:
                {
                    events = [
                        {
                            description = "The reserved identifier which is reserved for any use is declared.";
                            location    = fms.location
                        }
                    ];
                }
};
```

Now we can go through the steps of running a checker again, and the result shows that our checker reports a new defect. We can also put the two checkers together to get a new checker that covers both situations in which a reserved identifier is used.

```
include `C/C++`;

pattern reservedFuncId {
    functionDefinition {
        .functionSymbol.identifier matches reservedIdForAnyUse
    }
};

pattern reservedIdForAnyUse {
    Regex("^_([A-Z]|_).*")
};

checker{
    name = "USE_RESERVED_IDENTIFIER";
    reports =
        (for fnc in globalset allFunctionDefinitions:
            for fms in fnc.formalParameterList where
                fms.identifier matches reservedIdForAnyUse:
                {
                    events = [
                        {
                            description = "The reserved identifier which is reserved for any use is declared.";
                            location    = fms.location
                        }
                    ];
                }
        )
        ++ // Use `++` to combine the two situations.
        (for fd in globalset allFunctionDefinitions % reservedFuncId:
            {
                events = [
                    {
                        description = "The reserved identifier is used.";
                        location = fd.location
                    }
                ];
            }
        )
};
```

Now this checker reports two defects when analyzing the code snippet below: one for the reserved function name (`_Func`) and another for the reserved function parameter (`_Arg`).

```c
int _Func(int i) // defect here
{
    return i;
}

int myFunc(int arg1, int _Arg) // defect here
{
    return arg1;
}
```
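
To make the logic of the combined checker concrete, here is a small model of it outside CodeXM. It is a minimal Python sketch, not CodeXM: `FunctionDef`, `name_reports`, `param_reports`, and `all_reports` are hypothetical names, the function list is written by hand rather than taken from an AST, and Python's `re` is assumed to agree with CodeXM's `Regex` on this expression. List concatenation with `+` plays the role of `++` in the checker.

```python
import re
from dataclasses import dataclass, field

# Same expression as the reservedIdForAnyUse pattern.
RESERVED_FOR_ANY_USE = re.compile(r"^_([A-Z]|_).*")


@dataclass
class FunctionDef:
    name: str
    line: int
    params: list = field(default_factory=list)


def name_reports(functions):
    """Model of the loop over function definitions whose names match."""
    return [(f.line, f"function name {f.name}")
            for f in functions if RESERVED_FOR_ANY_USE.match(f.name)]


def param_reports(functions):
    """Model of the nested loop over each function's formal parameters."""
    return [(f.line, f"parameter {p}")
            for f in functions for p in f.params
            if RESERVED_FOR_ANY_USE.match(p)]


def all_reports(functions):
    # Concatenating the two lists mirrors `++` in the combined checker.
    return param_reports(functions) + name_reports(functions)


# Hand-written model of the two functions in the snippet above.
functions = [
    FunctionDef("_Func", 1, ["i"]),
    FunctionDef("myFunc", 6, ["arg1", "_Arg"]),
]

for line, what in all_reports(functions):
    print(f"line {line}: reserved identifier used as {what}")
```

The model produces two findings, one for the parameter `_Arg` and one for the function name `_Func`, in line with the two defects described above.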

Summary

Now you know how to run a checker from the command line, along with some related commands. With these steps and Coverity commands, you can analyze your code with a checker, read the analysis results, and retrieve ASTs from code to improve your checker. In the next post, we will introduce the basic syntax of CodeXM.
