What is property testing? In short, it can be described as a method of testing output of a program against the expected behavior, or properties, of a piece of code. Why should you care? The same reason we here at Tinfoil Security care: good testing goes beyond ensuring your code is functional. It can be crucial line of defense when it comes to the security of your applications, and property testing is a uniquely powerful tool in accomplishing these ends. But before we dive deeper, lets review more traditional testing.
The test above is fairly straightforward. It attempts to check that the biggest function will in fact return the biggest element of a provided list. It falls short in a few noticeable ways: What if the list is empty? What if it isn’t sorted? What if there are duplicate integers?
Traditional testing very often focuses on specific examples and is dependent on the assumptions of the programmer. The function above may have been created with only positive integers in mind and it may not occur to the writer to test for cases involving negatives.
This example is a simple one, but it demonstrates a major drawback of traditional testing: it reveals the presence of expected bugs, rather than the absence of unexpected bugs. How would we pursue the latter? Enter property testing.
Property testing is reliant on describing the properties of a function or piece of code. Very simply, these properties are general rules on how a program should behave. For the example function above, we might define the following:
biggestreturns
the largest element of a list.”We might describe the properties of other well-known algorithms as such:
sortreturns
a list with every element in ascending order.”appendreturns
a list with a length equal to the sum of the lengths of both lists passed to it.”appendreturns
a list with every element of of the first list, followed by every element of the second list.”Once we have defined the general properties of a program we can move beyond specific examples. From there we can use generators to test the output of our code against these properties using a variety of generated inputs.
This is easier said than done, however. Describing the properties of a program can be difficult, but there are a few general strategies, as described by Fred Herbert’s excellent book on Property Testing in Erlang:
Modeling: Modeling involves reimplementing your algorithm with a simpler (though likely less efficient) one. Our biggest
function for example could have it’s output compared with an algorithm that uses sort
to arrange a list in ascending order, then returns the final element. Sort
is far less time-efficient, O(n log n) compared to the O(n) of our biggest
function, but since it retains the same properties of biggest we can use it as a model to test our results against.
Equivalent Statements: Equivalent statements are used to reframe the property into a simpler one. For instance, we could say that the element returned by biggest
is larger than or equal to any of the remaining elements of the input list. This simplified property is not quite the same but fundamentally equivalent to the one we had defined above.
Symmetry: Some functions have natural inverses. The process of encrypting and decrypting data, for example, can be described by the following properties:
Oracles: Oracles involve using a reference implementation to compare your output against and, as such, are perhaps the best way to test the properties of your code. Oracles are most often used when porting existing code from language to another or when replacing a working implementation with an improved one.
Implementing property tests is not easy. It relies not only on describing the properties you wish to test against, but also on constructing generators to create the large, varied sets of randomized input to feed into your code. A single property may be tested hundreds of times, and generators will often create increasingly complicated inputs across these test iterations, or “generations” as they are called.
One can imagine that this randomly generated input could quickly become too unwieldy for the developer to make sense of. The failing case may contain large amounts of data irrelevant to what the actual cause of the failure. To help narrow things down to the true cause a property testing framework will often attempt to reduce, or “shrink”, the failing case down to a minimal reproducible state. This usually involves shrinking integers down to zero, strings to ""
, and lists to []
.
Fortunately, there are a variety of language-specific property testing libraries currently available. StreamData, for example, is an Elixir property testing library – and candidate to be included into Elixir proper – that provides built in generators for primitive data-types as well as tools to create custom ones. Generators can even be used to generate symbolic function calls, allowing the possibility to fuzz and test transitions on a state machine.
As a final note, it should be mentioned that while property testing is a powerful tool, it is not a perfect solution. Describing the properties of a piece of code can be difficult, as can coming up with tests for those properties. Furthermore, these tests are reliant on well-made generators to come up with the varied and unexpected input, which in itself can be a difficult and time consuming task.
It is also important to note that more traditional testing should not be entirely eschewed for property tests. The real strength of property testing is in using generated input to automate all the tedious work of thinking up unusual edge cases and writing individual tests for them, and it is at its best when used with unit tests that check for unique edge cases or document unusual behavior.
At Tinfoil Security, we understand that thorough and effective testing is an essential part of creating of efficient and secure technology. If you have any questions or would like to let us know how property testing has helped in your projects, feel free to contact us.