Fuzzing Bitcoin with the Defensics SDK, part 2: Fuzz the Bitcoin protocol

In part two of this series, learn how to create a data model for the Bitcoin network protocol and use the Defensics SDK to perform fuzzing on bitcoind.

In the previous article, you saw how to set up a test bed for bitcoind. We created two containers, fleur and viktor, and set up communication between the two bitcoind instances.

In this article, learn how to create a data model for the Bitcoin network protocol, and then use this model in the Defensics® SDK to perform fuzzing on bitcoind.

Bitcoin message header model

I started fuzzing with the version message, which is the first message a peer sends to another peer to announce itself. As you can see in the Wireshark capture in the previous article, a version message sent by one peer gets a version message in response, as well as a verack.

The Bitcoin network protocol is well documented here:
https://bitcoin.org/en/developer-reference#p2p-network

Every Bitcoin message has a common header that consists of four fields:

  • A magic number that identifies the network (mainnet, testnet, or regtest)
  • A command string
  • A payload length
  • A payload checksum

The payload length and checksum are calculated based on the message payload.
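To make the layout concrete, here is a minimal sketch in plain Java (no Defensics SDK involved) that decodes a 24-byte header into the four fields listed above. The class name BitcoinHeaderSketch and its helpers are made up for illustration. The sample bytes are a regtest verack header, whose payload is empty, so the size is zero and the checksum is the well-known value for an empty payload.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative only: decodes the 24-byte Bitcoin message header.
class BitcoinHeaderSketch {
    final int magic;        // 4 bytes, kept in wire order
    final String command;   // 12 bytes, ASCII, NUL-padded
    final long payloadSize; // 4 bytes, little-endian, unsigned
    final int checksum;     // 4 bytes, kept in wire order

    BitcoinHeaderSketch(byte[] raw) {
        if (raw.length < 24) {
            throw new IllegalArgumentException("a header is 24 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        magic = buf.getInt(); // big-endian read preserves the wire order
        byte[] name = new byte[12];
        buf.get(name);
        int end = 0;
        while (end < name.length && name[end] != 0) {
            end++; // stop at the first NUL padding byte
        }
        command = new String(name, 0, end, StandardCharsets.US_ASCII);
        payloadSize = Integer.toUnsignedLong(
            buf.order(ByteOrder.LITTLE_ENDIAN).getInt());
        checksum = buf.order(ByteOrder.BIG_ENDIAN).getInt();
    }

    // Helper for the demo: turns a hex string into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // magic | "verack" + 6 NULs | size 0 | checksum of an empty payload
        BitcoinHeaderSketch h = new BitcoinHeaderSketch(fromHex(
            "fabfb5da" + "76657261636b000000000000" + "00000000" + "5df6e0e2"));
        System.out.printf("magic=%08x command=%s size=%d checksum=%08x%n",
            h.magic, h.command, h.payloadSize, h.checksum);
    }
}
```

Note that the size field is little-endian, while the magic number and checksum are simply treated as byte strings in the order they appear on the wire.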

Here is one way to represent the Bitcoin header in the Defensics SDK BNF format:

octet = 0x00-0xff

command-chooser = !corr:target ( # Message name
    .version-name: 'version' 5(octet)
  | .verack-name: 'verack' 6(octet)
  | .any: 12(octet)
)

bitcoin-header = (
    .magic: 0xfabfb5da # Magic number for regtest
    .command: command-chooser
    .size: !length32:target 0x00000000-0xffffffff
    .checksum: !sha256x2:target 0x00000000-0xffffffff
)

You can see that the basic structure of the Bitcoin header is the magic number, the command, the length, and the checksum. In addition, we have specific rules that will be used for the length and checksum.

We’ll set up the rules in the Java code, but the designations here (like !length32:target) show that we will have a rule named length32 and that the results will be stored in the size field of the header.

A correlation rule is used to match the command strings in the header to the corresponding message payloads later. Here, we will use command-chooser to select one of the available message commands. We have included version and verack, but in a more complete model of the Bitcoin protocol, we could easily include more.

We use the correlation rule to select one of many possible payloads, one for each message command, as shown here:

bitcoin-payload = !corr:source (
    .version-payload: version-payload
  | .verack-payload: verack-payload
  | .any-payload: any-payload
)

Having set this up (and using rules we haven’t defined yet), here is the generic definition of a complete Bitcoin message:

bitcoin-message = @corr @length32 @sha256x2 (
  .header: bitcoin-header
  .payload: !length32:source !sha256x2:source bitcoin-payload
)

The designation @corr @length32 @sha256x2 indicates that we are using the three named rules, and their source and target designations show where the rule will be applied.

Later we will look at how the rules are defined.
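Before getting to the SDK rules, it may help to see the source and target idea outside the SDK. The following is a rough sketch in plain Java, not Defensics code: the payload is the source for both the length and the checksum, and the results land in the header's size and checksum fields (the targets). The class and method names are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative only: assembles header + payload the way the model describes.
class MessageAssemblySketch {
    static final byte[] REGTEST_MAGIC =
        {(byte) 0xfa, (byte) 0xbf, (byte) 0xb5, (byte) 0xda};

    // SHA-256 applied twice; digest() resets the instance between calls.
    static byte[] doubleSha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] assemble(String command, byte[] payload) {
        byte[] name = command.getBytes(StandardCharsets.US_ASCII);
        if (name.length > 12) {
            throw new IllegalArgumentException("command is at most 12 bytes");
        }
        ByteBuffer out = ByteBuffer.allocate(24 + payload.length);
        out.put(REGTEST_MAGIC);
        out.put(Arrays.copyOf(name, 12)); // NUL-padded to 12 bytes
        // Length: source = payload, target = size field (little-endian).
        out.order(ByteOrder.LITTLE_ENDIAN).putInt(payload.length);
        // Checksum: source = payload, target = checksum field (first 4 bytes).
        out.put(doubleSha256(payload), 0, 4);
        out.put(payload);
        return out.array();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A verack has no payload, so the whole message is just the header.
        System.out.println(toHex(assemble("verack", new byte[0])));
    }
}
```

In the SDK model, the same dependency is expressed declaratively, so the engine can recompute both fields for every generated test case.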

Bitcoin version message model

Defining the version message payload is similarly straightforward, although here we aren’t going into much detail about each field. For example, we could create more specific models for each field, which would apply more specific anomalizations and might help test the target more thoroughly.

version-payload = (
  .version: (0x7f110100 | 4(octet))
  .services: (0x0904000000000000 | 8(octet))
  .timestamp: (0x6ff27d5f00000000 | 8(octet))
  .addr-recvservices: (0x0100000000000000 | 8(octet))
  .addr-recvipaddress: (0x00000000000000000000000000000000 | 16(octet))
  .addr-recvport: 0x0000-0xffff
  .addr-transservices: (0x0904000000000000 | 8(octet))
  .addr-transipaddress: (0x00000000000000000000000000000000 | 16(octet))
  .addr-transport: 0x0000-0xffff
  .nonce: (0xcf7990b352cb105e | 8(octet))
  .user-agentbytes: (0x10 | 0x00-0xff)
  .user-agent: ('/Satoshi:0.20.1/' | 0..255(octet))
  .start-height: (0x65000000 | 4(octet))
  .relay: (0x01 | octet)
)
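The default values in this model are the raw bytes as they appear on the wire, and the integer fields are little-endian. As a quick sanity check, here is a small plain-Java sketch (the class and helper names are invented for this example) that decodes a few of the defaults. For instance, 0x7f110100 read as a little-endian 32-bit integer is protocol version 70015, and the user agent length byte 0x10 matches the 16 characters of '/Satoshi:0.20.1/'.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Illustrative only: decodes some default values from the version-payload model.
class VersionDefaultsSketch {
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reads 4 wire bytes as a little-endian 32-bit integer.
    static int int32le(String hex) {
        return ByteBuffer.wrap(fromHex(hex)).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Reads 8 wire bytes as a little-endian 64-bit integer.
    static long int64le(String hex) {
        return ByteBuffer.wrap(fromHex(hex)).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        System.out.println("version      = " + int32le("7f110100"));
        System.out.println("services     = 0x"
            + Long.toHexString(int64le("0904000000000000")));
        long timestamp = int64le("6ff27d5f00000000");
        System.out.println("timestamp    = " + timestamp
            + " (" + Instant.ofEpochSecond(timestamp) + ")");
        System.out.println("start-height = " + int32le("65000000"));
        System.out.println("user-agent   = "
            + "/Satoshi:0.20.1/".length() + " bytes");
    }
}
```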

Pulling the model into the Defensics SDK

Once the definitions are outlined in the BNF, pulling them into the Defensics SDK is easy. Let’s say all the BNF is defined in a file resources/model.bnf. In a test suite, we can pull in these definitions very simply.

public void build(BuilderTools tools) throws Exception {
    ElementFactory factory = tools.factory();

    // Set up rules…

    factory.readTypes(tools.resources().getPathToResource("model.bnf"));

Once the definitions are pulled in, specific Bitcoin messages can be assembled by selecting a command name in the header; the correlation rule takes care of selecting the associated payload. For example, this is how the version message is created to be ready to use in a message sequence:

    MessageElement version = tools.factory().getType("bitcoin-message");
    version.find().mandatory("version-name").element().select();

    tools.messages().message("version", version).finish();

Opening a can of whoop ass

All this modeling so far does not result in impressive test cases. In particular, the payload size field and the payload checksum field will not be correct. Because these are probably the very first fields examined by bitcoind, such test cases are immediately discarded.

To make our test cases look believable and to accomplish the best testing possible, we need the size and checksum fields to be correct.

In the Defensics SDK, rules are used for cases like this where certain fields need to behave in certain ways.

The correlation and length rules are easiest and can be accomplished using built-in rules from the Defensics SDK. The definitions should happen before the BNF definitions are loaded with readTypes().

    RuleFactory rf = tools.rule();
    rf.correlate("corr");
    rf.length("length32").format("int-lsb-32bit");

The last line creates a length rule named length32 that formats its result as a 32-bit integer, with the least significant byte first (little-endian).

The checksum is more challenging because it cannot be addressed using a built-in rule. The Bitcoin protocol checksum is the first four bytes of the SHA256 digest of the SHA256 digest of the payload. That’s not a typo. First you calculate the SHA256 digest value of the payload. Then you calculate the SHA256 of that digest value. Then you take the first four bytes of the result and use that for the checksum.

I defined a custom rule in the Defensics SDK as follows:

package com.example.sdk;

import java.security.*;
import java.util.Arrays;

import com.synopsys.defensics.api.message.*;
import com.synopsys.defensics.api.message.rule.CustomChecksum;

public class SHA256x2 implements CustomChecksum {
  @Override
  public byte[] calculate(SDKEngine engine, byte[] data) {
    MessageDigest mDigest;
    try {
      mDigest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    // digest() resets the MessageDigest, so the same instance can be reused.
    byte[] shaTheFirst = mDigest.digest(data);
    byte[] shaTheSecond = mDigest.digest(shaTheFirst);
    // The checksum is the first four bytes of the second digest.
    return Arrays.copyOfRange(shaTheSecond, 0, 4);
  }
}

Incorporating this rule is a matter of instantiating the custom rule:

    rf.checksum("sha256x2", new SHA256x2());

Again, this definition of the sha256x2 rule needs to happen before the BNF is loaded.

This is one of the places where we’re really cashing in on the value of generational fuzzing. Even when parts of the payload are anomalized beyond recognition, the rules we’ve defined in the data model ensure that the header size and checksum fields are set correctly. Test cases delivered to the bitcoind target therefore pass the size and checksum verification and continue on to the deeper parsing code.
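To see why this matters, consider a simplified stand-in for the receiver's first checks. This plain-Java sketch is not bitcoind's actual code, and the names are invented for this example. It shows that a mutated payload paired with a stale header is rejected at the door, while the same mutated payload with a recomputed size and checksum gets through.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative only: a simplified version of a receiver's header checks.
class ChecksumGateSketch {
    // First four bytes of SHA-256(SHA-256(payload)).
    static byte[] checksum(byte[] payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Arrays.copyOfRange(md.digest(md.digest(payload)), 0, 4);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the declared size and checksum both match the payload.
    static boolean passesHeaderChecks(int declaredSize, byte[] declaredChecksum,
                                      byte[] payload) {
        return declaredSize == payload.length
            && Arrays.equals(declaredChecksum, checksum(payload));
    }

    public static void main(String[] args) {
        byte[] original = "valid payload".getBytes(StandardCharsets.US_ASCII);
        byte[] staleChecksum = checksum(original);

        // Anomalize the payload: flip one bit in the first byte.
        byte[] mutated = original.clone();
        mutated[0] ^= 0x01;

        System.out.println("stale header:      "
            + passesHeaderChecks(original.length, staleChecksum, mutated));
        System.out.println("recomputed header: "
            + passesHeaderChecks(mutated.length, checksum(mutated), mutated));
    }
}
```

A fuzzer that only flips bits without fixing up these fields would spend nearly all of its test cases on the first branch.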

Putting it all together

I haven’t shown you all the source code, just the most important parts. Once you have everything together, you can load the test suite in Defensics and use it very much like any other Defensics test suite.

Assuming Defensics is on the same network as the test bed virtual machine, you simply tell Defensics the IP address and port number of the target. If you use port 18444, it’s mapped to the fleur container.

By default, Defensics will use implicit TCP instrumentation, which means Defensics assumes the target is still healthy as long as it can keep opening up the TCP port. If we did manage to kill bitcoind, Defensics would no longer be able to open the port and would flag an error.
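The idea behind this kind of health check is simple enough to sketch in plain Java. This is only an illustration of the concept, not how Defensics implements its instrumentation; the class name and the stand-in listener are invented for this example. The probe tries to open a TCP connection and reports whether it succeeded.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: a TCP connect probe as a crude liveness check.
class TcpProbeSketch {
    // Returns true if a TCP connection can be opened within the timeout.
    static boolean portIsOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) throws IOException {
        int port;
        // A local listener on an ephemeral port stands in for the target.
        try (ServerSocket listener = new ServerSocket(0)) {
            port = listener.getLocalPort();
            System.out.println("listener up:   "
                + portIsOpen("127.0.0.1", port, 1000));
        }
        // The listener is closed now, as if the target had died.
        System.out.println("listener down: "
            + portIsOpen("127.0.0.1", port, 1000));
    }
}
```

A check like this only notices a target that stops accepting connections; it says nothing about memory corruption that does not bring the process down.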

If you want more information as you’re testing, watch bitcoind’s debug log. As with any fuzz testing, you’ll typically see a mix of messages. Sometimes bitcoind will report the received test case, and sometimes it will complain about one thing or another. Monitoring the log is a good way to confirm that test cases are being received and processed by the target.

The following command tails the log:

root@fleur:~# tail -f ~/.bitcoin/regtest/debug.log
2020-11-10T14:34:34Z connection from 172.17.0.1:56228 accepted
2020-11-10T14:34:34Z received: version (87 bytes) peer=2088
2020-11-10T14:34:34Z ProcessMessages(version, 87 bytes): Exception 'CDataStream::read(): end of data: iostream error' (NSt8ios_base7failureB5cxx11E) caught
2020-11-10T14:34:34Z ProcessMessages(version, 87 bytes) FAILED peer=2088
2020-11-10T14:34:34Z socket closed for peer=2088
2020-11-10T14:34:34Z disconnecting peer=2088
2020-11-10T14:34:34Z Cleared nodestate for peer=2088

For the most effective testing, you might need to disable bitcoind’s built-in protections using the -whitelist option.

Future directions

I hope you’ve enjoyed this romp through Bitcoin protocol fuzzing using the Defensics SDK. You can see how the Defensics SDK brings the power of generational fuzzing to any type of software.

In the specific case of bitcoind, you could take this testing further as follows:

  • Improve the data models. For example, the fields inside the version message are cursorily specified. Use the protocol specification to make them more “real.”
  • Model and test other messages in the Bitcoin protocol.
  • Improve failure monitoring. For example, you could run the bitcoind target with AddressSanitizer (ASan) to catch memory errors. This approach integrates nicely with Defensics’ Agent Instrumentation Framework.

Special thanks

Aleksis Kauppinen and Janne Ruotsalainen, from the Defensics R&D team, were kind enough to review this article and made brilliant improvements to the code.

 
Posted by Jonathan Knudsen

Jonathan Knudsen likes to break things. He has tested all kinds of software, from network infrastructure and medical devices to cryptocurrency nodes. Jonathan has worked as a developer, consultant, and author. He has published books about 2D graphics, cryptography, and Lego robots, and has written more than one hundred articles on a wide range of technical subjects.

