Software Integrity


Business logic: High frequency trading’s security lessons

The Associated Press’s Twitter feed was hacked, and a tweet posted by the attacker indicated that the president was injured in an explosion. The market momentarily lost $136 billion (*).

This event is instructive to security folk. Building security in requires understanding security as an emergent property of the system (let’s avoid the often misused term “business logic flaw”). I spent significant time as a dev lead helping build a trading system and its middleware, and I dealt with some of these very problems.

First, I built middleware, focusing on performance and scalability. When I took over the core trading engine code, this focus remained. There was also a raft of business rules governing matching and trade, and the engine needed to apply these rules rapidly (two orders of magnitude faster than the previous generation system). The organization assigned me a business analyst from the get-go. He immediately levied what were, in his mind, functional requirements on me (these were not functional reqs… but whatever). Read them with an eye to security and think about “misuse cases”. He said, “We need the system to:…”

  • Conduct match logic based on risk criteria
  • Straightforward match must occur at XXX,XXX/sec
  • Contentious match must occur at XX,XXX/sec
  • Suspect match may occur at only X,XXX/sec

Quantitative performance goals: super. Next was to define “contentious” and “suspect”.

Examples of contentious behavior included:

  • 80% of traffic occurs on 20% of the instruments. Extremely high (or low) instrument volume relative to these norms.
  • Highest issue match/trade volume was at the beginning and end of day. Mid-day peak volumes qualify as contentious.
  • High issue volume from a non-market maker.
  • High volume of reversals on a single instrument.

Suspect behavior included:

  • Issue price = $0
  • Issue price more than 10% above or below the current issue price
  • Issue price w/ non-matching issuer (I dropped these in the middleware)
  • Invalid pricing (e.g. negative values, again dropped in my middleware)

These lists aren’t exhaustive. In each case, however, the system re-routed matching and trade flow from highly optimized logic through additional checks, logging, notification, and clearing. Notification occurred “at boundary”. That is, before the message was sequenced and before matching logic was applied. This allowed the trading firms’ pricing engines a chance to “do over”. I found that to be one of the more clever requirements levied upon me.
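
The routing described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original engine’s code: the `Order` and `MarketView` types, the boolean volume flags, and the step names are all hypothetical, and only some of the listed criteria are modeled (the zero-price and 10% checks, abnormal instrument volume, and heavy flow from a non-market maker). It assumes negative prices and non-matching issuers were already dropped upstream in the middleware.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Tier(Enum):
    STRAIGHTFORWARD = "straightforward"
    CONTENTIOUS = "contentious"
    SUSPECT = "suspect"


@dataclass(frozen=True)
class Order:
    # Hypothetical order shape; a real message carries far more fields.
    instrument: str
    price: Decimal
    from_market_maker: bool


@dataclass(frozen=True)
class MarketView:
    # Hypothetical snapshot of what the engine knows when the order arrives.
    current_price: Decimal
    instrument_volume_abnormal: bool  # far off the usual 80/20 profile
    issuer_volume_high: bool          # this issuer is sending unusual volume


PRICE_BAND = Decimal("0.10")  # the 10% band from the suspect list

# Step names are illustrative labels, not real component names.
FAST_PATH = ("sequence", "match")
SLOW_PATH = ("notify_at_boundary", "sequence", "additional_checks",
             "log", "match", "clearing")


def classify(order: Order, market: MarketView) -> Tier:
    """Pick the risk tier that decides which matching path an order takes."""
    # Suspect checks run first: they are the strongest signal of misuse.
    if order.price == 0:
        return Tier.SUSPECT
    if market.current_price > 0:
        deviation = abs(order.price - market.current_price) / market.current_price
        if deviation > PRICE_BAND:
            return Tier.SUSPECT
    # Contentious checks: unusual volume rather than unusual prices.
    if market.instrument_volume_abnormal:
        return Tier.CONTENTIOUS
    if market.issuer_volume_high and not order.from_market_maker:
        return Tier.CONTENTIOUS
    return Tier.STRAIGHTFORWARD


def route(order: Order, market: MarketView) -> tuple:
    """Return the ordered processing steps for an order."""
    tier = classify(order, market)
    # Anything not straightforward leaves the optimized path; notification
    # comes first, before sequencing, so the sender gets its "do over".
    return FAST_PATH if tier is Tier.STRAIGHTFORWARD else SLOW_PATH
```

The point of the design is that the expensive checks sit only on the slow path, so the common case pays nothing for them.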

In terms of the high-frequency trading crash, I contend that the system:

  1. Supports high throughput requirements (aka “high frequency”… for the bond market 13 years ago)
  2. Throttles throughput when it perceives things going wild
  3. Responds using a risk-based approach by default
  4. Helps prevent large (and automated) price swings
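
Point 2 is the piece most easily shown in code. Nothing above says how the engine actually throttled, so the token bucket below is a swapped-in, commonly used technique rather than the original design, and the class name, its parameters, and the injectable clock are all assumptions. The per-tier rates appear above only as placeholders (XXX,XXX/sec and so on), so the rate is left as a parameter instead of a guessed number.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit roughly `rate_per_sec` events per second, with bursts up to `burst`."""

    def __init__(self, rate_per_sec: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate_per_sec <= 0 or burst <= 0:
            raise ValueError("rate_per_sec and burst must be positive")
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock          # injectable so tests can control time
        self.tokens = burst         # start full: an idle system can absorb a burst
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per risk tier, each built with that tier’s target rate, would let straightforward flow run at full speed while suspect flow is held back. What happens to an order that finds its bucket empty (queue it, reject it, notify the sender) is a policy choice this sketch leaves open.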

Building security in means not only preventing garden-variety vulnerabilities but also accounting for the kinds of misuse and abuse specific to a system’s business functions. Functionality that accounts for misuse (and a safety net protecting against it) is a key aspect of building security into software. A system that fails to exhibit these emergent properties is more rigid and may go “upside-down” under attack. So-called “Black Swan” events occurring during otherwise “normal use” may have profoundly negative effects, as we saw in the AP example.

Think about misuse/abuse and black swan events as you build user stories. How might these situations manifest as attack vectors in the system’s threat model? Design the system to detect, prevent, or prove resilient against these problems and it will be more secure.

In the case of the Twitter/high-frequency trading event, should the exchange itself have resisted this “attack”, which ultimately originated from Twitter and passed through [at least one] trading firm? Should the trading firms’ systems? I’d argue that each had a role to play.

(it just occurred to me that maybe I can’t write a “quick ‘un” blog entry)


(*) Special thanks to Sammy for the link: