The IRS data breach: How not to do identity proofing

In 2015, thousands of U.S. taxpayers attempting to file a tax return discovered that a return had already been filed on their behalf, with the refund deposited into an unknown account. An investigation showed that a service known as Get Transcript had been used to access the data of at least 700,000 individuals. That data was then used to file bogus tax returns on behalf of the victims, and the refunds were paid out to criminals. Here, we’ll address the shortcomings of the IRS’s Get Transcript service and discuss ways in which these widespread attacks could have been prevented.

Root cause: Identity proofing gone bad

The Get Transcript service is intended to let a taxpayer retrieve their own previously filed tax returns, for record keeping or for amending a previous return, among other uses.

Although these tax returns contain very sensitive data, the IRS used a flawed system to determine the identity of the person requesting data through Get Transcript. The service prompted the user for the following data:

  • Requestor’s name
  • Date of birth
  • Social Security number
  • Taxpayer’s filing status for the previous year

Once this information was found to match a taxpayer in the system, the IRS used an identity proofing service from Equifax to ask the requestor four knowledge-based authentication (KBA) questions, to further determine whether the requestor was who he or she claimed to be. Some sample KBA questions are:

  • What is the initiation date of the mortgage on the property at 123 Main Street?
  • What was your address as of January 1st 2009?
  • Who was your employer as of June 1st 2014?

The answers for these and other KBA questions can be gleaned from professional registries, biographies (to gather employment and education history), social networking sites like LinkedIn (to gather employment history) or Facebook (to gather residence and address history), real estate sites like Zillow (to gather mortgage loan balance information and initiation dates), or a data-aggregation provider like Spokeo.

Each KBA question presented four or five possible answers in a multiple-choice format. The Get Transcript service locked users out for at most 24 hours after a certain number of failed attempts at the identity proofing questions. It is unclear how many attempts a user had before the 24-hour lockout. Some of the IRS information indicates that although 700,000 individual records were compromised, many more attempts were stopped before individual data was accessed. It appears that the attackers were blindly guessing at the answers for some individuals.
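To see why multiple-choice questions are weak against blind guessing, the odds can be worked out directly. The sketch below assumes that all four questions must be answered correctly and that each guess is uniform and independent; neither detail is confirmed by the IRS information, and the number of attempts passed to `success_within` is purely hypothetical, since the real limit is unknown.

```python
from fractions import Fraction


def blind_guess_success(num_questions: int, choices_per_question: int) -> Fraction:
    """Chance of answering every question correctly by uniform random guessing."""
    return Fraction(1, choices_per_question) ** num_questions


def success_within(attempts: int, per_attempt: Fraction) -> float:
    """Chance of at least one fully correct quiz in `attempts` independent tries."""
    return 1 - float((1 - per_attempt) ** attempts)


# Four questions, each with four or five possible answers (figures from the article).
p_four_choices = blind_guess_success(4, 4)   # 1/256
p_five_choices = blind_guess_success(4, 5)   # 1/625

if __name__ == "__main__":
    print("4 choices per question:", p_four_choices)
    print("5 choices per question:", p_five_choices)
    # Hypothetical: three tries per day before the lockout, repeated daily.
    print("3 tries, 4 choices:", success_within(3, p_four_choices))
```

Under these assumptions, a single blind attempt succeeds between 1 time in 625 and 1 time in 256. That is small for one victim but significant for an attacker who can cycle through a large list of stolen identities, and each answer found through public sources improves the odds further.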

The security blog krebsonsecurity.com conducted an exercise in which non-public static data of the kind used in the Equifax identity proofing routine (names, current addresses, address history, dependents’ names and connections, employment history, current mortgage loan information) was found for individuals on what Brian Krebs calls ‘underground cybercrime’ sites. This data most likely comes from previous data breaches of which the victims may or may not be aware.

Clearly, the IRS should not have been relying solely on static authentication data, such as the answers to the KBA questions, to identify users of the service.

Proper risk analysis could have helped avoid this data breach

The IRS should have performed a risk analysis of the Get Transcript service to determine whether it was reasonable to issue access credentials electronically without confirming a physical or electronic address for the user. A risk analysis is recommended by the NIST SP 800-63 Electronic Authentication Guideline and the E-Authentication Guidance for Federal Agencies for all U.S. Government electronic transactions that involve personally identifiable information (PII).

A properly conducted risk analysis of the Get Transcript service would have discovered that the service needed level three identity proofing assurance before issuing credentials. Level three identity proofing for non-in-person access requires verifying identity and address by confirming that the applicant holds a valid government ID (driver’s license or passport number) matching the identity the applicant asserts, and by confirming the user’s address with at least one financial or utility account (e.g., checking account, savings account, utility account, loan, or credit card).

Where the record(s) indicate an electronic address (e.g. voicemail, email, or SMS), consistent with information provided by the user attempting the login, the IRS should have confirmed the ability of the user to receive messages at that address using a limited-duration (24 hours maximum, 30 minutes preferred) one-time password mechanism.

If no electronic address in the records matched the identity asserted by the user, the IRS should have used the physical address already known to the IRS (e.g. the taxpayer’s last known address, not an address provided by the user) to send a one-time password with a 7-day time limit, which the user could then use to set their credentials in the Get Transcript service. A note accompanying the one-time password should have warned the individual that the Get Transcript service was being accessed, and told them to call a specified phone number as soon as possible if it wasn’t their doing. It is only slightly ironic that the IRS requires physical proof, with a confirmation from a postal service agency, before it accepts a physical address change.

In either case, once the user has the time-limited one-time password, they would be able to log into the Get Transcript service and set longer-term credentials.
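The time-limited one-time password described above can be sketched in a few lines. This is a minimal illustration rather than a description of any real IRS system: the names `issue_otp`, `redeem_otp`, and `PendingOtp` are hypothetical, the 8-digit code length is an assumption, and only the two lifetimes (30 minutes for electronic delivery, 7 days for postal delivery) come from the text.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional, Tuple

ELECTRONIC_TTL = 30 * 60           # 30 minutes, the preferred limit for email/SMS
POSTAL_TTL = 7 * 24 * 60 * 60      # 7 days for a code sent to the address on file


@dataclass
class PendingOtp:
    digest: bytes        # hash of the code; the code itself is never stored
    expires_at: float    # Unix timestamp after which the code is rejected
    used: bool = False   # a code can be redeemed only once


def _digest(code: str) -> bytes:
    return hashlib.sha256(code.encode("utf-8")).digest()


def issue_otp(ttl_seconds: int, now: Optional[float] = None) -> Tuple[str, PendingOtp]:
    """Create a random 8-digit code and the server-side record used to check it."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**8):08d}"
    return code, PendingOtp(digest=_digest(code), expires_at=now + ttl_seconds)


def redeem_otp(pending: PendingOtp, submitted: str, now: Optional[float] = None) -> bool:
    """Accept the code only if it is unexpired, unused, and matches."""
    now = time.time() if now is None else now
    if pending.used or now > pending.expires_at:
        return False
    if not hmac.compare_digest(pending.digest, _digest(submitted)):
        return False
    pending.used = True
    return True
```

The code is delivered only to the address already on file, never to one supplied during the request. A production system would also cap the number of wrong submissions per code, which this sketch leaves out.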

Consider out-of-band delivery of sensitive information

In the aftermath of this data breach, the Get Transcript service has changed so that online access is no longer possible. Now, tax returns and other sensitive information are delivered to the physical address of the taxpayer. This functionality should have been considered from the beginning, based on the sensitivity of the data. Another option is to let individuals protect themselves by opting out of e-delivery of sensitive data. Other U.S. governmental agencies (e.g. the Social Security Administration) have established the ability to opt out of e-delivery services to prevent identity theft.

Lessons learned

  1. Perform a risk analysis on software systems that access sensitive data, including any third-party components. Re-analyze the system on a regular basis and after any changes to the code, dependent components, or process.
  2. Do not base security questions used for authentication on static data that can be obtained or inferred from public sources, and do not make the answers multiple choice.
  3. Incrementally extend the account lockout time on authentication failures, leading eventually to a complete lockdown of the user’s account (e.g. 24 hours after the first three failures, 48 hours on the fourth failure, one week on the fifth failure, and at the sixth failure the account must be reset).
  4. When the user’s account is locked out, make the reset mechanism depend on confirmation through a pre-arranged channel (e.g. the user must be calling from their home phone, or must be able to receive a request confirmation at a physical address on file).
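The escalating lockout in lesson 3 could be expressed as a small lookup, using the example durations given there. This is a sketch only: the names `LOCKOUT_SCHEDULE` and `lockout_for` are hypothetical, and it assumes the failure count is cumulative and is not cleared when a lockout expires.

```python
from datetime import timedelta
from typing import Union

RESET_REQUIRED = "reset_required"

# Example schedule from lesson 3: cumulative failure count -> lockout duration.
LOCKOUT_SCHEDULE = {
    3: timedelta(hours=24),
    4: timedelta(hours=48),
    5: timedelta(weeks=1),
}
RESET_THRESHOLD = 6  # at the sixth failure the account must be reset


def lockout_for(failure_count: int) -> Union[None, timedelta, str]:
    """Return None (no lockout yet), a lockout duration, or RESET_REQUIRED."""
    if failure_count >= RESET_THRESHOLD:
        return RESET_REQUIRED
    return LOCKOUT_SCHEDULE.get(failure_count)


if __name__ == "__main__":
    for failures in range(1, 8):
        print(failures, lockout_for(failures))
```

Once `lockout_for` returns `RESET_REQUIRED`, the account should be recoverable only through the pre-arranged channel described in lesson 4, not through another round of online questions.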
