The IRS data breach: How not to do identity proofing

We discuss the shortcomings of the IRS’s Get Transcript service, especially identity proofing, and how these widespread attacks could have been prevented.

In 2015, thousands of U.S. taxpayers attempting to file a tax return discovered that a return had already been filed on their behalf, with the refund deposited into an unknown account. An investigation showed that a service known as Get Transcript was used to access the data of at least 700,000 individuals. That data was then used to file fraudulent tax returns in the victims’ names, with the refunds already paid out to criminals. Here, we’ll address the shortcomings of the IRS’s Get Transcript service and discuss ways in which these widespread attacks could have been prevented.

Root cause: Identity proofing gone bad

The intended purpose of the Get Transcript service is to let taxpayers retrieve their own previously filed tax returns for record keeping or for amending a previous return, among other uses.

Although these tax returns contain very sensitive data, the IRS used a flawed system to determine the identity of the person requesting data through Get Transcript. The service prompted the user for the following data:

  • Requester’s name
  • Date of birth
  • Social Security number
  • Taxpayer’s filing status for the previous year

Once it was determined that this information matched a taxpayer in the system, the IRS used an identity proofing service from Equifax to query the individual with four knowledge-based authentication (KBA) questions to further determine if the requester was who he or she claimed to be. Some sample KBA questions are:

  • What is the initiation date of the mortgage on the property at 123 Main Street?
  • What was your address as of January 1, 2009?
  • Who was your employer as of June 1, 2014?

The answers to these and other KBA questions can be gleaned from professional registries, biographies (employment and education history), social networking sites like LinkedIn (employment history) or Facebook (residence and address history), real estate sites like Zillow (mortgage loan balances and initiation dates), or data-aggregation providers like Spokeo.

Each KBA question offered four or five possible answers in multiple-choice format. The Get Transcript service locked users out for at most 24 hours after a certain number of failed attempts at the identity proofing questions; it is unclear how many attempts a user had before the lockout. Some IRS information indicates that, although 700,000 individual records were compromised, many more attempts were stopped before any individual data was accessed. It appears that the attackers were blindly guessing at the answers for some individuals.
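The arithmetic shows why blind guessing can work at scale. The sketch below assumes that all four questions must be answered correctly in one attempt, that each question has four or five equally likely options, and that the attacker guesses uniformly at random. The number of attempts allowed before the lockout is left as a parameter because the real limit is not public.

```python
from fractions import Fraction


def guess_probability(choices_per_question: int, questions: int = 4) -> Fraction:
    """Chance that one uniformly random guess answers every question correctly."""
    return Fraction(1, choices_per_question) ** questions


def success_within(attempts: int, choices_per_question: int, questions: int = 4) -> float:
    """Chance of passing at least once in `attempts` independent tries.

    Assumes each attempt is a fresh, independent guess; the real number of
    attempts allowed before the 24-hour lockout is not public.
    """
    p = guess_probability(choices_per_question, questions)
    return 1 - (1 - float(p)) ** attempts


if __name__ == "__main__":
    for choices in (4, 5):
        print(choices, guess_probability(choices), round(success_within(3, choices), 4))
```

With four options per question, a single random attempt succeeds with probability 1/256. With five options, the probability is 1/625. For any one victim that is small. An attacker working through a long list of stolen identities, however, can expect a steady trickle of successes. This is consistent with the observation that some answers appear to have been guessed.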

The Krebs on Security blog conducted an exercise in which nonpublic static data like that used in the Equifax identity proofing routine (names, current addresses, address history, dependents’ names and connections, employment history, current mortgage loan information) was found for individuals on what Brian Krebs calls “underground cybercrime” sites. This data most likely comes from earlier data breaches that the victims may or may not be aware of.

Clearly, the IRS should not have been relying solely on static authentication data, such as the answers to the KBA questions, to identify users of the service.

Proper risk analysis could have helped avoid this data breach

The IRS should have performed a risk analysis of the Get Transcript service to determine whether it was reasonable to issue access credentials electronically without confirming a physical or electronic address for the user. NIST Special Publication 800-63, Electronic Authentication Guideline, and the OMB E-Authentication Guidance for Federal Agencies recommend a risk analysis for all U.S. government electronic transactions that involve personally identifiable information (PII).

A properly conducted risk analysis of the Get Transcript service would have shown that it needed Level 3 identity proofing assurance before issuing credentials. For remote (non-in-person) applicants, Level 3 identity proofing requires verifying both identity and address. The agency must confirm that the applicant holds a valid government ID (driver’s license or passport number) matching the asserted identity. It must also confirm the applicant’s address through at least one financial or utility account (e.g., checking account, savings account, utility account, loan, or credit card).
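The two Level 3 remote checks above can be sketched as a simple decision function. This is a simplified illustration, not the NIST procedure itself. The record sources are modeled as hypothetical in-memory lookups keyed by ID number and account number. A real check would query the issuing agency, the financial institution, or a credit bureau.

```python
from dataclasses import dataclass, field


@dataclass
class ProofingEvidence:
    """What a remote applicant submits (hypothetical field names)."""
    name: str
    date_of_birth: str
    government_id: str   # driver's license or passport number
    account_number: str  # checking, savings, utility, loan, or credit card account
    address: str


@dataclass
class AuthoritativeRecords:
    """Hypothetical stand-in for issuing-agency and financial-institution lookups."""
    # government ID number -> (holder name, holder date of birth)
    government_ids: dict[str, tuple[str, str]] = field(default_factory=dict)
    # account number -> (account holder name, address on the account)
    accounts: dict[str, tuple[str, str]] = field(default_factory=dict)


def passes_level3_remote_proofing(ev: ProofingEvidence, records: AuthoritativeRecords) -> bool:
    # 1. The government ID must be valid and belong to the asserted identity.
    if records.government_ids.get(ev.government_id) != (ev.name, ev.date_of_birth):
        return False
    # 2. A financial or utility account must confirm the same person at the claimed address.
    return records.accounts.get(ev.account_number) == (ev.name, ev.address)
```

Note that none of these checks relies on multiple-choice questions about public or previously breached data. Each claim is matched against a record held by a third party.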

Where the records list an electronic address (e.g., voicemail, email, or SMS) consistent with the information provided by the user attempting to log in, the IRS should have confirmed that the user can receive messages at that address. It could have done so by sending a limited-duration one-time password (24 hours maximum, 30 minutes preferred).

If the records contained no electronic address matching the identity asserted by the user, the IRS should have mailed a one-time password, valid for 7 days, to the physical address it already had on file (e.g., the taxpayer’s last known address, not an address provided by the user). The user could then use that password to set up credentials for the Get Transcript service. A note accompanying the one-time password should have warned the recipient that the Get Transcript service was being accessed in their name and asked them to call a specified phone number immediately if they had not initiated the request. It is only slightly ironic that the IRS requires physical proof, with confirmation from a postal service agency, before it accepts a change of physical address.

In either case, once users had the time-limited one-time password, they would be able to log in to the Get Transcript service and set longer-term credentials.
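The enrollment flow in the last three paragraphs can be sketched as follows. The time limits (30 minutes for an electronic address, 7 days for mail) come from the text. Everything else is an illustrative assumption, including the function names, the `PendingOtp` record, and storing only a SHA-256 digest of the password. The key property is that the destination always comes from the agency’s own records, never from what the user typed.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ELECTRONIC_TTL = timedelta(minutes=30)  # "30 minutes preferred" (24 hours at most)
POSTAL_TTL = timedelta(days=7)          # 7-day limit for a mailed password


@dataclass
class PendingOtp:
    channel: str          # "electronic" or "postal"
    destination: str      # always taken from the agency's own records
    digest: bytes         # store a hash, not the password itself
    expires_at: datetime
    used: bool = False


def _digest(otp: str) -> bytes:
    return hashlib.sha256(otp.encode()).digest()


def issue_otp(electronic_on_record: Optional[str], electronic_from_user: Optional[str],
              address_on_file: str, now: datetime) -> tuple[str, PendingOtp]:
    """Choose a delivery channel and mint a one-time password.

    The electronic channel is used only when the address on record matches what
    the user supplied; otherwise the password is mailed to the address on file.
    """
    otp = secrets.token_urlsafe(8)  # about 64 bits of randomness
    if electronic_on_record and electronic_on_record == electronic_from_user:
        pending = PendingOtp("electronic", electronic_on_record, _digest(otp), now + ELECTRONIC_TTL)
    else:
        pending = PendingOtp("postal", address_on_file, _digest(otp), now + POSTAL_TTL)
    return otp, pending


def redeem_otp(pending: PendingOtp, submitted: str, now: datetime) -> bool:
    """Accept the password once and only before it expires.

    On success the caller would let the user set longer-term credentials.
    """
    if pending.used or now >= pending.expires_at:
        return False
    if not hmac.compare_digest(_digest(submitted), pending.digest):
        return False
    pending.used = True
    return True
```

A fuller version would also include the warning notice in the mailed letter and limit how often a password can be issued for the same taxpayer.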

Consider out-of-band delivery of sensitive information

In the aftermath of the breach, the Get Transcript service was changed so that online access is no longer possible. Tax returns and other sensitive information are now delivered to the taxpayer’s physical address. Given the sensitivity of the data, this should have been considered from the beginning. Another option is to let individuals protect themselves by opting out of electronic delivery of sensitive data. Other U.S. government agencies (e.g., the Social Security Administration) already offer such an opt-out to help prevent identity theft.
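A delivery decision that supports both mail-only delivery and a per-taxpayer opt-out might look like the sketch below. Several details are assumptions made for illustration: the document-type label, the preference record, and the use of the Level 3 proofing bar from the risk analysis section as the cutoff for online delivery.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    ONLINE = "online"
    MAIL_TO_ADDRESS_ON_FILE = "mail"


@dataclass
class TaxpayerPreferences:
    """Hypothetical per-taxpayer settings."""
    opted_out_of_e_delivery: bool = False


# Hypothetical labels for documents treated as sensitive.
SENSITIVE_DOCUMENTS = frozenset({"tax_return_transcript"})
MIN_ONLINE_PROOFING_LEVEL = 3  # assumption: reuse the Level 3 bar from the risk analysis


def choose_delivery(doc_type: str, prefs: TaxpayerPreferences, proofing_level: int) -> Delivery:
    # An opt-out always wins, however strongly the session was proofed.
    if prefs.opted_out_of_e_delivery:
        return Delivery.MAIL_TO_ADDRESS_ON_FILE
    # Sensitive documents go online only to sessions proofed at the required level.
    if doc_type in SENSITIVE_DOCUMENTS and proofing_level < MIN_ONLINE_PROOFING_LEVEL:
        return Delivery.MAIL_TO_ADDRESS_ON_FILE
    return Delivery.ONLINE
```

The opt-out is checked first because it expresses the taxpayer’s own risk decision. An attacker who passes proofing should not be able to override it.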

Lessons learned

  1. Perform a risk analysis on software systems that access sensitive data, including any third-party components. Re-analyze the system on a regular basis and after any changes to the code, dependent components, or process.
  2. Do not base security questions used for authentication on static data that can be obtained or inferred from public sources, and do not make the answers multiple choice.
  3. Incrementally extend the account lockout period on repeated authentication failures, ending in a complete lockdown of the user’s account (e.g., 24 hours after the first three failures, 48 hours on the fourth failure, one week on the fifth failure, and at the sixth failure, the account must be reset).
  4. When a user’s account is locked out, make the reset mechanism depend on confirmation through a prearranged channel (e.g., the user must call from their home phone, or must be able to receive a request confirmation at a physical address on file).
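Lessons 3 and 4 together describe an escalating lockout that ends in a forced reset. Below is a minimal sketch using the example schedule from lesson 3. Confirmation over the prearranged channel is treated as a boolean supplied by some out-of-band process, which is hypothetical here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Lockout applied after the Nth consecutive failure (example schedule from lesson 3).
LOCKOUT_AFTER_FAILURE = {
    3: timedelta(hours=24),
    4: timedelta(hours=48),
    5: timedelta(weeks=1),
}
RESET_REQUIRED_AT = 6


@dataclass
class AccountState:
    failures: int = 0
    locked_until: Optional[datetime] = None
    needs_reset: bool = False


def can_attempt(state: AccountState, now: datetime) -> bool:
    if state.needs_reset:
        return False
    return state.locked_until is None or now >= state.locked_until


def record_failure(state: AccountState, now: datetime) -> None:
    # The counter is not cleared when a lockout expires, so each lockout is longer.
    state.failures += 1
    if state.failures >= RESET_REQUIRED_AT:
        state.needs_reset = True
        state.locked_until = None
    elif state.failures in LOCKOUT_AFTER_FAILURE:
        state.locked_until = now + LOCKOUT_AFTER_FAILURE[state.failures]


def record_success(state: AccountState) -> None:
    # Assumption: a successful login clears the failure history.
    state.failures = 0
    state.locked_until = None


def reset_via_prearranged_channel(state: AccountState, confirmed: bool) -> bool:
    """Unlock a locked-down account only after out-of-band confirmation (lesson 4).

    `confirmed` stands in for a hypothetical check, e.g. a call from the home
    phone on file or a reply to a letter sent to the address on file.
    """
    if not (state.needs_reset and confirmed):
        return False
    state.failures = 0
    state.locked_until = None
    state.needs_reset = False
    return True
```

Callers would check `can_attempt` before evaluating any answers. Only a successful login or a confirmed reset clears the failure count, so an attacker cannot simply wait out a lockout and start over.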


  • Equifax blog post “Data for KBA Questions Matters” admits that its identity proofing service is inadequate if based on publicly available data.
  • Taxable Talk blog post “Life as a Second-Class Citizen” discusses the inadequacy of the identity proofing routines in the Get Transcript service.
  • Krebs on Security blog post “Toward a Breach Canary for Data Brokers” describes the easy availability of supposedly nonpublic data for identity theft purposes.
Posted by

David Harvey

David Harvey, CISSP, is a former principal consultant with Synopsys. David evaluated agile practices, led code reviews, and trained developers on defensive coding and threat modeling. Before joining Synopsys, he worked for UnitedHealth Group, where he co-founded and led a developer-facing software security initiative. David has also worked as an architect and developer at Siemens, Boeing, and Unisys. He learned about software security when the “information superhighway” was just becoming a thing and Fortune 20 companies were starting to bear the brunt of unsanitary design and coding practices that had been de rigueur in the old, segregated, safe “client-server” or “feudal” deployment models.