The worst data privacy threat today isn’t data breach but data abuse. Organizations are using AI to learn more about us than we’d ever choose to tell them.
The original version of this post was published in Forbes.
You have to have a supreme sense of irony, or be in major denial, to call Monday, Jan. 28, Data Privacy Day.
Given the current state of big data collection and “sharing” (selling) by online giants and telcos, combined with the power of artificial intelligence (AI) and machine learning (ML) to draw intrusive inferences about us from all that data, it would be much more accurate to call it Privacy is Dead Day. Or Lack of Privacy Day.
That's because the modern threat to privacy is not just that your credit card or bank account could get compromised. It is that your life, everything about your life, can be collected and analyzed by companies, governments, groups and even individuals in a way that, collectively, starts to sound very much like Big Brother.
“All the inferences about my pay, sexual preferences, friends, political affiliations, travel plans, vacation thoughts, infidelities, and so on and on and on are way more invasive than the standard name, Social Security number, credit card number, etc. invasions,” said Sammy Migues, principal scientist at Synopsys.
Yes, we’ve been talking about big data, AI and ML for more than a decade. But, as Migues noted, they now operate at such a scale that “many companies, government agencies, private investigators, nosy neighbors, and others can ‘dox’ (broadcast private and/or identifiable information about an individual or organization) almost anyone.”
Andrew Burt, chief privacy officer and legal engineer at Immuta, made much the same point recently in the Harvard Business Review.
While “unauthorized access to our data used to pose the biggest danger to our digital selves,” he wrote, the biggest risk now is the threat of “unintended inferences” made possible by ML.
Actually, most of the inferences he mentioned don’t seem unintended. He wrote that “researchers used machine learning techniques to identify authorship of written text based simply on patterns in language.”
Which was obviously intended. And, of course, the same techniques could be used, as Migues said, to get accurate inferences about political leanings and much more intimate details about things like health, habits and preferences.
Rebecca Herold, CEO of The Privacy Professor, said “unintended” generally means that analysis done for one reason leads to others that weren’t necessarily anticipated. “It is like diving for oysters or clams to gather some food, but then discovering that they have pearls inside,” she said. “Once you discover this, you may want to go diving for more oysters and clams to find pearls instead of food.”
Burt wrote that the result of all this is that “privacy and security are converging.” But at the moment, it’s pretty clear we have neither. Just about every day brings word of another major data breach due to a lack of security, and/or of previously undisclosed privacy violations.
Which raises the obvious question on a day meant to promote personal privacy: Is it even possible to reverse that reality?
At one level, it looks like it might be. There are increasing, and increasingly vocal, demands to rein in what for years now has been termed a “golden age of surveillance.” Enough so that it might seem like you could call this coming Monday Reclaim Our Privacy Day.
Indeed, the outrage extends well beyond last year’s Facebook/Cambridge Analytica scandal, in which data on 87 million or so Facebook users ended up in the hands of the now-defunct British data analytics company. It also extends beyond the somewhat cathartic theater of every TV news channel showing Facebook CEO Mark Zuckerberg being figuratively frog-marched before congressional committees for some rhetorical flogging.
Reportedly, the Federal Trade Commission (FTC) is getting ready to hit Facebook with a record-setting fine—more than the $22.5 million it levied on Google for privacy violations.
But in just the past few weeks, there has been another little explosion of demands that companies stop collecting data on people without their awareness or permission.
Apple CEO Tim Cook declared earlier this month in an op-ed in Time magazine that the problem is “solvable,” that it is possible to “strip identifying information from customer data or avoid collecting it in the first place,” to let users know what data is being collected about them and why, to give them access to that data, to correct or delete it, and to make sure what data is held is secure.
A recent investigation by Motherboard showed that telcos, including T-Mobile, AT&T and Sprint, were selling location data on their customers that could end up in the hands of private entities “ranging from car salesmen and property managers to bail bondsmen and bounty hunters.”
That prompted a blistering letter from Sen. Ron Wyden, D-Ore., to T-Mobile CEO John Legere over what Wyden called Legere’s “continued partnership with companies that have enabled spying on Americans without their knowledge or consent,” adding that this was in “direct contradiction” to what Legere had told him six months earlier.
Wyden also filed, last November, a “discussion draft” of his proposed Consumer Data Protection Act (CDPA) of 2018.
The European Union’s General Data Protection Regulation (GDPR), which took effect last May, threatened (and has started to impose) major fines on companies that violate the privacy of their customers or users.
Even regular people seem to care more about their privacy now than they have in the past. A survey conducted last July by software analytics company SAS found that 66 percent of respondents said they have taken steps to secure their data, and 73 percent said they are more concerned about their data privacy now than they were a few years ago.
And John Verdi, vice president of policy at the Future of Privacy Forum, said privacy and security are converging in two ways.
“First, companies’ internal security and privacy compliance teams are increasingly cooperating. Second, privacy requirements are slowly becoming more concrete, which means they can be more readily operationalized in ways that are familiar to security experts and cybersecurity frameworks,” he said.
Verdi added that there is a greater difference between simple “unauthorized access” to data and the collection, sharing and use of data that can be used to make decisions about a person, such as whether to admit them to a school or allow them to rent an apartment.
Still, awareness or even good intentions do not necessarily yield results. Herold said the increasing power of ML and AI analytics makes it harder and harder to anonymize data.
And while she and others say it is worth the effort, even marginal progress looks like a very heavy lift.
“The fact is, even if an organization can create what they believe is an anonymized dataset, as soon as that dataset is combined with other datasets, with other types of data items, and ML, big data analytics and AI are used on them, then new, unexpected discoveries of specific individuals, and insights into those individuals’ lives, can be discovered,” she said.
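To make Herold's point concrete, here is a minimal sketch of how combining datasets can undo anonymization. It is only an illustration under stated assumptions: the records, names, field names and the `link_records` helper are all invented for this example, and real re-identification work uses far larger datasets and fuzzier matching.

```python
# Toy illustration of re-identification by combining datasets.
# All data and names below are hypothetical, invented for this sketch.

# An "anonymized" dataset: names removed, but some attributes remain.
anonymized_records = [
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02140", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# A second dataset that includes names and the same attributes.
public_records = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "zip": "02140", "birth_year": 1990, "sex": "F"},
]


def link_records(anonymized, public, keys):
    """Return (name, diagnosis) pairs where the shared attributes
    point to exactly one named person in the public dataset."""
    # Index the named dataset by the combination of shared attributes.
    index = {}
    for person in public:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])

    matches = []
    for record in anonymized:
        signature = tuple(record[k] for k in keys)
        names = index.get(signature, [])
        # A single candidate means the "anonymous" record is re-identified.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(
    anonymized_records, public_records, keys=("zip", "birth_year", "sex")
)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy case, two of the three "anonymous" records match exactly one named person. The third stays ambiguous only because two people happen to share the same ZIP code, birth year and sex, and each additional dataset added to the mix makes that kind of cover less likely.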
Migues said the kind of “convergence” of security and privacy that would be necessary to protect people from the unending “unintended inferences” made possible by current technology is “the impossible kind.”
“It is not possible to stop my ISP from selling my surfing habits to my healthcare company, which combines it with my vacation data, my neighborhood data, and so on to know things about me that I would never tell them and that they possibly have no legal right to ask,” he said.
“It would take something stricter than GDPR to make a dent, and even then mergers and acquisitions would result in companies knowing more than I intended them to know.”
“In addition, no human can reason about the inferences they allow by telling X to Facebook, Y to Twitter, and Z to WhatsApp,” Migues said. “It’s too hard of a problem. Imagine a tool that calculates for you that if you use Facebook, Twitter, and WhatsApp, and have a Nest thermostat, a Ring doorbell, a toll road auto-pay thing in your car, a Fitbit, and a smart electrical meter on your house, then your ISP absolutely knows beyond any shadow of a doubt when you’re having an affair and who you’re having it with, including the exact make and model of the ‘toys’ involved in the tryst.”
Or, as Wyden put it regarding the sale of location data by telcos, “When stalkers, spies, and predators know when a woman is alone, or when a home is empty, or where a White House official stops after work, the possibilities for abuse are endless.”