Privacy law has a definitions problem. One of the most essential but most difficult definitions at the center of data privacy statutes is ‘Personally Identifiable Information,’ also known as ‘Personal Information’ or ‘PII.’
All 50 state data breach notification laws define PII as a combination of column A identifiers, like name and address, and column B identifiers, such as an exposed account number. These laws attempt to combat identity theft, and so these two categories of data, taken together, are crucial in that effort. Similarly, HIPAA covers a column A identifier combined with a column B health care information item. This method is easy to enforce and works well for tightly specified protection schemes.
The GDPR and CCPA are considerably more ambitious in their goals to protect vast swaths of information. Their definitions of protectable data are much broader and squishier than those of earlier, narrower laws. We are likely to devote a couple of blog posts to these definition problems, but today we are addressing a single aspect to shine light on the broader subject: erased or de-identified data.
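For readers who think in code, the column A plus column B approach described above can be sketched as a simple test. This is only an illustration: the field names in `COLUMN_A` and `COLUMN_B` and the function `is_breach_pii` are hypothetical and are not drawn from any particular statute, each of which has its own list.

```python
# Illustrative (hypothetical) field lists; real statutes define their own.
COLUMN_A = {"name", "address"}      # identifiers that point to a person
COLUMN_B = {"account_number"}       # sensitive items that enable identity theft


def is_breach_pii(exposed_fields):
    """Return True only when at least one column A field and
    at least one column B field were exposed together."""
    fields = set(exposed_fields)
    return bool(fields & COLUMN_A) and bool(fields & COLUMN_B)
```

Under this sketch, an exposed name alone or an exposed account number alone does not count; only the combination does. That bright line is what makes this style of definition comparatively easy to enforce.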
Removing PII from a database of sensitive personal records does not erase the threat to your privacy. Studies indicate that it’s becoming easier to re-identify ‘lost’ or ‘masked’ information, lending more credence to what WBD Partner Ted Claypoole wrote in Privacy In The Age of Big Data: “loss of anonymity is rapidly increasing and the basic loss of ability to keep secrets is in jeopardy.”
Anonymized data, defined as “data rendered anonymous in such a way that the data subject is not or no longer identifiable,” is not considered “personal data” for purposes of the EU General Data Protection Regulation (GDPR) and is therefore not subject to the obligations of the law. The threshold to reach the exclusion for anonymization under EU data protection law is very high. In the GDPR definition, data can only be considered anonymous if re-identification is impossible.
Similarly, the CCPA defines de-identified information as information that cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer. Subsection 1798.145(a)(5) of the statute states that nothing in the CCPA should restrict a business’s ability to “collect, use, retain, sell, or disclose consumer information that is deidentified or in the aggregate.” Like the GDPR, the CCPA makes the standard for anonymization or de-identification appropriately difficult to meet. And this definition may have serious consequences.
These two laws reflect a harsh reality: reaching true anonymization is nearly impossible. A recent research report indicates that numerous supposedly anonymous datasets have been released and then re-identified. A different study showed that four spatiotemporal data points are enough to uniquely re-identify 90% of the individuals in a set of credit card records. We emit spatiotemporal data points constantly, and they’re exposed through many vehicles including our smartphones, internet browsers, cars, and credit/debit cards. The dangers of location data have even led New York City lawmakers to consider a ban on sharing it.
Prohibitions on location data sharing would not resolve this problem. Spatiotemporal data points can also be found through much more easily accessible means, such as user-generated public content. Research out of Arizona State University discusses the manner in which social media users provide location data by discussing their vacation plans publicly, or where they shopped or dined and at what times. Cross-referencing social media with de-identified data sets could immediately provide enough information to re-identify what was considered anonymous data.
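To make the cross-referencing risk concrete, here is a minimal sketch of how such a linkage could work on toy data. Everything in it is invented for illustration: the pseudonyms, places, times, the name "Alice Example," and the helper functions `candidates` and `link` are assumptions, not taken from any of the studies mentioned above.

```python
from collections import defaultdict

# Hypothetical "de-identified" purchase log: names replaced by pseudonyms,
# but place and hour of each transaction retained.
deidentified = [
    ("user_17", "cafe_on_main", "2024-05-01 08"),
    ("user_17", "bookshop", "2024-05-02 18"),
    ("user_17", "gym", "2024-05-03 07"),
    ("user_42", "cafe_on_main", "2024-05-01 08"),
    ("user_42", "cinema", "2024-05-02 20"),
    ("user_42", "gym", "2024-05-04 07"),
]

# Hypothetical public posts in which a named person says where they were and when.
public_posts = [
    ("Alice Example", "cafe_on_main", "2024-05-01 08"),
    ("Alice Example", "bookshop", "2024-05-02 18"),
]


def candidates(records, known_points):
    """Return the pseudonyms whose trace contains every known (place, hour) point."""
    traces = defaultdict(set)
    for pseudonym, place, hour in records:
        traces[pseudonym].add((place, hour))
    return sorted(p for p, trace in traces.items() if known_points <= trace)


def link(records, posts):
    """Map each named poster to the pseudonyms consistent with all of their posts."""
    by_name = defaultdict(set)
    for name, place, hour in posts:
        by_name[name].add((place, hour))
    return {name: candidates(records, points) for name, points in by_name.items()}


matches = link(deidentified, public_posts)
print(matches)
```

In this toy data, the cafe visit alone matches both pseudonyms, but adding the bookshop visit leaves only `user_17`. Each extra public data point shrinks the pool of candidates, which is why a handful of points can be enough to single someone out.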
As researchers make clear, all they need is access to a few points of information and an internet browser to re-identify you. Not only will it be difficult for companies to remove data from regulation when we know that adding a couple more common pieces of data can re-identify the subject, but trying to enforce such standards will be nightmarish, since enforcement means digging into the depths of a company’s data mines, analytics and collection procedures.
There’s no such thing as anonymity in our brave new world. Trying to bring it back may be a fool’s errand.