How Anonymization Protects Data Privacy, or Not
- De-personalization can be re-personalized
- How to assess the risks of re-identification
- 3 reasons de-personalization helps your organization, customers, and regulators
The Indian lawyer, activist, and political ethicist Mahatma Gandhi once said, “Whatever you do will be insignificant, but it is very important that you do it.” The same could be said of some of the measures we take to protect privacy. For example, the incremental gains from training staff in good day-to-day privacy practices can be dwarfed by the data exposed in a large data breach. Yet privacy training can make small differences every day, especially when regulators come knocking.
Another privacy practice that falls into this category is data “de-personalization,” the name given to a range of techniques that make personal information less personal. With the powerful analytics, vast amounts of data, and computing capacity that exist in our digital world, these protections may not guarantee that an individual can’t be identified from exposed data. But it’s still important to de-personalize data wherever reasonable and possible.
Even if 100% privacy protection isn’t possible, making data harder to identify helps protect individuals and your organization.
GDPR establishes the data protection measure of “pseudonymisation,” which Article 4(5) defines as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” However, this concept does not account for the imaginative ways in which cybercriminals can re-personalize data.
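To make the idea of "additional information kept separately" concrete, here is a minimal sketch of one common way to pseudonymise a direct identifier: replacing it with a keyed hash (HMAC). This is an illustration, not a method GDPR prescribes. The key name, the record fields, and the `pseudonymise` helper are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. This is the "additional information": it would be
# stored apart from the pseudonymised data set, under its own access controls.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a keyed token that stands in for a direct identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative record; the field names are made up for this sketch.
record = {"email": "pat@example.com", "purchase_total": 42.50}

# The protected copy keeps the useful attribute but drops the raw identifier.
protected = {
    "subject_token": pseudonymise(record["email"], PSEUDONYM_KEY),
    "purchase_total": record["purchase_total"],
}
```

The same email always maps to the same token under the same key, so records can still be joined for processing. Anyone who obtains both the data and the key can test candidate identifiers against the tokens, which is why the separation matters.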
The De-personalization Spectrum and the Re-personalization Problem
According to the IAPP, de-identification is “an action that one takes to remove identifying characteristics from data,” and anonymization is “the process in which individually identifiable data is altered in such a way that it no longer can be related back to a given individual.” Both are data protection measures, but in fact, there’s a lot more nuance to the practice of making personal information less personal.
De-identification and anonymization are at different ends of a whole spectrum of de-personalization techniques that can be employed, depending on how the data will be used. (The Future of Privacy Forum publishes an excellent overview of these techniques.)
For example, if data will be used for statistical research, it can be aggregated and generalized, stripping out individual information and grouping data by age groups or zip codes. If it needs to be protected while being processed by a business partner, personally identifying information might be encrypted so that it can be re-identified after processing. For paper-based information, protections run the gamut from redacting by blacking out personal details to physically cutting out the identifying info.
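As a rough illustration of aggregation and generalization, the sketch below replaces exact ages with ten-year bands, truncates zip codes to their first three digits, and keeps only group counts. The band width, the prefix length, and the sample records are assumptions chosen for the example, not recommended settings.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def zip_prefix(zip_code: str, keep: int = 3) -> str:
    """Keep the leading digits of a zip code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def aggregate(records):
    """Count records per (age band, zip prefix) group, dropping all other fields."""
    return Counter((age_band(r["age"]), zip_prefix(r["zip"])) for r in records)


# Made-up sample data.
records = [
    {"name": "A", "age": 34, "zip": "97201"},
    {"name": "B", "age": 37, "zip": "97209"},
    {"name": "C", "age": 52, "zip": "97201"},
]

summary = aggregate(records)
```

Here `summary` holds two records in the 30-39 band and one in the 50-59 band, all under the prefix `972**`. Names never reach the output, but a group containing a single person can still point to that person, which is part of the re-personalization problem.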
The more data is anonymized, the harder it is for it to be misused. However, short of nuclear options—such as pulverizing computer disks or pulping paper records—all of these techniques still leave some potential that personal data could be exposed.
De-identified data can be re-identified if a hacker is able to steal the encryption key. With enough computing power, they might not even need the key: almost ten years ago, the FTC’s chief technologist pointed out that hashed data records can be restored to their original state by repeatedly guessing the original value and hashing each guess until a matching hash is found, and available computing power has increased exponentially since then.
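The guessing attack on hashed records is easy to sketch when the original values come from a small, predictable range. The example below assumes a hypothetical six-digit account number stored as an unsalted SHA-256 hash. The identifier format and the function names are invented for illustration.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hash a string the way a naive de-identification step might."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_by_guessing(target_hash: str, digits: int = 6):
    """Hash every possible fixed-width number until one matches the target."""
    for n in range(10 ** digits):
        guess = f"{n:0{digits}d}"  # zero-padded, e.g. 42917 -> "042917"
        if sha256_hex(guess) == target_hash:
            return guess
    return None  # the original value was not in the guessed range


# What an attacker might find in exposed data: only the hash is visible.
leaked_hash = sha256_hex("042917")

recovered = recover_by_guessing(leaked_hash)
```

With at most a million candidates, the loop finishes quickly on an ordinary laptop. Longer identifiers raise the cost, but phone numbers, birth dates, and similar values still occupy ranges small enough to enumerate. A keyed hash with a secret kept elsewhere blocks this kind of enumeration for anyone who lacks the key.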
Even anonymized data could be re-identified by correlating it with other data sets. A study on 1990 census data discovered that 87.1% of individuals in the US could be uniquely identified by their birth date, gender, and zip code. And when Netflix released movie rating data that had been anonymized by removing personal detail, researchers were able to de-anonymize the data by comparing it against publicly available ratings on the Internet Movie Database.
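A linkage attack of this kind can be sketched in a few lines. The example below measures how many released records are unique on birth date, gender, and zip code, then matches those unique records against a named public list. Both data sets are fabricated for illustration, and the field names are assumptions.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def qi_key(row):
    """Build the combination of quasi-identifiers for one row."""
    return tuple(row[f] for f in QUASI_IDENTIFIERS)


def unique_fraction(rows):
    """Share of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(qi_key(r) for r in rows)
    return sum(1 for r in rows if counts[qi_key(r)] == 1) / len(rows)


def link(released_rows, public_rows):
    """Pair released rows with names when the combination is unique in both sets."""
    released_counts = Counter(qi_key(r) for r in released_rows)
    public_counts = Counter(qi_key(p) for p in public_rows)
    public_by_key = {qi_key(p): p for p in public_rows}
    matches = []
    for r in released_rows:
        k = qi_key(r)
        if released_counts[k] == 1 and public_counts.get(k) == 1:
            matches.append((public_by_key[k]["name"], r))
    return matches


# "Anonymized" release: names removed, quasi-identifiers left in place.
released = [
    {"birth_date": "1961-07-14", "gender": "F", "zip": "02138", "diagnosis": "asthma"},
    {"birth_date": "1975-03-02", "gender": "M", "zip": "02139", "diagnosis": "diabetes"},
    {"birth_date": "1975-03-02", "gender": "M", "zip": "02139", "diagnosis": "fracture"},
]

# A separate, publicly available list that includes names.
public = [
    {"name": "Jane Roe", "birth_date": "1961-07-14", "gender": "F", "zip": "02138"},
    {"name": "John Doe", "birth_date": "1975-03-02", "gender": "M", "zip": "02139"},
]

share_unique = unique_fraction(released)
matches = link(released, public)
```

In this toy data, one of the three released records is unique on the three fields, and that record links to a name. The two records that share a combination cannot be told apart, which is the intuition behind generalizing fields until every combination covers several people.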
Assessing the Risks of Re-identification
No existing privacy regulations give an iron-clad definition of what anonymization is required to move personal information outside the scope of regulation. So, organizations have to determine for themselves how de-identified or anonymized data involved in an incident impacts their notification or reporting requirements. To make accurate assessments, the privacy and data security teams need to do their homework ahead of time, with data mapping that documents:
- What data sets are protected and the de-identification methods used for each set
- Keys or other data sets that could be used to re-identify the protected data
- External or publicly available data that might be used to re-identify the protected data
Analyzing the potential for re-identification ahead of time will speed risk assessment during incident response. If you know the encryption keys or related data sets are still secure, risks may be small enough that the incident isn’t notifiable. If the information could be re-identified using publicly available data, notification may be advisable even if the exposed information is de-identified.
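One way to keep this homework usable during an incident is to store the data map in a structured form. The sketch below is a hypothetical layout, not a standard: the `DataMapEntry` fields mirror the three items documented above, and `reidentification_flags` only lists reasons for a closer look. It does not decide whether an incident is notifiable.

```python
from dataclasses import dataclass, field


@dataclass
class DataMapEntry:
    """One protected data set and the known routes back to identities."""
    data_set: str
    method: str  # e.g. "encryption", "aggregation"
    reidentification_keys: list = field(default_factory=list)
    linkable_public_sources: list = field(default_factory=list)


def reidentification_flags(entry, compromised_assets):
    """Return the reasons an exposed data set might be re-identifiable."""
    flags = []
    for key in entry.reidentification_keys:
        if key in compromised_assets:
            flags.append(f"re-identification key exposed: {key}")
    for source in entry.linkable_public_sources:
        flags.append(f"linkable public data: {source}")
    return flags


# Illustrative entry; the names are invented for this sketch.
entry = DataMapEntry(
    data_set="customer_orders",
    method="encryption",
    reidentification_keys=["orders-db-key"],
    linkable_public_sources=["public voter roll"],
)

# Assets known to be affected by the incident under review.
flags = reidentification_flags(entry, compromised_assets={"orders-db-key"})
```

An empty list suggests that the documented re-identification routes are still closed. Any flag is a prompt for the privacy team to look further before deciding on notification.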
As you determine whether or not to notify, be sure to document whether and how the exposed data is protected. When you consider the risks of re-identification, note those conclusions in the incident response documentation as well.
- Data cannot be used to identify an individual
- State-of-the-art encryption, key not compromised, no evidence of access
- No evidence of access, but unsure of encryption standard or key security
- Evidence of unauthorized access with valid credentials
- Encryption key was compromised
- Password protected
- Password was not compromised
- Evidence of unauthorized access with valid credentials
- Password was compromised
- Statistically de-identified
- No protection measures were in place
The Importance of De-personalization
De-identifying or anonymizing information, mapping data protections, and pre-assessing the risks of re-identification can be a lot of work. And if even full-on anonymization might not completely protect personal information, why spend the time and resources to do it? Three reasons:
- To minimize notifiable incidents: As new regulations expand the definitions of protected personal information, the potential for notifiable incidents increases. Data that is truly anonymized generally falls outside the definition of personal information, although regulations do not spell out exactly where that line sits. So the more data you can de-identify or anonymize, the more you reduce the risk of notifiable incidents.
- To protect data subjects: While de-personalized information could be re-identified, the more anonymized it is, the harder it will be to exploit. So de-personalization can slow down or discourage misuse.
- To protect your organization: If you have a data breach, regulators will take a more favorable view of your privacy program when you can show evidence of data protections such as anonymization or de-identification.
While de-personalizing information can’t prevent every notifiable breach, the protection it provides is still significant to the affected individuals and your organization. And regulators will consider it very important that you do it.