An overview of GDPR-compliant data privacy techniques


The General Data Protection Regulation (GDPR)

The General Data Protection Regulation (GDPR) requires every organization or business to protect and keep private the personal data it collects, processes, or stores. GDPR gives users more control over their personal data and promotes transparency about how organizations collect and use private information.

Although the regulation applies across the European Union, organizations based outside the EU must also comply if they collect personal data from people in the EU.


The benefits of enhancing data privacy

Guaranteeing data privacy has benefits beyond regulatory compliance. It also gives the organization an opportunity to improve its processes.

The main benefits include:

  • Improving security, safety, and privacy of personal data

  • Ensuring compliance with GDPR, HIPAA, PCI DSS, and other regulatory requirements

  • Maintaining a good reputation

  • Gaining users' trust

  • Using GDPR as an opportunity to improve data privacy practices


The organizational challenges of GDPR compliance

An organization that collects, handles, stores, or processes private data faces a wide range of challenges as it tries to comply with GDPR. Compliance is an ongoing process that depends on many other factors, including changes in business operations and third-party service providers.

The main challenges include:

MEETING SEVERAL NEW REQUIREMENTS

One of the main challenges, especially when collecting data from users, is meeting GDPR's stringent rules for obtaining a visitor's consent. Planning and implementing effective measures takes considerable effort, money, time, and other resources. In addition to a dedicated team, the organization needs the right tools, budget, strategies, and processes.

BROADENING THE PERSONAL DATA RANGE

In addition to names, ages, identification numbers, and other individual details, personal data under GDPR covers items such as cookies and other online identifiers that web analytics rely on. Another challenge is how to handle data the organization collected or stored using non-compliant techniques before GDPR became enforceable on 25 May 2018.

LIMITATIONS OF TRADITIONAL DATA PRIVACY PRESERVATION METHODS

Unfortunately, traditional privacy models are inadequate for safeguarding personal data in compliance with GDPR because of the large volume, wide variety, and high speed at which data grows.

Despite these challenges, privacy techniques that make data anonymous can help organizations process data without violating the rules or putting personal data at risk.

Once the personal aspect has been removed, so that the data can no longer be linked to an individual, it is no longer personal data under GDPR (Recital 26). Organizations then do not need users' consent and can use or keep the data as they wish without violating the rules.


Recommended GDPR-compliant data privacy techniques 

Organizations can use several privacy techniques. Although GDPR does not dictate which technique an organization should use, it encourages anonymization and/or pseudonymization. Both techniques remove or modify information that identifies users: anonymization makes it impossible to trace the data back to an individual or to reverse the process, while pseudonymization makes this impossible without additional information that is kept separately.

Data anonymization permanently modifies data so that it becomes anonymous, while pseudonymization substitutes identifiable data with artificial values (pseudonyms) to hide details about individuals.


Data Anonymization

Anonymization removes all direct and indirect personal identifiers, such as an individual's full or family name, address, sex, age, religion, medical records, job, location, credit card numbers, and email address.

The technique changes, hides, or removes key parts of the personal data so that it is impossible to retrieve or access any private or sensitive information. Anonymization is well suited to healthcare, finance, media, and other industries that collect or process sensitive information.

Benefits:

  • Removes the need to get consent from users before you process their data

  • Makes the data completely anonymous

  • No limit on how long you can keep the data

  • Allows sharing data without infringing on privacy

  • Allows using the data in any way, since it contains no personal identifiers

  • More secure than pseudonymization

Drawbacks:

  • Expensive

  • Complex

  • Requires more resources

Examples:

NOISE ADDITION

This process adds random noise to values without significantly changing the overall statistics of the data. It obscures identifying information while keeping the data usable for processing. Because the noise deliberately corrupts individual values, the data becomes anonymous. Ideally, the process is automated and draws on an unpredictable random source, which makes it practically impossible to reverse engineer.

For example, the process can add or subtract a random number between, say, 10 and 20 to each individual's age and then delete the name records, removing the personal details. Noise addition lets the company estimate its users' average age while making it impossible for outsiders to determine anyone's real age or date of birth.
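A minimal Python sketch of the age example above, assuming each record is a dict with hypothetical `name` and `age` keys. By default it draws from `random.SystemRandom` (OS-provided randomness), drops the name, and adds or subtracts a random offset of 10 to 20 years; the `rng` parameter exists only so results can be reproduced in testing.

```python
import random
from statistics import mean

# OS-backed randomness; pass a seeded random.Random only for reproducible tests.
_DEFAULT_RNG = random.SystemRandom()


def add_age_noise(record, low=10, high=20, rng=_DEFAULT_RNG):
    """Return a copy of a (hypothetical) user record with the name removed
    and the age shifted by a random offset of +/- low..high years."""
    noisy = {key: value for key, value in record.items() if key != "name"}
    offset = rng.randint(low, high)
    if rng.random() < 0.5:  # add or subtract with equal probability
        offset = -offset
    noisy["age"] = record["age"] + offset
    return noisy


users = [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 51}]
released = [add_age_noise(user) for user in users]
print(released)
print("average released age:", mean(u["age"] for u in released))
```

Because the offset is symmetric, averages over many records stay close to the true average, but individual released ages can look unrealistic (even negative for young users). Clamping them would avoid that at the cost of biasing the mean.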

SUBSTITUTION / PERMUTATION

Substitution replaces or overwrites personal data, such as names and ages, with random values from an algorithm or a made-up data table. The process retains the original data structure but uses fake values that have no relationship to the real personal details, which makes the data suitable for analysis. However, there is a risk of missing some personal data fields, which then remain unprotected.
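A minimal sketch of substitution, assuming records are dicts and using a small, hypothetical made-up table of fake names. Each value in the chosen field is overwritten with a random entry from the table, so the record structure is kept while the new value has no relationship to the real one.

```python
import random

# Hypothetical made-up lookup table; a real table would be much larger.
FAKE_NAMES = ["Alex Moreau", "Jordan Lee", "Sam Okafor", "Riley Novak", "Casey Lind"]


def substitute_field(records, field, fake_values, rng=None):
    """Return copies of `records` with `field` replaced by random fake values."""
    rng = rng or random.SystemRandom()
    substituted = []
    for record in records:
        copy = dict(record)  # keep every other field and the overall structure
        if field in copy:
            copy[field] = rng.choice(fake_values)
        substituted.append(copy)
    return substituted


customers = [
    {"name": "Edward Smith", "city": "Leeds", "orders": 3},
    {"name": "Maria Garcia", "city": "Porto", "orders": 7},
]
print(substitute_field(customers, "name", FAKE_NAMES))
```

Only the fields you list are substituted, which is exactly where the risk of missing a personal data field comes from: here `city` passes through unchanged.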

AGGREGATION

Data aggregation analyzes personal data and renders it into summary or statistical form while removing personal identifiers. This reduces privacy risk by anonymizing identifiers such as users' IP addresses while still meeting an application's requirements, such as providing a general location for users. It is applicable only when summary or statistical information about a group of people is needed.
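A minimal sketch of aggregation, assuming each record already carries a coarse `region` (for example, derived from the IP address by a geolocation lookup that is not shown) alongside the raw `ip`. Only per-region counts and average ages are released; the IP addresses never leave the function. The `min_count` threshold, which suppresses very small groups that could single out an individual, is an illustrative assumption rather than a GDPR requirement.

```python
from collections import defaultdict


def aggregate_by_region(records, min_count=2):
    """Summarize records per region, dropping IPs and suppressing small groups."""
    ages_by_region = defaultdict(list)
    for record in records:
        ages_by_region[record["region"]].append(record["age"])  # ip is never read

    summary = {}
    for region, ages in ages_by_region.items():
        if len(ages) >= min_count:
            summary[region] = {"users": len(ages), "avg_age": sum(ages) / len(ages)}
    return summary


visits = [
    {"ip": "203.0.113.7", "region": "EU-West", "age": 30},
    {"ip": "203.0.113.9", "region": "EU-West", "age": 40},
    {"ip": "198.51.100.4", "region": "EU-North", "age": 25},
]
print(aggregate_by_region(visits))
# {'EU-West': {'users': 2, 'avg_age': 35.0}}
```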

GENERALIZATION

This replaces individual values in a field, such as age, with a broader category such as a range. For example, if a user's age is 35, the process may replace it with a general expression such as '30 < Age ≤ 40'. Because the change is permanent, the value can neither be restored to its original form nor easily attributed back to a specific individual.
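A minimal sketch of the age example, assuming positive whole-number ages and 10-year bands in the '30 < Age ≤ 40' format used above; the band width is an adjustable assumption.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band of the form 'lower < Age ≤ upper'."""
    if age < 1:
        raise ValueError("expected a positive age")
    lower = (age - 1) // width * width  # 35 -> 30, 40 -> 30, 41 -> 40
    return f"{lower} < Age ≤ {lower + width}"


def generalize_records(records, width=10):
    """Return copies of records with the exact age replaced by its band."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in records]


print(generalize_age(35))  # 30 < Age ≤ 40
```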

K-ANONYMITY

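Combining masking and generalization can achieve k-anonymity: a dataset is k-anonymous when every record is indistinguishable from at least k - 1 other records on its quasi-identifiers, the attributes (such as age or ZIP code) that could be linked with outside data. This helps protect privacy while retaining the usefulness of the data for analysis.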
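A minimal sketch of reaching and checking k-anonymity, assuming `age` and `zip` are the quasi-identifiers. It masks the last two ZIP digits, generalizes ages into 10-year bands, and then checks that every combination of quasi-identifier values occurs at least k times. The records and the choice of quasi-identifiers are hypothetical.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    lower = (age - 1) // width * width
    return f"{lower} < Age ≤ {lower + width}"


def mask_zip(zip_code: str, keep: int = 3) -> str:
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


patients = [
    {"age": 34, "zip": "10115", "diagnosis": "flu"},
    {"age": 37, "zip": "10117", "diagnosis": "asthma"},
    {"age": 52, "zip": "10243", "diagnosis": "flu"},
    {"age": 58, "zip": "10245", "diagnosis": "diabetes"},
]
released = [
    {**p, "age": generalize_age(p["age"]), "zip": mask_zip(p["zip"])} for p in patients
]

print(is_k_anonymous(patients, ["age", "zip"], k=2))  # False: every raw record is unique
print(is_k_anonymous(released, ["age", "zip"], k=2))  # True: two records per group
```

k-anonymity alone does not stop every attack: if all records in a group share the same sensitive value, that value is still revealed.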


Data pseudonymization

Pseudonymization substitutes identifiable data with an artificial, reversible value, so that an individual cannot be identified without additional information. The basic process replaces the personal data with a pseudonym or placeholder; the pseudonym can be a token that allows the original information to be retrieved.

Applicable personal data includes device type or ID, browser, time zone, IP address, login details, plug-in details, name, account numbers, credit card numbers, and cookies. Pseudonymization limits the exposure of personal information by creating an obfuscated dataset that still looks realistic.
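One common way to pseudonymize identifiers is a keyed hash (HMAC-SHA256), sketched below with Python's standard library. This is an illustrative choice, not a method GDPR prescribes. The secret key plays the role of the additional information: it must be stored separately from the pseudonymized data, and anyone who holds it can recompute pseudonyms for known identifiers and re-link them. The `pseud-` prefix and the 16-character truncation are arbitrary assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Derive a stable pseudonym for `value`; the same key always gives the same result."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "pseud-" + digest[:16]


def relink(pseudonyms, candidate_values, key):
    """Re-identify pseudonyms by recomputing them for known values (needs the key)."""
    lookup = {pseudonymize(v, key): v for v in candidate_values}
    return {p: lookup.get(p) for p in pseudonyms}


# The key is the "additional information": keep it in a separate, access-controlled store.
key = secrets.token_bytes(32)
events = [{"user": "alice@example.com", "page": "/pricing"}]
safe_events = [{**e, "user": pseudonymize(e["user"], key)} for e in events]
print(safe_events)
print(relink([safe_events[0]["user"]], ["alice@example.com", "bob@example.com"], key))
```

Deterministic pseudonyms keep records joinable across datasets, which is what makes pseudonymized data reusable; it is also why the key must be protected as carefully as the original identifiers.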

Benefits:

  • Removes sensitive data: it removes direct identifiers such as the user's name, credit card numbers, and contact information, reducing the risk of a data breach or theft.

  • Enables reuse of data in other processes: pseudonymization can help a business justify using the data in processes beyond the original collection purpose.

  • Flexible

Drawbacks:

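  • The process is reversible, so it is less secure than anonymization

  • Risky: pseudonymized data is still personal data under GDPR, so compliance obligations and breach risks remain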

Examples:

DATA MASKING

This replaces certain parts of the personal data with random characters or unrelated information, making the original data harder to read. The process usually substitutes some of the letters or digits but produces a realistic dataset that requires additional information to re-identify or reverse. For example, the process may store the credit card number “5500 0000 0000 0008” as “XXXX XXXX XXXX 0008”.

Masking preserves the original data format, which prevents application errors. However, the substituted characters may reduce the analytical or statistical value of the data.
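A minimal sketch of the card-number example, assuming the last four digits stay visible and every other digit becomes 'X'. Spaces and other separators are left in place, which is what preserves the format.

```python
def mask_card_number(card: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask all but the last `visible` digits, keeping spaces and dashes in place."""
    digits_to_mask = sum(ch.isdigit() for ch in card) - visible
    masked = []
    for ch in card:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("5500 0000 0000 0008"))  # XXXX XXXX XXXX 0008
```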

ENCRYPTION

This is a reversible process that alters the data to make it unreadable unless one holds the matching decryption key. To comply with GDPR, the encrypted (pseudonymized) data must be stored separately from the decryption key, which in this case is the additional information.

The main drawback is that anyone with the appropriate key can see the personal data, which leaves it vulnerable if the key is exposed or stolen. In addition, encryption places a significant load on computing resources.

A better solution may be homomorphic encryption, which allows computations to be performed on encrypted data without decrypting it. However, the technology is much slower than conventional encryption, a challenge organizations need to address first.
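To show what computing on encrypted data means, here is a toy version of the Paillier cryptosystem, an additively homomorphic scheme chosen for illustration (the text above does not name a specific scheme). Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server could total values it cannot read. The primes are deliberately tiny and the code is insecure; real deployments use vetted libraries and far larger keys. It requires Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

# Toy primes: far too small to be secure, chosen only so the numbers stay readable.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1] coprime to n
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts m1 + m2 (mod n)."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_total = add_encrypted(n, encrypt(n, 30), encrypt(n, 45))  # computed without decrypting
print(decrypt(n, private_key, c_total))  # 75
```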

DATA TOKENIZATION

Tokenization replaces some parts of personal data, such as a credit card number, with a unique token that stands in for the original value. For example, when a customer pays online, payment systems usually send a unique token instead of the credit card number, protecting the user's identity and card number and reducing the risk of fraud.
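A minimal sketch of a token vault, assuming an in-memory dictionary stands in for what would, in practice, be a separate, hardened tokenization service. Tokens are random (from the `secrets` module), so they reveal nothing about the card number; only the vault can map a token back. The `tok_` prefix and the `TokenVault` class are hypothetical.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault; production systems use a separate, hardened service."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for `value`, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only the vault can map a token back to the original value."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("5500 0000 0000 0008")
print(token)  # 'tok_' followed by 16 random hex digits, sent instead of the card number
print(vault.detokenize(token))
```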

SCRAMBLING / SHUFFLING

Scrambling or shuffling substitutes or mixes parts of personal data fields to create an obfuscated dataset. For example, the name Edward may become Derdaw. This retains the original data structure and is therefore suitable for data analysis. However, the process can be reversible, and there is a risk that attackers decipher the shuffling algorithm.
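A minimal sketch of both variants, assuming records are dicts: `scramble` rearranges the characters inside a single value (as in the Edward example), while `shuffle_column` mixes one field's values across records. A seeded `random.Random` is used to show why the process can be reversible: anyone who learns the seed can replay the same permutation. Function names and the seed are hypothetical.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Rearrange the characters of `value` in a random order."""
    return "".join(rng.sample(value, len(value)))


def shuffle_column(records, field, rng: random.Random):
    """Return copies of `records` with the values of `field` permuted across rows."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


rng = random.Random(2024)  # hypothetical seed; if it leaks, the shuffle can be replayed
print(scramble("Edward", rng))
staff = [{"name": "Edward", "salary": 52000}, {"name": "Nadia", "salary": 61000}]
print(shuffle_column(staff, "salary", rng))
```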

DATA BLURRING

Data blurring approximates data values, for example by rounding them, so that they become meaningless or can no longer be used to identify an individual.
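A minimal sketch of blurring by reducing precision, one common approach assumed here: coordinates are rounded to a fixed number of decimal places and numeric values to the nearest multiple of a step. The precision levels are illustrative assumptions and should come from a risk assessment.

```python
def blur_coordinates(lat: float, lon: float, decimals: int = 2):
    """Round GPS coordinates; 2 decimals of latitude is roughly 1 km."""
    return round(lat, decimals), round(lon, decimals)


def blur_number(value: float, step: float) -> float:
    """Round a value to the nearest multiple of `step` (e.g. salaries to 1,000)."""
    return round(value / step) * step


print(blur_coordinates(51.50735, -0.12776))  # (51.51, -0.13)
print(blur_number(52345, 1000))  # 52000
```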


How to choose the right GDPR privacy solution for you

Each organization has unique requirements that should dictate which data privacy techniques to apply. Generally, one technique is not enough, and the best practice is to combine several so that re-identifying individuals or reversing the de-identification process becomes almost impossible.

While GDPR recommends certain techniques such as pseudonymization and anonymization, organizations are free to choose what works best for them. This largely depends on factors such as the organization's structure, its data structures and processes, and the types of data involved.

Generally, anonymization is more secure but harder to implement, while pseudonymization is easier to implement and more flexible but less secure. The right technique depends largely on the data types and the outcome of the data protection impact assessment (DPIA).

In addition to the privacy techniques, it is essential to deploy effective security systems to protect the applications and data against internal and external threats, malicious activities and viruses.


GDPR best practices

Under GDPR, organizations are responsible for collecting, storing, and processing data lawfully without exposing personal information about individuals. Noncompliance can lead to heavy financial penalties, with fines of up to €20 million or 4% of global annual turnover, whichever is higher, as well as reputational damage.

To ensure compliance at all times, the organizations need to:

  • Understand the types of sensitive information they hold and the effective GDPR-compliant privacy technique to apply to each

  • Ensure that data remains valuable after anonymization, pseudonymization, or another technique

  • Perform regular penetration testing to verify that the data privacy measures work, checking whether the anonymization processes can be reverse engineered and whether individuals can be identified using lawfully obtained public data or one or more private facts

  • Combine several techniques in order to prevent data re-identification or risks in case of a process reversal

  • Maintain GDPR compliance by performing regular data flow audits, reviewing the business processes and technologies, and creating awareness across the entire organization, including third-party providers
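The penetration-testing practice in the list above can start with a simple linkage check: join the released dataset with public data on shared quasi-identifiers and see which public identities match exactly one released record. The sketch below assumes hypothetical `zip`, `age`, and `sex` fields and a public list (such as an electoral roll) that includes names.

```python
def linkage_attack(released, public, quasi_identifiers):
    """Map each public name to the released record it uniquely matches."""

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    index = {}
    for record in released:
        index.setdefault(key(record), []).append(record)

    reidentified = {}
    for person in public:
        matches = index.get(key(person), [])
        if len(matches) == 1:  # a unique match re-identifies the record
            reidentified[person["name"]] = matches[0]
    return reidentified


released = [
    {"zip": "10115", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10117", "age": 41, "sex": "M", "diagnosis": "flu"},
    {"zip": "10117", "age": 41, "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Anna Weber", "zip": "10115", "age": 34, "sex": "F"},
    {"name": "Jonas Braun", "zip": "10117", "age": 41, "sex": "M"},
]
hits = linkage_attack(released, public, ["zip", "age", "sex"])
print(hits)  # Anna Weber is re-identified; Jonas Braun matches two records
```

Any non-empty result means the release needs more generalization, masking, or suppression before it is shared.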


A successful implementation of GDPR compliant privacy techniques

Apple gains insight into user experience without collecting personal data.

Apple uses differential privacy to collect information from its users while still protecting their private data. This lets the company gain insight into what users are doing, which words are trending, which emojis are most popular, and more, without learning individuals' personal details.

Because this may require collecting data such as device type, keyboard, and operating system, the company must apply techniques that hide the user's identity.

To comply with GDPR, Apple combines techniques such as differential privacy, scrambling, and aggregation to remove private information. This enables the company to learn about user activities without collecting personal data.

These techniques transform the information before it leaves the user's device, in such a way that it is impossible to reproduce the original data, see the personal data, or identify an individual.
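Apple's published mechanisms are more elaborate than anything shown here, but the core idea of local differential privacy can be sketched with randomized response, a classic technique chosen for illustration: each device randomizes its answer before sending it, so no single report can be trusted, yet the server can still estimate the overall rate. With truth-telling probability `p`, this mechanism satisfies epsilon-differential privacy with epsilon = ln((1 + p) / (1 - p)); the parameter values and the emoji scenario below are assumptions.

```python
import random


def randomized_response(truth: bool, p: float, rng: random.Random) -> bool:
    """On the device: report the true answer with probability p, else a fair coin flip."""
    if rng.random() < p:
        return truth
    return rng.random() < 0.5


def estimate_rate(reports, p: float) -> float:
    """On the server: invert the noise to estimate the true share of 'yes' answers."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) / 2) / p


rng = random.Random(7)  # seeded only to make this demo reproducible
truths = [i < 30_000 for i in range(100_000)]  # hypothetical: 30% of users used an emoji
reports = [randomized_response(t, p=0.5, rng=rng) for t in truths]
print(round(estimate_rate(reports, p=0.5), 3))  # close to 0.3
```

With p = 0.5, a device that truly answered "yes" reports "yes" 75% of the time and one that answered "no" reports "yes" 25% of the time, so any single report is only weak evidence about its sender.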


Conclusion

There are several data privacy techniques that organizations can use to comply with GDPR. It is important to take time and determine the best techniques based on the organization’s unique security and privacy requirements, as well as the complexity of the data.

About SID

SID - SIGHTLINE INNOVATION DATA TRUST

Our latest product is a smart-contract platform to secure and monetize your data and the latest addition to our family of AI products.