Practical Pseudonymization Techniques for GDPR Compliance
A technical guide to pseudonymization techniques that reduce data protection risk while preserving data utility.
What Is Pseudonymization?
Pseudonymization is the processing of personal data so that it can no longer be attributed to a specific individual without the use of additional information. That additional information must be kept separately and protected by technical and organizational measures.
GDPR explicitly names pseudonymization as a data protection safeguard (Articles 25 and 32, Recital 28). Pseudonymized data is still personal data under GDPR (Recital 26), unlike truly anonymized data, but it benefits from a reduced risk profile and more flexible processing options.
Pseudonymization vs. Anonymization
| Feature | Pseudonymization | Anonymization |
|---|---|---|
| Reversible | Yes, with additional information | No (irreversible) |
| Still personal data under GDPR | Yes | No |
| GDPR applies | Yes, but with benefits | No |
| Data utility | High (can be re-linked) | Variable (may lose granularity) |
| Suitable for analytics | Yes | Yes, but limited |
| Suitable for individual-level processing | Yes (when re-linked) | No |
The key distinction: pseudonymized data can be re-identified using the separately stored mapping, while truly anonymized data cannot be re-identified by any reasonably available means.
Benefits of Pseudonymization Under GDPR
- Risk reduction: Pseudonymized data poses less risk if breached, as the attacker cannot immediately identify individuals
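- Broader processing grounds: Article 6(4)(e) lists pseudonymization among the safeguards that count toward further processing being compatible with the original purpose, and Recital 29 encourages its use for general analysis within the same controller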
- DPIA mitigation: Pseudonymization is recognized as a risk mitigation measure in Data Protection Impact Assessments
- Breach notification: A breach involving pseudonymized data may not require individual notification if the data is practically unintelligible to unauthorized parties
- Data minimization: Pseudonymization supports the data minimization principle by reducing the identifiability of data in systems that do not need to identify individuals
Pseudonymization Techniques
1. Token Replacement (Tokenization)
Replace identifying values with randomly generated tokens. Maintain a separate lookup table that maps tokens back to original values.
How it works:
- Original: john.smith@email.com becomes TKN-8f3a2b1c
- The mapping TKN-8f3a2b1c -> john.smith@email.com is stored in a secured, separate system
Best for: Structured data in databases where you need to maintain referential integrity across tables
Considerations:
- The token mapping table is the critical asset -- it must be secured with the highest controls
- Tokens should be random, not derived from the original data
- One-to-one mapping preserves uniqueness for joins and analytics
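The token replacement described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: `TokenVault` is a hypothetical name, the mapping lives in memory rather than in a separate secured system, and the `TKN-` prefix with eight hex characters simply mirrors the example format.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token mapping store (hypothetical; a real
    vault would live in a separate, access-controlled system)."""

    def __init__(self, prefix: str = "TKN-") -> None:
        self._prefix = prefix
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the mapping stays one-to-one.
        existing = self._token_by_value.get(value)
        if existing is not None:
            return existing
        # Tokens are random, not derived from the original value.
        token = self._prefix + secrets.token_hex(4)
        while token in self._value_by_token:  # retry on the rare collision
            token = self._prefix + secrets.token_hex(4)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens the vault never issued.
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("john.smith@email.com")
print(token)                    # shaped like TKN-8f3a2b1c; differs on every run
print(vault.detokenize(token))  # john.smith@email.com
```

Because the same input always returns the same token, joins across tables keep working on the tokenized column. Eight hex characters is a small token space chosen to match the example; a larger one would be a reasonable choice for big datasets.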
2. Hashing
Apply a cryptographic hash function to identifying values to produce a fixed-length output.
How it works:
- Original: john.smith@email.com
- SHA-256 hash: a 64-character hexadecimal digest that is identical every time this input is hashed
Best for: Scenarios where you need consistent pseudonymization (same input always produces same output) without needing to reverse it
Considerations:
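- Hashing alone is vulnerable to dictionary and rainbow table attacks, especially for guessable, low-entropy inputs like email addresses or phone numbers: an attacker can hash candidate values and compare the results
- Always use a keyed hash (HMAC) or a secret salt, stored separately from the data, so that candidate values cannot be hashed and compared without the secret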
- Hashing is deterministic, which enables linking records across datasets -- this can be a benefit or a risk depending on context
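The first consideration above can be made concrete with a short sketch. It assumes an attacker who holds an unsalted SHA-256 pseudonym and a list of plausible email addresses; the function names and the candidate list are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 digest of a UTF-8 string, as 64 hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_by_guessing(digest: str, candidates: list) -> Optional[str]:
    """Return the candidate whose hash matches the digest, if any."""
    for candidate in candidates:
        if sha256_hex(candidate) == digest:
            return candidate
    return None


pseudonym = sha256_hex("john.smith@email.com")

# Hypothetical attacker word list, e.g. addresses from an earlier leak.
guesses = ["jane.doe@email.com", "john.smith@email.com"]
print(reverse_by_guessing(pseudonym, guesses))  # john.smith@email.com
```

No key is involved, so anyone who can enumerate likely inputs can run the same comparison. That is why the plain hash is best treated as a building block rather than a finished pseudonymization scheme.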
3. Keyed Hashing (HMAC)
A more secure variant of hashing that incorporates a secret key.
How it works:
- HMAC-SHA256 with a secret key produces a pseudonym that cannot be reversed without the key
- Different keys produce different pseudonyms for the same input
Best for: Cross-dataset linkage where you control the key, research scenarios, analytics pipelines
Considerations:
- The secret key must be managed with the same rigor as an encryption key
- Rotating the key requires re-pseudonymizing all affected data
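A minimal sketch of keyed hashing with Python's standard `hmac` module follows. The `pseudonymize` helper, the normalization step, and the placeholder keys are assumptions made for illustration; a real key would come from a secrets manager.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 pseudonym of an identifier, as 64 hex characters."""
    # Light normalization (an assumption) so "A@x.com" and "a@x.com " match.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Placeholder keys for illustration only; never hard-code real keys.
key_a = b"example-key-A-not-for-production"
key_b = b"example-key-B-not-for-production"

p1 = pseudonymize("john.smith@email.com", key_a)
p2 = pseudonymize("John.Smith@email.com ", key_a)
p3 = pseudonymize("john.smith@email.com", key_b)

print(p1 == p2)  # True: same key and same normalized input
print(p1 == p3)  # False: a different key gives a different pseudonym
```

Using one key per dataset or per recipient is one way to stop two parties from linking their copies, while a shared key deliberately enables linkage.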
4. Format-Preserving Encryption (FPE)
Encrypt data while preserving the format and length of the original value.
How it works:
- Original credit card: 4532-1234-5678-9012
- FPE output: 8271-6543-2109-3847
Best for: Legacy systems that validate data formats (credit card numbers, phone numbers, postal codes)
Considerations:
- Uses approved algorithms like FF1 or FF3-1 (NIST SP 800-38G)
- Reversible with the encryption key
- Preserves format constraints, which is valuable for system compatibility
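FF1 and FF3-1 are not part of the Python standard library, so the sketch below is deliberately not either of them. It is a toy alternating Feistel construction over decimal digits with HMAC-SHA256 as the round function, meant only to show the two properties named above: the output keeps the input's format, and the key reverses it. The names `encrypt`, `decrypt`, and `ROUNDS` are illustrative, and this code has had no cryptographic review; a real deployment would use a vetted FF1 implementation.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # even, so the two halves end up in their original positions


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    # Keyed pseudo-random number derived from the round number and one half.
    msg = f"{round_no}:{half}".encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in rounds:
        m = u if i % 2 == 0 else v
        if decrypt:
            c = (int(b) - _round_value(key, i, a, 10**m)) % 10**m
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _round_value(key, i, b, 10**m)) % 10**m
            a, b = b, str(c).zfill(m)
    return a + b


def _transform(value: str, key: bytes, decrypt: bool) -> str:
    digits = "".join(ch for ch in value if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    out = iter(_feistel(digits, key, decrypt))
    # Put the new digits where the old ones were; separators stay in place.
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


def encrypt(value: str, key: bytes) -> str:
    return _transform(value, key, decrypt=False)


def decrypt(value: str, key: bytes) -> str:
    return _transform(value, key, decrypt=True)


key = b"example-key-not-for-production"
card = "4532-1234-5678-9012"
cipher = encrypt(card, key)
print(cipher)                        # same NNNN-NNNN-NNNN-NNNN shape
print(decrypt(cipher, key) == card)  # True
```

The output does not preserve check digits such as the Luhn digit, so a system that validates them would still reject it.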
5. Data Masking
Replace parts of a data value with placeholder characters while retaining some original information.
How it works:
- Original email: john.smith@email.com becomes j***@email.com
- Original phone: +44 7700 900123 becomes +44 7700 ***123
Best for: Display purposes, customer service screens, reports where partial identification is sufficient
Considerations:
- Static masking permanently replaces data; dynamic masking applies at query time
- Partial masking may not be sufficient pseudonymization if the remaining visible data allows re-identification
- Dynamic masking requires database or application-level support
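The two masking examples above can be reproduced with small helper functions. This is a sketch of application-level masking only; `mask_email` and `mask_phone` are hypothetical names, and the choice of which characters stay visible is an assumption taken from the examples.

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"  # not a usable address; reveal nothing
    return f"{local[0]}***@{domain}"


def mask_phone(phone: str, keep_last: int = 3, mask_len: int = 3) -> str:
    """Mask the `mask_len` characters before the last `keep_last` ones."""
    if len(phone) < keep_last + mask_len:
        return "*" * len(phone)
    cut = len(phone) - keep_last
    return phone[: cut - mask_len] + "*" * mask_len + phone[cut:]


print(mask_email("john.smith@email.com"))  # j***@email.com
print(mask_phone("+44 7700 900123"))       # +44 7700 ***123
```

Applied at display time, functions like these leave the stored value untouched, which corresponds to dynamic masking. Writing the masked value back to storage would make it static masking.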
6. Generalization
Replace specific values with broader categories.
How it works:
- Age 34 becomes age range 30-39
- Postal code EC2A 4NE becomes EC2A
- Date of birth 1991-03-15 becomes 1991
Best for: Analytics and statistical processing where exact values are not needed
Considerations:
- Reduces data utility with each level of generalization
- May not qualify as pseudonymization on its own if the generalized data is still identifying in context
- Often used in combination with other techniques
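The three generalization examples above map directly onto small functions. The sketch assumes ten-year age bands and UK postcodes written with a space between the outward and inward parts; the function names are illustrative.

```python
from datetime import date


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with its band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def outward_code(postcode: str) -> str:
    """Keep only the outward part of a UK postcode (text before the space)."""
    return postcode.strip().split()[0]


def birth_year(dob: date) -> int:
    """Replace a full date of birth with the year only."""
    return dob.year


print(age_band(34))                    # 30-39
print(outward_code("EC2A 4NE"))        # EC2A
print(birth_year(date(1991, 3, 15)))   # 1991
```

Wider bands or shorter postcode prefixes trade more utility for lower identifiability. A record with a rare combination of generalized values can still single out one person.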
Implementation Architecture
Separation of Mapping Data
The mapping between pseudonyms and original identifiers is the most sensitive component. Protect it with:
- Storage in a separate, access-controlled system
- Encryption at rest with customer-managed keys
- Strict access controls (minimal number of authorized users)
- Comprehensive audit logging of all access
- Geographic separation from the pseudonymized data where practical
Pseudonymization Service Pattern
Build a centralized pseudonymization service that:
- Accepts original identifiers and returns pseudonyms
- Maintains the mapping securely
- Supports reverse lookup for authorized re-identification
- Enforces access policies based on the requester's role and purpose
- Logs all pseudonymization and re-identification operations
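The responsibilities listed above could be combined roughly as follows. This is an in-memory sketch under stated assumptions: the class name, the role names in `POLICY`, and the log format are all hypothetical, and a real service would sit behind authenticated APIs with persistent, protected storage.

```python
import secrets
from datetime import datetime, timezone

# Hypothetical policy: which roles may call which operation.
POLICY = {
    "pseudonymize": {"ingest-service", "dpo"},
    "reidentify": {"dpo"},
}


class PseudonymizationService:
    def __init__(self) -> None:
        self._forward = {}   # identifier -> pseudonym
        self._reverse = {}   # pseudonym -> identifier
        self.audit_log = []

    def _authorize(self, operation: str, role: str, purpose: str) -> None:
        allowed = role in POLICY[operation]
        # Log every attempt, including denied ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "operation": operation,
            "role": role,
            "purpose": purpose,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{role} may not {operation}")

    def pseudonymize(self, identifier: str, role: str, purpose: str) -> str:
        self._authorize("pseudonymize", role, purpose)
        if identifier not in self._forward:
            pseudonym = secrets.token_hex(16)
            self._forward[identifier] = pseudonym
            self._reverse[pseudonym] = identifier
        return self._forward[identifier]

    def reidentify(self, pseudonym: str, role: str, purpose: str) -> str:
        self._authorize("reidentify", role, purpose)
        return self._reverse[pseudonym]


service = PseudonymizationService()
p = service.pseudonymize("john.smith@email.com", "ingest-service", "analytics load")
try:
    service.reidentify(p, "ingest-service", "debugging")
except PermissionError as exc:
    print(exc)  # ingest-service may not reidentify
print(service.reidentify(p, "dpo", "data subject access request"))
```

Recording the stated purpose alongside the role gives reviewers something to check re-identification requests against later.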
Key Rotation and Re-Pseudonymization
For techniques that use keys (HMAC, FPE), establish a key rotation schedule:
- Rotate keys annually or upon suspected compromise
- Re-pseudonymize affected data with the new key
- Securely destroy old keys after re-pseudonymization is complete
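For HMAC-based pseudonyms, the re-pseudonymization step might look like the sketch below. Because HMAC output cannot be inverted, it assumes the rotation job can read the original identifiers from the source system. The helper names, the `subject` field, and the placeholder keys are illustrative.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 pseudonym of an identifier, as 64 hex characters."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def rotate(source_ids: list, old_key: bytes, new_key: bytes) -> dict:
    """Map each old pseudonym to its replacement under the new key."""
    return {pseudonymize(i, old_key): pseudonymize(i, new_key) for i in source_ids}


def apply_rotation(rows: list, translation: dict) -> list:
    """Return copies of the rows with `subject` swapped to the new pseudonym."""
    return [{**row, "subject": translation[row["subject"]]} for row in rows]


old_key = b"example-old-key"  # placeholders, not real keys
new_key = b"example-new-key"

rows = [{"subject": pseudonymize("john.smith@email.com", old_key), "orders": 3}]
translation = rotate(["john.smith@email.com"], old_key, new_key)
rotated = apply_rotation(rows, translation)

print(rotated[0]["subject"] == pseudonymize("john.smith@email.com", new_key))  # True
print(rotated[0]["orders"])  # 3
```

The temporary translation table links old and new pseudonyms, so it deserves the same protection as the keys and the same secure deletion once the rotation is complete.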
Choosing the Right Technique
| Scenario | Recommended Technique |
|---|---|
| Database records with cross-table references | Tokenization |
| Analytics pipeline without need for re-identification | Keyed hashing (HMAC) |
| Legacy systems requiring format compatibility | Format-preserving encryption |
| Customer service screens | Dynamic data masking |
| Statistical reporting | Generalization |
| Research datasets | Combination of generalization + keyed hashing |
Pseudonymization and Data Residency
Pseudonymization becomes especially powerful when combined with data residency controls. By pseudonymizing personal data before it leaves a jurisdiction, you can process it in other regions while the re-identification mapping stays within the original jurisdiction.
GlobalDataShield's region-specific hosting complements pseudonymization strategies by ensuring that the sensitive mapping data -- the keys to re-identification -- remains within the geographic boundaries you define, while pseudonymized data can be used more flexibly for analytics and processing across your infrastructure.