The European Data Protection Board (EDPB) recently released its comprehensive Guidelines 01/2025 on Pseudonymisation, a document rich with practical insights into the application of pseudonymisation under the General Data Protection Regulation (GDPR).
While pseudonymisation is often associated with the replacement of personal identifiers, the EDPB’s guidance makes it clear that achieving effective pseudonymisation requires more than this technical step. It involves a suite of technical and organizational measures to ensure that data can no longer be attributed to individuals without additional, securely stored information. The Guidelines further demonstrate the intricate relationship between pseudonymisation and data minimization, purpose limitation, privacy by design and default, and other GDPR requirements.
While the entire Guidelines are a worthwhile read for anyone tasked with GDPR compliance involving data pseudonymisation, this article highlights one particular example and fleshes out a challenge that receives insufficient attention in the Guidelines: the basic but deceptively simple step of accurately identifying and replacing personal identifiers in unstructured data.
Example: The Privacy-Preserving Dental Implant Register
Among the 10 instructive examples in the EDPB Guidelines, Example 3 showcases how a dental implant register can pseudonymise sensitive data to strike a balance between functionality and privacy. The register collects detailed information about dental implants to monitor quality, provide feedback to practitioners, and ensure continuity of care when patients are treated by different practitioners. The multi-step procedure involving temporary and permanent pseudonyms as well as a designated Trust Centre is impressive, demonstrating how pseudonymisation enables data to be used responsibly without compromising privacy.
However, while the guidance does a stellar job of laying out the technological and organizational measures necessary for managing lookup tables, securing data flows, and safeguarding additional information, it sidesteps a particularly thorny issue: how do you reliably identify and replace personal identifiers in the unstructured data in the first place? Unstructured data, like free-form notes, scanned medical records, or text-rich emails, is notoriously messy. Identifiers can be hidden in unexpected places, making the first step of pseudonymisation—detecting and replacing personal identifiers—far from straightforward.
What’s Going On in Example 3?
In this scenario, dentists collect patient data, including identifiers, implant details, and medical information. They send this data, tagged with temporary pseudonyms, to the Register. A Trust Centre then steps in, replacing the temporary pseudonyms with permanent ones and managing the lookup table that connects pseudonyms to patient identities. The Register, armed with this pseudonymised data, analyzes implant quality and provides aggregated feedback to practices, all while ensuring patient identities remain protected. When subsequent caregivers need access, they can request the necessary data from the Register via controlled channels.
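The flow described above can be sketched in a few lines of code. This is a rough illustration only: the class and function names are invented for this sketch, and the real scheme in the Guidelines involves additional safeguards around the lookup table and the data flows.

```python
import secrets

class TrustCentre:
    """Swaps temporary pseudonyms for permanent ones and holds the
    lookup table linking permanent pseudonyms to patient identities."""

    def __init__(self):
        self._lookup = {}  # permanent pseudonym -> patient identity (kept secret)

    def register(self, temp_pseudonym, patient_identity):
        permanent = secrets.token_hex(8)  # random, meaning-free pseudonym
        self._lookup[permanent] = patient_identity
        return permanent

def dentist_submits(record, patient_identity, trust_centre):
    """The dentist tags the record with a temporary pseudonym; the Trust
    Centre replaces it with a permanent one before the Register sees it."""
    temp = secrets.token_hex(8)
    permanent = trust_centre.register(temp, patient_identity)
    # The Register receives only the permanent pseudonym, never the identity.
    return {"pseudonym": permanent, **record}

tc = TrustCentre()
register_entry = dentist_submits(
    {"implant_model": "X-200", "outcome": "stable"}, "Jane Doe", tc
)
assert "Jane Doe" not in register_entry.values()
```

The key design point is the separation of duties: the Register can analyze implant quality under stable pseudonyms, while only the Trust Centre can re-link data to a patient when a subsequent caregiver makes a legitimate request.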
It’s a well-orchestrated process with clear benefits, but here’s the twist: a lot of this data is likely unstructured, a difficulty the example does not surface.
The Difficulty with Unstructured Data
Unstructured data doesn’t follow neat, predefined formats. It can include patient names buried in free-text fields, operation details scattered across scanned documents, or sensitive information embedded in medical notes that may even be handwritten. The guidelines don’t dive into how to tackle these messy data types, but this is where a key real-world challenge lies. Overlooking even a single identifier during pseudonymisation can weaken the process and increase the risk of non-compliance.
Private AI’s Role
This is where Private AI shines. Our technology is purpose-built to deal with unstructured data, tackling the tricky task of finding and replacing personally identifiable information (PII) with speed and precision. Here’s how we help:
- Spotting the Hidden: Advanced machine learning models detect personal data like names, health insurance numbers, and payment information, even when buried in free-text fields, handwritten notes, or multilingual documents.
- Reporting on PII: The technology can generate a detailed report on the PII contained in the data, which is essential for the assessment that guides the choice of additional technical safeguards required under the GDPR. Importantly, this report can also flag indirect identifiers, such as medical history or diagnosis, that are not meant to be replaced with pseudonyms.
- Precision Pseudonymisation: Once identified, PII is removed or replaced with pseudonyms or placeholders, the first critical step towards proper pseudonymisation.
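To make the detect-and-replace step concrete, here is a deliberately simplified, regex-based sketch. It is not how Private AI works — pattern matching is exactly the approach that fails on messy unstructured data, which is why ML-based detection is needed — but it shows the shape of the task: find identifiers, log them in a report, and swap them for typed placeholders.

```python
import re

# Toy patterns for illustration only; real identifiers in free text are far
# more varied, and the insurance-number format here is invented.
PATTERNS = {
    "HEALTH_INSURANCE_NO": re.compile(r"\b\d{3}-\d{3}-\d{3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b"),
}

def pseudonymise(text):
    """Replace matched identifiers with typed placeholders and return
    both the redacted text and a report of what was found."""
    report = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            report.append({"label": label, "value": match.group()})
        text = pattern.sub(f"[{label}]", text)
    return text, report

note = "Patient (insurance no. 123-456-789) emailed j.doe@example.com."
redacted, report = pseudonymise(note)
# redacted == "Patient (insurance no. [HEALTH_INSURANCE_NO]) emailed [EMAIL]."
```

The report side of this sketch mirrors the assessment use case above: before deciding on additional safeguards, a controller needs an inventory of what identifiers the data actually contains.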
What Else Needs to Be Done?
While Private AI can take the lead in identifying and replacing identifiers, achieving proper pseudonymisation requires additional complementary measures tailored to the specific context and pseudonymisation domain. Controllers must first define the pseudonymisation domain—determining who should be precluded from attributing data to individuals and assessing the risks posed by actors within and outside that domain. Subsequently, appropriate technological and organizational measures must be implemented, such as securely managing any lookup tables that link pseudonyms to identities and enforcing strict protocols for data flows. These steps, guided by the EDPB’s recommendations, are essential for ensuring that pseudonymisation effectively minimizes risks and upholds data protection principles.
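One standard cryptographic option for the pseudonym-generation measure mentioned above is a keyed hash (HMAC), which the same identifier always maps to the same pseudonym, yet the mapping cannot be reversed or reproduced by anyone inside the pseudonymisation domain who lacks the key. The key name below is illustrative; in practice the secret would live in a key management system controlled by the party allowed to re-identify.

```python
import hashlib
import hmac

# Illustrative only: a real deployment would fetch this from a KMS held
# separately (e.g. by a Trust Centre) from the pseudonymised data.
SECRET_KEY = b"held-only-by-the-trust-centre"

def pseudonym_for(identifier: str) -> str:
    """Deterministic keyed pseudonym: the same patient always gets the
    same pseudonym, but linkage is impossible without the secret key."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

assert pseudonym_for("patient-42") == pseudonym_for("patient-42")  # consistent
assert pseudonym_for("patient-42") != pseudonym_for("patient-43")
```

A keyed derivation like this can replace a stored lookup table when re-identification only ever flows in one direction; where two-way lookup is needed, as in Example 3, the table itself must be protected with equivalent rigor.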
Conclusion
For organizations grappling with the complexities of pseudonymisation, Private AI ensures they’re equipped to meet the challenge head-on, even when dealing with the messiest, most unstructured datasets. The EDPB Pseudonymisation Guidelines can then be consulted for clear guidance on what else is required to meet the GDPR’s requirements for safeguarding personal data that has been stripped of personal identifiers.
To see how Private AI can assist organizations with advanced data pseudonymisation, explore our demo or contact us to test it on your own data.