Pseudonymization

Pseudonymization replaces direct identifiers in personal data with reversible aliases. Under GDPR it reduces some obligations but does not remove the data from scope entirely.

What is pseudonymization?

Pseudonymization is the processing of personal data such that the data can no longer be attributed to a specific data subject without additional information, provided that such additional information is kept separately and is subject to technical and organizational measures. The technique replaces direct identifiers with stable aliases; the mapping between alias and identifier is held separately under stricter access controls.
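One common way to produce stable aliases is keyed hashing. The sketch below uses Python's standard hmac module; the key name, record shape, and alias length are illustrative assumptions, and in practice the key would live in a separate key-management system, not in the codebase.

```python
import hmac
import hashlib

# Assumption: this key is held separately from the pseudonymized dataset,
# e.g. in a key-management service with stricter access controls.
SECRET_KEY = b"example-key-held-in-separate-kms"

def pseudonymize(identifier: str) -> str:
    """Derive a stable alias: the same identifier always maps to the same
    alias, but the alias cannot be linked back without SECRET_KEY."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Replace the direct identifier in a record with its alias.
record = {"email": "alice@example.com", "visits": 12}
alias = pseudonymize(record.pop("email"))
record["user_alias"] = alias
```

Because the alias is stable, the same person remains linkable across records, which is exactly the property that distinguishes pseudonymization from one-way anonymization.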

It is explicitly defined and encouraged by GDPR (Recital 28, Articles 6, 25, 32, 89). However, pseudonymized data remains personal data — pseudonymization reduces risk and may unlock some flexibilities (e.g. purpose-compatibility analysis) but does NOT take the data out of GDPR scope.

Pseudonymization vs anonymization vs de-identification

These three terms are commonly confused. Anonymization renders re-identification impossible (or essentially so) and removes data from GDPR scope. Pseudonymization is reversible: the original identifiers can be recovered via a separately held key. De-identification is a US/HIPAA term covering both techniques plus weaker variants like Safe Harbor (see /learn/de-identification). The legal consequences differ: pseudonymized data remains regulated, while anonymized data largely is not, so informal use of these words is rarely safe in compliance contexts.

For AI vendors

Pseudonymization is most useful in the AI context when training data must retain referential structure (so a user appears as the same person across multiple records) without retaining the actual identity. Ask where the pseudonymization mapping table is held, who has access to it, what the rotation policy for aliases is, and whether the mapping itself is in scope when responding to data subject deletion requests. Several mature healthcare AI vendors offer pseudonymization as a deployment option specifically to reduce the BAA-required footprint.
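The deletion-request question above can be made concrete with a separately held mapping table. In this sketch (class and method names are illustrative assumptions, not a vendor API), honoring a deletion request means erasing only the mapping entry: the alias survives in the training data, but the link back to the identity is severed.

```python
import secrets

class PseudonymVault:
    """Holds the alias <-> identifier mapping separately from the dataset,
    under stricter access controls. Illustrative sketch only."""

    def __init__(self):
        self._alias_to_id = {}
        self._id_to_alias = {}

    def alias_for(self, identifier: str) -> str:
        """Return a stable random alias, minting one on first use."""
        if identifier in self._id_to_alias:
            return self._id_to_alias[identifier]
        alias = secrets.token_hex(8)
        self._id_to_alias[identifier] = alias
        self._alias_to_id[alias] = identifier
        return alias

    def resolve(self, alias: str):
        """Re-identification path; only the vault can do this."""
        return self._alias_to_id.get(alias)

    def erase(self, identifier: str) -> None:
        """Handle a deletion request by severing the mapping, leaving
        aliased records in place but no longer attributable."""
        alias = self._id_to_alias.pop(identifier, None)
        if alias is not None:
            del self._alias_to_id[alias]
```

Whether severing the mapping alone satisfies a given deletion request is a legal judgment, not a technical one; the sketch only shows why the question of where the mapping lives matters.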