Differential privacy

Differential privacy is a mathematical framework that limits how much any single record can influence a query result, providing a formal privacy guarantee for aggregate data releases.

What is differential privacy?

Differential privacy (DP) is a mathematical framework introduced by Cynthia Dwork and colleagues in 2006. It provides a formal guarantee that the output of a computation reveals essentially the same information whether or not any specific individual's record is in the input dataset: formally, adding or removing one record changes the probability of any given output by at most a factor of e^ε. The strength of the guarantee is captured by the parameter ε (epsilon); smaller ε means stronger privacy and, in practice, more noise.
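
As a concrete illustration, the classic Laplace mechanism achieves ε-DP for a simple count by adding noise scaled to the query's sensitivity. The following is a minimal sketch in Python; the function name and interface are illustrative, not drawn from any particular DP library:

    import numpy as np

    def laplace_count(rows, predicate, epsilon, rng=None):
        """Release a count under epsilon-DP via the Laplace mechanism.

        A count query has sensitivity 1: adding or removing one record
        changes the true answer by at most 1, so Laplace noise with scale
        1/epsilon bounds the shift in the output distribution by a factor
        of e^epsilon.
        """
        rng = rng or np.random.default_rng()
        true_count = sum(1 for row in rows if predicate(row))
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    # Smaller epsilon -> larger noise scale -> stronger privacy, noisier answer.
    ages = [34, 61, 29, 45, 52]
    print(laplace_count(ages, lambda a: a >= 40, epsilon=1.0))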

For AI training, the most common technique is DP-SGD (Differentially Private Stochastic Gradient Descent), which clips each training example's gradient and adds calibrated noise to the aggregated update, bounding how much any single example can influence the final model. Apple, Google, and the U.S. Census Bureau have all shipped production DP systems.
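
A rough sketch of a single DP-SGD step in NumPy may help. It assumes per-example gradients are already computed, and parameter names such as clip_norm and noise_multiplier follow common usage but are not tied to any specific framework:

    import numpy as np

    def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
        """One DP-SGD update: clip each example's gradient, sum, add noise.

        Clipping caps any single example's influence at clip_norm; Gaussian
        noise with std = noise_multiplier * clip_norm then masks whatever
        influence remains. A privacy accountant (not shown) converts the
        noise_multiplier and number of steps into an overall epsilon.
        """
        clipped = [
            g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
            for g in per_example_grads
        ]
        noisy_sum = np.sum(clipped, axis=0) + rng.normal(
            scale=noise_multiplier * clip_norm, size=params.shape)
        return params - lr * noisy_sum / len(per_example_grads)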

When to ask about DP

Differential privacy is most relevant when an AI vendor offers analytics or model fine-tuning over your data, that is, when the vendor will publish or share aggregate insights derived from sensitive inputs. It is less relevant for inference-only use cases where your prompts are never aggregated. If a vendor claims DP, the meaningful follow-ups are: what ε is used? Is the privacy budget tracked per query or across the whole dataset (see the sketch below)? And has the implementation been peer-reviewed or independently audited?
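
To make the budget question concrete, here is a hedged sketch of per-dataset accounting under basic sequential composition, where the epsilons of successive queries simply add. The class and method names are hypothetical:

    class PrivacyBudget:
        """Track cumulative epsilon spend under basic sequential composition."""

        def __init__(self, total_epsilon):
            self.total_epsilon = total_epsilon
            self.spent = 0.0

        def charge(self, epsilon):
            """Deduct epsilon for one query; refuse once the budget is gone."""
            if self.spent + epsilon > self.total_epsilon:
                raise RuntimeError("privacy budget exhausted")
            self.spent += epsilon

    budget = PrivacyBudget(total_epsilon=1.0)
    budget.charge(0.4)  # first query
    budget.charge(0.4)  # second query; cumulative spend is now 0.8
    # A third charge(0.4) would raise, since 1.2 > 1.0.

A vendor who quotes a per-query ε without this kind of cumulative tracking is answering a different, weaker question.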

Limitations to know

DP is not a silver bullet. The accuracy-privacy trade-off can be sharp at small ε. Multiple DP queries compose: under basic composition their epsilons add, so the budget is consumed cumulatively. Implementation details (clipping bounds, noise distribution, accounting method) are easy to get wrong, and informal "anonymization" is not DP. For most AI procurement contexts, DP claims should be specific and citable rather than marketing boilerplate.
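
To see how sharp the trade-off can be: for the Laplace mechanism sketched earlier, the expected absolute error of a sensitivity-1 count is exactly 1/ε, so halving ε doubles the error. A quick check under that assumption:

    # Expected |Laplace(0, 1/epsilon)| = 1/epsilon for a sensitivity-1 query,
    # so small epsilon is paid for directly in accuracy.
    for epsilon in (10.0, 1.0, 0.1):
        print(f"epsilon={epsilon:>4}: expected |error| ~ {1.0 / epsilon:.1f}")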