Data residency
Data residency is the geographic location where data is stored at rest. For AI vendors, this includes prompts, model outputs, training data, and backups.
What is data residency?
Data residency refers to the geographic location, physical or logical (for example, a cloud provider region), where data is stored at rest. It is distinct from data sovereignty, which concerns which jurisdiction's laws govern the data, though the two are often correlated. For AI vendors, the residency question applies to every data class: prompt inputs, model outputs, training-data caches, system logs, backups, and any analytics data flowing through observability tools.
Why it matters for AI buyers
Data residency intersects with three procurement constraints. Regulatory: HIPAA, certain U.S. federal contracts, the German BDSG, and various sector-specific laws can restrict, or attach conditions to, where covered data may be stored or transferred. Contractual: enterprise customers frequently impose residency requirements on their vendors, and those requirements then flow down to the AI sub-processors the vendors use. Reputational: some buyers refuse vendors that store any data in jurisdictions they consider hostile, whether or not any law requires it.
What to ask vendors
Specific questions to ask: Where is the primary database hosted? Where are backups replicated? Which region do model API calls route through (this is often different from where the vendor's account data lives)? Can the vendor contractually commit to a single region? Are there exceptions for support sessions or debugging? How is residency enforced technically: through region-locked tenants, or only through trust-based commitments? For AI vendors specifically, also ask whether prompts may transit ephemeral inference infrastructure in other regions.