LLM04: Data and Model Poisoning

OWASP LLM Top 10 (2025)

Adversarial training data or fine-tuning input degrades model integrity.

What this risk means

Attackers contaminate training data, fine-tuning sets, or RAG corpora to embed backdoors, bias, or other harmful behaviours. The risk is highest where training-data provenance is opaque and customer fine-tuning paths are weakly controlled.

How TrustAtlas dimensions address it

Data handling captures the vendor's stance on training-data provenance and customer opt-out; transparency captures whether the vendor publishes model cards and training-data disclosures; security captures the integrity controls on the vendor's fine-tuning pipeline.

Data handling, Transparency, Security

See methodology for how each dimension is scored across the catalog.

Questions to ask vendors

Drop these into RFPs, due-diligence questionnaires, or a procurement scorecard. Each question maps back to evidence visible on the vendor's TrustAtlas profile.

Do you publish a model card (or equivalent) describing training-data sources, fine-tuning pipeline, and evaluation benchmarks?
What controls prevent contaminated input from customer fine-tuning runs from leaking across tenants?
How do you validate the integrity and provenance of third-party datasets used in pre-training or RAG indexing?
Do you maintain a poisoning-detection telemetry pipeline, and will you notify affected customers of findings?
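
As an illustration of what a concrete answer to the dataset-integrity question might look like, the sketch below checks dataset files against a manifest of SHA-256 digests recorded at ingestion time. It is a minimal example under stated assumptions, not a description of any vendor's pipeline: the manifest format (relative path mapped to hex digest) and the function names `sha256_of` and `verify_manifest` are hypothetical. A check like this can only show that files have not changed since the manifest was produced; it says nothing about whether the data was clean at the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs.

    `manifest` is a hypothetical format: relative path -> expected hex digest.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list from `verify_manifest` means every listed file is present and unchanged; any returned path is a candidate for quarantine before it reaches pre-training or a RAG index.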


Back to the full OWASP LLM Top 10 cross-walk
NIST AI RMF cross-walk — the U.S. enterprise companion framework
TrustAtlas methodology — how the 8 risk dimensions are scored
Browse the vendor directory and filter by the dimensions tied to this risk
