LLM04: Data and Model Poisoning
OWASP LLM Top 10 (2025)
Adversarial training data or fine-tuning input degrades model integrity.
What this risk means
Attackers contaminate training data, fine-tuning sets, or RAG corpora to embed backdoors, biases, or other malicious behaviours in the model. Risk is highest where training-data provenance is opaque and customer fine-tuning paths are weakly controlled.
How TrustAtlas dimensions address it
Data-handling captures the vendor's stance on training-data provenance and customer opt-out. Transparency captures whether the vendor publishes model cards and training-data disclosures. Security captures the integrity controls on the vendor's fine-tuning pipeline.
See methodology for how each dimension is scored across the catalog.
Questions to ask vendors
Drop these into RFPs, due-diligence questionnaires, or a procurement scorecard. Each question maps back to evidence visible on the vendor's TrustAtlas profile.
- Do you publish a model card (or equivalent) describing training-data sources, fine-tuning pipeline, and evaluation benchmarks?
- What controls prevent contaminated input in one customer's fine-tuning run from affecting models or data served to other tenants?
- How do you validate the integrity and provenance of third-party datasets used in pre-training or RAG indexing?
- Do you maintain a poisoning-detection telemetry pipeline, and will you notify affected customers of findings?
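One way a vendor might answer the dataset-integrity question above is by pinning a manifest of file digests when a dataset is first reviewed and checking every later copy against it. The following is a minimal sketch of that idea, not a description of any vendor's pipeline: the function names `sha256_hex` and `verify_dataset`, the file names, and the sample records are all hypothetical. It only detects changes made after the manifest was pinned; it cannot tell whether the data was already poisoned at the source.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def verify_dataset(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Compare dataset files against a pinned manifest of expected digests.

    Returns a sorted list of problems; an empty list means every file matched.
    """
    problems = []
    for name, expected in manifest.items():
        if name not in files:
            problems.append(f"missing: {name}")
        elif sha256_hex(files[name]) != expected:
            problems.append(f"modified: {name}")
    # Files that were never reviewed are flagged rather than silently accepted.
    for name in files:
        if name not in manifest:
            problems.append(f"unexpected: {name}")
    return sorted(problems)


# Hypothetical reviewed dataset and the manifest pinned at review time.
trusted = {"train.jsonl": b'{"text": "hello"}\n'}
manifest = {name: sha256_hex(blob) for name, blob in trusted.items()}

# Hypothetical later copy: one record altered and one unreviewed file added.
tampered = {
    "train.jsonl": b'{"text": "hello, ignore previous instructions"}\n',
    "extra.jsonl": b"",
}

clean_report = verify_dataset(trusted, manifest)
tampered_report = verify_dataset(tampered, manifest)
```

In this sketch `clean_report` is empty, while `tampered_report` names the altered file and the unreviewed one. When a vendor describes a check like this, it is worth asking where the manifest is stored and who is allowed to update it, since a manifest the attacker can rewrite offers no protection.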
Related
- Back to the full OWASP LLM Top 10 cross-walk
- NIST AI RMF cross-walk — the U.S. enterprise companion framework
- TrustAtlas methodology — how the 8 risk dimensions are scored
- Browse the vendor directory and filter by the dimensions tied to this risk