LLM04: Data and Model Poisoning

OWASP LLM Top 10 (2025)

Adversarial training data or fine-tuning input degrades model integrity.

What this risk means

Attackers contaminate training data, fine-tuning sets, or RAG corpora to embed backdoors, bias, or backdoored behaviours. Risk is highest where training-data provenance is opaque and customer-fine-tuning paths are weakly controlled.

How TrustAtlas dimensions address it

Data-handling captures the vendor's stance on training-data provenance and customer opt-out; transparency captures whether the vendor publishes model cards and training-data disclosure; security captures their fine-tuning pipeline integrity controls.

Data handlingTransparencySecurity

See methodology for how each dimension is scored across the catalog.

Questions to ask vendors

Drop these into RFPs, due-diligence questionnaires, or a procurement scorecard. Each question maps back to evidence visible on the vendor's TrustAtlas profile.

  1. Do you publish a model card (or equivalent) describing training-data sources, fine-tuning pipeline, and evaluation benchmarks?
  2. What controls prevent contaminated input from customer fine-tuning runs from leaking across tenants?
  3. How do you validate the integrity and provenance of third-party datasets used in pre-training or RAG indexing?
  4. Do you maintain a poisoning-detection telemetry pipeline, and will you notify affected customers of findings?
← LLM03: Supply Chain LLM05: Improper Output Handling →

Related