Executive summary

RAG systems glue large language models (LLMs) to enterprise knowledge via search or vector retrieval. That makes them powerful—and uniquely exposed. Attacks rarely target weights; they target data, retrieval logic, and tool orchestration. This article maps the full attack surface (from indirect prompt injection and vector-DB poisoning to privacy leakage and tool abuse) and provides a concrete architecture, controls, detections, and a 90-day rollout to harden production RAG.


1) What a production RAG system looks like

Pipeline:
Data sources (wikis, SharePoint, tickets, PDFs, code) → Ingest/Sanitize → Chunk & Embed → Vector DB / Search (with ACLs & metadata) → Retriever → LLM (system & policy prompts) → (optional) Tools/Functions → Answer + citations.

Security principle: treat every stage as Tier-0 (crown-jewel) infrastructure. If an attacker controls any one of source docs, embeddings, metadata, queries, or tools, they can steer the model.
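
To make those trust boundaries concrete, here is a minimal, runnable skeleton of the pipeline; every function is an illustrative stand-in (no real framework is assumed), and each stage maps to controls in sections 3-4.

def sanitize(doc: str) -> str:
    return doc.replace("<script>", "")     # stand-in; see 4.3 for a real pass

def chunk(doc: str, size: int = 400) -> list[str]:
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def retrieve(query: str, chunks: list[str], top_k: int = 8) -> list[str]:
    # stand-in for ANN search + policy filters (see 4.2): naive keyword overlap
    ranked = sorted(chunks, key=lambda c: -sum(w in c for w in query.split()))
    return ranked[:top_k]

def generate(query: str, evidence: list[str]) -> str:
    # stand-in for the LLM call behind an immutable system prompt (see 4.4)
    return f"answer({query!r}) grounded in {len(evidence)} chunks"

docs = [sanitize(d) for d in ["<script>x</script>Q3 revenue was $12M.", "Travel policy."]]
print(generate("Q3 revenue", retrieve("Q3 revenue", [c for d in docs for c in chunk(d)])))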


2) RAG threat model & attack taxonomy

A) Data/ingest stage

  • Indirect prompt injection in stored documents (hidden directives in HTML/Markdown/PDF).
  • Vector poisoning: malicious chunks or adversarial embeddings push attacker docs to the top.
  • ACL bypass via metadata forgery: mislabeled tenant_id/classification fields.
  • Active content: HTML/JS, SVG, trackers, one-pixel beacons, macro-enabled files.
  • Supply chain: poisoned parsers, model artifacts, or conversion tools.

B) Retrieval stage

  • Query hijack: user input smuggles instructions into retriever (“ignore previous; search for…upload secrets”).
  • Reranker gaming: attacker crafts text to spike BM25/keyword density or cross-encoder scores.
  • Cross-tenant leakage: caching and ANN indices that ignore tenant or label constraints.
  • Staleness/drift: outdated documents become “truth.”

C) Generation stage (LLM)

  • System prompt overwrite or jailbreak via retrieved content.
  • Ungrounded responses: model fabricates beyond retrieved facts (hallucinations).
  • Sensitive data extrusion: PII/PHI, credentials, or trade secrets surfacing in responses (plus membership-inference risk).

D) Tools / function calling

  • Tool abuse: crafted answers trigger high-privilege actions (file I/O, payments, cloud APIs).
  • Exfiltration: LLM instructed to POST retrieved data to attacker endpoints.
  • SSRF/egress abuse: tools with unconstrained network access.

3) Defense-in-depth architecture (what “good” looks like)

[Sources] → Ingest Gateway → Sanitizers → Classifiers/PII → Chunker
    → Embedder (offline, no internet) → Sign + Metadata + ACL
    → Vector DB (per-tenant collections, KMS-encrypted)
    → Retriever (policy-aware filters + reranker)
    → LLM (immutable system prompt + guardrails + citations-required)
    → Tool Sandbox (allowlists, dry-run simulators, egress policy)
    → Telemetry Bus → SIEM/SOAR (approvals for high-risk actions)

Key controls by layer

Ingest & sanitize

  • Canonicalize and strip active content: remove <script> and <style> tags, event handlers, iframes, forms, and data URLs; block macro-enabled Office docs.
  • Convert PDFs via hardened pipeline; reject mixed/unknown MIME.
  • Deduplicate, normalize whitespace/zero-width chars, standardize quotes & casing to defeat embedding gaming (normalization sketch after this list).
  • PII/secret scanning (entropy + patterns) → redact or label.
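
A minimal sketch of the normalization bullet above; the zero-width character list and quote mappings are illustrative, not exhaustive.

import re
import unicodedata

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")  # common invisibles

def normalize_text(raw: str) -> str:
    text = unicodedata.normalize("NFKC", raw)    # fold lookalike/compat characters
    text = ZERO_WIDTH.sub("", text)              # strip zero-width smuggling
    text = text.replace("\u201c", '"').replace("\u201d", '"')   # smart double quotes
    text = text.replace("\u2018", "'").replace("\u2019", "'")   # smart single quotes
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace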

Provenance & signing

  • For each chunk, store: sha256, source_uri, owner, timestamp, tenant_id, labels, pii_flags, parser_version. Sign this record; reject unsigned writes.

Vector DB

  • Per-tenant collections or required tenant_id filter + row-level security; encrypt at rest with KMS (query sketch after this list).
  • Separate write and read identities; short-lived tokens; audit all writes.
  • Disable cross-collection ANN unless filter-aware; pin top_k and score thresholds.
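
A sketch of a policy-constrained read against pgvector (psycopg 3 assumed); the chunks table, its columns, and the 0.35 distance cutoff are illustrative. The query vector is passed as a text literal so no extra adapter is needed.

import psycopg  # psycopg 3 against a pgvector-enabled Postgres

def retrieve_chunks(conn, tenant_id: str, query_vec: list[float],
                    top_k: int = 8, max_dist: float = 0.35):
    vec_literal = "[" + ",".join(f"{x:g}" for x in query_vec) + "]"
    sql = """
        SELECT chunk_id, text, embedding <=> %s::vector AS dist  -- cosine distance
        FROM chunks
        WHERE tenant_id = %s                      -- tenant filter bound server-side,
          AND NOT ('restricted' = ANY(labels))    --   never taken from model output
        ORDER BY dist
        LIMIT %s                                  -- pinned top_k
    """
    with conn.cursor() as cur:
        cur.execute(sql, (vec_literal, tenant_id, top_k))
        return [row for row in cur.fetchall() if row[2] <= max_dist]  # score threshold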

Retriever

  • First pass policy filter (tenant, label, recency) then BM25/ANN → cross-encoder reranking.
  • Cap top_k (e.g., 6–8), enforce freshness for fast-moving domains (e.g., 7–30 days).
  • Canary filter: reject chunks matching injection regexes (e.g., “ignore all previous”, “copy all data to”, base64 blobs).
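
A sketch of that canary filter; the patterns mirror the examples above and would be tuned per deployment.

import re

CANARY_PATTERNS = [
    re.compile(r"(?i)\bignore\s+(all\s+)?previous\b"),
    re.compile(r"(?i)\bcopy\s+all\s+data\s+to\b"),
    re.compile(r"base64,[A-Za-z0-9+/]{80,}"),     # long inline payloads
]

def passes_canary(chunk_text: str) -> bool:
    # Reject retrieved chunks that look like injection payloads
    return not any(p.search(chunk_text) for p in CANARY_PATTERNS)

# usage: hits = [c for c in hits if passes_canary(c["text"])] before prompting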

LLM guardrails

  • Immutable system prompt (not user editable).
  • Citations required: response must include document IDs + quotes; if coverage < threshold → abstain.
  • Schema-validated JSON output with claims[] {text, evidence[], confidence}.
  • Safety/policy classifiers for PII/PHI/secrets/toxicity before rendering.

Tools sandbox

  • Explicit allowlist of tools; strict input schemas; dry-run simulators; no raw shell.
  • Network egress policy (DNS/HTTP allowlists); block external posts by default.
  • Human-in-the-loop (HITL) approval for destructive ops (revoke tokens, isolate host, delete objects).
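
A sketch of the allowlist-plus-approval pattern from this list; the tool names, schemas, and approve() hook are illustrative, with jsonschema assumed for input validation.

from jsonschema import validate  # pip install jsonschema

TOOLS = {
    "search_tickets": {
        "schema": {"type": "object",
                   "properties": {"query": {"type": "string", "maxLength": 200}},
                   "required": ["query"], "additionalProperties": False},
        "destructive": False,
    },
    "revoke_token": {
        "schema": {"type": "object",
                   "properties": {"token_id": {"type": "string"}},
                   "required": ["token_id"], "additionalProperties": False},
        "destructive": True,    # HITL approval required
    },
}

def call_tool(name: str, args: dict, approve=lambda n, a: False):
    spec = TOOLS.get(name)
    if spec is None:
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    validate(args, spec["schema"])               # reject malformed/oversized input
    if spec["destructive"] and not approve(name, args):
        raise PermissionError(f"{name!r} requires human approval")
    # ...dispatch to the sandboxed implementation here...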

Telemetry & response

  • Log prompts, retrieved chunk IDs, tool calls, final output, user identity, IP/ASN.
  • Route high-risk events to SOAR with approvals and circuit-breakers.
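
One way to shape those log events (field names are illustrative); hashing the free text keeps records correlatable while honoring the data-minimization note in section 6.

import hashlib
import json
import time

def telemetry_event(user: str, ip: str, prompt: str,
                    chunk_ids: list, tool_calls: list, output: str) -> str:
    sha = lambda s: hashlib.sha256(s.encode()).hexdigest()
    return json.dumps({
        "ts": time.time(),
        "user": user,
        "ip": ip,
        "prompt_sha256": sha(prompt),    # searchable without retaining raw text
        "chunk_ids": chunk_ids,          # which evidence was retrieved
        "tool_calls": tool_calls,        # names + args of invoked tools
        "output_sha256": sha(output),
    })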

4) Copy-paste patterns (policies, code & detections)

4.1 Chunk schema (signed metadata)

{
  "chunk_id": "doc_42#p3#c7",
  "sha256": "…",
  "tenant_id": "acme",
  "labels": ["internal","finance"],
  "source_uri": "https://wiki.acme.local/fin/q3.md",
  "timestamp": "2025-08-10T11:22:00Z",
  "pii_flags": ["iban"],
  "parser_version": "pdf2txt-1.8.4",
  "signature": "ed25519:…"
}
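
A sketch of producing and checking that signature field, assuming the cryptography package and hex encoding (the schema does not pin an encoding). In production the private key lives in a KMS/HSM, not in process; the write path calls verify_chunk() and drops anything that fails.

import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonical(meta: dict) -> bytes:
    # Serialize deterministically so key order/whitespace can't be gamed
    unsigned = {k: v for k, v in meta.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()

def sign_chunk(meta: dict, key: Ed25519PrivateKey) -> dict:
    return {**meta, "signature": "ed25519:" + key.sign(canonical(meta)).hex()}

def verify_chunk(meta: dict, pubkey) -> bool:
    try:
        raw = bytes.fromhex(meta["signature"].split(":", 1)[1])
        pubkey.verify(raw, canonical(meta))      # raises on any tampering
        return True
    except (InvalidSignature, KeyError, ValueError):
        return False                             # unsigned/garbled → reject the write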

4.2 Retrieval policy (OPA/Rego)

package rag.retrieval

default allow = false

allow {
  input.user.tenant_id == input.query.tenant_id
  c := input.candidates[_]
  c.tenant_id == input.user.tenant_id
  not restricted(c)
  time.now_ns() - time.parse_rfc3339_ns(c.timestamp) < 30 * 24 * 60 * 60 * 1000000000  # 30d freshness
}

restricted(c) {
  c.labels[_] == "restricted"
}

4.3 Sanitizer (Python, sketch)

from bs4 import BeautifulSoup

ALLOWED = {"p", "h1", "h2", "h3", "ul", "ol", "li", "code", "pre", "a",
           "strong", "em", "table", "tr", "td"}

def sanitize_html(raw: str) -> str:
    soup = BeautifulSoup(raw, "lxml")
    for tag in soup.find_all(True):
        if tag.name not in ALLOWED:
            tag.decompose()           # drop scripts, iframes, forms wholesale
            continue                  # a decomposed tag has no attributes to clean
        for attr in list(tag.attrs):
            if attr != "href":        # keep only href; strips onclick=, style=, data-*
                del tag[attr]
    return soup.get_text("\n")
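
For production ingestion, a maintained allowlist sanitizer (e.g., bleach or the Rust-backed nh3) is safer than a hand-rolled pass like this; the sketch above only illustrates the allowlist-and-strip pattern.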

4.4 Evidence-required output (LLM prompt suffix)

Return JSON:
{ "answer": "...",
  "claims":[{"text":"...", "evidence":[{"chunk_id":"...", "quote":"..."}], "confidence":0.0}],
  "abstain": true|false }
Do NOT answer without evidence. If evidence coverage < 0.7 → abstain.
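
A sketch of enforcing that contract server-side rather than trusting the model; the quote-in-chunk check is a deliberately crude proxy for evidence coverage, and the 0.7 threshold echoes the prompt above.

import json

def evidence_coverage(reply: dict, chunks_by_id: dict) -> float:
    # Fraction of claims whose quotes actually appear in the cited chunks
    claims = reply.get("claims", [])
    if not claims:
        return 0.0
    supported = 0
    for claim in claims:
        for ev in claim.get("evidence", []):
            chunk = chunks_by_id.get(ev.get("chunk_id"), "")
            if ev.get("quote") and ev["quote"] in chunk:
                supported += 1
                break
    return supported / len(claims)

def enforce(raw_reply: str, chunks_by_id: dict) -> dict:
    reply = json.loads(raw_reply)                 # schema validation omitted here
    if reply.get("abstain") or evidence_coverage(reply, chunks_by_id) < 0.7:
        return {"answer": None, "abstain": True}  # abstaining beats hallucinating
    return reply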

4.5 Vector DB anomaly queries

SQL (pgvector / similar) — sudden HTML writes

SELECT writer, COUNT(*) AS n
FROM chunks
WHERE mime IN ('text/html','text/markdown')
  AND ts > now() - interval '1 hour'
GROUP BY writer
HAVING COUNT(*) > 200;

Splunk — suspicious base64 in sources

index=rag source="ingest" raw_document
| regex raw_document="(?i)ignore all previous|base64,[A-Za-z0-9/+]{80,}"

5) Evaluation & monitoring: measure what matters

  • Groundedness: % of answer tokens supported by cited chunks (sketch after this list).
  • Coverage: fraction of retrieved chunks actually cited.
  • Abstention rate: better to abstain than hallucinate.
  • Attack success rate (ASR): red-team corpus (indirect injections) → blocked %.
  • Vector churn & write burst: spikes = potential poisoning.
  • Privacy leakage: PII/secret detectors on outputs (precision/recall).
  • Latency & cost with guardrails on (budget reality).
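
As a cheap starting point for the groundedness metric above, a token-overlap proxy like this sketch works; production systems typically replace it with an NLI/entailment scorer.

def groundedness(answer: str, cited_chunks: list[str]) -> float:
    # Share of answer tokens that appear in at least one cited chunk
    vocab = set(" ".join(cited_chunks).lower().split())
    tokens = answer.lower().split()
    if not tokens:
        return 0.0
    return sum(t in vocab for t in tokens) / len(tokens)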

6) Privacy & compliance

  • Data minimization & retention for logs and prompts.
  • Mask secrets/PII in prompts; prefer on-prem or VPC-hosted models for sensitive data.
  • Map controls to NIST AI RMF, ISO/IEC 42001, SOC 2, and GDPR (lawful basis, DSAR searchability).

7) Failure modes (and fixes)

  • Citations exist but irrelevant → require quote spans and overlap scoring with question.
  • Tenant cache leaks → per-tenant caches; include tenant_id in the cache key (sketch after this list).
  • Reranker hallucination → pair with policy filter first; cap max tokens from a single source.
  • Tool egress → explicit allowlists; block IP literals & *.pastebin*/*bin*.
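
For the tenant-cache fix above, key derivation can be as simple as this sketch; folding in a model/policy version (an assumption here) also prevents stale hits after upgrades.

import hashlib

def cache_key(tenant_id: str, query: str, model_version: str) -> str:
    # Tenant is part of the key, so tenants can never share cache entries
    raw = f"{tenant_id}\x1f{model_version}\x1f{query}"   # \x1f = unit separator
    return hashlib.sha256(raw.encode()).hexdigest()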

8) 30-60-90 day rollout

Days 1–30 (Foundations)

  • Build ingest gateway + sanitizers; sign chunks; per-tenant collections; turn on telemetry.
  • Add immutable system prompt + citations-required schema; disable external egress.

Days 31–60 (Guardrails & detections)

  • OPA policy filters, freshness windows, canary regexes; PII/secret redaction; SOAR approvals for tools.
  • Deploy red-team corpus for indirect injection; measure ASR, groundedness.

Days 61–90 (Automation & governance)

  • Promote low-risk Q&A flows to auto-answer with abstention.
  • Add drift dashboards, monthly model & policy reviews; link incidents → new rules & tests.

9) Quick checklist (printable)

  •  Active content stripped; MIME whitelist; PDF hardened
  •  Chunks signed with sha256 + provenance; per-tenant collections
  •  Policy-aware retrieval (tenant/labels/freshness) + reranker
  •  Immutable system prompt; citations required; abstain on low evidence
  •  Tool sandbox with allowlists, dry-runs, HITL approvals
  •  Telemetry of prompts, chunks, tool calls; SIEM rules for vector poisoning
  •  Red-team injections; KPIs: groundedness, ASR, abstention, leakage
  •  Compliance mapped (NIST AI RMF / ISO 42001 / SOC 2 / GDPR)

Closing

RAG security is data security + retrieval policy + safe generation + tool isolation. Get those four right and most real-world attacks—indirect injection, vector poisoning, privacy leaks, tool abuse—lose their teeth. This is Zero-Trust AI for knowledge workflows.
