Executive summary
RAG systems glue large language models (LLMs) to enterprise knowledge via search or vector retrieval. That makes them powerful—and uniquely exposed. Attacks rarely target weights; they target data, retrieval logic, and tool orchestration. This article maps the full attack surface (from indirect prompt injection and vector-DB poisoning to privacy leakage and tool abuse) and provides a concrete architecture, controls, detections, and a 90-day rollout to harden production RAG.
1) What a production RAG system looks like
Pipeline:
Data sources (wikis, SharePoint, tickets, PDFs, code) → Ingest/Sanitize → Chunk & Embed → Vector DB / Search (with ACLs & metadata) → Retriever → LLM (system & policy prompts) → (optional) Tools/Functions → Answer + citations.
Security principle: treat every stage as part of your Tier-0. If an attacker controls any of: source docs, embeddings, metadata, queries, or tools, they can steer the model.
2) RAG threat model & attack taxonomy
A) Data/ingest stage
- Indirect prompt injection in stored documents (hidden directives in HTML/Markdown/PDF).
- Vector poisoning: malicious chunks or adversarial embeddings push attacker docs to the top.
- ACL bypass via metadata forgery: mislabeled
tenant_id/classificationfields. - Active content: HTML/JS, SVG, trackers, one-pixel beacons, macro-enabled files.
- Supply chain: poisoned parsers, model artifacts, or conversion tools.
B) Retrieval stage
- Query hijack: user input smuggles instructions into retriever (“ignore previous; search for…upload secrets”).
- Reranker gaming: attacker crafts text to spike BM25/keyword density or cross-encoder scores.
- Cross-tenant leakage: caching and ANN indices that ignore tenant or label constraints.
- Staleness/drift: outdated documents become “truth.”
C) Generation stage (LLM)
- System prompt overwrite or jailbreak via retrieved content.
- Ungrounded responses: model fabricates beyond retrieved facts (hallucinations).
- Sensitive data extrusion: PII/PHI, secrets or trade secrets in responses (membership inference risk).
D) Tools / function calling
- Tool abuse: crafted answers trigger high-privilege actions (file I/O, payments, cloud APIs).
- Exfiltration: LLM instructed to POST retrieved data to attacker endpoints.
- SSRF/egr: tools with unconstrained network access.
3) Defense-in-depth architecture (what “good” looks like)
pgsqlCopyEdit[Sources] → Ingest Gateway → Sanitizers → Classifiers/PII → Chunker
→ Embedder (offline, no internet) → Sign + Metadata + ACL
→ Vector DB (per-tenant collections, KMS-encrypted)
→ Retriever (policy-aware filters + reranker)
→ LLM (immutable system prompt + guardrails + citations-required)
→ Tool Sandbox (allowlists, dry-run simulators, egress policy)
→ Telemetry Bus → SIEM/SOAR (approvals for high-risk actions)
Key controls by layer
Ingest & sanitize
- Canonicalize and strip active content: remove
<script>,<style>, event handlers, iframes, forms, data URLs; block macro office docs. - Convert PDFs via hardened pipeline; reject mixed/unknown MIME.
- Deduplicate, normalize whitespace/zero-width chars, standardize quotes & casing to defeat embedding gaming.
- PII/secret scanning (entropy + patterns) → redact or label.
Provenance & signing
- For each chunk store:
sha256,source_uri,owner,timestamp,tenant_id,labels,pii_flags,parser_version. Sign this record; reject unsigned writes.
Vector DB
- Per-tenant collections or required
tenant_idfilter + row-level security; encrypt at rest with KMS. - Separate write and read identities; short-lived tokens; audit all writes.
- Disable cross-collection ANN unless filter-aware; pin
top_kand score thresholds.
Retriever
- First pass policy filter (tenant, label, recency) then BM25/ANN → cross-encoder reranking.
- Cap
top_k(e.g., 6–8), enforce freshness for fast-moving domains (e.g., 7–30 days). - Canary filter: reject chunks matching injection regexes (e.g., “ignore all previous”, “copy all data to”, base64 blobs).
LLM guardrails
- Immutable system prompt (not user editable).
- Citations required: response must include document IDs + quotes; if coverage < threshold → abstain.
- Schema-validated JSON output with
claims[] {text, evidence[], confidence}. - Safety/policy classifiers for PII/PHI/secrets/toxicity before rendering.
Tools sandbox
- Explicit allowlist of tools; strict input schemas; dry-run simulators; no raw shell.
- Network egress policy (DNS/HTTP allowlists); block external posts by default.
- Human-in-the-loop (HITL) approval for destructive ops (revoke tokens, isolate host, delete objects).
Telemetry & response
- Log prompts, retrieved chunk IDs, tool calls, final output, user identity, IP/ASN.
- Route high-risk events to SOAR with approvals and circuit-breakers.
4) Copy-paste patterns (policies, code & detections)
4.1 Chunk schema (signed metadata)
jsonCopyEdit{
"chunk_id": "doc_42#p3#c7",
"sha256": "…",
"tenant_id": "acme",
"labels": ["internal","finance"],
"source_uri": "https://wiki.acme.local/fin/q3.md",
"timestamp": "2025-08-10T11:22:00Z",
"pii_flags": ["iban"],
"parser_version": "pdf2txt-1.8.4",
"signature": "ed25519:…"
}
4.2 Retrieval policy (OPA/Rego)
regoCopyEditpackage rag.retrieval
default allow = false
allow {
input.user.tenant_id == input.query.tenant_id
some c
c := input.candidates[_]
c.tenant_id == input.user.tenant_id
not c.labels[_] == "restricted"
time.now_ns() - time.parse_rfc3339_ns(c.timestamp) < 30 * 24 * 60 * 60 * 1e9 # 30d freshness
}
4.3 Sanitizer (Python, sketch)
pythonCopyEditfrom bs4 import BeautifulSoup
ALLOWED = {"p","h1","h2","h3","ul","ol","li","code","pre","a","strong","em","table","tr","td"}
def sanitize_html(raw):
soup = BeautifulSoup(raw, "lxml")
for tag in soup.find_all(True):
if tag.name not in ALLOWED:
tag.decompose()
for attr in list(tag.attrs):
if attr not in {"href"}: del tag[attr]
return soup.get_text("\n")
4.4 Evidence-required output (LLM prompt suffix)
pgsqlCopyEditReturn JSON:
{ "answer": "...",
"claims":[{"text":"...", "evidence":[{"chunk_id":"...", "quote":"..."}], "confidence":0.0}],
"abstain": true|false }
Do NOT answer without evidence. If evidence coverage < 0.7 → abstain.
4.5 Vector DB anomaly queries
SQL (pgvector / similar) — sudden HTML writes
sqlCopyEditSELECT writer, COUNT(*) AS n
FROM chunks
WHERE mime IN ('text/html','text/markdown')
AND ts > now() - interval '1 hour'
GROUP BY writer
HAVING COUNT(*) > 200;
Splunk — suspicious base64 in sources
bashCopyEditindex=rag source="ingest" raw_document
| regex raw_document="(?i)ignore all previous|base64,[A-Za-z0-9/+]{80,}"
5) Evaluation & monitoring: measure what matters
- Groundedness: % of answer tokens supported by cited chunks.
- Coverage: fraction of retrieved chunks actually cited.
- Abstention rate: better to abstain than hallucinate.
- Attack success rate (ASR): red-team corpus (indirect injections) → blocked %.
- Vector churn & write burst: spikes = potential poisoning.
- Privacy leakage: PII/secret detectors on outputs (precision/recall).
- Latency & cost with guardrails on (budget reality).
6) Privacy & compliance
- Data minimization & retention for logs and prompts.
- Mask secrets/PII in prompts; prefer on-prem or VPC-hosted models for sensitive data.
- Map controls to NIST AI RMF, ISO/IEC 42001, SOC 2, GDPR (lawful basis, DSAR searchability).
7) Failure modes (and fixes)
- Citations exist but irrelevant → require quote spans and overlap scoring with question.
- Tenant cache leaks → per-tenant caches; include tenant_id in cache key.
- Reranker hallucination → pair with policy filter first; cap max tokens from a single source.
- Tool egress → explicit allowlists; block IP literals &
*.pastebin*/*bin*.
8) 30-60-90 day rollout
Days 1–30 (Foundations)
- Build ingest gateway + sanitizers; sign chunks; per-tenant collections; turn on telemetry.
- Add immutable system prompt + citations-required schema; disable external egress.
Days 31–60 (Guardrails & detections)
- OPA policy filters, freshness windows, canary regexes; PII/secret redaction; SOAR approvals for tools.
- Deploy red-team corpus for indirect injection; measure ASR, groundedness.
Days 61–90 (Automation & governance)
- Promote low-risk Q&A flows to auto-answer with abstention.
- Add drift dashboards, monthly model & policy reviews; link incidents → new rules & tests.
9) Quick checklist (printable)
- Active content stripped; MIME whitelist; PDF hardened
- Chunks signed with
sha256+ provenance; per-tenant collections - Policy-aware retrieval (tenant/labels/freshness) + reranker
- Immutable system prompt; citations required; abstain on low evidence
- Tool sandbox with allowlists, dry-runs, HITL approvals
- Telemetry of prompts, chunks, tool calls; SIEM rules for vector poisoning
- Red-team injections; KPIs: groundedness, ASR, abstention, leakage
- Compliance mapped (NIST AI RMF / ISO 42001 / SOC 2 / GDPR)
Closing
RAG security is data security + retrieval policy + safe generation + tool isolation. Get those four right and most real-world attacks—indirect injection, vector poisoning, privacy leaks, tool abuse—lose their teeth. This is Zero-Trust AI for knowledge workflows.
Leave a comment