Executive summary
DevSecOps promises to shift security left and automate everything. In practice, many programs stall: scanners create noise, pipelines slow delivery, and attackers still get in through the supply chain, leaked secrets, or cloud drift. This guide maps where DevSecOps fails, shows how those failures turn into exploits, and gives a concrete, copy-paste blueprint to make security “boring”: paved-road automation, strong guardrails, and runtime verification.
1) The DevSecOps failure map
Culture & ownership
- Security as gatekeeper – tickets thrown over the wall; no shared SLOs.
- No paved road – dozens of tool options; teams “opt-out” of controls.
- Vanity metrics – “issues closed” and “scans run” instead of MTTR for vulns and exploitability.
Pipeline & tooling
- SAST/DAST noise – false positives drown real bugs; teams disable checks.
- Secret sprawl – hard-coded keys in code, CI logs, containers, and Slack exports.
- SBOM theater – SBOMs generated but not enforced or linked to deploy artifacts.
- IaC blind spots – Terraform/Helm merged unreviewed; cloud accounts drift away from what the code declares.
- Container/K8s baseline off – --privileged, :latest, no seccomp/AppArmor, unsigned images.
- Supply-chain trust – pulling public images/libraries without provenance; mutable tags.
- Attestation gaps – no cryptographic link from source → build → image → deploy (SLSA/Sigstore).
- “Scan at PR only” – prod images drift; runtime deltas never rescanned.
Design & risk
- No threat modeling – features ship without abuse-case review or STRIDE/ATT&CK mapping.
- Hardening after GA – default configs reach the internet; “we’ll secure in v2”.
- Identity sprawl – long-lived service tokens; wildcard RBAC; no device posture for admin.
- Misplaced AI – LLMs commit code or triage alerts without evidence; prompt injection and hallucinations create risk.
Run & response
- Runtime invisibility – no eBPF/Falco policies; no canaries; no API abuse analytics.
- Change without approval – hotfixes bypass the pipeline; unsigned artifacts in prod.
- Unrehearsed IR – backups not restorable, no break-glass, no isolation playbook.
- Shadow SaaS – teams spin up unvetted SaaS with prod data.
- Compliance, not security – controls exist on paper; attackers walk through working exceptions.
2) How these failures become exploits (real chains)
A) “Modern supply chain” compromise
- Developer installs a typosquatted npm package with post-install script → CI token theft → attacker pushes trojaned image to your private registry → runtime container with :latest tag pulls it automatically → secret exfil + crypto-mining.
B) Cloud drift breach
- IaC declares private S3 bucket; engineer toggles public-read in console at 2am to debug → drift not detected → weeks later, bucket indexed and exfiltrated.
C) K8s privilege escalation
- Team runs pod with hostPath to “debug” → symlink race writes to /etc/shadow → node compromised → kubelet certs stolen → cluster admin.
D) Secrets everywhere
- PR includes .npmrc token; CI logs echo env variables; artifact ends up in image layer → incident starts with a Git history scraper.
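Chain D often begins with a CI job that echoes its environment. A minimal sketch of masking such values before a line reaches the log, assuming secrets can be recognized by variable name (the name patterns and the `redact_line` helper are illustrative assumptions, not an exhaustive detector):

```python
import re

# Assumption: variable names containing these words hold secrets.
SENSITIVE_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_KEY)", re.IGNORECASE)
# Matches shell-style KEY=value lines, as printed by `env`.
ASSIGNMENT = re.compile(r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)=(?P<value>.*)$")


def redact_line(line: str) -> str:
    """Mask the value of KEY=value lines whose key looks sensitive."""
    match = ASSIGNMENT.match(line)
    if match and SENSITIVE_NAME.search(match.group("name")):
        return f"{match.group('name')}=***"
    return line
```

Name-based masking only covers the "CI logs echo env variables" step; it does nothing for tokens committed in files, which still need a scanner and revocation.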
3) Fixing DevSecOps: reference architecture
    Source (Git) → Signed CI (OIDC to cloud KMS) → Build SBOM + SAST + Secrets
      → Container build (rootless) + Image signing (cosign) + Provenance (SLSA)
      → Registry (signature/attestation policy) → Deploy via GitOps
      → Admission (Kyverno/Gatekeeper + Sigstore policy + Pod Security)
      → Runtime sensors (eBPF/Falco) + API/WAF → SIEM/SOAR
      → Continuous verification (drift, image digest, IaC guardrails)
Key principles
- Paved road: only one blessed pipeline with golden actions; everything else is opt-in after a security review.
- Evidence-required automation: tools must provide citations (file/line, CVE, package) and exploitability signals (KEV, reachable code).
- Cryptographic trust: sign source, builds, and images; verify at deploy.
- Runtime verification: assume scans miss things; detect abnormal behavior.
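The "evidence-required" principle can be sketched as a small gate. This is an illustration under stated assumptions: the `Finding` fields and the three outcome labels are hypothetical names, not the output format of any particular scanner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve: str         # e.g. "CVE-2021-44228"; empty if the tool gave none
    package: str     # affected package name
    location: str    # file:line evidence; empty if the tool gave none
    in_kev: bool     # listed in a known-exploited-vulnerabilities catalog
    reachable: bool  # vulnerable code is reachable from the application


def gate_decision(finding: Finding) -> str:
    """Return 'reject-no-evidence', 'block', or 'ticket' for one finding."""
    if not (finding.cve and finding.package and finding.location):
        # No citation: push back on the tool, do not interrupt developers.
        return "reject-no-evidence"
    if finding.in_kev or finding.reachable:
        return "block"  # exploitable: fail the pipeline
    return "ticket"     # real but not urgent: track, do not block
```

Only findings with both evidence and an exploitability signal block the build, which is what keeps the blocking gate's noise low.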
4) Copy-paste guardrails (use these today)
4.1 GitHub (or GitLab) branch protection
- Require PR reviews (min 1, code owner for sensitive paths)
- Require status checks: unit, SAST, secrets, SBOM, license
- Require signed commits (GPG/Sigstore) and verified authors
- Disallow force push; require linear history
4.2 Secret prevention & revocation
Pre-commit (detect before push):
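    pipx install detect-secrets
    pipx install pre-commit
    # Record existing findings so only new secrets fail the hook
    detect-secrets scan > .secrets.baseline
    # Needs a .pre-commit-config.yaml that lists the detect-secrets hook
    pre-commit install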
CI check:
    # Exits non-zero if any tracked file holds a secret that is not in the baseline
    git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline
Immediate response: revoke leaked keys; rotate CI & cloud tokens automatically via SOAR.
4.3 SBOM + image signing + policy
Build
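    # SBOM for the source tree (older Syft releases spell this "syft packages .")
    syft . -o spdx-json > sbom.spdx.json
    # Fail the build on critical/high findings
    trivy image --exit-code 1 --severity CRITICAL,HIGH "$IMAGE"
    # Sign with a KMS-held key; cosign's Google Cloud KMS scheme is gcpkms://
    cosign sign --key gcpkms://projects/…/locations/global/keyRings/…/cryptoKeys/… "$IMAGE"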
Admission (Kyverno) — require signature & immutable digest
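    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata: { name: require-signed-images }
    spec:
      validationFailureAction: Enforce
      rules:
        - name: check-signature
          match: { any: [ { resources: { kinds: ["Pod"] } } ] }
          verifyImages:
            - imageReferences: ["registry.example.com/*"]
              # Require digest references (Kyverno can rewrite tags via mutateDigest)
              verifyDigest: true
              # Trusted signer; without an attestor no signature is actually checked
              attestors:
                - entries:
                    - keys:
                        publicKeys: |-
                          -----BEGIN PUBLIC KEY-----
                          <your cosign public key>
                          -----END PUBLIC KEY-----
              # Provenance checks (e.g. SLSA) can be added under an attestations: block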
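Admission control is the enforcement point, but the same digest rule can be checked earlier, for example against rendered manifests in CI. A minimal sketch, assuming image references arrive as plain strings (the function names are hypothetical):

```python
import re

# An immutable reference ends in @sha256:<64 hex characters>.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """True if the image reference is pinned to a sha256 digest."""
    return bool(DIGEST_SUFFIX.search(image))


def unpinned_images(images: list[str]) -> list[str]:
    """Return the references that rely on a mutable tag (or no tag at all)."""
    return [image for image in images if not is_digest_pinned(image)]
```

A reference such as `registry.example.com/app:latest` is reported, while `registry.example.com/app@sha256:…` with a full 64-character digest passes.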
4.4 Pod Security (baseline → restricted)
    # Namespace labels
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: "restricted"
        pod-security.kubernetes.io/audit: "restricted"
        pod-security.kubernetes.io/warn: "restricted"
4.5 Kyverno hardening snippets
    # deny privileged pods & host namespaces; require non-root, read-only root FS
    - name: no-privileged
      match: { any: [ { resources: { kinds: ["Pod"] } } ] }
      validate:
        message: "Privileged/host namespaces forbidden"
        pattern:
          spec:
            =(hostPID): "false"
            =(hostNetwork): "false"
            containers:
              # securityContext is mandatory here so the two required fields cannot be skipped
              - securityContext:
                  =(privileged): "false"
                  runAsNonRoot: true
                  readOnlyRootFilesystem: true
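The same checks can run as a unit test against rendered manifests before they ever reach the cluster. A minimal sketch, assuming the pod manifest has already been parsed into a dictionary; `pod_violations` is a hypothetical helper, and for simplicity it only inspects container-level `securityContext`, not pod-level defaults or init containers:

```python
def pod_violations(pod: dict) -> list[str]:
    """List hardening violations in a parsed Pod manifest."""
    spec = pod.get("spec", {})
    problems = []
    # Host namespaces must be absent or false.
    for field in ("hostPID", "hostNetwork"):
        if spec.get(field, False):
            problems.append(f"{field} must be false")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("privileged", False):
            problems.append(f"{name}: privileged must be false")
        # These two must be set explicitly; a missing value counts as a violation.
        if context.get("runAsNonRoot") is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if context.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")
    return problems
```

An empty list means the pod would pass; each string names the offending container and field so the failure is actionable in a PR comment.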
4.6 Docker/Compose safe defaults
    security_opt:
      - no-new-privileges:true
      # Docker's default seccomp profile applies automatically; never set seccomp:unconfined
    cap_drop: ["ALL"]
    user: "10000:10000"
    read_only: true
    tmpfs: ["/tmp:rw,noexec,nosuid,nodev"]
4.7 IaC guardrails & drift
- tfsec/Checkov in CI; block 0.0.0.0/0, public buckets, unencrypted volumes.
- Drift: run terraform plan nightly in CI and compare to cloud; alert on delta.
- Policy as code with OPA for cloud (e.g., deny public ACLs).
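The nightly drift job can be sketched as a thin wrapper around `terraform plan -detailed-exitcode`, which exits 0 when there is no diff, 1 on error, and 2 when the plan contains changes. The wrapper names and the status labels are assumptions for illustration; how the alert is delivered is left out.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a status label."""
    if code == 0:
        return "in-sync"  # plan succeeded, no changes
    if code == 2:
        return "drift"    # plan succeeded, cloud differs from code
    return "error"        # plan itself failed


def check_drift(workdir: str) -> str:
    """Run a read-only plan in workdir and report the drift status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Running this against the main branch matters: on a feature branch a non-empty plan may just be an unapplied code change rather than a console edit.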
4.8 Runtime detection (Falco)
    - rule: Docker Socket Mount
      desc: Process inside a container touched /var/run/docker.sock
      condition: container.id != host and fd.name startswith /var/run/docker.sock
      output: "Docker socket access (proc=%proc.name file=%fd.name container=%container.name)"
      priority: CRITICAL
    - rule: K8s Privileged Container
      desc: Process started inside a privileged container
      condition: evt.type in (execve, execveat) and container.id != host and container.privileged=true
      output: "Privileged pod (k8s.ns=%k8s.ns.name k8s.pod=%k8s.pod.name)"
      priority: CRITICAL
4.9 API abuse analytics (reverse proxy/WAF)
- Rate-limit auth endpoints; block token brute-force; detect impossible travel; attach device posture to admin routes.
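Rate-limiting an auth endpoint usually comes down to a per-client sliding window. A minimal in-memory sketch, assuming a single process and a caller-supplied clock (the class name and the limits in the usage note are illustrative; a real proxy or WAF would keep this state in a shared store):

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the last window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted attempts

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop attempts that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_attempts:
            return False  # over the limit: reject and do not record
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(5, 60)` keyed on client IP plus username would cap login attempts at five per minute per pair, which blunts token and password brute-forcing without affecting other clients.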
5) Metrics that matter (replace vanity with value)
- Mean Time to Remediate (MTTR) for exploitable vulns (KEV, reachable).
- % signed & verified artifacts in prod; digest pinning rate.
- Secret exposure time (commit → revocation).
- Policy coverage: namespaces at restricted, images with cosign signatures.
- Drift MTTD for critical cloud resources.
- Runtime containment time (isolate pod/host).
- False-positive rate per scanner (aim <10% for blocking gates).
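Two of these metrics reduce to simple arithmetic over timestamps and counts. A minimal sketch, assuming events are available as (start, end) datetime pairs; the function names are hypothetical, and the same duration helper serves MTTR (detected → fixed) and secret exposure time (commit → revocation):

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average elapsed time across (start, end) pairs."""
    seconds = [(end - start).total_seconds() for start, end in pairs]
    return timedelta(seconds=mean(seconds))


def false_positive_rate(false_positives: int, total_findings: int) -> float:
    """Share of findings that turned out to be false positives."""
    if total_findings == 0:
        return 0.0
    return false_positives / total_findings
```

A scanner would qualify as a blocking gate under the target above when `false_positive_rate(...) < 0.10`.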
6) AI in DevSecOps — use with guardrails
- Evidence-required outputs: LLMs must cite file/line, CVE, and docs; allow abstain on low confidence.
- HITL approvals for destructive actions (revoke tokens, isolate nodes).
- RAG hygiene: sanitize docs, sign chunks, enforce freshness, prevent indirect prompt injection.
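The first two guardrails can be combined into one decision function. This is a sketch under assumptions: the `ProposedAction` shape, the action names, and the 0.8 confidence threshold are all hypothetical and would need tuning for a real system.

```python
from dataclasses import dataclass, field
from typing import Optional

# Actions the text above calls destructive; names are illustrative.
DESTRUCTIVE = {"revoke_token", "isolate_node"}


@dataclass
class ProposedAction:
    name: str
    evidence: list = field(default_factory=list)  # citations: file/line, CVE, docs
    confidence: float = 0.0                       # model-reported, 0.0 to 1.0
    approved_by: Optional[str] = None             # human approver, if any


def decide(action: ProposedAction, min_confidence: float = 0.8) -> str:
    """Return 'abstain', 'needs-approval', or 'execute'."""
    if not action.evidence or action.confidence < min_confidence:
        return "abstain"         # no citation or low confidence: do nothing
    if action.name in DESTRUCTIVE and action.approved_by is None:
        return "needs-approval"  # human in the loop for destructive steps
    return "execute"
```

The ordering is deliberate: an uncited proposal never reaches a human approver, so reviewers only see requests that already carry evidence.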
7) 30-60-90 day recovery plan
Days 1–30 (Stabilize)
- Lock the deployment path to a single paved pipeline; require PR reviews & signed commits.
- Turn on secret scanning, SBOM + Trivy, cosign signing; block :latest.
- Enforce Pod Security restricted in all namespaces; deploy Falco to prod.
Days 31–60 (Assurance)
- Add Kyverno/Gatekeeper policies; require signatures & immutable digests.
- Introduce IaC policy and nightly drift detection.
- Adopt SLSA provenance for builds; OIDC → KMS keys for signing.
Days 61–90 (Resilience)
- Red-team: typosquat package → CI token → registry → deploy; measure detection and rollback.
- Drill break-glass & isolation runbooks; measure runtime containment time.
- Publish a security scorecard dashboard with the metrics above.
8) Top 10 DevSecOps failure smells (print & pin)
- --privileged containers or hostPath everywhere.
- :latest tags in prod; unsigned images.
- Public registries in production without allow-lists.
- SBOMs exist but deploy still allowed when critical vulns present.
- Secrets in env vars and git history.
- Exposed CI/CD tokens; runners with long-lived credentials.
- No NetworkPolicy; flat east-west traffic.
- API server with anonymous or overly broad RBAC.
- Drift accepted as normal; Terraform plans not enforced.
- “Security” is a Jira queue, not a platform.
Final take
DevSecOps fails when it’s optional. Make the secure path the fastest path: paved pipeline, signed artifacts, enforced policy, and runtime verification. Measure outcomes, not activities. When you do, breaches become noisy, containable, and recoverable.