Container & Kubernetes Security: Real-World Vulnerabilities, Exploit Paths, and a Defense Blueprint By CyberDudeBivash — Founder, CyberDudeBivash

Executive summary

Containers and Kubernetes move fast, and so do attackers. Most real incidents stem from misconfiguration, weak identity, and supply-chain compromise rather than exotic kernel 0-days, but container escapes and API server flaws still happen. This guide maps the attack surface of Docker/containerd/runc and of the Kubernetes control and worker planes, walks through common exploit paths, and gives copy-paste hardening, detection, and response steps you can implement today.

1) How the stack fits together (threat model)

Container runtime path: image → containerd/CRI-O → runc (creates Linux namespaces/cgroups) → kernel.
Kubernetes control plane: API Server ⇄ etcd (state); the Controller Manager and Scheduler talk only to the API Server, never to etcd directly. Nodes run kubelet + CNI (network) + CSI (storage) + runtime.

Trust boundaries (red zones):

Container ↔ host kernel (escapes, privilege escalation).
Kubelet / API Server (authn/authz mistakes, open ports).
etcd (cluster secrets).
Admission & Supply chain (malicious images, mutable tags).
CNI/Network (flat east-west traffic).
Volumes / hostPath (symlink & subPath tricks).

2) Container vulnerabilities & exploit patterns

2.1 Runtime escapes (runc/containerd)

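runc “/proc/self/exe” overwrite and leaked file descriptors (CVE-2019-5736; CVE-2024-21626, “Leaky Vessels”): a malicious container overwrites the host runc binary through /proc/self/exe, or starts with its working directory set to a leaked host FD, to gain host execution or host filesystem access.
containerd CRI bugs (e.g., CVE-2022-23648): a crafted image config with volume entries can expose read-only copies of arbitrary host files and directories, including Kubernetes secrets, to the container.
Kernel bugs reachable from inside a container's namespaces: use-after-free and overlayfs flaws that let a containerized process escalate privileges or write to host files.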

Exploit flow (typical): run a malicious image → trigger runc/containerd bug during docker exec or pod start → obtain host shell → pivot to node credentials → cluster admin via kubelet API or cloud IAM.
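A practical first step against these runtime CVEs is inventorying runc versions across nodes. The sketch below parses the output of `runc --version` and flags anything older than 1.1.12, the upstream runc release that fixed CVE-2024-21626. It is a heuristic under stated assumptions: distribution packages often backport fixes without changing the upstream version string, so treat a hit as "verify this node", not proof of exposure.

```python
import re

# runc 1.1.12 is the upstream release that fixed CVE-2024-21626 ("Leaky Vessels").
FIXED_RUNC = (1, 1, 12)

def parse_runc_version(output: str):
    """Return (major, minor, patch) from `runc --version` output, or None if absent.
    Pre-release suffixes such as -rc92 are ignored in this sketch."""
    match = re.search(r"runc version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

def runc_needs_patch(output: str) -> bool:
    """Flag runc builds older than FIXED_RUNC; unparseable output is flagged too."""
    version = parse_runc_version(output)
    return version is None or version < FIXED_RUNC
```

Unknown output is deliberately treated as unpatched so that nodes you cannot inspect still show up in the report.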

2.2 Capability & privilege abuse

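Containers running as root with CAP_SYS_ADMIN (or --privileged) can mount filesystems, manipulate cgroups (e.g., the cgroup v1 release_agent escape), or open host devices under /dev to break isolation.
Without seccomp/AppArmor/SELinux, dangerous syscalls (e.g., ptrace, bpf) stay reachable. Note that Kubernetes runs containers seccomp-Unconfined by default unless a seccompProfile is set or the kubelet's seccomp default is enabled.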
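To see what a running container can actually do, read its effective capability mask and seccomp mode from /proc/self/status. This sketch decodes the CapEff hex bitmask using the bit numbers from linux/capability.h and reports a hand-picked (hypothetical, adjust to your threat model) list of escape-relevant capabilities; the Seccomp field is 0 for disabled, 1 for strict, and 2 for filter.

```python
# Bit numbers from linux/capability.h for capabilities often abused in escapes
# (a hand-picked list; adjust it to your own threat model).
ESCAPE_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
    39: "CAP_BPF",
}
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}

def audit_proc_status(status_text: str) -> dict:
    """Decode CapEff and Seccomp from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    cap_eff = int(fields.get("CapEff", "0"), 16)
    caps = sorted(name for bit, name in ESCAPE_CAPS.items() if cap_eff & (1 << bit))
    seccomp = SECCOMP_MODES.get(int(fields.get("Seccomp", "0")), "unknown")
    return {"dangerous_caps": caps, "seccomp": seccomp}
```

Inside a container, `audit_proc_status(open("/proc/self/status").read())` shows what that process holds. Docker's default capability set (CapEff 00000000a80425fb) yields an empty list, while a --privileged container typically reports all seven.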

2.3 hostPath & subPath volume attacks

hostPath mounts expose host directories; combined with symlink races → arbitrary host file write.
subPath volume handling historically hit symlink/TOCTOU issues (e.g., CVE-2021-25741 class).
Windows HostProcess containers run directly on the node; if misused, they grant access to host services and the host filesystem.
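The symlink problem is easiest to see as a containment check. The sketch resolves a path under a mounted base directory with os.path.realpath and tests whether the result still lies under that base. Note the limitation that defines this bug class: checking and then opening is itself a TOCTOU race, because an attacker can swap the symlink in between. The robust pattern resolves and opens in a single kernel call, e.g., openat2() with RESOLVE_BENEATH on Linux 5.6+.

```python
import os

def resolves_inside(base: str, relative: str) -> bool:
    """True if base/relative, with every symlink resolved, still lies under base.
    This is a point-in-time check only; it cannot prevent a later symlink swap."""
    base_real = os.path.realpath(base)
    target_real = os.path.realpath(os.path.join(base_real, relative))
    return os.path.commonpath([base_real, target_real]) == base_real
```

A symlink such as `evil -> /etc` planted inside a shared volume makes `evil/shadow` resolve to the host's /etc/shadow, and the check returns False.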

2.4 Image & registry supply-chain

Typosquatted/malicious images on public registries.
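Mutable tags (e.g., :latest) can resolve to different content over time; a poisoned base image propagates through every Dockerfile FROM chain built on it.
Embedded secrets in layers (forgotten .npmrc, SSH keys), which remain in the image even if a later layer deletes them.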
Build system takeover (CI tokens → push trojaned images).
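Each image layer is a tar archive, and a later layer can only hide a file (via a whiteout entry), not remove it from earlier layers, so a secret "deleted" in a later build step still ships. The sketch below lists file names in one layer tarball that match a hypothetical pattern list; layer tarballs can be obtained by extracting the output of `docker save`, whose internal layout varies by Docker version. It matches names only; content-aware scanners (e.g., Trivy's secret scanning) go further.

```python
import fnmatch
import tarfile

# Hypothetical patterns for files that commonly carry credentials; extend as needed.
SECRET_PATTERNS = [
    "*.npmrc", "*id_rsa", "*id_ed25519", "*.pem",
    "*.env", "*.docker/config.json", "*.aws/credentials",
]

def find_secret_files(layer_tar_path: str) -> list:
    """List regular files in one layer tarball whose names match SECRET_PATTERNS."""
    hits = []
    # The default read mode uses transparent compression, so gzip layers work too.
    with tarfile.open(layer_tar_path) as layer:
        for member in layer.getmembers():
            if member.isfile() and any(fnmatch.fnmatch(member.name, p) for p in SECRET_PATTERNS):
                hits.append(member.name)
    return hits
```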

3) Kubernetes vulnerabilities & exploit paths

3.1 API server & aggregated APIs

Proxy/upgrade request mishandling can let attacker traffic reach backends (kubelets, aggregated API servers) authenticated as the API server itself (e.g., CVE-2018-1002105), yielding privilege escalation.
Over-permissive RBAC (e.g., create pods or read secrets in prod) turns any workload compromise into cluster admin.
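Over-permissive roles are easy to spot programmatically. The sketch below scans a Role or ClusterRole (as a dict, e.g., from `kubectl get clusterrole NAME -o json`) for a hypothetical list of escalation-prone grants, treating "*" as matching everything. It ignores apiGroups and the escalate, bind, and impersonate verbs, so use it for triage alongside `kubectl auth can-i --list` rather than as a complete audit.

```python
# Hypothetical list of grants that turn a pod compromise into cluster compromise.
RISKY_GRANTS = {
    "pods": {"create"},
    "pods/exec": {"create", "get"},
    "secrets": {"get", "list", "watch"},
    "rolebindings": {"create", "update", "patch"},
    "clusterrolebindings": {"create", "update", "patch"},
    "serviceaccounts/token": {"create"},
}

def risky_resources(role: dict) -> list:
    """Return risky resources granted by a Role/ClusterRole dict (apiGroups are ignored)."""
    found = []
    for rule in role.get("rules", []):
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        for resource, bad_verbs in RISKY_GRANTS.items():
            resource_hit = resource in resources or "*" in resources
            verb_hit = "*" in verbs or bool(verbs & bad_verbs)
            if resource_hit and verb_hit and resource not in found:
                found.append(resource)
    return found
```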

3.2 etcd exposure

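etcd stores all cluster state, including Secrets (only base64-encoded unless encryption at rest is configured). If etcd is reachable without TLS and client-certificate auth, a single prefix read of /registry/secrets/ is a full cluster compromise.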

3.3 Kubelet issues

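The legacy read-only port (10255) exposes pod specs and metrics without authentication. On the main port (10250), anonymous auth, AlwaysAllow authorization, or stolen kubelet credentials let an attacker exec into pods.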
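These kubelet settings map to fields in the KubeletConfiguration file (on kubeadm nodes typically /var/lib/kubelet/config.yaml): readOnlyPort, authentication.anonymous.enabled, authentication.webhook.enabled, and authorization.mode. Because defaults differ between command-line flags and the config file, this sketch treats any missing field as a finding. It expects the file already parsed into a dict (e.g., with PyYAML's yaml.safe_load, not shown).

```python
def kubelet_findings(cfg: dict) -> list:
    """Audit a parsed KubeletConfiguration; missing fields count as findings because
    defaults differ between the config file and command-line flags."""
    findings = []
    if cfg.get("readOnlyPort") != 0:
        findings.append("readOnlyPort is not explicitly 0 (10255 may be exposed)")
    authn = cfg.get("authentication", {})
    if authn.get("anonymous", {}).get("enabled", True):
        findings.append("anonymous authentication is not explicitly disabled")
    if not authn.get("webhook", {}).get("enabled", False):
        findings.append("webhook (TokenReview) authentication is not enabled")
    if cfg.get("authorization", {}).get("mode") != "Webhook":
        findings.append("authorization.mode is not Webhook")
    return findings
```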

3.4 CNI / Network

Flat networks let a compromised pod scan and reach every service.
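Any user allowed to create or patch Services can set spec.externalIPs to an arbitrary address and intercept (MITM) in-cluster traffic to it via kube-proxy DNAT rules (CVE-2020-8554, a design flaw with no upstream code fix).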
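Since CVE-2020-8554 is not patched in code, mitigation happens in admission (Kubernetes ships a DenyServiceExternalIPs admission plugin) or through audits of existing objects. The sketch below flags Service externalIPs that fall outside a CIDR allowlist, using Python's ipaddress module; the allowlist is a hypothetical input. It does not cover the related LoadBalancer status.ingress path.

```python
import ipaddress

def unexpected_external_ips(service: dict, allowed_cidrs: list) -> list:
    """Return spec.externalIPs of a Service that fall outside every allowed CIDR."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged = []
    for ip in service.get("spec", {}).get("externalIPs", []):
        address = ipaddress.ip_address(ip)
        # Membership is False (not an error) when IPv4 and IPv6 are mixed.
        if not any(address in net for net in networks):
            flagged.append(ip)
    return flagged
```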

3.5 Admission & policy gaps

Without Pod Security Admission or a policy engine (PodSecurityPolicy was removed in v1.25), pods that request privileged, hostPID, hostNetwork, or unsafe capabilities simply get them.
Missing image signature policy lets untrusted images in.

4) Realistic attacker playbooks

Playbook A — Malicious public image → node → cluster

The cluster pulls company/app:latest, which has been poisoned.
The container phones home, drops a toolkit, and reads its mounted service-account token.
If that service account can create pods or read secrets, the attacker spawns a privileged pod or steals cluster secrets.
From the node, the attacker queries the cloud metadata service for the node's IAM role credentials and escalates in the cloud account.

Playbook B — Exposed Docker API (2375) or kubelet creds

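An internet scan finds an unauthenticated Docker API (tcp/2375) or kubelet (tcp/10250); the attacker runs docker run --privileged -v /:/host against the exposed daemon, or execs into pods through the kubelet API.
The attacker writes an SSH key to /host/root/.ssh/authorized_keys; persistence achieved.
The attacker mass-deploys crypto-miners or exfiltrates data.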

Playbook C — hostPath/subPath write

Dev grants hostPath: /var/lib/kubelet/pods to sidecar for debugging.
Attacker symlinks to /etc/shadow or kubelet cert dir → writes controlled content.
Replaces node service or steals kubelet cert → cluster admin.

5) Hardening that actually works (copy-paste)

5.1 Container run options (least privilege)

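```yaml
# pod-security.yaml (snippet)
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  # Avoid host namespaces & devices unless absolutely necessary
  hostNetwork: false
  hostPID: false
  hostIPC: false
  securityContext:               # pod level
    runAsNonRoot: true
    runAsUser: 10000
    runAsGroup: 10000
    seccompProfile:
      type: RuntimeDefault       # or Localhost
  containers:
    - name: app
      image: registry.example.com/api@sha256:3c1f...   # pin by digest
      securityContext:           # these fields exist only at container level
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```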

5.2 Block dangerous mounts

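Avoid hostPath. If it is unavoidable, use type: Directory or File (the path must already exist, unlike DirectoryOrCreate), mount it readOnly: true, and allowlist the permitted paths.
Never mount /var/run/docker.sock, the containerd socket, kubelet directories (/var/lib/kubelet), /proc, or /sys into pods.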
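These hostPath rules can be checked mechanically. This sketch takes a pod spec as a dict (e.g., from `kubectl get pod NAME -o json`), normalizes each hostPath so `../` tricks cannot slip past the allowlist, and flags read-write mounts of hostPath volumes. The allowlist entry /var/log/app is a hypothetical example, and the check looks at the path string only; it does not resolve symlinks on the node.

```python
import posixpath

# Hypothetical allowlist; keep it as short as your workloads allow.
ALLOWED_HOSTPATH_PREFIXES = ["/var/log/app"]

def hostpath_violations(pod_spec: dict) -> list:
    """Flag hostPath volumes outside the allowlist and hostPath mounts that are read-write."""
    problems = []
    hostpaths = {v["name"]: v["hostPath"]["path"]
                 for v in pod_spec.get("volumes", []) if "hostPath" in v}
    for name, path in hostpaths.items():
        norm = posixpath.normpath(path)  # collapses /var/log/app/../../../etc to /etc
        if not any(norm == p or norm.startswith(p + "/") for p in ALLOWED_HOSTPATH_PREFIXES):
            problems.append(f"{name}: hostPath {norm} not in allowlist")
    for c in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
        for m in c.get("volumeMounts", []):
            if m["name"] in hostpaths and not m.get("readOnly", False):
                problems.append(f"{c['name']}: {m['name']} mounted read-write")
    return problems
```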

5.3 Admission control: Pod Security + policy engines

Pod Security Admission (v1.25+): label namespaces to baseline/restricted:

```yaml
metadata:
  labels:
    pod-security.kubernetes.io/enforce: "restricted"
    pod-security.kubernetes.io/audit: "restricted"
    pod-security.kubernetes.io/warn: "restricted"
```

Kyverno/Gatekeeper examples:

Deny privileged: true, hostNetwork: true, hostPID: true.
Require runAsNonRoot, readOnlyRootFilesystem, seccompProfile: RuntimeDefault.
Enforce image: registry.example.com/* and signature required (Sigstore/cosign).
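To make the policy intent concrete, here is a plain-Python stand-in for those rules over a pod spec dict; in a cluster you would express them as Kyverno ClusterPolicies or Gatekeeper constraints rather than run this. The registry prefix mirrors the example above, and signature verification is out of scope because it needs cosign or an admission controller. Container-level settings override pod-level ones, which the sketch models for runAsNonRoot and seccompProfile.

```python
def policy_violations(pod_spec: dict, registry: str = "registry.example.com/") -> list:
    """Plain-Python stand-in for the admission rules listed above."""
    violations = [f"{field}: true" for field in ("hostNetwork", "hostPID", "hostIPC")
                  if pod_spec.get(field)]
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        if sc.get("privileged"):
            violations.append(f"{name}: privileged")
        # Container-level fields override pod-level ones.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot not set")
        if not sc.get("readOnlyRootFilesystem", False):
            violations.append(f"{name}: readOnlyRootFilesystem not set")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {})).get("type")
        if seccomp not in ("RuntimeDefault", "Localhost"):
            violations.append(f"{name}: seccompProfile missing")
        if not c.get("image", "").startswith(registry):
            violations.append(f"{name}: image outside {registry}")
    return violations
```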

5.4 Image supply-chain controls

Build SBOMs (Syft) and scan (Trivy/Grype).
Sign images: cosign sign --key kms://… <image>; an admission controller (e.g., Sigstore policy-controller or Kyverno verifyImages) rejects unsigned images.
Pin immutable digests:

```yaml
image: registry.example.com/api@sha256:3c1f...   # not :latest
```

Block root in Dockerfiles: USER 10000:10000.
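Digest pinning is simple to enforce in CI. In this sketch, a reference counts as immutable only if it ends in `@sha256:` plus 64 lowercase hex characters; it does not check that the digest actually exists in the registry.

```python
import re

# A pinned reference ends in @sha256: followed by 64 lowercase hex characters.
DIGEST_PIN = re.compile(r"@sha256:[0-9a-f]{64}$")

def unpinned_images(pod_spec: dict) -> list:
    """Return images referenced by tag (mutable) rather than by digest."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c.get("image", "") for c in containers if not DIGEST_PIN.search(c.get("image", ""))]
```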

5.5 Node & runtime hardening

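Keep runc/containerd current; run AppArmor or SELinux in enforcing mode on every node.
Make seccomp RuntimeDefault the node-wide default (kubelet seccompDefault: true, or --seccomp-default; GA in v1.27).
Isolate node IAM roles; block cloud metadata access from non-system pods (e.g., on AWS require IMDSv2 with a hop limit of 1; elsewhere, firewall 169.254.169.254).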

5.6 Network policies (CNI)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: prod }
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
```

Then add allow policies per app (db only from app, egress only to needed APIs).
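Default-deny works because of how NetworkPolicy isolation is defined: a pod becomes isolated for a direction as soon as any policy in its namespace selects it with that direction in policyTypes, and allowed traffic is then the union of all matching rules. If policyTypes is omitted, Ingress is always implied and Egress only when the policy has egress rules. The sketch below models only the isolation half, only matchLabels selectors, and assumes all policies share the pod's namespace. Enforcement also requires a CNI plugin that implements NetworkPolicy.

```python
def selects(selector: dict, labels: dict) -> bool:
    """matchLabels only; matchExpressions is ignored in this sketch. {} selects every pod."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())

def policy_types(spec: dict) -> set:
    if "policyTypes" in spec:
        return set(spec["policyTypes"])
    # Default: Ingress always, Egress only if the policy declares egress rules.
    return {"Ingress", "Egress"} if spec.get("egress") else {"Ingress"}

def isolated(pod_labels: dict, policies: list, direction: str) -> bool:
    """True if any policy (assumed same namespace) selects the pod for this direction."""
    return any(
        selects(p["spec"].get("podSelector", {}), pod_labels) and direction in policy_types(p["spec"])
        for p in policies
    )
```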

5.7 etcd & control plane

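etcd: TLS with client-certificate auth (mTLS) only; isolate it on the control-plane network; enable Secret encryption at rest through the API server's --encryption-provider-config.
API server: audit logging, mTLS to admission webhooks, restricted anonymous auth, and request rate limiting (e.g., API Priority and Fairness).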
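The etcd side can be audited from its command line, for instance the command list in a kubeadm static pod manifest (/etc/kubernetes/manifests/etcd.yaml). This sketch checks the real etcd flags --cert-file, --key-file, --trusted-ca-file, --client-cert-auth, --peer-client-cert-auth, and --listen-client-urls; the peer check only matters for multi-member clusters, and a missing --listen-client-urls is flagged because etcd then listens on plaintext http://localhost:2379.

```python
def etcd_flag_findings(command: list) -> list:
    """Check an etcd command line for TLS and client-certificate (mTLS) settings."""
    flags = {}
    for arg in command:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            flags[key] = value if sep else "true"  # a bare boolean flag means true
    findings = []
    for required in ("cert-file", "key-file", "trusted-ca-file"):
        if not flags.get(required):
            findings.append(f"--{required} not set")
    for auth_flag in ("client-cert-auth", "peer-client-cert-auth"):
        if flags.get(auth_flag, "false").lower() != "true":
            findings.append(f"--{auth_flag} is not true")
    urls = flags.get("listen-client-urls")
    if urls is None:
        findings.append("--listen-client-urls not set (etcd defaults to plaintext localhost)")
    else:
        for url in urls.split(","):
            if url and not url.startswith("https://"):
                findings.append(f"plaintext client listener {url}")
    return findings
```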

6) Detection & response (ready-to-use)

6.1 Falco rules (container breakout attempts)

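```yaml
# Relies on the open_write and container macros from Falco's default ruleset;
# /host/ stands in for wherever a hostPath is mounted (an assumption).
- rule: Write to sensitive file in container
  desc: A process inside a container opens a sensitive file for writing
  condition: >
    open_write and container and
    (fd.name in (/etc/shadow, /etc/sudoers, /root/.ssh/authorized_keys) or fd.name startswith /host/)
  output: "Write to sensitive file (user=%user.name proc=%proc.name file=%fd.name container=%container.id image=%container.image.repository)"
  priority: CRITICAL
```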

6.2 K8s audit log – privileged pod creation

```json
{
  "stage": "ResponseComplete",
  "verb": "create",
  "objectRef": {"resource": "pods"},
  "requestObject": {
    "spec": {"containers": [{"securityContext": {"privileged": true}}]}
  }
}
```

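Alert on any such event outside break-glass namespaces. Note that requestObject is only recorded when the audit policy logs pods at the Request or RequestResponse level.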
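A minimal sketch of that alert over a JSON-lines audit log (the log backend writes one event per line). The break-glass namespace name is hypothetical, and subresource requests such as pods/exec are skipped because they also arrive as create on pods. Because controllers create pods on behalf of Deployments, the reported user will often be a controller identity rather than a human, so trace back through the owning workload.

```python
import json

BREAK_GLASS_NAMESPACES = {"break-glass"}  # hypothetical allowlist

def privileged_pod_creates(lines) -> list:
    """Return (namespace, name, username) for privileged pod creations in a JSON-lines audit log."""
    alerts = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        ref = event.get("objectRef", {})
        if (event.get("stage") != "ResponseComplete" or event.get("verb") != "create"
                or ref.get("resource") != "pods" or ref.get("subresource")):
            continue
        if ref.get("namespace") in BREAK_GLASS_NAMESPACES:
            continue
        spec = (event.get("requestObject") or {}).get("spec", {})
        containers = spec.get("initContainers", []) + spec.get("containers", [])
        if any((c.get("securityContext") or {}).get("privileged") for c in containers):
            alerts.append((ref.get("namespace"), ref.get("name"), event.get("user", {}).get("username")))
    return alerts
```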

6.3 Hunt queries

Container spawning shell from web svc:
- Parent: nginx/httpd/java/w3wp → Child: bash/sh/powershell/cmd
Exec storms: pods/exec request count per user or node in the audit log well above baseline.
Image drift: deployment digest ≠ last approved digest.
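A sketch of the exec-storm hunt: count unique pods/exec requests per user in the API server audit log and flag users well above a per-user baseline. The factor and floor thresholds and the baseline dict are hypothetical tuning knobs. Deduplicating on auditID matters because one request can emit several stage events. Exec calls made directly against a kubelet's port 10250 never pass through the API server, so they will not appear here and need node-level telemetry such as Falco.

```python
import json
from collections import Counter

def exec_storms(lines, baseline: dict, factor: float = 3.0, floor: int = 5) -> dict:
    """Return {username: exec_count} for users well above their baseline exec count."""
    counts = Counter()
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("objectRef", {}).get("subresource") != "exec":
            continue
        audit_id = event.get("auditID")
        if audit_id in seen:  # one request can emit several stage events
            continue
        seen.add(audit_id)
        counts[event.get("user", {}).get("username", "unknown")] += 1
    return {user: n for user, n in counts.items()
            if n >= floor and n > factor * baseline.get(user, 0)}
```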

6.4 Incident playbook (short)

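Contain: cordon the node (this only stops new scheduling) and isolate it at the network layer, or isolate the namespace; block egress via NetworkPolicy; revoke service-account tokens (delete legacy token Secrets; bound tokens become invalid once their pod or ServiceAccount is deleted).
Triage: collect kubectl describe pod output, container logs, node journalctl, Falco events, the kube-audit trail, and the image digest & SBOM.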
Scope: search for similar pods, suspicious admissions, unsigned images.
Recover: redeploy from signed images; rotate secrets; re-issue kubelet cert if node compromised.
Lessons: add/adjust admission policies; write regression tests.

7) Program plan (30–60–90)

Days 1–30 – Baseline

Enforce Pod Security restricted in all application namespaces (system namespaces need baseline or targeted exemptions).
Default-deny NetworkPolicy + egress allow-lists.
Block :latest; require cosign signatures for prod.
Patch runc/containerd; enable seccomp RuntimeDefault.

Days 31–60 – Supply-chain & policy

SBOMs + image scanning in CI; fail builds on critical vulns.
Kyverno/Gatekeeper policies (no privileged/host namespaces; require non-root).
etcd TLS/mTLS; Secret encryption at rest; rotate root certs.

Days 61–90 – Detection & drills

Deploy Falco/eBPF sensor; wire to SIEM; create dashboards for admissions, exec, privileged attempts.
Red-team: hostPath abuse, kubelet exec misuse, unsigned image admission.
Practice node compromise → cluster recovery.

8) Quick checklist

No privileged pods, no host namespaces, non-root users
Seccomp/AppArmor/SELinux enforced
NetworkPolicy default-deny + egress controls
Signed images, immutable digests, SBOM + scan
Admission control (Kyverno/Gatekeeper) enforcing baseline
etcd mTLS + encryption at rest; API audit logs on
Runtime patched (runc/containerd); kernel up-to-date
Falco/eBPF + SIEM detections; incident playbooks tested

Closing

Most “container hacks” are preventable: kill privileged pods, enforce policy at admission, sign what you run, and observe what runs. Do that, and even if a new runc/K8s CVE appears, your blast radius is small and your recovery is fast.

Cyberdudebivash