Hypothesis-Driven Threat Hunting: Garet P.

// introduction

The Problem with Anomaly Fishing

Two dominant hunting models exist in practice. Anomaly-based hunting: pull data, look for statistical outliers, hope something interesting surfaces. High resource cost, unpredictable yield, no analytical framework connecting findings to threat actors or techniques. Hypothesis-driven hunting: start with a specific, falsifiable claim about adversary behavior in your environment, then design the hunt to confirm or refute it.

Anomaly fishing persists because it feels thorough and occasionally finds real threats. But it scales poorly, cannot be measured, and produces no institutional knowledge when it finds nothing. A hunt without a hypothesis is not hunting; it is reactive analysis with extra steps. The analytical work that makes hunting valuable happens before the first query is written, not during it.

// foundation

What a Hunt Hypothesis Actually Is

A hunt hypothesis is a falsifiable, specific claim about adversary behavior that is expected to leave observable evidence in available telemetry. It has three requirements. Falsifiable: the hunt can produce a result that confirms or refutes it. Specific: it names the technique, the expected evidence, and the telemetry source. Observable: the expected behavior must produce artifacts that exist in your current telemetry.

❌ Not a hypothesis

"Look for unusual PowerShell activity across endpoints."

Too broad to be falsifiable. "Unusual" is undefined. No specific technique, no expected evidence, no telemetry anchor. A hunt built on this will produce noise or nothing.

❌ Not a hypothesis

"Check if any of our vendors were in the recent breach report."

This is a threat intelligence lookup, not a hunt. It produces no observable evidence claim about your environment.

✓ A hypothesis

"If a threat actor consistent with APT29's documented tradecraft has accessed our environment, we expect to find evidence of WMIC or mshta.exe spawning cmd.exe or PowerShell on endpoints accessed by privileged accounts in the last 30 days, correlatable to authentication events outside business hours."

Specific technique. Named threat actor profile. Expected artifact type. Defined telemetry sources. Time-bounded. Falsifiable: if no such process chains exist, the hypothesis is refuted for this period.

The structure of a well-formed hypothesis: IF [adversary condition] THEN [observable behavior] IN [specific telemetry] WITHIN [scope and time bound]. The time bound matters: an unbounded hunt cannot be closed, while a time-bounded hunt has a defined completion state.
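One way to make the IF/THEN/IN/WITHIN structure checkable is to hold each part in its own field, so that a missing part is visible before any query exists. The sketch below is illustrative only: the class name HuntHypothesis, its field names, and the missing_parts helper are assumptions made for this example, not part of any published framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntHypothesis:
    """One hunt hypothesis in IF / THEN / IN / WITHIN form (hypothetical model)."""
    adversary_condition: str   # IF
    observable: str            # THEN
    telemetry: str             # IN
    scope: str                 # WITHIN: environment subset
    window_days: int           # WITHIN: time bound

    def missing_parts(self) -> list[str]:
        """Return the names of any parts left empty or unbounded."""
        missing = [
            name
            for name in ("adversary_condition", "observable", "telemetry", "scope")
            if not getattr(self, name).strip()
        ]
        if self.window_days <= 0:
            missing.append("window_days")
        return missing

    def render(self) -> str:
        """Write the hypothesis out as a single IF/THEN/IN/WITHIN sentence."""
        return (
            f"IF {self.adversary_condition}, "
            f"THEN we expect to observe {self.observable}, "
            f"IN {self.telemetry}, "
            f"WITHIN {self.scope} over the last {self.window_days} days."
        )


# The "unusual PowerShell activity" non-hypothesis, expressed in this model.
vague = HuntHypothesis("", "unusual PowerShell activity", "", "all endpoints", 0)
print(vague.missing_parts())  # ['adversary_condition', 'telemetry', 'window_days']
```

A statement that leaves any part empty is not yet a hypothesis in the sense used here, and a non-positive time window means the hunt has no completion state.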

// methodology

The Four Input Sources

Hypotheses are derived from four structured input sources, each with different characteristics.

Threat Intelligence

Actor profiles, campaign reports, and ISAC advisories with TTP detail: direct behavioral evidence from real intrusions. IOCs are not useful here; technique procedures are.

Actor-specific hunts

MITRE ATT&CK

Structured TTP taxonomy with technique procedures and detection guidance. ATT&CK Navigator maps existing coverage against the matrix to surface gaps.

Coverage gap analysis

Incident History

Your own prior incidents and post-mortems, the highest environmental fidelity of any input source. Most organizations do not systematically convert post-mortems into hunt hypotheses.

Recurrence and spread

Red Team Findings

Techniques that succeeded undetected during authorized assessments, the most direct evidence of detection gaps in your specific environment.

Verified gap hunts
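The coverage gap analysis mentioned under MITRE ATT&CK above reduces, at its simplest, to a set difference between the techniques the team cares about and the techniques that already have a working detection. The sketch below assumes both lists are available as ATT&CK technique IDs; the two example sets and the coverage_gaps function are hypothetical and do not represent ATT&CK Navigator's own data format.

```python
def coverage_gaps(of_interest: set[str], covered: set[str]) -> list[str]:
    """Techniques of interest that have no existing detection, sorted by ID."""
    return sorted(of_interest - covered)


# Hypothetical: techniques surfaced by CTI, incident history, or red team findings.
techniques_of_interest = {"T1558.003", "T1021.002", "T1570", "T1546.003"}

# Hypothetical: techniques with at least one validated detection rule.
covered_by_detections = {"T1021.002", "T1059.001"}

print(coverage_gaps(techniques_of_interest, covered_by_detections))
# ['T1546.003', 'T1558.003', 'T1570']
```

Each ID in the result is a candidate starting point for a hypothesis, subject to the telemetry check that comes later in the derivation process.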

// methodology

The Six-Step Derivation Process

Select input source

CTI report, ATT&CK coverage gap, incident history, or red team finding.

Extract technique

A specific TTP, not an actor name or general behavior area. Map to an ATT&CK technique ID where possible.

Telemetry feasibility gate

The most frequently skipped and most critical step. Confirm the required telemetry is collected, retained for the hunt period, and queryable. A hypothesis about WMI persistence is not huntable if Sysmon EIDs 19–21 are not enabled; in that case the correct output is a telemetry gap finding, not a hunt.

Define the observable

The specific evidence pattern the hunt will search for: artifact type, field values, expected combinations.

Scope and bound

Environment subset, user population, time window. A defined scope creates a completion state.

State the hypothesis in writing

IF/THEN/IN/WITHIN, falsifiable and written down before any query is written. The query is derived from the hypothesis, not the other way around.
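The telemetry feasibility gate in the third step can be sketched as a check against a telemetry inventory. This is a minimal illustration under stated assumptions: the TelemetrySource record, the inventory keys, and the retention figures are all hypothetical, and the three return values simply reuse the confirmed / partial / blocked wording from the hypothesis template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetrySource:
    collected: bool       # is the event source enabled at all?
    retention_days: int   # how far back can it be queried?
    queryable: bool       # is it searchable from the hunt platform?


def feasibility(required, inventory, window_days):
    """Return 'confirmed', 'partial', or 'blocked' for a planned hunt."""
    if not required:
        raise ValueError("a hunt must name at least one telemetry source")

    def usable(name):
        src = inventory.get(name)
        return (
            src is not None
            and src.collected
            and src.queryable
            and src.retention_days >= window_days
        )

    ok = [name for name in required if usable(name)]
    if len(ok) == len(required):
        return "confirmed"
    return "partial" if ok else "blocked"


# Hypothetical inventory: 4769 is collected, Sysmon WMI events are not.
inventory = {
    "security_4769": TelemetrySource(collected=True, retention_days=90, queryable=True),
    "sysmon_19_21": TelemetrySource(collected=False, retention_days=0, queryable=False),
}

print(feasibility(["security_4769"], inventory, window_days=60))  # confirmed
print(feasibility(["sysmon_19_21"], inventory, window_days=30))   # blocked
```

A "blocked" result is the point at which the work stops being a hunt and becomes a telemetry gap finding.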

// worked example

From CTI Report to Executed Hunt

Input: a CTI report describes a financially motivated threat actor targeting financial services organizations using Kerberoasting for credential access followed by lateral movement via PsExec to backup infrastructure.

Step 1: sector relevance: financial services, relevant. Proceed.

Step 2: technique extraction: two techniques with distinct hunt requirements. T1558.003 (Kerberoasting) and T1021.002 / T1570 (lateral movement via PsExec to backup servers).

Step 3: telemetry feasibility: Kerberoasting detection requires Windows Security Event ID 4769, filtered on RC4 encryption type (0x17); confirm it is collected and retained for the hunt window. PsExec detection requires service installation or process creation events on destination hosts showing PSEXESVC; confirm EDR coverage on backup infrastructure, which is frequently absent.

Hypothesis A: "If a threat actor consistent with this campaign has operated in our environment, we expect to find Event ID 4769 records showing service tickets requested for service accounts with RC4 (0x17) encryption type from workstations that do not typically request service tickets, within the last 60 days."
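The observable in Hypothesis A can be sketched as a filter over already-exported events. The sketch assumes each event is a dict keyed by the Event 4769 field names (TicketEncryptionType, ServiceName, IpAddress), that the export is already restricted to the 60-day window, and that a baseline of hosts that normally request service tickets exists. Skipping computer accounts (names ending in "$") and krbtgt is an added assumption to keep the result focused on service accounts; the sample events and addresses are invented for illustration.

```python
RC4_HMAC = "0x17"


def kerberoast_candidates(events, baseline_sources):
    """Return (source, service) pairs matching the Hypothesis A observable."""
    hits = []
    for ev in events:
        if ev.get("EventID") != 4769:
            continue
        if str(ev.get("TicketEncryptionType", "")).lower() != RC4_HMAC:
            continue
        service = ev.get("ServiceName", "")
        if service.endswith("$") or service.lower() == "krbtgt":
            continue  # assumption: computer accounts and krbtgt are out of scope
        source = ev.get("IpAddress", "")
        if source in baseline_sources:
            continue  # this host normally requests service tickets
        hits.append((source, service))
    return hits


# Invented sample data for illustration.
events = [
    {"EventID": 4769, "TicketEncryptionType": "0x17", "ServiceName": "svc-sql", "IpAddress": "10.0.5.23"},
    {"EventID": 4769, "TicketEncryptionType": "0x12", "ServiceName": "svc-web", "IpAddress": "10.0.5.23"},
    {"EventID": 4769, "TicketEncryptionType": "0x17", "ServiceName": "svc-sql", "IpAddress": "10.0.1.10"},
    {"EventID": 4769, "TicketEncryptionType": "0x17", "ServiceName": "FILESRV01$", "IpAddress": "10.0.5.23"},
]
baseline = {"10.0.1.10"}

print(kerberoast_candidates(events, baseline))  # [('10.0.5.23', 'svc-sql')]
```

An empty result from this filter only counts as a refutation if the feasibility check already confirmed that 4769 events were collected for the whole window.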

Hypothesis B: "If lateral movement to backup infrastructure occurred, we expect to find PSEXESVC service creation events or psexec process creation on backup servers from source hosts outside the defined backup administration workstation group, within the last 60 days."
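Hypothesis B can be sketched the same way. The sketch assumes the service-creation and process-creation events have already been normalized into records with hypothetical fields name, dest_host, and source_host, where source_host has been resolved by correlating with logon events; raw service-creation events do not carry the source host on their own. The host names are invented, and a simple name match like this one will miss a renamed PsExec service.

```python
def psexec_outside_admin_group(records, backup_servers, admin_workstations):
    """Return records matching the Hypothesis B observable."""
    hits = []
    for rec in records:
        name = rec["name"].lower()
        if "psexesvc" not in name and "psexec" not in name:
            continue
        if rec["dest_host"] not in backup_servers:
            continue
        if rec["source_host"] in admin_workstations:
            continue  # expected backup administration path
        hits.append(rec)
    return hits


# Invented sample data for illustration.
backup_servers = {"bkp-01", "bkp-02"}
admin_workstations = {"adm-ws-01"}
records = [
    {"name": "PSEXESVC", "dest_host": "bkp-01", "source_host": "adm-ws-01"},
    {"name": "PSEXESVC.exe", "dest_host": "bkp-02", "source_host": "fin-ws-17"},
    {"name": "PSEXESVC", "dest_host": "web-03", "source_host": "fin-ws-17"},
]

for hit in psexec_outside_admin_group(records, backup_servers, admin_workstations):
    print(hit["source_host"], "->", hit["dest_host"])  # fin-ws-17 -> bkp-02
```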

Null result handling: if Hypothesis A produces no findings, document the absence as a negative finding with the query, data range, and data sources checked. If Event ID 4769 is not being collected at all, the output is a telemetry gap finding, not a clean bill of health. If backup infrastructure has no EDR coverage, document as a coverage gap. That is itself a risk finding.

// analysis

The Null Result Problem

Most hunt programs treat null results as failures. This is analytically wrong. A null result from a well-formed hypothesis is evidence: it bounds where the organization has looked and informs residual risk assessment. But not all null results are equal.

True negative: the technique was not observed. Telemetry was present and complete. Confidence: high. The organization can assert this technique was not used in this scope and period.
Telemetry-limited negative: the technique may have occurred but the required telemetry was absent or incomplete. Confidence: low. Correct output: a telemetry gap finding, not a clean result.
Scope-limited negative: technique not observed within the defined scope, but the scope excluded environments where it could have occurred. Confidence: partial. The finding needs a scope caveat.

A telemetry-limited negative presented as a true negative to leadership is a false assurance, and one that will be discovered when an incident occurs in exactly the uncovered area.
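The three kinds of null result can be made explicit in the hunt record so that a limited negative is never reported as a clean one. The sketch below is one possible encoding: the NullResult enum and the classify_null_result function are hypothetical, and checking telemetry before scope is an assumption made here, on the reasoning that missing telemetry undermines the result more than a narrow scope does.

```python
from enum import Enum


class NullResult(Enum):
    # (label, confidence) pairs taken from the three categories above.
    TRUE_NEGATIVE = ("true negative", "high")
    TELEMETRY_LIMITED = ("telemetry-limited negative", "low")
    SCOPE_LIMITED = ("scope-limited negative", "partial")

    @property
    def confidence(self) -> str:
        return self.value[1]


def classify_null_result(telemetry_complete: bool, scope_complete: bool) -> NullResult:
    """Classify a hunt that found nothing; telemetry is checked first (assumption)."""
    if not telemetry_complete:
        return NullResult.TELEMETRY_LIMITED
    if not scope_complete:
        return NullResult.SCOPE_LIMITED
    return NullResult.TRUE_NEGATIVE


result = classify_null_result(telemetry_complete=False, scope_complete=True)
print(result.name, result.confidence)  # TELEMETRY_LIMITED low
```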

// remediation

Hunt Outputs That Feed the Program

A completed hunt should produce at least one of four structured outputs. The output type determines routing and downstream program impact.

Adversary finding: confirmed or suspected malicious activity requiring escalation. Routes to IR escalation.

Detection candidate: a hunt query that produced signal worth converting to a persistent detection rule. Routes to detection engineering.

Telemetry gap: required evidence was absent, incomplete, or not retained for the hunt period. Routes to the coverage backlog.

Negative finding: hypothesis refuted with high-confidence telemetry, absence of activity confirmed. Routes to the hunt record library.

The detection candidate output is the highest long-term value product of a hunt program. Every confirmed technique finding that is not converted to a detection rule is institutional knowledge that will need to be re-discovered the next time the technique appears.
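The routing of the four output types can be written down as a simple lookup, which also enforces that a completed hunt produces at least one output. The ROUTING table mirrors the four outputs and destinations listed above; the route function and its error behavior are assumptions for illustration.

```python
ROUTING = {
    "adversary finding": "IR escalation",
    "detection candidate": "detection engineering",
    "telemetry gap": "coverage backlog",
    "negative finding": "hunt record library",
}


def route(outputs):
    """Map a completed hunt's outputs to their destinations."""
    if not outputs:
        raise ValueError("a completed hunt should produce at least one output")
    unknown = [o for o in outputs if o not in ROUTING]
    if unknown:
        raise ValueError(f"unrecognized output type(s): {unknown}")
    return [ROUTING[o] for o in outputs]


# A hunt that confirmed a technique and also exposed a collection gap.
print(route(["detection candidate", "telemetry gap"]))
# ['detection engineering', 'coverage backlog']
```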

// template

Hypothesis Template

IF [actor/technique condition] is present in our environment,
THEN we expect to observe [specific artifact/behavior],
IN [telemetry source and specific fields],
WITHIN [environment scope] over [time window].

Telemetry feasibility: [confirmed / partial / blocked]
Priority: [High / Medium / Low] - [rationale]

// conclusion

The Hypothesis Is the Product, Not the Query

"A hunt without a hypothesis is not hunting. It is reactive analysis with extra steps, and no way to know what you proved when you finish."

The query is an implementation detail. The hypothesis is the analytical work. A hunter who writes the query first and derives the hypothesis afterward has done the steps in the wrong order and will produce findings that cannot be confidently interpreted.

The closing test: look at the last five hunts your team completed. Can you state the hypothesis each one was testing? Can you state whether the hypothesis was confirmed, refuted, or telemetry-limited? If not, those hunts consumed analyst time and produced findings that cannot be built upon.

Key references

MITRE ATT&CK Navigator for coverage gap analysis; Sqrrl's "A Framework for Cyber Threat Hunting" (2015); TaHiTI Threat Hunting Methodology; David Bianco's Pyramid of Pain for IOC vs. TTP hunting value comparison.