Triage Knowledge Transfer: Garet P.

// introduction

The Knowledge That Lives Only in One Person's Head

Every mature SOC has the same structural problem: triage quality is heavily analyst-dependent. A senior and a junior analyst looking at the same alert produce different outcomes, different escalation rates, and different investigation paths. This is usually framed as an "experience gap": inevitable, and closing naturally over time. That framing is accurate, and it is also an abdication. It treats the knowledge transfer problem as unsolvable rather than as an engineering problem.

The organizational cost: inconsistent triage produces missed escalations, unnecessary P1 noise, analyst burnout in junior tiers, and a SOC whose quality is directly correlated with staff retention, the single most volatile variable in the business.

Expert triage judgment is not mystical pattern recognition that cannot be articulated. It is a set of heuristics, decision rules, and environmental priors that have never been made explicit because nobody asked the senior analyst to articulate them. Making them explicit is a program design problem, and it has tractable solutions.

// cognitive

What Expert Triage Judgment Actually Is

Gary Klein's Recognition-Primed Decision (RPD) model explains it: experts do not evaluate options and choose the best one. They recognize situational patterns and match them to previously successful responses. Expert analysts are not doing slower, more rigorous analysis than junior analysts; they are doing faster, pattern-matched analysis that bypasses explicit deliberation. The result looks like intuition. The mechanism is an accumulated pattern library.

What the expert's pattern library contains that the junior's does not:

Environmental priors: knowledge of what is normal for this specific organization's infrastructure, not just what is normal generically
False positive signatures: accumulated knowledge of which alert types reliably produce benign explanations in this specific environment
Escalation triggers: specific combinations of fields, timing, or context that have previously indicated real threats, often combinations that are not in any runbook
Investigative shortcuts: the three lookups that tell you in ninety seconds what would otherwise take twenty minutes

Senior analyst: Sees the parent process. Recognizes the software deployment tool that fires this alert every Tuesday during patch cycles. Checks the command line, matches it to the known deployment pattern. Closes as known FP in 45 seconds.

Junior analyst: Sees encoded PowerShell. Escalates immediately, because encoded commands are listed in the runbook as suspicious. Spends 25 minutes documenting. A senior analyst explains the patch cycle context. The junior learns this instance but not the general principle.

The junior analyst did not make an error; they followed the runbook correctly. The runbook was incomplete because it did not encode the environmental context that made the senior analyst's judgment possible. This is a documentation failure, not a skills failure.

// methodology

Extracting Tacit Heuristics: the Elicitation Process

The Critical Decision Method (CDM) is an interview technique from naturalistic decision-making research directly applicable to SOC knowledge extraction. CDM asks the expert to walk through a specific past case in retrospective detail, probing for cues noticed, interpretations formed, and decisions made, with attention to moments where the expert's path diverged from what a novice would have done.

A practical elicitation approach without formal CDM training: select five to ten recent alerts the senior analyst triaged quickly that a junior had previously escalated incorrectly or slowly. For each, ask the senior analyst to narrate their triage process aloud, pausing at each decision point to explain what they noticed and why it changed their assessment. The facilitator's job is to resist the answer "I just knew" and probe for the specific observable that triggered the recognition.

The contrast case technique: present the senior analyst with a confirmed malicious version and a confirmed benign version of the same alert type simultaneously. Ask: what is different between these two? What single field most changes your assessment? The differences the senior analyst identifies are the discriminating features their pattern library uses.
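The field-level comparison behind the contrast case technique can be prepared mechanically before the session. What follows is a minimal sketch, assuming alerts are available as flat dictionaries; the function name, field names, and values are hypothetical and not taken from any particular SIEM.

```python
def discriminating_fields(malicious: dict, benign: dict) -> dict:
    """Return the fields whose values differ between the two alerts.

    Each result maps field name -> (malicious value, benign value).
    """
    keys = malicious.keys() | benign.keys()
    return {
        key: (malicious.get(key), benign.get(key))
        for key in sorted(keys)
        if malicious.get(key) != benign.get(key)
    }


# Hypothetical pair of alerts of the same type (illustrative values only).
malicious = {"alert_type": "encoded_powershell", "parent": "winword.exe",
             "user": "jdoe", "hour": 14}
benign = {"alert_type": "encoded_powershell", "parent": "deploy_agent.exe",
          "user": "svc_patch", "hour": 2}

diff = discriminating_fields(malicious, benign)
for field, (bad, good) in diff.items():
    print(f"{field}: malicious={bad!r} benign={good!r}")
```

The output is only a prompt for the conversation: the senior analyst still has to say which of the differing fields actually carries the weight.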

Practical note

Elicitation sessions work best as recorded conversations, not written exercises. Senior analysts articulate reasoning more naturally in speech. Transcribe and structure afterward. Don't ask them to produce structured heuristics directly, because that asks them to do the very cognitive work that makes articulation hard.

// worked examples

Six Explicit Heuristics, with Teach-As Framing

Each heuristic follows the same structure: observable condition → interpretive weight → decision guidance. The interpretive weight is what runbooks typically omit.

H01 Parent process anomaly as stronger signal than child identity

Rule

IF suspicious child AND parent is expected for that child → weight: low. IF suspicious child AND parent is unexpected → weight: high. The identity of the child matters less than whether the parent could plausibly have spawned it. cmd.exe from explorer.exe is unremarkable. cmd.exe from a PDF reader is not.
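The rule above can be sketched as a lookup. This is a minimal illustration, assuming the child process has already been flagged as suspicious; the `EXPECTED_PARENTS` table and its contents are hypothetical, and in practice would come out of elicitation sessions for the specific environment.

```python
# Hypothetical allowlist: which parents are expected for each child process.
# The entries are illustrative, not a recommended baseline.
EXPECTED_PARENTS = {
    "cmd.exe": {"explorer.exe"},
    "powershell.exe": {"explorer.exe"},
}


def parent_child_weight(parent: str, child: str) -> str:
    """Weight an already-suspicious child by whether its parent is expected.

    A child with no entry in the table is treated as having an unexpected
    parent, which is a conservative assumption made by this sketch.
    """
    expected = EXPECTED_PARENTS.get(child.lower(), set())
    return "low" if parent.lower() in expected else "high"


print(parent_child_weight("explorer.exe", "cmd.exe"))   # low
print(parent_child_weight("AcroRd32.exe", "cmd.exe"))   # high
```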

Teach as

"Before asking what the child is doing, ask why this parent would have created it."

H02 Execution timing relative to business patterns

Rule

IF alert matches known automation type AND execution is within the known automation window → weight: low, verify pattern match. IF same alert AND execution is outside the automation window → weight: elevated, treat as anomalous even if technically identical. The same process at 02:15 during the patch window and at 14:30 on a Wednesday are not the same event.
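As a rough sketch of this rule, the automation window can be stored per alert type and checked against the execution timestamp. The window below (Tuesday, 01:00 to 04:00) and the alert type name are assumptions made for illustration; the source only says that patching happens on Tuesdays and that 02:15 falls inside the window.

```python
from datetime import datetime, time

# Hypothetical automation windows: alert type -> (weekday, start, end).
# weekday follows datetime.weekday(): Monday is 0, so Tuesday is 1.
AUTOMATION_WINDOWS = {
    "software_deployment": (1, time(1, 0), time(4, 0)),
}


def timing_weight(alert_type: str, executed_at: datetime) -> str:
    """Return 'low' inside the known window, 'elevated' outside it."""
    window = AUTOMATION_WINDOWS.get(alert_type)
    if window is None:
        return "no known automation"
    weekday, start, end = window
    in_window = (executed_at.weekday() == weekday
                 and start <= executed_at.time() <= end)
    return "low" if in_window else "elevated"


# 2024-01-02 was a Tuesday, 2024-01-03 a Wednesday.
print(timing_weight("software_deployment", datetime(2024, 1, 2, 2, 15)))   # low
print(timing_weight("software_deployment", datetime(2024, 1, 3, 14, 30)))  # elevated
```

A "low" result still calls for the pattern-match verification the rule describes; the window check only decides how much suspicion the timing adds.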

Teach as

"Would this activity make sense if a legitimate administrator were doing it right now? If not, why is it happening now?"

H03 User context as risk multiplier

Rule

IF suspicious activity AND standard user → assess at face value. IF same activity AND privileged account → escalate one severity tier regardless of activity-level assessment. A standard user running a suspicious script may be a misconfigured application. The same script from a domain admin is a potential lateral movement or credential access event. The blast radius if malicious is not the same.
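One way to express the "escalate one severity tier" step is as an index shift over an ordered list of tiers. The tier names and the `PRIVILEGED_ACCOUNTS` set below are hypothetical; a real deployment would presumably resolve privilege from directory group membership rather than a hard-coded set.

```python
# Hypothetical severity ladder, lowest to highest.
SEVERITY_TIERS = ["low", "medium", "high", "critical"]

# Hypothetical privileged accounts (illustrative names only).
PRIVILEGED_ACCOUNTS = {"da_admin", "svc_backup_admin"}


def adjust_for_user(severity: str, user: str) -> str:
    """Raise severity by one tier for privileged accounts, capped at the top."""
    index = SEVERITY_TIERS.index(severity)
    if user in PRIVILEGED_ACCOUNTS:
        index = min(index + 1, len(SEVERITY_TIERS) - 1)
    return SEVERITY_TIERS[index]


print(adjust_for_user("medium", "jdoe"))      # medium
print(adjust_for_user("medium", "da_admin"))  # high
```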

Teach as

"Who is doing this? Now ask yourself what an attacker could do from that account if this isn't what it looks like."

H04 Uniqueness as signal within the alert population

Rule

IF the alert type has fired frequently AND this instance matches the prior pattern → weight: low. IF the alert type has fired frequently AND this instance is the only one with a specific field value → weight: high. A hundred PowerShell alerts from a dozen hosts is alert fatigue. One PowerShell alert with a command line that appears in no other alert is a signal. Junior analysts see alert type. Senior analysts see population distribution.
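The population view can be approximated by counting how often each field value occurs across recent alerts of the same type. A minimal sketch, assuming alerts are dictionaries and that the population passed in already includes the alert under review; the `command_line` field name is hypothetical.

```python
from collections import Counter


def unique_values(population: list[dict], field: str) -> set:
    """Values of `field` that appear exactly once in the alert population."""
    counts = Counter(alert[field] for alert in population if field in alert)
    return {value for value, n in counts.items() if n == 1}


def uniqueness_weight(alert: dict, population: list[dict], field: str) -> str:
    """'high' if this alert's value for `field` occurs nowhere else."""
    return "high" if alert.get(field) in unique_values(population, field) else "low"


# Hypothetical population: three identical deployment commands, one outlier.
routine = {"command_line": "powershell -File deploy.ps1"}
outlier = {"command_line": "powershell -enc SQBFAFgA"}
population = [routine, routine, routine, outlier]

print(uniqueness_weight(routine, population, "command_line"))  # low
print(uniqueness_weight(outlier, population, "command_line"))  # high
```

Exact string matching is the crudest possible notion of "same pattern"; normalizing arguments or hashing command-line templates would be a natural refinement.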

Teach as

"Is this one different from all the others? If yes, why?"

H05 Sequence coherence across host/user activity

Rule

IF an alert is individually low severity AND recent alert history for the host or user shows a pattern of reconnaissance, then execution, then persistence → treat the sequence as high severity regardless of individual alert ratings. A failed login, followed by a successful login, followed by PowerShell execution, followed by a new scheduled task is a kill chain. Junior analysts triage each alert independently. Senior analysts maintain a running mental model of recent host and user activity.
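A sequence check of this kind can be sketched as an ordered scan over recent events for one host or user. The sketch assumes each event has already been tagged with a stage name, which is itself a nontrivial mapping this example does not attempt; the function name and the two-hour default window are illustrative choices.

```python
from datetime import datetime, timedelta

# Stage order the rule looks for.
STAGES = ("reconnaissance", "execution", "persistence")


def is_attack_sequence(events, now, window=timedelta(hours=2)):
    """True if the stages occur in order within the lookback window.

    `events` is a list of (timestamp, stage) tuples for one host or user.
    Unrelated events between stages are allowed.
    """
    recent = sorted(e for e in events if now - window <= e[0] <= now)
    next_stage = 0
    for _, stage in recent:
        if next_stage < len(STAGES) and stage == STAGES[next_stage]:
            next_stage += 1
    return next_stage == len(STAGES)


now = datetime(2024, 1, 3, 15, 0)
events = [
    (now - timedelta(minutes=90), "reconnaissance"),
    (now - timedelta(minutes=30), "execution"),
    (now - timedelta(minutes=5), "persistence"),
]
print(is_attack_sequence(events, now))      # True
print(is_attack_sequence(events[1:], now))  # False
```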

Teach as

"What happened on this host or for this user in the last two hours? Does this alert make more sense as part of a sequence?"

H06 The "explain it benignly" test

Rule

IF you can construct a specific plausible benign explanation AND the evidence supports it → document and close. IF no plausible benign explanation survives contact with the evidence → escalate regardless of individual alert severity. Senior analysts constantly construct and test benign hypotheses: "could this be the backup job?", "is this user a developer who legitimately runs scripts?" When no benign explanation fits all the evidence, that absence is the signal.
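The test can be illustrated as checking candidate benign stories against the evidence. This is a deliberately simplified sketch: it assumes each benign hypothesis can be written as a set of expected field values, which real hypotheses often cannot. All names and values are hypothetical.

```python
def explain_benignly(evidence: dict, hypotheses: dict) -> str:
    """Return 'close: <story>' for the first benign story that every
    expected field supports, otherwise 'escalate'.

    `hypotheses` maps a story name to the field values it predicts.
    An empty prediction never counts as an explanation.
    """
    for name, expected in hypotheses.items():
        if expected and all(evidence.get(f) == v for f, v in expected.items()):
            return f"close: {name}"
    return "escalate"


# Hypothetical benign stories for a script-execution alert.
HYPOTHESES = {
    "backup job": {"user": "svc_backup", "parent": "backup_agent.exe"},
    "developer running scripts": {"user_role": "developer", "host_type": "workstation"},
}

print(explain_benignly({"user": "svc_backup", "parent": "backup_agent.exe"}, HYPOTHESES))
print(explain_benignly({"user": "svc_backup", "parent": "winword.exe"}, HYPOTHESES))
```

The second call escalates because the backup story fits the user but not the parent process: a story has to fit all the evidence, not part of it.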

Teach as

"Tell me the benign story for this alert. If you can't tell one that fits all the evidence, escalate."

// transfer mechanisms

Getting Heuristics into the Analyst Workflow

Individual heuristics are useful but scattershot. The output of elicitation should be structured into alert-category-specific triage decision trees: lightweight reasoning scaffolds that apply heuristics in the order in which they most efficiently resolve ambiguity for a given alert type.
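Such a tree can be represented as an ordered list of checks, where the first check that reaches a verdict ends the walk. The following is a minimal sketch under that assumption; the check functions, field names, process names, and verdict strings are all hypothetical.

```python
from typing import Callable, Optional

# A check inspects an alert and returns a verdict, or None if it cannot decide.
Check = Callable[[dict], Optional[str]]


def run_triage_tree(alert: dict, checks: list[tuple[str, Check]]) -> tuple[str, str]:
    """Apply checks in order; return (verdict, name of the deciding check)."""
    for name, check in checks:
        verdict = check(alert)
        if verdict is not None:
            return verdict, name
    return "investigate", "no check resolved"


def unexpected_parent(alert: dict) -> Optional[str]:
    expected = {"deploy_agent.exe", "explorer.exe"}  # hypothetical allowlist
    return "escalate" if alert["parent"] not in expected else None


def in_patch_window(alert: dict) -> Optional[str]:
    return "close_known_fp" if alert["in_automation_window"] else None


# Hypothetical tree for one alert category, cheapest discriminator first.
ENCODED_POWERSHELL_TREE = [
    ("parent process", unexpected_parent),
    ("timing", in_patch_window),
]

alert = {"parent": "deploy_agent.exe", "in_automation_window": True}
print(run_triage_tree(alert, ENCODED_POWERSHELL_TREE))
```

Returning the name of the deciding check alongside the verdict keeps the reasoning visible to the junior analyst, which is the point of the scaffold.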

Alert-embedded decision guides
Triage trees embedded directly in the alert template or SOAR playbook, so that heuristics appear at the point of use rather than in a separate wiki.
High / Low

False positive catalog
Documented catalog of known FP patterns by alert type and environment context. Each FP close by a senior analyst takes 30 seconds longer and produces one catalog entry.
Highest ROI

Escalation debrief protocol
Every escalation triggers a structured two-minute debrief: what did the junior see, what did the senior see differently, and what heuristic resolved it.
High / Very low

Annotated case library
Searchable library of past cases with senior analyst triage commentary. Junior analysts look up similar cases and read the expert reasoning.
High / Medium

Think-aloud pair triage
Senior and junior triage the same alert simultaneously, verbalizing their reasoning. A structured debrief focuses on where the reasoning diverged.
High / Intensive

Retrospective case review
Weekly structured review of a closed case: what was found, what the junior's initial assessment was, and what the senior would have done differently and why.
Systematic / Medium

The false positive catalog deserves specific attention. It is the most direct encoding of environmental prior knowledge and the highest-ROI knowledge transfer artifact a SOC team can produce. Format: alert type → specific condition combination → known benign explanation → confirmation step that verifies the benign explanation applies to this instance. A catalog with 200 well-documented FP patterns eliminates more unnecessary junior analyst investigation time than any other single investment.
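The catalog format described above maps naturally onto a small record type plus a lookup. This is a sketch only: the `FPEntry` class, the field names, and the sample entry (modeled on the Tuesday patch-cycle case from earlier) are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class FPEntry:
    alert_type: str
    conditions: dict          # specific condition combination
    benign_explanation: str   # known benign explanation
    confirmation_step: str    # how to verify it applies to this instance


def candidate_fps(alert: dict, catalog: list[FPEntry]) -> list[FPEntry]:
    """Catalog entries whose alert type and every condition match the alert."""
    return [
        entry for entry in catalog
        if entry.alert_type == alert.get("alert_type")
        and all(alert.get(k) == v for k, v in entry.conditions.items())
    ]


# Hypothetical catalog with one entry.
CATALOG = [
    FPEntry(
        alert_type="encoded_powershell",
        conditions={"parent": "deploy_agent.exe", "weekday": "Tuesday"},
        benign_explanation="Software deployment tool during the patch cycle",
        confirmation_step="Compare the command line with the known deployment pattern",
    ),
]

alert = {"alert_type": "encoded_powershell", "parent": "deploy_agent.exe",
         "weekday": "Tuesday"}
for entry in candidate_fps(alert, CATALOG):
    print(entry.benign_explanation, "->", entry.confirmation_step)
```

A match is a candidate, not a verdict: the confirmation step is what keeps the catalog from turning into a blanket suppression list.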

// conclusion

The Senior Analyst's Real Leverage Is What They Write Down

"Every undocumented heuristic is one resignation letter away from being lost."

A senior analyst who triages five hundred alerts a week is generating individual value. A senior analyst who spends two hours extracting and documenting the heuristics behind those decisions is generating institutional value that scales across every junior analyst on the team indefinitely.

The closing question: if the two most experienced analysts on your team left tomorrow, how long would it take for your triage quality to return to its current level? If the answer is "years", the knowledge transfer problem is also a business continuity problem, and it has been sitting in plain sight the whole time.

Key references

Gary Klein, "Sources of Power: How People Make Decisions" (1998): the foundational RPD model.
Klein et al., "The Critical Decision Method": the elicitation protocol.
SANS analyst development resources on structured triage methodology.