Opening Framing: The Art of Triage
Alerts flood in constantly. Dozens, hundreds, sometimes thousands per day. Most are noise—false positives, benign activity, or low-priority events. Hidden among them are real threats that require immediate action.
Triage is the skill of quickly separating signal from noise. It's not about being fast for speed's sake—it's about efficiently identifying what matters so you can focus your investigation efforts where they'll have impact.
This week covers the triage mindset, systematic investigation methodology, and the practical skills needed to turn alerts into answers.
Key insight: Good triage isn't about closing tickets quickly. It's about making accurate decisions quickly. A fast wrong decision is worse than a slightly slower right one.
1) The Triage Mindset
Effective triage requires a specific mental approach:
Triage Questions (in order):
1. What triggered this alert?
- Understand the detection logic
- What was the SIEM/tool looking for?
2. Is this expected behavior?
- For this user/system/time?
- Is there a legitimate explanation?
3. What's the potential impact?
- If this is real, how bad is it?
- What assets are at risk?
4. What additional context do I need?
- What other logs/data would help?
- What questions remain unanswered?
5. What's my decision?
- False positive → Close with documentation
- Needs investigation → Dig deeper
- Confirmed threat → Escalate/respond
Classification Framework:
Alert Classifications:
True Positive (TP):
- Alert correctly identified malicious activity
- Action: Investigate and respond
False Positive (FP):
- Alert triggered on benign activity
- Action: Close, consider tuning rule
Benign True Positive (BTP):
- Alert correctly identified activity that looks suspicious
but is actually authorized/expected
- Example: Pen test, authorized admin activity
- Action: Close, document exception
True Negative (TN):
- No alert, no threat (working as expected)
- Not visible in alert queue
False Negative (FN):
- Threat present but no alert
- Only discovered through hunting or incident
- Action: Improve detection
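The five labels above can be captured in a small helper. This is an illustrative sketch, not a real triage engine: the boolean inputs (`alerted`, `malicious`, `sanctioned`) are simplifications of judgments that take real investigation to make.

```python
def classify_alert(alerted: bool, malicious: bool, sanctioned: bool = False) -> str:
    """Map triage findings to TP/FP/BTP/TN/FN.

    `sanctioned` marks activity that genuinely matches the detection
    (pen test, authorized admin work) but is approved: the BTP case.
    """
    if alerted:
        if malicious:
            return "TP"                       # alert correctly caught a threat
        return "BTP" if sanctioned else "FP"  # benign: authorized vs. misfire
    return "FN" if malicious else "TN"        # no alert: missed vs. quiet
```

Note that TN and FN never appear in the alert queue; FNs surface only through hunting or incidents, which is why they drive detection improvement rather than triage.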
Priority Matrix:
                       Impact
                Low      Medium    High
           ┌────────┬────────┬────────┐
      High │   P3   │   P2   │   P1   │
Confidence ├────────┼────────┼────────┤
    Medium │   P4   │   P3   │   P2   │
           ├────────┼────────┼────────┤
       Low │   P5   │   P4   │   P3   │
           └────────┴────────┴────────┘
P1: Immediate response (confirmed critical threat)
P2: Urgent investigation (high-impact possible threat)
P3: Standard investigation (moderate concern)
P4: Low priority (investigate when time permits)
P5: Minimal concern (quick review, likely close)
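The matrix translates directly into a lookup table. A minimal sketch (the P1-P5 assignments follow the matrix above; adjust the scale to your own SOC's conventions):

```python
# Confidence x Impact -> priority, exactly as in the matrix above.
PRIORITY = {
    ("high",   "low"): "P3", ("high",   "medium"): "P2", ("high",   "high"): "P1",
    ("medium", "low"): "P4", ("medium", "medium"): "P3", ("medium", "high"): "P2",
    ("low",    "low"): "P5", ("low",    "medium"): "P4", ("low",    "high"): "P3",
}

def triage_priority(confidence: str, impact: str) -> str:
    """Look up the priority for an alert's confidence/impact pair."""
    return PRIORITY[(confidence.lower(), impact.lower())]
```

Encoding the matrix this way makes prioritization consistent across analysts and shifts, which is exactly what prevents the thrashing described below.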
Key insight: Prioritization prevents thrashing. Without clear priority, analysts bounce between alerts without completing investigations.
2) Initial Alert Assessment
The first 2-5 minutes determine your path forward:
Initial Assessment Checklist:
□ Read the alert details
- What rule/signature triggered?
- What are the key fields?
□ Identify the entities
- Source: Who/what initiated?
- Destination: Who/what was targeted?
- User: Which account involved?
□ Check timestamps
- When did this occur?
- Business hours or off-hours?
- One-time or repeated?
□ Assess asset criticality
- Is the affected system important?
- What data/access does it have?
□ Quick context lookup
- Is this user/system known?
- Any recent tickets for same entity?
- Known maintenance or testing?
Entity Analysis:
For Source IP/Host:
Internal:
- Who owns this system?
- What's its normal function?
- Who normally uses it?
- Any recent alerts for it?
External:
- Reputation check (threat intel)
- Geolocation (expected region?)
- ASN/ownership
- Historical activity in logs
For User Account:
- What's their role?
- Normal working hours?
- Normal systems accessed?
- Recent password changes?
- Privileged account?
For Process/File:
- Known good or suspicious?
- Hash reputation
- Signed/unsigned?
- Normal for this system?
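The internal/external fork in the source-IP checklist can be automated with the standard `ipaddress` module. The check lists returned here are just the questions from above; the function names are illustrative.

```python
import ipaddress

def is_internal(ip: str) -> bool:
    """True for RFC 1918 and other private-range addresses."""
    return ipaddress.ip_address(ip).is_private

def source_checks(ip: str) -> list[str]:
    """Route a source IP to the right branch of the entity checklist."""
    if is_internal(ip):
        return ["owner", "normal function", "normal users", "recent alerts"]
    return ["reputation", "geolocation", "ASN/ownership", "historical activity"]
```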
Quick Wins - Fast Closure Patterns:
Patterns that often indicate false positives:
1. Known scanning/testing
- Source is authorized scanner
- Scheduled vulnerability assessment
- Red team exercise
2. Administrative activity
- IT admin doing expected work
- Matches change ticket
- Normal tool for that role
3. Repeated known FP
- Same alert type closed as FP before
- Same entities, same context
- Note: Still document, consider tuning
4. Automated systems
- Backup jobs
- Monitoring systems
- Patch management
Always verify—don't assume!
Key insight: Initial assessment should take 2-5 minutes. If you can't classify in that time, it needs deeper investigation.
3) Investigation Methodology
When initial assessment indicates a real threat, investigate systematically:
Investigation Framework:
1. SCOPE
- What systems/users are involved?
- What's the timeframe?
- What data sources do I need?
2. EVIDENCE
- Collect relevant logs
- Preserve artifacts
- Document everything
3. ANALYZE
- Build timeline of events
- Identify attack progression
- Determine root cause
4. CONCLUDE
- What happened?
- What's the impact?
- What action is needed?
Building the Timeline:
Timeline Template:
Time (UTC)   Source        Event                      Significance
──────────────────────────────────────────────────────────────────
09:15:32     Email GW      Phishing email received    Initial delivery
09:17:45     Endpoint      Attachment opened          User interaction
09:17:48     Endpoint      PowerShell executed        Payload execution
09:18:02     Firewall      Outbound to C2 IP          C2 established
09:25:33     AD Logs       Service account auth       Lateral movement
09:28:15     File Server   Large file access          Data staging
09:35:00     Firewall      Large outbound transfer    Exfiltration
Timeline reveals:
- Attack progression
- Time to detect (gap between first event and alert)
- Scope of compromise
- Response urgency
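Building the timeline is mostly a matter of normalizing timestamps to UTC and sorting. A minimal sketch (event dicts with a `"time"` key are an assumed shape, not a standard format):

```python
from datetime import datetime

def build_timeline(events: list[dict]) -> list[dict]:
    """Sort raw events chronologically (all timestamps already in UTC)."""
    return sorted(events, key=lambda e: e["time"])

def time_to_detect(events: list[dict], alert_time: datetime) -> float:
    """Minutes between the earliest event and the alert firing:
    the 'time to detect' gap the timeline reveals."""
    first = build_timeline(events)[0]["time"]
    return (alert_time - first).total_seconds() / 60
```

In practice the hard part is timestamp normalization: different sources log in different time zones and formats, and a timeline built from unnormalized times will mislead you.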
Pivot Analysis:
Pivoting: Using one finding to discover more
Found suspicious IP? Pivot to find:
→ All connections to/from that IP
→ All users who connected to it
→ All systems that contacted it
→ DNS queries for associated domains
Found malicious file hash? Pivot to find:
→ All systems with that hash
→ Parent process that created it
→ Child processes it spawned
→ Network connections it made
Found compromised user? Pivot to find:
→ All authentications by that user
→ All systems accessed
→ All files touched
→ Any new accounts created
Each pivot may reveal new IOCs to pivot on again
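A single pivot step is just a filtered projection over your logs; chaining calls follows the leads above. A sketch over an in-memory log list (in a real SOC this becomes a SIEM query, but the logic is the same):

```python
def pivot(logs: list[dict], field: str, value: str, collect: str) -> set:
    """One pivot step: distinct `collect` values from records where
    `field` matches the IOC. Chain calls to follow each new lead."""
    return {rec[collect] for rec in logs
            if rec.get(field) == value and collect in rec}
```

For example, `pivot(conn_logs, "dst_ip", bad_ip, "user")` answers "which users connected to this IP?", and each user found can be fed back in with `pivot(conn_logs, "user", u, "dst_ip")` to surface further suspicious destinations.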
Key insight: Investigation is iterative. Each finding opens new questions. Keep pivoting until you understand the full scope.
4) Common Alert Scenarios
Learn patterns for frequent alert types:
Brute Force / Failed Logins:
Alert: Multiple failed logins detected
Triage questions:
- How many failures? Over what time?
- Same source hitting multiple accounts? (spray)
- Same account from multiple sources? (distributed)
- Any successes after failures? (possible compromise!)
Investigation:
1. Query all auth events for source IP/user
2. Check if any succeeded
3. If success: investigate post-auth activity
4. Check threat intel for source IP
5. Determine if targeted or opportunistic
Common outcomes:
- External scanning → Block IP, close
- Credential spray with success → Incident!
- User forgot password → Close as BTP
- Service account misconfigured → Fix config
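The spray/distributed/success-after-failure questions can be answered mechanically from raw auth events. A sketch (the event shape and the threshold of 5 distinct entities are illustrative assumptions; tune to your environment):

```python
from collections import defaultdict

def brute_force_triage(auth_events: list[dict], threshold: int = 5) -> dict:
    """Summarize auth events: {"src": ip, "user": name, "success": bool},
    assumed to be in chronological order."""
    users_per_src = defaultdict(set)   # spray: one source, many accounts
    srcs_per_user = defaultdict(set)   # distributed: one account, many sources
    failed_users, success_after_fail = set(), set()
    for e in auth_events:
        if e["success"]:
            if e["user"] in failed_users:
                success_after_fail.add(e["user"])   # investigate post-auth!
        else:
            failed_users.add(e["user"])
            users_per_src[e["src"]].add(e["user"])
            srcs_per_user[e["user"]].add(e["src"])
    return {
        "spray_sources": [s for s, u in users_per_src.items() if len(u) >= threshold],
        "distributed_targets": [u for u, s in srcs_per_user.items() if len(s) >= threshold],
        "success_after_failure": sorted(success_after_fail),
    }
```

A success after failures is the pivot point: it may be a forgotten password (BTP), but it must be investigated as a potential compromise until post-auth activity says otherwise.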
Malware Detection:
Alert: Malware detected on endpoint
Triage questions:
- Was it blocked or just detected?
- Known malware family or generic detection?
- How did it arrive? (email, web, USB)
- Was it executed?
Investigation:
1. Check EDR for full process tree
2. Identify delivery mechanism
3. Check for persistence mechanisms
4. Look for lateral movement
5. Search for same hash elsewhere
Common outcomes:
- Blocked before execution → Improve prevention, close
- Executed but contained → Clean system, investigate scope
- Active infection → Major incident response
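Step 1, reconstructing the process tree, amounts to walking parent/child links from EDR telemetry. A sketch assuming simplified event records (real EDR APIs differ by vendor):

```python
from collections import defaultdict

def process_tree(events: list[dict], root_pid: int) -> list[str]:
    """Render a parent->child process chain as an indented listing.
    Each event: {"pid": int, "ppid": int, "image": str} (illustrative shape)."""
    children = defaultdict(list)
    by_pid = {}
    for e in events:
        children[e["ppid"]].append(e["pid"])
        by_pid[e["pid"]] = e["image"]

    def walk(pid: int, depth: int) -> list[str]:
        lines = ["  " * depth + by_pid.get(pid, f"pid {pid}")]
        for child in children[pid]:
            lines += walk(child, depth + 1)
        return lines

    return walk(root_pid, 0)
```

A chain like `outlook.exe → winword.exe → powershell.exe` is the delivery story in one glance: email, document opened, payload executed.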
Data Exfiltration Alert:
Alert: Large outbound data transfer
Triage questions:
- What system initiated?
- What was the destination?
- How much data?
- What type of data?
Investigation:
1. Identify user and process
2. Determine destination reputation
3. Check if business-justified
4. Review what data was accessed
5. Look for prior suspicious activity
Common outcomes:
- Backup to cloud service → Close as BTP
- Developer uploading to GitHub → Policy reminder
- Unknown destination, sensitive data → Incident!
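Answering "how much data, to where?" is an aggregation over flow records. A sketch (the flow shape and the 500 MB cutoff are illustrative assumptions, not a recommended threshold):

```python
from collections import defaultdict

def exfil_candidates(flows: list[dict], threshold_mb: float = 500) -> dict:
    """Sum outbound bytes per destination and flag totals over a cutoff.
    Each flow: {"dst": ip_or_domain, "bytes_out": int} (assumed shape)."""
    totals = defaultdict(int)
    for f in flows:
        totals[f["dst"]] += f["bytes_out"]
    limit = threshold_mb * 1024 * 1024
    return {dst: b for dst, b in totals.items() if b > limit}
```

Volume alone never settles the question: a flagged destination still needs the reputation check and the business-justification check from the steps above.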
Suspicious Process Execution:
Alert: Suspicious PowerShell/CMD activity
Triage questions:
- What command was executed?
- Who ran it?
- What was the parent process?
- Is this normal for this user/system?
Investigation:
1. Full command line analysis
2. Check parent process chain
3. Review user's recent activity
4. Check for encoded commands
5. Look for network connections
Common outcomes:
- Admin running script → Verify authorization
- Encoded download command → Likely malicious
- IT automation tool → Close as BTP
Key insight: Pattern recognition speeds triage. As you see more alerts, you'll recognize scenarios faster.
5) Documentation and Handoff
Investigation without documentation is incomplete:
Ticket Documentation Standard:
Summary:
- One-line description of what happened
Classification:
- TP / FP / BTP
- Severity / Priority
Timeline:
- Key events in chronological order
Investigation Steps:
- What you searched
- What you found
- Queries used (for reproducibility)
Findings:
- Root cause (if determined)
- Scope of impact
- IOCs identified
Actions Taken:
- Containment measures
- Remediation steps
- Escalations made
Recommendations:
- Detection improvements
- Prevention measures
- Follow-up needed
Writing Good Notes:
Bad note:
"Checked logs, looks fine, closing as FP"
Good note:
"Alert triggered by user jsmith authenticating to
server SQL01 from IP 10.0.1.50 at 14:32 UTC.
Verified: jsmith is DBA, SQL01 is their assigned server,
source IP is their workstation per CMDB. Activity occurred
during business hours and matches normal pattern.
Classification: Benign True Positive
Action: No action needed, closing.
Recommendation: Consider excluding DBA group from this rule
for known database servers."
The good note:
- Explains what was checked
- Shows reasoning
- Enables future reference
- Suggests improvement
Escalation Communication:
When escalating, provide:
1. What happened (brief summary)
2. Why it's being escalated (severity/complexity)
3. What's been done (actions taken)
4. What's needed (specific ask)
5. Urgency level (how fast is response needed)
Example escalation:
"Escalating to Tier 2: Confirmed malware execution on
FINANCE-WS-42 (user: jdoe, Finance Dept).
Malware: Emotet dropper (hash: abc123...)
Delivery: Phishing email at 10:15 UTC
Execution: 10:23 UTC
C2 connection: Observed to 185.x.x.x
Actions taken: Endpoint isolated via EDR
Need: Deep malware analysis, scope assessment across
Finance department, potential IR escalation.
Urgency: High - active threat, user has access to
sensitive financial data."
Key insight: Your notes are the organizational memory. Future analysts (including future you) will rely on them.
Real-World Context: Triage Under Pressure
Real SOC triage has unique challenges:
Alert Fatigue: When 90% of alerts are false positives, it's tempting to close quickly without investigating. Discipline matters—the one you skip might be real.
Time Pressure: Queue is growing, shift is ending, management wants metrics. Resist the urge to rush. Accurate triage is more valuable than fast triage.
Uncertainty: Many alerts end with "probably benign but not certain." Document your uncertainty and reasoning. If it comes back, you'll know what was checked.
MITRE ATT&CK Application:
- Technique Identification: Map alerts to ATT&CK techniques
- Attack Chain Analysis: Use tactics to understand progression
- Gap Identification: Which earlier techniques in the chain went undetected?
Key insight: Triage is a skill developed over thousands of alerts. Every investigation teaches you something.
Guided Lab: Alert Triage Simulation
Practice triaging alerts using systematic methodology.