Opening Framing: When Alerts Become Incidents
Triage confirmed a real threat. Now what? An alert becomes an incident when it requires coordinated response beyond routine handling. Incidents are high-stakes situations where speed and accuracy both matter, and the two often conflict.
Incident response is a discipline with established frameworks, roles, and processes. When done well, it limits damage, preserves evidence, and enables recovery. When done poorly, it extends outages, destroys evidence, and lets attackers maintain access.
This week covers the foundations: IR frameworks, phases, roles, and the critical decisions made in the first hours of an incident.
Key insight: Incidents are won or lost in preparation. The team that's practiced and planned responds effectively. The team that's improvising makes costly mistakes.
1) Incident Response Frameworks
Multiple frameworks guide incident response:
NIST SP 800-61 Rev. 2 (Most Common):
┌──────────────┐
│ 1.Preparation│ → Policies, tools, training, planning
└──────┬───────┘
↓
┌──────────────────────┐
│ 2.Detection & │ → Identify and analyze incidents
│ Analysis │
└──────┬───────────────┘
↓
┌──────────────────────┐
│ 3.Containment, │ → Stop the bleeding, remove threat,
│ Eradication, │ restore systems
│ Recovery │
└──────┬───────────────┘
↓
┌──────────────────────┐
│ 4.Post-Incident │ → Lessons learned, improvements
│ Activity │
└──────────────────────┘
SANS Incident Response Process:
Six phases (more granular):
1. Preparation
- IR plan and policies
- Team training
- Tools and resources
2. Identification
- Detect potential incidents
- Determine scope
- Assign severity
3. Containment
- Short-term: Stop immediate damage
- Long-term: Temporary fixes while investigating
4. Eradication
- Remove malware
- Close vulnerabilities
- Eliminate attacker access
5. Recovery
- Restore systems
- Return to operations
- Monitor for recurrence
6. Lessons Learned
- What happened
- What worked
- What to improve
Incident Severity Levels:
Severity 1 - Critical:
- Business operations severely impacted
- Data breach confirmed
- Ransomware active
- Executive involvement required
- All hands on deck
Severity 2 - High:
- Significant system compromise
- Sensitive data potentially exposed
- Major business function affected
- Dedicated IR team response
Severity 3 - Medium:
- Limited compromise
- No sensitive data affected
- Single system or small scope
- Standard IR procedures
Severity 4 - Low:
- Minor security event
- Quickly contained
- Minimal business impact
- SOC handles within normal operations
Key insight: Severity drives resource allocation. Over-escalate and you waste resources. Under-escalate and the incident grows.
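The severity levels above can be sketched as a small classification function. This is only an illustration: the IncidentFacts fields and the exact cut-offs between levels are assumptions made for the example, not part of any framework, and a real team would tune them to its own severity policy.

```python
from dataclasses import dataclass


@dataclass
class IncidentFacts:
    """Hypothetical triage inputs; field names are illustrative only."""
    data_breach_confirmed: bool = False
    ransomware_active: bool = False
    business_severely_impacted: bool = False
    sensitive_data_possibly_exposed: bool = False
    major_function_affected: bool = False
    systems_compromised: int = 0
    contained: bool = False


def classify_severity(facts: IncidentFacts) -> int:
    """Return 1 (critical) to 4 (low), checking the most severe criteria first."""
    if (facts.data_breach_confirmed or facts.ransomware_active
            or facts.business_severely_impacted):
        return 1
    if facts.sensitive_data_possibly_exposed or facts.major_function_affected:
        return 2
    # Assumed rule: an uncontained compromise of limited scope is Medium.
    if facts.systems_compromised >= 1 and not facts.contained:
        return 3
    # Everything else is treated as a minor, quickly contained event.
    return 4


print(classify_severity(IncidentFacts(ransomware_active=True)))
print(classify_severity(IncidentFacts(systems_compromised=1)))
```

Checking the most severe conditions first means an incident that matches several levels always gets the higher one, which errs toward over-escalation rather than under-escalation.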
2) Incident Response Roles
Clear roles prevent chaos during incidents:
Core IR Team Roles:
Incident Commander (IC):
- Overall incident leadership
- Makes key decisions
- Coordinates resources
- Manages communication
- Single point of authority
Technical Lead:
- Leads technical investigation
- Directs analysis efforts
- Recommends containment actions
- Reports findings to IC
Communications Lead:
- Internal communications
- External communications (if needed)
- Status updates
- Stakeholder management
Scribe/Documenter:
- Records all actions
- Maintains timeline
- Tracks decisions
- Preserves evidence log
Subject Matter Experts (SMEs):
- Network analyst
- Endpoint analyst
- Malware analyst
- System administrators
- Application owners
RACI Matrix for IR:
R = Responsible (does the work)
A = Accountable (makes decisions)
C = Consulted (provides input)
I = Informed (kept updated)
                     IC    Tech Lead  Comms Lead  Legal  IT   Exec
───────────────────────────────────────────────────────────────────
Declare incident     A/R   C          I           I      I    I
Containment          A     R          I           C      R    I
Communication        A     C          R           C      I    I
Recovery             A     C          I           I      R    I
Legal decisions      C     I          C           A/R    I    I
Business decisions   C     I          C           C      I    A/R
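One way to make the RACI matrix usable during an incident is to hold it as data and query it. The sketch below encodes the rows above, with columns IC, Tech Lead, Comms Lead, Legal, IT, and Exec. The helper names roles_with and activities_without_single_owner are hypothetical, and the "exactly one accountable role per activity" check is a common RACI convention assumed here rather than something the matrix itself states.

```python
ROLES = ["IC", "Tech Lead", "Comms Lead", "Legal", "IT", "Exec"]

# One row per activity, one code per role, in ROLES order.
RACI = {
    "Declare incident":   ["A/R", "C", "I", "I", "I", "I"],
    "Containment":        ["A", "R", "I", "C", "R", "I"],
    "Communication":      ["A", "C", "R", "C", "I", "I"],
    "Recovery":           ["A", "C", "I", "I", "R", "I"],
    "Legal decisions":    ["C", "I", "C", "A/R", "I", "I"],
    "Business decisions": ["C", "I", "C", "C", "I", "A/R"],
}


def roles_with(activity, letter, matrix=RACI):
    """Return the roles holding the given RACI letter for an activity."""
    codes = matrix[activity]
    return [role for role, code in zip(ROLES, codes) if letter in code.split("/")]


def activities_without_single_owner(matrix=RACI):
    """Return activities that do not have exactly one accountable role."""
    return [a for a in matrix if len(roles_with(a, "A", matrix)) != 1]


print(roles_with("Containment", "R"))
print(activities_without_single_owner())
```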
Escalation Path:
Typical escalation:
SOC Analyst → SOC Lead → IR Team → IR Manager → CISO → Executive Team
When to escalate:
- Severity increases
- Scope expands
- Business impact grows
- Legal/regulatory implications
- Media attention likely
- Resources insufficient
Escalation information:
- Current situation summary
- Actions taken
- Resources needed
- Decisions required
- Recommended actions
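The escalation path and the five pieces of escalation information can be captured in a small structure so that an incomplete hand-off is caught before it is sent. This is a minimal sketch: the names EscalationNotice, next_level, and missing_sections are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

ESCALATION_PATH = [
    "SOC Analyst", "SOC Lead", "IR Team", "IR Manager", "CISO", "Executive Team",
]


def next_level(current: str) -> Optional[str]:
    """Return the next step in the escalation path, or None at the top."""
    idx = ESCALATION_PATH.index(current)
    return ESCALATION_PATH[idx + 1] if idx + 1 < len(ESCALATION_PATH) else None


@dataclass
class EscalationNotice:
    """The five items an escalation should carry, per the list above."""
    situation_summary: str
    actions_taken: List[str]
    resources_needed: List[str]
    decisions_required: List[str]
    recommended_actions: List[str]

    def missing_sections(self) -> List[str]:
        """Names of sections left empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


notice = EscalationNotice(
    situation_summary="Suspicious logins on one file server",
    actions_taken=["Account disabled"],
    resources_needed=[],
    decisions_required=["Approve server isolation"],
    recommended_actions=[],
)
print(next_level("SOC Lead"))
print(notice.missing_sections())
```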
Key insight: Role clarity matters most under stress. When everyone knows their job, the team functions. When roles overlap or are unclear, chaos ensues.
3) Detection and Analysis Phase
Before you can respond, you must understand what happened:
Incident Detection Sources:
Automated:
- SIEM alerts
- EDR detections
- IDS/IPS alerts
- Antivirus alerts
- Log analysis anomalies
Manual:
- User reports (very common!)
- Helpdesk tickets
- Threat hunting findings
- External notification (Law enforcement, ISPs)
Analysis and Validation:
Is it real? (False Positive Elimination):
- Check multiple sources
- Correlate events
- Verify with users
- Check for authorized changes (change management)
Scope Determination:
- Who/what is affected?
- How did they get in? (Entry vector)
- Persistence mechanisms?
- Lateral movement?
- Data exfiltration?
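"Check multiple sources" and "correlate events" can be illustrated with a simple rule: treat a host as corroborated when at least two different detection sources flag it within a short window. The tuple layout, the 30-minute window, and the two-source threshold are all assumptions chosen for this sketch, and real SIEM correlation is considerably richer.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def corroborated_hosts(alerts, window=timedelta(minutes=30), min_sources=2):
    """Return hosts flagged by at least min_sources distinct sources within window.

    alerts is an iterable of (timestamp, source, host) tuples.
    """
    by_host = defaultdict(list)
    for ts, source, host in alerts:
        by_host[host].append((ts, source))

    result = set()
    for host, events in by_host.items():
        events.sort()  # chronological order
        for i, (start, _) in enumerate(events):
            # Distinct sources seen within the window opened by this event.
            sources = {src for ts, src in events[i:] if ts - start <= window}
            if len(sources) >= min_sources:
                result.add(host)
                break
    return result


t = datetime(2024, 1, 1, 12, 0)
sample_alerts = [
    (t, "SIEM", "ws-01"),
    (t + timedelta(minutes=5), "EDR", "ws-01"),
    (t, "IDS", "ws-02"),
    (t + timedelta(hours=2), "EDR", "ws-02"),
    (t, "SIEM", "ws-03"),
    (t + timedelta(minutes=1), "SIEM", "ws-03"),
]
print(corroborated_hosts(sample_alerts))
```

In this sample only ws-01 qualifies: ws-02 has two sources but they are two hours apart, and ws-03 has two alerts from the same source.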
Documentation (The Start):
Start the Incident Log immediately:
- Time of detection
- Who detected it
- Initial observations
- Systems involved
- Data potentially involved
Don't alter evidence systems before capture:
- RAM capture first (most volatile)
- Disk image second
- Isolate the host but keep it powered on if memory analysis is planned
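The incident log fields listed above map naturally onto a small record type with a timestamped entry list. The sketch below is one possible shape, not a standard format. The class names IncidentLog and LogEntry and their methods are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LogEntry:
    timestamp: datetime
    author: str
    note: str


@dataclass
class IncidentLog:
    """Holds the initial facts plus a running list of actions and decisions."""
    detected_at: datetime
    detected_by: str
    initial_observations: str
    systems_involved: List[str] = field(default_factory=list)
    data_potentially_involved: List[str] = field(default_factory=list)
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, author: str, note: str,
               when: Optional[datetime] = None) -> LogEntry:
        """Append an entry, defaulting the timestamp to the current UTC time."""
        entry = LogEntry(when or datetime.now(timezone.utc), author, note)
        self.entries.append(entry)
        return entry

    def timeline(self) -> List[str]:
        """Entries formatted in chronological order, regardless of entry order."""
        ordered = sorted(self.entries, key=lambda e: e.timestamp)
        return [f"{e.timestamp.isoformat()} [{e.author}] {e.note}" for e in ordered]


log = IncidentLog(
    detected_at=datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    detected_by="helpdesk",
    initial_observations="User reports unexpected login prompts",
    systems_involved=["ws-01"],
)
log.record("scribe", "Memory captured", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc))
log.record("scribe", "Host isolated", datetime(2024, 1, 1, 9, 10, tzinfo=timezone.utc))
for line in log.timeline():
    print(line)
```

Storing timestamps in UTC avoids ambiguity when responders work across time zones, and sorting at read time lets late entries be added without corrupting the timeline.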
Key insight: Properly scoping the incident is crucial. Under-scoping leads to re-infection because you didn't find all the backdoors.
4) Containment, Eradication, and Recovery
Stopping the bleeding and fixing the root cause:
Containment Strategies:
Isolation (Air Gap):
- Physical disconnect (pull the cable)
- Virtual disconnect (vSwitch / VLAN isolation)
- Advantages: Stops spread immediately
- Disadvantages: Kills active connections, alerts attacker
Segmentation:
- Firewall rules to block C2
- Block specific ports/IPs
- Advantages: Keeps the system online for analysis
- Disadvantages: Attacker might have other paths
Account Containment:
- Disable compromised accounts
- Force password resets
- Revoke tokens
- Invalidate Golden Tickets (reset the krbtgt account password twice)
Eradication:
Getting rid of the bad:
- Reimage infected systems (Safest!)
- Restore from known-good backups
- Remediate vulnerability (Apply patch)
- Remove malware artifacts (If cleaning)
- Reset all credentials involved
Ideally, rebuild rather than clean. You can rarely be 100% sure you removed everything.
Recovery:
Bringing business back:
- Restore systems to production
- Verify functionality
- Enhanced monitoring (watch for return)
- Phased rollout
- Communicate with stakeholders
Criteria for recovery:
- Root cause identified and fixed?
- All malware removed/systems rebuilt?
- Monitoring in place?
- Business sign-off?
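The recovery criteria work as a gate: every item must be true before systems return to production. A minimal sketch follows. The key names are hypothetical labels for the four questions above, and any missing key is treated as not yet met.

```python
RECOVERY_CRITERIA = (
    "root_cause_fixed",
    "systems_clean_or_rebuilt",
    "monitoring_in_place",
    "business_sign_off",
)


def unmet_criteria(status):
    """Return the criteria that are false or absent in the status mapping."""
    return [c for c in RECOVERY_CRITERIA if not status.get(c, False)]


def ready_for_recovery(status):
    """True only when every recovery criterion is satisfied."""
    return not unmet_criteria(status)


status = {
    "root_cause_fixed": True,
    "systems_clean_or_rebuilt": True,
    "monitoring_in_place": False,
}
print(ready_for_recovery(status))
print(unmet_criteria(status))
```

Defaulting unknown items to "not met" keeps the gate conservative: nothing goes back into production because someone forgot to answer a question.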
5) Post-Incident Activity
Learning from the crisis:
The "After Action Report" (AAR):
1. Executive Summary
- High-level overview
- Impact statement
- Cost estimate
2. Timeline
- Detailed sequence of events
- Dwell time (initial compromise to detection)
- Time to contain (detection to containment)
3. Technical Findings
- Root cause analysis
- IOCs found
- Vulnerabilities exploited
4. Lessons Learned (The "Hot Wash")
- What went well?
- What went wrong?
- Were procedures followed?
- Were tools effective?
5. Recommendations
- Policy changes
- Technical controls needed
- Training gaps
- Process improvements
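The timeline section of an AAR usually reduces to a few durations. The sketch below computes two of them from three timestamps, taking dwell time in its usual sense of initial compromise to detection and reporting detection-to-containment separately. The function name timeline_metrics and the sample dates are invented for illustration.

```python
from datetime import datetime, timedelta


def timeline_metrics(compromised_at, detected_at, contained_at):
    """Return dwell time and time to contain as timedelta values."""
    if not compromised_at <= detected_at <= contained_at:
        raise ValueError("timestamps must be in chronological order")
    return {
        "dwell_time": detected_at - compromised_at,
        "time_to_contain": contained_at - detected_at,
    }


metrics = timeline_metrics(
    compromised_at=datetime(2024, 3, 1, 8, 0),
    detected_at=datetime(2024, 3, 4, 8, 0),
    contained_at=datetime(2024, 3, 4, 14, 30),
)
print(metrics["dwell_time"])
print(metrics["time_to_contain"])
```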
Key insight: Never waste a good incident. The improvements driven by a real incident are often the most valuable security controls.
Real-World Context: Ransomware Response
How IR principles apply to modern threats:
The Ransomware Crisis: Modern ransomware isn't just malware; it's a human-operated attack. Attackers dwell for days, exfiltrate data for blackmail, destroy backups, and *then* encrypt.
Response Challenges:
- Containment vs. Preservation: Do you shut down to save files, or keep systems running to capture encryption keys from RAM?
- Communication: How do you communicate when the email system is encrypted?
- Legal: Is paying the ransom legal in your jurisdiction?
Case Study (Maersk): During NotPetya, Maersk had to rebuild its entire Active Directory infrastructure. Recovery was possible because one domain controller in a remote office had been offline during the attack and survived intact. Lesson: Resilience matters.
Guided Lab: Scenario Tabletop
Practice applying the IR framework to a scenario.