Opening Framing: When Alerts Become Incidents
Triage confirmed a real threat. Now what? An alert becomes an incident when it requires coordinated response beyond routine handling. Incidents are high-stakes situations where speed and accuracy both matter—and they often conflict.
Incident response is a discipline with established frameworks, roles, and processes. When done well, it limits damage, preserves evidence, and enables recovery. When done poorly, it extends outages, destroys evidence, and lets attackers maintain access.
This week covers the foundations: IR frameworks, phases, roles, and the critical decisions made in the first hours of an incident.
Key insight: Incidents are won or lost in preparation. The team that's practiced and planned responds effectively. The team that's improvising makes costly mistakes.
1) Incident Response Frameworks
Multiple frameworks guide incident response:
NIST SP 800-61 (Most Common):
┌────────────────────┐
│ 1. Preparation     │ → Policies, tools, training, planning
└─────────┬──────────┘
          ↓
┌────────────────────┐
│ 2. Detection &     │ → Identify and analyze incidents
│    Analysis        │
└─────────┬──────────┘
          ↓
┌────────────────────┐
│ 3. Containment,    │ → Stop the bleeding, remove threat,
│    Eradication,    │   restore systems
│    Recovery        │
└─────────┬──────────┘
          ↓
┌────────────────────┐
│ 4. Post-Incident   │ → Lessons learned, improvements
│    Activity        │
└────────────────────┘
SANS Incident Response Process:
Six phases (more granular):
1. Preparation
- IR plan and policies
- Team training
- Tools and resources
2. Identification
- Detect potential incidents
- Determine scope
- Assign severity
3. Containment
- Short-term: Stop immediate damage
- Long-term: Temporary fixes while investigating
4. Eradication
- Remove malware
- Close vulnerabilities
- Eliminate attacker access
5. Recovery
- Restore systems
- Return to operations
- Monitor for recurrence
6. Lessons Learned
- What happened
- What worked
- What to improve
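The two frameworks describe the same work at different granularity. A minimal sketch of how the six SANS phases line up with the four NIST phases, based only on the two lists above (the names `SANS_TO_NIST`, `nist_phase`, and `sans_phases_in` are illustrative, not part of either framework):

```python
# Mapping of the six SANS phases onto the four NIST SP 800-61 phases,
# following the two lists above. Dictionary and function names are
# hypothetical helpers for this lesson.
SANS_TO_NIST = {
    "Preparation": "Preparation",
    "Identification": "Detection & Analysis",
    "Containment": "Containment, Eradication, Recovery",
    "Eradication": "Containment, Eradication, Recovery",
    "Recovery": "Containment, Eradication, Recovery",
    "Lessons Learned": "Post-Incident Activity",
}


def nist_phase(sans_phase):
    """Return the NIST phase that covers the given SANS phase."""
    return SANS_TO_NIST[sans_phase]


def sans_phases_in(nist):
    """Return the SANS phases grouped under one NIST phase, in order."""
    return [s for s, n in SANS_TO_NIST.items() if n == nist]
```

For example, `sans_phases_in("Containment, Eradication, Recovery")` lists the three SANS phases that NIST treats as a single phase.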
Incident Severity Levels:
Severity 1 - Critical:
- Business operations severely impacted
- Data breach confirmed
- Ransomware active
- Executive involvement required
- All hands on deck
Severity 2 - High:
- Significant system compromise
- Sensitive data potentially exposed
- Major business function affected
- Dedicated IR team response
Severity 3 - Medium:
- Limited compromise
- No sensitive data affected
- Single system or small scope
- Standard IR procedures
Severity 4 - Low:
- Minor security event
- Quickly contained
- Minimal business impact
- SOC handles within normal operations
Key insight: Severity drives resource allocation. Over-escalate and you waste resources. Under-escalate and the incident grows.
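The severity criteria above can be expressed as a simple first-match rule set. This is a sketch under stated assumptions: the field names are hypothetical, and a real organization would define its own thresholds in its IR plan.

```python
from dataclasses import dataclass


@dataclass
class IncidentFacts:
    # All field names are hypothetical; they mirror the criteria listed above.
    ransomware_active: bool = False
    data_breach_confirmed: bool = False
    business_severely_impacted: bool = False
    sensitive_data_potentially_exposed: bool = False
    major_function_affected: bool = False
    systems_compromised: int = 0


def classify_severity(facts):
    """Return 1 (Critical) to 4 (Low), checking the most severe criteria first."""
    if (facts.ransomware_active
            or facts.data_breach_confirmed
            or facts.business_severely_impacted):
        return 1
    if facts.sensitive_data_potentially_exposed or facts.major_function_affected:
        return 2
    if facts.systems_compromised >= 1:
        # Limited compromise with no sensitive data involved.
        return 3
    # Minor event with no confirmed compromise.
    return 4
```

Checking the most severe conditions first means an incident that matches several levels is always assigned the highest one, which guards against under-escalation.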
2) Incident Response Roles
Clear roles prevent chaos during incidents:
Core IR Team Roles:
Incident Commander (IC):
- Overall incident leadership
- Makes key decisions
- Coordinates resources
- Manages communication
- Single point of authority
Technical Lead:
- Leads technical investigation
- Directs analysis efforts
- Recommends containment actions
- Reports findings to IC
Communications Lead:
- Internal communications
- External communications (if needed)
- Status updates
- Stakeholder management
Scribe/Documenter:
- Records all actions
- Maintains timeline
- Tracks decisions
- Preserves evidence log
Subject Matter Experts (SMEs):
- Network analyst
- Endpoint analyst
- Malware analyst
- System administrators
- Application owners
RACI Matrix for IR:
R = Responsible (does the work)
A = Accountable (makes decisions)
C = Consulted (provides input)
I = Informed (kept updated)
                     IC    Tech   Comms   Legal   IT    Exec
                           Lead   Lead
─────────────────────────────────────────────────────────────
Declare incident     A/R   C      I       I       I     I
Containment          A     R      I       C       R     I
Communication        A     C      R       C       I     I
Recovery             A     C      I       I       R     I
Legal decisions      C     I      C       A/R     I     I
Business decisions   C     I      C       C       I     A/R
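A RACI matrix is easy to check mechanically. The sketch below encodes the matrix above as plain data and looks up who holds a given letter for an activity. The structure and function name are illustrative assumptions.

```python
# Column order matches the matrix above.
ROLES = ["IC", "Tech Lead", "Comms Lead", "Legal", "IT", "Exec"]

# One row per activity; each cell holds the letters for that role.
RACI = {
    "Declare incident":   ["A/R", "C", "I", "I",   "I", "I"],
    "Containment":        ["A",   "R", "I", "C",   "R", "I"],
    "Communication":      ["A",   "C", "R", "C",   "I", "I"],
    "Recovery":           ["A",   "C", "I", "I",   "R", "I"],
    "Legal decisions":    ["C",   "I", "C", "A/R", "I", "I"],
    "Business decisions": ["C",   "I", "C", "C",   "I", "A/R"],
}


def roles_with(activity, letter):
    """Return the roles holding `letter` (R, A, C, or I) for an activity."""
    return [
        role
        for role, cell in zip(ROLES, RACI[activity])
        if letter in cell.split("/")
    ]
```

A useful sanity check is that every activity has exactly one Accountable role, for example `all(len(roles_with(a, "A")) == 1 for a in RACI)`.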
Escalation Path:
Typical escalation:
SOC Analyst → SOC Lead → IR Team → IR Manager → CISO → Executive Team
When to escalate:
- Severity increases
- Scope expands
- Business impact grows
- Legal/regulatory implications
- Media attention likely
- Resources insufficient
Escalation information:
- Current situation summary
- Actions taken
- Resources needed
- Decisions required
- Recommended actions
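The escalation path and the required escalation information can be captured as data so that a briefing is never sent incomplete. A minimal sketch, assuming briefings are passed around as dictionaries. The field names and helper functions are hypothetical.

```python
# Escalation path as listed above, from first responder to executives.
ESCALATION_PATH = [
    "SOC Analyst", "SOC Lead", "IR Team", "IR Manager", "CISO", "Executive Team",
]

# The five items an escalation should carry (hypothetical field names).
REQUIRED_BRIEF_FIELDS = [
    "situation_summary", "actions_taken", "resources_needed",
    "decisions_required", "recommended_actions",
]


def next_escalation(current):
    """Return the tier above `current`, or None at the top of the path."""
    i = ESCALATION_PATH.index(current)
    return ESCALATION_PATH[i + 1] if i + 1 < len(ESCALATION_PATH) else None


def missing_brief_fields(brief):
    """List required fields that are absent or empty in a briefing dict."""
    return [f for f in REQUIRED_BRIEF_FIELDS if not brief.get(f)]
```

Calling `missing_brief_fields` before paging the next tier gives a quick completeness check under time pressure.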
Key insight: Role clarity matters most under stress. When everyone knows their job, the team functions. When roles overlap or are unclear, chaos ensues.
3) Detection and Analysis Phase
Before you can respond, you must understand what happened:
Incident Detection Sources:
Automated:
- SIEM alerts
- EDR detections
- IDS/IPS alerts
- Antivirus alerts
- Log analysis anomalies
Manual:
- User reports (very common!)
- Helpdesk tickets
- Threat hunting findings
- External notification (Law enforcement, ISPs)
Analysis and Validation:
Is it real? (False Positive Elimination):
- Check multiple sources
- Correlate events
- Verify with users
- Check for authorized changes (change management)
Scope Determination:
- Who/what is affected?
- How did they get in? (Entry vector)
- Persistence mechanisms?
- Lateral movement?
- Data exfiltration?
Documentation (The Start):
Start the Incident Log immediately:
- Time of detection
- Who detected it
- Initial observations
- Systems involved
- Data potentially involved
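An incident log is essentially an append-only list of timestamped entries. The sketch below is one possible shape, assuming UTC timestamps and free-text notes. The class and method names are illustrative, not taken from any particular IR tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime  # stored in UTC to avoid time-zone confusion
    author: str
    note: str


@dataclass
class IncidentLog:
    entries: list = field(default_factory=list)

    def record(self, author, note, when=None):
        """Append an entry; defaults to the current UTC time."""
        entry = LogEntry(when or datetime.now(timezone.utc), author, note)
        self.entries.append(entry)
        return entry

    def timeline(self):
        """Return entries as text lines, sorted by timestamp."""
        ordered = sorted(self.entries, key=lambda e: e.timestamp)
        return [f"{e.timestamp.isoformat()} | {e.author} | {e.note}" for e in ordered]
```

Entries are frozen so that a recorded observation cannot be silently edited later. Corrections are added as new entries instead.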
Don't alter evidence systems yet! Preserve evidence in order of volatility:
- Capture RAM first (most volatile)
- Take a disk image second
- Isolate the host but keep it powered on if memory still needs to be analyzed
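Once a memory capture or disk image exists, a common practice (an addition here, not something the list above prescribes) is to record a cryptographic hash of the file so later copies can be verified against it. A minimal sketch using Python's standard `hashlib`. The function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(path, recorded_hash):
    """Return True if the file still matches the hash recorded at acquisition."""
    return sha256_of_file(path) == recorded_hash.lower()
```

Recording the hash in the incident log at acquisition time, and re-checking it whenever the image changes hands, supports the evidence log kept by the scribe.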
Key insight: Properly scoping the incident is crucial. Under-scoping leads to re-infection because you didn't find all the backdoors.
4) Containment, Eradication, and Recovery
Stopping the bleeding and fixing the root cause:
Containment Strategies:
Isolation (Air Gap):
- Physical disconnect (pull the cable)
- Virtual disconnect (vSwitch / VLAN isolation)
- Advantages: Stops spread immediately
- Disadvantages: Kills active connections, alerts attacker
Segmentation:
- Firewall rules to block C2
- Block specific ports/IPs
- Advantages: Keeps the system running for analysis
- Disadvantages: Attacker might have other paths
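Before pushing block rules to a firewall, it helps to confirm which observed connections the rules would actually catch. The sketch below uses Python's standard `ipaddress` module. The connection record format (`dest_ip` key) and the sample addresses, which come from documentation ranges, are assumptions for illustration.

```python
import ipaddress


def build_blocklist(entries):
    """Parse single IPs or CIDR ranges into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def is_blocked(dest_ip, blocklist):
    """Return True if the destination falls inside any blocked network."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in blocklist)


def flag_connections(connections, blocklist):
    """Return the connection records whose destination is on the blocklist."""
    return [c for c in connections if is_blocked(c["dest_ip"], blocklist)]


# Illustrative data only: documentation address ranges, hypothetical hosts.
c2_blocklist = build_blocklist(["203.0.113.0/24", "198.51.100.7"])
observed = [
    {"host": "fin-pc-01", "dest_ip": "203.0.113.45"},
    {"host": "fin-pc-02", "dest_ip": "192.0.2.10"},
]
flagged = flag_connections(observed, c2_blocklist)
```

Hosts that never appear in `flagged` are not proven clean. As the disadvantage above notes, the attacker may have other paths.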
Account Containment:
- Disable compromised accounts
- Force password resets
- Revoke tokens
- Invalidate forged Kerberos tickets (Golden Tickets) by resetting the krbtgt account password twice
Eradication:
Getting rid of the bad:
- Reimage infected systems (Safest!)
- Restore from known-good backups
- Remediate vulnerability (Apply patch)
- Remove malware artifacts (If cleaning)
- Reset all credentials involved
Ideally, rebuild rather than clean. You can rarely be 100% sure you removed everything.
Recovery:
Bringing business back:
- Restore systems to production
- Verify functionality
- Enhanced monitoring (watch for return)
- Phased rollout
- Communicate with stakeholders
Criteria for recovery:
- Root cause identified and fixed?
- All malware removed/systems rebuilt?
- Monitoring in place?
- Business sign-off?
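The recovery criteria work as a gate: every item must be satisfied before systems return to production. A minimal sketch of that check, with hypothetical key names standing in for the four questions above:

```python
# Hypothetical keys, one per recovery criterion listed above.
RECOVERY_CRITERIA = [
    "root_cause_fixed",
    "systems_clean_or_rebuilt",
    "monitoring_in_place",
    "business_signoff",
]


def unmet_criteria(status):
    """Return criteria that are missing or False in the status dict."""
    return [c for c in RECOVERY_CRITERIA if not status.get(c, False)]


def ready_for_recovery(status):
    """Recovery may proceed only when no criterion is unmet."""
    return not unmet_criteria(status)
```

Treating a missing answer as unmet keeps the gate conservative: an unanswered question blocks recovery rather than passing by default.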
5) Post-Incident Activity
Learning from the crisis:
The "After Action Report" (AAR):
1. Executive Summary
- High-level overview
- Impact statement
- Cost estimate
2. Timeline
- Detailed sequence of events
- Dwell time (initial compromise to detection)
- Time to contain (detection to containment)
3. Technical Findings
- Root cause analysis
- IOCs found
- Vulnerabilities exploited
4. Lessons Learned (The "Hot Wash"):
- What went well?
- What went wrong?
- Were procedures followed?
- Were tools effective?
5. Recommendations
- Policy changes
- Technical controls needed
- Training gaps
- Process improvements
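The timing figures in the report's timeline can be computed directly from milestone timestamps. The sketch below assumes the timeline is reduced to a dictionary of named milestones. The milestone names and sample times are illustrative, not from a real incident.

```python
from datetime import datetime


def interval_hours(milestones, start, end):
    """Return the elapsed hours between two named milestones."""
    return (milestones[end] - milestones[start]).total_seconds() / 3600


# Illustrative timeline only.
milestones = {
    "initial_compromise": datetime(2024, 3, 1, 8, 0),
    "detection": datetime(2024, 3, 4, 8, 0),
    "containment": datetime(2024, 3, 4, 14, 30),
}

compromise_to_detection = interval_hours(milestones, "initial_compromise", "detection")
detection_to_containment = interval_hours(milestones, "detection", "containment")
```

Tracking these intervals across incidents shows whether detection and response are getting faster over time.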
Key insight: Never waste a good incident. The improvements driven by a real incident are often the most valuable security controls.
Real-World Context: Ransomware Response
How IR principles apply to modern threats:
The Ransomware Crisis: Modern ransomware isn't just malware; it's a human-operated attack. Attackers dwell for days, exfiltrate data for blackmail, destroy backups, and *then* encrypt.
Response Challenges:
- Containment vs. Preservation: Do you shut down to save files, or keep systems running to recover encryption keys from RAM?
- Communication: How do you communicate when the email system is encrypted?
- Legal: Is paying the ransom legal in your jurisdiction?
Case Study (Maersk): During NotPetya, Maersk had to rebuild its entire Active Directory infrastructure. Recovery succeeded because the team found one surviving domain controller in a remote office that happened to be offline during the attack. Lesson: Resilience matters.
Guided Lab: Scenario Tabletop
Practice applying the IR framework to a scenario.
Scenario: The "Slow" PC
Input: A finance user reports that their PC is "slow" and the fans are loud. The endpoint is meant for spreadsheets only.
Phase 1: Identification
- Triage: Check CPU usage. Process `svchost.exe` is running from `C:\Temp\` and using 90% CPU.
- Analysis: The legitimate `svchost.exe` runs from `C:\Windows\System32\`, so `C:\Temp\` is not a valid path. A hash check reveals a crypto-miner.
- Scope: Run a SIEM query for that hash. Five other machines are found.
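The two checks in this phase, an unexpected image path and a sweep for the same hash, can be sketched in a few lines. Assumptions: the expected directories for `svchost.exe` are `C:\Windows\System32` and, on 64-bit Windows, `C:\Windows\SysWOW64`. Process events are plain dictionaries exported from a SIEM, with hypothetical `host` and `sha256` keys. No specific SIEM API is implied.

```python
from pathlib import PureWindowsPath

# Expected locations for well-known system binaries (assumed, not exhaustive).
EXPECTED_DIRS = {
    "svchost.exe": {
        PureWindowsPath(r"C:\Windows\System32"),
        PureWindowsPath(r"C:\Windows\SysWOW64"),
    },
}


def is_suspicious_path(image_path):
    """Flag a known system binary name running from an unexpected folder."""
    p = PureWindowsPath(image_path)
    expected = EXPECTED_DIRS.get(p.name.lower())
    # PureWindowsPath comparisons are case-insensitive, like Windows itself.
    return expected is not None and p.parent not in expected


def hosts_with_hash(process_events, sha256):
    """Return the sorted set of hosts where the given file hash was seen."""
    wanted = sha256.lower()
    return sorted({e["host"] for e in process_events if e["sha256"].lower() == wanted})
```

Here `is_suspicious_path(r"C:\Temp\svchost.exe")` is true, while the same binary name under `C:\Windows\System32` is not flagged. The hash sweep is what turns a single-host alert into a scoped incident.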
Phase 2: Containment
- Isolate the 6 affected hosts from network (VLAN quarantine).
- Prevent C2 communication at firewall.
Phase 3: Eradication
- Reimage affected machines.
- Identify entry point: Phishing email sent yesterday.
- Block sender domain. Remove email from other inboxes.
Phase 4: Recovery
- Restore user data from OneDrive.
- Re-enable network access.
- Reset user passwords.
Reflection
- Why quarantine instead of just killing the process?
- How did finding 5 other machines change the severity?
Week 6 Outcome Check
By the end of this week, you should be able to:
- Explain the NIST and SANS incident response life cycles
- Map roles and responsibilities during an incident
- Differentiate between an event and an incident
- Describe the goals of containment, eradication, and recovery
- Understand the importance of the post-incident review
🎯 Hands-On Labs (Free & Essential)
Rehearse the incident response life cycle before moving to reading resources.
🎮 TryHackMe: Incident Response
What you'll do: Walk through IR phases, roles, and real-world decision points.
Why it matters: IR is a structured process, not improvised firefighting.
Time estimate: 1.5-2 hours
📝 Lab Exercise: IR Timeline + Phase Mapping
Task: Build a timeline from a short incident narrative and map actions to NIST phases.
Deliverable: Timeline with phase labels + containment/eradication/recovery steps.
Why it matters: Clear timelines make response coordination possible.
Time estimate: 60-90 minutes
🏁 PicoCTF Practice: Forensics (Evidence Basics)
What you'll do: Solve beginner forensics challenges focused on evidence artifacts.
Why it matters: Evidence integrity is essential for incident response.
Time estimate: 1-2 hours
💡 Lab Tip: Document every action with timestamps to preserve evidence and decision context.
🛡️ Alert Tuning & False Positive Reduction
A noisy SOC burns out analysts. Tuning detections is defensive engineering: you reduce alert fatigue without losing coverage.
Tuning workflow:
- Baseline normal behavior first
- Add allowlists for known-good activity
- Use thresholds and suppressions carefully
- Track false positives by rule and source
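Tracking false positives by rule can start as a simple tally over closed alerts. The sketch below assumes each alert is a dictionary with hypothetical `rule` and `verdict` keys. The thresholds are placeholders to be set from your own baseline.

```python
from collections import Counter


def false_positive_rates(alerts):
    """Return {rule: fraction of its alerts closed as false positives}."""
    total = Counter()
    false_positives = Counter()
    for alert in alerts:
        total[alert["rule"]] += 1
        if alert["verdict"] == "false_positive":
            false_positives[alert["rule"]] += 1
    return {rule: false_positives[rule] / total[rule] for rule in total}


def tuning_candidates(alerts, min_alerts=10, threshold=0.9):
    """Rules with enough volume and a high false-positive rate, sorted by name."""
    counts = Counter(a["rule"] for a in alerts)
    rates = false_positive_rates(alerts)
    return sorted(
        rule for rule, rate in rates.items()
        if counts[rule] >= min_alerts and rate >= threshold
    )
```

The `min_alerts` floor keeps a rule from being tuned away on the strength of one or two closed alerts, which helps reduce noise without losing coverage.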
📚 Building on CSY101 Week-14: Align tuning with documented control objectives and evidence.
Resources
- NIST SP 800-61 Rev 2 · The Computer Security Incident Handling Guide · Resource ID: csy201_w6_r1
- SANS Incident Handler's Handbook · Practical guide to the 6 steps · Resource ID: csy201_w6_r2
Checkpoint Questions
- What is the difference between an event and an incident?
- Why is "Preparation" considered the most important phase?
- What is the main goal of the Containment phase?
- Who is responsible for the overall management of an incident?
- Why should you avoid rebooting a compromised machine immediately?
Week 06 Quiz
Test your understanding of incident response frameworks, roles, and containment decisions.
Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.
Weekly Reflection
Reflection Prompt:
Incident response is often described as "emergency services for digital assets." Reflect on how the structured approach (Preparation -> Detection -> Containment...) helps manage the panic and chaos of a real cyber attack. Why is strict adherence to process (like Chain of Custody) so critical even when time is short?