CSY201 Week 06 - Rehearse the incident response life cycle before moving to reading resources.

Opening Framing: When Alerts Become Incidents

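Triage confirmed a real threat. Now what? An event is any observable occurrence in a system or network; an alert becomes an incident when it represents a violation (or imminent threat of violation) of security policy and requires coordinated response beyond routine handling. Incidents are high-stakes situations where speed and accuracy both matter, and the two often conflict.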

Incident response is a discipline with established frameworks, roles, and processes. When done well, it limits damage, preserves evidence, and enables recovery. When done poorly, it extends outages, destroys evidence, and lets attackers maintain access.

This week covers the foundations: IR frameworks, phases, roles, and the critical decisions made in the first hours of an incident.

Key insight: Incidents are won or lost in preparation. The team that's practiced and planned responds effectively. The team that's improvising makes costly mistakes.

1) Incident Response Frameworks

Multiple frameworks guide incident response:

NIST SP 800-61 (Most Common):

┌──────────────┐
│ 1.Preparation│ → Policies, tools, training, planning
└──────┬───────┘
       ↓
┌──────────────────────┐
│ 2.Detection &        │ → Identify and analyze incidents
│   Analysis           │
└──────┬───────────────┘
       ↓
┌──────────────────────┐
│ 3.Containment,       │ → Stop the bleeding, remove threat,
│   Eradication,       │   restore systems
│   Recovery           │
└──────┬───────────────┘
       ↓
┌──────────────────────┐
│ 4.Post-Incident      │ → Lessons learned, improvements
│   Activity           │
└──────────────────────┘

SANS Incident Response Process:

Six phases (more granular):

1. Preparation
   - IR plan and policies
   - Team training
   - Tools and resources

2. Identification
   - Detect potential incidents
   - Determine scope
   - Assign severity

3. Containment
   - Short-term: Stop immediate damage
   - Long-term: Temporary fixes while investigating

4. Eradication
   - Remove malware
   - Close vulnerabilities
   - Eliminate attacker access

5. Recovery
   - Restore systems
   - Return to operations
   - Monitor for recurrence

6. Lessons Learned
   - What happened
   - What worked
   - What to improve

Incident Severity Levels:

Severity 1 - Critical:
- Business operations severely impacted
- Data breach confirmed
- Ransomware active
- Executive involvement required
- All hands on deck

Severity 2 - High:
- Significant system compromise
- Sensitive data potentially exposed
- Major business function affected
- Dedicated IR team response

Severity 3 - Medium:
- Limited compromise
- No sensitive data affected
- Single system or small scope
- Standard IR procedures

Severity 4 - Low:
- Minor security event
- Quickly contained
- Minimal business impact
- SOC handles within normal operations

Key insight: Severity drives resource allocation. Over-escalate and you waste resources. Under-escalate and the incident grows.
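
The severity tiers above can be sketched as a small triage helper. This is a minimal illustration, not a standard scoring scheme: the `IncidentFacts` field names and the exact rule order are assumptions made for the example, and a real team would tune them to its own policy.

```python
from dataclasses import dataclass


@dataclass
class IncidentFacts:
    """Hypothetical triage inputs; field names are illustrative assumptions."""
    breach_confirmed: bool = False
    ransomware_active: bool = False
    operations_severely_impacted: bool = False
    sensitive_data_possibly_exposed: bool = False
    major_function_affected: bool = False
    affected_systems: int = 1
    contained: bool = False


def assign_severity(facts: IncidentFacts) -> int:
    """Return 1 (critical) to 4 (low), checking the most severe criteria first."""
    if (facts.breach_confirmed or facts.ransomware_active
            or facts.operations_severely_impacted):
        return 1
    if facts.sensitive_data_possibly_exposed or facts.major_function_affected:
        return 2
    # Assumption: an uncontained or multi-system event is at least Medium.
    if not facts.contained or facts.affected_systems > 1:
        return 3
    return 4
```

Checking the most severe criteria first means a single critical indicator cannot be masked by otherwise mild facts.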

2) Incident Response Roles

Clear roles prevent chaos during incidents:

Core IR Team Roles:

Incident Commander (IC):
- Overall incident leadership
- Makes key decisions
- Coordinates resources
- Manages communication
- Single point of authority

Technical Lead:
- Leads technical investigation
- Directs analysis efforts
- Recommends containment actions
- Reports findings to IC

Communications Lead:
- Internal communications
- External communications (if needed)
- Status updates
- Stakeholder management

Scribe/Documenter:
- Records all actions
- Maintains timeline
- Tracks decisions
- Preserves evidence log

Subject Matter Experts (SMEs):
- Network analyst
- Endpoint analyst
- Malware analyst
- System administrators
- Application owners

RACI Matrix for IR:

R = Responsible (does the work)
A = Accountable (makes decisions)
C = Consulted (provides input)
I = Informed (kept updated)

                    IC    Tech   Comms  Legal  IT    Exec
                          Lead   Lead
─────────────────────────────────────────────────────────
Declare incident    A/R    C      I      I     I     I
Containment         A      R      I      C     R     I
Communication       A      C      R      C     I     I
Recovery            A      C      I      I     R     I
Legal decisions     C      I      C      A/R   I     I
Business decisions  C      I      C      C     I     A/R
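
One way to keep the matrix usable during an incident is to store it as data and query it. The sketch below transcribes the table above into a dictionary; the helper name `roles_with` is hypothetical.

```python
# RACI matrix transcribed from the table: activity -> {role: RACI code}.
RACI = {
    "Declare incident": {"IC": "A/R", "Tech Lead": "C", "Comms Lead": "I",
                         "Legal": "I", "IT": "I", "Exec": "I"},
    "Containment": {"IC": "A", "Tech Lead": "R", "Comms Lead": "I",
                    "Legal": "C", "IT": "R", "Exec": "I"},
    "Communication": {"IC": "A", "Tech Lead": "C", "Comms Lead": "R",
                      "Legal": "C", "IT": "I", "Exec": "I"},
    "Recovery": {"IC": "A", "Tech Lead": "C", "Comms Lead": "I",
                 "Legal": "I", "IT": "R", "Exec": "I"},
    "Legal decisions": {"IC": "C", "Tech Lead": "I", "Comms Lead": "C",
                        "Legal": "A/R", "IT": "I", "Exec": "I"},
    "Business decisions": {"IC": "C", "Tech Lead": "I", "Comms Lead": "C",
                           "Legal": "C", "IT": "I", "Exec": "A/R"},
}


def roles_with(activity: str, letter: str) -> list[str]:
    """Return the roles whose code for this activity includes the given letter."""
    return [role for role, code in RACI[activity].items()
            if letter in code.split("/")]
```

For example, `roles_with("Containment", "R")` lists who does the containment work, and a quick loop over `RACI` can confirm that every activity has exactly one accountable role.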

Escalation Path:

Typical escalation:

SOC Analyst → SOC Lead → IR Team → IR Manager → CISO → Executive Team

When to escalate:
- Severity increases
- Scope expands
- Business impact grows
- Legal/regulatory implications
- Media attention likely
- Resources insufficient

Escalation information:
- Current situation summary
- Actions taken
- Resources needed
- Decisions required
- Recommended actions
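
The five items above can be captured as a fixed template so that nothing is forgotten under pressure. This is a minimal sketch; the class name `EscalationBrief` and its methods are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class EscalationBrief:
    """One field per item in the escalation checklist."""
    situation_summary: str
    actions_taken: str
    resources_needed: str
    decisions_required: str
    recommended_actions: str

    def missing(self) -> list[str]:
        """Names of fields left blank, so an incomplete brief is caught early."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Format the brief as labeled plain-text lines."""
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )
```

Calling `missing()` before sending gives the person escalating a quick completeness check.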

Key insight: Role clarity matters most under stress. When everyone knows their job, the team functions. When roles overlap or are unclear, chaos ensues.

3) Detection and Analysis Phase

Before you can respond, you must understand what happened:

Incident Detection Sources:

Automated:
- SIEM alerts
- EDR detections
- IDS/IPS alerts
- Antivirus alerts
- Log analysis anomalies

Manual:
- User reports (very common!)
- Helpdesk tickets
- Threat hunting findings
- External notification (Law enforcement, ISPs)

Analysis and Validation:

Is it real? (False Positive Elimination):
- Check multiple sources
- Correlate events
- Verify with users
- Check for authorized changes (change management)

Scope Determination:
- Who/what is affected?
- How did they get in? (Entry vector)
- Persistence mechanisms?
- Lateral movement?
- Data exfiltration?

Documentation (The Start):

Start the Incident Log immediately:
- Time of detection
- Who detected it
- Initial observations
- Systems involved
- Data potentially involved
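
A minimal sketch of an incident log that captures the fields above and timestamps every later entry automatically. The class and field names are assumptions for illustration; many teams use a ticketing system or shared document instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    author: str
    note: str
    # Timestamps are recorded in UTC so entries from different sites line up.
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class IncidentLog:
    detected_at: datetime
    detected_by: str
    initial_observations: str
    systems_involved: list[str]
    data_potentially_involved: list[str]
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, author: str, note: str) -> LogEntry:
        """Append a timestamped entry; this sketch only ever adds entries."""
        entry = LogEntry(author, note)
        self.entries.append(entry)
        return entry
```

Recording in UTC and appending only (never editing) keeps the log usable as the basis for the later timeline.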

Don't alter evidence systems yet! Collect in order of volatility:
- RAM capture first (volatile; lost on shutdown or reboot)
- Disk image second
- Isolate from the network but keep the host powered on (if analyzing memory)
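
Once a memory capture or disk image exists, a common practice is to record a cryptographic hash of the file so later copies can be verified against it. The sketch below is one way to do that with Python's standard library; the function name is hypothetical, and dedicated forensic tools typically compute this for you.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Writing the resulting hex digest into the incident log alongside the acquisition time gives reviewers something to check the evidence against.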

Key insight: Properly scoping the incident is crucial. Under-scoping leads to re-infection because you didn't find all the backdoors.

4) Containment, Eradication, and Recovery

Stopping the bleeding and fixing the root cause:

Containment Strategies:

Isolation (Air Gap):
- Physical disconnect (pull the cable)
- Virtual disconnect (vSwitch / VLAN isolation)
- Advantages: Stops spread immediately
- Disadvantages: Kills active connections, alerts attacker

Segmentation:
- Firewall rules to block C2
- Block specific ports/IPs
- Advantages: Keeps the system running for analysis
- Disadvantages: Attacker might have other paths

Account Containment:
- Disable compromised accounts
- Force password resets
- Revoke tokens
- Invalidate Golden Tickets (reset the krbtgt account password twice)

Eradication:

Getting rid of the bad:
- Reimage infected systems (Safest!)
- Restore from known-good backups
- Remediate vulnerability (Apply patch)
- Remove malware artifacts (If cleaning)
- Reset all credentials involved

Ideally, rebuild rather than clean. You can rarely be 100% sure you removed everything.

Recovery:

Bringing business back:
- Restore systems to production
- Verify functionality
- Enhanced monitoring (watch for return)
- Phased rollout
- Communicate with stakeholders

Criteria for recovery:
- Root cause identified and fixed?
- All malware removed/systems rebuilt?
- Monitoring in place?
- Business sign-off?
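
The recovery criteria work as a gate: every question must be answered "yes" before systems return to production. A minimal sketch, with hypothetical key names standing in for the four questions above:

```python
# Hypothetical keys, one per recovery criterion listed above.
RECOVERY_CRITERIA = (
    "root_cause_fixed",
    "systems_clean_or_rebuilt",
    "monitoring_in_place",
    "business_sign_off",
)


def unmet_criteria(status: dict[str, bool]) -> list[str]:
    """Return criteria not yet satisfied; missing keys count as unmet."""
    return [name for name in RECOVERY_CRITERIA if not status.get(name, False)]


def ready_for_recovery(status: dict[str, bool]) -> bool:
    """True only when every criterion has been explicitly confirmed."""
    return not unmet_criteria(status)
```

Treating a missing answer as "no" is a deliberate choice here: an unanswered question should block recovery rather than slip through.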

5) Post-Incident Activity

Learning from the crisis:

The "After Action Report" (AAR):

1. Executive Summary
   - High-level overview
   - Impact statement
   - Cost estimate

2. Timeline
   - Detailed sequence of events
   - Dwell time (initial compromise to detection) and time to contain (detection to containment)

3. Technical Findings
   - Root cause analysis
   - IOCs found
   - Vulnerabilities exploited

4. Lessons Learned (The "Hot Wash")
   - What went well?
   - What went wrong?
   - Were procedures followed?
   - Were tools effective?

5. Recommendations
   - Policy changes
   - Technical controls needed
   - Training gaps
   - Process improvements
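
The timeline section of the report usually includes a couple of derived metrics. The sketch below assumes three known timestamps and takes dwell time as the interval from initial compromise to detection (the usual definition) and time to contain as detection to containment. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def timeline_metrics(compromised_at: datetime,
                     detected_at: datetime,
                     contained_at: datetime) -> dict[str, timedelta]:
    """Compute dwell time and time to contain from three timeline points."""
    if not compromised_at <= detected_at <= contained_at:
        raise ValueError("timestamps must be in chronological order")
    return {
        "dwell_time": detected_at - compromised_at,
        "time_to_contain": contained_at - detected_at,
    }
```

In practice the compromise time is often an estimate that gets revised as the investigation finds earlier evidence, so the metrics should be recomputed when the timeline changes.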

Key insight: Never waste a good incident. The improvements driven by a real incident are often the most valuable security controls.

Real-World Context: Ransomware Response

How IR principles apply to modern threats:

The Ransomware Crisis: Modern ransomware isn't just malware; it's a human-operated attack. Attackers dwell for days, exfiltrate data for blackmail, destroy backups, and *then* encrypt.

Response Challenges:

Containment vs. Preservation: Do you shut down to save files, or keep systems running to try to recover encryption keys from RAM?
Communication: How do you communicate when the email system is encrypted?
Legal: Is paying the ransom legal in your jurisdiction?

Case Study (Maersk): During NotPetya, Maersk had to rebuild its entire AD infrastructure. It succeeded because the team found one surviving domain controller in a remote office; that machine had been offline during the attack. Lesson: Resilience matters.

Guided Lab: Scenario Tabletop

Practice applying the IR framework to a scenario.

Scenario: The "Slow" PC

Input: A Finance user reports that their PC is "slow" and the fans are running loudly. The endpoint is meant for spreadsheets only.

Phase 1: Identification

Triage: Check CPU usage. Process `svchost.exe` running from `C:\Temp\` using 90% CPU.
Analysis: `C:\Temp\` is not a valid path for `svchost.exe`; the legitimate binary lives in `C:\Windows\System32\`. Hash check reveals a crypto-miner.
Scope: Run a SIEM query for that hash. 5 other machines are found.
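
The two checks in this phase (unexpected path, then hash sweep) can be sketched in a few lines. This is an illustration only: the event dictionaries with `host` and `sha256` keys are a hypothetical export format, not any particular SIEM's schema, and the allowlist covers just `svchost.exe`.

```python
from pathlib import PureWindowsPath

# Directories where svchost.exe legitimately lives on Windows (lowercased).
EXPECTED_DIRS = {
    "svchost.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
}


def is_suspicious_path(image_path: str) -> bool:
    """True if a known system binary name runs from an unexpected directory."""
    path = PureWindowsPath(image_path)
    expected = EXPECTED_DIRS.get(path.name.lower())
    if expected is None:
        return False  # not a binary this sketch knows about
    return str(path.parent).lower() not in expected


def hosts_with_hash(events: list[dict], sha256: str) -> set[str]:
    """Return hosts where a process with the given hash was observed."""
    wanted = sha256.lower()
    return {e["host"] for e in events if e["sha256"].lower() == wanted}
```

`PureWindowsPath` parses Windows paths correctly even when the analysis script runs on Linux or macOS.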

Phase 2: Containment

Isolate the 6 affected hosts from the network (VLAN quarantine).
Block C2 communication at the firewall.

Phase 3: Eradication

Reimage affected machines.
Identify entry point: Phishing email sent yesterday.
Block sender domain. Remove email from other inboxes.

Phase 4: Recovery

Reset user passwords.
Re-enable network access.
Restore user data from OneDrive.

Reflection

Why quarantine instead of just killing the process?
How did finding 5 other machines change the severity?

Week 6 Outcome Check

By the end of this week, you should be able to:

Explain the NIST and SANS incident response life cycles
Map roles and responsibilities during an incident
Differentiate between an event and an incident
Describe the goals of containment, eradication, and recovery
Understand the importance of the post-incident review

🎯 Hands-On Labs (Free & Essential)

Rehearse the incident response life cycle before moving to reading resources.

🎮 TryHackMe: Incident Response

What you'll do: Walk through IR phases, roles, and real-world decision points.
Why it matters: IR is a structured process, not improvised firefighting.
Time estimate: 1.5-2 hours

📝 Lab Exercise: IR Timeline + Phase Mapping

Task: Build a timeline from a short incident narrative and map actions to NIST phases.
Deliverable: Timeline with phase labels + containment/eradication/recovery steps.
Why it matters: Clear timelines make response coordination possible.
Time estimate: 60-90 minutes
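
For the deliverable, one possible structure is a list of timestamped actions, each labeled with a NIST phase and sorted chronologically. The sketch below is only a starting point; the class and function names are hypothetical, and the phase labels are assigned by you, not inferred by the code.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class NistPhase(Enum):
    PREPARATION = "Preparation"
    DETECTION_ANALYSIS = "Detection & Analysis"
    CONTAINMENT_ERADICATION_RECOVERY = "Containment, Eradication, Recovery"
    POST_INCIDENT = "Post-Incident Activity"


@dataclass(frozen=True)
class TimelineEvent:
    when: datetime
    action: str
    phase: NistPhase


def build_timeline(events: list[TimelineEvent]) -> list[TimelineEvent]:
    """Return the events in chronological order."""
    return sorted(events, key=lambda e: e.when)


def format_timeline(events: list[TimelineEvent]) -> str:
    """Render one line per event: timestamp, phase label, action."""
    return "\n".join(
        f"{e.when:%Y-%m-%d %H:%M}  [{e.phase.value}]  {e.action}"
        for e in build_timeline(events)
    )
```

Using an `Enum` for the phase makes a mistyped phase name fail immediately instead of silently producing an inconsistent label.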

🏁 PicoCTF Practice: Forensics (Evidence Basics)

What you'll do: Solve beginner forensics challenges focused on evidence artifacts.
Why it matters: Evidence integrity is essential for incident response.
Time estimate: 1-2 hours

💡 Lab Tip: Document every action with timestamps to preserve evidence and decision context.

🛡️ Alert Tuning & False Positive Reduction

A noisy SOC burns out analysts. Tuning detections is defensive engineering: you reduce alert fatigue without losing coverage.

Tuning workflow:
- Baseline normal behavior first
- Add allowlists for known-good activity
- Use thresholds and suppressions carefully
- Track false positives by rule and source
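
Tracking false positives by rule and source can start as a simple tally. The sketch below assumes each closed alert is available as a `(rule, source, was_false_positive)` tuple, which is a hypothetical export format; the 0.8 review threshold is likewise an arbitrary example value.

```python
from collections import Counter


def false_positive_rates(
    alerts: list[tuple[str, str, bool]],
) -> dict[tuple[str, str], float]:
    """Return the false positive rate for each (rule, source) pair."""
    totals = Counter((rule, source) for rule, source, _ in alerts)
    false_positives = Counter(
        (rule, source) for rule, source, is_fp in alerts if is_fp
    )
    return {key: false_positives[key] / totals[key] for key in totals}


def tuning_candidates(rates: dict[tuple[str, str], float],
                      threshold: float = 0.8) -> list[tuple[str, str]]:
    """List (rule, source) pairs at or above the threshold, noisiest first."""
    noisy = [key for key, rate in rates.items() if rate >= threshold]
    return sorted(noisy, key=lambda key: rates[key], reverse=True)
```

A high rate marks a rule for review, not automatic suppression; the baseline and allowlist steps above still decide what actually changes.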

📚 Building on CSY101 Week-14: Align tuning with documented control objectives and evidence.

Resources

NIST SP 800-61 Rev 2 · The Computer Security Incident Handling Guide · Resource ID: csy201_w6_r1
SANS Incident Handler's Handbook · Practical guide to the 6 steps · Resource ID: csy201_w6_r2

Checkpoint Questions

What is the difference between an event and an incident?
Why is "Preparation" considered the most important phase?
What is the main goal of the Containment phase?
Who is responsible for the overall management of an incident?
Why should you avoid rebooting a compromised machine immediately?

Week 06 Quiz

Test your understanding of incident response frameworks, roles, and containment decisions.

Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.

Weekly Reflection

Reflection Prompt:

Incident response is often described as "emergency services for digital assets." Reflect on how the structured approach (Preparation -> Detection -> Containment...) helps manage the panic and chaos of a real cyber attack. Why is strict adherence to process (like Chain of Custody) so critical even when time is short?