Opening Framing: Why Frameworks Matter
Imagine responding to a ransomware attack with no playbook, no escalation path, and no defined roles. Who makes decisions? What gets done first? Who communicates with leadership? Chaos ensues, and the attacker wins.
Frameworks provide structure that enables speed. When everyone knows their role, understands the process, and follows established procedures, response becomes coordinated and effective. Frameworks don't constrain; they empower.
This week covers the frameworks that guide security operations: organizational models, industry standards, and the processes that turn a group of analysts into an effective security team.
Key insight: The best SOCs aren't those with the most tools or the biggest budgets. They're the ones with mature processes, clear roles, and continuous improvement.
1) SOC Maturity Models
SOC maturity describes how well-developed security operations are:
SOC-CMM (Security Operations Center Capability Maturity Model) Levels:
Level 1 - Initial:
- Ad-hoc processes
- Reactive only
- No documentation
- Hero-dependent (relies on individuals)
Level 2 - Managed:
- Basic processes defined
- Some documentation
- Inconsistent execution
- Limited metrics
Level 3 - Defined:
- Documented procedures
- Consistent execution
- Regular training
- Basic metrics tracked
Level 4 - Quantitatively Managed:
- Metrics-driven decisions
- Process optimization
- Continuous improvement
- Predictable performance
Level 5 - Optimizing:
- Proactive improvement
- Innovation focus
- Industry-leading practices
- Measurable business impact
Assessing Your SOC:
Key assessment areas:
People:
- Staffing levels adequate?
- Skills match requirements?
- Training program exists?
- Career paths defined?
Process:
- Procedures documented?
- Playbooks for common scenarios?
- Escalation paths clear?
- Metrics tracked and used?
Technology:
- Tools integrated?
- Visibility adequate?
- Automation implemented?
- Technical debt managed?
Governance:
- Charter and scope defined?
- Stakeholder relationships managed?
- Compliance requirements met?
- Budget and resources adequate?
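One way to make this checklist actionable is to record each question as yes/no and compute a per-area percentage so the weakest area stands out. The sketch below assumes that simple scoring scheme. It is a hypothetical illustration, not the official SOC-CMM assessment method, and the question names are shortened versions of the list above.

```python
from typing import Dict

# Each assessment area maps a checklist question to True ("yes") or False ("no").
# Hypothetical scoring scheme for illustration; NOT the official SOC-CMM method.
Assessment = Dict[str, Dict[str, bool]]


def area_scores(assessment: Assessment) -> Dict[str, float]:
    """Return the percentage of 'yes' answers for each area."""
    scores: Dict[str, float] = {}
    for area, answers in assessment.items():
        if not answers:
            scores[area] = 0.0
            continue
        yes = sum(1 for ok in answers.values() if ok)
        scores[area] = round(100 * yes / len(answers), 1)
    return scores


def weakest_area(assessment: Assessment) -> str:
    """Name the lowest-scoring area; ties go to the area listed first."""
    scores = area_scores(assessment)
    return min(scores, key=lambda area: scores[area])


example: Assessment = {
    "People": {"Staffing adequate": False, "Skills match": True,
               "Training program": True, "Career paths": False},
    "Process": {"Procedures documented": True, "Playbooks": False,
                "Escalation clear": True, "Metrics used": False},
    "Technology": {"Tools integrated": True, "Visibility adequate": True,
                   "Automation": False, "Tech debt managed": True},
    "Governance": {"Charter defined": True, "Stakeholders managed": True,
                   "Compliance met": True, "Budget adequate": False},
}

if __name__ == "__main__":
    print(area_scores(example))
    print("Start with:", weakest_area(example))
```

In the example, People and Process both score 50%; People is reported first because ties go to the area listed first. The per-area numbers matter more than any single overall score, since they point at where to invest.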
Maturity Improvement Path:
Common progression:
Year 1: Foundation
- Establish basic monitoring
- Define core processes
- Train initial team
- Implement primary tools
Year 2: Standardization
- Document all procedures
- Create playbooks
- Establish metrics
- Improve tool integration
Year 3: Optimization
- Automate routine tasks
- Enhance detection coverage
- Develop threat hunting
- Measure and improve
Year 4+: Excellence
- Continuous improvement
- Advanced capabilities
- Industry collaboration
- Innovation leadership
Key insight: Most SOCs operate at Levels 2-3. Reaching Levels 4-5 requires sustained investment in process improvement, not just technology.
2) Industry Frameworks
Several frameworks guide security operations:
NIST Cybersecurity Framework (CSF):
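Five core functions (as in CSF 1.1; CSF 2.0, released in 2024, adds a sixth function, GOVERN, which absorbs the governance category shown under IDENTIFY):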
IDENTIFY:
- Asset management
- Risk assessment
- Governance
PROTECT:
- Access control
- Training
- Data security
DETECT: ← SOC primary focus
- Anomaly detection
- Continuous monitoring
- Detection processes
RESPOND: ← SOC primary focus
- Response planning
- Communications
- Analysis and mitigation
RECOVER:
- Recovery planning
- Improvements
- Communications
SOC maps primarily to DETECT and RESPOND
MITRE ATT&CK:
Adversary tactics and techniques knowledge base:
Tactics (the "why"):
- Reconnaissance
- Resource Development
- Initial Access
- Execution
- Persistence
- Privilege Escalation
- Defense Evasion
- Credential Access
- Discovery
- Lateral Movement
- Collection
- Command and Control
- Exfiltration
- Impact
Each tactic contains techniques (the "how"):
- T1566: Phishing
- T1059: Command and Scripting Interpreter
- T1003: OS Credential Dumping
- etc.
SOC uses ATT&CK to:
- Understand adversary behavior
- Map detection coverage
- Prioritize detection engineering
- Communicate about threats
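Mapping detection coverage usually means tagging each detection rule with the technique IDs it covers, then listing the techniques that no rule covers. The sketch below assumes a tiny hand-written subset of the ATT&CK Enterprise matrix and hypothetical rule names; a real coverage map would load the full matrix (for example, from MITRE's published STIX data) instead of hard-coding it.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Small subset of ATT&CK Enterprise techniques and the tactic(s) each serves.
# Some techniques belong to several tactics, hence a list per technique.
TECHNIQUE_TACTICS: Dict[str, List[str]] = {
    "T1566": ["Initial Access"],     # Phishing
    "T1059": ["Execution"],          # Command and Scripting Interpreter
    "T1003": ["Credential Access"],  # OS Credential Dumping
    "T1021": ["Lateral Movement"],   # Remote Services
    "T1486": ["Impact"],             # Data Encrypted for Impact
}

# Hypothetical detection rules, each tagged with the technique IDs it detects.
RULES: Dict[str, Set[str]] = {
    "suspicious_attachment_opened": {"T1566"},
    "encoded_powershell_command": {"T1059"},
    "lsass_memory_access": {"T1003"},
}


def coverage_gaps(rules: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return technique IDs with no detection rule, grouped by tactic."""
    covered: Set[str] = set().union(*rules.values())
    gaps: Dict[str, List[str]] = defaultdict(list)
    for technique, tactics in TECHNIQUE_TACTICS.items():
        if technique not in covered:
            for tactic in tactics:
                gaps[tactic].append(technique)
    return dict(gaps)


if __name__ == "__main__":
    for tactic, techniques in coverage_gaps(RULES).items():
        print(f"{tactic}: no detection for {', '.join(techniques)}")
```

With these rules, the gaps are Lateral Movement (T1021) and Impact (T1486), which is the kind of list that drives detection-engineering priorities.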
NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide):
Incident response lifecycle:
1. Preparation
- Policies and procedures
- Communication plans
- Tools and resources
- Training
2. Detection and Analysis
- Monitoring and alerting
- Alert triage
- Investigation
- Prioritization
3. Containment, Eradication, and Recovery
- Contain the threat
- Remove attacker presence
- Restore systems
- Verify clean state
4. Post-Incident Activity
- Lessons learned
- Process improvements
- Documentation
- Metrics
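The lifecycle is not strictly linear. In NIST SP 800-61 Rev. 2, containment work can send the team back to detection and analysis, and post-incident activity feeds back into preparation. The sketch below encodes that as a small state machine. The IncidentTracker class and its transition table are a simplified, hypothetical illustration, not something the guide defines.

```python
from enum import Enum
from typing import Dict, List, Set


class Phase(Enum):
    PREPARATION = "Preparation"
    DETECTION_ANALYSIS = "Detection and Analysis"
    CONTAINMENT_ERADICATION_RECOVERY = "Containment, Eradication, and Recovery"
    POST_INCIDENT = "Post-Incident Activity"


# Allowed moves. Containment can loop back to detection and analysis when new
# evidence turns up; post-incident lessons feed back into preparation.
TRANSITIONS: Dict[Phase, Set[Phase]] = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT_ERADICATION_RECOVERY},
    Phase.CONTAINMENT_ERADICATION_RECOVERY: {
        Phase.DETECTION_ANALYSIS,
        Phase.POST_INCIDENT,
    },
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


class IncidentTracker:
    """Hypothetical helper that records an incident's path through the lifecycle."""

    def __init__(self) -> None:
        self.phase = Phase.PREPARATION
        self.history: List[Phase] = [self.phase]

    def advance(self, target: Phase) -> None:
        """Move to `target`, rejecting jumps the lifecycle does not allow."""
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(f"cannot move from {self.phase.value} to {target.value}")
        self.phase = target
        self.history.append(target)
```

Trying to jump from detection straight to post-incident activity raises an error, which reflects the lifecycle's intent: lessons learned come after the threat has been contained and systems restored.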
Key insight: Frameworks aren't bureaucracy—they're accumulated wisdom from thousands of organizations. Use them as starting points, then adapt to your context.
3) SOC Processes and Procedures
Effective SOCs run on documented processes:
Alert Triage Process:
Standard triage workflow:
1. Alert arrives in queue
↓
2. Analyst claims alert
↓
3. Initial assessment (2-5 min)
- What triggered the alert?
- What system/user affected?
- What's the potential impact?
↓
4. Classification decision
├─ False Positive → Document, close, consider tuning
├─ Benign True Positive → Document, close
├─ True Positive (Low) → Investigate, remediate
└─ True Positive (High) → Escalate immediately
↓
5. Investigation (if needed)
- Gather additional context
- Check related alerts
- Query additional data sources
↓
6. Resolution
- Document findings
- Take action (block, contain, escalate)
- Close ticket with details
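The classification decision in step 4 is the branch point of the workflow. Here is a minimal sketch of it, assuming the four outcomes above and a simple high/low severity flag. The names Disposition and next_action are illustrative.

```python
from enum import Enum


class Disposition(Enum):
    FALSE_POSITIVE = "false_positive"
    BENIGN_TRUE_POSITIVE = "benign_true_positive"
    TRUE_POSITIVE = "true_positive"


def next_action(disposition: Disposition, high_severity: bool = False) -> str:
    """Map a step-4 classification to the action the triage workflow prescribes."""
    if disposition is Disposition.FALSE_POSITIVE:
        # The rule fired on something that isn't malicious: a tuning candidate.
        return "document, close, consider tuning"
    if disposition is Disposition.BENIGN_TRUE_POSITIVE:
        # The rule worked as designed, but the activity is authorized.
        return "document, close"
    # True positive: severity decides between handling it locally and escalating.
    return "escalate immediately" if high_severity else "investigate, remediate"
```

Severity only matters once an alert is confirmed as a true positive. A benign true positive is closed regardless of how alarming the rule name sounds.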
Escalation Procedures:
Clear escalation criteria:
Escalate to Tier 2 when:
- Investigation exceeds 30 minutes
- Multiple systems affected
- Malware confirmed
- Data exfiltration suspected
- Analyst uncertain
Escalate to Tier 3/IR when:
- Active intrusion confirmed
- Ransomware detected
- Critical system compromised
- Executive/VIP involved
- Legal/compliance implications
Escalate to Management when:
- Business impact significant
- External communication needed
- Resource decisions required
- Policy exceptions needed
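Writing these criteria down as code is a useful test of whether they are unambiguous. The sketch below uses hypothetical field names for each criterion above. It also makes one design choice the list leaves open: a case that meets both Tier 2 and Tier 3/IR criteria goes straight to Tier 3/IR, and management escalation can be added on top of either.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlertFacts:
    """Hypothetical fields mirroring the escalation criteria above."""
    investigation_minutes: int = 0
    systems_affected: int = 1
    malware_confirmed: bool = False
    exfiltration_suspected: bool = False
    analyst_uncertain: bool = False
    active_intrusion: bool = False
    ransomware_detected: bool = False
    critical_system_compromised: bool = False
    executive_or_vip_involved: bool = False
    legal_or_compliance_implications: bool = False
    significant_business_impact: bool = False
    external_communication_needed: bool = False
    resource_decision_required: bool = False
    policy_exception_needed: bool = False


def escalation_targets(f: AlertFacts) -> List[str]:
    """Return who should be engaged, technical tier first."""
    tier3 = any([
        f.active_intrusion,
        f.ransomware_detected,
        f.critical_system_compromised,
        f.executive_or_vip_involved,
        f.legal_or_compliance_implications,
    ])
    tier2 = any([
        f.investigation_minutes > 30,
        f.systems_affected > 1,
        f.malware_confirmed,
        f.exfiltration_suspected,
        f.analyst_uncertain,
    ])
    management = any([
        f.significant_business_impact,
        f.external_communication_needed,
        f.resource_decision_required,
        f.policy_exception_needed,
    ])
    targets: List[str] = []
    if tier3:
        targets.append("Tier 3/IR")  # supersedes Tier 2 when both apply
    elif tier2:
        targets.append("Tier 2")
    if management:
        targets.append("Management")
    return targets
```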
Escalation information:
- What happened (summary)
- What's been done (actions taken)
- What's needed (specific ask)
- Urgency level (timeframe)
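A simple template can enforce these four elements so that no escalation goes out without a specific ask. This is a hypothetical sketch; the EscalationNote class and its field names are illustrative and not part of any particular ticketing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class EscalationNote:
    """Hypothetical record for the four elements every escalation should carry."""
    what_happened: str  # summary
    actions_taken: str  # what's been done so far
    specific_ask: str   # what's needed from the receiver
    urgency: str        # timeframe, e.g. "within 15 minutes"

    def render(self) -> str:
        # Refuse to produce a note with any element left blank.
        missing = [fld.name for fld in fields(self) if not getattr(self, fld.name).strip()]
        if missing:
            raise ValueError("Escalation is missing: " + ", ".join(missing))
        return "\n".join([
            f"WHAT HAPPENED: {self.what_happened}",
            f"ACTIONS TAKEN: {self.actions_taken}",
            f"NEEDED: {self.specific_ask}",
            f"URGENCY: {self.urgency}",
        ])
```

Rejecting a blank "NEEDED" line is deliberate. An escalation that only reports what happened leaves the receiver to guess what decision is being asked of them.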
Shift Handoff:
Effective handoff includes:
Open incidents:
- Status of each active incident
- Next steps needed
- Who to contact if questions
Alert queue status:
- Volume and trends
- Any backlogs
- Problem alerts/rules
Environmental context:
- Any maintenance windows
- Known issues
- VIP activities
Threat context:
- Active campaigns to watch for
- New IOCs received
- Recent threat intel
Written handoff + verbal briefing = best practice
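The written half of the handoff can be generated from a structured record so that no section is silently skipped. Here is a minimal sketch, assuming a hypothetical ShiftHandoff record with one list of notes per section above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShiftHandoff:
    """Hypothetical structured handoff; one list of notes per section."""
    open_incidents: List[str] = field(default_factory=list)
    queue_status: List[str] = field(default_factory=list)
    environmental_context: List[str] = field(default_factory=list)
    threat_context: List[str] = field(default_factory=list)

    def render(self) -> str:
        sections = [
            ("Open incidents", self.open_incidents),
            ("Alert queue status", self.queue_status),
            ("Environmental context", self.environmental_context),
            ("Threat context", self.threat_context),
        ]
        lines: List[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"  - {item}" for item in items)
            else:
                # An explicit "None" shows the section was checked, not forgotten.
                lines.append("  - None")
        return "\n".join(lines)
```

The rendered text then serves as the agenda for the verbal briefing.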
Key insight: Good processes reduce cognitive load. When the process is clear, analysts can focus their mental energy on the actual investigation, not on figuring out what to do next.
4) Playbooks and Runbooks
Playbooks codify response procedures for specific scenarios:
Playbook vs. Runbook:
Playbook:
- Higher-level guidance
- Decision trees
- "What to do and why"
- Allows analyst judgment
Runbook:
- Step-by-step instructions
- Specific commands/actions
- "Exactly how to do it"
- Minimal interpretation needed
Example: Phishing Response Playbook
Trigger: User reports suspicious email
Step 1: Initial Assessment
- Is it actually phishing? (check indicators)
- Did user click/open anything?
- Forward the original email for analysis
Step 2: Analyze Email
- Extract IOCs (sender, URLs, attachments)
- Check threat intel for known indicators
- Sandbox any attachments
Step 3: Scope Assessment
- Search for other recipients
- Check if anyone clicked
- Check if malware executed
Step 4: Containment
- Block sender/domain
- Block malicious URLs
- Quarantine emails from other mailboxes
- If clicked: isolate endpoint, scan
Step 5: Remediation
- Remove remaining emails
- Reset credentials if compromised
- Clean infected systems
Step 6: Documentation
- Record all IOCs
- Update threat intel
- Document timeline and actions
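The IOC extraction in Step 2 is easy to partially automate with Python's standard email package. The sketch below assumes the reported message is available as raw RFC 5322 text. It pulls out the sender address and domain, URLs found in text and HTML parts, and attachment filenames. The URL regex is deliberately rough (for example, it can keep trailing punctuation), and the function name extract_iocs is illustrative.

```python
import re
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Dict, List

# Rough URL matcher: stops at whitespace, quotes, and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_iocs(raw_email: str) -> Dict[str, List[str]]:
    """Collect sender, sender domain, URLs, and attachment names from a raw email."""
    msg: Message = message_from_string(raw_email)
    _, sender = parseaddr(msg.get("From", ""))
    sender_domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else ""

    urls: List[str] = []
    attachments: List[str] = []
    for part in msg.walk():  # visits the message itself and every MIME part
        filename = part.get_filename()
        if filename:
            attachments.append(filename)  # candidate for sandboxing and hashing
            continue
        if part.get_content_type() in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            if payload:
                text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
                urls.extend(URL_PATTERN.findall(text))

    return {
        "sender": [sender] if sender else [],
        "sender_domain": [sender_domain] if sender_domain else [],
        "urls": sorted(set(urls)),
        "attachments": attachments,
    }
```

Checking each extracted value against threat intel and sandboxing the listed attachments (the rest of Step 2) would build on this output. When testing, use reserved names such as .test domains and TEST-NET addresses rather than live indicators.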
Common Playbook Types:
Detection playbooks:
- Phishing email reported
- Malware alert
- Suspicious login
- Data exfiltration alert
- Ransomware indicators
- Brute force attack
- Insider threat indicators
Response playbooks:
- Endpoint containment
- Account compromise
- Business email compromise
- DDoS response
- Third-party breach notification
Operational playbooks:
- New IOC processing
- Threat intel integration
- Detection rule deployment
- False positive tuning
Playbook Best Practices:
Writing effective playbooks:
1. Clear trigger conditions
- When exactly does this apply?
2. Decision points explicit
- If X, do Y; if not X, do Z
3. Actions specific and actionable
- Not "investigate further"
- But "query SIEM for related events using..."
4. Escalation criteria clear
- When to escalate, to whom
5. Documentation requirements
- What to record, where
6. Regular review and update
- At least quarterly
- After major incidents
7. Accessible during incidents
- Don't bury in SharePoint
- Quick reference cards helpful
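Several of these practices can be checked mechanically before a playbook is published. The sketch below assumes a hypothetical Playbook record and a small, illustrative list of vague phrases. It flags a missing trigger, escalation criteria, or documentation requirement, any vague step, and a last review older than roughly one quarter (92 days here).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Illustrative phrases that signal a non-actionable step; extend to taste.
VAGUE_PHRASES = ("investigate further", "look into", "handle as needed")
REVIEW_INTERVAL = timedelta(days=92)  # roughly one quarter


@dataclass
class Playbook:
    """Hypothetical playbook record."""
    name: str
    trigger: str
    steps: List[str]
    escalation: str
    documentation: str
    last_reviewed: date


def lint_playbook(pb: Playbook, today: Optional[date] = None) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    today = today or date.today()
    problems: List[str] = []
    if not pb.trigger.strip():
        problems.append("no trigger condition")
    for i, step in enumerate(pb.steps, start=1):
        if any(phrase in step.lower() for phrase in VAGUE_PHRASES):
            problems.append(f"step {i} is vague: {step!r}")
    if not pb.escalation.strip():
        problems.append("no escalation criteria")
    if not pb.documentation.strip():
        problems.append("no documentation requirements")
    if today - pb.last_reviewed > REVIEW_INTERVAL:
        problems.append("not reviewed in the last quarter")
    return problems
```

A check like this cannot judge whether a playbook is good, only whether it is complete. A passing playbook still needs human review after major incidents.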
Key insight: Playbooks aren't about removing analyst judgment; they're about ensuring critical steps aren't missed under pressure and enabling consistent response quality.
5) SOC Metrics and KPIs
Metrics drive improvement and demonstrate value:
Operational Metrics:
Volume metrics:
- Alerts per day/week/month
- Incidents per day/week/month
- Alerts per analyst
Time metrics:
- MTTD (Mean Time to Detect)
- MTTR (Mean Time to Respond)
- MTTC (Mean Time to Contain)
- Alert dwell time (time in queue)
Quality metrics:
- False positive rate
- Escalation rate
- Reopen rate
- Customer satisfaction
Key Performance Indicators:
MTTD (Mean Time to Detect):
- Time from attack start to detection
- Lower is better
- Industry average: hours to days
- Target: minutes to hours
MTTR (Mean Time to Respond):
- Time from detection to response initiation
- Lower is better
- Target: under 1 hour for critical
MTTC (Mean Time to Contain):
- Time from detection to containment
- Lower is better
- Target: under 4 hours for critical
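The three time KPIs are simple averages over per-incident timestamps, using the definitions above. Here is a minimal sketch, assuming each incident record carries the four timestamps. The IncidentTimes record is hypothetical, and results are reported in minutes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class IncidentTimes:
    """Hypothetical per-incident timestamps."""
    attack_start: datetime                 # often only established after investigation
    detected: datetime
    response_started: datetime
    contained: Optional[datetime] = None   # None while the incident is still open


def _mean_minutes(deltas: List[timedelta]) -> Optional[float]:
    """Average durations in minutes; None if there is nothing to average."""
    if not deltas:
        return None
    return round(mean([d.total_seconds() / 60 for d in deltas]), 1)


def mttd(incidents: List[IncidentTimes]) -> Optional[float]:
    """Mean time from attack start to detection."""
    return _mean_minutes([i.detected - i.attack_start for i in incidents])


def mttr(incidents: List[IncidentTimes]) -> Optional[float]:
    """Mean time from detection to response initiation."""
    return _mean_minutes([i.response_started - i.detected for i in incidents])


def mttc(incidents: List[IncidentTimes]) -> Optional[float]:
    """Mean time from detection to containment, over contained incidents only."""
    return _mean_minutes([i.contained - i.detected for i in incidents if i.contained])
```

Leaving uncontained incidents out of MTTC keeps the arithmetic simple, but it can flatter the number. Reporting the count of still-open incidents alongside it avoids that.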
False Positive Rate:
- Percentage of alerts that aren't real threats
- Lower is better
- Reality: 70-90% in many SOCs
- Target: under 50% with good tuning
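A false positive rate depends on what counts as "not a real threat". The triage workflow separates false positives from benign true positives, so the sketch below makes that choice explicit with a flag. It assumes dispositions are recorded as simple strings.

```python
from collections import Counter
from typing import Iterable


def false_positive_rate(dispositions: Iterable[str], count_benign_as_fp: bool = False) -> float:
    """Percentage of alerts closed as false positives (optionally incl. benign TPs)."""
    counts = Counter(dispositions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    fp = counts["false_positive"]
    if count_benign_as_fp:
        # Benign true positives: the rule worked, but the activity was authorized.
        fp += counts["benign_true_positive"]
    return round(100 * fp / total, 1)
```

For example, 7 false positives, 1 benign true positive, and 2 true positives out of 10 alerts give 70% by the strict definition and 80% with the flag set. Whichever definition is chosen, it should stay fixed so that trends remain comparable.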
Detection Coverage:
- Percentage of ATT&CK techniques with detection
- Higher is better
- Track and improve over time
Using Metrics Effectively:
Good metric practices:
1. Measure what matters
- Tied to security outcomes
- Not just activity metrics
2. Context is critical
- Trends more important than absolutes
- Compare to yourself, not others
3. Avoid gaming
- Don't incentivize closing fast over closing well
- Balance efficiency with quality
4. Regular review
- Weekly operational review
- Monthly leadership review
- Quarterly strategic review
5. Action-oriented
- Metrics should drive decisions
- If you won't act on it, don't measure it
Common mistakes:
- Measuring only volume (more alerts ≠ better)
- Ignoring quality metrics
- Not tracking trends
- Using metrics punitively
Key insight: Metrics tell you if you're improving. Without measurement, "getting better" is just a feeling, not a fact.
Real-World Context: Frameworks in Practice
Frameworks guide real SOC operations:
Compliance Requirements: Many organizations must demonstrate security operations capabilities to auditors. NIST CSF and similar frameworks provide the structure for these conversations. "We follow NIST CSF" is meaningful to auditors and customers.
Incident Communication: When reporting to executives, frameworks provide common language. "We detected this during Initial Access and contained it before Lateral Movement" tells a clear story using ATT&CK tactic names.
Team Development: Maturity models help SOC managers justify investment. "We're at Level 2 and need X to reach Level 3" is more compelling than vague requests for more resources.
MITRE ATT&CK Practical Use:
- Detection Engineering: Map rules to techniques to find gaps
- Threat Intel: CTI reports reference ATT&CK techniques
- Red Team/Blue Team: Common language for exercises
Key insight: Frameworks aren't academic exercises. They're practical tools that improve communication, guide investment, and enable measurement.
Guided Lab: Building a Phishing Playbook
Let's create a practical phishing response playbook that you could use in a real SOC environment.