Opening Framing
Incident response in cloud environments differs fundamentally from traditional on-premises response. You can't physically access servers, but you gain capabilities that are difficult or impossible to match on premises: instant isolation through security groups, forensic snapshots without downtime, comprehensive API audit logs, and the ability to preserve evidence while immediately deploying clean replacement infrastructure.
Cloud incident response requires understanding both traditional IR methodology and cloud-specific techniques. Under the shared responsibility model, you respond to incidents in what you control (identities, configurations, guest operating systems, applications, and data), while the cloud provider handles incidents in the underlying infrastructure. Effective response requires preparation: logging must be enabled before an incident, response procedures must be documented, and teams must be trained on cloud-specific tools and techniques.
This week covers cloud incident response methodology, evidence collection and preservation, containment strategies, forensic analysis, recovery procedures, and building incident response capabilities. You'll learn to respond effectively to security incidents in cloud environments.
Key insight: Cloud enables faster response—if you've prepared. Without preparation, cloud complexity slows response.
1) Cloud Incident Response Fundamentals
Understanding how cloud changes incident response is essential for effective security operations:
Cloud IR vs Traditional IR:
KEY DIFFERENCES:
┌─────────────────────────────────────────────────────────────┐
│ TRADITIONAL IR │ CLOUD IR │
├─────────────────────────────┼───────────────────────────────┤
│ Physical access to servers │ API-based access only │
│ Image hard drives │ Snapshot volumes │
│ Unplug network cable │ Modify security groups │
│ Limited audit logs │ Comprehensive API logs │
│ Evidence in single location │ Evidence across regions/svcs │
│ On-site investigation │ Remote investigation │
│ Hardware seizure │ Snapshot and terminate │
│ Rebuild from backup │ Redeploy from IaC │
└─────────────────────────────┴───────────────────────────────┘
CLOUD IR ADVANTAGES:
┌─────────────────────────────────────────────────────────────┐
│ ✓ Instant network isolation (security group changes) │
│ ✓ Live forensic snapshots (no downtime) │
│ ✓ Comprehensive audit trail (CloudTrail, etc.) │
│ ✓ Rapid replacement (terminate and redeploy) │
│ ✓ Tamper-resistant evidence (snapshots, S3 Object Lock) │
│ ✓ Automated response (Lambda, EventBridge) │
│ ✓ Scale investigation across many resources │
│ ✓ Cross-region visibility │
└─────────────────────────────────────────────────────────────┘
CLOUD IR CHALLENGES:
┌─────────────────────────────────────────────────────────────┐
│ ✗ No physical access │
│ ✗ Shared responsibility (some data with provider) │
│ ✗ Ephemeral resources (containers, Lambda) │
│ ✗ Multi-region/multi-account complexity │
│ ✗ Third-party service dependencies │
│ ✗ Provider cooperation required for some data │
│ ✗ Log volume can be overwhelming │
│ ✗ Different skills required │
└─────────────────────────────────────────────────────────────┘
Incident Response Phases:
NIST Incident Response Lifecycle (Cloud Adapted):
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────────┐ │
│ │ PREPARATION │◄──────────────────────────────┐ │
│ └──────┬───────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────┐ │ │
│ │ DETECTION │ │ │
│ │ & ANALYSIS │ │ │
│ └──────┬───────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────┐ │ │
│ │ CONTAINMENT │ │ │
│ │ ERADICATION │ │ │
│ │ RECOVERY │ │ │
│ └──────┬───────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────┐ │ │
│ │POST-INCIDENT │───────────────────────────────┘ │
│ │ ACTIVITY │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
CLOUD-SPECIFIC ACTIVITIES BY PHASE:
PREPARATION:
┌─────────────────────────────────────────────────────────────┐
│ - Enable comprehensive logging (CloudTrail, VPC Flow) │
│ - Configure log retention and protection │
│ - Create isolation security groups │
│ - Document response procedures │
│ - Train team on cloud forensics │
│ - Establish communication channels │
│ - Create forensic accounts with limited scope │
│ - Test response procedures regularly │
└─────────────────────────────────────────────────────────────┘
DETECTION & ANALYSIS:
┌─────────────────────────────────────────────────────────────┐
│ - GuardDuty/Security Hub alerts │
│ - CloudTrail analysis │
│ - VPC Flow Log analysis │
│ - Identify affected resources │
│ - Determine scope and impact │
│ - Establish timeline │
│ - Preserve evidence (snapshots, log exports) │
└─────────────────────────────────────────────────────────────┘
CONTAINMENT, ERADICATION, RECOVERY:
┌─────────────────────────────────────────────────────────────┐
│ - Isolate compromised resources (security groups) │
│ - Disable compromised credentials │
│ - Create forensic snapshots │
│ - Terminate compromised instances │
│ - Rotate all potentially exposed secrets │
│ - Redeploy from known-good IaC │
│ - Verify clean state │
└─────────────────────────────────────────────────────────────┘
POST-INCIDENT:
┌─────────────────────────────────────────────────────────────┐
│ - Complete forensic analysis │
│ - Document lessons learned │
│ - Update detection rules │
│ - Improve preventive controls │
│ - Update response procedures │
│ - Brief stakeholders │
└─────────────────────────────────────────────────────────────┘
Key insight: Most cloud IR preparation happens before the incident. Logging enabled mid-incident cannot record what already happened.
2) Evidence Collection and Preservation
Proper evidence handling ensures investigation integrity and potential legal admissibility:
Evidence Sources in AWS:
LOG-BASED EVIDENCE:
┌─────────────────────────────────────────────────────────────┐
│ CloudTrail: │
│ - API calls (who, what, when, from where) │
│ - Management events (logged by default) and data events │
│ such as S3 object reads (must be enabled on the trail) │
│ - Stored in S3; enable log file integrity validation │
│ │
│ VPC Flow Logs: │
│ - Network traffic metadata │
│ - Source/destination IPs and ports │
│ - Accept/reject actions │
│ │
│ CloudWatch Logs: │
│ - Application logs │
│ - System logs (via agent) │
│ - Lambda execution logs │
│ │
│ S3 Access Logs: │
│ - Object-level access │
│ - Requester identity │
│ │
│ Load Balancer Logs: │
│ - Request details │
│ - Client IP, response codes │
│ │
│ DNS Query Logs: │
│ - Route 53 Resolver logs │
│ - DNS requests from VPC │
└─────────────────────────────────────────────────────────────┘
RESOURCE-BASED EVIDENCE:
┌─────────────────────────────────────────────────────────────┐
│ EC2: │
│ - EBS volume snapshots │
│ - Instance metadata │
│ - Security group configurations │
│ - Memory (requires live instance) │
│ │
│ S3: │
│ - Object versions │
│ - Deleted object markers │
│ - Bucket policies and ACLs │
│ │
│ IAM: │
│ - User and role configurations │
│ - Policy documents │
│ - Access key metadata │
│ │
│ Lambda: │
│ - Function code versions │
│ - Configuration history │
│ - Execution logs │
└─────────────────────────────────────────────────────────────┘
Evidence Collection Procedures:
Evidence Collection Commands:
EC2 INSTANCE EVIDENCE:
# 1. Document instance state
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --output json > instance-details.json
# 2. Get console output
aws ec2 get-console-output \
  --instance-id i-1234567890abcdef0 \
  --output json > console-output.json
# 3. Create forensic snapshots of all attached volumes
VOLUMES=$(aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-1234567890abcdef0 \
  --query 'Volumes[*].VolumeId' --output text)
for vol in $VOLUMES; do
  aws ec2 create-snapshot \
    --volume-id "$vol" \
    --description "Forensic-$(date +%Y%m%d)-incident-123" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Forensic,Value=true},{Key=IncidentId,Value=123}]'
done
# 4. Capture memory (requires the SSM agent and a LiME kernel module
#    built for the instance's exact kernel; /path/to/lime.ko is a
#    placeholder). Memory is the most volatile evidence, so run this
#    as early as possible, and write the dump to a separate volume
#    rather than the disk under investigation (/mnt/forensic-out is
#    an example mount point).
aws ssm send-command \
  --instance-ids i-1234567890abcdef0 \
  --document-name "AWS-RunShellScript" \
  --parameters '{"commands":["insmod /path/to/lime.ko \"path=/mnt/forensic-out/memory.lime format=lime\""]}'
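The snapshot loop can also be scripted in Python when an instance has several volumes and you want richer tags. This is a minimal sketch assuming boto3; the EC2 client is passed in so the function can be exercised without AWS access, and the function name plus the SourceInstance, SourceDevice, and CollectedAt tag keys are illustrative choices, not an AWS convention.

```python
from datetime import datetime, timezone


def snapshot_instance_volumes(ec2, instance_id, incident_id):
    """Snapshot every EBS volume attached to instance_id and tag it as evidence.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the new snapshot IDs (snapshots complete asynchronously).
    """
    collected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )["Volumes"]

    snapshot_ids = []
    for vol in volumes:
        # Record the device name so the disk layout can be rebuilt during analysis.
        attachments = vol.get("Attachments") or [{}]
        device = attachments[0].get("Device", "unknown")
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description=f"Forensic-incident-{incident_id}-{vol['VolumeId']}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [
                    {"Key": "Forensic", "Value": "true"},
                    {"Key": "IncidentId", "Value": str(incident_id)},
                    {"Key": "SourceInstance", "Value": instance_id},
                    {"Key": "SourceDevice", "Value": device},
                    {"Key": "CollectedAt", "Value": collected_at},
                ],
            }],
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids
```

In practice it would be called as `snapshot_instance_volumes(boto3.client("ec2"), "i-1234567890abcdef0", "123")` and the returned IDs recorded in the incident ticket.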
CLOUDTRAIL EVIDENCE:
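# Export CloudTrail logs for the incident day. Log files are stored
# in day folders (.../us-east-1/2024/01/15/) and their names start with
# the account ID, so sync the folder rather than filtering on a date.
aws s3 sync \
  s3://cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/15/ \
  ./evidence/cloudtrail/us-east-1/2024/01/15/
# Query with Athena (table defined over the CloudTrail bucket;
# CloudTrail Lake uses similar SQL against an event data store)
SELECT eventtime, eventsource, eventname, useridentity.arn,
       sourceipaddress, requestparameters
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%compromised-user%'
  AND eventtime BETWEEN '2024-01-14T00:00:00Z' AND '2024-01-16T23:59:59Z'
ORDER BY eventtime;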
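When the incident window spans several days, the day folders to export can be generated rather than typed. This sketch assumes the default CloudTrail key layout `[prefix/]AWSLogs/[org-id/]<account>/CloudTrail/<region>/YYYY/MM/DD/`, where the organization ID segment only appears for organization trails; the function name is hypothetical.

```python
from datetime import date, timedelta


def cloudtrail_day_prefixes(bucket, account_id, region, start, end,
                            key_prefix="", org_id=None):
    """List S3 URIs of CloudTrail day folders from start to end (inclusive).

    Assumes the default layout
    [key_prefix/]AWSLogs/[org_id/]<account_id>/CloudTrail/<region>/YYYY/MM/DD/
    where org_id only appears for organization trails.
    """
    if end < start:
        raise ValueError("end date is before start date")
    root = f"{key_prefix.strip('/')}/" if key_prefix else ""
    org = f"{org_id}/" if org_id else ""
    prefixes = []
    day = start
    while day <= end:
        prefixes.append(
            f"s3://{bucket}/{root}AWSLogs/{org}{account_id}/CloudTrail/"
            f"{region}/{day:%Y/%m/%d}/"
        )
        day += timedelta(days=1)
    return prefixes
```

Each URI can be handed to `aws s3 sync`. Folder dates are UTC, so widening the window by a day on each side is a cheap guard against events near midnight, and the call has to be repeated per region because each region gets its own folder tree.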
IAM EVIDENCE:
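# Get credential report (generation is asynchronous; rerun
# generate-credential-report until its State is COMPLETE)
aws iam generate-credential-report
aws iam get-credential-report --output text \
  --query 'Content' | base64 -d > credential-report.csv
# Document user details
aws iam get-user --user-name compromised-user > user-details.json
aws iam list-user-policies --user-name compromised-user
aws iam list-attached-user-policies --user-name compromised-user
aws iam list-groups-for-user --user-name compromised-user
aws iam list-access-keys --user-name compromised-user
# Get policy documents: list every version first, since an attacker
# may have added one with CreatePolicyVersion
aws iam list-policy-versions \
  --policy-arn arn:aws:iam::123456789012:policy/SuspiciousPolicy
aws iam get-policy-version \
  --policy-arn arn:aws:iam::123456789012:policy/SuspiciousPolicy \
  --version-id v1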
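Once credential-report.csv is saved, the active access keys can be extracted for review. This sketch relies on the report's `access_key_1_active` and `access_key_1_last_used_date` columns (and their `_2_` equivalents); the function name is hypothetical.

```python
import csv
import io


def active_access_keys(report_csv):
    """Return (user, key_slot, last_used_date) for every active key in a credential report."""
    rows = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            if row.get(f"access_key_{slot}_active") == "true":
                # The report uses "N/A" for keys that have never been used.
                last_used = row.get(f"access_key_{slot}_last_used_date", "N/A")
                rows.append((row["user"], slot, last_used))
    return rows
```

Typical use: `active_access_keys(open("credential-report.csv").read())`, then compare each last-used date against the incident window.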
Evidence Preservation:
Evidence Integrity:
SNAPSHOT PROTECTION:
┌─────────────────────────────────────────────────────────────┐
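│ # Remove any public sharing of the snapshot │
│ aws ec2 modify-snapshot-attribute \ │
│ --snapshot-id snap-1234567890abcdef0 \ │
│ --attribute createVolumePermission \ │
│ --operation-type remove \ │
│ --group-names all │
│ │
│ # Prevent deletion with EBS snapshot lock (governance │
│ # mode can be lifted by authorized users; compliance mode │
│ # cannot once its cool-off period ends) │
│ aws ec2 lock-snapshot \ │
│ --snapshot-id snap-1234567890abcdef0 \ │
│ --lock-mode governance \ │
│ --lock-duration 365 │
│ │
│ # Share with forensic account (run in source account). │
│ # Snapshots encrypted with the default aws/ebs key cannot │
│ # be shared; re-encrypt with a customer managed key first │
│ aws ec2 modify-snapshot-attribute \ │
│ --snapshot-id snap-1234567890abcdef0 \ │
│ --attribute createVolumePermission \ │
│ --operation-type add \ │
│ --user-ids 999888777666 │
│ │
│ # Copy into the forensic account (run in forensic account) │
│ aws ec2 copy-snapshot \ │
│ --region us-east-1 \ │
│ --source-region us-east-1 \ │
│ --source-snapshot-id snap-1234567890abcdef0 \ │
│ --description "Forensic copy - incident 123" │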
└─────────────────────────────────────────────────────────────┘
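Before and after sharing, it is worth confirming that an evidence snapshot is reachable only by the forensic account (999888777666 in the example). This minimal sketch assumes a boto3 EC2 client and reads the snapshot's `createVolumePermission` attribute; the function name and allow-list parameter are hypothetical.

```python
def unexpected_snapshot_access(ec2, snapshot_id, allowed_account_ids):
    """Return createVolumePermission entries that grant access beyond the allow-list.

    `ec2` is expected to be a boto3 EC2 client. A {"Group": "all"} entry means
    the snapshot is public.
    """
    resp = ec2.describe_snapshot_attribute(
        SnapshotId=snapshot_id, Attribute="createVolumePermission"
    )
    unexpected = []
    for perm in resp.get("CreateVolumePermissions", []):
        if perm.get("Group") == "all":
            unexpected.append(perm)  # public snapshot
        elif perm.get("UserId") not in allowed_account_ids:
            unexpected.append(perm)  # shared with an account we did not expect
    return unexpected
```

An empty list means only allow-listed accounts can create volumes from the snapshot; anything else should be removed and noted in the chain of custody.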
S3 EVIDENCE PROTECTION:
┌─────────────────────────────────────────────────────────────┐
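│ # Enable Object Lock on evidence bucket (requires │
│ # versioning; GOVERNANCE retention can be bypassed by │
│ # principals with s3:BypassGovernanceRetention, COMPLIANCE │
│ # retention cannot be shortened by anyone) │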
│ aws s3api put-object-lock-configuration \ │
│ --bucket forensic-evidence \ │
│ --object-lock-configuration '{ │
│ "ObjectLockEnabled": "Enabled", │
│ "Rule": { │
│ "DefaultRetention": { │
│ "Mode": "GOVERNANCE", │
│ "Days": 365 │
│ } │
│ } │
│ }' │
│ │
│ # Copy with legal hold │
│ aws s3api copy-object \ │
│ --copy-source source-bucket/evidence.log \ │
│ --bucket forensic-evidence \ │
│ --key incident-123/evidence.log \ │
│ --object-lock-legal-hold-status ON │
└─────────────────────────────────────────────────────────────┘
CHAIN OF CUSTODY:
┌─────────────────────────────────────────────────────────────┐
│ Document for each evidence item: │
│ - What was collected │
│ - When it was collected │
│ - Who collected it │
│ - How it was collected (commands/tools) │
│ - Where it is stored │
│ - Hash values for integrity │
│ │
│ # Calculate hash │
│ sha256sum evidence-file.tar.gz > evidence-file.sha256 │
│ │
│ # Store with metadata (one comma-separated string; spaces │
│ # or line breaks would become part of the keys and values) │
│ aws s3api put-object \ │
│ --bucket forensic-evidence \ │
│ --key incident-123/evidence-file.tar.gz \ │
│ --body evidence-file.tar.gz \ │
│ --metadata "collector=analyst@company.com,collected-at=2024-01-15T14:30:00Z,sha256=$(cut -d' ' -f1 evidence-file.sha256),incident-id=123" │
└─────────────────────────────────────────────────────────────┘
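The hashing and documentation steps can be combined into one record per evidence item covering the what, when, who, how, where, and hash fields listed above. This is a minimal sketch; the function names and record field names are illustrative, and the resulting record still needs to be stored somewhere tamper-resistant, such as next to the evidence in the Object Lock bucket.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, incident_id, collector, method, storage_uri):
    """Build one chain-of-custody entry for a collected evidence file."""
    p = Path(path)
    return {
        "incident_id": incident_id,
        "item": p.name,                                  # what
        "size_bytes": p.stat().st_size,
        "sha256": sha256_file(p),                        # integrity
        "collected_by": collector,                       # who
        "collected_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),  # when
        "method": method,                                # how
        "stored_at": storage_uri,                        # where
    }
```

The record can be serialized with `json.dumps` and appended to the incident log; re-running `sha256_file` later and comparing hashes shows whether the item changed.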
Key insight: Collect evidence before containment when possible. Isolating may alter or destroy volatile evidence.
3) Containment Strategies
Containment stops the incident from spreading while preserving evidence and enabling investigation:
Containment Options:
EC2 INSTANCE CONTAINMENT:
┌─────────────────────────────────────────────────────────────┐
│ OPTION 1: Security Group Isolation (Preferred) │
│ │
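│ # Create isolation security group (if not pre-created) │
│ SG_ID=$(aws ec2 create-security-group \ │
│ --group-name forensic-isolation \ │
│ --description "No ingress or egress" \ │
│ --vpc-id vpc-123 \ │
│ --query GroupId --output text) │
│ │
│ # A new group has no inbound rules but DOES have an │
│ # allow-all outbound rule; revoke it to deny all traffic │
│ aws ec2 revoke-security-group-egress \ │
│ --group-id "$SG_ID" \ │
│ --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]' │
│ │
│ # Replace instance security groups (instances with │
│ # several network interfaces need each ENI changed) │
│ aws ec2 modify-instance-attribute \ │
│ --instance-id i-1234567890abcdef0 \ │
│ --groups "$SG_ID" │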
│ │
│ Benefits: │
│ - New connections blocked immediately (already-tracked │
│ connections may continue; verify with Flow Logs) │
│ - Instance stays running for investigation │
│ - Memory preserved │
│ - Can allow specific forensic access if needed │
├─────────────────────────────────────────────────────────────┤
│ OPTION 2: Network ACL Block │
│ │
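│ # Add deny rules to subnet NACL in both directions │
│ # (rule 1 is evaluated first; use a free low number if │
│ # it is taken) │
│ aws ec2 create-network-acl-entry \ │
│ --network-acl-id acl-123 \ │
│ --ingress \ │
│ --rule-number 1 \ │
│ --protocol -1 \ │
│ --rule-action deny \ │
│ --cidr-block 0.0.0.0/0 │
│ # Repeat with --egress instead of --ingress │
│ │
│ Use when: Need to block at subnet level. Affects every │
│ instance in the subnet and does not filter traffic │
│ between instances within the same subnet │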
├─────────────────────────────────────────────────────────────┤
│ OPTION 3: Stop Instance (Evidence Risk) │
│ │
│ aws ec2 stop-instances --instance-ids i-123 │
│ │
│ Warning: Loses memory contents and instance store data │
│ Use when: Immediate threat, evidence less important │
├─────────────────────────────────────────────────────────────┤
│ OPTION 4: Terminate Instance (Last Resort) │
│ │
│ # Only after snapshots taken! Volumes with │
│ # DeleteOnTermination=true are deleted with the instance │
│ aws ec2 terminate-instances --instance-ids i-123 │
│ │
│ Use when: Active attack causing damage │
└─────────────────────────────────────────────────────────────┘
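For instances with more than one network interface, the isolation group has to be applied per ENI. This sketch assumes a boto3 EC2 client and uses `modify_network_interface_attribute` on every interface reported by `describe_instances`, returning the previous groups so they can be recorded; the function name is hypothetical.

```python
def isolate_all_interfaces(ec2, instance_id, isolation_sg_id):
    """Attach only the isolation security group to every ENI on the instance.

    `ec2` is expected to be a boto3 EC2 client. Returns {eni_id: [previous group IDs]}
    so the original configuration can be documented (and restored if needed).
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    previous = {}
    for eni in instance.get("NetworkInterfaces", []):
        eni_id = eni["NetworkInterfaceId"]
        previous[eni_id] = [g["GroupId"] for g in eni.get("Groups", [])]
        # Replaces (does not add to) the interface's security groups.
        ec2.modify_network_interface_attribute(
            NetworkInterfaceId=eni_id, Groups=[isolation_sg_id]
        )
    return previous
```

Recording the previous groups matters: they are part of the evidence (they show what the attacker could reach) and are needed if the instance is later released from isolation.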
IAM CREDENTIAL CONTAINMENT:
┌─────────────────────────────────────────────────────────────┐
│ USER ACCESS KEYS: │
│ │
│ # Deactivate access key (reversible) │
│ aws iam update-access-key \ │
│ --user-name compromised-user \ │
│ --access-key-id AKIA123 \ │
│ --status Inactive │
│ │
│ # Delete access key (permanent) │
│ aws iam delete-access-key \ │
│ --user-name compromised-user \ │
│ --access-key-id AKIA123 │
│ │
│ CONSOLE ACCESS: │
│ │
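│ # Delete login profile │
│ aws iam delete-login-profile \ │
│ --user-name compromised-user │
│ │
│ # Or attach the AWS managed deny-all policy. It also │
│ # blocks temporary credentials already minted with the │
│ # user's keys (GetSessionToken), which deactivating the │
│ # keys does not revoke │
│ aws iam attach-user-policy \ │
│ --user-name compromised-user \ │
│ --policy-arn arn:aws:iam::aws:policy/AWSDenyAll │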
│ │
│ ROLE SESSIONS: │
│ │
│ # Deny all sessions issued before the cutoff (set it to │
│ # the current UTC time when you apply the policy) │
│ aws iam put-role-policy \ │
│ --role-name compromised-role \ │
│ --policy-name RevokeOldSessions \ │
│ --policy-document '{ │
│ "Version": "2012-10-17", │
│ "Statement": [{ │
│ "Effect": "Deny", │
│ "Action": "*", │
│ "Resource": "*", │
│ "Condition": { │
│ "DateLessThan": { │
│ "aws:TokenIssueTime": "2024-01-15T14:00:00Z" │
│ } │
│ } │
│ }] │
│ }' │
└─────────────────────────────────────────────────────────────┘
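Hand-editing the timestamp during an incident is error-prone, so the revoke policy can be generated from the current UTC time. This is a minimal sketch that produces the same policy document as the CLI example; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff=None):
    """Return a policy denying all actions to sessions issued before cutoff (UTC)."""
    if cutoff is None:
        cutoff = datetime.now(timezone.utc)
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    # aws:TokenIssueTime is compared as an ISO 8601 UTC timestamp.
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
        }],
    })
```

It could be applied with `boto3.client("iam").put_role_policy(RoleName="compromised-role", PolicyName="RevokeOldSessions", PolicyDocument=revoke_older_sessions_policy())`. Sessions issued after the cutoff are not affected, so whatever lets the attacker assume the role again (stolen source credentials, an instance profile on a compromised host) has to be contained as well.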
Automated Containment:
Automated Response with Lambda:
GUARDDUTY TO LAMBDA CONTAINMENT:
import json
from datetime import datetime, timezone

import boto3


def lambda_handler(event, context):
    """Auto-contain EC2 instances from GuardDuty findings (EventBridge event)"""
    finding = event['detail']
    finding_type = finding['type']
    severity = finding['severity']

    # Only auto-contain high severity findings about EC2 instances
    if finding['resource'].get('resourceType') != 'Instance':
        return {'action': 'skipped', 'reason': 'not an EC2 instance finding'}
    if severity < 7:
        return {'action': 'skipped', 'reason': 'severity below threshold'}
    if not finding_type.startswith(('Trojan:', 'Backdoor:',
                                    'CryptoCurrency:')):
        return {'action': 'skipped', 'reason': 'finding type not critical'}

    # Get instance and VPC IDs from the finding
    instance_id = finding['resource']['instanceDetails']['instanceId']
    vpc_id = (finding['resource']['instanceDetails']
              ['networkInterfaces'][0]['vpcId'])

    ec2 = boto3.client('ec2')

    # Step 1: Create forensic snapshots before changing the instance
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'attachment.instance-id',
                  'Values': [instance_id]}]
    )
    snapshot_ids = []
    for vol in volumes['Volumes']:
        snapshot = ec2.create_snapshot(
            VolumeId=vol['VolumeId'],
            Description=f'Auto-forensic-{finding["id"][:8]}',
            TagSpecifications=[{
                'ResourceType': 'snapshot',
                'Tags': [
                    {'Key': 'Forensic', 'Value': 'true'},
                    {'Key': 'FindingId', 'Value': finding['id']},
                    {'Key': 'InstanceId', 'Value': instance_id}
                ]
            }]
        )
        snapshot_ids.append(snapshot['SnapshotId'])

    # Step 2: Get or create isolation security group
    try:
        isolation_sg = ec2.describe_security_groups(
            Filters=[
                {'Name': 'group-name', 'Values': ['forensic-isolation']},
                {'Name': 'vpc-id', 'Values': [vpc_id]}
            ]
        )['SecurityGroups'][0]['GroupId']
    except IndexError:
        isolation_sg = ec2.create_security_group(
            GroupName='forensic-isolation',
            Description='No ingress or egress - forensic isolation',
            VpcId=vpc_id
        )['GroupId']
        # New groups allow all outbound traffic by default; remove that rule
        ec2.revoke_security_group_egress(
            GroupId=isolation_sg,
            IpPermissions=[{'IpProtocol': '-1',
                            'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]
        )

    # Step 3: Isolate instance (single-ENI instances; multi-ENI
    # instances need modify_network_interface_attribute per ENI)
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[isolation_sg]
    )

    # Step 4: Tag instance
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {'Key': 'SecurityStatus', 'Value': 'Isolated'},
            {'Key': 'IsolatedAt',
             'Value': datetime.now(timezone.utc).isoformat()},
            {'Key': 'FindingId', 'Value': finding['id']}
        ]
    )

    # Step 5: Notify
    sns = boto3.client('sns')
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:security-incidents',
        Subject=f'CRITICAL: EC2 Instance {instance_id} Auto-Isolated',
        Message=json.dumps({
            'finding_type': finding_type,
            'instance_id': instance_id,
            'snapshots': snapshot_ids,
            'action': 'isolated',
            'finding_id': finding['id']
        }, indent=2)
    )

    return {
        'action': 'contained',
        'instance_id': instance_id,
        'snapshots': snapshot_ids
    }
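The Lambda still needs an EventBridge rule that delivers GuardDuty findings to it. This sketch assumes boto3's EventBridge client (`put_rule`, `put_targets`); the pattern filters on severity and resource type so the function is invoked less often, while the checks inside the handler remain as a second line of defense. The rule name, target ID, and function names are hypothetical.

```python
import json


def guardduty_instance_pattern(min_severity=7):
    """EventBridge pattern for GuardDuty findings on EC2 instances at or above min_severity."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {
            "severity": [{"numeric": [">=", min_severity]}],
            "resource": {"resourceType": ["Instance"]},
        },
    }


def wire_containment_rule(events, rule_name, lambda_arn, min_severity=7):
    """Create or update the rule and point it at the containment Lambda.

    `events` is expected to be a boto3 EventBridge client (boto3.client("events")).
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(guardduty_instance_pattern(min_severity)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "auto-contain", "Arn": lambda_arn}],
    )
```

The Lambda function also needs a resource-based permission allowing `events.amazonaws.com` to invoke it (for example via `aws lambda add-permission`); without it the rule matches but the invocation fails.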
Key insight: Pre-create isolation security groups in every VPC. During an incident, you need to act fast.
4) Investigation and Analysis
Systematic investigation determines scope, impact, and root cause of the incident:
Investigation Framework:
KEY QUESTIONS TO ANSWER:
┌─────────────────────────────────────────────────────────────┐
│ 1. WHAT happened? │
│ - Type of incident │
│ - Specific actions taken by attacker │
│ - Resources affected │
│ │
│ 2. WHEN did it happen? │
│ - Initial compromise time │
│ - Timeline of attacker activities │
│ - Duration of compromise │
│ │
│ 3. HOW did it happen? │
│ - Initial access vector │
│ - Vulnerabilities exploited │
│ - Credentials compromised │
│ │
│ 4. WHO is responsible? │
│ - Compromised accounts │
│ - Attacker infrastructure │
│ - Attribution (if possible) │
│ │
│ 5. WHAT is the impact? │
│ - Data accessed or exfiltrated │
│ - Systems compromised │
│ - Business impact │
│ │
│ 6. ARE they still in? │
│ - Persistence mechanisms │
│ - Ongoing access │
│ - Backdoors installed │
└─────────────────────────────────────────────────────────────┘
CLOUDTRAIL INVESTIGATION (CloudWatch Logs Insights queries, run against a log group that receives CloudTrail events):
# Timeline of compromised credential activity
fields @timestamp, eventSource, eventName, sourceIPAddress,
requestParameters, responseElements, errorCode
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/compromised'
| sort @timestamp asc
# Look for reconnaissance
fields @timestamp, eventName, requestParameters
| filter userIdentity.arn like /compromised/
| filter eventName in ['ListBuckets', 'DescribeInstances',
'ListUsers', 'ListRoles', 'GetCallerIdentity',
'DescribeSecurityGroups', 'ListSecrets']
# Look for persistence
fields @timestamp, eventName, requestParameters, responseElements
| filter eventName in ['CreateUser', 'CreateAccessKey',
'CreateRole', 'AttachUserPolicy',
'PutRolePolicy', 'CreateLoginProfile']
| sort @timestamp asc
# Look for privilege escalation
fields @timestamp, eventName, requestParameters
| filter eventName in ['AttachUserPolicy', 'AttachRolePolicy',
'PutUserPolicy', 'PutRolePolicy',
'CreatePolicyVersion', 'UpdateAssumeRolePolicy']
# Look for data access (S3 object events appear only if data events are enabled)
fields @timestamp, eventName, requestParameters.bucketName,
requestParameters.key, sourceIPAddress
| filter eventSource = 's3.amazonaws.com'
| filter eventName in ['GetObject', 'ListObjects', 'ListObjectsV2']
| filter userIdentity.arn like /compromised/
# Look for data exfiltration
fields @timestamp, eventName, requestParameters
| filter eventName in ['CreateSnapshot', 'CopySnapshot',
'ModifySnapshotAttribute', 'CopyObject',
'PutBucketReplication']
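When logs have been exported to the evidence store rather than sent to CloudWatch Logs, the same questions can be answered offline. This sketch reads exported CloudTrail files (gzipped JSON with a top-level `Records` list), keeps events for one identity, and tags each with the categories used in the queries above; the category names and function names are illustrative.

```python
import gzip
import json

# Event names taken from the Logs Insights queries; one event can fall in two categories.
CATEGORIES = {
    "recon": {"ListBuckets", "DescribeInstances", "ListUsers", "ListRoles",
              "GetCallerIdentity", "DescribeSecurityGroups", "ListSecrets"},
    "persistence": {"CreateUser", "CreateAccessKey", "CreateRole",
                    "AttachUserPolicy", "PutRolePolicy", "CreateLoginProfile"},
    "privilege_escalation": {"AttachUserPolicy", "AttachRolePolicy",
                             "PutUserPolicy", "PutRolePolicy",
                             "CreatePolicyVersion", "UpdateAssumeRolePolicy"},
    "exfiltration": {"CreateSnapshot", "CopySnapshot", "ModifySnapshotAttribute",
                     "CopyObject", "PutBucketReplication"},
}


def load_records(path):
    """Read one exported CloudTrail log file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)["Records"]


def build_timeline(records, arn_fragment):
    """Return (eventTime, eventName, sourceIP, categories) rows for one identity, oldest first."""
    rows = []
    for rec in records:
        arn = rec.get("userIdentity", {}).get("arn", "")
        if arn_fragment not in arn:
            continue
        name = rec.get("eventName", "")
        tags = sorted(c for c, names in CATEGORIES.items() if name in names)
        rows.append((rec["eventTime"], name, rec.get("sourceIPAddress", ""), tags))
    # eventTime is ISO 8601 UTC (e.g. "2024-01-15T14:03:22Z"), so string order is time order.
    rows.sort(key=lambda row: row[0])
    return rows
```

Matching on an ARN fragment also catches assumed-role sessions (`arn:aws:sts::...:assumed-role/<role>/<session>`), which is usually what you want when following a compromised role.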
Forensic Analysis:
Disk Forensics from Snapshots:
FORENSIC ANALYSIS WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
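│ 1. Create volume from forensic snapshot, in the same │
│ Availability Zone as the forensic workstation │
│ aws ec2 create-volume \ │
│ --snapshot-id snap-forensic123 \ │
│ --availability-zone us-east-1a \ │
│ --volume-type gp3 │
│ │
│ 2. Launch forensic workstation │
│ - Dedicated forensic AMI with tools │
│ - In isolated forensic VPC │
│ - No internet access │
│ │
│ 3. Attach volume (EBS has no read-only attachment mode, │
│ so write protection happens at mount time) │
│ aws ec2 attach-volume \ │
│ --volume-id vol-forensic123 \ │
│ --instance-id i-forensicworkstation \ │
│ --device /dev/sdf │
│ │
│ 4. Mount read-only. Plain "ro" can still replay an ext4 │
│ journal; noload (ext4) or norecovery (XFS) prevents it. │
│ On Nitro instances the disk appears as an NVMe device │
│ such as /dev/nvme1n1 (check with lsblk) │
│ mount -o ro,noload,noexec /dev/xvdf1 /mnt/evidence │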
│ │
│ 5. Analyze │
│ - Timeline analysis │
│ - Log review │
│ - Malware analysis │
│ - File system artifacts │
└─────────────────────────────────────────────────────────────┘
KEY ARTIFACTS TO EXAMINE (Linux):
┌─────────────────────────────────────────────────────────────┐
│ Authentication: │
│ /var/log/auth.log or /var/log/secure │
│ /var/log/lastlog │
│ /var/log/wtmp (last command) │
│ │
│ Command History: │
│ /home/*/.bash_history │
│ /root/.bash_history │
│ │
│ Persistence: │
│ /etc/crontab, /etc/cron.* │
│ /var/spool/cron/crontabs/* │
│ /etc/systemd/system/*.service │
│ /etc/rc.local │
│ /home/*/.ssh/authorized_keys │
│ │
│ Network: │
│ /etc/hosts │
│ /etc/resolv.conf │
│ netstat/ss output (if captured) │
│ │
│ Applications: │
│ /var/log/nginx/access.log │
│ /var/log/apache2/access.log │
│ Application-specific logs │
└─────────────────────────────────────────────────────────────┘
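A quick first pass over the persistence locations is to list them by modification time on the read-only mount. This sketch walks the paths from the list above (plus `/root/.ssh/authorized_keys`) relative to the mount point; it uses `lstat` so symlinks are reported rather than followed, because an absolute symlink on the evidence volume would otherwise resolve into the workstation's own filesystem. The function name and glob list are illustrative, and mtimes can be forged, so treat them as leads rather than proof.

```python
from datetime import datetime, timezone
from pathlib import Path

# Persistence locations relative to the evidence mount point.
PERSISTENCE_GLOBS = [
    "etc/crontab",
    "etc/cron.*/*",
    "var/spool/cron/crontabs/*",
    "etc/systemd/system/*.service",
    "etc/rc.local",
    "home/*/.ssh/authorized_keys",
    "root/.ssh/authorized_keys",
]


def persistence_timeline(mount_root):
    """Return (mtime_utc, path_on_evidence_disk) rows sorted oldest first."""
    root = Path(mount_root)
    rows = []
    for pattern in PERSISTENCE_GLOBS:
        for path in root.glob(pattern):
            # Check is_symlink first so symlinks are never followed.
            if path.is_symlink() or path.is_file():
                mtime = path.lstat().st_mtime
                stamp = datetime.fromtimestamp(mtime, timezone.utc).isoformat(timespec="seconds")
                rows.append((stamp, "/" + path.relative_to(root).as_posix()))
    return sorted(rows)
```

Entries whose times cluster around the compromise window from CloudTrail are the first ones to open.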
MEMORY ANALYSIS (if captured):
┌─────────────────────────────────────────────────────────────┐
│ Tools: Volatility 3 (needs a symbol table matching the │
│ captured kernel); Rekall is no longer maintained │
│ │
│ # Process listing │
│ vol -f memory.lime linux.pslist │
│ │
│ # Open sockets / network connections │
│ vol -f memory.lime linux.sockstat │
│ │
│ # Bash history from memory │
│ vol -f memory.lime linux.bash │
│ │
│ # Find injected code │
│ vol -f memory.lime linux.malfind │
└─────────────────────────────────────────────────────────────┘
Key insight: Build your timeline from CloudTrail first. API logs provide the authoritative record of actions taken through AWS APIs.
5) Recovery and Post-Incident
Recovery restores normal operations while ensuring the attacker cannot regain access:
Recovery Procedures:
CREDENTIAL ROTATION:
┌─────────────────────────────────────────────────────────────┐
│ MUST ROTATE: │
│ │
│ 1. All access keys for compromised users │
│ aws iam create-access-key --user-name user │
│ aws iam delete-access-key --user-name user \ │
│ --access-key-id OLD_KEY │
│ │
│ 2. Passwords for compromised users │
│ Force password reset on next login │
│ │
│ 3. Service account credentials │
│ Rotate in Secrets Manager │
│ Update all consuming applications │
│ │
│ 4. KMS keys (if CMK compromise suspected) │
│ Create new key, re-encrypt data │
│ │
│ 5. Database passwords │
│ Rotate in Secrets Manager │
│ Update RDS master password │
│ │
│ 6. API keys for external services │
│ │
│ 7. SSH keys if host compromised │
│ New key pairs, update authorized_keys │
│ │
│ CONSIDER ROTATING: │
│ - All credentials that COULD have been accessed │
│ - Credentials stored on compromised systems │
│ - Credentials in same secret manager/vault │
└─────────────────────────────────────────────────────────────┘
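Key rotation for a compromised user can be scripted so the steps always happen in a safe order. This sketch assumes a boto3 IAM client; it deletes the compromised key (inactive keys still count toward IAM's two-keys-per-user limit) and refuses to issue a replacement while any other key exists, since that key may have been created by the attacker. Deletion is irreversible, so capture the key's metadata and CloudTrail activity as evidence first. The function name is hypothetical.

```python
def replace_compromised_key(iam, user_name, compromised_key_id):
    """Disable and delete a compromised access key, then issue a replacement.

    `iam` is expected to be a boto3 IAM client. Returns the new key dict
    (AccessKeyId, SecretAccessKey, ...); store the secret in a secrets
    manager and never log it.
    """
    # 1. Make sure the compromised key is unusable (safe if already inactive).
    iam.update_access_key(UserName=user_name, AccessKeyId=compromised_key_id,
                          Status="Inactive")
    # 2. Delete it so it no longer counts toward the two-key limit.
    iam.delete_access_key(UserName=user_name, AccessKeyId=compromised_key_id)
    # 3. Stop if any other key remains; review it before trusting it.
    remaining = iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    if remaining:
        ids = ", ".join(k["AccessKeyId"] for k in remaining)
        raise RuntimeError(f"{user_name} still has access keys ({ids}); review them first")
    # 4. Issue the replacement key.
    return iam.create_access_key(UserName=user_name)["AccessKey"]
```

Where the workload can use an IAM role instead, removing the long-lived key entirely is usually the better outcome than rotating it.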
INFRASTRUCTURE RECOVERY:
┌─────────────────────────────────────────────────────────────┐
│ Option 1: Redeploy from IaC (Preferred) │
│ │
│ # Terminate compromised infrastructure │
│ # Redeploy from known-good IaC │
│ terraform apply │
│ │
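│ Benefits: │
│ - Clean hosts, provided the IaC, base images, and │
│ deployment pipeline were not tampered with │
│ - No host-level persistence carries over (IAM users, │
│ keys, and roles the attacker created must still be │
│ removed separately) │
│ - Fast and repeatable │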
│ │
│ Option 2: Restore from clean backup │
│ │
│ # Use backup from before compromise │
│ # Verify backup is clean │
│ # Apply missing data carefully │
│ │
│ Option 3: Clean compromised system (Risky) │
│ │
│ # Only if IaC/backup not available │
│ # Very thorough cleaning required │
│ # High risk of missing persistence │
│ # Generally NOT recommended │
└─────────────────────────────────────────────────────────────┘
VERIFICATION CHECKLIST:
┌─────────────────────────────────────────────────────────────┐
│ Before declaring recovery complete: │
│ │
│ □ All compromised credentials rotated │
│ □ All compromised resources replaced │
│ □ Persistence mechanisms removed/prevented │
│ □ Root cause addressed │
│ □ Detection improved for similar attacks │
│ □ No ongoing suspicious activity │
│ □ Stakeholders notified │
│ □ Documentation complete │
└─────────────────────────────────────────────────────────────┘
Post-Incident Activities:
Lessons Learned:
POST-INCIDENT REVIEW:
┌─────────────────────────────────────────────────────────────┐
│ Meeting within 1-2 weeks of incident closure: │
│ │
│ 1. INCIDENT SUMMARY │
│ - What happened │
│ - Timeline of events │
│ - Impact assessment │
│ │
│ 2. WHAT WENT WELL │
│ - Detection that worked │
│ - Response that was effective │
│ - Tools that helped │
│ │
│ 3. WHAT COULD BE IMPROVED │
│ - Detection gaps │
│ - Response delays │
│ - Missing capabilities │
│ - Process failures │
│ │
│ 4. ACTION ITEMS │
│ - Specific improvements │
│ - Owners and deadlines │
│ - Priority ranking │
│ │
│ Blameless culture: Focus on systems, not individuals │
└─────────────────────────────────────────────────────────────┘
COMMON IMPROVEMENTS:
┌─────────────────────────────────────────────────────────────┐
│ Detection: │
│ - Add new detection rules │
│ - Tune existing rules │
│ - Enable additional logging │
│ - Expand monitoring coverage │
│ │
│ Prevention: │
│ - Fix vulnerability that enabled access │
│ - Improve IAM policies │
│ - Add network controls │
│ - Update security configurations │
│ │
│ Response: │
│ - Update runbooks │
│ - Improve tooling │
│ - Additional training │
│ - Better communication │
└─────────────────────────────────────────────────────────────┘
METRICS TO TRACK:
┌─────────────────────────────────────────────────────────────┐
│ - Time to detect │
│ - Time to contain │
│ - Time to eradicate │
│ - Time to recover │
│ - Total incident duration │
│ - Number of systems affected │
│ - Data exposure scope │
│ - Business impact │
│ │
│ Track trends over time to measure improvement │
└─────────────────────────────────────────────────────────────┘
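The time-based metrics above can be computed directly from the milestone timestamps recorded in the incident timeline. This sketch measures each interval from the previous milestone, which is one common convention (some teams measure containment from the compromise instead); the milestone names and function names are illustrative.

```python
from datetime import datetime

MILESTONES = ["compromised", "detected", "contained", "eradicated", "recovered"]


def _parse(ts):
    # Accept a trailing "Z", which datetime.fromisoformat only handles from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def ir_metrics(milestones):
    """Return durations in hours between IR milestones.

    `milestones` maps each name in MILESTONES to an ISO 8601 timestamp with offset.
    """
    t = {name: _parse(milestones[name]) for name in MILESTONES}

    def hours(start, end):
        return round((t[end] - t[start]).total_seconds() / 3600, 2)

    return {
        "time_to_detect": hours("compromised", "detected"),
        "time_to_contain": hours("detected", "contained"),
        "time_to_eradicate": hours("contained", "eradicated"),
        "time_to_recover": hours("eradicated", "recovered"),
        "total_duration": hours("compromised", "recovered"),
    }
```

The compromise time is usually only known after the investigation, so these numbers are finalized during the post-incident review rather than while the incident is open.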
Key insight: Every incident is a learning opportunity. Post-incident improvements prevent future incidents.
Real-World Context
Case Study: AWS Access Key Compromise
A developer accidentally committed AWS access keys to a public GitHub repository. Within minutes, automated scanners detected the keys and attackers began using them to launch EC2 instances for cryptomining. The organization's response: (1) immediately deactivated the compromised keys via IAM, (2) used CloudTrail to identify all actions taken with the keys, (3) terminated unauthorized EC2 instances, (4) rotated all credentials that the compromised user had access to, (5) enabled GuardDuty for future detection. The incident highlighted the need for secret scanning in CI/CD and the speed required for cloud IR.
Case Study: EC2 Instance Compromise via SSRF
An attacker exploited an SSRF vulnerability to access the EC2 metadata service, obtaining IAM role credentials. The response team: (1) isolated the instance using security groups, (2) created forensic snapshots, (3) analyzed CloudTrail to determine scope of credential use, (4) revoked all sessions for the role, (5) patched the SSRF vulnerability, (6) deployed IMDSv2 requirement account-wide, (7) reduced IAM role permissions. The incident demonstrated the importance of IMDSv2 and least privilege IAM roles.
Cloud IR Checklist:
Cloud Incident Response Checklist:
PREPARATION (Before Incident):
□ CloudTrail enabled all regions
□ VPC Flow Logs enabled
□ Log retention configured
□ Forensic account ready
□ Isolation security groups created
□ Response runbooks documented
□ Team trained on cloud IR
□ Communication channels established
□ Legal/compliance contacts identified
DETECTION:
□ Alert validated (true positive?)
□ Initial scope determined
□ Severity assessed
□ Stakeholders notified
□ Incident ticket created
CONTAINMENT:
□ Evidence preserved BEFORE containment
□ Compromised credentials disabled
□ Affected instances isolated
□ Network access restricted
□ Ongoing monitoring for attacker activity
ERADICATION:
□ Root cause identified
□ All persistence mechanisms found
□ Compromised resources identified
□ Credentials to rotate identified
RECOVERY:
□ Infrastructure redeployed from IaC
□ All credentials rotated
□ Verification testing complete
□ Monitoring enhanced
□ Normal operations restored
POST-INCIDENT:
□ Timeline documented
□ Lessons learned meeting held
□ Improvements identified and assigned
□ Detection rules updated
□ Runbooks updated
□ Final report completed
Effective IR is about preparation and practice. Run tabletop exercises regularly to test your procedures.
Guided Lab: Incident Response Simulation
In this lab, you'll practice responding to a simulated cloud security incident.
Lab Environment:
- AWS account with EC2, IAM, CloudTrail access
- Pre-configured "compromised" environment
- CloudTrail logs with attack activity
- AWS CLI configured
Exercise Steps:
- Receive simulated GuardDuty alert
- Perform initial triage and assessment
- Preserve evidence (snapshots, log exports)
- Contain affected resources
- Investigate using CloudTrail queries
- Determine scope and timeline
- Identify root cause and persistence
- Execute recovery procedures
- Document findings and lessons learned
Reflection Questions:
- What would you do differently if this were a real incident?
- What preparation would have made response faster?
- How would you improve detection for this attack type?
Week Outcome Check
By the end of this week, you should be able to:
- Explain how cloud IR differs from traditional IR
- Collect and preserve evidence from AWS resources
- Contain compromised EC2 instances using security groups
- Disable and rotate compromised IAM credentials
- Investigate incidents using CloudTrail and VPC Flow Logs
- Implement automated containment with Lambda
- Execute recovery procedures including credential rotation
- Conduct post-incident reviews and implement improvements
📚 Building on Prior Knowledge
Cloud incident response builds on core detection and risk skills:
- CSY204 (IR Workflows): Use containment and evidence-handling steps in cloud contexts.
- CSY201 (OS + Logs): Apply log triage habits to CloudTrail and VPC Flow Logs.
- CSY104 (Networking): Network paths explain lateral movement and exfiltration.
- CSY104 Week 11 (CVSS): Use severity to prioritize response actions.
🎯 Hands-On Labs (Free & Essential)
Practice cloud incident response with realistic scenarios and forensic analysis.
🚨 TryHackMe: AWS Incident Response
What you'll do: Respond to simulated AWS security incidents—analyze CloudTrail
logs, isolate compromised instances, and recover.
Why it matters: Practice under pressure builds muscle memory for real
incidents.
Time estimate: 3-4 hours
🔍 AWS Skill Builder: Security Incident Response
What you'll do: Learn AWS IR tools—CloudTrail forensics, snapshot preservation,
and automated response with Lambda.
Why it matters: AWS provides powerful IR capabilities—know how to use them.
Time estimate: 2-3 hours
🧪 Cloud Forensics Practice
What you'll do: Analyze EC2 memory dumps and disk snapshots—extract artifacts,
find indicators of compromise, build timelines.
Why it matters: Digital forensics skills translate to cloud—memory is still
memory.
Time estimate: 3-4 hours
💡 Lab Strategy: Build IR runbooks BEFORE incidents—panic is not a good time to figure out how to snapshot an instance.
Resources
Lab
Complete the following lab exercises to practice cloud incident response.