CSY302 Week 10 - Cloud incident response builds on core detection and risk skills:

Opening Framing

Incident response in cloud environments differs fundamentally from traditional on-premises response. You can't physically access servers, but you gain capabilities impossible in traditional environments: instant isolation through security groups, forensic snapshots without downtime, comprehensive API audit logs, and the ability to preserve evidence while immediately deploying clean replacement infrastructure.

Cloud incident response requires understanding both traditional IR methodology and cloud-specific techniques. The shared responsibility model means you handle application and data incidents while the cloud provider handles infrastructure incidents. Effective response requires preparation: logging must be enabled before an incident, response procedures must be documented, and teams must be trained on cloud-specific tools and techniques.

This week covers cloud incident response methodology, evidence collection and preservation, containment strategies, forensic analysis, recovery procedures, and building incident response capabilities. You'll learn to respond effectively to security incidents in cloud environments.

Key insight: Cloud enables faster response, but only if you've prepared. Without preparation, cloud complexity slows response.

1) Cloud Incident Response Fundamentals

Understanding how cloud changes incident response is essential for effective security operations:

Cloud IR vs Traditional IR:

KEY DIFFERENCES:
┌─────────────────────────────────────────────────────────────┐
│ TRADITIONAL IR              │ CLOUD IR                      │
├─────────────────────────────┼───────────────────────────────┤
│ Physical access to servers  │ API-based access only         │
│ Image hard drives           │ Snapshot volumes              │
│ Unplug network cable        │ Modify security groups        │
│ Limited audit logs          │ Comprehensive API logs        │
│ Evidence in single location │ Evidence across regions/svcs  │
│ On-site investigation       │ Remote investigation          │
│ Hardware seizure            │ Snapshot and terminate        │
│ Rebuild from backup         │ Redeploy from IaC             │
└─────────────────────────────┴───────────────────────────────┘

CLOUD IR ADVANTAGES:
┌─────────────────────────────────────────────────────────────┐
│ ✓ Instant network isolation (security group changes)        │
│ ✓ Live forensic snapshots (no downtime)                     │
│ ✓ Comprehensive audit trail (CloudTrail, etc.)              │
│ ✓ Rapid replacement (terminate and redeploy)                │
│ ✓ Immutable evidence (snapshots, S3 versioning)             │
│ ✓ Automated response (Lambda, EventBridge)                  │
│ ✓ Scale investigation across many resources                 │
│ ✓ Cross-region visibility                                   │
└─────────────────────────────────────────────────────────────┘

CLOUD IR CHALLENGES:
┌─────────────────────────────────────────────────────────────┐
│ ✗ No physical access                                        │
│ ✗ Shared responsibility (some data with provider)           │
│ ✗ Ephemeral resources (containers, Lambda)                  │
│ ✗ Multi-region/multi-account complexity                     │
│ ✗ Third-party service dependencies                          │
│ ✗ Provider cooperation required for some data               │
│ ✗ Log volume can be overwhelming                            │
│ ✗ Different skills required                                 │
└─────────────────────────────────────────────────────────────┘

Incident Response Phases:

NIST Incident Response Lifecycle (Cloud Adapted):

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│     ┌──────────────┐                                        │
│     │ PREPARATION  │◄──────────────────────────────┐        │
│     └──────┬───────┘                               │        │
│            │                                       │        │
│            ▼                                       │        │
│     ┌──────────────┐                               │        │
│     │  DETECTION   │                               │        │
│     │ & ANALYSIS   │                               │        │
│     └──────┬───────┘                               │        │
│            │                                       │        │
│            ▼                                       │        │
│     ┌──────────────┐                               │        │
│     │ CONTAINMENT  │                               │        │
│     │ ERADICATION  │                               │        │
│     │  RECOVERY    │                               │        │
│     └──────┬───────┘                               │        │
│            │                                       │        │
│            ▼                                       │        │
│     ┌──────────────┐                               │        │
│     │POST-INCIDENT │───────────────────────────────┘        │
│     │   ACTIVITY   │                                        │
│     └──────────────┘                                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

CLOUD-SPECIFIC ACTIVITIES BY PHASE:

PREPARATION:
┌─────────────────────────────────────────────────────────────┐
│ - Enable comprehensive logging (CloudTrail, VPC Flow)       │
│ - Configure log retention and protection                    │
│ - Create isolation security groups                          │
│ - Document response procedures                              │
│ - Train team on cloud forensics                             │
│ - Establish communication channels                          │
│ - Create forensic accounts with limited scope               │
│ - Test response procedures regularly                        │
└─────────────────────────────────────────────────────────────┘

DETECTION & ANALYSIS:
┌─────────────────────────────────────────────────────────────┐
│ - GuardDuty/Security Hub alerts                             │
│ - CloudTrail analysis                                       │
│ - VPC Flow Log analysis                                     │
│ - Identify affected resources                               │
│ - Determine scope and impact                                │
│ - Establish timeline                                        │
│ - Preserve evidence (snapshots, log exports)                │
└─────────────────────────────────────────────────────────────┘

CONTAINMENT, ERADICATION, RECOVERY:
┌─────────────────────────────────────────────────────────────┐
│ - Isolate compromised resources (security groups)           │
│ - Disable compromised credentials                           │
│ - Create forensic snapshots                                 │
│ - Terminate compromised instances                           │
│ - Rotate all potentially exposed secrets                    │
│ - Redeploy from known-good IaC                              │
│ - Verify clean state                                        │
└─────────────────────────────────────────────────────────────┘

POST-INCIDENT:
┌─────────────────────────────────────────────────────────────┐
│ - Complete forensic analysis                                │
│ - Document lessons learned                                  │
│ - Update detection rules                                    │
│ - Improve preventive controls                               │
│ - Update response procedures                                │
│ - Brief stakeholders                                        │
└─────────────────────────────────────────────────────────────┘

Key insight: Most cloud IR preparation happens before the incident. Logging that you enable mid-incident captures nothing that happened before it was turned on.
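
Part of this preparation can be checked ahead of time. The following is a minimal sketch, assuming input shaped like one entry of the trailList returned by CloudTrail's describe_trails call. The function name trail_gaps and the choice of checks are illustrative assumptions, not an official readiness checklist.

```python
def trail_gaps(trail):
    """Return a list of readiness gaps for one CloudTrail trail description.

    `trail` is assumed to be a dict with the keys that describe_trails
    reports (IsMultiRegionTrail, LogFileValidationEnabled, S3BucketName,
    KmsKeyId). Missing keys are treated as "not configured".
    """
    gaps = []
    if not trail.get("IsMultiRegionTrail", False):
        gaps.append("trail is not multi-region")
    if not trail.get("LogFileValidationEnabled", False):
        gaps.append("log file integrity validation is disabled")
    if not trail.get("S3BucketName"):
        gaps.append("no S3 bucket configured for log delivery")
    if not trail.get("KmsKeyId"):
        gaps.append("no KMS key configured for log encryption")
    return gaps


if __name__ == "__main__":
    # Hypothetical trail description used only for illustration.
    example = {
        "Name": "org-trail",
        "S3BucketName": "cloudtrail-bucket",
        "IsMultiRegionTrail": True,
        "LogFileValidationEnabled": False,
    }
    for gap in trail_gaps(example):
        print(gap)
```

Running a check like this on a schedule, rather than during an incident, is the point: the gaps it reports are ones you can no longer close retroactively.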

2) Evidence Collection and Preservation

Proper evidence handling ensures investigation integrity and potential legal admissibility:

Evidence Sources in AWS:

LOG-BASED EVIDENCE:
┌─────────────────────────────────────────────────────────────┐
│ CloudTrail:                                                 │
│ - All API calls (who, what, when, from where)               │
│ - Management events (default) and data events (opt-in)      │
│ - Stored in S3 with integrity validation                    │
│                                                             │
│ VPC Flow Logs:                                              │
│ - Network traffic metadata                                  │
│ - Source/destination IPs and ports                          │
│ - Accept/reject actions                                     │
│                                                             │
│ CloudWatch Logs:                                            │
│ - Application logs                                          │
│ - System logs (via agent)                                   │
│ - Lambda execution logs                                     │
│                                                             │
│ S3 Access Logs:                                             │
│ - Object-level access                                       │
│ - Requester identity                                        │
│                                                             │
│ Load Balancer Logs:                                         │
│ - Request details                                           │
│ - Client IP, response codes                                 │
│                                                             │
│ DNS Query Logs:                                             │
│ - Route 53 Resolver logs                                    │
│ - DNS requests from VPC                                     │
└─────────────────────────────────────────────────────────────┘
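
To make the VPC Flow Logs fields concrete, here is a minimal parsing sketch. It assumes the default (version 2) record format, where each record is a space-separated line. The helper name parse_flow_record is an assumption for illustration, and custom log formats would need a different field list.

```python
from datetime import datetime, timezone

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_record(line):
    """Parse one default-format flow log line into a dict."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records carry "-" in the traffic fields.
    for key in ("srcport", "dstport", "protocol", "packets", "bytes"):
        record[key] = None if record[key] == "-" else int(record[key])
    # start/end are Unix timestamps for the capture window.
    for key in ("start", "end"):
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record


if __name__ == "__main__":
    sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 "
              "172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
    rec = parse_flow_record(sample)
    print(rec["srcaddr"], "->", rec["dstaddr"], rec["dstport"], rec["action"])
```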

RESOURCE-BASED EVIDENCE:
┌─────────────────────────────────────────────────────────────┐
│ EC2:                                                        │
│ - EBS volume snapshots                                      │
│ - Instance metadata                                         │
│ - Security group configurations                             │
│ - Memory (requires live instance)                           │
│                                                             │
│ S3:                                                         │
│ - Object versions                                           │
│ - Deleted object markers                                    │
│ - Bucket policies and ACLs                                  │
│                                                             │
│ IAM:                                                        │
│ - User and role configurations                              │
│ - Policy documents                                          │
│ - Access key metadata                                       │
│                                                             │
│ Lambda:                                                     │
│ - Function code versions                                    │
│ - Configuration history                                     │
│ - Execution logs                                            │
└─────────────────────────────────────────────────────────────┘

Evidence Collection Procedures:

Evidence Collection Commands:

EC2 INSTANCE EVIDENCE:

# 1. Document instance state
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --output json > instance-details.json

# 2. Get console output
aws ec2 get-console-output \
  --instance-id i-1234567890abcdef0 \
  --output json > console-output.json

# 3. Create forensic snapshots of all volumes
VOLUMES=$(aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-1234567890abcdef0 \
  --query 'Volumes[*].VolumeId' --output text)

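for vol in $VOLUMES; do
  # Quote the variable and keep the tag shorthand on one line so the
  # CLI does not see stray whitespace inside the tag list
  aws ec2 create-snapshot \
    --volume-id "$vol" \
    --description "Forensic-$(date -u +%Y%m%d)-incident-123" \
    --tag-specifications \
      'ResourceType=snapshot,Tags=[{Key=Forensic,Value=true},{Key=IncidentId,Value=123}]'
done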

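# 4. Capture memory (requires SSM agent)
# LiME is a kernel module, not a standalone command: it must be built for
# the instance's kernel and loaded with insmod. /opt/forensics/lime.ko is
# a placeholder path. Prefer writing the dump to a separate volume rather
# than the compromised disk where possible.
aws ssm send-command \
  --instance-ids i-1234567890abcdef0 \
  --document-name "AWS-RunShellScript" \
  --parameters '{"commands":["insmod /opt/forensics/lime.ko \"path=/tmp/memory.lime format=lime\""]}'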

CLOUDTRAIL EVIDENCE:

# Export CloudTrail logs for timeframe
# Log keys are organized as .../YYYY/MM/DD/, so sync the day prefix directly
aws s3 sync \
  s3://cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/15/ \
  ./evidence/cloudtrail/2024/01/15/

# Query with CloudTrail Lake or Athena
SELECT eventtime, eventsource, eventname, useridentity.arn,
       sourceipaddress, requestparameters
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%compromised-user%'
  AND eventtime BETWEEN '2024-01-14' AND '2024-01-16'
ORDER BY eventtime
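
The same filter can be applied offline to the exported log files. This is a minimal sketch, assuming the files are the gzip-compressed JSON documents CloudTrail delivers to S3, each with a top-level Records list. The function names and the ./evidence/cloudtrail directory layout are illustrative assumptions.

```python
import gzip
import json
from pathlib import Path


def load_records(path):
    """Read one CloudTrail log file (.json.gz) and return its Records list."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle).get("Records", [])


def events_for_principal(records, arn_fragment, start, end):
    """Select records whose userIdentity.arn contains arn_fragment.

    start and end are ISO-8601 UTC strings; CloudTrail eventTime values
    use the same format, so plain string comparison orders them correctly.
    The window is start <= eventTime < end.
    """
    selected = [
        r for r in records
        if arn_fragment in r.get("userIdentity", {}).get("arn", "")
        and start <= r.get("eventTime", "") < end
    ]
    return sorted(selected, key=lambda r: r["eventTime"])


def timeline(evidence_dir, arn_fragment, start, end):
    """Build a time-ordered list of matching events from a directory tree."""
    records = []
    for path in sorted(Path(evidence_dir).rglob("*.json.gz")):
        records.extend(load_records(path))
    return events_for_principal(records, arn_fragment, start, end)


if __name__ == "__main__":
    for event in timeline("./evidence/cloudtrail", "compromised-user",
                          "2024-01-14", "2024-01-16"):
        print(event["eventTime"], event.get("eventSource"),
              event.get("eventName"), event.get("sourceIPAddress"))
```

Working from a local copy keeps the analysis independent of the account under investigation, at the cost of loading every record into memory, which may not scale to large trails.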

IAM EVIDENCE:

# Get credential report
aws iam generate-credential-report
aws iam get-credential-report --output text \
  --query 'Content' | base64 -d > credential-report.csv

# Document user details
aws iam get-user --user-name compromised-user > user-details.json
aws iam list-user-policies --user-name compromised-user
aws iam list-attached-user-policies --user-name compromised-user
aws iam list-access-keys --user-name compromised-user

# Get policy documents
aws iam get-policy-version \
  --policy-arn arn:aws:iam::123456789012:policy/SuspiciousPolicy \
  --version-id v1

Evidence Preservation:

Evidence Integrity:

SNAPSHOT PROTECTION:
┌─────────────────────────────────────────────────────────────┐
│ # Lock the snapshot against deletion (EBS Snapshot Lock)    │
│ aws ec2 lock-snapshot \                                     │
│   --snapshot-id snap-1234567890abcdef0 \                    │
│   --lock-mode governance \                                  │
│   --lock-duration 365                                       │
│                                                             │
│ # Share with forensic account                               │
│ aws ec2 modify-snapshot-attribute \                         │
│   --snapshot-id snap-1234567890abcdef0 \                    │
│   --attribute createVolumePermission \                      │
│   --operation-type add \                                    │
│   --user-ids 999888777666                                   │
│                                                             │
│ # Then, as the forensic account, copy the shared snapshot   │
│ # (encrypted snapshots also need the KMS key shared)        │
│ aws ec2 copy-snapshot \                                     │
│   --source-region us-east-1 \                               │
│   --source-snapshot-id snap-1234567890abcdef0 \             │
│   --destination-region us-east-1 \                          │
│   --description "Forensic copy - incident 123"              │
└─────────────────────────────────────────────────────────────┘

S3 EVIDENCE PROTECTION:
┌─────────────────────────────────────────────────────────────┐
│ # Enable object lock on evidence bucket                     │
│ aws s3api put-object-lock-configuration \                   │
│   --bucket forensic-evidence \                              │
│   --object-lock-configuration '{                            │
│     "ObjectLockEnabled": "Enabled",                         │
│     "Rule": {                                               │
│       "DefaultRetention": {                                 │
│         "Mode": "GOVERNANCE",                               │
│         "Days": 365                                         │
│       }                                                     │
│     }                                                       │
│   }'                                                        │
│                                                             │
│ # Copy with legal hold                                      │
│ aws s3api copy-object \                                     │
│   --copy-source source-bucket/evidence.log \                │
│   --bucket forensic-evidence \                              │
│   --key incident-123/evidence.log \                         │
│   --object-lock-legal-hold-status ON                        │
└─────────────────────────────────────────────────────────────┘

CHAIN OF CUSTODY:
┌─────────────────────────────────────────────────────────────┐
│ Document for each evidence item:                            │
│ - What was collected                                        │
│ - When it was collected                                     │
│ - Who collected it                                          │
│ - How it was collected (commands/tools)                     │
│ - Where it is stored                                        │
│ - Hash values for integrity                                 │
│                                                             │
│ # Calculate hash                                            │
│ sha256sum evidence-file.tar.gz > evidence-file.sha256       │
│                                                             │
│ # Store metadata                                            │
│ aws s3api put-object \                                      │
│   --bucket forensic-evidence \                              │
│   --key incident-123/evidence-file.tar.gz \                 │
│   --body evidence-file.tar.gz \                             │
│   --metadata '{"collector":"analyst@company.com",           │
│     "collected-at":"2024-01-15T14:30:00Z",                  │
│     "sha256":"abc123...",                                   │
│     "incident-id":"123"}'                                   │
└─────────────────────────────────────────────────────────────┘
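
The chain-of-custody fields above can be captured in a small script so that every evidence item is recorded the same way. This is a minimal sketch using only the standard library. The record layout and the function names sha256_file and custody_record are assumptions for illustration, not a legal standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large evidence archives fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, collector, incident_id, method, storage_location):
    """Build one chain-of-custody entry: what, when, who, how, where, hash."""
    return {
        "item": str(path),
        "collected_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "collector": collector,
        "method": method,
        "stored_at": storage_location,
        "sha256": sha256_file(path),
        "incident_id": incident_id,
    }


if __name__ == "__main__":
    record = custody_record(
        "evidence-file.tar.gz",
        collector="analyst@company.com",
        incident_id="123",
        method="aws s3 sync of CloudTrail logs",
        storage_location="s3://forensic-evidence/incident-123/",
    )
    print(json.dumps(record, indent=2))
```

Storing the resulting JSON alongside the evidence object lets a later reviewer recompute the hash and compare it with the recorded value.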

Key insight: Collect evidence before containment when possible. Isolating may alter or destroy volatile evidence.

3) Containment Strategies

Containment stops the incident from spreading while preserving evidence and enabling investigation:

Containment Options:

EC2 INSTANCE CONTAINMENT:
┌─────────────────────────────────────────────────────────────┐
│ OPTION 1: Security Group Isolation (Preferred)              │
│                                                             │
│ # Create isolation security group (if not pre-created)      │
│ aws ec2 create-security-group \                             │
│   --group-name forensic-isolation \                         │
│   --description "No ingress or egress" \                    │
│   --vpc-id vpc-123                                          │
│                                                             │
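│ # A new SG has no ingress rules but allows all egress by    │
│ # default, so remove the default egress rule                │
│ aws ec2 revoke-security-group-egress \                      │
│   --group-id sg-isolation123 \                              │
│   --ip-permissions '[{"IpProtocol":"-1",                    │
│     "IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'                  │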
│                                                             │
│ # Replace instance security groups                          │
│ aws ec2 modify-instance-attribute \                         │
│   --instance-id i-1234567890abcdef0 \                       │
│   --groups sg-isolation123                                  │
│                                                             │
│ Benefits:                                                   │
│ - Instant network isolation                                 │
│ - Instance stays running for investigation                  │
│ - Memory preserved                                          │
│ - Can allow specific forensic access if needed              │
├─────────────────────────────────────────────────────────────┤
│ OPTION 2: Network ACL Block                                 │
│                                                             │
│ # Add deny rules to subnet NACL (affects the whole subnet)  │
│ # Run once with --ingress and once with --egress            │
│ aws ec2 create-network-acl-entry \                          │
│   --network-acl-id acl-123 \                                │
│   --ingress \                                               │
│   --rule-number 1 \                                         │
│   --protocol -1 \                                           │
│   --rule-action deny \                                      │
│   --cidr-block 0.0.0.0/0                                    │
│                                                             │
│ Use when: Need to block at subnet level                     │
├─────────────────────────────────────────────────────────────┤
│ OPTION 3: Stop Instance (Evidence Risk)                     │
│                                                             │
│ aws ec2 stop-instances --instance-ids i-123                 │
│                                                             │
│ Warning: Loses memory contents                              │
│ Use when: Immediate threat, evidence less important         │
├─────────────────────────────────────────────────────────────┤
│ OPTION 4: Terminate Instance (Last Resort)                  │
│                                                             │
│ # Only after snapshots taken!                               │
│ aws ec2 terminate-instances --instance-ids i-123            │
│                                                             │
│ Use when: Active attack causing damage                      │
└─────────────────────────────────────────────────────────────┘
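
Security group isolation is reversible only if you record which groups the instance had before the swap. The sketch below shows one way to do that. It takes an EC2 client as a parameter (for example boto3.client('ec2')) and uses the describe_instances and modify_instance_attribute calls. The function name isolate_instance is an assumption, and instances with several network interfaces may need each interface updated separately, which this sketch does not handle.

```python
def isolate_instance(ec2, instance_id, isolation_sg_id):
    """Move an instance onto the isolation group and return its prior groups.

    `ec2` is any object exposing describe_instances and
    modify_instance_attribute with the boto3 EC2 client signatures.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    previous_groups = [group["GroupId"] for group in instance["SecurityGroups"]]

    # Replace every attached group with the single isolation group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])

    # Return what is needed to document the action or roll it back later.
    return {
        "instance_id": instance_id,
        "previous_groups": previous_groups,
        "isolation_group": isolation_sg_id,
    }
```

The returned dictionary belongs in the incident record: it documents the containment action and tells the recovery team which groups to restore if the instance turns out to be clean.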

IAM CREDENTIAL CONTAINMENT:
┌─────────────────────────────────────────────────────────────┐
│ USER ACCESS KEYS:                                           │
│                                                             │
│ # Deactivate access key (reversible)                        │
│ aws iam update-access-key \                                 │
│   --user-name compromised-user \                            │
│   --access-key-id AKIA123 \                                 │
│   --status Inactive                                         │
│                                                             │
│ # Delete access key (permanent)                             │
│ aws iam delete-access-key \                                 │
│   --user-name compromised-user \                            │
│   --access-key-id AKIA123                                   │
│                                                             │
│ CONSOLE ACCESS:                                             │
│                                                             │
│ # Delete login profile                                      │
│ aws iam delete-login-profile \                              │
│   --user-name compromised-user                              │
│                                                             │
│ # Or attach the AWS managed deny-all policy                 │
│ aws iam attach-user-policy \                                │
│   --user-name compromised-user \                            │
│   --policy-arn arn:aws:iam::aws:policy/AWSDenyAll           │
│                                                             │
│ ROLE SESSIONS:                                              │
│                                                             │
│ # Revoke all active sessions                                │
│ aws iam put-role-policy \                                   │
│   --role-name compromised-role \                            │
│   --policy-name RevokeOldSessions \                         │
│   --policy-document '{                                      │
│     "Version": "2012-10-17",                                │
│     "Statement": [{                                         │
│       "Effect": "Deny",                                     │
│       "Action": "*",                                        │
│       "Resource": "*",                                      │
│       "Condition": {                                        │
│         "DateLessThan": {                                   │
│           "aws:TokenIssueTime": "2024-01-15T14:00:00Z"      │
│         }                                                   │
│       }                                                     │
│     }]                                                      │
│   }'                                                        │
└─────────────────────────────────────────────────────────────┘
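
The cutoff timestamp in the session revocation policy should be the moment of containment, not a hand-typed value. This is a minimal sketch that generates the same policy document for a given cutoff. The function name revoke_sessions_policy is an assumption, and the cutoff is expected to be a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone


def revoke_sessions_policy(cutoff=None):
    """Return a deny policy (JSON string) for sessions issued before cutoff.

    cutoff defaults to the current UTC time, which invalidates every
    session token that already exists for the role.
    """
    cutoff = cutoff or datetime.now(timezone.utc)
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "DateLessThan": {"aws:TokenIssueTime": stamp}
            },
        }],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(revoke_sessions_policy(datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)))
```

The output can be passed as the --policy-document value of the put-role-policy command shown above. Sessions issued after the cutoff are unaffected, so the root cause still has to be fixed separately.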

Automated Containment:

Automated Response with Lambda:

GUARDDUTY TO LAMBDA CONTAINMENT:

import boto3
import json
from datetime import datetime  # needed for the IsolatedAt tag below

def lambda_handler(event, context):
    """Auto-contain EC2 instances from GuardDuty findings"""
    
    finding = event['detail']
    finding_type = finding['type']
    severity = finding['severity']
    
    # Only auto-contain for high severity EC2 findings
    if severity < 7:
        return {'action': 'skipped', 'reason': 'severity below threshold'}
    
    if not finding_type.startswith(('Trojan:', 'Backdoor:', 
                                     'CryptoCurrency:')):
        return {'action': 'skipped', 'reason': 'finding type not critical'}
    
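    # Get instance ID (skip findings that are not about an EC2 instance,
    # since these prefixes also appear on other resource types)
    instance_details = finding['resource'].get('instanceDetails')
    if not instance_details:
        return {'action': 'skipped', 'reason': 'not an EC2 instance finding'}
    instance_id = instance_details['instanceId']
    vpc_id = instance_details['networkInterfaces'][0]['vpcId']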
    
    ec2 = boto3.client('ec2')
    
    # Step 1: Create forensic snapshot
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'attachment.instance-id', 
                  'Values': [instance_id]}]
    )
    
    snapshot_ids = []
    for vol in volumes['Volumes']:
        snapshot = ec2.create_snapshot(
            VolumeId=vol['VolumeId'],
            Description=f'Auto-forensic-{finding["id"][:8]}',
            TagSpecifications=[{
                'ResourceType': 'snapshot',
                'Tags': [
                    {'Key': 'Forensic', 'Value': 'true'},
                    {'Key': 'FindingId', 'Value': finding['id']},
                    {'Key': 'InstanceId', 'Value': instance_id}
                ]
            }]
        )
        snapshot_ids.append(snapshot['SnapshotId'])
    
    # Step 2: Get or create isolation security group
    try:
        isolation_sg = ec2.describe_security_groups(
            Filters=[
                {'Name': 'group-name', 'Values': ['forensic-isolation']},
                {'Name': 'vpc-id', 'Values': [vpc_id]}
            ]
        )['SecurityGroups'][0]['GroupId']
    except IndexError:
        isolation_sg = ec2.create_security_group(
            GroupName='forensic-isolation',
            Description='No ingress or egress - forensic isolation',
            VpcId=vpc_id
        )['GroupId']
        # Remove default egress rule
        ec2.revoke_security_group_egress(
            GroupId=isolation_sg,
            IpPermissions=[{'IpProtocol': '-1', 
                           'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]
        )
    
    # Step 3: Isolate instance
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[isolation_sg]
    )
    
    # Step 4: Tag instance
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {'Key': 'SecurityStatus', 'Value': 'Isolated'},
            {'Key': 'IsolatedAt', 'Value': datetime.utcnow().isoformat()},
            {'Key': 'FindingId', 'Value': finding['id']}
        ]
    )
    
    # Step 5: Notify
    sns = boto3.client('sns')
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:security-incidents',
        Subject=f'CRITICAL: EC2 Instance {instance_id} Auto-Isolated',
        Message=json.dumps({
            'finding_type': finding_type,
            'instance_id': instance_id,
            'snapshots': snapshot_ids,
            'action': 'isolated',
            'finding_id': finding['id']
        }, indent=2)
    )
    
    return {
        'action': 'contained',
        'instance_id': instance_id,
        'snapshots': snapshot_ids
    }
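
The handler above assumes it is invoked with GuardDuty finding events. One common way to arrange that is an EventBridge rule, and the sketch below builds an event pattern for such a rule. Pre-filtering on severity and finding type in the rule mirrors the checks inside the handler. The function name guardduty_event_pattern is an assumption, and the thresholds are taken from the handler rather than from any AWS recommendation.

```python
import json


def guardduty_event_pattern(min_severity=7,
                            type_prefixes=("Trojan:", "Backdoor:", "CryptoCurrency:")):
    """Build an EventBridge event pattern for high-severity GuardDuty findings."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {
            # Numeric matching: only findings at or above the threshold.
            "severity": [{"numeric": [">=", min_severity]}],
            # Prefix matching on the finding type string.
            "type": [{"prefix": prefix} for prefix in type_prefixes],
        },
    }


if __name__ == "__main__":
    # The JSON string is what a rule's event pattern field expects.
    print(json.dumps(guardduty_event_pattern(), indent=2))
```

Filtering in the rule keeps low-severity findings from invoking the function at all. Keeping the same checks in the handler is still worthwhile as a second guard in case the rule is edited later.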

Key insight: Pre-create isolation security groups in every VPC. During an incident, you need to act fast.

4) Investigation and Analysis

Systematic investigation determines scope, impact, and root cause of the incident:

Investigation Framework:

KEY QUESTIONS TO ANSWER:
┌─────────────────────────────────────────────────────────────┐
│ 1. WHAT happened?                                           │
│    - Type of incident                                       │
│    - Specific actions taken by attacker                     │
│    - Resources affected                                     │
│                                                             │
│ 2. WHEN did it happen?                                      │
│    - Initial compromise time                                │
│    - Timeline of attacker activities                        │
│    - Duration of compromise                                 │
│                                                             │
│ 3. HOW did it happen?                                       │
│    - Initial access vector                                  │
│    - Vulnerabilities exploited                              │
│    - Credentials compromised                                │
│                                                             │
│ 4. WHO is responsible?                                      │
│    - Compromised accounts                                   │
│    - Attacker infrastructure                                │
│    - Attribution (if possible)                              │
│                                                             │
│ 5. WHAT is the impact?                                      │
│    - Data accessed or exfiltrated                           │
│    - Systems compromised                                    │
│    - Business impact                                        │
│                                                             │
│ 6. ARE they still in?                                       │
│    - Persistence mechanisms                                 │
│    - Ongoing access                                         │
│    - Backdoors installed                                    │
└─────────────────────────────────────────────────────────────┘

CLOUDTRAIL INVESTIGATION:

# Timeline of compromised credential activity
fields @timestamp, eventSource, eventName, sourceIPAddress,
       requestParameters, responseElements, errorCode
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/compromised'
| sort @timestamp asc

# Look for reconnaissance
fields @timestamp, eventName, requestParameters
| filter userIdentity.arn like /compromised/
| filter eventName in ['ListBuckets', 'DescribeInstances',
                        'ListUsers', 'ListRoles', 'GetCallerIdentity',
                        'DescribeSecurityGroups', 'ListSecrets']

# Look for persistence
fields @timestamp, eventName, requestParameters, responseElements
| filter eventName in ['CreateUser', 'CreateAccessKey',
                        'CreateRole', 'AttachUserPolicy',
                        'PutRolePolicy', 'CreateLoginProfile']
| sort @timestamp asc

# Look for privilege escalation
fields @timestamp, eventName, requestParameters
| filter eventName in ['AttachUserPolicy', 'AttachRolePolicy',
                        'PutUserPolicy', 'PutRolePolicy',
                        'CreatePolicyVersion', 'UpdateAssumeRolePolicy']

# Look for data access
fields @timestamp, eventName, requestParameters.bucketName,
       requestParameters.key, sourceIPAddress
| filter eventSource = 's3.amazonaws.com'
| filter eventName in ['GetObject', 'ListObjects', 'ListObjectsV2']
| filter userIdentity.arn like /compromised/

# Look for data exfiltration
fields @timestamp, eventName, requestParameters
| filter eventName in ['CreateSnapshot', 'CopySnapshot',
                        'ModifySnapshotAttribute', 'CopyObject',
                        'PutBucketReplication']
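
When working from exported records rather than a query console, the same event-name lists can drive a quick triage pass. This is a minimal sketch that tags each CloudTrail record with the categories used in the queries above. The category names and the functions classify and summarize are illustrative assumptions, and the event-name lists are a starting point rather than complete coverage.

```python
from collections import Counter

# Event names copied from the investigation queries, grouped by category.
CATEGORIES = {
    "reconnaissance": {"ListBuckets", "DescribeInstances", "ListUsers",
                       "ListRoles", "GetCallerIdentity",
                       "DescribeSecurityGroups", "ListSecrets"},
    "persistence": {"CreateUser", "CreateAccessKey", "CreateRole",
                    "AttachUserPolicy", "PutRolePolicy", "CreateLoginProfile"},
    "privilege_escalation": {"AttachUserPolicy", "AttachRolePolicy",
                             "PutUserPolicy", "PutRolePolicy",
                             "CreatePolicyVersion", "UpdateAssumeRolePolicy"},
    "data_access": {"GetObject", "ListObjects", "ListObjectsV2"},
    "exfiltration": {"CreateSnapshot", "CopySnapshot",
                     "ModifySnapshotAttribute", "CopyObject",
                     "PutBucketReplication"},
}


def classify(event_name):
    """Return the sorted list of categories an event name belongs to."""
    return sorted(cat for cat, names in CATEGORIES.items() if event_name in names)


def summarize(records):
    """Count records per category; one record may count in several."""
    counts = Counter()
    for record in records:
        for category in classify(record.get("eventName", "")):
            counts[category] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [{"eventName": "ListBuckets"}, {"eventName": "PutRolePolicy"},
              {"eventName": "CopySnapshot"}, {"eventName": "RunInstances"}]
    print(summarize(sample))
```

Some event names sit in more than one category (PutRolePolicy is both persistence and privilege escalation), so the counts are a guide to where to look first, not a verdict.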

Forensic Analysis:

Disk Forensics from Snapshots:

FORENSIC ANALYSIS WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
│ 1. Create volume from forensic snapshot                     │
│    aws ec2 create-volume \                                  │
│      --snapshot-id snap-forensic123 \                       │
│      --availability-zone us-east-1a \                       │
│      --volume-type gp3                                      │
│                                                             │
│ 2. Launch forensic workstation                              │
│    - Dedicated forensic AMI with tools                      │
│    - In isolated forensic VPC                               │
│    - No internet access                                     │
│                                                             │
│ 3. Attach volume to the forensic workstation                │
│    (EBS has no read-only attach; enforce it at mount)       │
│    aws ec2 attach-volume \                                  │
│      --volume-id vol-forensic123 \                          │
│      --instance-id i-forensicworkstation \                  │
│      --device /dev/sdf                                      │
│                                                             │
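│ 4. Mount read-only (confirm device name with lsblk;         │
│    Nitro instances expose it as /dev/nvme*n1)               │
│    mount -o ro,noexec,noload /dev/xvdf1 /mnt/evidence       │
│    (noload skips ext4 journal replay; XFS: norecovery)      │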
│                                                             │
│ 5. Analyze                                                  │
│    - Timeline analysis                                      │
│    - Log review                                             │
│    - Malware analysis                                       │
│    - File system artifacts                                  │
└─────────────────────────────────────────────────────────────┘
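
Before analysis starts, it is worth confirming that the evidence volume really is mounted read-only. The sketch below is one possible check, assuming the mount point `/mnt/evidence` from step 4. It parses text in the format of `/proc/mounts`, and the function names and sample line are illustrative.

```python
def mount_options(proc_mounts_text, mount_point):
    """Return the option set for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # a later entry overrides an earlier one
    return found


def is_safe_evidence_mount(proc_mounts_text, mount_point="/mnt/evidence"):
    """True only if the mount exists and carries both the ro and noexec options."""
    opts = mount_options(proc_mounts_text, mount_point)
    return opts is not None and {"ro", "noexec"} <= opts


# Illustrative sample in /proc/mounts format.
SAMPLE_MOUNTS = (
    "/dev/xvda1 / ext4 rw,relatime 0 0\n"
    "/dev/xvdf1 /mnt/evidence ext4 ro,noexec,relatime 0 0\n"
)

if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        print("evidence mount safe:", is_safe_evidence_mount(fh.read()))
```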

KEY ARTIFACTS TO EXAMINE (Linux):
┌─────────────────────────────────────────────────────────────┐
│ Authentication:                                             │
│ /var/log/auth.log or /var/log/secure                        │
│ /var/log/lastlog                                            │
│ /var/log/wtmp (last command)                                │
│                                                             │
│ Command History:                                            │
│ /home/*/.bash_history                                       │
│ /root/.bash_history                                         │
│                                                             │
│ Persistence:                                                │
│ /etc/crontab, /etc/cron.*                                   │
│ /var/spool/cron/crontabs/*                                  │
│ /etc/systemd/system/*.service                               │
│ /etc/rc.local                                               │
│ /home/*/.ssh/authorized_keys                                │
│                                                             │
│ Network:                                                    │
│ /etc/hosts                                                  │
│ /etc/resolv.conf                                            │
│ netstat/ss output (if captured)                             │
│                                                             │
│ Applications:                                               │
│ /var/log/nginx/access.log                                   │
│ /var/log/apache2/access.log                                 │
│ Application-specific logs                                   │
└─────────────────────────────────────────────────────────────┘
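
As an example of working through the authentication artifacts, the sketch below extracts SSH login events from auth.log-style lines. It assumes the common OpenSSH message wording ("Accepted ... for USER from IP" and "Failed ... for USER from IP"). Other syslog formats would need a different pattern, and the sample lines are invented.

```python
import re
from collections import Counter

# Matches typical OpenSSH success/failure messages; this wording is an assumption.
SSH_EVENT = re.compile(
    r"sshd\[\d+\]: (?P<result>Accepted|Failed) (?P<method>\S+) for "
    r"(?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def parse_ssh_events(lines):
    """Return one dict (result, method, user, ip) per matching log line."""
    events = []
    for line in lines:
        match = SSH_EVENT.search(line)
        if match:
            events.append(match.groupdict())
    return events


def failed_by_ip(events):
    """Count failed logins per source IP to surface brute-force candidates."""
    return Counter(e["ip"] for e in events if e["result"] == "Failed")


# Invented sample lines for illustration.
SAMPLE_LOG = [
    "Mar 10 11:58:01 web1 sshd[2201]: Failed password for invalid user admin "
    "from 198.51.100.7 port 40022 ssh2",
    "Mar 10 11:58:04 web1 sshd[2203]: Failed password for root "
    "from 198.51.100.7 port 40030 ssh2",
    "Mar 10 12:01:15 web1 sshd[2210]: Accepted publickey for ubuntu "
    "from 203.0.113.5 port 51234 ssh2: RSA SHA256:abc",
    "Mar 10 12:02:00 web1 CRON[2300]: pam_unix(cron:session): session opened",
]

if __name__ == "__main__":
    # Path assumes the evidence volume is mounted at /mnt/evidence.
    with open("/mnt/evidence/var/log/auth.log", errors="replace") as fh:
        print(failed_by_ip(parse_ssh_events(fh)))
```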

MEMORY ANALYSIS (if captured):
┌─────────────────────────────────────────────────────────────┐
│ Tools: Volatility, Rekall (examples use Volatility 2)       │
│ <profile> = Linux profile built for the captured kernel     │
│                                                             │
│ # Process listing                                           │
│ volatility -f memory.lime --profile=<profile> linux_pslist  │
│                                                             │
│ # Network connections                                       │
│ volatility -f memory.lime --profile=<profile> linux_netstat │
│                                                             │
│ # Bash history from memory                                  │
│ volatility -f memory.lime --profile=<profile> linux_bash    │
│                                                             │
│ # Find malware                                              │
│ volatility -f memory.lime --profile=<profile> linux_malfind │
└─────────────────────────────────────────────────────────────┘

Key insight: Build your timeline from CloudTrail first. API logs provide the authoritative record of cloud actions.

5) Recovery and Post-Incident

Recovery restores normal operations while ensuring the attacker cannot regain access:

Recovery Procedures:

CREDENTIAL ROTATION:
┌─────────────────────────────────────────────────────────────┐
│ MUST ROTATE:                                                │
│                                                             │
│ 1. All access keys for compromised users                    │
│    aws iam create-access-key --user-name user               │
│    aws iam delete-access-key --user-name user \             │
│      --access-key-id OLD_KEY                                │
│                                                             │
│ 2. Passwords for compromised users                          │
│    Force password reset on next login                       │
│                                                             │
│ 3. Service account credentials                              │
│    Rotate in Secrets Manager                                │
│    Update all consuming applications                        │
│                                                             │
│ 4. KMS keys (if CMK compromise suspected)                   │
│    Create new key, re-encrypt data                          │
│                                                             │
│ 5. Database passwords                                       │
│    Rotate in Secrets Manager                                │
│    Update RDS master password                               │
│                                                             │
│ 6. API keys for external services                           │
│                                                             │
│ 7. SSH keys if host compromised                             │
│    New key pairs, update authorized_keys                    │
│                                                             │
│ CONSIDER ROTATING:                                          │
│ - All credentials that COULD have been accessed             │
│ - Credentials stored on compromised systems                 │
│ - Credentials in same secret manager/vault                  │
└─────────────────────────────────────────────────────────────┘
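
The access-key step above can be scripted. This is a hedged sketch rather than a complete rotation tool: it assumes an object that behaves like `boto3.client("iam")` is passed in, and it deactivates the old key before creating the replacement so the compromised key stops working immediately. `DryRunIAM` is a hypothetical stand-in used only to exercise the logic without contacting AWS.

```python
def rotate_access_key(iam, user_name, old_key_id):
    """Deactivate the old key, issue a replacement, then delete the old key.

    `iam` is expected to behave like boto3.client("iam"). IAM allows at most
    two access keys per user, so create_access_key fails if two already exist.
    """
    iam.update_access_key(UserName=user_name, AccessKeyId=old_key_id,
                          Status="Inactive")
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    iam.delete_access_key(UserName=user_name, AccessKeyId=old_key_id)
    return new_key["AccessKeyId"], new_key["SecretAccessKey"]


class DryRunIAM:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def update_access_key(self, **kwargs):
        self.calls.append(("update_access_key", kwargs))

    def create_access_key(self, **kwargs):
        self.calls.append(("create_access_key", kwargs))
        # Placeholder values, not real credentials.
        return {"AccessKey": {"AccessKeyId": "AKIANEWKEYEXAMPLE000",
                              "SecretAccessKey": "example-secret"}}

    def delete_access_key(self, **kwargs):
        self.calls.append(("delete_access_key", kwargs))


if __name__ == "__main__":
    client = DryRunIAM()
    key_id, _secret = rotate_access_key(client, "user", "OLD_KEY")
    print("new key:", key_id)
    print("calls:", [name for name, _ in client.calls])
```

If the investigation is still running, some teams leave the old key deactivated rather than deleting it right away. In that case the final call can be dropped.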

INFRASTRUCTURE RECOVERY:
┌─────────────────────────────────────────────────────────────┐
│ Option 1: Redeploy from IaC (Preferred)                     │
│                                                             │
│ # Terminate compromised infrastructure                      │
│ # Redeploy from known-good IaC                              │
│ terraform apply                                             │
│                                                             │
│ Benefits:                                                   │
│ - Guaranteed clean state                                    │
│ - No hidden persistence                                     │
│ - Fast and repeatable                                       │
│                                                             │
│ Option 2: Restore from clean backup                         │
│                                                             │
│ # Use backup from before compromise                         │
│ # Verify backup is clean                                    │
│ # Apply missing data carefully                              │
│                                                             │
│ Option 3: Clean compromised system (Risky)                  │
│                                                             │
│ # Only if IaC/backup not available                          │
│ # Very thorough cleaning required                           │
│ # High risk of missing persistence                          │
│ # Generally NOT recommended                                 │
└─────────────────────────────────────────────────────────────┘

VERIFICATION CHECKLIST:
┌─────────────────────────────────────────────────────────────┐
│ Before declaring recovery complete:                         │
│                                                             │
│ □ All compromised credentials rotated                       │
│ □ All compromised resources replaced                        │
│ □ Persistence mechanisms removed/prevented                  │
│ □ Root cause addressed                                      │
│ □ Detection improved for similar attacks                    │
│ □ No ongoing suspicious activity                            │
│ □ Stakeholders notified                                     │
│ □ Documentation complete                                    │
└─────────────────────────────────────────────────────────────┘

Post-Incident Activities:

Lessons Learned:

POST-INCIDENT REVIEW:
┌─────────────────────────────────────────────────────────────┐
│ Meeting within 1-2 weeks of incident closure:               │
│                                                             │
│ 1. INCIDENT SUMMARY                                         │
│    - What happened                                          │
│    - Timeline of events                                     │
│    - Impact assessment                                      │
│                                                             │
│ 2. WHAT WENT WELL                                           │
│    - Detection that worked                                  │
│    - Response that was effective                            │
│    - Tools that helped                                      │
│                                                             │
│ 3. WHAT COULD BE IMPROVED                                   │
│    - Detection gaps                                         │
│    - Response delays                                        │
│    - Missing capabilities                                   │
│    - Process failures                                       │
│                                                             │
│ 4. ACTION ITEMS                                             │
│    - Specific improvements                                  │
│    - Owners and deadlines                                   │
│    - Priority ranking                                       │
│                                                             │
│ Blameless culture: Focus on systems, not individuals        │
└─────────────────────────────────────────────────────────────┘

COMMON IMPROVEMENTS:
┌─────────────────────────────────────────────────────────────┐
│ Detection:                                                  │
│ - Add new detection rules                                   │
│ - Tune existing rules                                       │
│ - Enable additional logging                                 │
│ - Expand monitoring coverage                                │
│                                                             │
│ Prevention:                                                 │
│ - Fix vulnerability that enabled access                     │
│ - Improve IAM policies                                      │
│ - Add network controls                                      │
│ - Update security configurations                            │
│                                                             │
│ Response:                                                   │
│ - Update runbooks                                           │
│ - Improve tooling                                           │
│ - Additional training                                       │
│ - Better communication                                      │
└─────────────────────────────────────────────────────────────┘

METRICS TO TRACK:
┌─────────────────────────────────────────────────────────────┐
│ - Time to detect                                            │
│ - Time to contain                                           │
│ - Time to eradicate                                         │
│ - Time to recover                                           │
│ - Total incident duration                                   │
│ - Number of systems affected                                │
│ - Data exposure scope                                       │
│ - Business impact                                           │
│                                                             │
│ Track trends over time to measure improvement               │
└─────────────────────────────────────────────────────────────┘
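
The time-based metrics can be derived from five milestone timestamps. The sketch below shows one way to compute them. The milestone names and the choice to measure each phase from the end of the previous one are assumptions, and the sample timestamps are invented.

```python
from datetime import datetime, timedelta


def response_metrics(milestones):
    """Compute phase durations (timedelta) from ISO 8601 milestone timestamps."""
    t = {name: datetime.fromisoformat(value) for name, value in milestones.items()}
    return {
        "time_to_detect": t["detected"] - t["compromised"],
        "time_to_contain": t["contained"] - t["detected"],
        "time_to_eradicate": t["eradicated"] - t["contained"],
        "time_to_recover": t["recovered"] - t["eradicated"],
        "total_duration": t["recovered"] - t["compromised"],
    }


# Invented sample incident for illustration.
SAMPLE_MILESTONES = {
    "compromised": "2024-03-10T11:15:00",
    "detected": "2024-03-10T12:00:00",
    "contained": "2024-03-10T12:20:00",
    "eradicated": "2024-03-10T15:20:00",
    "recovered": "2024-03-11T09:20:00",
}

if __name__ == "__main__":
    for name, duration in response_metrics(SAMPLE_MILESTONES).items():
        print(f"{name}: {duration}")
```

Recording these values per incident in the same format makes it straightforward to compare them across quarters.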

Key insight: Every incident is a learning opportunity. Post-incident improvements reduce the chance of a repeat only when each action item has an owner and a deadline.

Real-World Context

Case Study: AWS Access Key Compromise

A developer accidentally committed AWS access keys to a public GitHub repository. Within minutes, automated scanners detected the keys and attackers began using them to launch EC2 instances for cryptomining. The organization's response: (1) immediately deactivated the compromised keys via IAM, (2) used CloudTrail to identify all actions taken with the keys, (3) terminated unauthorized EC2 instances, (4) rotated all credentials that the compromised user had access to, (5) enabled GuardDuty for future detection. The incident highlighted the need for secret scanning in CI/CD and the speed required for cloud IR.
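
The secret scanning mentioned in this case study can be approximated with a simple pattern check. The sketch below is a heuristic, not a replacement for a dedicated scanner: it assumes access key IDs are 20 uppercase alphanumeric characters starting with AKIA (long-term) or ASIA (temporary), and the function name and sample text are illustrative. The sample key is the placeholder used in AWS documentation.

```python
import re

# Heuristic: AKIA/ASIA prefix plus 16 uppercase letters or digits, not embedded
# in a longer alphanumeric token.
ACCESS_KEY_ID = re.compile(r"(?<![A-Z0-9])(?:AKIA|ASIA)[A-Z0-9]{16}(?![A-Z0-9])")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every match, 1-indexed."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Sample file content using the documentation placeholder key.
SAMPLE_CONFIG = (
    'region = "us-east-1"\n'
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "comment = akia-lowercase-is-ignored\n"
)

if __name__ == "__main__":
    for lineno, key_id in find_access_key_ids(SAMPLE_CONFIG):
        print(f"line {lineno}: possible access key ID {key_id}")
```

Run as a pre-commit or CI step, a check like this would fail the build when it returns any hits.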

Case Study: EC2 Instance Compromise via SSRF

An attacker exploited an SSRF vulnerability to access the EC2 metadata service, obtaining IAM role credentials. The response team: (1) isolated the instance using security groups, (2) created forensic snapshots, (3) analyzed CloudTrail to determine scope of credential use, (4) revoked all sessions for the role, (5) patched the SSRF vulnerability, (6) deployed IMDSv2 requirement account-wide, (7) reduced IAM role permissions. The incident demonstrated the importance of IMDSv2 and least privilege IAM roles.
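
To check how far the IMDSv2 requirement has been rolled out, the output of `aws ec2 describe-instances` can be audited offline. The sketch below assumes that JSON has been parsed into a dict. It treats a missing `MetadataOptions.HttpTokens` value as "optional", which is an assumption made for safety, and the sample data is invented.

```python
def instances_allowing_imdsv1(describe_instances_output):
    """Return IDs of instances whose MetadataOptions.HttpTokens is not 'required'."""
    flagged = []
    for reservation in describe_instances_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tokens = instance.get("MetadataOptions", {}).get("HttpTokens", "optional")
            if tokens != "required":
                flagged.append(instance["InstanceId"])
    return flagged


# Invented sample shaped like describe-instances output.
SAMPLE_OUTPUT = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa1111bbbb2222c",
             "MetadataOptions": {"HttpTokens": "required"}},
            {"InstanceId": "i-0ddd3333eeee4444f",
             "MetadataOptions": {"HttpTokens": "optional"}},
        ]},
        {"Instances": [{"InstanceId": "i-0999888877776666a"}]},
    ]
}

if __name__ == "__main__":
    import json
    import sys

    # Usage: aws ec2 describe-instances > instances.json; python audit.py instances.json
    with open(sys.argv[1]) as fh:
        print("\n".join(instances_allowing_imdsv1(json.load(fh))))
```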

Cloud IR Checklist:

PREPARATION (Before Incident):
□ CloudTrail enabled in all regions
□ VPC Flow Logs enabled
□ Log retention configured
□ Forensic account ready
□ Isolation security groups created
□ Response runbooks documented
□ Team trained on cloud IR
□ Communication channels established
□ Legal/compliance contacts identified

DETECTION:
□ Alert validated (true positive?)
□ Initial scope determined
□ Severity assessed
□ Stakeholders notified
□ Incident ticket created

CONTAINMENT:
□ Evidence preserved BEFORE containment
□ Compromised credentials disabled
□ Affected instances isolated
□ Network access restricted
□ Ongoing monitoring for attacker activity

ERADICATION:
□ Root cause identified
□ All persistence mechanisms found and removed
□ Compromised resources identified
□ Credentials to rotate identified

RECOVERY:
□ Infrastructure redeployed from IaC
□ All credentials rotated
□ Verification testing complete
□ Monitoring enhanced
□ Normal operations restored

POST-INCIDENT:
□ Timeline documented
□ Lessons learned meeting held
□ Improvements identified and assigned
□ Detection rules updated
□ Runbooks updated
□ Final report completed

Effective IR is about preparation and practice. Run tabletop exercises regularly to test your procedures.

Guided Lab: Incident Response Simulation

In this lab, you'll practice responding to a simulated cloud security incident.

Lab Environment:

AWS account with EC2, IAM, CloudTrail access
Pre-configured "compromised" environment
CloudTrail logs with attack activity
AWS CLI configured

Exercise Steps:

Receive simulated GuardDuty alert
Perform initial triage and assessment
Preserve evidence (snapshots, log exports)
Contain affected resources
Investigate using CloudTrail queries
Determine scope and timeline
Identify root cause and persistence
Execute recovery procedures
Document findings and lessons learned

Reflection Questions:

What would you do differently if this were a real incident?
What preparation would have made response faster?
How would you improve detection for this attack type?

Week Outcome Check

By the end of this week, you should be able to:

Explain how cloud IR differs from traditional IR
Collect and preserve evidence from AWS resources
Contain compromised EC2 instances using security groups
Disable and rotate compromised IAM credentials
Investigate incidents using CloudTrail and VPC Flow Logs
Implement automated containment with Lambda
Execute recovery procedures including credential rotation
Conduct post-incident reviews and implement improvements

📚 Building on Prior Knowledge

Cloud incident response builds on core detection and risk skills:

CSY204 (IR Workflows): Use containment and evidence-handling steps in cloud contexts.
CSY201 (OS + Logs): Apply log triage habits to CloudTrail and VPC Flow Logs.
CSY104 (Networking): Network paths explain lateral movement and exfiltration.
CSY104 Week 11 (CVSS): Use severity to prioritize response actions.

🎯 Hands-On Labs (Free & Essential)

Practice cloud incident response with realistic scenarios and forensic analysis.

🚨 TryHackMe: AWS Incident Response

What you'll do: Respond to simulated AWS security incidents—analyze CloudTrail logs, isolate compromised instances, and recover.
Why it matters: Practice under pressure builds muscle memory for real incidents.
Time estimate: 3-4 hours

🔍 AWS Skill Builder: Security Incident Response

What you'll do: Learn AWS IR tools—CloudTrail forensics, snapshot preservation, and automated response with Lambda.
Why it matters: AWS provides powerful IR capabilities—know how to use them.
Time estimate: 2-3 hours

🧪 Cloud Forensics Practice

What you'll do: Analyze EC2 memory dumps and disk snapshots—extract artifacts, find indicators of compromise, build timelines.
Why it matters: Digital forensics skills translate to cloud—memory is still memory.
Time estimate: 3-4 hours

💡 Lab Strategy: Build IR runbooks BEFORE incidents—panic is not a good time to figure out how to snapshot an instance.

Resources

Required: AWS Security Incident Response Guide — Official AWS IR documentation (60 min)

Required: AWS Automated Incident Response — Automation patterns for IR (30 min)

Optional: AWS Incident Response Playbooks — Sample runbooks and procedures (45 min)

Lab

Complete the following lab exercises to practice cloud incident response.

Part 1: Evidence Collection (LO7)

Practice evidence collection: (a) create forensic snapshots of EC2 volumes, (b) export CloudTrail logs for the incident timeframe, (c) capture instance metadata, (d) document chain of custody.

Deliverable: Evidence collection documentation with snapshots, logs, and custody records.

Part 2: Containment (LO7)

Implement containment: (a) create an isolation security group, (b) isolate the EC2 instance, (c) disable the IAM access key, (d) revoke role sessions, (e) verify that containment is effective.

Deliverable: Containment commands and verification that isolated resources cannot communicate.

Part 3: Investigation (LO7)

Investigate the simulated incident: (a) query CloudTrail for suspicious activity, (b) build a timeline of events, (c) identify the scope of compromise, (d) find persistence mechanisms.

Deliverable: Investigation report with timeline, scope, and findings.

Part 4: Automated Response (LO7)

Build automated containment: (a) create Lambda function for EC2 isolation, (b) create EventBridge rule for GuardDuty, (c) test automation with simulated finding, (d) add SNS notification.

Deliverable: Lambda function, EventBridge rule, and test results.
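
As a starting point for this part, the sketch below shows one possible shape for the isolation function. It assumes the GuardDuty finding arrives through EventBridge with the instance ID under `detail.resource.instanceDetails.instanceId`, and that an isolation security group ID is supplied in a hypothetical `ISOLATION_SG_ID` environment variable. `RecordingEC2` is a stand-in for offline testing and is not part of any AWS SDK.

```python
import os

# Hypothetical environment variable; the default is a placeholder, not a real group.
ISOLATION_SG = os.environ.get("ISOLATION_SG_ID", "sg-isolation-placeholder")


def extract_instance_id(event):
    """Pull the EC2 instance ID out of a GuardDuty finding event, if present."""
    resource = event.get("detail", {}).get("resource", {})
    return resource.get("instanceDetails", {}).get("instanceId")


def isolate_instance(ec2, instance_id, isolation_sg=ISOLATION_SG):
    """Replace the instance's security groups with the isolation group only."""
    # Instances with several network interfaces need per-interface changes instead.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg])


def lambda_handler(event, context, ec2=None):
    instance_id = extract_instance_id(event)
    if instance_id is None:
        return {"isolated": False, "reason": "finding has no EC2 instance"}
    if ec2 is None:
        import boto3  # imported lazily so the parsing logic can be tested offline
        ec2 = boto3.client("ec2")
    isolate_instance(ec2, instance_id)
    return {"isolated": True, "instanceId": instance_id}


class RecordingEC2:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def modify_instance_attribute(self, **kwargs):
        self.calls.append(kwargs)


# Trimmed, invented finding event for illustration.
SAMPLE_EVENT = {
    "detail-type": "GuardDuty Finding",
    "detail": {"resource": {"resourceType": "Instance",
                            "instanceDetails": {"instanceId": "i-0123456789abcdef0"}}},
}

if __name__ == "__main__":
    print(lambda_handler(SAMPLE_EVENT, None, ec2=RecordingEC2()))
```

A real deployment would still need evidence preservation before isolation, the SNS notification from step (d), and a decision about which finding types or severities justify automatic action.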

Part 5: Recovery (LO7)

Execute recovery: (a) rotate compromised credentials, (b) redeploy infrastructure from IaC, (c) verify clean state, (d) document lessons learned.

Deliverable: Recovery documentation and post-incident report.

Checkpoint Questions

How does incident response in cloud differ from traditional on-premises IR? What new capabilities does cloud provide?
Describe the process of collecting forensic evidence from an EC2 instance without terminating it.
What are the containment options for a compromised EC2 instance? What are the tradeoffs of each?
How do you revoke active sessions for an IAM role? Why is just deleting the role insufficient?
What CloudTrail queries would you use to investigate a compromised IAM user? What are you looking for?
What credentials must be rotated after a cloud incident? How do you determine the scope of rotation needed?

Week 10 Quiz

Test your understanding of Cloud Incident Response, evidence collection, and containment.

Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.

Weekly Reflection

Cloud incident response requires different techniques but follows the same principles as traditional IR. This week explored responding effectively to cloud security incidents.

Reflect on the following in 200-300 words:

Cloud enables faster response but also faster damage. How should organizations balance automated response (which might have false positives) with human review (which takes longer)?
Evidence in cloud is distributed across many services and potentially many accounts. How would you ensure you have complete visibility for investigations?
The ability to quickly terminate and redeploy from IaC is a major cloud advantage. What might prevent an organization from using this approach?
How has this week changed your understanding of incident response in cloud environments?

A strong reflection demonstrates understanding of both the new capabilities and new challenges that cloud brings to incident response.

Verified Resources & Videos

AWS re:Invent - Incident Response in AWS — Cloud IR best practices (50 min)
Cloud Forensics and IR — Practical cloud forensics (45 min)
Automating Security Response — Lambda-based IR automation (40 min)
AWS Security Automation — Automated response tools
AWS IR Reference Architecture — IR design patterns