CSY302 Week 08 - Build cloud security monitoring with CloudTrail, GuardDuty, and SIEM integration.

Opening Framing

Prevention eventually fails. Attackers find vulnerabilities, credentials get compromised, and misconfigurations slip through. When prevention fails, detection becomes your last line of defense. In cloud environments, the API-driven nature of everything creates unprecedented visibility opportunities: API actions can be logged, resource changes are recorded, and security services can analyze patterns across your entire environment.

Effective cloud security monitoring requires understanding what to log, how to centralize and retain logs, what constitutes suspicious activity, and how to respond when threats are detected. Cloud-native services like CloudTrail, GuardDuty, and Security Hub provide building blocks, but transforming raw data into actionable security intelligence requires deliberate design and continuous tuning.

This week covers cloud logging services, security monitoring with GuardDuty and Security Hub, log analysis and SIEM integration, alerting strategies, and building detection capabilities. You'll learn to build comprehensive visibility into your cloud security posture.

Key insight: The average time to detect a breach is still measured in months. Effective monitoring can reduce this to minutes.

1) Cloud Logging Fundamentals

Understanding what logs are available and how to collect them is the foundation of cloud security monitoring:

AWS Logging Landscape:

LOG SOURCES:
┌─────────────────────────────────────────────────────────────┐
│ CONTROL PLANE LOGS (Who did what to AWS):                   │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CloudTrail                                              │ │
│ │ - API calls to AWS services                             │ │
│ │ - Management events (create, modify, delete)            │ │
│ │ - Data events (S3 object access, Lambda invocations)    │ │
│ │ - Insights events (unusual API activity)                │ │
│ └─────────────────────────────────────────────────────────┘ │
│                                                             │
│ DATA PLANE LOGS (What happened in resources):               │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ VPC Flow Logs - Network traffic metadata                │ │
│ │ S3 Access Logs - Object-level access                    │ │
│ │ ELB Access Logs - Load balancer requests                │ │
│ │ CloudFront Logs - CDN requests                          │ │
│ │ RDS Logs - Database queries and connections             │ │
│ │ Lambda Logs - Function execution output                 │ │
│ └─────────────────────────────────────────────────────────┘ │
│                                                             │
│ APPLICATION LOGS:                                           │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CloudWatch Logs - Application and system logs           │ │
│ │ Custom metrics and logs from your code                  │ │
│ └─────────────────────────────────────────────────────────┘ │
│                                                             │
│ SECURITY SERVICE LOGS:                                      │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GuardDuty Findings - Threat detection                   │ │
│ │ Security Hub Findings - Aggregated security findings    │ │
│ │ WAF Logs - Web attack attempts                          │ │
│ │ Config - Resource configuration changes                 │ │
│ │ Inspector - Vulnerability findings                      │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

CloudTrail Deep Dive:

AWS CloudTrail:

CLOUDTRAIL EVENT TYPES:
┌─────────────────────────────────────────────────────────────┐
│ Management Events (default enabled):                        │
│ - Control plane operations                                  │
│ - CreateBucket, RunInstances, CreateUser                    │
│ - Recorded by default (90-day Event history)               │
│ - Essential for security monitoring                         │
│                                                             │
│ Data Events (must enable):                                  │
│ - Data plane operations                                     │
│ - S3: GetObject, PutObject, DeleteObject                    │
│ - Lambda: Invoke                                            │
│ - DynamoDB: GetItem, PutItem                                │
│ - Higher volume, additional cost                            │
│                                                             │
│ Insights Events (must enable):                              │
│ - Unusual API activity detection                            │
│ - Baseline normal behavior, alert on anomalies              │
│ - Analyzes write management events                          │
└─────────────────────────────────────────────────────────────┘

CLOUDTRAIL EVENT STRUCTURE:
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDAEXAMPLE",
        "arn": "arn:aws:iam::123456789012:user/alice",
        "accountId": "123456789012",
        "userName": "alice"
    },
    "eventTime": "2024-01-15T14:30:00Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.1",
    "userAgent": "aws-cli/2.0",
    "requestParameters": {
        "bucketName": "sensitive-data",
        "key": "customer-records.csv"
    },
    "responseElements": null,
    "requestID": "EXAMPLE123",
    "eventID": "EXAMPLE456",
    "readOnly": true,
    "resources": [{
        "type": "AWS::S3::Object",
        "ARN": "arn:aws:s3:::sensitive-data/customer-records.csv"
    }],
    "eventType": "AwsApiCall"
}

KEY FIELDS FOR SECURITY:
┌─────────────────────────────────────────────────────────────┐
│ userIdentity    - WHO performed the action                  │
│ eventTime       - WHEN it happened                          │
│ eventSource     - WHICH service                             │
│ eventName       - WHAT action                               │
│ sourceIPAddress - WHERE from                                │
│ errorCode       - Did it succeed/fail?                      │
│ errorMessage    - Why did it fail?                          │
└─────────────────────────────────────────────────────────────┘
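As a minimal sketch of how these fields might be pulled out of a record for triage, the Python below flattens one CloudTrail event into a short summary. The function name summarize_event and the output keys (who, when, and so on) are illustrative assumptions, not part of any AWS SDK.

```python
import json


def summarize_event(event: dict) -> dict:
    """Reduce one CloudTrail record to the key security fields."""
    identity = event.get("userIdentity", {})
    return {
        "who": identity.get("arn") or identity.get("type", "unknown"),
        "when": event.get("eventTime"),
        "service": event.get("eventSource"),
        "action": event.get("eventName"),
        "source_ip": event.get("sourceIPAddress"),
        # errorCode is only present on the record when the call failed
        "succeeded": "errorCode" not in event,
        "error": event.get("errorCode"),
    }


# Trimmed copy of the sample event shown earlier
sample = json.loads("""
{
  "userIdentity": {"type": "IAMUser",
                   "arn": "arn:aws:iam::123456789012:user/alice"},
  "eventTime": "2024-01-15T14:30:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "sourceIPAddress": "192.0.2.1"
}
""")

summary = summarize_event(sample)
print(summary["who"], summary["action"], summary["source_ip"])
```

Treating a missing errorCode as success mirrors how the field is described in the table: it answers "did it succeed or fail?" by its presence.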

CLOUDTRAIL BEST PRACTICES:
┌─────────────────────────────────────────────────────────────┐
│ ✓ Enable in ALL regions (attacks happen in unused regions)  │
│ ✓ Enable for all accounts in organization                   │
│ ✓ Centralize logs to security account                       │
│ ✓ Enable log file validation (integrity)                    │
│ ✓ Encrypt logs with KMS                                     │
│ ✓ Set S3 bucket policy to prevent deletion                  │
│ ✓ Enable data events for sensitive buckets                  │
│ ✓ Retain logs for compliance period (often 1+ years)        │
└─────────────────────────────────────────────────────────────┘
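Several of these practices can be checked programmatically. The sketch below assumes a boto3 CloudTrail client and inspects the trailList returned by describe_trails; the function name audit_trails and the FakeCloudTrail stand-in are hypothetical and exist only so the example runs without AWS credentials.

```python
def audit_trails(cloudtrail_client) -> list:
    """Return (trail_name, problem) tuples for trails missing a best practice."""
    problems = []
    trails = cloudtrail_client.describe_trails()["trailList"]
    if not trails:
        problems.append(("<account>", "no trails configured"))
    for trail in trails:
        name = trail.get("Name", "<unnamed>")
        if not trail.get("IsMultiRegionTrail"):
            problems.append((name, "not multi-region"))
        if not trail.get("LogFileValidationEnabled"):
            problems.append((name, "log file validation disabled"))
        if not trail.get("KmsKeyId"):
            problems.append((name, "logs not encrypted with KMS"))
    return problems


class FakeCloudTrail:
    """Stand-in for boto3.client('cloudtrail') so the sketch runs offline."""

    def describe_trails(self):
        return {"trailList": [
            {"Name": "org-trail", "IsMultiRegionTrail": True,
             "LogFileValidationEnabled": True, "KmsKeyId": "alias/trail-key"},
            {"Name": "legacy-trail", "IsMultiRegionTrail": False,
             "LogFileValidationEnabled": False},
        ]}


for name, problem in audit_trails(FakeCloudTrail()):
    print(f"{name}: {problem}")
```

In a real account the call would be audit_trails(boto3.client('cloudtrail')). Bucket policy and retention checks are left out because they live on the S3 side.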

VPC Flow Logs:

FLOW LOG RECORD:
┌─────────────────────────────────────────────────────────────┐
│ version account-id interface-id srcaddr dstaddr srcport    │
│ dstport protocol packets bytes start end action log-status │
│                                                             │
│ Example:                                                    │
│ 2 123456789012 eni-abc123 10.0.1.5 52.94.76.10             │
│ 34892 443 6 10 840 1620000000 1620000060 ACCEPT OK          │
│                                                             │
│ Interpretation:                                             │
│ - Source: 10.0.1.5:34892 (internal instance)                │
│ - Dest: 52.94.76.10:443 (external HTTPS)                    │
│ - Protocol 6 = TCP                                          │
│ - 10 packets, 840 bytes                                     │
│ - Action: ACCEPT (traffic allowed)                          │
└─────────────────────────────────────────────────────────────┘
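The interpretation above can be reproduced in code. This is a rough sketch of a parser for the default (version 2) record format only; the names parse_flow_record, FIELDS, and PROTOCOLS are illustrative, and a custom format would need a different field list.

```python
# Field order of the default flow log format
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers


def parse_flow_record(line: str) -> dict:
    """Parse one space-separated flow log line into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        # Records without traffic data carry "-" in these positions
        record[key] = int(record[key]) if record[key] != "-" else None
    record["protocol_name"] = PROTOCOLS.get(record["protocol"], "OTHER")
    return record


example = ("2 123456789012 eni-abc123 10.0.1.5 52.94.76.10 "
           "34892 443 6 10 840 1620000000 1620000060 ACCEPT OK")
rec = parse_flow_record(example)
print(rec["srcaddr"], rec["dstport"], rec["protocol_name"], rec["action"])
```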

CUSTOM FLOW LOG FORMAT:
┌─────────────────────────────────────────────────────────────┐
│ Include additional fields for security analysis:            │
│                                                             │
│ ${version} ${account-id} ${interface-id} ${srcaddr}         │
│ ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets}     │
│ ${bytes} ${start} ${end} ${action} ${log-status}            │
│ ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags}          │
│ ${pkt-srcaddr} ${pkt-dstaddr} ${traffic-path}               │
│                                                             │
│ traffic-path shows routing through NAT, IGW, etc.           │
└─────────────────────────────────────────────────────────────┘

SECURITY USE CASES:
┌─────────────────────────────────────────────────────────────┐
│ - Detect unauthorized network access attempts               │
│ - Identify data exfiltration (large outbound transfers)     │
│ - Find scanning activity (many REJECT entries)              │
│ - Investigate lateral movement                              │
│ - Verify network segmentation effectiveness                 │
│ - Detect C2 communication patterns                          │
└─────────────────────────────────────────────────────────────┘
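To illustrate the scanning use case, the sketch below counts how many distinct destination address and port pairs each source was rejected on. It assumes records have already been parsed into dicts with srcaddr, dstaddr, dstport, and action keys; the function name find_scanners and the threshold of 10 targets are arbitrary choices that would need tuning against real traffic.

```python
from collections import defaultdict


def find_scanners(records, min_targets=10):
    """Return {source_ip: rejected_target_count} for likely scanners."""
    rejected = defaultdict(set)
    for record in records:
        if record["action"] == "REJECT":
            # A set counts each (address, port) target once per source
            rejected[record["srcaddr"]].add((record["dstaddr"], record["dstport"]))
    return {
        src: len(targets)
        for src, targets in rejected.items()
        if len(targets) >= min_targets
    }


# Synthetic records: one source probing 12 ports, one with 3 rejects
records = [
    {"srcaddr": "203.0.113.9", "dstaddr": "10.0.1.5", "dstport": port, "action": "REJECT"}
    for port in range(1, 13)
] + [
    {"srcaddr": "198.51.100.7", "dstaddr": "10.0.1.5", "dstport": port, "action": "REJECT"}
    for port in (22, 80, 443)
]

print(find_scanners(records))
```

Counting distinct targets rather than raw REJECT lines keeps a single misconfigured client that retries one blocked port from looking like a scan.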

Key insight: Enable ALL log sources before you need them. During an incident, you can't go back and enable logging.

2) AWS Security Services

AWS provides native security monitoring services that analyze logs and detect threats:

Amazon GuardDuty:

WHAT GUARDDUTY DOES:
┌─────────────────────────────────────────────────────────────┐
│ Intelligent threat detection using:                         │
│ - CloudTrail management events                              │
│ - VPC Flow Logs                                             │
│ - DNS logs                                                  │
│ - EKS audit logs                                            │
│ - RDS login activity                                        │
│ - Lambda network activity                                   │
│ - S3 data events                                            │
│ - Runtime monitoring (EC2, ECS, EKS)                        │
│                                                             │
│ Detection Methods:                                          │
│ - Threat intelligence (known bad IPs, domains)              │
│ - Anomaly detection (ML-based baseline)                     │
│ - Pattern matching (attack signatures)                      │
└─────────────────────────────────────────────────────────────┘

GUARDDUTY FINDING TYPES:
┌─────────────────────────────────────────────────────────────┐
│ EC2 Findings:                                               │
│ - Backdoor:EC2/DenialOfService.Tcp                          │
│ - CryptoCurrency:EC2/BitcoinTool.B                          │
│ - Trojan:EC2/BlackholeTraffic                               │
│ - UnauthorizedAccess:EC2/SSHBruteForce                      │
│ - Recon:EC2/PortProbeUnprotectedPort                        │
│                                                             │
│ IAM Findings:                                               │
│ - CredentialAccess:IAMUser/AnomalousBehavior                │
│ - PenTest:IAMUser/KaliLinux                                 │
│ - UnauthorizedAccess:IAMUser/ConsoleLogin                   │
│ - Persistence:IAMUser/AnomalousBehavior                     │
│                                                             │
│ S3 Findings:                                                │
│ - Exfiltration:S3/MaliciousIPCaller                         │
│ - Discovery:S3/MaliciousIPCaller                            │
│ - UnauthorizedAccess:S3/MaliciousIPCaller                   │
│                                                             │
│ Kubernetes Findings:                                        │
│ - PrivilegeEscalation:Kubernetes/PrivilegedContainer        │
│ - Persistence:Kubernetes/ContainerWithSensitiveMount        │
└─────────────────────────────────────────────────────────────┘

GUARDDUTY FINDING EXAMPLE:
{
    "schemaVersion": "2.0",
    "id": "123456789012-1234-abcd-1234",
    "type": "UnauthorizedAccess:IAMUser/ConsoleLogin",
    "severity": 5,
    "title": "Console login from unusual location",
    "description": "IAM user alice logged in from an unusual geographic location",
    "resource": {
        "resourceType": "AccessKey",
        "accessKeyDetails": {
            "userName": "alice",
            "userType": "IAMUser"
        }
    },
    "service": {
        "action": {
            "actionType": "AWS_API_CALL",
            "awsApiCallAction": {
                "api": "ConsoleLogin",
                "remoteIpDetails": {
                    "ipAddressV4": "192.0.2.1",
                    "country": {"countryName": "Russia"}
                }
            }
        }
    }
}

AWS Security Hub:

SECURITY HUB CAPABILITIES:
┌─────────────────────────────────────────────────────────────┐
│ 1. Aggregated Security Findings:                            │
│    - GuardDuty findings                                     │
│    - Inspector vulnerability findings                       │
│    - IAM Access Analyzer findings                           │
│    - Firewall Manager findings                              │
│    - Macie findings                                         │
│    - Third-party tool findings                              │
│                                                             │
│ 2. Compliance Standards:                                    │
│    - AWS Foundational Security Best Practices               │
│    - CIS AWS Foundations Benchmark                          │
│    - PCI DSS                                                │
│    - NIST 800-53                                            │
│                                                             │
│ 3. Automated Response:                                      │
│    - Custom actions                                         │
│    - EventBridge integration                                │
│    - Automated remediation                                  │
└─────────────────────────────────────────────────────────────┘

SECURITY HUB ARCHITECTURE:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │GuardDuty │ │Inspector │ │  Macie   │ │3rd Party │        │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘        │
│       │            │            │            │              │
│       └────────────┼────────────┼────────────┘              │
│                    ▼                                        │
│           ┌─────────────────┐                               │
│           │  Security Hub   │                               │
│           │                 │                               │
│           │ - Normalize     │                               │
│           │ - Aggregate     │                               │
│           │ - Prioritize    │                               │
│           │ - Compliance    │                               │
│           └────────┬────────┘                               │
│                    │                                        │
│       ┌────────────┼────────────┐                           │
│       ▼            ▼            ▼                           │
│  ┌─────────┐ ┌───────────┐ ┌─────────┐                      │
│  │Dashboard│ │EventBridge│ │  SIEM   │                      │
│  └─────────┘ └───────────┘ └─────────┘                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

AUTOMATED REMEDIATION EXAMPLE:
# EventBridge rule triggered by Security Hub finding
# Lambda function to auto-remediate

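import boto3

# Finding type this handler remediates. Adjacent string literals are joined
# by Python, so no stray whitespace ends up inside the value.
PUBLIC_BUCKET_TYPE = ('Software and Configuration Checks/'
                      'AWS Security Best Practices/'
                      'S3 Bucket Public Access')

def lambda_handler(event, context):
    finding = event['detail']['findings'][0]
    
    # ASFF stores finding types as a list under 'Types'
    if PUBLIC_BUCKET_TYPE in finding.get('Types', []):
        
        # Resource Id is the bucket ARN (arn:aws:s3:::bucket-name)
        bucket_name = finding['Resources'][0]['Id'].split(':')[-1]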
        
        # Block public access
        s3 = boto3.client('s3')
        s3.put_public_access_block(
            Bucket=bucket_name,
            PublicAccessBlockConfiguration={
                'BlockPublicAcls': True,
                'IgnorePublicAcls': True,
                'BlockPublicPolicy': True,
                'RestrictPublicBuckets': True
            }
        )
        
        # Update finding status
        securityhub = boto3.client('securityhub')
        securityhub.batch_update_findings(
            FindingIdentifiers=[{
                'Id': finding['Id'],
                'ProductArn': finding['ProductArn']
            }],
            Workflow={'Status': 'RESOLVED'}
        )

Key insight: Security Hub normalizes findings into a common format (ASFF), enabling consistent processing regardless of source.

3) Log Analysis and SIEM Integration

Raw logs must be analyzed to identify threats and support investigations:

CloudWatch Logs Insights:

QUERY LANGUAGE:
┌─────────────────────────────────────────────────────────────┐
│ Basic Structure:                                            │
│ fields @timestamp, @message                                 │
│ | filter @message like /error/                              │
│ | sort @timestamp desc                                      │
│ | limit 100                                                 │
└─────────────────────────────────────────────────────────────┘

SECURITY QUERIES:

# Failed console logins
fields @timestamp, userIdentity.userName, sourceIPAddress, 
       errorCode, errorMessage
| filter eventSource = 'signin.amazonaws.com'
| filter eventName = 'ConsoleLogin' and errorMessage = 'Failed authentication'
| stats count(*) as failedLogins by sourceIPAddress, 
        userIdentity.userName
| sort failedLogins desc

# Root account usage
fields @timestamp, eventName, sourceIPAddress, userAgent
| filter userIdentity.type = 'Root'
| filter eventName not like /ConsoleLogin/

# Security group changes
fields @timestamp, userIdentity.userName, eventName, 
       requestParameters.groupId
| filter eventSource = 'ec2.amazonaws.com'
| filter eventName in ['AuthorizeSecurityGroupIngress', 
                        'AuthorizeSecurityGroupEgress',
                        'RevokeSecurityGroupIngress',
                        'RevokeSecurityGroupEgress']

# IAM policy changes
fields @timestamp, userIdentity.userName, eventName, 
       requestParameters.policyName
| filter eventSource = 'iam.amazonaws.com'
| filter eventName in ['CreatePolicy', 'DeletePolicy',
                        'AttachUserPolicy', 'DetachUserPolicy',
                        'PutUserPolicy', 'DeleteUserPolicy']

# S3 bucket access from unusual IPs
fields @timestamp, userIdentity.userName, sourceIPAddress,
       requestParameters.bucketName
| filter eventSource = 's3.amazonaws.com'
| filter sourceIPAddress not like /^10\./
| filter sourceIPAddress not like /^172\.(1[6-9]|2[0-9]|3[01])\./
| filter sourceIPAddress not like /^192\.168\./
| stats count(*) as requestCount by sourceIPAddress, requestParameters.bucketName
| sort requestCount desc
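Queries like these can also be run on a schedule rather than typed into the console. The sketch below assumes a boto3 CloudWatch Logs client and uses its start_query and get_query_results calls; the helper name run_insights_query, the polling defaults, and the FakeLogs stand-in (included so the example runs offline) are assumptions for illustration.

```python
import time


def run_insights_query(logs_client, log_group, query, start_ts, end_ts,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it completes."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_ts,   # epoch seconds
        endTime=end_ts,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete in time")


class FakeLogs:
    """Stand-in for boto3.client('logs'); reports Running once, then Complete."""

    def __init__(self):
        self.polls = 0

    def start_query(self, **kwargs):
        return {"queryId": "example-query-id"}

    def get_query_results(self, queryId):
        self.polls += 1
        if self.polls < 2:
            return {"status": "Running", "results": []}
        return {"status": "Complete", "results": [[
            {"field": "sourceIPAddress", "value": "192.0.2.1"},
            {"field": "failedLogins", "value": "7"},
        ]]}


rows = run_insights_query(FakeLogs(), "CloudTrail/logs",
                          "fields @timestamp | limit 1", 0, 60, poll_seconds=0)
print(rows)
```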

SIEM Integration:

SIEM Integration Patterns:

LOG FORWARDING OPTIONS:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  CloudTrail / VPC Flow Logs / CloudWatch Logs               │
│                    │                                        │
│      ┌─────────────┼─────────────┐                          │
│      ▼             ▼             ▼                          │
│  ┌───────┐   ┌──────────┐   ┌────────────┐                  │
│  │  S3   │   │ Kinesis  │   │CloudWatch  │                  │
│  │       │   │ Firehose │   │Subscription│                  │
│  └───┬───┘   └────┬─────┘   └─────┬──────┘                  │
│      │            │               │                         │
│      ▼            ▼               ▼                         │
│  ┌─────────────────────────────────────────────┐            │
│  │              SIEM Platform                  │            │
│  │  (Splunk, Elastic, Sumo Logic, etc.)        │            │
│  └─────────────────────────────────────────────┘            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

KINESIS FIREHOSE TO SPLUNK:
┌─────────────────────────────────────────────────────────────┐
│ 1. Create Kinesis Firehose delivery stream                  │
│ 2. Configure Splunk HEC (HTTP Event Collector) endpoint     │
│ 3. Subscribe CloudWatch Log Groups to Firehose              │
│ 4. Configure Lambda transformation if needed                │
│                                                             │
│ Benefits:                                                   │
│ - Near real-time delivery                                   │
│ - Buffering for efficiency                                  │
│ - Automatic retry                                           │
│ - Data transformation                                       │
└─────────────────────────────────────────────────────────────┘
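Step 4 is often the least obvious part. CloudWatch Logs delivers subscription data to Firehose as gzip-compressed JSON, so a transformation Lambda typically decompresses each record and re-emits the individual log events. The sketch below shows that pattern under stated assumptions: the output shape (one JSON object per line with logGroup, timestamp, and message) is illustrative, and a real Splunk HEC destination may expect a different event envelope.

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Firehose transformation: unpack CloudWatch Logs subscription records."""
    output = []
    for record in event["records"]:
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        # Control messages only test the subscription; drop them
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # Emit one newline-delimited JSON object per log event
        lines = "".join(
            json.dumps({
                "logGroup": payload["logGroup"],
                "timestamp": log_event["timestamp"],
                "message": log_event["message"],
            }) + "\n"
            for log_event in payload["logEvents"]
        )
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Every input recordId must appear exactly once in the response, which is why dropped records are still reported rather than silently skipped.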

AMAZON SECURITY LAKE:
┌─────────────────────────────────────────────────────────────┐
│ Purpose-built security data lake:                           │
│                                                             │
│ Features:                                                   │
│ - Automatic log collection from AWS sources                 │
│ - OCSF (Open Cybersecurity Schema Framework) normalization  │
│ - S3-based storage (Parquet format)                         │
│ - Cross-account and cross-region                            │
│ - Partner integrations                                      │
│                                                             │
│ Sources:                                                    │
│ - CloudTrail                                                │
│ - VPC Flow Logs                                             │
│ - Route 53 DNS logs                                         │
│ - Security Hub findings                                     │
│ - S3 access logs                                            │
│ - Lambda execution logs                                     │
│ - EKS audit logs                                            │
│                                                             │
│ Subscribers:                                                │
│ - Athena for querying                                       │
│ - Third-party SIEM/analytics                                │
│ - Custom applications                                       │
└─────────────────────────────────────────────────────────────┘

Threat Detection Queries:

Common Threat Detection Patterns:

CREDENTIAL COMPROMISE INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Console login from new location                          │
│ 2. API calls from new source IP                             │
│ 3. Access key used after long inactivity                    │
│ 4. Multiple failed authentication attempts                  │
│ 5. Successful login after multiple failures                 │
│ 6. API calls at unusual hours                               │
│ 7. Impossible travel (login from distant locations)         │
└─────────────────────────────────────────────────────────────┘
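Indicator 7 lends itself to a short worked example. The sketch below flags two logins as impossible travel when the implied speed exceeds a threshold. It assumes each login has already been enriched with a timestamp and latitude/longitude (IP geolocation is out of scope here); the names haversine_km and is_impossible_travel and the 900 km/h default are illustrative choices.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_impossible_travel(login_a, login_b, max_speed_kmh=900.0):
    """True when the two logins imply travel faster than max_speed_kmh."""
    distance = haversine_km(login_a["lat"], login_a["lon"],
                            login_b["lat"], login_b["lon"])
    seconds = abs((login_b["time"] - login_a["time"]).total_seconds())
    # Floor the interval at one minute to avoid dividing by zero
    hours = max(seconds / 3600.0, 1.0 / 60.0)
    return distance / hours > max_speed_kmh


new_york = {"time": datetime(2024, 1, 15, 14, 0), "lat": 40.71, "lon": -74.01}
moscow = {"time": datetime(2024, 1, 15, 15, 0), "lat": 55.76, "lon": 37.62}
print(is_impossible_travel(new_york, moscow))
```

VPN and proxy egress points produce false positives with this approach, so in practice the result is usually one signal to correlate rather than an alert on its own.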

DATA EXFILTRATION INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Large S3 downloads                                       │
│ 2. S3 access from unusual IP ranges                         │
│ 3. New S3 bucket replication configured                     │
│ 4. Snapshots shared to external accounts                    │
│ 5. Large outbound data transfer (VPC Flow)                  │
│ 6. Database export to new location                          │
└─────────────────────────────────────────────────────────────┘

PERSISTENCE INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. New IAM user or access key created                       │
│ 2. New IAM role with trust policy changes                   │
│ 3. Lambda function created/modified                         │
│ 4. EventBridge rule created                                 │
│ 5. New EC2 instance in unusual region                       │
│ 6. SSM document created                                     │
└─────────────────────────────────────────────────────────────┘

PRIVILEGE ESCALATION INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Policy attachment to user/role                           │
│ 2. AssumeRole to more privileged role                       │
│ 3. CreatePolicyVersion                                      │
│ 4. PassRole to service                                      │
│ 5. Instance profile changes                                 │
│ 6. STS GetSessionToken/GetFederationToken                   │
└─────────────────────────────────────────────────────────────┘
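A simple way to start on the persistence and privilege escalation indicators is a watchlist of CloudTrail event names. The sketch below is a starting point under stated assumptions, not a complete rule set: the WATCHLIST contents and the classify_event helper are illustrative, and matching on event name alone says nothing about whether the change was authorized.

```python
# Illustrative watchlist keyed by indicator category
WATCHLIST = {
    "persistence": {
        "CreateUser", "CreateAccessKey", "UpdateAssumeRolePolicy",
        # Lambda event names carry an API version suffix in CloudTrail
        "CreateFunction20150331", "UpdateFunctionCode20150331v2",
        "PutRule", "CreateDocument",
    },
    "privilege_escalation": {
        "AttachUserPolicy", "AttachRolePolicy", "AttachGroupPolicy",
        "PutUserPolicy", "PutRolePolicy", "CreatePolicyVersion",
        "AddRoleToInstanceProfile", "GetFederationToken",
    },
}


def classify_event(event: dict) -> list:
    """Return the sorted indicator categories matching this event's name."""
    name = event.get("eventName")
    return sorted(category for category, names in WATCHLIST.items()
                  if name in names)


for name in ("CreateAccessKey", "CreatePolicyVersion", "GetObject"):
    print(name, classify_event({"eventName": name}))
```

A match is best treated as a prompt to check who made the call and from where, using the userIdentity and sourceIPAddress fields.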

EXAMPLE DETECTION RULE (Splunk):
index=cloudtrail eventName=ConsoleLogin earliest=-24h
| stats count, latest(_time) as last_seen 
        by userIdentity.userName, sourceIPAddress
| search NOT 
    [search index=cloudtrail eventName=ConsoleLogin 
            earliest=-30d latest=-24h
     | stats count by userIdentity.userName, sourceIPAddress
     | fields userIdentity.userName, sourceIPAddress]
| convert ctime(last_seen)
| table last_seen, userIdentity.userName, sourceIPAddress

Key insight: Detection is about baselines. Know what's normal so you can identify what's abnormal.

4) Alerting and Response

Effective alerting ensures security teams are notified of real threats without alert fatigue:

Alerting Strategy:

ALERT SEVERITY LEVELS:
┌─────────────────────────────────────────────────────────────┐
│ CRITICAL (Immediate Response):                              │
│ - Root account activity                                     │
│ - Active data exfiltration                                  │
│ - Confirmed compromise indicators                           │
│ - GuardDuty High/Critical findings (severity 7+)            │
│ Response: Page on-call, immediate investigation             │
│                                                             │
│ HIGH (Same-Day Response):                                   │
│ - Unauthorized access attempts                              │
│ - Security group exposing sensitive ports                   │
│ - IAM policy changes                                        │
│ - Medium GuardDuty findings (severity 4-6.9)                │
│ Response: Investigate within hours                          │
│                                                             │
│ MEDIUM (Next-Day Response):                                 │
│ - Compliance violations                                     │
│ - Configuration drift                                       │
│ - Low GuardDuty findings (severity 1-3.9)                   │
│ Response: Triage and prioritize                             │
│                                                             │
│ LOW (Weekly Review):                                        │
│ - Informational findings                                    │
│ - Best practice recommendations                             │
│ Response: Include in regular review                         │
└─────────────────────────────────────────────────────────────┘
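The GuardDuty severity ranges in this table map directly onto a routing function. This is a minimal sketch using the numeric thresholds above; the names response_tier and RESPONSES are hypothetical, and the response strings simply restate the table.

```python
RESPONSES = {
    "CRITICAL": "Page on-call, immediate investigation",
    "HIGH": "Investigate within hours",
    "MEDIUM": "Triage and prioritize",
}


def response_tier(severity: float) -> str:
    """Map a GuardDuty numeric severity to an alert tier from the table."""
    if not 1.0 <= severity <= 10.0:
        raise ValueError(f"severity out of range: {severity}")
    if severity >= 7.0:
        return "CRITICAL"
    if severity >= 4.0:
        return "HIGH"
    return "MEDIUM"


for severity in (8.0, 5.0, 2.0):
    tier = response_tier(severity)
    print(severity, tier, "->", RESPONSES[tier])
```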

AVOIDING ALERT FATIGUE:
┌─────────────────────────────────────────────────────────────┐
│ 1. Tune detection rules to reduce false positives           │
│ 2. Correlate multiple signals before alerting               │
│ 3. Suppress known benign activity                           │
│ 4. Use tiered alerting (not everything pages)               │
│ 5. Automate low-value alerts                                │
│ 6. Regular review and tuning                                │
└─────────────────────────────────────────────────────────────┘
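Items 3 and 4 can be sketched as a small suppression layer in front of the notifier. The class below drops alerts whose key is on a known-benign list or that repeat within a time window. The class name AlertSuppressor, the one-hour default, and the idea of keying alerts by a (rule, principal) tuple are assumptions for illustration.

```python
from datetime import datetime, timedelta


class AlertSuppressor:
    """Suppress known-benign alerts and repeats inside a time window."""

    def __init__(self, window=timedelta(hours=1), benign_keys=()):
        self.window = window
        self.benign_keys = set(benign_keys)
        self._last_sent = {}

    def should_alert(self, key, now: datetime) -> bool:
        if key in self.benign_keys:
            return False          # known benign activity
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False          # duplicate within the window
        self._last_sent[key] = now
        return True


suppressor = AlertSuppressor(benign_keys={("sg-change", "deploy-pipeline")})
t0 = datetime(2024, 1, 15, 14, 0)
print(suppressor.should_alert(("root-login", "root"), t0))
print(suppressor.should_alert(("root-login", "root"), t0 + timedelta(minutes=5)))
print(suppressor.should_alert(("sg-change", "deploy-pipeline"), t0))
```

The in-memory dictionary would not survive across Lambda invocations; a persistent store would be needed for that, which this sketch leaves out.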

CloudWatch Alarms:

CloudWatch Alarms for Security:

METRIC FILTER + ALARM PATTERN:

# Step 1: Create metric filter on CloudTrail log group
aws logs put-metric-filter \
  --log-group-name CloudTrail/logs \
  --filter-name RootAccountUsage \
  --filter-pattern '{ $.userIdentity.type = "Root" && $.eventType != "AwsServiceEvent" }' \
  --metric-transformations \
      metricName=RootAccountUsageCount,metricNamespace=SecurityMetrics,metricValue=1

# Step 2: Create alarm
aws cloudwatch put-metric-alarm \
  --alarm-name RootAccountUsageAlarm \
  --alarm-description "Alert on root account usage" \
  --metric-name RootAccountUsageCount \
  --namespace SecurityMetrics \
  --statistic Sum \
  --period 300 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts

ESSENTIAL SECURITY ALARMS:
┌─────────────────────────────────────────────────────────────┐
│ Root Account Activity:                                      │
│ Pattern: { $.userIdentity.type = "Root" }                   │
│                                                             │
│ IAM Policy Changes:                                         │
│ Pattern: { ($.eventName = CreatePolicy) ||                  │
│           ($.eventName = DeletePolicy) ||                   │
│           ($.eventName = AttachUserPolicy) ||               │
│           ($.eventName = DetachUserPolicy) }                │
│                                                             │
│ Console Login Failures:                                     │
│ Pattern: { ($.eventName = ConsoleLogin) &&                  │
│           ($.errorMessage = "Failed authentication") }      │
│                                                             │
│ Security Group Changes:                                     │
│ Pattern: { ($.eventName = AuthorizeSecurityGroupIngress) || │
│           ($.eventName = AuthorizeSecurityGroupEgress) }    │
│                                                             │
│ CloudTrail Changes:                                         │
│ Pattern: { ($.eventName = StopLogging) ||                   │
│           ($.eventName = DeleteTrail) ||                    │
│           ($.eventName = UpdateTrail) }                     │
│                                                             │
│ Network Gateway Changes:                                    │
│ Pattern: { ($.eventName = CreateInternetGateway) ||         │
│           ($.eventName = AttachInternetGateway) }           │
└─────────────────────────────────────────────────────────────┘
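Most of these patterns share one shape: an OR over event names. The sketch below generates that pattern string and the keyword arguments a boto3 Logs client's put_metric_filter call would take. The helper names and the convention of naming the metric after the filter are assumptions, and the pattern builder only covers the simple event-name case, not the root or login-failure patterns.

```python
def build_event_name_pattern(event_names):
    """Build a metric filter pattern matching any of the given event names."""
    if not event_names:
        raise ValueError("at least one event name is required")
    clauses = [f"($.eventName = {name})" for name in event_names]
    return "{ " + " || ".join(clauses) + " }"


def metric_filter_kwargs(log_group, filter_name, event_names,
                         namespace="SecurityMetrics"):
    """Arguments for logs_client.put_metric_filter(**kwargs)."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": build_event_name_pattern(event_names),
        "metricTransformations": [{
            "metricName": f"{filter_name}Count",
            "metricNamespace": namespace,
            "metricValue": "1",
        }],
    }


kwargs = metric_filter_kwargs(
    "CloudTrail/logs", "CloudTrailChanges",
    ["StopLogging", "DeleteTrail", "UpdateTrail"],
)
print(kwargs["filterPattern"])
```

Generating the filters from one list keeps the alarm set consistent across accounts, since each new alarm is a data change rather than another hand-written pattern.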

EventBridge for Security Automation:

EventBridge Security Automation:

GUARDDUTY FINDING TO SNS:
{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {
        "severity": [
            {"numeric": [">=", 7]}
        ]
    }
}

GUARDDUTY FINDING TO LAMBDA (Auto-Response):
# Rule targets Lambda function
# Lambda isolates compromised instance

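import boto3

def lambda_handler(event, context):
    finding = event['detail']
    
    if finding['type'].startswith('UnauthorizedAccess:EC2'):
        instance_id = finding['resource']['instanceDetails']['instanceId']
        
        ec2 = boto3.client('ec2')
        
        # Create isolation security group (new groups have no ingress rules)
        isolation_sg = ec2.create_security_group(
            GroupName=f'isolation-{instance_id}',
            Description='Isolation security group',
            VpcId=finding['resource']['instanceDetails']['networkInterfaces'][0]['vpcId']
        )
        
        # New security groups start with an allow-all egress rule;
        # revoke it so the group permits no traffic in either direction
        ec2.revoke_security_group_egress(
            GroupId=isolation_sg['GroupId'],
            IpPermissions=[{'IpProtocol': '-1',
                            'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]
        )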
        
        # Replace instance security groups with isolation SG
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            Groups=[isolation_sg['GroupId']]
        )
        
        # Create snapshot for forensics
        volumes = ec2.describe_volumes(
            Filters=[{'Name': 'attachment.instance-id', 
                      'Values': [instance_id]}]
        )
        for vol in volumes['Volumes']:
            ec2.create_snapshot(
                VolumeId=vol['VolumeId'],
                Description=f'Forensic snapshot of {instance_id} - GuardDuty finding {finding["id"]}'
            )
        
        # Notify security team
        sns = boto3.client('sns')
        sns.publish(
            TopicArn='arn:aws:sns:...:security-incidents',
            Message=f'Instance {instance_id} isolated due to GuardDuty finding',
            Subject='CRITICAL: EC2 Instance Isolated'
        )

SECURITY HUB TO SLACK:
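# Lambda function posting to Slack webhook
# (requests is not included in the Lambda runtime; package it with the
# function or provide it through a layer)
import os

import requests

# Webhook URL read from an environment variable (the variable name is an assumption)
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']

def lambda_handler(event, context):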
    finding = event['detail']['findings'][0]
    
    severity_colors = {
        'CRITICAL': '#FF0000',
        'HIGH': '#FF6600', 
        'MEDIUM': '#FFCC00',
        'LOW': '#00FF00'
    }
    
    message = {
        'attachments': [{
            'color': severity_colors.get(finding['Severity']['Label'], '#808080'),
            'title': finding['Title'],
            'text': finding['Description'],
            'fields': [
                {'title': 'Severity', 'value': finding['Severity']['Label'], 'short': True},
                {'title': 'Account', 'value': finding['AwsAccountId'], 'short': True},
                {'title': 'Resource', 'value': finding['Resources'][0]['Id']}
            ]
        }]
    }
    
    # Set a timeout so a slow webhook cannot hold the function open
    requests.post(SLACK_WEBHOOK_URL, json=message, timeout=10)

Key insight: Automate response to common scenarios. Human analysts should focus on novel threats, not routine responses.
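
One way to express that split in code is a small routing step that runs before any response action. This is a sketch only: the playbook names, the severity threshold, and the do-not-isolate list are hypothetical choices made for illustration, not a prescribed design.

```python
# Sketch: decide whether a finding gets an automated playbook or human review.
# Playbook names, finding-type prefixes, and the threshold are hypothetical.

AUTOMATED_PLAYBOOKS = {
    "UnauthorizedAccess:EC2": "isolate_instance",
    "CryptoCurrency:EC2": "isolate_instance",
}


def route_finding(finding, protected_instances=frozenset(), min_severity=7):
    """Return (action, reason) for a GuardDuty finding detail dict."""
    finding_type = finding.get("type", "")
    severity = finding.get("severity", 0)
    instance_id = (
        finding.get("resource", {}).get("instanceDetails", {}).get("instanceId")
    )

    # Look for a playbook whose prefix matches the finding type.
    playbook = next(
        (name for prefix, name in AUTOMATED_PLAYBOOKS.items()
         if finding_type.startswith(prefix)),
        None,
    )
    if playbook is None:
        return ("human_review", "no playbook for this finding type")
    if severity < min_severity:
        return ("human_review", "severity below automation threshold")
    if instance_id in protected_instances:
        return ("human_review", "instance is on the do-not-isolate list")
    return (playbook, "known scenario, automated response")


example = {
    "type": "UnauthorizedAccess:EC2/SSHBruteForce",
    "severity": 8,
    "resource": {"instanceDetails": {"instanceId": "i-0abc"}},
}
print(route_finding(example))
print(route_finding(example, protected_instances={"i-0abc"}))
```

Routing anything unrecognized to a human keeps automation limited to scenarios that have already been reviewed, which reduces the chance of an automated action causing an outage.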

5) Building a Security Operations Capability

Effective security monitoring requires people, process, and technology working together:

Security Operations Architecture:

CENTRALIZED SECURITY MONITORING:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              SECURITY ACCOUNT                       │    │
│  │                                                     │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐        │    │
│  │  │CloudTrail │  │ GuardDuty │  │Security   │        │    │
│  │  │  (Org)    │  │  (Org)    │  │  Hub      │        │    │
│  │  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘        │    │
│  │        │              │              │              │    │
│  │        └──────────────┼──────────────┘              │    │
│  │                       ▼                             │    │
│  │              ┌─────────────────┐                    │    │
│  │              │  Security Lake  │                    │    │
│  │              │   or S3 Bucket  │                    │    │
│  │              └────────┬────────┘                    │    │
│  │                       │                             │    │
│  │              ┌────────┴────────┐                    │    │
│  │              │   SIEM/Athena   │                    │    │
│  │              └────────┬────────┘                    │    │
│  │                       │                             │    │
│  │              ┌────────┴────────┐                    │    │
│  │              │  SOC Dashboard  │                    │    │
│  │              └─────────────────┘                    │    │
│  └─────────────────────────────────────────────────────┘    │
│                          ▲                                  │
│           ┌──────────────┼──────────────┐                   │
│           │              │              │                   │
│      ┌────┴────┐   ┌────┴────┐   ┌────┴────┐                │
│      │  Prod   │   │   Dev   │   │ Staging │                │
│      │ Account │   │ Account │   │ Account │                │
│      └─────────┘   └─────────┘   └─────────┘                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

MULTI-ACCOUNT LOG COLLECTION:
┌─────────────────────────────────────────────────────────────┐
│ CloudTrail Organization Trail:                              │
│ - Single trail for all accounts                             │
│ - Logs to central S3 bucket in security account             │
│ - KMS encryption with organization key                      │
│                                                             │
│ GuardDuty Organization:                                     │
│ - Delegated administrator in security account               │
│ - Auto-enable for new accounts                              │
│ - Central findings aggregation                              │
│                                                             │
│ Security Hub Organization:                                  │
│ - Delegated administrator                                   │
│ - Aggregation region                                        │
│ - Cross-region finding aggregation                          │
│                                                             │
│ Config Aggregator:                                          │
│ - Central view of resource configuration                    │
│ - Compliance status across accounts                         │
└─────────────────────────────────────────────────────────────┘
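
The organization trail settings listed above map onto the parameters of the CloudTrail create_trail call. A minimal sketch follows, assuming a trail name, bucket name, and KMS key alias that are placeholders only. It builds the arguments without calling AWS.

```python
# Sketch: parameters for an organization-wide CloudTrail trail.
# The trail name, bucket name, and KMS alias are placeholders.


def build_org_trail_args(trail_name, bucket_name, kms_key_id):
    """Return keyword arguments for cloudtrail.create_trail()."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,         # central bucket in the security account
        "IsOrganizationTrail": True,         # one trail covering every member account
        "IsMultiRegionTrail": True,          # capture activity in all regions
        "IncludeGlobalServiceEvents": True,  # include IAM, STS, and similar services
        "EnableLogFileValidation": True,     # digest files for integrity checks
        "KmsKeyId": kms_key_id,
    }


trail_args = build_org_trail_args(
    "org-security-trail",
    "example-org-cloudtrail-logs",
    "alias/example-cloudtrail",
)

# To apply from the management or delegated administrator account:
#   import boto3
#   cloudtrail = boto3.client("cloudtrail")
#   cloudtrail.create_trail(**trail_args)
#   cloudtrail.start_logging(Name=trail_args["Name"])
```

A trail created through the API does not record events until start_logging is called, and the bucket policy and KMS key policy must already allow CloudTrail to write and encrypt logs.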

Incident Investigation Workflow:

Investigation Process:

INITIAL TRIAGE:
┌─────────────────────────────────────────────────────────────┐
│ 1. Validate the alert (true positive?)                      │
│ 2. Assess severity and scope                                │
│ 3. Identify affected resources                              │
│ 4. Determine timeline                                       │
│ 5. Preserve evidence                                        │
└─────────────────────────────────────────────────────────────┘

INVESTIGATION QUERIES:

# What did this principal do?
fields @timestamp, eventName, eventSource, sourceIPAddress,
       requestParameters, responseElements
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| sort @timestamp asc

# What else happened from this IP?
fields @timestamp, userIdentity.arn, eventName, eventSource
| filter sourceIPAddress = '192.0.2.1'
| sort @timestamp asc

# What resources were accessed?
fields @timestamp, eventName, requestParameters
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| filter eventSource in ['s3.amazonaws.com', 'dynamodb.amazonaws.com',
                          'secretsmanager.amazonaws.com']

# Were credentials created for persistence?
fields @timestamp, eventName, requestParameters, responseElements
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| filter eventName in ['CreateAccessKey', 'CreateUser', 
                        'CreateRole', 'CreateLoginProfile']

# Network connections from instance (ports other than 80/443)
fields @timestamp, srcAddr, dstAddr, dstPort, bytes, action
| filter interfaceId = 'eni-abc123'
| filter action = 'ACCEPT'
| filter dstPort not in [80, 443]
| sort bytes desc
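
During an investigation it is often convenient to run queries like these from a script. The sketch below assumes a boto3 CloudWatch Logs client is passed in and that CloudTrail events are delivered to a log group of your choosing; the helper names are illustrative. It uses the start_query and get_query_results calls and polls until the query finishes.

```python
import time


def principal_activity_query(principal_arn):
    """Build the 'what did this principal do?' query for a given ARN."""
    return (
        "fields @timestamp, eventName, eventSource, sourceIPAddress\n"
        f"| filter userIdentity.arn = '{principal_arn}'\n"
        "| sort @timestamp asc"
    )


def run_query(logs_client, log_group, query, start, end, poll_seconds=1.0):
    """Start a Logs Insights query and wait for its results.

    logs_client is a boto3 CloudWatch Logs client; start and end are
    epoch seconds bounding the time range to search.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start),
        endTime=int(end),
        queryString=query,
    )["queryId"]

    # Poll until the query reaches a terminal status.
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)

    # Each result row is a list of {"field": ..., "value": ...} pairs.
    return [
        {cell["field"]: cell["value"] for cell in row}
        for row in response.get("results", [])
    ]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   now = time.time()
#   rows = run_query(
#       boto3.client("logs"),
#       "cloudtrail-logs",  # assumed log group name
#       principal_activity_query("arn:aws:iam::123456789012:user/suspect"),
#       now - 24 * 3600,
#       now,
#   )
```

Saving the query text and time range alongside the results helps when documenting the incident timeline.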

EVIDENCE PRESERVATION:
┌─────────────────────────────────────────────────────────────┐
│ 1. Enable object lock on relevant S3 logs                   │
│ 2. Create forensic snapshots of affected volumes            │
│ 3. Export CloudWatch logs to S3                             │
│ 4. Capture instance metadata                                │
│ 5. Document timeline                                        │
└─────────────────────────────────────────────────────────────┘

# Create forensic snapshot
# (the --tag-specifications value must stay on a single line)
aws ec2 create-snapshot \
  --volume-id vol-xxx \
  --description "Forensic-$(date +%Y%m%d)-incident-123" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Forensic,Value=true}]'

Metrics and KPIs:

Security Monitoring Metrics:

OPERATIONAL METRICS:
┌─────────────────────────────────────────────────────────────┐
│ Mean Time to Detect (MTTD):                                 │
│ - Time from compromise to detection                         │
│ - Target: < 24 hours for critical issues                    │
│                                                             │
│ Mean Time to Respond (MTTR):                                │
│ - Time from detection to containment                        │
│ - Target: < 1 hour for critical issues                      │
│                                                             │
│ Alert Volume:                                               │
│ - Total alerts per day/week                                 │
│ - Alerts by severity                                        │
│ - Trend over time                                           │
│                                                             │
│ False Positive Rate:                                        │
│ - Percentage of alerts that are false positives             │
│ - Target: < 10%                                             │
│                                                             │
│ Alert Handling Time:                                        │
│ - Time from alert to closure                                │
│ - Breakdown by severity                                     │
└─────────────────────────────────────────────────────────────┘
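
The operational metrics above reduce to simple arithmetic over incident and alert records. The following sketch uses made-up sample data and assumed field names (compromised, detected, contained, false_positive) purely to show the calculation.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_hours(pairs):
    """Average gap in hours between (earlier, later) datetime pairs."""
    return mean((later - earlier).total_seconds() / 3600 for earlier, later in pairs)


def operational_metrics(incidents, alerts):
    """Compute MTTD, MTTR, and false positive rate.

    incidents: dicts with 'compromised', 'detected', 'contained' datetimes.
    alerts: dicts with a boolean 'false_positive' flag.
    """
    mttd = mean_hours((i["compromised"], i["detected"]) for i in incidents)
    mttr = mean_hours((i["detected"], i["contained"]) for i in incidents)
    fp_rate = sum(a["false_positive"] for a in alerts) / len(alerts)
    return {"mttd_hours": mttd, "mttr_hours": mttr, "false_positive_rate": fp_rate}


# Sample data, invented for illustration only.
t0 = datetime(2024, 1, 1, 0, 0)
incidents = [
    {
        "compromised": t0,
        "detected": t0 + timedelta(hours=6),
        "contained": t0 + timedelta(hours=6, minutes=30),
    },
    {
        "compromised": t0,
        "detected": t0 + timedelta(hours=18),
        "contained": t0 + timedelta(hours=19, minutes=30),
    },
]
alerts = [{"false_positive": False}] * 9 + [{"false_positive": True}]

metrics = operational_metrics(incidents, alerts)
print(metrics)
```

With this sample data the MTTD is 12 hours, the MTTR is 1 hour, and the false positive rate is 10%. In practice the time of compromise is usually estimated after the fact during investigation, so MTTD carries more uncertainty than MTTR.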

COVERAGE METRICS:
┌─────────────────────────────────────────────────────────────┐
│ Log Collection Coverage:                                    │
│ - % of accounts with CloudTrail enabled                     │
│ - % of VPCs with flow logs                                  │
│ - % of resources with appropriate logging                   │
│                                                             │
│ Detection Coverage:                                         │
│ - MITRE ATT&CK techniques covered                           │
│ - Use cases implemented vs. planned                         │
│                                                             │
│ Compliance Posture:                                         │
│ - Security Hub score                                        │
│ - % of controls passing                                     │
│ - Trend over time                                           │
└─────────────────────────────────────────────────────────────┘

Key insight: Security operations is a continuous process, not a one-time implementation. Regular review and improvement are essential.

Real-World Context

Case Study: CloudTrail Disabled Attack

Attackers who compromise AWS credentials often disable CloudTrail as their first action to cover their tracks. In one incident, an attacker used compromised credentials to disable CloudTrail within minutes of access. Because the organization had alerting on CloudTrail configuration changes, the security team was notified immediately. They were able to re-enable logging, contain the incident, and use the brief window of logs to identify the initial access vector. Without the alert, the attack might have continued undetected for weeks.

Case Study: Cryptomining Detection via Flow Logs

An organization noticed unusual EC2 costs. Investigation of VPC Flow Logs revealed multiple instances making sustained connections to known cryptocurrency mining pool IPs. The instances had been launched using an overprivileged IAM role that allowed ec2:RunInstances without resource restrictions. The detection led to improved IAM policies, GuardDuty activation for cryptocurrency detection, and egress filtering to block mining pool connections.

Security Monitoring Checklist:

LOG COLLECTION:
□ CloudTrail enabled in all regions and all accounts
□ CloudTrail data events for sensitive buckets
□ VPC Flow Logs for all VPCs
□ S3 access logging for sensitive buckets
□ Load balancer access logs
□ Lambda function logs
□ RDS/database logs
□ Application logs to CloudWatch

THREAT DETECTION:
□ GuardDuty enabled in all accounts
□ GuardDuty EKS/S3/RDS protection enabled
□ Security Hub enabled with standards
□ Config rules for compliance
□ Inspector for vulnerability scanning
□ Macie for data discovery

ALERTING:
□ Critical findings → Immediate notification
□ High severity → Same-day response queue
□ Root account activity alerts
□ CloudTrail modification alerts
□ Security group change alerts
□ IAM policy change alerts

LOG RETENTION:
□ Logs retained for compliance period
□ S3 lifecycle policies configured
□ Log integrity validation enabled
□ Logs encrypted with KMS

SIEM/ANALYSIS:
□ Logs forwarded to SIEM
□ Detection rules implemented
□ Dashboards for visibility
□ Regular rule tuning

RESPONSE:
□ Automated response for common scenarios
□ Runbooks documented
□ Escalation procedures defined
□ Regular response drills

Effective monitoring is about preparation. Build visibility before you need it, not during an incident.

Guided Lab: Security Monitoring Setup

In this lab, you'll configure comprehensive security monitoring with alerting and automated response.

Lab Environment:

AWS account with CloudTrail, GuardDuty, Security Hub access
AWS CLI or Console
CloudWatch Logs Insights access

Exercise Steps:

Configure CloudTrail with data events
Enable VPC Flow Logs
Enable GuardDuty
Enable Security Hub with standards
Create CloudWatch metric filters for security events
Create CloudWatch alarms
Configure EventBridge rule for GuardDuty findings
Create Lambda function for automated response
Test detection with simulated events

Reflection Questions:

How long would it take to detect credential compromise?
What events would trigger immediate alerts?
How would you investigate a GuardDuty finding?

Week Outcome Check

By the end of this week, you should be able to:

Configure CloudTrail with management and data events
Enable and interpret VPC Flow Logs
Use GuardDuty for threat detection
Aggregate findings with Security Hub
Write CloudWatch Logs Insights queries for security analysis
Create CloudWatch alarms for security events
Configure EventBridge rules for security automation
Design security monitoring architecture for multi-account environments

🎯 Hands-On Labs (Free & Essential)

📊 AWS Skill Builder: CloudTrail & Security Logging

What you'll do: Configure CloudTrail, analyze API logs, and build detection queries for suspicious activity.
Why it matters: CloudTrail is the foundation of AWS security visibility—master it.
Time estimate: 2-3 hours

Open AWS CloudTrail Training →

🛡️ TryHackMe: AWS GuardDuty & Detection

What you'll do: Enable GuardDuty, analyze threat findings, and create custom detection rules.
Why it matters: GuardDuty provides ML-powered threat detection—learn to use it effectively.
Time estimate: 2-3 hours

Start GuardDuty Lab →

📈 Microsoft Learn: Azure Sentinel & Cloud SIEM

What you'll do: Configure Azure Sentinel for cloud security monitoring and incident response.
Why it matters: Cloud-native SIEM skills are essential for multi-cloud security operations.
Time estimate: 3-4 hours

Open Azure Sentinel Training →

💡 Lab Strategy: Enable CloudTrail in ALL regions and S3 data events—attackers will target your monitoring blind spots.

Resources

Required: AWS CloudTrail User Guide — Comprehensive CloudTrail documentation (45 min)

Required: Amazon GuardDuty User Guide — Threat detection service documentation (45 min)

Optional: AWS Security Hub User Guide — Security findings aggregation (40 min)

Lab

Complete the following lab exercises to practice cloud security monitoring.

Part 1: CloudTrail Configuration (LO7)

Configure comprehensive CloudTrail: (a) create organization trail, (b) enable data events for S3, (c) enable log file validation, (d) configure KMS encryption, (e) verify logs are being collected.

Deliverable: CloudTrail configuration with sample events demonstrating data event collection.

Part 2: GuardDuty Setup (LO7)

Enable GuardDuty: (a) enable in all regions, (b) configure S3 protection, (c) configure EKS protection if applicable, (d) review sample findings, (e) export findings to S3.

Deliverable: GuardDuty configuration with evidence of protection features enabled.

Part 3: Security Alerting (LO7)

Create security alerts: (a) create metric filters for root activity and IAM changes, (b) create CloudWatch alarms, (c) configure SNS notifications, (d) test that alerts trigger correctly.

Deliverable: Metric filter definitions, alarm configurations, and test alert notifications.

Part 4: Log Analysis (LO7)

Analyze security logs: (a) write CloudWatch Logs Insights queries for security use cases, (b) identify suspicious patterns, (c) create saved queries for investigation, (d) document findings.

Deliverable: Query library with at least 5 security-focused queries and sample results.

Part 5: Automated Response (LO7)

Implement automated response: (a) create EventBridge rule for GuardDuty findings, (b) create Lambda function to respond (isolate, notify), (c) test with sample finding, (d) document response workflow.

Deliverable: EventBridge rule, Lambda code, and evidence of successful automated response.

Checkpoint Questions

What is the difference between CloudTrail management events and data events? When would you enable each?
How does GuardDuty detect threats? What data sources does it analyze?
What is Security Hub? How does it relate to GuardDuty, Inspector, and other security services?
Write a CloudWatch Logs Insights query to find all failed console login attempts in the last 24 hours.
What security events should trigger immediate alerts? How do you avoid alert fatigue?
Describe how you would investigate a GuardDuty finding for cryptocurrency mining on an EC2 instance.

Week 08 Quiz

Test your understanding of Cloud Security Monitoring, Logging, and Response.

Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.

Take Quiz

Weekly Reflection

Security monitoring transforms raw log data into actionable intelligence. This week explored building comprehensive visibility into cloud environments.

Reflect on the following in 200-300 words:

The volume of logs in cloud environments can be overwhelming. How do you balance comprehensive logging with the ability to actually analyze and act on the data?
Alert fatigue is a real problem that causes teams to miss real threats. How would you design an alerting strategy that maintains vigilance without overwhelming analysts?
Automation can respond faster than humans but may also cause harm if triggered incorrectly. What's the right balance between automated response and human review?
How has this week changed your understanding of the role of monitoring in security?

A strong reflection demonstrates understanding of monitoring as a continuous process requiring careful design to balance visibility with actionability.

Verified Resources & Videos

AWS re:Invent - Security Monitoring Best Practices — Comprehensive monitoring architecture (50 min)
Deep Dive on GuardDuty — Threat detection service walkthrough (45 min)
CloudTrail Lake and Security Analytics — Advanced log analysis (40 min)
AWS Security Analytics Bootstrap — Security analysis queries and dashboards
AWS Security Reference Architecture — Multi-account security design