Opening Framing
Prevention eventually fails. Attackers find vulnerabilities, credentials get compromised, and misconfigurations slip through. When prevention fails, detection becomes your last line of defense. In cloud environments, the API-driven nature of everything creates unprecedented visibility opportunities—every action generates logs, every resource change is recorded, and security services can analyze patterns across your entire environment.
Effective cloud security monitoring requires understanding what to log, how to centralize and retain logs, what constitutes suspicious activity, and how to respond when threats are detected. Cloud-native services like CloudTrail, GuardDuty, and Security Hub provide building blocks, but transforming raw data into actionable security intelligence requires deliberate design and continuous tuning.
This week covers cloud logging services, security monitoring with GuardDuty and Security Hub, log analysis and SIEM integration, alerting strategies, and building detection capabilities. You'll learn to build comprehensive visibility into your cloud security posture.
Key insight: Breach detection time is still commonly reported in weeks or months. Effective monitoring can cut that to minutes for the activity it covers.
1) Cloud Logging Fundamentals
Understanding what logs are available and how to collect them is the foundation of cloud security monitoring:
AWS Logging Landscape:
LOG SOURCES:
┌─────────────────────────────────────────────────────────────┐
│ CONTROL PLANE LOGS (Who did what to AWS): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CloudTrail │ │
│ │ - API calls to AWS services │ │
│ │ - Management events (create, modify, delete) │ │
│ │ - Data events (S3 object access, Lambda invocations) │ │
│ │ - Insights events (unusual API activity) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ DATA PLANE LOGS (What happened in resources): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ VPC Flow Logs - Network traffic metadata │ │
│ │ S3 Access Logs - Object-level access │ │
│ │ ELB Access Logs - Load balancer requests │ │
│ │ CloudFront Logs - CDN requests │ │
│ │ RDS Logs - Database queries and connections │ │
│ │ Lambda Logs - Function execution output │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ APPLICATION LOGS: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CloudWatch Logs - Application and system logs │ │
│ │ Custom metrics and logs from your code │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ SECURITY SERVICE LOGS: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GuardDuty Findings - Threat detection │ │
│ │ Security Hub Findings - Aggregated security findings │ │
│ │ WAF Logs - Web attack attempts │ │
│ │ Config - Resource configuration changes │ │
│ │ Inspector - Vulnerability findings │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
CloudTrail Deep Dive:
AWS CloudTrail:
CLOUDTRAIL EVENT TYPES:
┌─────────────────────────────────────────────────────────────┐
│ Management Events (recorded by default): │
│ - Control plane operations │
│ - CreateBucket, RunInstances, CreateUser │
│ - Kept 90 days in Event history; create a trail to retain longer │
│ - Essential for security monitoring │
│ │
│ Data Events (must enable): │
│ - Data plane operations │
│ - S3: GetObject, PutObject, DeleteObject │
│ - Lambda: Invoke │
│ - DynamoDB: GetItem, PutItem │
│ - Higher volume, additional cost │
│ │
│ Insights Events (must enable): │
│ - Unusual API activity detection │
│ - Baseline normal behavior, alert on anomalies │
│ - Analyzes management events for unusual call or error rates │
└─────────────────────────────────────────────────────────────┘
CLOUDTRAIL EVENT STRUCTURE:
{
"eventVersion": "1.08",
"userIdentity": {
"type": "IAMUser",
"principalId": "AIDAEXAMPLE",
"arn": "arn:aws:iam::123456789012:user/alice",
"accountId": "123456789012",
"userName": "alice"
},
"eventTime": "2024-01-15T14:30:00Z",
"eventSource": "s3.amazonaws.com",
"eventName": "GetObject",
"awsRegion": "us-east-1",
"sourceIPAddress": "192.0.2.1",
"userAgent": "aws-cli/2.0",
"requestParameters": {
"bucketName": "sensitive-data",
"key": "customer-records.csv"
},
"responseElements": null,
"requestID": "EXAMPLE123",
"eventID": "EXAMPLE456",
"readOnly": true,
"resources": [{
"type": "AWS::S3::Object",
"ARN": "arn:aws:s3:::sensitive-data/customer-records.csv"
}],
"eventType": "AwsApiCall"
}
KEY FIELDS FOR SECURITY:
┌─────────────────────────────────────────────────────────────┐
│ userIdentity - WHO performed the action │
│ eventTime - WHEN it happened │
│ eventSource - WHICH service │
│ eventName - WHAT action │
│ sourceIPAddress - WHERE from │
│ errorCode - Did it succeed/fail? │
│ errorMessage - Why did it fail? │
└─────────────────────────────────────────────────────────────┘
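The key fields above can be pulled out of a record with a few lines of Python. This is a minimal sketch: the function name `summarize_event` and the output keys are illustrative choices, and the sample record is a trimmed copy of the event structure shown earlier.

```python
from typing import Any


def summarize_event(event: dict[str, Any]) -> dict[str, Any]:
    """Pull the who/when/which/what/where/outcome fields from one CloudTrail record."""
    identity = event.get("userIdentity", {})
    return {
        # Most identity types carry an ARN; fall back to the type (e.g. "Root").
        "who": identity.get("arn") or identity.get("type"),
        "when": event.get("eventTime"),
        "service": event.get("eventSource"),
        "action": event.get("eventName"),
        "source_ip": event.get("sourceIPAddress"),
        # errorCode is absent when the call succeeded.
        "succeeded": "errorCode" not in event,
        "error": event.get("errorCode"),
    }


sample = {
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/alice",
    },
    "eventTime": "2024-01-15T14:30:00Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "sourceIPAddress": "192.0.2.1",
}
summary = summarize_event(sample)
print(summary)
```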
CLOUDTRAIL BEST PRACTICES:
┌─────────────────────────────────────────────────────────────┐
│ ✓ Enable in ALL regions (attacks happen in unused regions) │
│ ✓ Enable for all accounts in organization │
│ ✓ Centralize logs to security account │
│ ✓ Enable log file validation (integrity) │
│ ✓ Encrypt logs with KMS │
│ ✓ Set S3 bucket policy to prevent deletion │
│ ✓ Enable data events for sensitive buckets │
│ ✓ Retain logs for compliance period (often 1+ years) │
└─────────────────────────────────────────────────────────────┘
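Several of these practices can be checked automatically. The sketch below audits one trail description of the shape returned by the CloudTrail `describe_trails` API (entries of `trailList`). The function name `audit_trail` and the example trail values are hypothetical; bucket policy and retention checks are left out.

```python
def audit_trail(trail: dict) -> list[str]:
    """Return the best-practice gaps found in one describe_trails entry."""
    gaps = []
    if not trail.get("IsMultiRegionTrail"):
        gaps.append("trail is not multi-region")
    if not trail.get("LogFileValidationEnabled"):
        gaps.append("log file validation is disabled")
    if not trail.get("KmsKeyId"):
        gaps.append("logs are not encrypted with a KMS key")
    if not trail.get("IsOrganizationTrail"):
        gaps.append("trail does not cover the organization")
    return gaps


# Hypothetical trail description used only for illustration.
example_trail = {
    "Name": "example-trail",
    "S3BucketName": "example-central-logs",
    "IsMultiRegionTrail": True,
    "LogFileValidationEnabled": False,
    "IsOrganizationTrail": True,
}
for gap in audit_trail(example_trail):
    print(f"{example_trail['Name']}: {gap}")
```

With live credentials, the same function could be applied to each entry of `boto3.client("cloudtrail").describe_trails()["trailList"]`.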
VPC Flow Logs:
FLOW LOG RECORD:
┌─────────────────────────────────────────────────────────────┐
│ version account-id interface-id srcaddr dstaddr srcport │
│ dstport protocol packets bytes start end action log-status │
│ │
│ Example: │
│ 2 123456789012 eni-abc123 10.0.1.5 52.94.76.10 │
│ 34892 443 6 10 840 1620000000 1620000060 ACCEPT OK │
│ │
│ Interpretation: │
│ - Source: 10.0.1.5:34892 (internal instance) │
│ - Dest: 52.94.76.10:443 (external HTTPS) │
│ - Protocol 6 = TCP │
│ - 10 packets, 840 bytes │
│ - Action: ACCEPT (traffic allowed) │
└─────────────────────────────────────────────────────────────┘
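The interpretation above can be reproduced in code. This is a minimal parser sketch for the default 14-field format only; custom formats need their own field list. The names `FIELDS` and `parse_flow_record` are illustrative, and hyphens in the field names are written as underscores.

```python
# Default (version 2) flow log fields, in order.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> dict:
    """Split one default-format flow log line into a dict with typed values."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        # NODATA and SKIPDATA records use "-" where no value exists.
        if record[name] != "-":
            record[name] = int(record[name])
    return record


line = ("2 123456789012 eni-abc123 10.0.1.5 52.94.76.10 "
        "34892 443 6 10 840 1620000000 1620000060 ACCEPT OK")
record = parse_flow_record(line)
print(record["srcaddr"], "->", record["dstaddr"], record["dstport"], record["action"])
```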
CUSTOM FLOW LOG FORMAT:
┌─────────────────────────────────────────────────────────────┐
│ Include additional fields for security analysis: │
│ │
│ ${version} ${account-id} ${interface-id} ${srcaddr} │
│ ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} │
│ ${bytes} ${start} ${end} ${action} ${log-status} │
│ ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} │
│ ${pkt-srcaddr} ${pkt-dstaddr} ${traffic-path} │
│ │
│ traffic-path shows routing through NAT, IGW, etc. │
└─────────────────────────────────────────────────────────────┘
SECURITY USE CASES:
┌─────────────────────────────────────────────────────────────┐
│ - Detect unauthorized network access attempts │
│ - Identify data exfiltration (large outbound transfers) │
│ - Find scanning activity (many REJECT entries) │
│ - Investigate lateral movement │
│ - Verify network segmentation effectiveness │
│ - Detect C2 communication patterns │
└─────────────────────────────────────────────────────────────┘
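As one example of these use cases, scanning tends to show up as a single source with many rejected connections to different ports. The sketch below assumes records have already been parsed into dicts with `srcaddr`, `dstaddr`, `dstport`, and `action` keys. The function name `find_scanners` and the threshold of 10 are assumptions to tune against your own traffic.

```python
from collections import defaultdict


def find_scanners(records, min_targets=10):
    """Map each source IP to its count of distinct rejected (dest, port) pairs."""
    targets = defaultdict(set)
    for rec in records:
        if rec["action"] == "REJECT":
            targets[rec["srcaddr"]].add((rec["dstaddr"], rec["dstport"]))
    # Keep only sources that probed at least min_targets distinct endpoints.
    return {src: len(seen) for src, seen in targets.items()
            if len(seen) >= min_targets}


# Synthetic records: one source probing 20 ports, plus one normal connection.
records = [
    {"srcaddr": "198.51.100.7", "dstaddr": "10.0.1.5",
     "dstport": port, "action": "REJECT"}
    for port in range(20, 40)
]
records.append({"srcaddr": "10.0.1.9", "dstaddr": "10.0.1.5",
                "dstport": 443, "action": "ACCEPT"})
scanners = find_scanners(records)
print(scanners)
```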
Key insight: Enable ALL log sources before you need them. Logs that were never collected cannot be recovered during an incident.
2) AWS Security Services
AWS provides native security monitoring services that analyze logs and detect threats:
Amazon GuardDuty:
WHAT GUARDDUTY DOES:
┌─────────────────────────────────────────────────────────────┐
│ Intelligent threat detection using: │
│ - CloudTrail management events │
│ - VPC Flow Logs │
│ - DNS logs │
│ - EKS audit logs │
│ - RDS login activity │
│ - Lambda network activity │
│ - S3 data events │
│ - Runtime monitoring (EC2, ECS, EKS) │
│ │
│ Detection Methods: │
│ - Threat intelligence (known bad IPs, domains) │
│ - Anomaly detection (ML-based baseline) │
│ - Pattern matching (attack signatures) │
└─────────────────────────────────────────────────────────────┘
GUARDDUTY FINDING TYPES:
┌─────────────────────────────────────────────────────────────┐
│ EC2 Findings: │
│ - Backdoor:EC2/DenialOfService │
│ - CryptoCurrency:EC2/BitcoinTool │
│ - Trojan:EC2/BlackholeTraffic │
│ - UnauthorizedAccess:EC2/SSHBruteForce │
│ - Recon:EC2/PortProbeUnprotectedPort │
│ │
│ IAM Findings: │
│ - CredentialAccess:IAMUser/AnomalousBehavior │
│ - PenTest:IAMUser/KaliLinux │
│ - UnauthorizedAccess:IAMUser/ConsoleLogin │
│ - Persistence:IAMUser/AnomalousBehavior │
│ │
│ S3 Findings: │
│ - Exfiltration:S3/MaliciousIPCaller │
│ - Discovery:S3/MaliciousIPCaller │
│ - UnauthorizedAccess:S3/MaliciousIPCaller │
│ │
│ Kubernetes Findings: │
│ - PrivilegeEscalation:Kubernetes/PrivilegedContainer │
│ - Persistence:Kubernetes/ContainerWithSensitiveMount │
└─────────────────────────────────────────────────────────────┘
GUARDDUTY FINDING EXAMPLE:
{
"schemaVersion": "2.0",
"id": "123456789012-1234-abcd-1234",
"type": "UnauthorizedAccess:IAMUser/ConsoleLogin",
"severity": 5,
"title": "Console login from unusual location",
"description": "IAM user alice logged in from an unusual geographic location",
"resource": {
"resourceType": "AccessKey",
"accessKeyDetails": {
"userName": "alice",
"userType": "IAMUser"
}
},
"service": {
"action": {
"actionType": "AWS_API_CALL",
"awsApiCallAction": {
"api": "ConsoleLogin",
"remoteIpDetails": {
"ipAddressV4": "192.0.2.1",
"country": {"countryName": "Russia"}
}
}
}
}
}
AWS Security Hub:
SECURITY HUB CAPABILITIES:
┌─────────────────────────────────────────────────────────────┐
│ 1. Aggregated Security Findings: │
│ - GuardDuty findings │
│ - Inspector vulnerability findings │
│ - IAM Access Analyzer findings │
│ - Firewall Manager findings │
│ - Macie findings │
│ - Third-party tool findings │
│ │
│ 2. Compliance Standards: │
│ - AWS Foundational Security Best Practices │
│ - CIS AWS Foundations Benchmark │
│ - PCI DSS │
│ - NIST 800-53 │
│ │
│ 3. Automated Response: │
│ - Custom actions │
│ - EventBridge integration │
│ - Automated remediation │
└─────────────────────────────────────────────────────────────┘
SECURITY HUB ARCHITECTURE:
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │GuardDuty │ │Inspector │ │ Macie │ │3rd Party │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └────────────┼────────────┼────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Security Hub │ │
│ │ │ │
│ │ - Normalize │ │
│ │ - Aggregate │ │
│ │ - Prioritize │ │
│ │ - Compliance │ │
│ └────────┬────────┘ │
│ │ │
│ ┌─────────┼─────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────┐ ┌──────────┐ │
│ │Dashboard │ │EventBr│ │ SIEM │ │
│ │ │ │idge │ │ │ │
│ └──────────┘ └──────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
AUTOMATED REMEDIATION EXAMPLE:
# EventBridge rule triggered by Security Hub finding
# Lambda function to auto-remediate
import boto3

s3 = boto3.client('s3')
securityhub = boto3.client('securityhub')

# Illustrative finding type; match it to the Types values in your findings
PUBLIC_BUCKET_TYPE = ('Software and Configuration Checks/'
                      'AWS Security Best Practices/'
                      'S3 Bucket Public Access')

def lambda_handler(event, context):
    # One event can carry several findings
    for finding in event['detail']['findings']:
        # ASFF stores finding types in the Types list
        if PUBLIC_BUCKET_TYPE not in finding.get('Types', []):
            continue
        # S3 bucket resource Ids are ARNs: arn:aws:s3:::bucket-name
        bucket_name = finding['Resources'][0]['Id'].split(':')[-1]
        # Block public access
        s3.put_public_access_block(
            Bucket=bucket_name,
            PublicAccessBlockConfiguration={
                'BlockPublicAcls': True,
                'IgnorePublicAcls': True,
                'BlockPublicPolicy': True,
                'RestrictPublicBuckets': True
            }
        )
        # Update finding status
        securityhub.batch_update_findings(
            FindingIdentifiers=[{
                'Id': finding['Id'],
                'ProductArn': finding['ProductArn']
            }],
            Workflow={'Status': 'RESOLVED'}
        )
Key insight: Security Hub normalizes findings into a common format (ASFF), enabling consistent processing regardless of source.
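Because every product reports in ASFF, one function can summarize findings from any source. The sketch below counts open findings by product and severity label using the ASFF fields `ProductArn`, `ProductName`, `Severity.Label`, and `Workflow.Status`. The function name `triage_counts` and the sample findings are hypothetical.

```python
from collections import Counter


def triage_counts(findings):
    """Count open ASFF findings by (product, severity label)."""
    counts = Counter()
    for finding in findings:
        status = finding.get("Workflow", {}).get("Status")
        if status in ("RESOLVED", "SUPPRESSED"):
            continue  # already handled
        # ProductName is optional; fall back to the last ARN path segment.
        product = (finding.get("ProductName")
                   or finding["ProductArn"].rsplit("/", 1)[-1])
        counts[(product, finding["Severity"]["Label"])] += 1
    return counts


# Hypothetical findings from two different products.
findings = [
    {"ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
     "Severity": {"Label": "HIGH"}, "Workflow": {"Status": "NEW"}},
    {"ProductArn": "arn:aws:securityhub:us-east-1::product/aws/inspector",
     "Severity": {"Label": "HIGH"}, "Workflow": {"Status": "NEW"}},
    {"ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
     "Severity": {"Label": "LOW"}, "Workflow": {"Status": "RESOLVED"}},
]
counts = triage_counts(findings)
print(counts)
```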
3) Log Analysis and SIEM Integration
Raw logs must be analyzed to identify threats and support investigations:
CloudWatch Logs Insights:
QUERY LANGUAGE:
┌─────────────────────────────────────────────────────────────┐
│ Basic Structure: │
│ fields @timestamp, @message │
│ | filter @message like /error/ │
│ | sort @timestamp desc │
│ | limit 100 │
└─────────────────────────────────────────────────────────────┘
SECURITY QUERIES:
# Failed console logins
fields @timestamp, userIdentity.userName, sourceIPAddress,
errorMessage
| filter eventSource = 'signin.amazonaws.com'
| filter eventName = 'ConsoleLogin' and responseElements.ConsoleLogin = 'Failure'
| stats count(*) as failedLogins by sourceIPAddress,
userIdentity.userName
| sort failedLogins desc
# Root account usage
fields @timestamp, eventName, sourceIPAddress, userAgent
| filter userIdentity.type = 'Root'
| filter eventName not like /ConsoleLogin/
# Security group changes
fields @timestamp, userIdentity.userName, eventName,
requestParameters.groupId
| filter eventSource = 'ec2.amazonaws.com'
| filter eventName in ['AuthorizeSecurityGroupIngress',
'AuthorizeSecurityGroupEgress',
'RevokeSecurityGroupIngress',
'RevokeSecurityGroupEgress']
# IAM policy changes
fields @timestamp, userIdentity.userName, eventName,
requestParameters.policyName
| filter eventSource = 'iam.amazonaws.com'
| filter eventName in ['CreatePolicy', 'DeletePolicy',
'AttachUserPolicy', 'DetachUserPolicy',
'PutUserPolicy', 'DeleteUserPolicy']
# S3 bucket access from unusual IPs
fields @timestamp, userIdentity.userName, sourceIPAddress,
requestParameters.bucketName
| filter eventSource = 's3.amazonaws.com'
| filter sourceIPAddress not like /^10\./
| filter sourceIPAddress not like /^172\.(1[6-9]|2[0-9]|3[01])\./
| filter sourceIPAddress not like /^192\.168\./
| stats count(*) as requestCount by sourceIPAddress, requestParameters.bucketName
| sort requestCount desc
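Queries like these can also be run on a schedule from code. The sketch below wraps the CloudWatch Logs `start_query` and `get_query_results` calls in a polling helper. The helper name `run_insights_query`, the polling limits, and the `StubLogsClient` class are assumptions; the stub exists only so the example runs without AWS credentials.

```python
import time


def run_insights_query(logs_client, log_group, query, start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # seconds since the Unix epoch
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status not in ("Scheduled", "Running"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete in time")


class StubLogsClient:
    """Offline stand-in for boto3.client('logs'), used only for this demo."""

    def __init__(self):
        self.polls = 0

    def start_query(self, **kwargs):
        return {"queryId": "example-query-id"}

    def get_query_results(self, queryId):
        self.polls += 1
        if self.polls < 2:
            return {"status": "Running", "results": []}
        return {"status": "Complete", "results": [[
            {"field": "sourceIPAddress", "value": "192.0.2.1"},
            {"field": "failedLogins", "value": "7"},
        ]]}


rows = run_insights_query(StubLogsClient(), "CloudTrail/logs",
                          "fields @timestamp | limit 1", 0, 60, poll_seconds=0)
print(rows)
```

Against a real account, pass `boto3.client("logs")` in place of the stub and use one of the security queries above as the query string.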
SIEM Integration:
SIEM Integration Patterns:
LOG FORWARDING OPTIONS:
┌─────────────────────────────────────────────────────────────┐
│ │
│ CloudTrail / VPC Flow Logs / CloudWatch Logs │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────┐ ┌──────────┐ ┌────────────┐ │
│ │ S3 │ │ Kinesis │ │CloudWatch │ │
│ │ │ │ Firehose │ │Subscription│ │
│ └───┬───┘ └────┬─────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ SIEM Platform │ │
│ │ (Splunk, Elastic, Sumo Logic, etc.) │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
KINESIS FIREHOSE TO SPLUNK:
┌─────────────────────────────────────────────────────────────┐
│ 1. Create Kinesis Firehose delivery stream │
│ 2. Configure Splunk HEC (HTTP Event Collector) endpoint │
│ 3. Subscribe CloudWatch Log Groups to Firehose │
│ 4. Configure Lambda transformation if needed │
│ │
│ Benefits: │
│ - Near real-time delivery │
│ - Buffering for efficiency │
│ - Automatic retry │
│ - Data transformation │
└─────────────────────────────────────────────────────────────┘
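Step 4 can be sketched as follows. CloudWatch Logs delivers subscription data to Firehose as gzip-compressed JSON, so a transformation Lambda usually decompresses each record and emits one line per log event. The output shape here (newline-delimited JSON with `logGroup`, `timestamp`, and `message`) is an assumption; match it to how your Splunk HEC endpoint is configured. The `_encode` helper and demo event exist only for local testing.

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Firehose transformation: unpack CloudWatch Logs batches into JSON lines."""
    output = []
    for record in event["records"]:
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        # Control messages only test the subscription; they carry no log data.
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue
        lines = "".join(
            json.dumps({
                "logGroup": payload.get("logGroup"),
                "timestamp": log_event["timestamp"],
                "message": log_event["message"],
            }) + "\n"
            for log_event in payload["logEvents"]
        )
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode()).decode(),
        })
    return {"records": output}


def _encode(payload):
    """Build a Firehose-style record body for local testing."""
    return base64.b64encode(gzip.compress(json.dumps(payload).encode())).decode()


demo_event = {"records": [
    {"recordId": "1", "data": _encode({
        "messageType": "DATA_MESSAGE",
        "logGroup": "CloudTrail/logs",
        "logEvents": [{"id": "a", "timestamp": 1705329000000,
                       "message": "StopLogging"}],
    })},
    {"recordId": "2", "data": _encode({"messageType": "CONTROL_MESSAGE",
                                       "logEvents": []})},
]}
result = lambda_handler(demo_event, None)
print([r["result"] for r in result["records"]])
```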
AMAZON SECURITY LAKE:
┌─────────────────────────────────────────────────────────────┐
│ Purpose-built security data lake: │
│ │
│ Features: │
│ - Automatic log collection from AWS sources │
│ - OCSF (Open Cybersecurity Schema Framework) normalization │
│ - S3-based storage (Parquet format) │
│ - Cross-account and cross-region │
│ - Partner integrations │
│ │
│ Sources: │
│ - CloudTrail │
│ - VPC Flow Logs │
│ - Route 53 DNS logs │
│ - Security Hub findings │
│ - S3 access logs │
│ - Lambda execution logs │
│ - EKS audit logs │
│ │
│ Subscribers: │
│ - Athena for querying │
│ - Third-party SIEM/analytics │
│ - Custom applications │
└─────────────────────────────────────────────────────────────┘
Threat Detection Queries:
Common Threat Detection Patterns:
CREDENTIAL COMPROMISE INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Console login from new location │
│ 2. API calls from new source IP │
│ 3. Access key used after long inactivity │
│ 4. Multiple failed authentication attempts │
│ 5. Successful login after multiple failures │
│ 6. API calls at unusual hours │
│ 7. Impossible travel (login from distant locations) │
└─────────────────────────────────────────────────────────────┘
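Indicator 5 is simple to express over CloudTrail `ConsoleLogin` events, which record the outcome in `responseElements.ConsoleLogin`. The sketch below assumes events are already sorted by time; the function name `success_after_failures` and the threshold of 3 are assumptions. CloudTrail may hide the user name on some failed sign-ins, which would split a streak.

```python
from collections import defaultdict


def success_after_failures(events, threshold=3):
    """Return (user, ip) pairs that succeeded after >= threshold failures."""
    streak = defaultdict(int)
    hits = []
    for event in events:  # must be in chronological order
        if event.get("eventName") != "ConsoleLogin":
            continue
        key = (event["userIdentity"].get("userName"), event["sourceIPAddress"])
        outcome = (event.get("responseElements") or {}).get("ConsoleLogin")
        if outcome == "Failure":
            streak[key] += 1
        elif outcome == "Success":
            if streak[key] >= threshold:
                hits.append(key)
            streak[key] = 0  # a success resets the failure streak
    return hits


def _login(user, ip, outcome):
    """Build a minimal synthetic ConsoleLogin event."""
    return {"eventName": "ConsoleLogin",
            "userIdentity": {"userName": user},
            "sourceIPAddress": ip,
            "responseElements": {"ConsoleLogin": outcome}}


events = [_login("alice", "203.0.113.9", "Failure")] * 3 + [
    _login("alice", "203.0.113.9", "Success"),
    _login("bob", "198.51.100.4", "Success"),
]
hits = success_after_failures(events)
print(hits)
```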
DATA EXFILTRATION INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Large S3 downloads │
│ 2. S3 access from unusual IP ranges │
│ 3. New S3 bucket replication configured │
│ 4. Snapshots shared to external accounts │
│ 5. Large outbound data transfer (VPC Flow) │
│ 6. Database export to new location │
└─────────────────────────────────────────────────────────────┘
PERSISTENCE INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. New IAM user or access key created │
│ 2. New IAM role with trust policy changes │
│ 3. Lambda function created/modified │
│ 4. EventBridge rule created │
│ 5. New EC2 instance in unusual region │
│ 6. SSM document created │
└─────────────────────────────────────────────────────────────┘
PRIVILEGE ESCALATION INDICATORS:
┌─────────────────────────────────────────────────────────────┐
│ 1. Policy attachment to user/role │
│ 2. AssumeRole to more privileged role │
│ 3. CreatePolicyVersion │
│ 4. PassRole to service │
│ 5. Instance profile changes │
│ 6. STS GetSessionToken/GetFederationToken │
└─────────────────────────────────────────────────────────────┘
EXAMPLE DETECTION RULE (Splunk):
index=cloudtrail eventName=ConsoleLogin earliest=-24h
NOT [search index=cloudtrail eventName=ConsoleLogin earliest=-30d latest=-24h
| stats count by userIdentity.userName, sourceIPAddress
| fields userIdentity.userName, sourceIPAddress]
| stats count, latest(_time) as _time by userIdentity.userName, sourceIPAddress
| table _time, userIdentity.userName, sourceIPAddress, count
Key insight: Detection is about baselines. Know what's normal so you can identify what's abnormal.
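The baseline idea can be shown in a few lines: record which source IPs each user has logged in from, then flag logins from anything else. This is a minimal sketch with hypothetical names (`build_baseline`, `new_ip_logins`) and synthetic data. A user with no history is treated as new, which is a design choice you may want to change.

```python
from collections import defaultdict


def build_baseline(history):
    """Map each user to the set of source IPs seen in the history window."""
    baseline = defaultdict(set)
    for event in history:
        baseline[event["user"]].add(event["ip"])
    return baseline


def new_ip_logins(recent, baseline):
    """Return recent logins whose source IP is not in the user's baseline."""
    return [event for event in recent
            if event["ip"] not in baseline.get(event["user"], set())]


history = [
    {"user": "alice", "ip": "192.0.2.1"},
    {"user": "alice", "ip": "192.0.2.2"},
    {"user": "bob", "ip": "198.51.100.4"},
]
recent = [
    {"user": "alice", "ip": "192.0.2.1"},     # known IP
    {"user": "alice", "ip": "203.0.113.9"},   # new IP for alice
    {"user": "carol", "ip": "198.51.100.8"},  # no baseline at all
]
flagged = new_ip_logins(recent, build_baseline(history))
print(flagged)
```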
4) Alerting and Response
Effective alerting ensures security teams are notified of real threats without alert fatigue:
Alerting Strategy:
ALERT SEVERITY LEVELS:
┌─────────────────────────────────────────────────────────────┐
│ CRITICAL (Immediate Response): │
│ - Root account activity │
│ - Active data exfiltration │
│ - Confirmed compromise indicators │
│ - GuardDuty High and Critical findings (severity 7.0+) │
│ Response: Page on-call, immediate investigation │
│ │
│ HIGH (Same-Day Response): │
│ - Unauthorized access attempts │
│ - Security group exposing sensitive ports │
│ - IAM policy changes │
│ - GuardDuty Medium findings (severity 4.0-6.9) │
│ Response: Investigate within hours │
│ │
│ MEDIUM (Next-Day Response): │
│ - Compliance violations │
│ - Configuration drift │
│ - GuardDuty Low findings (severity 1.0-3.9) │
│ Response: Triage and prioritize │
│ │
│ LOW (Weekly Review): │
│ - Informational findings │
│ - Best practice recommendations │
│ Response: Include in regular review │
└─────────────────────────────────────────────────────────────┘
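The GuardDuty severity thresholds in this table translate directly into a routing function. This is a sketch: the tier names follow the table above, while `response_tier`, `route`, and the response strings are illustrative.

```python
def response_tier(severity: float) -> str:
    """Map a GuardDuty numeric severity to the alert tiers in the table above."""
    if severity >= 7.0:
        return "CRITICAL"
    if severity >= 4.0:
        return "HIGH"
    if severity >= 1.0:
        return "MEDIUM"
    return "LOW"


RESPONSES = {
    "CRITICAL": "page on-call",
    "HIGH": "same-day queue",
    "MEDIUM": "next-day triage",
    "LOW": "weekly review",
}


def route(finding: dict) -> tuple[str, str]:
    """Return (tier, response) for a finding that has a 'severity' field."""
    tier = response_tier(float(finding["severity"]))
    return tier, RESPONSES[tier]


for severity in (8, 5, 2, 0.5):
    print(severity, route({"severity": severity}))
```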
AVOIDING ALERT FATIGUE:
┌─────────────────────────────────────────────────────────────┐
│ 1. Tune detection rules to reduce false positives │
│ 2. Correlate multiple signals before alerting │
│ 3. Suppress known benign activity │
│ 4. Use tiered alerting (not everything pages) │
│ 5. Automate low-value alerts │
│ 6. Regular review and tuning │
└─────────────────────────────────────────────────────────────┘
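Point 2, correlating signals before alerting, might look like the sketch below: alert on a principal only when at least two different kinds of signal occur within a short window. The function name `correlate`, the 30-minute window, and the signal names are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(signals, window=timedelta(minutes=30), min_kinds=2):
    """Return principals with >= min_kinds distinct signal kinds inside one window.

    signals is an iterable of (timestamp, principal, kind) tuples.
    """
    by_principal = defaultdict(list)
    for timestamp, principal, kind in signals:
        by_principal[principal].append((timestamp, kind))
    alerts = set()
    for principal, items in by_principal.items():
        items.sort()
        # Slide a window forward from each signal in turn.
        for index, (start, _) in enumerate(items):
            kinds = {kind for ts, kind in items[index:] if ts - start <= window}
            if len(kinds) >= min_kinds:
                alerts.add(principal)
                break
    return alerts


t0 = datetime(2024, 1, 15, 14, 0)
signals = [
    (t0, "alice", "new_source_ip"),
    (t0 + timedelta(minutes=5), "alice", "iam_policy_change"),
    (t0, "bob", "new_source_ip"),
    (t0 + timedelta(hours=3), "bob", "iam_policy_change"),
]
to_alert = correlate(signals)
print(to_alert)
```

Here only alice triggers an alert: bob's two signals are three hours apart, so neither window contains both.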
CloudWatch Alarms:
CloudWatch Alarms for Security:
METRIC FILTER + ALARM PATTERN:
# Step 1: Create metric filter on CloudTrail log group
aws logs put-metric-filter \
  --log-group-name CloudTrail/logs \
  --filter-name RootAccountUsage \
  --filter-pattern '{ $.userIdentity.type = "Root" && $.eventType != "AwsServiceEvent" }' \
  --metric-transformations \
    metricName=RootAccountUsageCount,metricNamespace=SecurityMetrics,metricValue=1
# Step 2: Create alarm
aws cloudwatch put-metric-alarm \
--alarm-name RootAccountUsageAlarm \
--alarm-description "Alert on root account usage" \
--metric-name RootAccountUsageCount \
--namespace SecurityMetrics \
--statistic Sum \
--period 300 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts
ESSENTIAL SECURITY ALARMS:
┌─────────────────────────────────────────────────────────────┐
│ Root Account Activity: │
│ Pattern: { $.userIdentity.type = "Root" } │
│ │
│ IAM Policy Changes: │
│ Pattern: { ($.eventName = CreatePolicy) || │
│ ($.eventName = DeletePolicy) || │
│ ($.eventName = AttachUserPolicy) || │
│ ($.eventName = DetachUserPolicy) } │
│ │
│ Console Login Failures: │
│ Pattern: { ($.eventName = ConsoleLogin) && │
│ ($.errorMessage = "Failed authentication") } │
│ │
│ Security Group Changes: │
│ Pattern: { ($.eventName = AuthorizeSecurityGroupIngress) || │
│ ($.eventName = AuthorizeSecurityGroupEgress) } │
│ │
│ CloudTrail Changes: │
│ Pattern: { ($.eventName = StopLogging) || │
│ ($.eventName = DeleteTrail) || │
│ ($.eventName = UpdateTrail) } │
│ │
│ Network Gateway Changes: │
│ Pattern: { ($.eventName = CreateInternetGateway) || │
│ ($.eventName = AttachInternetGateway) } │
└─────────────────────────────────────────────────────────────┘
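Creating these filters one at a time is tedious, so a small generator helps. The sketch below builds the keyword arguments for the CloudWatch Logs `put_metric_filter` API from a subset of the patterns above. The `PATTERNS` dictionary keys, the function name `metric_filter_requests`, and the `Count` metric-name suffix are illustrative.

```python
# A subset of the patterns from the table above, keyed by filter name.
PATTERNS = {
    "RootAccountUsage": '{ $.userIdentity.type = "Root" }',
    "ConsoleLoginFailures": (
        '{ ($.eventName = ConsoleLogin) && '
        '($.errorMessage = "Failed authentication") }'
    ),
    "CloudTrailChanges": (
        "{ ($.eventName = StopLogging) || ($.eventName = DeleteTrail) || "
        "($.eventName = UpdateTrail) }"
    ),
}


def metric_filter_requests(log_group, namespace="SecurityMetrics"):
    """Build one put_metric_filter request per pattern."""
    return [
        {
            "logGroupName": log_group,
            "filterName": name,
            "filterPattern": pattern,
            "metricTransformations": [{
                "metricName": f"{name}Count",
                "metricNamespace": namespace,
                "metricValue": "1",  # the API expects a string
            }],
        }
        for name, pattern in PATTERNS.items()
    ]


requests_to_send = metric_filter_requests("CloudTrail/logs")
for request in requests_to_send:
    print(request["filterName"], "->", request["filterPattern"])
```

With credentials available, each request could be applied with `boto3.client("logs").put_metric_filter(**request)`, followed by one alarm per metric.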
EventBridge for Security Automation:
EventBridge Security Automation:
GUARDDUTY FINDING TO SNS:
{
"source": ["aws.guardduty"],
"detail-type": ["GuardDuty Finding"],
"detail": {
"severity": [
{"numeric": [">=", 7]}
]
}
}
GUARDDUTY FINDING TO LAMBDA (Auto-Response):
# Rule targets Lambda function
# Lambda isolates compromised instance
import boto3

ec2 = boto3.client('ec2')
sns = boto3.client('sns')

def lambda_handler(event, context):
    finding = event['detail']
    if not finding['type'].startswith('UnauthorizedAccess:EC2'):
        return
    instance = finding['resource']['instanceDetails']
    instance_id = instance['instanceId']
    # Create isolation security group
    isolation_sg = ec2.create_security_group(
        GroupName=f'isolation-{instance_id}',
        Description='Isolation security group',
        VpcId=instance['networkInterfaces'][0]['vpcId']
    )
    # New groups have no ingress rules but allow all egress by default,
    # so remove the default egress rule
    ec2.revoke_security_group_egress(
        GroupId=isolation_sg['GroupId'],
        IpPermissions=[{'IpProtocol': '-1',
                        'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]
    )
    # Replace instance security groups with isolation SG
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[isolation_sg['GroupId']]
    )
    # Create snapshot for forensics
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'attachment.instance-id',
                  'Values': [instance_id]}]
    )
    for vol in volumes['Volumes']:
        ec2.create_snapshot(
            VolumeId=vol['VolumeId'],
            Description=f"Forensic snapshot - GuardDuty finding {finding['id']}"
        )
    # Notify security team
    sns.publish(
        TopicArn='arn:aws:sns:...:security-incidents',
        Message=f'Instance {instance_id} isolated due to GuardDuty finding',
        Subject='CRITICAL: EC2 Instance Isolated'
    )
SECURITY HUB TO SLACK:
# Lambda function posting to Slack webhook
# requests is not bundled with the Lambda Python runtime; package it with the function
import os
import requests

# Webhook URL supplied through an environment variable
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']

def lambda_handler(event, context):
    finding = event['detail']['findings'][0]
    severity_colors = {
        'CRITICAL': '#FF0000',
        'HIGH': '#FF6600',
        'MEDIUM': '#FFCC00',
        'LOW': '#00FF00'
    }
    message = {
        'attachments': [{
            # Grey for labels not listed above, such as INFORMATIONAL
            'color': severity_colors.get(finding['Severity']['Label'], '#808080'),
            'title': finding['Title'],
            'text': finding['Description'],
            'fields': [
                {'title': 'Severity', 'value': finding['Severity']['Label'], 'short': True},
                {'title': 'Account', 'value': finding['AwsAccountId'], 'short': True},
                {'title': 'Resource', 'value': finding['Resources'][0]['Id']}
            ]
        }]
    }
    requests.post(SLACK_WEBHOOK_URL, json=message, timeout=10)
Key insight: Automate response to common scenarios. Human analysts should focus on novel threats, not routine responses.
5) Building a Security Operations Capability
Effective security monitoring requires people, process, and technology working together:
Security Operations Architecture:
CENTRALIZED SECURITY MONITORING:
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ SECURITY ACCOUNT │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │CloudTrail │ │ GuardDuty │ │Security │ │ │
│ │ │ (Org) │ │ (Org) │ │ Hub │ │ │
│ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ └──────────────┼──────────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ Security Lake │ │ │
│ │ │ or S3 Bucket │ │ │
│ │ └────────┬────────┘ │ │
│ │ │ │ │
│ │ ┌────────┴────────┐ │ │
│ │ │ SIEM/Athena │ │ │
│ │ └────────┬────────┘ │ │
│ │ │ │ │
│ │ ┌────────┴────────┐ │ │
│ │ │ SOC Dashboard │ │ │
│ │ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ ▲ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │ Prod │ │ Dev │ │ Staging │ │
│ │ Account │ │ Account │ │ Account │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
MULTI-ACCOUNT LOG COLLECTION:
┌─────────────────────────────────────────────────────────────┐
│ CloudTrail Organization Trail: │
│ - Single trail for all accounts │
│ - Logs to central S3 bucket in security account │
│ - KMS encryption with organization key │
│ │
│ GuardDuty Organization: │
│ - Delegated administrator in security account │
│ - Auto-enable for new accounts │
│ - Central findings aggregation │
│ │
│ Security Hub Organization: │
│ - Delegated administrator │
│ - Aggregation region │
│ - Cross-region finding aggregation │
│ │
│ Config Aggregator: │
│ - Central view of resource configuration │
│ - Compliance status across accounts │
└─────────────────────────────────────────────────────────────┘
Incident Investigation Workflow:
Investigation Process:
INITIAL TRIAGE:
┌─────────────────────────────────────────────────────────────┐
│ 1. Validate the alert (true positive?) │
│ 2. Assess severity and scope │
│ 3. Identify affected resources │
│ 4. Determine timeline │
│ 5. Preserve evidence │
└─────────────────────────────────────────────────────────────┘
INVESTIGATION QUERIES:
# What did this principal do?
fields @timestamp, eventName, eventSource, sourceIPAddress,
requestParameters, responseElements
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| sort @timestamp asc
# What else happened from this IP?
fields @timestamp, userIdentity.arn, eventName, eventSource
| filter sourceIPAddress = '192.0.2.1'
| sort @timestamp asc
# What resources were accessed?
fields @timestamp, eventName, requestParameters
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| filter eventSource in ['s3.amazonaws.com', 'dynamodb.amazonaws.com',
'secretsmanager.amazonaws.com']
# Were credentials created for persistence?
fields @timestamp, eventName, requestParameters, responseElements
| filter userIdentity.arn = 'arn:aws:iam::123456789012:user/suspect'
| filter eventName in ['CreateAccessKey', 'CreateUser',
'CreateRole', 'CreateLoginProfile']
# Network connections from instance (unusual ports only)
fields @timestamp, srcAddr, dstAddr, dstPort, bytes, action
| filter interfaceId = 'eni-abc123'
| filter action = 'ACCEPT'
| filter dstPort not in [80, 443]
| sort bytes desc
EVIDENCE PRESERVATION:
┌─────────────────────────────────────────────────────────────┐
│ 1. Enable object lock on relevant S3 logs │
│ 2. Create forensic snapshots of affected volumes │
│ 3. Export CloudWatch logs to S3 │
│ 4. Capture instance metadata │
│ 5. Document timeline │
│ │
│ # Create forensic snapshot │
│ aws ec2 create-snapshot \ │
│ --volume-id vol-xxx \ │
│ --description "Forensic-$(date +%Y%m%d)-incident-123" \ │
│ --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Forensic,Value=true}]' │
└─────────────────────────────────────────────────────────────┘
Metrics and KPIs:
Security Monitoring Metrics:
OPERATIONAL METRICS:
┌─────────────────────────────────────────────────────────────┐
│ Mean Time to Detect (MTTD): │
│ - Time from compromise to detection │
│ - Target: < 24 hours for critical issues │
│ │
│ Mean Time to Respond (MTTR): │
│ - Time from detection to containment │
│ - Target: < 1 hour for critical issues │
│ │
│ Alert Volume: │
│ - Total alerts per day/week │
│ - Alerts by severity │
│ - Trend over time │
│ │
│ False Positive Rate: │
│ - Percentage of alerts that are false positives │
│ - Target: < 10% │
│ │
│ Alert Handling Time: │
│ - Time from alert to closure │
│ - Breakdown by severity │
└─────────────────────────────────────────────────────────────┘
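MTTD and MTTR are plain averages over incident timestamps. The sketch below computes both in hours from a list of incident records. The helper name `mean_hours`, the record keys, and the two incidents are made-up sample data for illustration only.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Average elapsed hours between two timestamps across incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 3600
        for incident in incidents
    )


# Hypothetical incident records.
incidents = [
    {"compromised": datetime(2024, 1, 15, 8, 0),
     "detected": datetime(2024, 1, 15, 14, 0),
     "contained": datetime(2024, 1, 15, 14, 30)},
    {"compromised": datetime(2024, 2, 3, 22, 0),
     "detected": datetime(2024, 2, 4, 0, 0),
     "contained": datetime(2024, 2, 4, 1, 30)},
]
mttd = mean_hours(incidents, "compromised", "detected")  # (6 + 2) / 2
mttr = mean_hours(incidents, "detected", "contained")    # (0.5 + 1.5) / 2
print(f"MTTD: {mttd:.1f} h, MTTR: {mttr:.1f} h")
```

In practice the compromise time is often only known after investigation, so MTTD is usually computed retrospectively.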
COVERAGE METRICS:
┌─────────────────────────────────────────────────────────────┐
│ Log Collection Coverage: │
│ - % of accounts with CloudTrail enabled │
│ - % of VPCs with flow logs │
│ - % of resources with appropriate logging │
│ │
│ Detection Coverage: │
│ - MITRE ATT&CK techniques covered │
│ - Use cases implemented vs. planned │
│ │
│ Compliance Posture: │
│ - Security Hub score │
│ - % of controls passing │
│ - Trend over time │
└─────────────────────────────────────────────────────────────┘
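Log collection coverage is a set comparison: the resources you have against the resources that are logging. The sketch below reports a percentage plus the gaps. The function names and the VPC IDs are hypothetical; in practice both sets would come from your inventory and logging configuration.

```python
def coverage_percent(all_ids, covered_ids):
    """Percentage of all_ids that also appear in covered_ids."""
    all_ids = set(all_ids)
    if not all_ids:
        return 100.0  # nothing to cover
    return round(100 * len(all_ids & set(covered_ids)) / len(all_ids), 1)


def coverage_gaps(all_ids, covered_ids):
    """Sorted list of resources that are not covered."""
    return sorted(set(all_ids) - set(covered_ids))


# Hypothetical inventory: four VPCs, three with flow logs enabled.
vpcs = ["vpc-a", "vpc-b", "vpc-c", "vpc-d"]
vpcs_with_flow_logs = ["vpc-a", "vpc-b", "vpc-d"]
print(coverage_percent(vpcs, vpcs_with_flow_logs), "% of VPCs have flow logs")
print("missing:", coverage_gaps(vpcs, vpcs_with_flow_logs))
```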
Key insight: Security operations is a continuous process, not a one-time implementation. Regular review and improvement are essential.
Real-World Context
Case Study: CloudTrail Disabled Attack
Attackers who compromise AWS credentials often disable CloudTrail as their first action to cover their tracks. In one incident, an attacker used compromised credentials to disable CloudTrail within minutes of access. Because the organization had alerting on CloudTrail configuration changes, the security team was notified immediately. They were able to re-enable logging, contain the incident, and use the brief window of logs to identify the initial access vector. Without the alert, the attack might have continued undetected for weeks.
Case Study: Cryptomining Detection via Flow Logs
An organization noticed unusual EC2 costs. Investigation of VPC Flow Logs revealed multiple instances making sustained connections to known cryptocurrency mining pool IPs. The instances had been launched using an overprivileged IAM role that allowed ec2:RunInstances without resource restrictions. The detection led to improved IAM policies, GuardDuty activation for cryptocurrency detection, and egress filtering to block mining pool connections.
Security Monitoring Checklist:
LOG COLLECTION:
□ CloudTrail enabled in all regions and all accounts
□ CloudTrail data events for sensitive buckets
□ VPC Flow Logs for all VPCs
□ S3 access logging for sensitive buckets
□ Load balancer access logs
□ Lambda function logs
□ RDS/database logs
□ Application logs to CloudWatch
THREAT DETECTION:
□ GuardDuty enabled in all accounts
□ GuardDuty EKS/S3/RDS protection enabled
□ Security Hub enabled with standards
□ Config rules for compliance
□ Inspector for vulnerability scanning
□ Macie for data discovery
ALERTING:
□ Critical findings → Immediate notification
□ High severity → Same-day response queue
□ Root account activity alerts
□ CloudTrail modification alerts
□ Security group change alerts
□ IAM policy change alerts
LOG RETENTION:
□ Logs retained for compliance period
□ S3 lifecycle policies configured
□ Log integrity validation enabled
□ Logs encrypted with KMS
SIEM/ANALYSIS:
□ Logs forwarded to SIEM
□ Detection rules implemented
□ Dashboards for visibility
□ Regular rule tuning
RESPONSE:
□ Automated response for common scenarios
□ Runbooks documented
□ Escalation procedures defined
□ Regular response drills
Effective monitoring is about preparation. Build visibility before you need it, not during an incident.
Guided Lab: Security Monitoring Setup
In this lab, you'll configure comprehensive security monitoring with alerting and automated response.
Lab Environment:
- AWS account with CloudTrail, GuardDuty, Security Hub access
- AWS CLI or Console
- CloudWatch Logs Insights access
Exercise Steps:
- Configure CloudTrail with data events
- Enable VPC Flow Logs
- Enable GuardDuty
- Enable Security Hub with standards
- Create CloudWatch metric filters for security events
- Create CloudWatch alarms
- Configure EventBridge rule for GuardDuty findings
- Create Lambda function for automated response
- Test detection with simulated events
Reflection Questions:
- How long would it take to detect credential compromise?
- What events would trigger immediate alerts?
- How would you investigate a GuardDuty finding?
Week Outcome Check
By the end of this week, you should be able to:
- Configure CloudTrail with management and data events
- Enable and interpret VPC Flow Logs
- Use GuardDuty for threat detection
- Aggregate findings with Security Hub
- Write CloudWatch Logs Insights queries for security analysis
- Create CloudWatch alarms for security events
- Configure EventBridge rules for security automation
- Design security monitoring architecture for multi-account environments
🎯 Hands-On Labs (Free & Essential)
Build cloud security monitoring with CloudTrail, GuardDuty, and SIEM integration.
📊 AWS Skill Builder: CloudTrail & Security Logging
What you'll do: Configure CloudTrail, analyze API logs, and build detection queries for suspicious activity.
Why it matters: CloudTrail is the foundation of AWS security visibility, so it pays to master it.
Time estimate: 2-3 hours
🛡️ TryHackMe: AWS GuardDuty & Detection
What you'll do: Enable GuardDuty, analyze threat findings, and create custom detection rules.
Why it matters: GuardDuty provides ML-powered threat detection, so learn to use it effectively.
Time estimate: 2-3 hours
📈 Microsoft Learn: Azure Sentinel & Cloud SIEM
What you'll do: Configure Microsoft Sentinel (formerly Azure Sentinel) for cloud security monitoring and incident response.
Why it matters: Cloud-native SIEM skills are essential for multi-cloud security operations.
Time estimate: 3-4 hours
💡 Lab Strategy: Enable CloudTrail in ALL regions and S3 data events—attackers will target your monitoring blind spots.
Resources
Lab
Complete the following lab exercises to practice cloud security monitoring.