Opening Framing: Beyond Single Values
So far, you've worked with individual variables: one IP address, one port, one username. But security data comes in collections: lists of blocked IPs, tables of user permissions, mappings of ports to services, collections of IOCs from threat intelligence feeds.
Data structures let you organize related data together. A list holds an ordered collection you can iterate through. A dictionary maps keys to values for instant lookup. Together, they handle virtually any data organization challenge in security scripting.
This week marks a turning point: you'll move from processing single items to managing collections—the foundation of real security tools.
Key insight: The right data structure makes code simple; the wrong one makes it painful. Lists for sequences, dictionaries for lookups—choose based on how you'll use the data.
1) Lists: Ordered Collections
Lists store ordered sequences of items. Items can be any type and can be accessed by position (index):
# Creating lists
blocked_ips = ["192.168.1.50", "10.0.0.25", "172.16.0.100"]
open_ports = [22, 80, 443, 8080]
mixed_data = ["admin", 5, True, 3.14]
# Accessing by index (0-based)
print(blocked_ips[0]) # "192.168.1.50" (first)
print(blocked_ips[-1]) # "172.16.0.100" (last)
print(open_ports[1:3]) # [80, 443] (slice)
# Length
print(len(blocked_ips)) # 3
Modifying Lists:
# Add items
blocked_ips.append("203.0.113.50") # Add to end
blocked_ips.insert(0, "198.51.100.1") # Insert at position
# Remove items
blocked_ips.remove("10.0.0.25") # Remove by value
removed = blocked_ips.pop() # Remove and return last
del blocked_ips[0] # Remove by index
# Check membership
if "192.168.1.50" in blocked_ips:
    print("IP is blocked")
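The snippet below runs the same sequence of calls on the sample IPs and checks the list after each step, so you can see what every method does. Note that remove() raises ValueError if the value is not in the list, so check with `in` first when you are unsure.

```python
# Trace how each list method changes the contents
blocked_ips = ["192.168.1.50", "10.0.0.25", "172.16.0.100"]

blocked_ips.append("203.0.113.50")          # add to the end
assert blocked_ips[-1] == "203.0.113.50"

blocked_ips.insert(0, "198.51.100.1")       # new first element
assert blocked_ips[0] == "198.51.100.1"

blocked_ips.remove("10.0.0.25")             # removes the first matching value
removed = blocked_ips.pop()                 # pops the last item, "203.0.113.50"
del blocked_ips[0]                          # drops "198.51.100.1"

print(removed)      # 203.0.113.50
print(blocked_ips)  # ['192.168.1.50', '172.16.0.100']
```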
List Operations:
# Combine lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2 # [1, 2, 3, 4, 5, 6]
# Sort
ports = [443, 22, 80, 8080]
ports.sort() # In-place: [22, 80, 443, 8080]
sorted_ports = sorted(ports, reverse=True) # New list, descending
# Reverse
ports.reverse() # In-place reversal
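Sorting works as expected for numbers. IP addresses stored as strings are a different case: they sort character by character, which puts "10.0.0.100" before "10.0.0.9". The sketch below passes the standard-library `ipaddress.ip_address` as the sort key to get numeric order. It also shows that sort() returns None, so its result should not be assigned.

```python
import ipaddress

ips = ["10.0.0.9", "10.0.0.100", "10.0.0.25"]

# String sort compares character by character: "1" < "2" < "9"
print(sorted(ips))  # ['10.0.0.100', '10.0.0.25', '10.0.0.9']

# Numeric sort: compare converted address objects, return original strings
by_address = sorted(ips, key=ipaddress.ip_address)
print(by_address)   # ['10.0.0.9', '10.0.0.25', '10.0.0.100']

# sort() works in place and returns None
result = ips.sort()
print(result)       # None
```

This assumes all addresses share one IP version. Comparing an IPv4 address object with an IPv6 one raises TypeError.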
Key insight: Lists maintain order and allow duplicates. Use lists when sequence matters (log entries, scan results) or when you need to iterate through items.
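Because lists allow duplicates, extracted log data often contains repeats. Dictionary keys are unique and, since Python 3.7, keep insertion order. That makes `dict.fromkeys()` a way to deduplicate while preserving first-seen order. A set also removes duplicates but does not guarantee any order. A minimal example with made-up IPs:

```python
# Repeated source IPs pulled from a log
seen_ips = ["10.0.0.5", "10.0.0.7", "10.0.0.5", "10.0.0.9", "10.0.0.7"]

# dict keys are unique and keep insertion order, so this dedupes in order
unique_in_order = list(dict.fromkeys(seen_ips))
print(unique_in_order)  # ['10.0.0.5', '10.0.0.7', '10.0.0.9']

# A set also removes duplicates, but its iteration order is not guaranteed
unique_set = set(seen_ips)
print(len(unique_set))  # 3
```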
2) Dictionaries: Key-Value Mappings
Dictionaries store key-value pairs. Instead of accessing by position, you access by key—perfect for lookups:
# Creating dictionaries
port_services = {
22: "SSH",
80: "HTTP",
443: "HTTPS",
3389: "RDP"
}
user_info = {
"username": "admin",
"role": "administrator",
"failed_logins": 3,
"is_locked": False
}
# Accessing by key
print(port_services[22]) # "SSH"
print(user_info["username"]) # "admin"
# Safe access with .get() (no error if missing)
print(port_services.get(8080, "Unknown")) # "Unknown"
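Square-bracket access raises KeyError when a key is missing, and that stops the script. The comparison below handles the missing key first with try/except and then with .get(). When a sensible default exists, .get() is usually the simpler choice.

```python
port_services = {22: "SSH", 80: "HTTP", 443: "HTTPS", 3389: "RDP"}

# Bracket access fails loudly on a missing key
try:
    service = port_services[8080]
except KeyError:
    service = "Unknown"
print(service)  # Unknown

# .get() returns the default instead of raising
print(port_services.get(8080, "Unknown"))  # Unknown

# Without a default, .get() returns None
print(port_services.get(8080))  # None
```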
Modifying Dictionaries:
# Add or update
port_services[8080] = "HTTP-Alt" # Add new
port_services[22] = "Secure Shell" # Update existing
# Remove
del port_services[3389] # Remove by key
removed = port_services.pop(80) # Remove and return value
# Check if key exists
if 443 in port_services:
    print("HTTPS mapping exists")
Iterating Dictionaries:
# Iterate keys
for port in port_services:
    print(port)
# Iterate values
for service in port_services.values():
    print(service)
# Iterate both (most common)
for port, service in port_services.items():
    print(f"Port {port}: {service}")
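Adding or removing keys while looping over a dictionary raises RuntimeError ("dictionary changed size during iteration"). A common workaround is to loop over a snapshot made with list(). The example below uses a hypothetical cleanup task: removing cleartext-protocol ports from the mapping. The `cleartext` set is an illustrative policy, not a standard list.

```python
port_services = {21: "FTP", 22: "SSH", 23: "Telnet", 80: "HTTP", 443: "HTTPS"}
cleartext = {"FTP", "Telnet", "HTTP"}  # hypothetical policy list

# list(...) copies the items first, so deleting from the dict is safe
for port, service in list(port_services.items()):
    if service in cleartext:
        del port_services[port]

print(port_services)  # {22: 'SSH', 443: 'HTTPS'}
```

Another option is a dictionary comprehension that builds a new, filtered dictionary and leaves the original untouched.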
Key insight: Dictionaries provide average-case O(1) lookup, so access time stays roughly constant no matter how many entries the dictionary holds. Use dictionaries when you need to look up values by a unique key.
3) Security Data Patterns
Let's see how lists and dictionaries model real security data:
Pattern 1: Blocklist (List)
# Simple blocklist - order doesn't matter, just membership
ip_blocklist = [
"192.168.1.50",
"203.0.113.100",
"198.51.100.25"
]
def is_blocked(ip):
    return ip in ip_blocklist
# Check incoming connection
incoming_ip = "203.0.113.100"
if is_blocked(incoming_ip):
    print(f"DENIED: {incoming_ip} is blocklisted")
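For a pure membership check like this, a set is often a better fit than a list. `in` on a list scans items one by one, while a set uses hashing for average-case constant-time lookup, the same idea behind dictionaries. The variant below stores the blocklist as a set, which also silently ignores duplicate entries:

```python
# Same blocklist as a set: order is irrelevant, lookups are hashed
ip_blocklist = {"192.168.1.50", "203.0.113.100", "198.51.100.25"}

def is_blocked(ip):
    return ip in ip_blocklist

ip_blocklist.add("203.0.113.100")    # already present: no duplicate created
ip_blocklist.discard("10.9.9.9")     # discard() ignores missing items

print(len(ip_blocklist))             # 3
print(is_blocked("203.0.113.100"))   # True
print(is_blocked("8.8.8.8"))         # False
```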
Pattern 2: Threat Intelligence (Dictionary)
# IOC database with metadata
ioc_database = {
"5d41402abc4b2a76b9719d911017c592": {
"type": "MD5",
"malware_family": "Emotet",
"severity": "HIGH",
"first_seen": "2024-01-15"
},
"192.168.1.50": {
"type": "IP",
"category": "C2 Server",
"severity": "CRITICAL",
"first_seen": "2024-01-10"
}
}
# Look up an IOC
hash_to_check = "5d41402abc4b2a76b9719d911017c592"
if hash_to_check in ioc_database:
    info = ioc_database[hash_to_check]
    print(f"MATCH: {info['malware_family']} ({info['severity']})")
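The two entries in ioc_database do not share the same fields. The hash record has malware_family, while the IP record has category instead. Running the lookup above against the IP would raise KeyError. A routine that handles any indicator type can fall back with .get(). The sketch below uses a trimmed copy of the database, and `describe_ioc` is a hypothetical helper:

```python
ioc_database = {
    "5d41402abc4b2a76b9719d911017c592": {
        "type": "MD5", "malware_family": "Emotet", "severity": "HIGH",
    },
    "192.168.1.50": {
        "type": "IP", "category": "C2 Server", "severity": "CRITICAL",
    },
}

def describe_ioc(indicator):
    """Hypothetical helper: summarize an IOC whatever fields it has."""
    info = ioc_database.get(indicator)
    if info is None:
        return f"NO MATCH: {indicator}"
    # Not every record has every field, so fall back instead of raising
    label = info.get("malware_family", info.get("category", "unknown"))
    return f"MATCH: {label} ({info['severity']})"

print(describe_ioc("192.168.1.50"))  # MATCH: C2 Server (CRITICAL)
print(describe_ioc("8.8.8.8"))       # NO MATCH: 8.8.8.8
```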
Pattern 3: Event Counter (Dictionary)
# Count events by source
login_attempts = [
"192.168.1.50", "10.0.0.25", "192.168.1.50",
"192.168.1.50", "172.16.0.1", "10.0.0.25"
]
# Build counter dictionary
ip_counts = {}
for ip in login_attempts:
    if ip in ip_counts:
        ip_counts[ip] += 1
    else:
        ip_counts[ip] = 1
# Or use .get() for cleaner code
ip_counts = {}
for ip in login_attempts:
    ip_counts[ip] = ip_counts.get(ip, 0) + 1
print(ip_counts)
# {'192.168.1.50': 3, '10.0.0.25': 2, '172.16.0.1': 1}
Key insight: The counter pattern (dictionary counting occurrences) is fundamental to security analytics—detecting anomalies, finding top talkers, identifying patterns.
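Counting is common enough that the standard library provides `collections.Counter`, a dict subclass that does the same bookkeeping. Its `most_common(n)` method returns the top n (item, count) pairs, which suits "top talker" questions. The threshold below is a hypothetical value chosen for illustration.

```python
from collections import Counter

login_attempts = [
    "192.168.1.50", "10.0.0.25", "192.168.1.50",
    "192.168.1.50", "172.16.0.1", "10.0.0.25",
]

ip_counts = Counter(login_attempts)
print(ip_counts["192.168.1.50"])   # 3
print(ip_counts["8.8.8.8"])        # 0 (missing keys count as zero, no KeyError)
print(ip_counts.most_common(2))    # [('192.168.1.50', 3), ('10.0.0.25', 2)]

# Flag sources at or above a hypothetical threshold
THRESHOLD = 3
suspicious = [ip for ip, count in ip_counts.items() if count >= THRESHOLD]
print(suspicious)  # ['192.168.1.50']
```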
4) Nested Structures
Real security data often requires nested structures—lists of dictionaries or dictionaries containing lists:
# List of dictionaries: Multiple events
security_events = [
{
"timestamp": "2024-01-15 09:23:45",
"event_type": "login_failure",
"source_ip": "203.0.113.50",
"username": "admin"
},
{
"timestamp": "2024-01-15 09:24:12",
"event_type": "login_failure",
"source_ip": "203.0.113.50",
"username": "root"
},
{
"timestamp": "2024-01-15 09:25:00",
"event_type": "login_success",
"source_ip": "192.168.1.10",
"username": "jsmith"
}
]
# Process events
for event in security_events:
    if event["event_type"] == "login_failure":
        print(f"Failed login: {event['username']} from {event['source_ip']}")
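A list of dictionaries is often regrouped into a dictionary of lists, for example to see every username a single source tried. The example below uses `collections.defaultdict(list)`, which creates an empty list the first time a key is seen. The events are a trimmed copy of security_events without timestamps.

```python
from collections import defaultdict

security_events = [
    {"event_type": "login_failure", "source_ip": "203.0.113.50", "username": "admin"},
    {"event_type": "login_failure", "source_ip": "203.0.113.50", "username": "root"},
    {"event_type": "login_success", "source_ip": "192.168.1.10", "username": "jsmith"},
]

# Group failed usernames by source IP: {source_ip: [username, ...]}
failures_by_ip = defaultdict(list)
for event in security_events:
    if event["event_type"] == "login_failure":
        failures_by_ip[event["source_ip"]].append(event["username"])

for ip, users in failures_by_ip.items():
    print(f"{ip} tried {len(users)} accounts: {', '.join(users)}")
# 203.0.113.50 tried 2 accounts: admin, root
```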
Dictionary with Lists:
# User permissions model
user_permissions = {
"admin": ["read", "write", "delete", "admin"],
"analyst": ["read", "write"],
"viewer": ["read"]
}
# Check permission
def has_permission(username, action):
    if username not in user_permissions:
        return False
    return action in user_permissions[username]
print(has_permission("analyst", "read")) # True
print(has_permission("analyst", "delete")) # False
Complex Nesting: Firewall Rules
# Firewall rule structure
firewall_rules = {
"inbound": [
{"action": "allow", "port": 443, "source": "any"},
{"action": "allow", "port": 22, "source": "192.168.0.0/16"},
{"action": "deny", "port": 23, "source": "any"}
],
"outbound": [
{"action": "allow", "port": 443, "source": "any"},
{"action": "deny", "port": 25, "source": "any"}
]
}
# Process inbound rules
print("Inbound Rules:")
for rule in firewall_rules["inbound"]:
    print(f" {rule['action'].upper()} port {rule['port']} from {rule['source']}")
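The same navigation is what you would use to evaluate a connection against the rules. The sketch below makes two assumptions that the rule structure above does not specify: the first matching rule wins, and the default is deny (a common firewall convention). It uses the standard-library `ipaddress` module to test CIDR sources, and `check_inbound` and `source_matches` are hypothetical helpers.

```python
import ipaddress

firewall_rules = {
    "inbound": [
        {"action": "allow", "port": 443, "source": "any"},
        {"action": "allow", "port": 22, "source": "192.168.0.0/16"},
        {"action": "deny", "port": 23, "source": "any"},
    ],
}

def source_matches(rule_source, ip):
    """True if ip falls within the rule's source ("any" or a CIDR block)."""
    if rule_source == "any":
        return True
    return ipaddress.ip_address(ip) in ipaddress.ip_network(rule_source)

def check_inbound(port, ip):
    """Hypothetical evaluator: first matching rule wins, otherwise deny."""
    for rule in firewall_rules["inbound"]:
        if rule["port"] == port and source_matches(rule["source"], ip):
            return rule["action"]
    return "deny"  # assumed default policy

print(check_inbound(22, "192.168.5.20"))  # allow
print(check_inbound(22, "203.0.113.9"))   # deny (no rule matched)
print(check_inbound(443, "203.0.113.9"))  # allow
```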
Key insight: Most API responses, log formats (JSON), and configuration files use nested structures. Master navigation through nested data and you can parse anything.
5) List Comprehensions: Pythonic Processing
List comprehensions provide a concise way to create lists from existing data—extremely useful for filtering and transforming security data:
# Traditional loop approach
ports = [22, 80, 443, 8080, 3389]
privileged = []
for port in ports:
    if port < 1024:
        privileged.append(port)
# List comprehension (same result, one line)
privileged = [port for port in ports if port < 1024]
print(privileged) # [22, 80, 443]
Security Applications:
# Extract all failed login IPs
events = [
{"type": "login_fail", "ip": "10.0.0.1"},
{"type": "login_success", "ip": "10.0.0.2"},
{"type": "login_fail", "ip": "10.0.0.3"},
]
failed_ips = [e["ip"] for e in events if e["type"] == "login_fail"]
print(failed_ips) # ['10.0.0.1', '10.0.0.3']
# Transform data: uppercase all usernames
usernames = ["admin", "root", "guest"]
upper_names = [name.upper() for name in usernames]
print(upper_names) # ['ADMIN', 'ROOT', 'GUEST']
# Filter and transform: get lengths of long passwords
passwords = ["abc", "password123", "x", "SecureP@ssw0rd!"]
long_pwd_lengths = [len(p) for p in passwords if len(p) >= 8]
print(long_pwd_lengths) # [11, 15]
Dictionary Comprehensions:
# Create port:service mapping from lists
ports = [22, 80, 443]
services = ["SSH", "HTTP", "HTTPS"]
port_map = {port: service for port, service in zip(ports, services)}
print(port_map) # {22: 'SSH', 80: 'HTTP', 443: 'HTTPS'}
# Invert a dictionary
service_to_port = {v: k for k, v in port_map.items()}
print(service_to_port) # {'SSH': 22, 'HTTP': 80, 'HTTPS': 443}
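Inverting only works cleanly when the values are unique. If two keys share a value, later pairs overwrite earlier ones and information is silently lost. The mapping below is hypothetical, with two ports both running HTTP. It shows the problem and one way to keep every port, using `dict.setdefault` to start a list per service.

```python
port_map = {80: "HTTP", 8080: "HTTP", 443: "HTTPS"}

# Duplicate values collapse: 8080 overwrites 80 for the "HTTP" key
naive = {service: port for port, service in port_map.items()}
print(naive)  # {'HTTP': 8080, 'HTTPS': 443}

# Keep every port by mapping each service to a list
service_to_ports = {}
for port, service in port_map.items():
    service_to_ports.setdefault(service, []).append(port)
print(service_to_ports)  # {'HTTP': [80, 8080], 'HTTPS': [443]}
```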
Key insight: Comprehensions are "Pythonic": they express intent clearly and often run somewhat faster than the equivalent loop that calls append(). Security professionals who read Python encounter them constantly.
Real-World Context: Data Structures in Security Tools
Data structures are the backbone of security tools:
SIEM Event Storage: Security events can be modeled as lists of dictionaries, where each event is a dictionary with timestamp, type, source, etc. Queries filter and aggregate these structures. Conceptually, when you search "source_ip=10.0.0.1", you're filtering a list of dictionaries.
Threat Intelligence Platforms: IOC databases are dictionaries mapping indicators to metadata. VirusTotal's API returns JSON (nested dictionaries) with scan results, vendor detections, and file metadata—all accessed by key.
Configuration Management: Security tool configs and rules (Snort, YARA, Suricata) are often parsed into dictionaries for processing. A parsed Suricata rule, for example, can be represented as a dictionary with action, protocol, source, destination, and options as keys.
MITRE ATT&CK Reference: The ATT&CK framework itself is a data structure! Techniques map to tactics (dictionary), each technique has metadata (nested dictionary), and mitigations are lists. The STIX format represents this as nested JSON.
Key insight: JSON—the universal data exchange format—is just nested dictionaries and lists. Master Python data structures and you can work with any API, any log format, any configuration.
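The standard-library `json` module shows this correspondence directly. `json.loads` turns a JSON string into nested dicts and lists, and `json.dumps` turns them back into a string. The response below is a made-up example, not the format of any real API.

```python
import json

# Hypothetical API response body (illustrative structure only)
raw = """
{
  "indicator": "203.0.113.50",
  "verdict": "malicious",
  "detections": [
    {"engine": "EngineA", "label": "Cobalt Strike"},
    {"engine": "EngineB", "label": "C2"}
  ]
}
"""

data = json.loads(raw)                 # JSON object -> dict, JSON array -> list
print(type(data).__name__)             # dict
print(data["detections"][0]["label"])  # Cobalt Strike

engines = [d["engine"] for d in data["detections"]]
print(engines)                         # ['EngineA', 'EngineB']

# And back to a JSON string
print(json.dumps({"engines": engines}))  # {"engines": ["EngineA", "EngineB"]}
```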
Guided Lab: Threat Intelligence Database
Let's build a simple threat intelligence database using dictionaries and implement lookup and reporting functions.
Step 1: Create the Script
Create threat_intel.py:
# Threat Intelligence Database
# Demonstrates lists, dictionaries, and nested structures
# IOC Database (dictionary of dictionaries)
threat_db = {
    "5d41402abc4b2a76b9719d911017c592": {
        "type": "hash",
        "hash_type": "MD5",
        "malware": "Emotet",
        "severity": "HIGH",
        "tags": ["banking", "trojan", "botnet"]
    },
    "203.0.113.50": {
        "type": "ip",
        "category": "C2",
        "malware": "Cobalt Strike",
        "severity": "CRITICAL",
        "tags": ["apt", "c2", "beacon"]
    },
    "evil-domain.com": {
        "type": "domain",
        "category": "Phishing",
        "malware": "Credential Harvester",
        "severity": "MEDIUM",
        "tags": ["phishing", "credentials"]
    },
    # Placeholder value: 64 hex characters, the length of a real SHA-256 digest
    "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2": {
        "type": "hash",
        "hash_type": "SHA256",
        "malware": "Ransomware",
        "severity": "CRITICAL",
        "tags": ["ransomware", "encryption", "extortion"]
    }
}
def lookup_ioc(indicator):
    """Look up an IOC in the threat database."""
    if indicator in threat_db:
        return threat_db[indicator]
    return None
def search_by_tag(tag):
    """Find all IOCs with a specific tag."""
    results = []
    for ioc, info in threat_db.items():
        if tag in info["tags"]:
            results.append({"indicator": ioc, "info": info})
    return results
def search_by_severity(severity):
    """Find all IOCs with specific severity."""
    return [
        {"indicator": ioc, "info": info}
        for ioc, info in threat_db.items()
        if info["severity"] == severity
    ]
def generate_report():
    """Generate threat intelligence summary."""
    # Counters for type, severity, and a flat list of every tag
    type_counts = {}
    severity_counts = {}
    all_tags = []
    for ioc, info in threat_db.items():
        # Count types
        ioc_type = info["type"]
        type_counts[ioc_type] = type_counts.get(ioc_type, 0) + 1
        # Count severities
        sev = info["severity"]
        severity_counts[sev] = severity_counts.get(sev, 0) + 1
        # Collect tags
        all_tags.extend(info["tags"])
    # Count tag frequency
    tag_counts = {}
    for tag in all_tags:
        tag_counts[tag] = tag_counts.get(tag, 0) + 1
    return {
        "total_iocs": len(threat_db),
        "by_type": type_counts,
        "by_severity": severity_counts,
        "top_tags": sorted(tag_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    }
# Main execution
if __name__ == "__main__":
    print("=" * 50)
    print("THREAT INTELLIGENCE DATABASE")
    print("=" * 50)
    # Test lookup
    print("\n[1] IOC Lookup Test")
    test_ioc = "203.0.113.50"
    result = lookup_ioc(test_ioc)
    if result:
        print(f" Found: {test_ioc}")
        print(f" Type: {result['type']}")
        print(f" Malware: {result['malware']}")
        print(f" Severity: {result['severity']}")
    # Test tag search
    print("\n[2] Search by Tag: 'c2'")
    c2_results = search_by_tag("c2")
    for item in c2_results:
        print(f" {item['indicator']}: {item['info']['malware']}")
    # Test severity search
    print("\n[3] Search by Severity: 'CRITICAL'")
    critical = search_by_severity("CRITICAL")
    for item in critical:
        print(f" {item['indicator']}: {item['info']['malware']}")
    # Generate report
    print("\n[4] Threat Intelligence Report")
    report = generate_report()
    print(f" Total IOCs: {report['total_iocs']}")
    print(f" By Type: {report['by_type']}")
    print(f" By Severity: {report['by_severity']}")
    print(f" Top Tags: {report['top_tags']}")
    print("\n" + "=" * 50)
Step 2: Run and Analyze
Run the script and observe how data structures enable complex queries.
Step 3: Reflection (mandatory)
- Why is a dictionary the right choice for the threat database?
- How does search_by_severity() use a list comprehension?
- What data structure does generate_report() return?
- How would you add a new IOC to the database?
Week 6 Outcome Check
By the end of this week, you should be able to:
- Create and manipulate lists (add, remove, slice, sort)
- Create and manipulate dictionaries (add, remove, lookup)
- Choose the right data structure for the task
- Work with nested structures (lists of dicts, dicts of lists)
- Use list and dictionary comprehensions
- Model security data effectively
Next week: File Operations—where we read real log files and write reports, connecting our data structures to persistent storage.
🎯 Hands-On Labs (Free & Essential)
Practice lists and dictionaries before moving to reading resources.
🎮 TryHackMe: Python Basics (Data Structures)
What you'll do: Work with lists and dictionaries in short exercises.
Why it matters: Most security data is best modeled as collections.
Time estimate: 1-1.5 hours
📝 Lab Exercise: IOC Dictionary Builder
Task: Build a dictionary that maps IPs to severity and tags.
Deliverable: A script that prints lookups and counts by severity.
Why it matters: Lookups are a constant part of triage and detection.
Time estimate: 45-60 minutes
🏁 PicoCTF Practice: General Skills (Data Structures)
What you'll do: Solve beginner challenges that require list/dict usage.
Why it matters: Data structures keep your scripts efficient and readable.
Time estimate: 1-2 hours
💡 Lab Tip: Dictionaries are for fast lookup; lists are for ordered processing. Choose intentionally.
🛡️ Secure Coding: Safe Data Structures
Lists and dictionaries often encode policy: allowlists, deny rules, and detection mappings. Handle them defensively.
Data structure safety checklist:
- Normalize keys (case, whitespace) before lookup
- Use dict.get(key, default) for safe fallbacks
- Prefer sets for allowlists/denylists
- Avoid mutating a list while iterating
📚 Building on CSY101 Week-13: Model bypasses that exploit unnormalized keys.
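Several checklist items fit in a few lines. In the example below, the denylist contents and the `normalize`/`is_denied` helpers are hypothetical. Usernames are normalized with strip() and lower() before a set lookup, so "Root" cannot slip past a check written for "root". A list is also filtered by building a new one rather than removing items mid-loop.

```python
# Hypothetical denylist; stored already normalized (lowercase, no spaces)
DENIED_USERS = {"root", "admin", "guest"}

def normalize(name):
    """Canonical form for lookups: trim whitespace and lowercase."""
    return name.strip().lower()

def is_denied(username):
    return normalize(username) in DENIED_USERS

print("Root" in DENIED_USERS)  # False: an unnormalized check is bypassed
print(is_denied("Root"))       # True
print(is_denied(" admin\n"))   # True

# Calling list.remove() while looping over the same list can skip items;
# build a filtered copy instead
attempts = ["root", "jsmith", "ADMIN", "analyst"]
allowed = [u for u in attempts if not is_denied(u)]
print(allowed)  # ['jsmith', 'analyst']
```

For non-ASCII names, str.casefold() is a stricter alternative to lower().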
Resources
Complete the required resources to build your foundation.
- Python Tutorial - Data Structures · 45-60 min · 50 XP · Resource ID: csy103_w6_r1 (Required)
- Real Python - Dictionaries in Python · 45-60 min · 50 XP · Resource ID: csy103_w6_r2 (Required)
- Automate the Boring Stuff - Chapter 4: Lists · 30-45 min · 25 XP · Resource ID: csy103_w6_r3 (Optional)
Lab: Security Event Aggregator
Goal: Build a script that aggregates security events and produces statistical summaries.
Linux/Windows Path (same for both)
- Create event_aggregator.py
- Create a list of 20+ security event dictionaries with fields: timestamp, event_type, source_ip, destination_port, severity
- Implement these functions:
  - count_by_type(events) - return dict of event_type counts
  - count_by_source(events) - return dict of source_ip counts
  - filter_by_severity(events, severity) - return filtered list
  - get_top_sources(events, n) - return top n source IPs
- Use at least one list comprehension
- Print a formatted summary report
Deliverable (submit):
- Your event_aggregator.py script
- Screenshot showing the summary report output
- One paragraph: How would this help a SOC analyst?
Checkpoint Questions
- What is the difference between a list and a dictionary?
- How do you access the third element of a list?
- How do you safely access a dictionary key that might not exist?
- What does .items() return when iterating a dictionary?
- Write a list comprehension to get all even numbers from 1-10.
- Why is the counter pattern useful in security analytics?
Weekly Reflection
Reflection Prompt (200-300 words):
This week you learned data structures—the organizing principles that make complex security data manageable. Lists and dictionaries are fundamental to every security tool and data format.
Reflect on these questions:
- Think of security data you've encountered (logs, alerts, configs). How would you model it with lists and dictionaries?
- Why is instant lookup (dictionary) important for security operations like IOC checking?
- How does the structure of data affect the questions you can easily answer about it?
- JSON is essentially nested dictionaries and lists. How does understanding Python data structures help you work with APIs?
A strong reflection will connect data structures to practical security data management challenges.
Verified Resources & Videos
- Python List Methods: Python Docs - More on Lists
- JSON and Python: Python Docs - JSON Module
- STIX Data Format: STIX 2.1 Introduction
Data structures are the foundation of data processing. With lists and dictionaries, you can model virtually any security data. Next week: reading and writing files to work with real data.