Opening Framing: Beyond Single Values
So far, you've worked with individual variables: one IP address, one port, one username. But security data comes in collections: lists of blocked IPs, tables of user permissions, mappings of ports to services, collections of IOCs from threat intelligence feeds.
Data structures let you organize related data together. A list holds an ordered collection you can iterate through. A dictionary maps keys to values for instant lookup. Together, they handle virtually any data organization challenge in security scripting.
This week marks a turning point: you'll move from processing single items to managing collections—the foundation of real security tools.
Key insight: The right data structure makes code simple; the wrong one makes it painful. Lists for sequences, dictionaries for lookups—choose based on how you'll use the data.
1) Lists: Ordered Collections
Lists store ordered sequences of items. Items can be any type and can be accessed by position (index):
# Creating lists
blocked_ips = ["192.168.1.50", "10.0.0.25", "172.16.0.100"]
open_ports = [22, 80, 443, 8080]
mixed_data = ["admin", 5, True, 3.14]
# Accessing by index (0-based)
print(blocked_ips[0]) # "192.168.1.50" (first)
print(blocked_ips[-1]) # "172.16.0.100" (last)
print(open_ports[1:3]) # [80, 443] (slice)
# Length
print(len(blocked_ips)) # 3
Modifying Lists:
# Add items
blocked_ips.append("203.0.113.50") # Add to end
blocked_ips.insert(0, "198.51.100.1") # Insert at position
# Remove items
blocked_ips.remove("10.0.0.25") # Remove by value
removed = blocked_ips.pop() # Remove and return last
del blocked_ips[0] # Remove by index
# Check membership
if "192.168.1.50" in blocked_ips:
    print("IP is blocked")
List Operations:
# Combine lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2 # [1, 2, 3, 4, 5, 6]
# Sort
ports = [443, 22, 80, 8080]
ports.sort() # In-place: [22, 80, 443, 8080]
sorted_ports = sorted(ports, reverse=True) # New list, descending
# Reverse
ports.reverse() # In-place reversal
Key insight: Lists maintain order and allow duplicates. Use lists when sequence matters (log entries, scan results) or when you need to iterate through items.
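One detail worth knowing when modifying lists: `remove()` raises `ValueError` if the value is absent, and it removes only the first match. Below is a minimal sketch of a defensive removal helper. The function name `unblock_ip` is hypothetical and not part of the lesson's code.

```python
# Hypothetical helper: remove an IP from a blocklist without crashing
# when the IP is not present.
def unblock_ip(blocklist, ip):
    """Remove the first occurrence of ip; return True if something was removed."""
    if ip in blocklist:
        blocklist.remove(ip)  # removes only the first matching item
        return True
    return False


blocked_ips = ["192.168.1.50", "10.0.0.25", "192.168.1.50"]

print(unblock_ip(blocked_ips, "10.0.0.25"))    # True
print(unblock_ip(blocked_ips, "203.0.113.9"))  # False (not in the list)
print(blocked_ips)  # ['192.168.1.50', '192.168.1.50'] (duplicates remain)
```

Because lists allow duplicates, a single call may leave other copies of the same value behind, as the last line shows.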
2) Dictionaries: Key-Value Mappings
Dictionaries store key-value pairs. Instead of accessing by position, you access by key—perfect for lookups:
# Creating dictionaries
port_services = {
    22: "SSH",
    80: "HTTP",
    443: "HTTPS",
    3389: "RDP"
}
user_info = {
    "username": "admin",
    "role": "administrator",
    "failed_logins": 3,
    "is_locked": False
}
# Accessing by key
print(port_services[22]) # "SSH"
print(user_info["username"]) # "admin"
# Safe access with .get() (no error if missing)
print(port_services.get(8080, "Unknown")) # "Unknown"
Modifying Dictionaries:
# Add or update
port_services[8080] = "HTTP-Alt" # Add new
port_services[22] = "Secure Shell" # Update existing
# Remove
del port_services[3389] # Remove by key
removed = port_services.pop(80) # Remove and return value
# Check if key exists
if 443 in port_services:
    print("HTTPS mapping exists")
Iterating Dictionaries:
# Iterate keys
for port in port_services:
    print(port)
# Iterate values
for service in port_services.values():
    print(service)
# Iterate both (most common)
for port, service in port_services.items():
    print(f"Port {port}: {service}")
Key insight: Dictionaries provide average-case O(1) lookup: access time stays roughly constant no matter how many entries the dictionary holds. Use dictionaries when you need to look up values by a unique key.
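The difference between square-bracket access and `.get()` matters most when a key is missing. The short sketch below reuses a smaller version of the `port_services` dictionary from this section to show the three behaviors side by side.

```python
port_services = {22: "SSH", 80: "HTTP", 443: "HTTPS"}

# Square-bracket access raises KeyError for a missing key.
try:
    print(port_services[8080])
except KeyError:
    print("No mapping for port 8080")

# .get() returns None (or a supplied default) instead of raising.
print(port_services.get(8080))             # None
print(port_services.get(8080, "Unknown"))  # Unknown

# .pop() also accepts a default, so removing an absent key is safe.
removed = port_services.pop(3389, None)
print(removed)  # None
```

Square brackets suit keys that must exist, where a missing key indicates a bug. `.get()` suits keys that may legitimately be absent, such as a port with no known service.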
3) Security Data Patterns
Let's see how lists and dictionaries model real security data:
Pattern 1: Blocklist (List)
# Simple blocklist - order doesn't matter, just membership
ip_blocklist = [
    "192.168.1.50",
    "203.0.113.100",
    "198.51.100.25"
]
def is_blocked(ip):
    return ip in ip_blocklist
# Check incoming connection
incoming_ip = "203.0.113.100"
if is_blocked(incoming_ip):
    print(f"DENIED: {incoming_ip} is blocklisted")
Pattern 2: Threat Intelligence (Dictionary)
# IOC database with metadata
ioc_database = {
    "5d41402abc4b2a76b9719d911017c592": {
        "type": "MD5",
        "malware_family": "Emotet",
        "severity": "HIGH",
        "first_seen": "2024-01-15"
    },
    "192.168.1.50": {
        "type": "IP",
        "category": "C2 Server",
        "severity": "CRITICAL",
        "first_seen": "2024-01-10"
    }
}
# Look up an IOC
hash_to_check = "5d41402abc4b2a76b9719d911017c592"
if hash_to_check in ioc_database:
    info = ioc_database[hash_to_check]
    print(f"MATCH: {info['malware_family']} ({info['severity']})")
Pattern 3: Event Counter (Dictionary)
# Count events by source
login_attempts = [
    "192.168.1.50", "10.0.0.25", "192.168.1.50",
    "192.168.1.50", "172.16.0.1", "10.0.0.25"
]
# Build counter dictionary
ip_counts = {}
for ip in login_attempts:
    if ip in ip_counts:
        ip_counts[ip] += 1
    else:
        ip_counts[ip] = 1
# Or use .get() for cleaner code
ip_counts = {}
for ip in login_attempts:
    ip_counts[ip] = ip_counts.get(ip, 0) + 1
print(ip_counts)
# {'192.168.1.50': 3, '10.0.0.25': 2, '172.16.0.1': 1}
Key insight: The counter pattern (dictionary counting occurrences) is fundamental to security analytics—detecting anomalies, finding top talkers, identifying patterns.
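To make "top talkers" and simple anomaly flagging concrete, here is a hedged sketch that builds on the same login data. The threshold value is an arbitrary assumption for illustration, not a recommended setting. The last part uses `collections.Counter` from the standard library, which implements the same counting pattern.

```python
from collections import Counter

login_attempts = [
    "192.168.1.50", "10.0.0.25", "192.168.1.50",
    "192.168.1.50", "172.16.0.1", "10.0.0.25",
]

# Manual counter, as in the pattern above.
ip_counts = {}
for ip in login_attempts:
    ip_counts[ip] = ip_counts.get(ip, 0) + 1

# Top talkers: sort (ip, count) pairs by count, highest first.
top_talkers = sorted(ip_counts.items(), key=lambda pair: pair[1], reverse=True)
print(top_talkers[0])  # ('192.168.1.50', 3)

# Flag sources at or above an assumed threshold.
THRESHOLD = 3  # illustrative value only
suspicious = []
for ip, count in ip_counts.items():
    if count >= THRESHOLD:
        suspicious.append(ip)
print(suspicious)  # ['192.168.1.50']

# Standard-library alternative: Counter does the counting for you.
counts = Counter(login_attempts)
print(counts.most_common(2))  # [('192.168.1.50', 3), ('10.0.0.25', 2)]
```

Writing the loop by hand is still worth practicing, since the same shape reappears whenever you aggregate by a key.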
4) Nested Structures
Real security data often requires nested structures—lists of dictionaries or dictionaries containing lists:
# List of dictionaries: Multiple events
security_events = [
    {
        "timestamp": "2024-01-15 09:23:45",
        "event_type": "login_failure",
        "source_ip": "203.0.113.50",
        "username": "admin"
    },
    {
        "timestamp": "2024-01-15 09:24:12",
        "event_type": "login_failure",
        "source_ip": "203.0.113.50",
        "username": "root"
    },
    {
        "timestamp": "2024-01-15 09:25:00",
        "event_type": "login_success",
        "source_ip": "192.168.1.10",
        "username": "jsmith"
    }
]
# Process events
for event in security_events:
    if event["event_type"] == "login_failure":
        print(f"Failed login: {event['username']} from {event['source_ip']}")
Dictionary with Lists:
# User permissions model
user_permissions = {
    "admin": ["read", "write", "delete", "admin"],
    "analyst": ["read", "write"],
    "viewer": ["read"]
}
# Check permission
def has_permission(username, action):
    if username not in user_permissions:
        return False
    return action in user_permissions[username]
print(has_permission("analyst", "read")) # True
print(has_permission("analyst", "delete")) # False
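The permission check can also be written with `.get()` and a default value, which folds the "unknown user" case into a single expression. This is an equivalent sketch of the function above, not a replacement for it. The username `"intern"` is made up to exercise the missing-key path.

```python
user_permissions = {
    "admin": ["read", "write", "delete", "admin"],
    "analyst": ["read", "write"],
    "viewer": ["read"],
}


def has_permission(username, action):
    # Unknown users fall back to an empty permission list,
    # so the membership test simply returns False for them.
    return action in user_permissions.get(username, [])


print(has_permission("analyst", "read"))    # True
print(has_permission("analyst", "delete"))  # False
print(has_permission("intern", "read"))     # False (unknown user)
```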
Complex Nesting: Firewall Rules
# Firewall rule structure
firewall_rules = {
    "inbound": [
        {"action": "allow", "port": 443, "source": "any"},
        {"action": "allow", "port": 22, "source": "192.168.0.0/16"},
        {"action": "deny", "port": 23, "source": "any"}
    ],
    "outbound": [
        {"action": "allow", "port": 443, "source": "any"},
        {"action": "deny", "port": 25, "source": "any"}
    ]
}
# Process inbound rules
print("Inbound Rules:")
for rule in firewall_rules["inbound"]:
    print(f" {rule['action'].upper()} port {rule['port']} from {rule['source']}")
Key insight: Most API responses, log formats (JSON), and configuration files use nested structures. Master navigation through nested data and you can parse anything.
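As a small illustration of navigating nested data, the sketch below parses a JSON string with the standard-library `json` module and walks through it one level at a time. The response body and all of its field names are invented for this example and do not come from any real API.

```python
import json

# Hypothetical API response; the structure is made up for illustration.
raw = """
{
  "indicator": "203.0.113.50",
  "verdict": {"severity": "HIGH", "tags": ["bruteforce", "ssh"]},
  "sightings": [
    {"sensor": "fw-01", "count": 12},
    {"sensor": "ids-02", "count": 3}
  ]
}
"""

report = json.loads(raw)  # JSON objects become dicts, arrays become lists

# Walk down one level at a time: dict key, dict key, then list index.
print(report["verdict"]["severity"])  # HIGH
print(report["verdict"]["tags"][0])   # bruteforce

# Sum a field across a list of dictionaries.
total = 0
for sighting in report["sightings"]:
    total += sighting["count"]
print(total)  # 15

# Chain .get() with defaults when a level may be missing.
country = report.get("geo", {}).get("country", "unknown")
print(country)  # unknown
```

Reading an access path left to right (`report`, then `"verdict"`, then `"tags"`, then index `0`) mirrors the nesting in the data itself.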
5) List Comprehensions: Pythonic Processing
List comprehensions provide a concise way to create lists from existing data—extremely useful for filtering and transforming security data:
# Traditional loop approach
ports = [22, 80, 443, 8080, 3389]
privileged = []
for port in ports:
    if port < 1024:
        privileged.append(port)
# List comprehension (same result, one line)
privileged = [port for port in ports if port < 1024]
print(privileged) # [22, 80, 443]
Security Applications:
# Extract all failed login IPs
events = [
    {"type": "login_fail", "ip": "10.0.0.1"},
    {"type": "login_success", "ip": "10.0.0.2"},
    {"type": "login_fail", "ip": "10.0.0.3"},
]
failed_ips = [e["ip"] for e in events if e["type"] == "login_fail"]
print(failed_ips) # ['10.0.0.1', '10.0.0.3']
# Transform data: uppercase all usernames
usernames = ["admin", "root", "guest"]
upper_names = [name.upper() for name in usernames]
print(upper_names) # ['ADMIN', 'ROOT', 'GUEST']
# Filter and transform: get lengths of long passwords
passwords = ["abc", "password123", "x", "SecureP@ssw0rd!"]
long_pwd_lengths = [len(p) for p in passwords if len(p) >= 8]
print(long_pwd_lengths) # [11, 15]
Dictionary Comprehensions:
# Create port:service mapping from lists
ports = [22, 80, 443]
services = ["SSH", "HTTP", "HTTPS"]
port_map = {port: service for port, service in zip(ports, services)}
print(port_map) # {22: 'SSH', 80: 'HTTP', 443: 'HTTPS'}
# Invert a dictionary
service_to_port = {v: k for k, v in port_map.items()}
print(service_to_port) # {'SSH': 22, 'HTTP': 80, 'HTTPS': 443}
Key insight: Comprehensions are "Pythonic": they express intent clearly and usually run somewhat faster than the equivalent loop built with append(). Security professionals who read Python encounter them constantly.
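Comprehensions can also filter dictionaries and combine filtering with transformation. The sketch below applies both to data shaped like the earlier examples. The threshold of 2 and the `user` field are assumptions made for illustration.

```python
ip_counts = {"192.168.1.50": 3, "10.0.0.25": 2, "172.16.0.1": 1}

# Dictionary comprehension with a condition: keep only repeat sources.
# The threshold of 2 is an arbitrary value chosen for the example.
repeat_offenders = {ip: n for ip, n in ip_counts.items() if n >= 2}
print(repeat_offenders)  # {'192.168.1.50': 3, '10.0.0.25': 2}

# List comprehension over a list of dictionaries: filter by type,
# then normalize the username to lowercase.
events = [
    {"type": "login_fail", "ip": "10.0.0.1", "user": "Admin"},
    {"type": "login_fail", "ip": "10.0.0.3", "user": "ROOT"},
    {"type": "login_success", "ip": "10.0.0.2", "user": "jsmith"},
]
failed_users = [e["user"].lower() for e in events if e["type"] == "login_fail"]
print(failed_users)  # ['admin', 'root']
```

When a comprehension grows past one condition and one transformation, a regular loop is often easier to read.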
Real-World Context: Data Structures in Security Tools
Data structures are the backbone of security tools:
SIEM Event Storage: Security events can be modeled as lists of dictionaries, where each event is a dictionary with timestamp, type, source, etc. Queries filter and aggregate these structures. Conceptually, a search for "source_ip=10.0.0.1" filters a list of dictionaries down to the matching events.
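The idea of a search as a filter over a list of dictionaries can be sketched in a few lines. This is a conceptual model only and makes no claim about how any particular SIEM implements its queries. The helper name `filter_events` and the sample events are hypothetical.

```python
def filter_events(events, field, value):
    """Return the events whose `field` equals `value` (hypothetical helper)."""
    # .get() avoids a KeyError for events that lack the field entirely.
    return [event for event in events if event.get(field) == value]


events = [
    {"timestamp": "2024-01-15 09:23:45", "event_type": "login_failure", "source_ip": "10.0.0.1"},
    {"timestamp": "2024-01-15 09:24:12", "event_type": "login_success", "source_ip": "10.0.0.2"},
    {"timestamp": "2024-01-15 09:25:00", "event_type": "login_failure", "source_ip": "10.0.0.1"},
]

# Conceptual equivalent of searching for source_ip=10.0.0.1
matches = filter_events(events, "source_ip", "10.0.0.1")
print(len(matches))  # 2
for event in matches:
    print(event["timestamp"], event["event_type"])
```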
Threat Intelligence Platforms: IOC databases are dictionaries mapping indicators to metadata. VirusTotal's API returns JSON (nested dictionaries) with scan results, vendor detections, and file metadata—all accessed by key.
Configuration Management: Security tool rules and configs (Snort, YARA, Suricata) are often parsed into dictionaries. A Suricata rule, for example, can be represented as a dictionary with action, protocol, source, destination, and options as keys.
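To show what "a rule as a dictionary" might look like, here is a deliberately naive sketch that splits a Suricata-style rule into header fields and options. The sample rule is made up, `parse_rule` is a hypothetical helper, and the option handling is simplified (it would break on values containing a semicolon). It illustrates the data structure and is not a real rule parser.

```python
def parse_rule(rule_text):
    """Naively split a Suricata-style rule into a dictionary (illustration only)."""
    # Header is everything before the first "(", options are inside the parentheses.
    header, _, options_text = rule_text.partition("(")
    action, protocol, src, src_port, direction, dst, dst_port = header.split()

    options = {}
    # Naive split: fails if an option value itself contains ";".
    for item in options_text.rstrip(") ").split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip().strip('"')

    return {
        "action": action,
        "protocol": protocol,
        "source": src,
        "source_port": src_port,
        "direction": direction,
        "destination": dst,
        "destination_port": dst_port,
        "options": options,  # nested dictionary
    }


# Made-up example rule for demonstration.
rule = 'alert tcp any any -> any 22 (msg:"SSH connection attempt"; sid:1000001; rev:1;)'
parsed = parse_rule(rule)
print(parsed["action"], parsed["destination_port"])  # alert 22
print(parsed["options"]["msg"])                      # SSH connection attempt
```

Once the rule is a dictionary, the earlier patterns apply directly: filter a list of parsed rules by `action`, or count rules per `destination_port`.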
MITRE ATT&CK Reference: The ATT&CK framework itself is a data structure! Techniques map to tactics (dictionary), each technique has metadata (nested dictionary), and mitigations are lists. The STIX format represents this as nested JSON.
Key insight: JSON, the universal data exchange format, maps directly onto nested dictionaries and lists once parsed in Python. Master Python data structures and you can work with almost any API, log format, or configuration file.
Guided Lab: Threat Intelligence Database
Let's build a simple threat intelligence database using dictionaries and implement lookup and reporting functions.