Opening Framing: Working with Real Data
Until now, your scripts have worked with data defined in the code itself. But real security work involves files: log files from servers, exported alerts from SIEMs, threat intelligence feeds, configuration files, and reports you generate for stakeholders.
File operations connect your scripts to the real world. Reading files lets you process actual log data. Writing files lets you save results, generate reports, and export data for other tools. This is where scripts become practical security tools.
This week, you'll learn to read log files, parse CSV data, work with JSON (the lingua franca of APIs), and write professional reports—skills you'll use in every security role.
Key insight: Security data lives in files. Scripts that can't read and write files can't do real work. Master file operations and you can automate any data processing task.
1) Reading Text Files
The most common operation: reading a file line by line. This is how you process log files:
# Basic file reading
with open("auth.log", "r") as file:
    content = file.read()
    print(content)

# Read line by line (memory efficient for large files)
with open("auth.log", "r") as file:
    for line in file:
        print(line.strip())  # strip() removes trailing newline

# Read all lines into a list
with open("auth.log", "r") as file:
    lines = file.readlines()
    print(f"File has {len(lines)} lines")
The with Statement:
- Automatically closes the file when done (even if errors occur)
- Prevents resource leaks and file corruption
- Always use with for file operations (see the sketch below)
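To make that guarantee concrete, here is a minimal sketch of what with replaces; the cleanup in finally is exactly what the with statement performs for you automatically:

# Manual equivalent of the with statement
file = open("auth.log", "r")
try:
    content = file.read()
finally:
    file.close()  # Runs even if read() raises; "with" does this automatically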
Security Log Processing:
# Process auth.log for failed logins
failed_logins = []
with open("auth.log", "r") as file:
    for line in file:
        if "Failed password" in line:
            failed_logins.append(line.strip())

print(f"Found {len(failed_logins)} failed login attempts")
for entry in failed_logins[:5]:  # Show first 5
    print(f"  {entry}")
Key insight: Process files line by line for large logs. Reading 10GB into memory crashes your script; iterating line by line processes any file size.
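To see the streaming pattern at work, here is a minimal sketch that counts failed logins per source IP without ever holding the whole file in memory, assuming the standard sshd "Failed password ... from <ip>" message format:

# Count failures per source IP while streaming the file
failed_by_ip = {}
with open("auth.log", "r") as file:
    for line in file:
        if "Failed password" in line and " from " in line:
            ip = line.split(" from ")[1].split()[0]  # IP follows "from" in sshd messages
            failed_by_ip[ip] = failed_by_ip.get(ip, 0) + 1

for ip, count in sorted(failed_by_ip.items(), key=lambda item: item[1], reverse=True):
    print(f"{ip}: {count} failures")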
2) Writing Text Files
Writing files lets you save results, generate reports, and export data:
# Write mode ("w") - creates new or overwrites existing
with open("report.txt", "w") as file:
    file.write("Security Analysis Report\n")
    file.write("=" * 30 + "\n")
    file.write("Generated by automated scan\n")

# Append mode ("a") - adds to existing file
with open("alerts.log", "a") as file:
    file.write("2024-01-15 10:30:00 ALERT: Suspicious activity\n")

# Write multiple lines
findings = ["Finding 1: Open port 22", "Finding 2: Weak password", "Finding 3: Missing patches"]
with open("findings.txt", "w") as file:
    for finding in findings:
        file.write(finding + "\n")

# Or use writelines (doesn't add newlines automatically)
with open("findings.txt", "w") as file:
    file.writelines([f + "\n" for f in findings])
Generating a Security Report:
# Generate formatted report
def generate_report(scan_results, output_file):
    with open(output_file, "w") as file:
        file.write("=" * 50 + "\n")
        file.write("VULNERABILITY SCAN REPORT\n")
        file.write("=" * 50 + "\n\n")
        file.write(f"Total hosts scanned: {scan_results['host_count']}\n")
        file.write(f"Vulnerabilities found: {scan_results['vuln_count']}\n\n")
        file.write("FINDINGS:\n")
        file.write("-" * 30 + "\n")
        for finding in scan_results['findings']:
            file.write(f"  - {finding}\n")
        file.write("\n" + "=" * 50 + "\n")
        file.write("END OF REPORT\n")

# Use the function
results = {
    "host_count": 50,
    "vuln_count": 12,
    "findings": ["CVE-2024-1234 on 10.0.0.5", "Weak SSH config on 10.0.0.10"]
}
generate_report(results, "scan_report.txt")
Key insight: "w" overwrites, "a" appends. Use append for logs that accumulate over time; use write for reports you regenerate.
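When silently overwriting would be costly, Python also offers exclusive-create mode "x", which raises an error if the file already exists; a minimal sketch:

# Exclusive create ("x") - refuses to overwrite an existing file
try:
    with open("scan_report.txt", "x") as file:
        file.write("New report\n")
except FileExistsError:
    print("scan_report.txt already exists; not overwriting")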
3) Working with CSV Files
CSV (Comma-Separated Values) is common for exporting SIEM data, threat intel feeds, and tabular security data:
import csv

# Reading CSV
with open("alerts.csv", "r") as file:
    reader = csv.reader(file)
    header = next(reader)  # Skip header row
    for row in reader:
        timestamp, severity, source_ip, message = row
        print(f"{severity}: {message} from {source_ip}")

# Reading CSV as dictionaries (easier to work with)
with open("alerts.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(f"{row['severity']}: {row['message']}")
Writing CSV:
import csv

# Write CSV from list of lists
alerts = [
    ["2024-01-15 10:00", "HIGH", "10.0.0.5", "Brute force detected"],
    ["2024-01-15 10:05", "MEDIUM", "10.0.0.10", "Port scan detected"],
    ["2024-01-15 10:10", "LOW", "10.0.0.15", "Failed login"]
]

with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["timestamp", "severity", "source_ip", "message"])  # Header
    writer.writerows(alerts)

# Write from dictionaries (cleaner)
alert_dicts = [
    {"timestamp": "2024-01-15 10:00", "severity": "HIGH", "source": "10.0.0.5"},
    {"timestamp": "2024-01-15 10:05", "severity": "MEDIUM", "source": "10.0.0.10"}
]

with open("output.csv", "w", newline="") as file:
    fieldnames = ["timestamp", "severity", "source"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(alert_dicts)
Key insight: Use DictReader and DictWriter for cleaner code; you access columns by name instead of position.
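The two pair naturally. Here is a minimal sketch that filters alerts.csv down to its HIGH-severity rows, assuming the severity column used in the examples above:

import csv

# Read alerts and write only the HIGH-severity rows to a new file
with open("alerts.csv", "r") as infile, open("high_alerts.csv", "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["severity"] == "HIGH":
            writer.writerow(row)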
4) JSON: The Universal Data Format
JSON (JavaScript Object Notation) is the standard for API responses, configuration files, and data exchange. Python's dictionaries map directly to JSON:
import json

# Reading JSON file
with open("config.json", "r") as file:
    config = json.load(file)
    print(config["api_key"])
    print(config["settings"]["timeout"])

# Writing JSON file
threat_data = {
    "iocs": [
        {"type": "ip", "value": "203.0.113.50", "severity": "high"},
        {"type": "hash", "value": "abc123...", "severity": "critical"}
    ],
    "generated": "2024-01-15",
    "source": "Internal scan"
}

with open("threats.json", "w") as file:
    json.dump(threat_data, file, indent=2)  # indent for readability
JSON and API Responses:
import json

# Simulated API response (string)
api_response = '''
{
    "status": "success",
    "data": {
        "ip": "203.0.113.50",
        "reputation": "malicious",
        "tags": ["c2", "botnet"],
        "confidence": 95
    }
}
'''

# Parse JSON string
result = json.loads(api_response)  # loads = load string
print(f"IP: {result['data']['ip']}")
print(f"Reputation: {result['data']['reputation']}")
print(f"Tags: {', '.join(result['data']['tags'])}")

# Convert back to string
json_string = json.dumps(result, indent=2)  # dumps = dump string
Handling JSON Errors:
import json

try:
    with open("data.json", "r") as file:
        data = json.load(file)
except FileNotFoundError:
    print("File not found")
    data = {}
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
    data = {}
Key insight: json.load() reads from a file, json.loads() parses a string. Same for dump() vs dumps().
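Real API responses do not always contain every field you expect, so parse defensively with dict.get(); a minimal self-contained sketch:

import json

# Defensive parsing: missing keys yield defaults instead of KeyError
response = json.loads('{"status": "success", "data": {"ip": "203.0.113.50"}}')
data = response.get("data", {})
print(f"IP: {data.get('ip', 'unknown')}")
print(f"Confidence: {data.get('confidence', 0)}")  # Key absent, so the default is used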
5) File Paths and Error Handling
Robust scripts handle missing files and permission errors, and work across operating systems:
import os
from pathlib import Path

# Check if file exists before reading
if os.path.exists("auth.log"):
    with open("auth.log", "r") as file:
        content = file.read()
else:
    print("File not found")

# Using pathlib (modern approach)
log_path = Path("logs/auth.log")
if log_path.exists():
    content = log_path.read_text()

# Cross-platform path handling
# Works on Windows and Linux
log_dir = Path("logs")
auth_log = log_dir / "auth.log"  # Creates "logs/auth.log" or "logs\auth.log"

# Create directory if needed
log_dir.mkdir(exist_ok=True)
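pathlib also exposes file metadata, which helps you decide how to read a file; a minimal sketch that checks size before loading everything into memory:

from pathlib import Path

# Check file size before choosing a reading strategy
log_path = Path("logs/auth.log")
if log_path.exists():
    size_mb = log_path.stat().st_size / (1024 * 1024)
    if size_mb > 100:
        print(f"{log_path} is {size_mb:.1f} MB; process it line by line")
    else:
        content = log_path.read_text()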
Comprehensive Error Handling:
def safe_read_file(filepath):
    """Safely read a file with proper error handling."""
    try:
        with open(filepath, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"ERROR: File not found: {filepath}")
        return None
    except PermissionError:
        print(f"ERROR: Permission denied: {filepath}")
        return None
    except Exception as e:
        print(f"ERROR: Unexpected error reading {filepath}: {e}")
        return None

# Use safely
content = safe_read_file("/var/log/auth.log")
if content:
    # Process content
    pass
Working with Multiple Files:
from pathlib import Path

# Process all .log files in a directory
log_dir = Path("/var/log")
for log_file in log_dir.glob("*.log"):
    print(f"Processing: {log_file.name}")
    # Process each file...

# Recursive search
for log_file in log_dir.glob("**/*.log"):  # ** means all subdirectories
    print(log_file)
Key insight: Always handle file errors. Production scripts encounter missing files, permission issues, and corrupted data. Graceful error handling prevents crashes during incidents.
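Putting the pieces together, here is a minimal sketch of a batch loop that skips unreadable files instead of crashing mid-run:

from pathlib import Path

# Keep going when an individual file cannot be read
processed, skipped = 0, 0
for log_file in Path("/var/log").glob("*.log"):
    try:
        content = log_file.read_text()
        processed += 1
        # Analyze content here...
    except OSError as e:  # Covers missing files and permission errors
        print(f"Skipping {log_file}: {e}")
        skipped += 1

print(f"Processed {processed} files, skipped {skipped}")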
Real-World Context: Files in Security Operations
File operations are central to security workflows:
Log Analysis: Every security investigation starts with logs. auth.log, syslog, Windows Event Logs (exported as EVTX or CSV), and application logs are all files your scripts can process. The first skill in DFIR is parsing log files efficiently.
Threat Intelligence: IOC feeds arrive as files—STIX/TAXII bundles (JSON), CSV exports from platforms, or plain text lists. Your scripts read these feeds, parse them, and integrate them into detection systems.
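For example, here is a minimal sketch of feed integration; the feed file bad_ips.txt (one IP per line) is a hypothetical name:

# Load a plain-text IOC feed into a set for fast lookups
with open("bad_ips.txt", "r") as feed:
    bad_ips = {line.strip() for line in feed if line.strip()}

# Flag log lines that mention a known-bad IP
with open("auth.log", "r") as log:
    for line in log:
        if any(ip in line for ip in bad_ips):
            print(f"IOC match: {line.strip()}")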
Report Generation: Security assessments produce reports. Automating report generation from scan results saves hours of manual work. Many tools output JSON that scripts transform into readable reports.
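As a sketch of that transformation, assuming a hypothetical scan.json whose structure matches the scan_results dictionary used earlier:

import json

# Turn machine-readable scan output into a readable summary
with open("scan.json", "r") as file:
    scan = json.load(file)

with open("summary.txt", "w") as report:
    report.write(f"Hosts scanned: {scan['host_count']}\n")
    report.write(f"Vulnerabilities found: {scan['vuln_count']}\n")
    for finding in scan["findings"]:
        report.write(f"  - {finding}\n")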
MITRE ATT&CK Reference: Technique T1005 (Data from Local System) describes how attackers collect files for exfiltration. Defenders use the same file operations to analyze what was accessed, monitor file integrity, and investigate breaches.
Key insight: The ability to read, process, and write files transforms you from a tool user into a tool builder. Every custom security workflow involves file operations.
Guided Lab: Log Parser and Reporter
Let's build a complete log analysis tool that reads a log file, analyzes it, and generates both CSV and JSON reports.