Opening Framing: Don't Roll Your Own Crypto
One of the cardinal rules of security: never implement your own cryptography. The same applies to many security operations—hashing, encoding, parsing, and network protocols have subtle complexities that experts have already solved.
Python's ecosystem includes powerful libraries for security tasks. This week, you'll learn to use standard library modules for hashing and encoding, plus third-party libraries for common security operations.
Using established libraries means your code benefits from years of security review, bug fixes, and optimization. It also means other security professionals can understand your code because they know the same libraries.
Key insight: Professional security tools are built from libraries, not from scratch. Knowing which library to use—and how—is a core skill.
1) Hashing with hashlib
The hashlib module provides secure hash functions for integrity verification, password storage concepts, and IOC generation:
import hashlib
# MD5 hash (for file identification, NOT security)
data = b"Hello, Security World!"
md5_hash = hashlib.md5(data).hexdigest()
print(f"MD5: {md5_hash}")
# SHA-256 (secure, recommended)
sha256_hash = hashlib.sha256(data).hexdigest()
print(f"SHA-256: {sha256_hash}")
# SHA-1 (legacy, avoid for new applications)
sha1_hash = hashlib.sha1(data).hexdigest()
print(f"SHA-1: {sha1_hash}")
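On the "password storage concepts" point: a single fast hash such as SHA-256 is not suited to storing passwords, because attackers can test guesses very quickly. A salted, deliberately slow key-derivation function is the usual approach. The sketch below uses hashlib.pbkdf2_hmac from the standard library; the helper names hash_password and verify_password and the iteration count are illustrative assumptions, not values taken from this lesson.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=200_000):
    """Derive a storage hash with PBKDF2-HMAC-SHA256 (iteration count is illustrative)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected_digest, iterations=200_000):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt and iteration count must be stored alongside the digest so the hash can be recomputed later.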
Hashing Files (Critical for Malware Analysis):
import hashlib
def hash_file(filepath, algorithm="sha256"):
    """Calculate hash of a file."""
    hash_obj = hashlib.new(algorithm)
    with open(filepath, "rb") as f:
        # Read in chunks for large files
        for chunk in iter(lambda: f.read(8192), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()
# Calculate multiple hashes
def get_file_hashes(filepath):
    """Get MD5, SHA1, and SHA256 of a file."""
    return {
        "md5": hash_file(filepath, "md5"),
        "sha1": hash_file(filepath, "sha1"),
        "sha256": hash_file(filepath, "sha256")
    }
# Usage
# hashes = get_file_hashes("suspicious_file.exe")
# print(hashes)
When to Use Which Hash:
- MD5: File identification, deduplication (NOT security)
- SHA-1: Legacy systems only, being phased out
- SHA-256: Recommended for all new applications
- SHA-512: When an extra security margin is needed
Key insight: MD5 is cryptographically broken but still useful for file identification. Use SHA-256 for anything security-critical.
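A common security-critical use of SHA-256 is checking a file against a published or previously recorded hash. The following is a minimal sketch of that check; the function name verify_file_sha256 and the temporary demo file are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import tempfile

def verify_file_sha256(filepath, expected_hex):
    """Return True if the file's SHA-256 matches the expected hex digest."""
    hash_obj = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hash_obj.update(chunk)
    # Normalize case and whitespace, then compare in constant time
    return hmac.compare_digest(hash_obj.hexdigest(), expected_hex.strip().lower())

# Demo: write a small temporary file and verify it
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, Security World!")
    demo_path = tmp.name

known_good = hashlib.sha256(b"Hello, Security World!").hexdigest()
print(verify_file_sha256(demo_path, known_good))  # True
print(verify_file_sha256(demo_path, "0" * 64))    # False
os.remove(demo_path)
```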
2) Encoding and Decoding with base64
Base64 encoding converts binary data to ASCII text—essential for handling encoded payloads, analyzing obfuscated malware, and working with web data:
import base64
# Encode string to base64
original = "Malicious payload data"
encoded = base64.b64encode(original.encode()).decode()
print(f"Encoded: {encoded}")
# Decode base64 back to string
decoded = base64.b64decode(encoded).decode()
print(f"Decoded: {decoded}")
# Handle binary data
binary_data = b"\x00\x01\x02\xff\xfe"
encoded_binary = base64.b64encode(binary_data).decode()
print(f"Binary encoded: {encoded_binary}")
Security Application: Detecting Base64 Obfuscation
import base64
import re
def try_decode_base64(text):
    """Attempt to decode potential base64 strings."""
    # Base64 pattern: letters, numbers, +, /, = padding
    b64_pattern = r"[A-Za-z0-9+/]{20,}={0,2}"
    findings = []
    for match in re.findall(b64_pattern, text):
        try:
            decoded = base64.b64decode(match).decode("utf-8", errors="ignore")
            # Check if decoded content is readable
            if decoded.isprintable() and len(decoded) > 5:
                findings.append({
                    "encoded": (match[:50] + "...") if len(match) > 50 else match,
                    "decoded": decoded[:100]
                })
        except Exception:
            continue
    return findings
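# Test with suspicious content
# Note: the "Data" string is only 16 characters, below the 20-character
# minimum in b64_pattern, so only the "Command" string is reported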
suspicious = """
Command: powershell -enc UG93ZXJTaGVsbCBpcyBhd2Vzb21l
Data: SGVsbG8gV29ybGQh
"""
results = try_decode_base64(suspicious)
for r in results:
    print(f"Found: {r['decoded']}")
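One caveat when handling real PowerShell samples: the value passed to -EncodedCommand is Base64 of UTF-16LE text, so decoding it as UTF-8 leaves NUL bytes between the characters. A minimal sketch of the adjusted decode follows; the helper name decode_powershell_enc and the sample command are assumptions for illustration.

```python
import base64

def decode_powershell_enc(b64_text):
    """Decode a PowerShell -EncodedCommand value (Base64 of UTF-16LE text)."""
    raw = base64.b64decode(b64_text, validate=True)  # reject non-Base64 characters
    return raw.decode("utf-16-le")

# Build a sample the way PowerShell expects it, then decode it again
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode()
print(decode_powershell_enc(sample))  # Write-Output 'hello'
```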
Other Encodings:
import base64
import binascii
# URL-safe base64 (for URLs and filenames)
data = b"data with special chars?"
url_safe = base64.urlsafe_b64encode(data).decode()
print(f"URL-safe: {url_safe}")
# Hex encoding
hex_encoded = binascii.hexlify(b"Hello").decode()
print(f"Hex: {hex_encoded}") # 48656c6c6f
# Hex decoding
original = binascii.unhexlify(hex_encoded)
print(f"Decoded: {original}")
Key insight: Base64 is encoding, NOT encryption. It provides no security—just format conversion. Attackers use it for obfuscation, not protection.
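Because Base64 is only a format conversion, anyone who sees an encoded value can recover the original bytes. URL-safe Base64 is often transmitted with its "=" padding removed (JWT segments are one example), which makes a plain decode call fail. The sketch below restores the padding before decoding; the helper name b64url_decode_unpadded and the sample JSON are illustrative assumptions.

```python
import base64

def b64url_decode_unpadded(text):
    """Decode URL-safe Base64 whose '=' padding has been stripped."""
    padding = "=" * (-len(text) % 4)  # restore padding to a multiple of 4
    return base64.urlsafe_b64decode(text + padding)

# Simulate a token segment with its padding removed
token_part = base64.urlsafe_b64encode(b'{"user":"admin"}').decode().rstrip("=")
print(token_part)
print(b64url_decode_unpadded(token_part))  # b'{"user":"admin"}'
```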
3) Working with URLs using urllib
The urllib module handles URL parsing, encoding, and manipulation, which is essential for analyzing web-based threats:
from urllib.parse import urlparse, parse_qs, urlencode, quote, unquote
# Parse a URL into components
url = "https://evil.com:8080/malware/download?file=payload.exe&id=12345"
parsed = urlparse(url)
print(f"Scheme: {parsed.scheme}") # https
print(f"Host: {parsed.netloc}") # evil.com:8080 (netloc includes the port; parsed.hostname gives evil.com)
print(f"Path: {parsed.path}") # /malware/download
print(f"Query: {parsed.query}") # file=payload.exe&id=12345
# Parse query parameters
params = parse_qs(parsed.query)
print(f"Parameters: {params}")
# {'file': ['payload.exe'], 'id': ['12345']}
URL Encoding/Decoding (for Attack Analysis):
from urllib.parse import quote, unquote, unquote_plus
# URL encoding (percent encoding)
payload = "<script>alert('XSS')</script>"
encoded = quote(payload, safe="") # safe="" also encodes "/", which quote() leaves as-is by default
print(f"Encoded: {encoded}")
# %3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E
# URL decoding
encoded_attack = "%3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E"
decoded = unquote(encoded_attack)
print(f"Decoded: {decoded}")
# Double-encoded attacks (evasion technique)
double_encoded = "%253Cscript%253E" # %25 = %
first_decode = unquote(double_encoded) # %3Cscript%3E
second_decode = unquote(first_decode) # <script>
print(f"Double decoded: {second_decode}")
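When the number of encoding layers is unknown, the two manual decode steps above can be generalized into a loop that decodes until the value stops changing. This is a minimal sketch; the function name fully_unquote and the max_rounds cap are assumptions added for illustration.

```python
from urllib.parse import unquote

def fully_unquote(value, max_rounds=5):
    """Percent-decode repeatedly until stable; return (result, rounds applied)."""
    current = value
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(current)
        if decoded == current:  # nothing left to decode
            break
        current = decoded
        rounds += 1
    return current, rounds

print(fully_unquote("%253Cscript%253E"))  # ('<script>', 2)
```

A round count above 1 is itself worth logging, since legitimate clients rarely double-encode input.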
Extracting IOCs from URLs:
from urllib.parse import urlparse
def analyze_url(url):
    """Extract security-relevant information from URL."""
    try:
        parsed = urlparse(url)
        analysis = {
            "full_url": url,
            "scheme": parsed.scheme,
            # hostname strips any "user@" prefix and ":port" suffix from netloc
            "domain": parsed.hostname,
            "port": parsed.port or (443 if parsed.scheme == "https" else 80),
            "path": parsed.path,
            "has_query": bool(parsed.query),
            "suspicious_indicators": []
        }
        # Check for suspicious patterns
        if parsed.port and parsed.port not in [80, 443, 8080, 8443]:
            analysis["suspicious_indicators"].append(f"Unusual port: {parsed.port}")
        if "@" in parsed.netloc:
            analysis["suspicious_indicators"].append("Contains @ (possible credential phishing)")
        if parsed.path.endswith((".exe", ".dll", ".ps1", ".bat")):
            analysis["suspicious_indicators"].append("Executable file extension in path")
        return analysis
    except Exception as e:
        return {"error": str(e)}
# Test
result = analyze_url("http://evil.com:4444/update.exe?token=abc")
print(result)
Key insight: URL analysis reveals attack infrastructure. Parsing URLs programmatically helps identify phishing, malware delivery, and C2.
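Another indicator that is easy to check programmatically is a URL whose host is a raw IP address rather than a domain name. The sketch below combines urlparse with the standard ipaddress module; the function name host_is_ip_literal and the sample URLs are illustrative assumptions.

```python
import ipaddress
from urllib.parse import urlparse

def host_is_ip_literal(url):
    """Return True if the URL's host is an IPv4 or IPv6 address literal."""
    try:
        host = urlparse(url).hostname
        if host is None:
            return False
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        return True
    except ValueError:
        return False

print(host_is_ip_literal("http://192.168.1.10:4444/update.exe"))  # True
print(host_is_ip_literal("http://[::1]:8080/"))                   # True
print(host_is_ip_literal("https://example.com/login"))            # False
```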
4) Date and Time with datetime
Security events have timestamps. The datetime module handles parsing, formatting, and calculating time differences:
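from datetime import datetime, timedelta, timezone
import time
# Current time (local, timezone-naive)
now = datetime.now()
print(f"Current: {now}")
# UTC time (preferred for logs)
# datetime.utcnow() is deprecated since Python 3.12; this returns a timezone-aware value
utc_now = datetime.now(timezone.utc)
print(f"UTC: {utc_now}")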
# Unix timestamp
timestamp = time.time()
print(f"Unix timestamp: {timestamp}")
# Convert timestamp to datetime
dt = datetime.fromtimestamp(timestamp)
print(f"From timestamp: {dt}")
Parsing Log Timestamps:
from datetime import datetime
# Common log formats
formats = {
    "iso": "%Y-%m-%dT%H:%M:%S",
    "apache": "%d/%b/%Y:%H:%M:%S",
    "syslog": "%b %d %H:%M:%S",  # no year field: strptime defaults the year to 1900
    "windows": "%m/%d/%Y %H:%M:%S"
}
# Parse different formats
iso_time = "2024-01-15T09:23:45"
dt = datetime.strptime(iso_time, formats["iso"])
print(f"Parsed ISO: {dt}")
apache_time = "15/Jan/2024:09:23:45"
dt = datetime.strptime(apache_time, formats["apache"])
print(f"Parsed Apache: {dt}")
# Format for output
formatted = dt.strftime("%Y-%m-%d %H:%M:%S")
print(f"Formatted: {formatted}")
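In practice a tool often does not know in advance which format a log line uses. One way to handle this is to try each known format in turn. The sketch below assumes an English locale for month names (%b); the names LOG_FORMATS and parse_log_timestamp are illustrative, and the syslog format is left out because it carries no year.

```python
from datetime import datetime

# Formats that include a year, tried in order
LOG_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",   # ISO
    "%d/%b/%Y:%H:%M:%S",   # Apache
    "%m/%d/%Y %H:%M:%S",   # Windows
]

def parse_log_timestamp(text):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in LOG_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None

print(parse_log_timestamp("15/Jan/2024:09:23:45"))  # 2024-01-15 09:23:45
print(parse_log_timestamp("not a timestamp"))       # None
```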
Time-Based Analysis:
from datetime import datetime, timedelta
def analyze_event_timing(events):
    """Analyze timing patterns in security events."""
    if len(events) < 2:
        return None
    timestamps = [datetime.fromisoformat(e["timestamp"]) for e in events]
    timestamps.sort()
    # Calculate intervals
    intervals = []
    for i in range(1, len(timestamps)):
        delta = (timestamps[i] - timestamps[i-1]).total_seconds()
        intervals.append(delta)
    avg_interval = sum(intervals) / len(intervals)
    min_interval = min(intervals)
    analysis = {
        "total_events": len(events),
        "time_span": str(timestamps[-1] - timestamps[0]),
        "avg_interval_seconds": round(avg_interval, 2),
        "min_interval_seconds": round(min_interval, 2)
    }
    # Detect rapid-fire events (possible automation/attack)
    if min_interval < 1:
        analysis["alert"] = "Sub-second intervals detected - possible automated attack"
    return analysis
Key insight: Time analysis reveals attack patterns. Rapid-fire events suggest automation; activity at unusual hours may indicate external attackers.
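The "unusual hours" idea can be sketched by counting events per hour and flagging those outside a working-hours window. The 08:00 to 18:00 window, the function name off_hours_events, and the sample events are assumptions for illustration; real thresholds depend on the environment and time zone of the logs.

```python
from collections import Counter
from datetime import datetime

def off_hours_events(events, start_hour=8, end_hour=18):
    """Count events per hour and collect those outside [start_hour, end_hour)."""
    by_hour = Counter()
    flagged = []
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"])
        by_hour[ts.hour] += 1
        if not (start_hour <= ts.hour < end_hour):
            flagged.append(e)
    return by_hour, flagged

sample = [
    {"timestamp": "2024-01-15T09:23:45"},
    {"timestamp": "2024-01-15T03:10:02"},
    {"timestamp": "2024-01-15T03:10:03"},
]
by_hour, flagged = off_hours_events(sample)
print(dict(by_hour))  # {9: 1, 3: 2}
print(len(flagged))   # 2
```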
5) Command-Line Arguments with argparse
Professional tools accept command-line arguments. The argparse module makes your scripts configurable and user-friendly:
import argparse
def main():
    parser = argparse.ArgumentParser(
        description="Security Log Analyzer"
    )
    # Required argument
    parser.add_argument(
        "logfile",
        help="Path to log file to analyze"
    )
    # Optional arguments
    parser.add_argument(
        "-o", "--output",
        default="report.txt",
        help="Output file path (default: report.txt)"
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--threshold",
        type=int,
        default=5,
        help="Alert threshold (default: 5)"
    )
    args = parser.parse_args()
    print(f"Analyzing: {args.logfile}")
    print(f"Output to: {args.output}")
    print(f"Verbose: {args.verbose}")
    print(f"Threshold: {args.threshold}")
if __name__ == "__main__":
    main()
Usage Examples:
# Basic usage
python analyzer.py access.log
# With options
python analyzer.py access.log -o results.txt -v --threshold 10
# Help
python analyzer.py --help
Security Tool Pattern:
import argparse
parser = argparse.ArgumentParser(description="IOC Scanner")
parser.add_argument("target", help="File or directory to scan")
parser.add_argument("-t", "--type", choices=["ip", "hash", "domain", "all"],
                    default="all", help="IOC type to extract")
parser.add_argument("-f", "--format", choices=["json", "csv", "txt"],
                    default="txt", help="Output format")
parser.add_argument("--defang", action="store_true",
                    help="Defang IOCs in output")
args = parser.parse_args()
Key insight: Command-line arguments transform scripts into flexible tools. Users can customize behavior without editing code.
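To show how a flag such as --defang might drive behavior, here is a minimal sketch. Defanging conventions vary between teams; this version rewrites a leading http/https scheme to hxxp/hxxps and replaces every "." with "[.]". The defang helper, the small parser, and the sample IOCs are assumptions for illustration, and the argument list is passed explicitly so the snippet runs without a real command line.

```python
import argparse

def defang(ioc):
    """Make an IOC non-clickable: hxxp scheme and [.] in place of dots."""
    for scheme in ("https://", "http://"):
        if ioc.lower().startswith(scheme):
            ioc = "hxxp" + ioc[4:]  # keep the rest, including an optional "s"
            break
    return ioc.replace(".", "[.]")

parser = argparse.ArgumentParser(description="IOC printer (illustrative)")
parser.add_argument("iocs", nargs="+", help="IOCs to print")
parser.add_argument("--defang", action="store_true", help="Defang IOCs in output")

# Explicit argument list stands in for sys.argv in this sketch
args = parser.parse_args(["--defang", "http://evil.com:4444/update.exe", "10.0.0.5"])
for ioc in args.iocs:
    print(defang(ioc) if args.defang else ioc)
# hxxp://evil[.]com:4444/update[.]exe
# 10[.]0[.]0[.]5
```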
Real-World Context: Libraries in Security Tools
Every major security tool leverages libraries:
Volatility (Memory Forensics): Uses hashlib for integrity checks, struct for binary parsing, and numerous specialized libraries for memory analysis.
Scapy (Network Analysis): Built on Python's socket library, uses struct for packet parsing, and integrates with cryptographic libraries for protocol analysis.
YARA-Python: Wraps the YARA library to provide Pythonic access to pattern matching—a library wrapping a library.
MITRE ATT&CK Reference: Technique T1140 (Deobfuscate/Decode Files or Information) involves base64, XOR, and other encoding. Understanding these libraries helps you both decode attacker payloads and analyze malware behavior.
Key insight: Libraries are force multipliers. Learning to find, evaluate, and use the right library is as important as coding skills.
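As a small illustration of the XOR obfuscation mentioned under T1140, the sketch below brute-forces a single-byte XOR key by looking for a known marker in the output. Real samples often use longer or rolling keys; the function names, the key 0x5A, and the sample URL are assumptions made for this example.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)

def brute_force_single_byte_xor(data, marker):
    """Try all 256 keys; return (key, plaintext) pairs whose output contains marker."""
    hits = []
    for key in range(256):
        candidate = xor_bytes(data, key)
        if marker in candidate:
            hits.append((key, candidate))
    return hits

# Obfuscate a sample string, then recover it without knowing the key
obfuscated = xor_bytes(b"http://evil.com/c2", 0x5A)
for key, plaintext in brute_force_single_byte_xor(obfuscated, b"http"):
    print(f"key=0x{key:02x} -> {plaintext}")
```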
Guided Lab: File Integrity Checker
Let's build a tool that monitors files for changes using hashing—a simplified version of what OSSEC and Tripwire do.