CSY103 Week 11 - Practice security libraries before moving to reading resources.

Opening Framing: Don't Roll Your Own Crypto

One of the cardinal rules of security: never implement your own cryptography. The same applies to many security operations—hashing, encoding, parsing, and network protocols have subtle complexities that experts have already solved.

Python's ecosystem includes powerful libraries for security tasks. This week, you'll learn to use standard library modules for hashing and encoding, plus third-party libraries for common security operations.

Using established libraries means your code benefits from years of security review, bug fixes, and optimization. It also means other security professionals can understand your code because they know the same libraries.

Key insight: Professional security tools are built from libraries, not from scratch. Knowing which library to use—and how—is a core skill.

1) Hashing with hashlib

The hashlib module provides secure hash functions for integrity verification, password storage concepts, and IOC generation:

import hashlib

# MD5 hash (for file identification, NOT security)
data = b"Hello, Security World!"
md5_hash = hashlib.md5(data).hexdigest()
print(f"MD5: {md5_hash}")

# SHA-256 (secure, recommended)
sha256_hash = hashlib.sha256(data).hexdigest()
print(f"SHA-256: {sha256_hash}")

# SHA-1 (legacy, avoid for new applications)
sha1_hash = hashlib.sha1(data).hexdigest()
print(f"SHA-1: {sha1_hash}")

Hashing Files (Critical for Malware Analysis):

import hashlib

def hash_file(filepath, algorithm="sha256"):
    """Calculate hash of a file."""
    hash_obj = hashlib.new(algorithm)
    
    with open(filepath, "rb") as f:
        # Read in chunks for large files
        for chunk in iter(lambda: f.read(8192), b""):
            hash_obj.update(chunk)
    
    return hash_obj.hexdigest()

# Calculate multiple hashes
def get_file_hashes(filepath):
    """Get MD5, SHA1, and SHA256 of a file."""
    return {
        "md5": hash_file(filepath, "md5"),
        "sha1": hash_file(filepath, "sha1"),
        "sha256": hash_file(filepath, "sha256")
    }

# Usage
# hashes = get_file_hashes("suspicious_file.exe")
# print(hashes)

When to Use Which Hash:

MD5: File identification, deduplication (NOT security)
SHA-1: Legacy systems only, being phased out
SHA-256: Recommended for all new applications
SHA-512: When extra security margin needed

Key insight: MD5 is cryptographically broken but still useful for file identification. Use SHA-256 for anything security-critical.
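
A common use of SHA-256 is checking a file against a published digest. The sketch below is one way to do that; the helper names `sha256_of_file` and `verify_sha256` are hypothetical, and a temporary file stands in for a real download. Using `hmac.compare_digest` for the comparison is a defensive habit here, not a strict requirement for public digests.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(filepath, chunk_size=8192):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hash_obj = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()


def verify_sha256(filepath, expected_hex):
    """Hypothetical helper: True if the file matches the expected digest."""
    actual = sha256_of_file(filepath)
    # Normalize case and whitespace, since published digests vary in formatting
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Demo: a temporary file stands in for a downloaded artifact
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, Security World!")
    demo_path = tmp.name

expected = hashlib.sha256(b"Hello, Security World!").hexdigest()
matches = verify_sha256(demo_path, expected)
tampered = verify_sha256(demo_path, "0" * 64)
os.remove(demo_path)

print(f"Matches published digest: {matches}")
print(f"Matches wrong digest: {tampered}")
```

A mismatch only tells you the file differs from what the publisher hashed; it does not say why.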

2) Encoding and Decoding with base64

Base64 encoding converts binary data to ASCII text—essential for handling encoded payloads, analyzing obfuscated malware, and working with web data:

import base64

# Encode string to base64
original = "Malicious payload data"
encoded = base64.b64encode(original.encode()).decode()
print(f"Encoded: {encoded}")

# Decode base64 back to string
decoded = base64.b64decode(encoded).decode()
print(f"Decoded: {decoded}")

# Handle binary data
binary_data = b"\x00\x01\x02\xff\xfe"
encoded_binary = base64.b64encode(binary_data).decode()
print(f"Binary encoded: {encoded_binary}")

Security Application: Detecting Base64 Obfuscation

import base64
import re

def try_decode_base64(text):
    """Attempt to decode potential base64 strings."""
    # Base64 pattern: letters, numbers, +, /, = padding.
    # Minimum run of 16 characters so the short "Data" sample below also matches.
    b64_pattern = r"[A-Za-z0-9+/]{16,}={0,2}"
    
    findings = []
    for match in re.findall(b64_pattern, text):
        try:
            decoded = base64.b64decode(match).decode("utf-8", errors="ignore")
            # Check if decoded content is readable
            if decoded.isprintable() and len(decoded) > 5:
                findings.append({
                    "encoded": match[:50] + "..." if len(match) > 50 else match,
                    "decoded": decoded[:100]
                })
        except Exception:
            continue
    
    return findings

# Test with suspicious content
suspicious = """
Command: powershell -enc UG93ZXJTaGVsbCBpcyBhd2Vzb21l
Data: SGVsbG8gV29ybGQh
"""

results = try_decode_base64(suspicious)
for r in results:
    print(f"Found: {r['decoded']}")

Other Encodings:

import base64
import binascii

# URL-safe base64 (for URLs and filenames)
data = b"data with special chars?"
url_safe = base64.urlsafe_b64encode(data).decode()
print(f"URL-safe: {url_safe}")

# Hex encoding
hex_encoded = binascii.hexlify(b"Hello").decode()
print(f"Hex: {hex_encoded}")  # 48656c6c6f

# Hex decoding
original = binascii.unhexlify(hex_encoded)
print(f"Decoded: {original}")

Key insight: Base64 is encoding, NOT encryption. It provides no security—just format conversion. Attackers use it for obfuscation, not protection.
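
By default `base64.b64decode` silently discards characters outside the base64 alphabet, which can hide malformed input. The following is a minimal sketch of stricter decoding, assuming a hypothetical helper name `decode_base64_strict`; it restores stripped padding and uses `validate=True` to reject invalid characters.

```python
import base64
import binascii


def decode_base64_strict(text):
    """Return decoded bytes, or None if text is not valid standard base64."""
    # Restore padding that is often stripped in URLs, tokens, and logs
    padded = text + "=" * (-len(text) % 4)
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(padded, validate=True)
    except (binascii.Error, ValueError):
        return None


print(decode_base64_strict("SGVsbG8gV29ybGQh"))  # valid, already padded
print(decode_base64_strict("SGVsbG8"))           # valid once padding is restored
print(decode_base64_strict("not base64!"))       # rejected
```

Returning `None` instead of raising keeps the helper easy to use in a scanning loop, at the cost of losing the specific error.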

3) Working with URLs using urllib

The urllib module handles URL parsing, encoding, and manipulation—essential for analyzing web-based threats:

from urllib.parse import urlparse, parse_qs, urlencode, quote, unquote

# Parse a URL into components
url = "https://evil.com:8080/malware/download?file=payload.exe&id=12345"
parsed = urlparse(url)

print(f"Scheme: {parsed.scheme}")     # https
print(f"Host: {parsed.netloc}")       # evil.com:8080
print(f"Path: {parsed.path}")         # /malware/download
print(f"Query: {parsed.query}")       # file=payload.exe&id=12345

# Parse query parameters
params = parse_qs(parsed.query)
print(f"Parameters: {params}")
# {'file': ['payload.exe'], 'id': ['12345']}

URL Encoding/Decoding (for Attack Analysis):

from urllib.parse import quote, unquote, unquote_plus

# URL encoding (percent encoding)
payload = "<script>alert('XSS')</script>"
# safe="" also encodes "/", which quote() leaves untouched by default
encoded = quote(payload, safe="")
print(f"Encoded: {encoded}")
# %3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E

# URL decoding
encoded_attack = "%3Cscript%3Ealert%28%27XSS%27%29%3C%2Fscript%3E"
decoded = unquote(encoded_attack)
print(f"Decoded: {decoded}")

# Double-encoded attacks (evasion technique)
double_encoded = "%253Cscript%253E"  # %25 = %
first_decode = unquote(double_encoded)   # %3Cscript%3E
second_decode = unquote(first_decode)    # <script>
print(f"Double decoded: {second_decode}")
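
When you do not know how many encoding layers were applied, you can decode repeatedly until the value stops changing. This is a sketch under that assumption; `fully_unquote` and its `max_rounds` cap are hypothetical names chosen for illustration.

```python
from urllib.parse import unquote


def fully_unquote(value, max_rounds=5):
    """Percent-decode repeatedly until the value stops changing.

    Returns (decoded_value, rounds_applied).
    """
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
        rounds += 1
    return value, rounds


decoded, rounds = fully_unquote("%253Cscript%253E")
print(f"{decoded} after {rounds} round(s)")
```

The round count is itself a useful signal: legitimate traffic rarely needs more than one round of decoding.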

Extracting IOCs from URLs:

from urllib.parse import urlparse

def analyze_url(url):
    """Extract security-relevant information from URL."""
    try:
        parsed = urlparse(url)
        
        analysis = {
            "full_url": url,
            "scheme": parsed.scheme,
            # hostname drops any port and any user:pass@ prefix
            "domain": parsed.hostname,
            "port": parsed.port or (443 if parsed.scheme == "https" else 80),
            "path": parsed.path,
            "has_query": bool(parsed.query),
            "suspicious_indicators": []
        }
        
        # Check for suspicious patterns
        if parsed.port and parsed.port not in [80, 443, 8080, 8443]:
            analysis["suspicious_indicators"].append(f"Unusual port: {parsed.port}")
        
        if "@" in parsed.netloc:
            analysis["suspicious_indicators"].append("Contains @ (possible credential phishing)")
        
        # Lowercase first so ".EXE" and ".Exe" are caught too
        if parsed.path.lower().endswith((".exe", ".dll", ".ps1", ".bat")):
            analysis["suspicious_indicators"].append("Executable file extension in path")
        
        return analysis
    except Exception as e:
        return {"error": str(e)}

# Test
result = analyze_url("http://evil.com:4444/update.exe?token=abc")
print(result)

Key insight: URL analysis reveals attack infrastructure. Parsing URLs programmatically helps identify phishing, malware delivery, and C2.

4) Date and Time with datetime

Security events have timestamps. The datetime module handles parsing, formatting, and calculating time differences:

from datetime import datetime, timedelta, timezone
import time

# Current time
now = datetime.now()
print(f"Current: {now}")

# UTC time (preferred for logs). This is timezone-aware;
# datetime.utcnow() is deprecated as of Python 3.12.
utc_now = datetime.now(timezone.utc)
print(f"UTC: {utc_now}")

# Unix timestamp
timestamp = time.time()
print(f"Unix timestamp: {timestamp}")

# Convert timestamp to datetime
dt = datetime.fromtimestamp(timestamp)
print(f"From timestamp: {dt}")

Parsing Log Timestamps:

from datetime import datetime

# Common log formats
formats = {
    "iso": "%Y-%m-%dT%H:%M:%S",
    "apache": "%d/%b/%Y:%H:%M:%S",
    "syslog": "%b %d %H:%M:%S",  # no year in this format; strptime defaults to 1900
    "windows": "%m/%d/%Y %H:%M:%S"
}

# Parse different formats
iso_time = "2024-01-15T09:23:45"
dt = datetime.strptime(iso_time, formats["iso"])
print(f"Parsed ISO: {dt}")

apache_time = "15/Jan/2024:09:23:45"
dt = datetime.strptime(apache_time, formats["apache"])
print(f"Parsed Apache: {dt}")

# Format for output
formatted = dt.strftime("%Y-%m-%d %H:%M:%S")
print(f"Formatted: {formatted}")
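
Real logs often mix formats, so a common approach is to try each known format in turn. The sketch below assumes a hypothetical helper `parse_timestamp` and reuses three of the format strings above. The syslog format is left out because it carries no year and needs one supplied separately.

```python
from datetime import datetime

# Format strings taken from the examples above; the list name is illustrative
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",   # ISO 8601 without timezone
    "%d/%b/%Y:%H:%M:%S",   # Apache access log
    "%m/%d/%Y %H:%M:%S",   # Windows-style
]


def parse_timestamp(text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


for raw in ["2024-01-15T09:23:45", "15/Jan/2024:09:23:45", "not a timestamp"]:
    print(f"{raw!r} -> {parse_timestamp(raw)}")
```

Order matters when formats are ambiguous (for example day/month versus month/day), so list the format you expect most often first.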

Time-Based Analysis:

from datetime import datetime, timedelta

def analyze_event_timing(events):
    """Analyze timing patterns in security events."""
    if len(events) < 2:
        return None
    
    timestamps = [datetime.fromisoformat(e["timestamp"]) for e in events]
    timestamps.sort()
    
    # Calculate intervals
    intervals = []
    for i in range(1, len(timestamps)):
        delta = (timestamps[i] - timestamps[i-1]).total_seconds()
        intervals.append(delta)
    
    avg_interval = sum(intervals) / len(intervals)
    min_interval = min(intervals)
    
    analysis = {
        "total_events": len(events),
        "time_span": str(timestamps[-1] - timestamps[0]),
        "avg_interval_seconds": round(avg_interval, 2),
        "min_interval_seconds": round(min_interval, 2)
    }
    
    # Detect rapid-fire events (possible automation/attack)
    if min_interval < 1:
        analysis["alert"] = "Sub-second intervals detected - possible automated attack"
    
    return analysis

Key insight: Time analysis reveals attack patterns. Rapid-fire events suggest automation; activity at unusual hours can point to an attacker in another time zone or to misused credentials, and is worth a closer look.
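
To look for activity at unusual hours, you can count events per hour and flag those outside an expected window. This is a minimal sketch: the 08:00 to 18:00 window, the helper name `events_outside_hours`, and the sample events are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def events_outside_hours(events, start_hour=8, end_hour=18):
    """Return events whose hour falls outside [start_hour, end_hour)."""
    flagged = []
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        if not (start_hour <= ts.hour < end_hour):
            flagged.append(event)
    return flagged


# Made-up sample data
sample_events = [
    {"timestamp": "2024-01-15T09:23:45", "user": "alice"},
    {"timestamp": "2024-01-15T14:02:10", "user": "bob"},
    {"timestamp": "2024-01-16T03:17:00", "user": "alice"},
]

by_hour = Counter(datetime.fromisoformat(e["timestamp"]).hour for e in sample_events)
off_hours = events_outside_hours(sample_events)

print(f"Events per hour: {dict(by_hour)}")
print(f"Off-hours events: {off_hours}")
```

An off-hours event is a prompt for investigation, not proof of compromise; shift workers and scheduled jobs produce them too.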

5) Command-Line Arguments with argparse

Professional tools accept command-line arguments. The argparse module makes your scripts configurable and user-friendly:

import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Security Log Analyzer"
    )
    
    # Required argument
    parser.add_argument(
        "logfile",
        help="Path to log file to analyze"
    )
    
    # Optional arguments
    parser.add_argument(
        "-o", "--output",
        default="report.txt",
        help="Output file path (default: report.txt)"
    )
    
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Enable verbose output"
    )
    
    parser.add_argument(
        "--threshold",
        type=int,
        default=5,
        help="Alert threshold (default: 5)"
    )
    
    args = parser.parse_args()
    
    print(f"Analyzing: {args.logfile}")
    print(f"Output to: {args.output}")
    print(f"Verbose: {args.verbose}")
    print(f"Threshold: {args.threshold}")

if __name__ == "__main__":
    main()

Usage Examples:

# Basic usage
python analyzer.py access.log

# With options
python analyzer.py access.log -o results.txt -v --threshold 10

# Help
python analyzer.py --help

Security Tool Pattern:

import argparse

parser = argparse.ArgumentParser(description="IOC Scanner")

parser.add_argument("target", help="File or directory to scan")
parser.add_argument("-t", "--type", choices=["ip", "hash", "domain", "all"],
                    default="all", help="IOC type to extract")
parser.add_argument("-f", "--format", choices=["json", "csv", "txt"],
                    default="txt", help="Output format")
parser.add_argument("--defang", action="store_true",
                    help="Defang IOCs in output")

args = parser.parse_args()

Key insight: Command-line arguments transform scripts into flexible tools. Users can customize behavior without editing code.
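
The `--defang` flag above is declared but not yet wired up. The sketch below shows one way it might be used, assuming a hypothetical `defang` helper that follows one common convention (`hxxp` and `[.]`); the sample IOCs are made up. Passing a list to `parse_args()` lets you try the parser without a shell.

```python
import argparse
import re


def defang(ioc):
    """Rewrite an IOC so it is no longer clickable or resolvable as-is."""
    ioc = re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE)
    return ioc.replace(".", "[.]")


def build_parser():
    parser = argparse.ArgumentParser(description="IOC Scanner (sketch)")
    parser.add_argument("target", help="File or directory to scan")
    parser.add_argument("--defang", action="store_true",
                        help="Defang IOCs in output")
    return parser


# Simulate: python scanner.py samples/ --defang
args = build_parser().parse_args(["samples/", "--defang"])

iocs = ["http://evil.com/update.exe", "192.0.2.10"]
output = [defang(i) if args.defang else i for i in iocs]
print(output)
```

Keeping the parser in its own function makes it easy to test argument handling separately from the scanning logic.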

Real-World Context: Libraries in Security Tools

Every major security tool leverages libraries:

Volatility (Memory Forensics): Uses hashlib for integrity checks, struct for binary parsing, and numerous specialized libraries for memory analysis.

Scapy (Network Analysis): Built on Python's socket library, uses struct for packet parsing, and integrates with cryptographic libraries for protocol analysis.

YARA-Python: Wraps the YARA library to provide Pythonic access to pattern matching—a library wrapping a library.

MITRE ATT&CK Reference: Technique T1140 (Deobfuscate/Decode Files or Information) involves base64, XOR, and other encoding. Understanding these libraries helps you both decode attacker payloads and analyze malware behavior.
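
As an illustration of layered obfuscation, the sketch below XORs a made-up string with a single-byte key, base64-encodes it, and then reverses both layers. The helper names, the key `0x5A`, and the sample text are assumptions for this example, not taken from any real sample.

```python
import base64


def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


# Made-up sample: text XORed with 0x5A, then base64-encoded
plaintext = b"cmd.exe /c whoami"
obfuscated = base64.b64encode(xor_bytes(plaintext, 0x5A)).decode()

# Decoding peels the layers off in reverse order
recovered = xor_bytes(base64.b64decode(obfuscated), 0x5A)
print(recovered)


def brute_force_xor(data, marker):
    """Try all 256 single-byte keys; return keys whose output contains marker."""
    return [key for key in range(256) if marker in xor_bytes(data, key)]


# When the key is unknown, search for a string you expect in the output
candidate_keys = brute_force_xor(base64.b64decode(obfuscated), b"cmd")
print(candidate_keys)
```

With only 256 possible keys, single-byte XOR offers no real protection; like base64, it is obfuscation.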

Key insight: Libraries are force multipliers. Learning to find, evaluate, and use the right library is as important as coding skills.

Guided Lab: File Integrity Checker

Let's build a tool that monitors files for changes using hashing—a simplified version of what OSSEC and Tripwire do.

Step 1: Create the Integrity Checker

Create integrity_checker.py:

#!/usr/bin/env python3
"""
File Integrity Checker
Monitors files for unauthorized changes using cryptographic hashes
"""

import hashlib
import json
import os
import argparse
from datetime import datetime
from pathlib import Path


def calculate_hash(filepath, algorithm="sha256"):
    """Calculate hash of a file."""
    hash_obj = hashlib.new(algorithm)
    try:
        with open(filepath, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                hash_obj.update(chunk)
        return hash_obj.hexdigest()
    # IOError is an alias of OSError in Python 3; missing or unreadable files yield None
    except OSError:
        return None


def scan_directory(directory, extensions=None):
    """Scan directory and hash all files."""
    results = {}
    directory = Path(directory)
    
    for filepath in directory.rglob("*"):
        if filepath.is_file():
            if extensions and filepath.suffix.lower() not in extensions:
                continue
            
            file_hash = calculate_hash(filepath)
            if file_hash:
                results[str(filepath)] = {
                    "hash": file_hash,
                    "size": filepath.stat().st_size,
                    "modified": datetime.fromtimestamp(
                        filepath.stat().st_mtime
                    ).isoformat()
                }
    
    return results


def save_baseline(data, output_file="baseline.json"):
    """Save baseline hashes to file."""
    baseline = {
        "created": datetime.now().isoformat(),
        "files": data
    }
    with open(output_file, "w") as f:
        json.dump(baseline, f, indent=2)
    print(f"Baseline saved: {len(data)} files to {output_file}")


def check_integrity(baseline_file="baseline.json"):
    """Check current files against baseline."""
    try:
        with open(baseline_file, "r") as f:
            baseline = json.load(f)
    except FileNotFoundError:
        print(f"Error: Baseline file not found: {baseline_file}")
        return None
    
    results = {
        "checked_at": datetime.now().isoformat(),
        "baseline_from": baseline["created"],
        "modified": [],
        "deleted": [],
        "new": []
    }
    
    current_files = set()
    
    for filepath, info in baseline["files"].items():
        current_hash = calculate_hash(filepath)
        current_files.add(filepath)
        
        if current_hash is None:
            results["deleted"].append(filepath)
        elif current_hash != info["hash"]:
            results["modified"].append({
                "file": filepath,
                "old_hash": info["hash"][:16] + "...",
                "new_hash": current_hash[:16] + "..."
            })
    
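    # Flag files present on disk but absent from the baseline. The scanned
    # directory is not stored in the baseline, so it is approximated by the
    # common parent directory of the baselined files.
    if baseline["files"]:
        parents = [os.path.dirname(p) or "." for p in baseline["files"]]
        root = Path(os.path.commonpath(parents))
        for path in root.rglob("*"):
            if path.is_file() and str(path) not in current_files:
                results["new"].append(str(path))

    return results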


def main():
    parser = argparse.ArgumentParser(
        description="File Integrity Checker"
    )
    
    subparsers = parser.add_subparsers(dest="command", help="Commands")
    
    # Baseline command
    baseline_parser = subparsers.add_parser("baseline", help="Create baseline")
    baseline_parser.add_argument("directory", help="Directory to scan")
    baseline_parser.add_argument("-o", "--output", default="baseline.json")
    
    # Check command
    check_parser = subparsers.add_parser("check", help="Check integrity")
    check_parser.add_argument("-b", "--baseline", default="baseline.json")
    
    args = parser.parse_args()
    
    if args.command == "baseline":
        print(f"Scanning {args.directory}...")
        files = scan_directory(args.directory)
        save_baseline(files, args.output)
    
    elif args.command == "check":
        print("Checking file integrity...")
        results = check_integrity(args.baseline)
        
        if results:
            if results["modified"]:
                print(f"\n[!] MODIFIED FILES ({len(results['modified'])}):")
                for f in results["modified"]:
                    print(f"    {f['file']}")
            
            if results["deleted"]:
                print(f"\n[!] DELETED FILES ({len(results['deleted'])}):")
                for f in results["deleted"]:
                    print(f"    {f}")
            
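            if results["new"]:
                print(f"\n[!] NEW FILES ({len(results['new'])}):")
                for f in results["new"]:
                    print(f"    {f}")

            if not (results["modified"] or results["deleted"] or results["new"]):
                print("\n[+] All files intact. No changes detected.")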
    
    else:
        parser.print_help()


if __name__ == "__main__":
    main()

Step 2: Test the Tool

# Create baseline
python integrity_checker.py baseline ./test_folder

# Modify a file, then check
python integrity_checker.py check

Step 3: Reflection (mandatory)

Why use SHA-256 instead of MD5 for integrity checking?
How does argparse improve the tool's usability?
What would you add to make this production-ready?
How could an attacker try to evade this type of monitoring?

Week 11 Outcome Check

By the end of this week, you should be able to:

Calculate file hashes with hashlib
Encode and decode base64 data
Parse and analyze URLs with urllib
Work with timestamps using datetime
Build command-line tools with argparse
Choose appropriate libraries for security tasks

Next week: Capstone Project—where you'll combine everything into a complete security tool of your own design.

🎯 Hands-On Labs (Free & Essential)

Practice security libraries before moving to reading resources.

🎮 TryHackMe: Intro to Cryptography

What you'll do: Work with hashing, encoding, and crypto basics in guided exercises.
Why it matters: Security libraries are the safe way to handle sensitive operations.
Time estimate: 1.5-2 hours

Start TryHackMe Cryptography →

📝 Lab Exercise: Hash + Encode Toolkit

Task: Write a script that hashes files and base64-encodes strings safely.
Deliverable: CLI tool with `--hash` and `--encode` options.
Why it matters: Hashing and encoding are fundamental operations in security workflows.
Time estimate: 60-90 minutes

🏁 PicoCTF Practice: Cryptography (Beginner)

What you'll do: Solve beginner crypto challenges involving hashes and encodings.
Why it matters: Crypto basics reinforce library usage and data handling.
Time estimate: 1-2 hours

Start PicoCTF Cryptography →

💡 Lab Tip: Use library functions (hashlib, base64) instead of custom implementations.

🛡️ Secure Coding: Safe Crypto Usage

Cryptography is easy to misuse. Defensive code sticks to proven libraries and safe defaults.

Crypto safety checklist:
- Never implement your own crypto
- Use modern algorithms (SHA-256 for integrity checks; bcrypt or Argon2 for password hashing)
- Store keys securely and rotate regularly
- Separate encoding (base64) from encryption
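
bcrypt and Argon2 require third-party packages, so the sketch below swaps in PBKDF2-HMAC-SHA256 from the standard library to show the same idea: a salted, deliberately slow hash for passwords. The helper names and the iteration count are assumptions; check current guidance before picking a work factor.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to current guidance and hardware


def hash_password(password, salt=None):
    """Derive a key from a password with PBKDF2-HMAC-SHA256.

    Returns (salt, derived_key); store both alongside the iteration count.
    """
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the key and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, stored_key = hash_password("correct horse battery staple")
ok_result = verify_password("correct horse battery staple", salt, stored_key)
bad_result = verify_password("wrong guess", salt, stored_key)
print(ok_result)
print(bad_result)
```

A plain `hashlib.sha256(password)` is too fast for this purpose: speed helps an attacker guess passwords offline.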

📚 Building on CSY101 Week-14: Align crypto usage with documented standards and controls.

Resources

Complete the required resources to build your foundation.

Python Docs - hashlib · 20-30 min · 50 XP · Resource ID: csy103_w11_r1 (Required)
Python Docs - argparse Tutorial · 30-45 min · 50 XP · Resource ID: csy103_w11_r2 (Required)
Real Python - Unicode and Encodings · 45-60 min · 25 XP · Resource ID: csy103_w11_r3 (Optional)

Lab: Malware Sample Analyzer

Goal: Build a tool that performs basic static analysis on files, extracting security-relevant information.

Linux/Windows Path (same for both)

Create sample_analyzer.py
Implement these features:
- Calculate MD5, SHA1, SHA256 hashes
- Extract printable strings (like the strings command)
- Detect base64-encoded content and decode it
- Identify URLs, IPs, and email addresses in strings
- Generate a JSON report with all findings
Use argparse for command-line interface
Test on sample text files (create your own test samples)

Deliverable (submit):

Your sample_analyzer.py script
Sample files used for testing
Example JSON output from analysis
One paragraph: How would this help in malware triage?

Checkpoint Questions

Why should you never implement your own cryptography?
What is the difference between MD5 and SHA-256 in terms of security?
Is base64 encoding a form of encryption? Why or why not?
How do you parse query parameters from a URL in Python?
What is the purpose of argparse?
How do you handle different timestamp formats in log analysis?

Weekly Reflection

Reflection Prompt (200-300 words):

This week you learned to leverage Python's security-relevant libraries. These libraries encode decades of expertise into reusable components that make your tools more reliable and your code more readable.

Reflect on these questions:

How does using established libraries improve the security of your code?
Think of a security task you've done manually. Which libraries could help automate it?
What's the trade-off between using libraries vs. writing custom code?
How would you evaluate whether a third-party library is safe to use?

A strong reflection will consider both the benefits and responsibilities of depending on external code in security tools.

Verified Resources & Videos

Python Standard Library: Python Docs - Standard Library
Base64 Encoding: Python Docs - base64 Module
Security perspective (MITRE ATT&CK): MITRE ATT&CK — Deobfuscate/Decode Files (T1140)

Libraries are tools built by experts for everyone. Knowing which library to use—and when—separates script writers from tool builders. Next week: your capstone project brings everything together.