CSY105 Week 10 - Week Content

Week Overview

This week introduces malware analysis techniques: hashing, strings extraction, PE parsing, YARA rules, and safe sandbox workflows.

Understand static vs dynamic analysis workflows
Extract hashes and metadata for triage
Parse PE headers to identify suspicious attributes
Write YARA rules for IOC detection
Submit hashes to VirusTotal safely

⚠️ Safety Warning: Never analyze real malware on production systems. Use isolated VMs with snapshots, no shared folders, and no internet unless strictly needed for API lookups.

Real-World Context: Malware triage is a daily task in SOCs and incident response teams. Analysts rely on fast static indicators before executing deeper dynamic analysis.

Section 1: Malware Analysis Workflow

Static vs Dynamic

Static analysis inspects files without executing them. Dynamic analysis runs samples in controlled sandboxes to observe behavior.

Safe Lab Setup Checklist

Isolated VM with snapshot capability
Host-only networking or simulated internet
No shared clipboard or folders
Logging enabled for file, process, and registry changes

Analysis Pipeline

1) Hash file (MD5, SHA256)
2) Extract strings and file metadata
3) Parse PE headers (imports, sections)
4) Identify suspicious IOCs
5) Run in sandbox (if required)
6) Document findings and write YARA rule

Common Analysis Artifacts

Hash list (MD5/SHA256)
IOC list (IPs, domains, URLs)
Behavior report from sandbox

Section 2: Hashing & Metadata

Hash Calculator

#!/usr/bin/env python3
"""
Calculate MD5, SHA1, and SHA256 hashes for a file.
"""
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Dict


def hash_file(path: str, chunk_size: int = 8192) -> Dict[str, str]:
    """
    Return hashes for a given file path.
    """
    hashes = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }

    file_path = Path(path)
    if not file_path.exists():
        raise FileNotFoundError(f"File not found: {path}")

    with file_path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            for algo in hashes.values():
                algo.update(chunk)

    return {name: algo.hexdigest() for name, algo in hashes.items()}


if __name__ == "__main__":
    results = hash_file("sample.exe")
    for name, value in results.items():
        print(f"{name}: {value}")

File Metadata Inspector

#!/usr/bin/env python3
"""
Gather metadata like size, timestamps, and entropy.
"""
from __future__ import annotations

import math
from pathlib import Path
from typing import Dict


def file_entropy(data: bytes) -> float:
    """
    Calculate Shannon entropy for binary data.
    """
    if not data:
        return 0.0
    freq = [0] * 256
    for byte in data:
        freq[byte] += 1

    entropy = 0.0
    for count in freq:
        if count == 0:
            continue
        p = count / len(data)
        entropy -= p * math.log2(p)
    return entropy


def inspect_file(path: str) -> Dict[str, str]:
    """
    Return basic metadata and entropy.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    stats = file_path.stat()
    return {
        "size": str(stats.st_size),
        "created": str(stats.st_ctime),
        "modified": str(stats.st_mtime),
        "entropy": f"{file_entropy(data):.2f}",
    }


if __name__ == "__main__":
    print(inspect_file("sample.exe"))

Local Hash Database Lookup

#!/usr/bin/env python3
"""
Compare a hash against a local known-bad list.
"""
from __future__ import annotations

import csv
from typing import Dict


def load_hash_db(path: str) -> Dict[str, str]:
    """
    Load CSV of sha256 -> label.
    """
    entries = {}
    with open(path, "r", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            entries[row["sha256"]] = row.get("label", "unknown")
    return entries


def check_hash(sha256: str, db: Dict[str, str]) -> str:
    """
    Return label if hash is known.
    """
    return db.get(sha256, "not_found")


if __name__ == "__main__":
    database = load_hash_db("hash_db.csv")
    print(check_hash("abc123", database))

Hash Algorithm Quick Reference

MD5: Fast, legacy; prone to collisions
SHA1: Legacy, still seen in old systems
SHA256: Modern standard for malware triage

Section 3: Strings & IOC Extraction

Strings Extractor

#!/usr/bin/env python3
"""
Extract printable strings from a binary.
"""
from __future__ import annotations

import re
from pathlib import Path
from typing import List


def extract_strings(path: str, min_len: int = 4) -> List[str]:
    """
    Return list of ASCII strings from binary file.
    """
    data = Path(path).read_bytes()
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return [match.decode(errors="ignore") for match in pattern.findall(data)]


if __name__ == "__main__":
    for s in extract_strings("sample.exe")[:50]:
        print(s)

IOC Extractor (IPs, Domains, URLs)

#!/usr/bin/env python3
"""
Extract IPs, domains, and URLs from strings output.
"""
from __future__ import annotations

import re
from typing import Dict, List


IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_PATTERN = re.compile(r"\b[a-zA-Z0-9.-]+\.(com|net|org|ru|cn|info)\b")
URL_PATTERN = re.compile(r"https?://[^\s\"']+")


def extract_iocs(strings: List[str]) -> Dict[str, List[str]]:
    """
    Extract IOCs from list of strings.
    """
    ips = []
    domains = []
    urls = []
    for line in strings:
        ips.extend(IP_PATTERN.findall(line))
        domains.extend(DOMAIN_PATTERN.findall(line))
        urls.extend(URL_PATTERN.findall(line))

    return {
        "ips": sorted(set(ips)),
        "domains": sorted(set(domains)),
        "urls": sorted(set(urls)),
    }


if __name__ == "__main__":
    data = extract_strings("sample.exe")
    print(extract_iocs(data))

Keyword Filter for Triage

#!/usr/bin/env python3
"""
Flag suspicious keywords in extracted strings.
"""
from __future__ import annotations

from typing import List


KEYWORDS = [
    "powershell",
    "cmd.exe",
    "reg add",
    "schtasks",
    "bitcoin",
    "keylogger",
]


def find_keywords(strings: List[str]) -> List[str]:
    """
    Return matching keywords found in strings.
    """
    matches = set()
    for line in strings:
        lower = line.lower()
        for keyword in KEYWORDS:
            if keyword in lower:
                matches.add(keyword)
    return sorted(matches)


if __name__ == "__main__":
    hits = find_keywords(["powershell -enc", "normal text"])
    print(hits)

Sample Keyword Output

['powershell']

Section 4: PE File Parsing (Windows)

PE Metadata Extractor

#!/usr/bin/env python3
"""
Extract PE header information using pefile.
"""
from __future__ import annotations

from typing import Dict
import pefile


def parse_pe(path: str) -> Dict[str, str]:
    """
    Parse PE headers and return key metadata.
    """
    pe = pefile.PE(path)
    return {
        "entry_point": hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
        "image_base": hex(pe.OPTIONAL_HEADER.ImageBase),
        "timestamp": str(pe.FILE_HEADER.TimeDateStamp),
        "sections": str(len(pe.sections)),
        "imports": ", ".join({entry.dll.decode(errors="ignore") for entry in pe.DIRECTORY_ENTRY_IMPORT}),
    }


if __name__ == "__main__":
    print(parse_pe("sample.exe"))

Suspicious Indicators in PE Files

High entropy sections (packed or encrypted)
Unusual section names (.xyz, .crypt)
Imports from networking or process injection APIs
Missing or invalid digital signatures

Section 5: PE Sections & Entropy Profiling

Section Entropy Analyzer

#!/usr/bin/env python3
"""
Calculate entropy for each PE section.
"""
from __future__ import annotations

import math
from typing import Dict, List

import pefile


def entropy(data: bytes) -> float:
    """
    Shannon entropy for a section.
    """
    if not data:
        return 0.0
    freq = [0] * 256
    for byte in data:
        freq[byte] += 1
    result = 0.0
    for count in freq:
        if count == 0:
            continue
        p = count / len(data)
        result -= p * math.log2(p)
    return result


def section_entropy(path: str) -> List[Dict[str, str]]:
    """
    Return section entropy list.
    """
    pe = pefile.PE(path)
    results = []
    for section in pe.sections:
        name = section.Name.decode(errors="ignore").strip("\x00")
        data = section.get_data()
        results.append({
            "name": name,
            "size": str(section.SizeOfRawData),
            "entropy": f"{entropy(data):.2f}",
        })
    return results


if __name__ == "__main__":
    for entry in section_entropy("sample.exe"):
        print(entry)

Typical Entropy Ranges

Low (0-4): Mostly empty or highly repetitive
Normal (4-7): Typical compiled code/data
High (7-8): Packed, encrypted, or compressed

Section 6: Import Table Analysis

Extract Suspicious API Imports

#!/usr/bin/env python3
"""
Identify risky Windows API imports from a PE file.
"""
from __future__ import annotations

from typing import Dict, List
import pefile


SUSPICIOUS_APIS = {
    "CreateRemoteThread",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "WinExec",
    "ShellExecuteA",
    "URLDownloadToFileA",
    "InternetOpenA",
    "InternetConnectA",
}


def extract_imports(path: str) -> Dict[str, List[str]]:
    """
    Return a dict of DLLs to imported functions.
    """
    pe = pefile.PE(path)
    imports = {}
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll_name = entry.dll.decode(errors="ignore")
        funcs = []
        for imp in entry.imports:
            if imp.name:
                funcs.append(imp.name.decode(errors="ignore"))
        imports[dll_name] = funcs
    return imports


def find_suspicious(imports: Dict[str, List[str]]) -> List[str]:
    """
    Return list of suspicious API imports.
    """
    hits = []
    for funcs in imports.values():
        for func in funcs:
            if func in SUSPICIOUS_APIS:
                hits.append(func)
    return sorted(set(hits))


if __name__ == "__main__":
    imports = extract_imports("sample.exe")
    print(find_suspicious(imports))

Section 7: Strings Context & IOC Scoring

Strings with Offset Context

#!/usr/bin/env python3
"""
Extract strings with byte offsets for context.
"""
from __future__ import annotations

import re
from pathlib import Path
from typing import List, Tuple


def strings_with_offsets(path: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """
    Return list of (offset, string).
    """
    data = Path(path).read_bytes()
    pattern = re.compile(rb"[\\x20-\\x7E]{%d,}" % min_len)
    results = []
    for match in pattern.finditer(data):
        results.append((match.start(), match.group().decode(errors="ignore")))
    return results


if __name__ == "__main__":
    for offset, value in strings_with_offsets("sample.exe")[:20]:
        print(hex(offset), value)

IOC Scoring Heuristic

#!/usr/bin/env python3
"""
Score IOCs based on risky patterns.
"""
from __future__ import annotations

from typing import Dict, List


def score_iocs(iocs: Dict[str, List[str]]) -> Dict[str, int]:
    """
    Assign a basic severity score.
    """
    score = 0
    score += len(iocs.get("ips", [])) * 3
    score += len(iocs.get("domains", [])) * 2
    score += len(iocs.get("urls", [])) * 4
    return {"score": score}


if __name__ == "__main__":
    sample = {"ips": ["203.0.113.50"], "domains": ["evil.example"], "urls": ["http://evil.example/payload"]}
    print(score_iocs(sample))

Section 8: YARA Rule Engineering

Composite YARA Rule Example

rule Suspicious_Downloader
{
    meta:
        author = "CSY105"
        description = "Detects typical downloader behavior"
    strings:
        $s1 = "URLDownloadToFileA" nocase
        $s2 = "User-Agent" nocase
        $s3 = "powershell" nocase
        $hex = { 68 74 74 70 3A 2F 2F }  // "http://"
    condition:
        2 of ($s*) and $hex
}

YARA Rule Linting Checklist

Use distinct strings to reduce false positives
Include metadata (author, description, date)
Prefer hex patterns for obfuscated binaries
Test on known-good samples for false positives

Section 9: YARA Rules

Simple YARA Rule

rule Suspicious_Powershell
{
    meta:
        author = "CSY105"
        description = "Detects base64 PowerShell usage"
    strings:
        $ps = "powershell" nocase
        $enc = "-enc" nocase
    condition:
        $ps and $enc
}

YARA Scanner Script

#!/usr/bin/env python3
"""
Run YARA rules against a file.
"""
from __future__ import annotations

import yara
from typing import List


def scan_file(rule_path: str, target_path: str) -> List[str]:
    """
    Return list of matching rule names.
    """
    rules = yara.compile(filepath=rule_path)
    matches = rules.match(target_path)
    return [match.rule for match in matches]


if __name__ == "__main__":
    results = scan_file("rules.yar", "sample.exe")
    print(results)

Section 10: Sandbox & Dynamic Analysis (Safe)

Behavior Logging Stub

#!/usr/bin/env python3
"""
Simulate behavior logging for a sandboxed sample.
"""
from __future__ import annotations

import json
from datetime import datetime
from typing import Dict, List


def log_event(events: List[Dict[str, str]], event_type: str, detail: str) -> None:
    """
    Add a behavior event to the list.
    """
    events.append({
        "timestamp": datetime.utcnow().isoformat(),
        "event_type": event_type,
        "detail": detail,
    })


def simulate_sandbox_run() -> List[Dict[str, str]]:
    """
    Simulate malware behavior events for reporting.
    """
    events = []
    log_event(events, "process", "Spawned powershell.exe with encoded command")
    log_event(events, "file", "Created C:\\Users\\Public\\temp.bin")
    log_event(events, "network", "Outbound TCP to 203.0.113.50:443")
    log_event(events, "registry", "Modified HKCU\\Software\\Microsoft\\Run")
    return events


def export_report(events: List[Dict[str, str]], output_path: str) -> None:
    """
    Save behavior report to JSON.
    """
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(events, handle, indent=2)


if __name__ == "__main__":
    report = simulate_sandbox_run()
    export_report(report, "behavior_report.json")

Section 11: VirusTotal Integration

Hash Lookup (No Upload)

#!/usr/bin/env python3
"""
Query VirusTotal for a hash without uploading the file.
"""
from __future__ import annotations

import os
import requests
from typing import Dict

VT_URL = "https://www.virustotal.com/api/v3/files/{}"


def vt_hash_lookup(file_hash: str, api_key: str) -> Dict[str, str]:
    """
    Query VirusTotal by hash.
    """
    headers = {"x-apikey": api_key}
    response = requests.get(VT_URL.format(file_hash), headers=headers, timeout=15)
    response.raise_for_status()
    data = response.json()
    stats = data.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    return {k: str(v) for k, v in stats.items()}


if __name__ == "__main__":
    key = os.getenv("VT_API_KEY", "")
    if not key:
        raise SystemExit("Missing VT_API_KEY")
    print(vt_hash_lookup("44d88612fea8a8f36de82e1278abb02f", key))

Section 12: Reporting

Structured Report Generator

#!/usr/bin/env python3
"""
Generate a markdown report from analysis artifacts.
"""
from __future__ import annotations

import json
from pathlib import Path
from typing import Dict


def generate_report(hashes: Dict[str, str], meta: Dict[str, str], iocs: Dict[str, list], output_path: str) -> None:
    """
    Write a markdown summary report.
    """
    lines = ["# Malware Triage Report", "", "## Hashes"]
    for name, value in hashes.items():
        lines.append(f"- {name}: `{value}`")

    lines.extend(["", "## Metadata"])
    for key, value in meta.items():
        lines.append(f"- {key}: {value}")

    lines.extend(["", "## Indicators of Compromise (IOCs)"])
    lines.append(f"- IPs: {', '.join(iocs.get('ips', []))}")
    lines.append(f"- Domains: {', '.join(iocs.get('domains', []))}")
    lines.append(f"- URLs: {', '.join(iocs.get('urls', []))}")

    Path(output_path).write_text("\n".join(lines))


if __name__ == "__main__":
    hashes = {"sha256": "abc123"}
    meta = {"size": "120000", "entropy": "6.8"}
    iocs = {"ips": ["203.0.113.50"], "domains": ["evil.example"], "urls": []}
    generate_report(hashes, meta, iocs, "report.md")

Section 13: Registry & File System Diffing

Registry Snapshot Diff (Windows Export)

#!/usr/bin/env python3
"""
Diff two registry export files (reg.txt) to find changes.
"""
from __future__ import annotations

from pathlib import Path
from typing import List, Set


def load_lines(path: str) -> Set[str]:
    """
    Load registry export lines into a set.
    """
    return set(Path(path).read_text(encoding="utf-8").splitlines())


def diff_registry(before_path: str, after_path: str) -> List[str]:
    """
    Return lines that appear only in the after snapshot.
    """
    before = load_lines(before_path)
    after = load_lines(after_path)
    changes = sorted(after - before)
    return changes


if __name__ == "__main__":
    for line in diff_registry("reg_before.txt", "reg_after.txt"):
        print(line)

File System Snapshot Diff

#!/usr/bin/env python3
"""
Compare file listings before and after execution.
"""
from __future__ import annotations

from pathlib import Path
from typing import Dict


def snapshot(directory: str) -> Dict[str, int]:
    """
    Return dict of file path to size.
    """
    base = Path(directory)
    result = {}
    for path in base.rglob("*"):
        if path.is_file():
            result[str(path)] = path.stat().st_size
    return result


def diff_snapshots(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """
    Return new files and size changes.
    """
    changes = {}
    for path, size in after.items():
        if path not in before or before[path] != size:
            changes[path] = size
    return changes


if __name__ == "__main__":
    before = snapshot("C:/Sandbox/before")
    after = snapshot("C:/Sandbox/after")
    print(diff_snapshots(before, after))

Section 14: Process Tree Analysis

Parse Process Tree Logs

#!/usr/bin/env python3
"""
Parse process creation logs and build a parent-child tree.
"""
from __future__ import annotations

import json
from collections import defaultdict
from typing import Dict, List


def load_process_log(path: str) -> List[Dict[str, str]]:
    """
    Load JSON process logs.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def build_tree(events: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """
    Build parent-to-children mapping.
    """
    tree = defaultdict(list)
    for event in events:
        parent = event.get("parent_process", "unknown")
        child = event.get("process", "unknown")
        tree[parent].append(child)
    return tree


def print_tree(tree: Dict[str, List[str]]) -> None:
    """
    Print tree summary.
    """
    for parent, children in tree.items():
        print(f"{parent}:")
        for child in children:
            print(f"  - {child}")


if __name__ == "__main__":
    events = load_process_log("process_log.json")
    tree = build_tree(events)
    print_tree(tree)

Section 15: Network Behavior Summary

Summarize Connections

#!/usr/bin/env python3
"""
Summarize network events from sandbox logs.
"""
from __future__ import annotations

import json
from collections import Counter
from typing import Dict, List


def load_network_events(path: str) -> List[Dict[str, str]]:
    """
    Load network logs from JSON.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(events: List[Dict[str, str]]) -> None:
    """
    Print top destinations and ports.
    """
    dests = Counter()
    ports = Counter()
    for event in events:
        dests[event.get("dest_ip", "unknown")] += 1
        ports[event.get("dest_port", "unknown")] += 1

    print("Top destinations:")
    for ip, count in dests.most_common(5):
        print(f"  - {ip}: {count}")

    print("Top ports:")
    for port, count in ports.most_common(5):
        print(f"  - {port}: {count}")


if __name__ == "__main__":
    events = load_network_events("network_log.json")
    summarize(events)

Section 16: Timeline Visualization

Plot Behavior Timeline

#!/usr/bin/env python3
"""
Plot a timeline of sandbox events using pandas.
"""
from __future__ import annotations

import json
import pandas as pd
import matplotlib.pyplot as plt


def plot_timeline(path: str) -> None:
    """
    Plot event counts over time.
    """
    with open(path, "r", encoding="utf-8") as handle:
        events = json.load(handle)

    df = pd.DataFrame(events)
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df = df.dropna(subset=["timestamp"])
    df["minute"] = df["timestamp"].dt.floor("T")

    counts = df.groupby(["minute", "event_type"]).size().reset_index(name="count")

    for event_type in counts["event_type"].unique():
        subset = counts[counts["event_type"] == event_type]
        plt.plot(subset["minute"], subset["count"], label=event_type)

    plt.title("Behavior Timeline")
    plt.xlabel("Minute")
    plt.ylabel("Event Count")
    plt.legend()
    plt.tight_layout()
    plt.savefig("behavior_timeline.png")
    print("[*] Saved behavior_timeline.png")


if __name__ == "__main__":
    plot_timeline("behavior_report.json")

Section 17: Analyst Checklist

Static Analysis Checklist

Confirm file hashes and size
Check entropy for packing indicators
Review imports for suspicious APIs
Extract IOCs (IPs, domains, URLs)
Write preliminary YARA rule

Dynamic Analysis Checklist

Monitor process tree for unusual parents
Record file system and registry changes
Capture network connections and DNS queries
Generate a behavior timeline

Section 18: PE Export & Signature Checks

Export Table Parser

#!/usr/bin/env python3
"""
List exported functions from a PE file (if any).
"""
from __future__ import annotations

import pefile
from typing import List


def list_exports(path: str) -> List[str]:
    """
    Return exported symbols.
    """
    pe = pefile.PE(path)
    exports = []
    if hasattr(pe, "DIRECTORY_ENTRY_EXPORT"):
        for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
            if exp.name:
                exports.append(exp.name.decode(errors="ignore"))
    return exports


if __name__ == "__main__":
    print(list_exports("sample.exe"))

Digital Signature Presence Check

#!/usr/bin/env python3
"""
Check whether a PE has an Authenticode signature.
"""
from __future__ import annotations

import pefile


def has_signature(path: str) -> bool:
    """
    Return True if IMAGE_DIRECTORY_ENTRY_SECURITY exists.
    """
    pe = pefile.PE(path)
    security_dir = pe.OPTIONAL_HEADER.DATA_DIRECTORY[4]
    return security_dir.VirtualAddress != 0


if __name__ == "__main__":
    print(has_signature("sample.exe"))

Section 19: Memory Artifact Triage

Simulated Memory String Extraction

#!/usr/bin/env python3
"""
Extract strings from a memory dump (lab only).
"""
from __future__ import annotations

import re
from pathlib import Path
from typing import List


def extract_memory_strings(path: str, min_len: int = 6) -> List[str]:
    """
    Extract printable strings from a memory dump.
    """
    data = Path(path).read_bytes()
    pattern = re.compile(rb"[\\x20-\\x7E]{%d,}" % min_len)
    return [m.decode(errors="ignore") for m in pattern.findall(data)]


if __name__ == "__main__":
    for item in extract_memory_strings("memory_dump.bin")[:25]:
        print(item)

Suspicious Keyword List

# Common malware keywords
cmd.exe
powershell
curl
http://
https://
bitcoin
keylogger

Section 20: YARA Test Harness

Batch Scan Directory

#!/usr/bin/env python3
"""
Scan a directory of samples with YARA rules.
"""
from __future__ import annotations

from pathlib import Path
from typing import Dict, List

import yara


def scan_directory(rule_path: str, sample_dir: str) -> Dict[str, List[str]]:
    """
    Return dict of file -> matching rules.
    """
    rules = yara.compile(filepath=rule_path)
    results = {}
    for path in Path(sample_dir).glob("*"):
        if not path.is_file():
            continue
        matches = rules.match(str(path))
        results[str(path)] = [match.rule for match in matches]
    return results


if __name__ == "__main__":
    findings = scan_directory("rules.yar", "./samples")
    for file_path, matches in findings.items():
        print(file_path, matches)

Rule Coverage Table

Rule	Matches	Notes
Suspicious_Powershell	3	Review for false positives
Suspicious_Downloader	1	High confidence

Section 21: Threat Intel Enrichment

Enrich IOCs with Reputation

#!/usr/bin/env python3
"""
Enrich extracted IOCs with AbuseIPDB reputation scores.
"""
from __future__ import annotations

import os
from typing import Dict, List

import requests

ABUSEIPDB_URL = "https://api.abuseipdb.com/api/v2/check"


def enrich_ips(ips: List[str], api_key: str) -> Dict[str, Dict[str, str]]:
    """
    Lookup IP reputation for each IP.
    """
    results = {}
    headers = {"Key": api_key, "Accept": "application/json"}
    for ip in ips:
        try:
            response = requests.get(ABUSEIPDB_URL, headers=headers, params={"ipAddress": ip}, timeout=15)
            response.raise_for_status()
            data = response.json().get("data", {})
            results[ip] = {
                "abuseConfidenceScore": str(data.get("abuseConfidenceScore", 0)),
                "countryCode": str(data.get("countryCode", "")),
                "domain": str(data.get("domain", "")),
            }
        except requests.RequestException as exc:
            results[ip] = {"error": str(exc)}
    return results


if __name__ == "__main__":
    key = os.getenv("ABUSEIPDB_KEY", "")
    if not key:
        raise SystemExit("Missing ABUSEIPDB_KEY")
    print(enrich_ips(["203.0.113.50"], key))

Section 22: Report Template

Analyst Report Outline

# Malware Analysis Report

## Summary
- Sample name:
- SHA256:
- Initial verdict:

## Static Analysis
- File size and entropy:
- PE sections:
- Suspicious imports:
- Strings and IOCs:

## Dynamic Analysis (if performed)
- Process tree:
- File and registry changes:
- Network activity:

## Risk Assessment
- Severity:
- Recommended actions:

Lab 10: Malware Analysis Toolkit (90-150 minutes)

Lab Safety: Only use inert samples or test files provided by your instructor. Snapshot your VM before any analysis.

Lab Part 1: Suspicious File Scanner (20-30 min)

Objective: Build a tool that computes hashes and flags high entropy files.

Requirements:

Calculate MD5, SHA1, SHA256
Compute file entropy
Flag files with entropy > 7.0

Success Criteria: Scanner outputs a JSON report with flags.

Hint: Entropy threshold

if entropy > 7.0:
    flags.append("packed_or_encrypted")

Lab Part 2: PE Metadata Extractor (20-25 min)

Objective: Extract PE metadata from Windows binaries.

Requirements:

List section names and sizes
List imported DLLs
Identify suspicious section names

Success Criteria: Output highlights anomalies.

Hint: Section analysis

for section in pe.sections:
    name = section.Name.decode(errors="ignore").strip("\x00")
    if name not in {".text", ".data", ".rdata"}:
        print(f"[!] Unusual section: {name}")

Lab Part 3: Strings & IOC Extraction (20-25 min)

Objective: Extract printable strings and IOCs.

Requirements:

Extract strings of length 4+
Find IPs, domains, URLs
Export IOCs to iocs.json

Success Criteria: IOC file contains deduplicated values.

Hint: Deduplication

iocs = {"ips": sorted(set(ips)), "domains": sorted(set(domains)), "urls": sorted(set(urls))}

Lab Part 4: VirusTotal Hash Lookup (15-20 min)

Objective: Query VirusTotal for hashes.

Requirements:

Use hash-only lookups (no upload)
Handle API errors gracefully
Save analysis stats to JSON

Success Criteria: Results show detection counts.

Hint: Error handling

try:
    response.raise_for_status()
except requests.RequestException as exc:
    return {"error": str(exc)}

Lab Part 5: YARA Rule Matching (15-20 min)

Objective: Write a YARA rule and scan a sample file.

Requirements:

Rule must match a known string in the sample
Scan output lists matching rule names
Document false positive risk

Success Criteria: YARA scanner flags the sample correctly.

Hint: YARA rule style

rule DemoRule
{
    strings:
        $a = "sample string"
    condition:
        $a
}

Stretch Challenges (Optional)

Build a triage score that combines entropy, imports, and IOC count
Generate a CSV summary of multiple samples in a folder
Create a timeline chart of sandbox events

Hint: Triage scoring formula

score = 0
score += 2 if entropy > 7.0 else 0
score += 1 * len(suspicious_imports)
score += 2 * len(iocs.get("urls", []))
severity = "high" if score > 6 else "medium"

🎯 Lab Complete! You have built a malware analysis toolkit with hashing, parsing, IOC extraction, and YARA detection.

📤 Deliverables:

hash_scanner.py - Hash and entropy scanner
pe_parser.py - PE metadata extractor
strings_ioc.py - Strings/IOC extractor
vt_lookup.py - VirusTotal hash lookup
yara_scan.py - YARA scanning tool
report.md - Analysis report

Additional Resources

Malware Analysis References

Safe Labs

REMnux (Malware Analysis VM)
FLARE VM (Windows malware analysis)
Practical Malware Analysis - Sikorski & Honig

Key Takeaways

✅ Hashes and metadata provide quick triage indicators
✅ Strings reveal embedded URLs, commands, and IOCs
✅ PE parsing highlights suspicious sections and imports
✅ YARA rules enable scalable detection
✅ Sandbox logs show behavioral indicators
⚠️ Always analyze malware in isolated, safe labs

Week 10 Quiz

Test your understanding of malware analysis basics.

Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.

Take Quiz