Opening Framing: Beyond Local Files
Last week, you learned to read and write local files. But security data doesn't just live on your machine—it's distributed across the internet: threat intelligence APIs, vulnerability databases, WHOIS lookups, IP reputation services, and countless web resources.
This week, you'll learn to fetch data from the web, interact with APIs, and process external data sources. These skills let your scripts tap into the collective knowledge of the security community—checking IOCs against threat feeds, enriching alerts with context, and automating lookups that would take hours manually.
Working with external data introduces new challenges: network failures, rate limits, authentication, and unpredictable response formats. You'll apply the defensive coding skills from Week 8 to handle these gracefully.
Key insight: The most powerful security scripts don't work in isolation—they connect to external intelligence sources that provide context and enrichment.
1) HTTP Requests with the Requests Library
The third-party requests library (installed with pip install requests) is the de facto standard for HTTP in Python.
It hides the complexity of web communication behind a simple interface:
import requests
# Basic GET request
response = requests.get("https://httpbin.org/get")
print(response.status_code) # 200
print(response.text) # Response body as string
# Check if request succeeded
if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Not found")
elif response.status_code == 403:
    print("Forbidden - check authentication")
Common HTTP Status Codes:
- 200 - OK (success)
- 201 - Created (POST success)
- 400 - Bad Request (client error)
- 401 - Unauthorized (auth required)
- 403 - Forbidden (access denied)
- 404 - Not Found
- 429 - Too Many Requests (rate limited)
- 500 - Internal Server Error
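The table above can be turned into a small lookup helper so scripts report failures consistently; the messages below are illustrative suggestions, not an official mapping:

```python
# Status-code messages for script output. The wording is our own;
# only the codes themselves come from the HTTP standard.
STATUS_MESSAGES = {
    200: "OK",
    201: "Created",
    400: "Bad Request - check your parameters",
    401: "Unauthorized - credentials required",
    403: "Forbidden - access denied",
    404: "Not Found",
    429: "Too Many Requests - slow down and retry",
    500: "Internal Server Error - problem on their end",
}

def describe_status(code):
    """Translate a status code into an analyst-friendly message."""
    if code in STATUS_MESSAGES:
        return f"{code}: {STATUS_MESSAGES[code]}"
    # Fall back to the code's class when we have no specific message
    if 400 <= code < 500:
        category = "client error"
    elif code >= 500:
        category = "server error"
    else:
        category = "other"
    return f"{code}: unlisted ({category})"

print(describe_status(429))  # 429: Too Many Requests - slow down and retry
```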
# Request with parameters
params = {"ip": "8.8.8.8", "format": "json"}
response = requests.get("https://api.example.com/lookup", params=params)
# URL becomes: https://api.example.com/lookup?ip=8.8.8.8&format=json
# Request with headers (common for APIs)
headers = {
    "User-Agent": "SecurityScript/1.0",
    "Accept": "application/json"
}
response = requests.get("https://api.example.com/data", headers=headers)
Key insight: HTTP is the foundation of web APIs. Understanding requests, responses, and status codes is essential for working with any external data source.
2) Working with JSON APIs
Most modern APIs return JSON. The requests library makes parsing easy:
import requests
# Get JSON data
response = requests.get("https://httpbin.org/json")
# Parse JSON response
if response.status_code == 200:
    data = response.json()  # Automatically parses JSON
    print(data)
else:
    print(f"Error: {response.status_code}")
Security API Example: IP Geolocation
import requests
def get_ip_info(ip_address):
    """Look up IP geolocation information."""
    try:
        # Free IP geolocation API (no key required)
        url = f"http://ip-api.com/json/{ip_address}"
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            data = response.json()
            if data.get("status") == "success":
                return {
                    "ip": ip_address,
                    "country": data.get("country"),
                    "city": data.get("city"),
                    "isp": data.get("isp"),
                    "org": data.get("org")
                }
            else:
                return {"ip": ip_address, "error": data.get("message")}
        else:
            return {"ip": ip_address, "error": f"HTTP {response.status_code}"}
    except requests.exceptions.Timeout:
        return {"ip": ip_address, "error": "Request timed out"}
    except requests.exceptions.RequestException as e:
        return {"ip": ip_address, "error": str(e)}

# Test
result = get_ip_info("8.8.8.8")
print(result)
Handling API Authentication:
# API Key in header (most common)
headers = {"Authorization": "Bearer YOUR_API_KEY"}
response = requests.get("https://api.example.com/data", headers=headers)
# API Key as parameter
params = {"api_key": "YOUR_API_KEY", "query": "malware"}
response = requests.get("https://api.example.com/search", params=params)
# Basic authentication
response = requests.get(
    "https://api.example.com/data",
    auth=("username", "password")
)
Key insight: Always use response.json() for JSON APIs; it handles parsing and raises a clear error if the response isn't valid JSON.
3) Error Handling for Network Operations
Network operations fail in ways file operations don't. Robust scripts handle all failure modes:
import requests
from requests.exceptions import (
    RequestException,
    ConnectionError,
    Timeout,
    HTTPError
)

def safe_api_call(url, timeout=10):
    """Make API call with comprehensive error handling."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx
        return {"success": True, "data": response.json()}
    except ConnectionError:
        return {"success": False, "error": "Connection failed - check network"}
    except Timeout:
        return {"success": False, "error": f"Request timed out after {timeout}s"}
    except HTTPError as e:
        return {"success": False, "error": f"HTTP error: {e.response.status_code}"}
    except ValueError:  # JSON decode error
        return {"success": False, "error": "Invalid JSON response"}
    except RequestException as e:
        return {"success": False, "error": f"Request failed: {e}"}
Implementing Retry Logic:
import time
import requests
def api_call_with_retry(url, max_retries=3, backoff=2):
    """Make API call with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            # Handle rate limiting
            if response.status_code == 429:
                wait_time = int(response.headers.get("Retry-After", backoff ** attempt))
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
                continue
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            if attempt < max_retries - 1:
                wait_time = backoff ** attempt
                print(f"Attempt {attempt + 1} failed. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise
    return None
Key insight: Network operations need timeouts, retries, and graceful degradation. A script that hangs forever on a network call is worse than one that fails fast.
4) Fetching Web Content
Sometimes you need to fetch and parse web pages, not just APIs:
import requests
# Fetch a web page
response = requests.get("https://example.com")
html_content = response.text
# Check content type
content_type = response.headers.get("Content-Type", "")
print(f"Content-Type: {content_type}")
# Download a file
def download_file(url, local_path):
    """Download file from URL to local path."""
    try:
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()
        with open(local_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return True
    except Exception as e:
        print(f"Download failed: {e}")
        return False

# Download threat intel feed
download_file(
    "https://example.com/ioc-feed.txt",
    "downloaded_iocs.txt"
)
Parsing HTML (Basic):
# For simple extraction, string methods work
html = "<title>Security Alert</title>"
if "<title>" in html:
    start = html.find("<title>") + 7
    end = html.find("</title>")
    title = html[start:end]
    print(title)  # "Security Alert"
# For complex HTML, use BeautifulSoup (install: pip install beautifulsoup4)
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(html, "html.parser")
# title = soup.title.string
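If you want something more robust than string slicing but don't want a third-party dependency, the standard library's html.parser module handles simple extractions. A minimal sketch (it only captures the first chunk of text inside the first <title> tag):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start capturing if we haven't seen a title yet
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title = data
            self._in_title = False

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

extractor = TitleExtractor()
extractor.feed("<html><head><title>Security Alert</title></head></html>")
print(extractor.title)  # Security Alert
```

For anything beyond trivial pages, BeautifulSoup remains the better tool; this sketch just shows what the stdlib can do.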
Security Consideration: URL Validation
from urllib.parse import urlparse
def is_safe_url(url):
    """Basic URL safety check."""
    try:
        parsed = urlparse(url)
        # Must have scheme and netloc
        if not parsed.scheme or not parsed.netloc:
            return False
        # Only allow http/https
        if parsed.scheme not in ["http", "https"]:
            return False
        # Block localhost and private IPs (basic check)
        dangerous = ["localhost", "127.0.0.1", "0.0.0.0"]
        if parsed.netloc.split(":")[0] in dangerous:
            return False
        return True
    except Exception:
        return False
Key insight: Always validate URLs before fetching, especially if they come from user input. SSRF (Server-Side Request Forgery) attacks exploit scripts that fetch arbitrary URLs.
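The hostname blocklist above is easy to bypass (for example, a DNS name can resolve to an internal address). A stricter sketch resolves the host and checks every resulting address with the stdlib ipaddress module; treat this as a starting point, not a complete SSRF defense (redirects and DNS rebinding need additional handling):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def resolves_to_private(url):
    """Return True if the URL's host resolves to a loopback, private,
    link-local, reserved, or unspecified address (common SSRF targets).
    Unparseable or unresolvable hosts are treated as unsafe."""
    host = urlparse(url).hostname
    if host is None:
        return True
    try:
        # getaddrinfo handles both literal IPs and DNS names
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_unspecified):
            return True
    return False

print(resolves_to_private("http://127.0.0.1/admin"))  # True
print(resolves_to_private("http://8.8.8.8/"))         # False
```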
5) Building a Threat Intelligence Client
Let's combine everything into a practical threat intel lookup tool:
import requests
import time
class ThreatIntelClient:
    """Client for querying threat intelligence APIs."""

    def __init__(self, api_key=None):
        self.api_key = api_key
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "ThreatIntelClient/1.0"
        })
        if api_key:
            self.session.headers["Authorization"] = f"Bearer {api_key}"

    def _make_request(self, url, params=None):
        """Make request with error handling."""
        try:
            response = self.session.get(url, params=params, timeout=15)
            if response.status_code == 429:
                return {
                    "error": "Rate limited",
                    "retry_after": response.headers.get("Retry-After", 60)
                }
            response.raise_for_status()
            return {"success": True, "data": response.json()}
        except requests.exceptions.RequestException as e:
            return {"error": str(e)}

    def lookup_ip(self, ip_address):
        """Look up IP reputation."""
        # Using free ip-api.com for demo
        url = f"http://ip-api.com/json/{ip_address}"
        result = self._make_request(url)
        if "error" in result:
            return result
        data = result["data"]
        return {
            "ip": ip_address,
            "country": data.get("country"),
            "isp": data.get("isp"),
            "org": data.get("org"),
            "query_success": data.get("status") == "success"
        }

    def lookup_domain(self, domain):
        """Look up domain information."""
        # Placeholder - would use real API
        return {
            "domain": domain,
            "note": "Implement with real threat intel API"
        }

    def bulk_lookup(self, iocs, delay=1):
        """Look up multiple IOCs with rate limiting."""
        results = []
        for i, ioc in enumerate(iocs):
            print(f"Looking up {i+1}/{len(iocs)}: {ioc}")
            # Detect IOC type and look it up
            if self._is_ip(ioc):
                result = self.lookup_ip(ioc)
            else:
                result = self.lookup_domain(ioc)
            results.append(result)
            # Rate limiting
            if i < len(iocs) - 1:
                time.sleep(delay)
        return results

    def _is_ip(self, value):
        """Check if value looks like an IP address."""
        parts = value.split(".")
        if len(parts) != 4:
            return False
        return all(p.isdigit() and 0 <= int(p) <= 255 for p in parts)
# Usage example
if __name__ == "__main__":
    client = ThreatIntelClient()
    # Single lookup
    result = client.lookup_ip("8.8.8.8")
    print(result)
    # Bulk lookup
    iocs = ["8.8.8.8", "1.1.1.1", "208.67.222.222"]
    results = client.bulk_lookup(iocs)
    for r in results:
        print(r)
Key insight: Wrapping API interactions in a class provides a clean interface, centralizes error handling, and makes it easy to swap between different threat intel providers.
Real-World Context: External Data in Security Operations
External data enrichment is core to modern security operations:
SIEM Enrichment: When a SIEM generates an alert, analysts need context. Is this IP known-malicious? What country? What organization? Scripts that automatically enrich alerts with threat intel save hours of manual lookups during investigations.
Threat Intelligence Platforms: Services like VirusTotal, AlienVault OTX, and Shodan provide APIs for IOC lookups. Integrating these into your workflows means instant access to community threat intelligence.
Vulnerability Management: The NVD (National Vulnerability Database) provides APIs for CVE lookups. Scripts can automatically check if software versions have known vulnerabilities.
MITRE ATT&CK Reference: Technique T1102 (Web Service) describes how attackers use legitimate web services for C2. Understanding how to interact with web services helps you both detect this technique and build defensive tools.
Key insight: The security community shares intelligence through APIs. Scripts that tap into this collective knowledge are far more powerful than those working in isolation.
Guided Lab: Multi-Source IOC Enrichment
Let's build a tool that enriches IOCs using multiple data sources.
Step 1: Create the Enrichment Script
Create ioc_enrichment.py:
"""
IOC Enrichment Tool
Queries multiple sources to enrich indicators of compromise
"""
import requests
import json
import time
from datetime import datetime
def lookup_ip_geolocation(ip):
    """Get geolocation for IP address."""
    try:
        response = requests.get(
            f"http://ip-api.com/json/{ip}",
            timeout=10
        )
        if response.status_code == 200:
            data = response.json()
            if data.get("status") == "success":
                return {
                    "source": "ip-api.com",
                    "country": data.get("country"),
                    "country_code": data.get("countryCode"),
                    "region": data.get("regionName"),
                    "city": data.get("city"),
                    "isp": data.get("isp"),
                    "org": data.get("org"),
                    "as": data.get("as")
                }
        return {"source": "ip-api.com", "error": "Lookup failed"}
    except Exception as e:
        return {"source": "ip-api.com", "error": str(e)}
def lookup_dns(domain):
    """Get DNS information for domain (simulated)."""
    # In production, use dnspython or similar
    return {
        "source": "dns_lookup",
        "note": "Implement with dnspython for real lookups",
        "domain": domain
    }
def check_abuse_ipdb(ip, api_key=None):
    """Check IP against AbuseIPDB (requires API key)."""
    if not api_key:
        return {
            "source": "abuseipdb",
            "note": "API key required - get free key at abuseipdb.com"
        }
    try:
        headers = {
            "Key": api_key,
            "Accept": "application/json"
        }
        params = {
            "ipAddress": ip,
            "maxAgeInDays": 90
        }
        response = requests.get(
            "https://api.abuseipdb.com/api/v2/check",
            headers=headers,
            params=params,
            timeout=10
        )
        if response.status_code == 200:
            data = response.json().get("data", {})
            return {
                "source": "abuseipdb",
                "abuse_confidence": data.get("abuseConfidenceScore"),
                "total_reports": data.get("totalReports"),
                "is_whitelisted": data.get("isWhitelisted"),
                "usage_type": data.get("usageType")
            }
        return {"source": "abuseipdb", "error": f"HTTP {response.status_code}"}
    except Exception as e:
        return {"source": "abuseipdb", "error": str(e)}
def enrich_ip(ip, api_keys=None):
    """Enrich an IP address with multiple sources."""
    api_keys = api_keys or {}
    enrichment = {
        "indicator": ip,
        "type": "ip",
        "enriched_at": datetime.now().isoformat(),
        "sources": []
    }
    # Geolocation lookup
    print("  [*] Looking up geolocation...")
    geo_result = lookup_ip_geolocation(ip)
    enrichment["sources"].append(geo_result)
    time.sleep(0.5)  # Rate limiting
    # AbuseIPDB lookup
    print("  [*] Checking AbuseIPDB...")
    abuse_result = check_abuse_ipdb(ip, api_keys.get("abuseipdb"))
    enrichment["sources"].append(abuse_result)
    # Summarize findings
    enrichment["summary"] = generate_summary(enrichment)
    return enrichment
def generate_summary(enrichment):
    """Generate human-readable summary of enrichment."""
    summary = []
    for source in enrichment["sources"]:
        if source["source"] == "ip-api.com" and "country" in source:
            summary.append(f"Location: {source.get('city', 'Unknown')}, {source.get('country', 'Unknown')}")
            summary.append(f"ISP: {source.get('isp', 'Unknown')}")
        if source["source"] == "abuseipdb" and "abuse_confidence" in source:
            score = source["abuse_confidence"] or 0  # guard against missing score
            if score > 50:
                summary.append(f"WARNING: High abuse score ({score}%)")
            elif score > 0:
                summary.append(f"Note: Some abuse reports ({score}%)")
            else:
                summary.append("No abuse reports found")
    return summary
def enrich_batch(iocs, api_keys=None):
    """Enrich a batch of IOCs."""
    results = []
    for i, ioc in enumerate(iocs):
        print(f"\n[{i+1}/{len(iocs)}] Enriching: {ioc}")
        # Detect type and enrich
        if is_ip(ioc):
            result = enrich_ip(ioc, api_keys)
        else:
            result = {
                "indicator": ioc,
                "type": "unknown",
                "note": "Only IP enrichment implemented in this demo"
            }
        results.append(result)
        # Rate limiting between IOCs
        if i < len(iocs) - 1:
            time.sleep(1)
    return results
def is_ip(value):
    """Check if value is an IP address."""
    parts = value.split(".")
    if len(parts) != 4:
        return False
    try:
        return all(0 <= int(p) <= 255 for p in parts)
    except ValueError:
        return False
def print_report(results):
    """Print enrichment report."""
    print("\n" + "=" * 60)
    print("IOC ENRICHMENT REPORT")
    print("=" * 60)
    for result in results:
        print(f"\nIndicator: {result['indicator']} ({result['type']})")
        print("-" * 40)
        if "summary" in result:
            for line in result["summary"]:
                print(f"  {line}")
        if "note" in result:
            print(f"  Note: {result['note']}")
    print("\n" + "=" * 60)
# Main execution
if __name__ == "__main__":
    # Test IOCs
    test_iocs = [
        "8.8.8.8",        # Google DNS
        "1.1.1.1",        # Cloudflare DNS
        "208.67.222.222"  # OpenDNS
    ]
    # API keys (set these if you have them)
    api_keys = {
        # "abuseipdb": "YOUR_API_KEY_HERE"
    }
    print("IOC Enrichment Tool")
    print("=" * 40)
    # Enrich IOCs
    results = enrich_batch(test_iocs, api_keys)
    # Print report
    print_report(results)
    # Save to JSON
    with open("enrichment_results.json", "w") as f:
        json.dump(results, f, indent=2)
    print("\nResults saved to enrichment_results.json")
Step 2: Run and Test
Run python3 ioc_enrichment.py and observe the enrichment process.
Step 3: Reflection (mandatory)
- Why do we add delays between API calls?
- How does the script handle missing API keys?
- What would you add to make this production-ready?
- How could you extend this to support more IOC types?
Week 9 Outcome Check
By the end of this week, you should be able to:
- Make HTTP requests using the requests library
- Parse JSON API responses
- Handle network errors gracefully
- Implement retry logic and rate limiting
- Fetch and process web content
- Build API client classes for threat intelligence
Next week: Regular Expressions—where we learn to find patterns in security data with surgical precision.
🎯 Hands-On Labs (Free & Essential)
Practice API requests and external data handling before moving to reading resources.
🎮 TryHackMe: Python Basics (HTTP Requests)
What you'll do: Use Python to fetch and parse data from web endpoints.
Why it matters: Threat intel and enrichment workflows rely on external APIs.
Time estimate: 1-1.5 hours
📝 Lab Exercise: Simple IOC Enricher
Task: Query a public API (e.g., ip-api.com) and enrich a list of IPs.
Deliverable: JSON output with country/ISP metadata per IP.
Why it matters: Enrichment adds context that improves triage accuracy.
Time estimate: 60-90 minutes
🏁 PicoCTF Practice: General Skills (Web Requests)
What you'll do: Solve beginner challenges that involve pulling data from the web.
Why it matters: Many security scripts must handle unpredictable web responses.
Time estimate: 1-2 hours
🛡️ Lab: Password Hashing Basics
What you'll do: Use bcrypt or argon2 to hash and verify passwords.
Deliverable: Script that stores salted hashes and verifies login attempts.
Why it matters: Storing plaintext passwords leads to instant account compromise.
Time estimate: 60-90 minutes
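The lab names bcrypt and argon2, which are third-party packages; if you want to prototype the same salted-hash-and-verify flow with only the standard library, hashlib.scrypt works as a stand-in (the cost parameters below are reasonable defaults for a sketch, not a vetted policy):

```python
import hashlib
import hmac
import os

def hash_password(password):
    """Return (salt, digest) using scrypt with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("wrong", salt, digest))    # False
```

Note the two defensive choices: a per-password random salt (so identical passwords hash differently) and hmac.compare_digest (to avoid timing side channels during verification).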
💡 Lab Tip: Always set timeouts for HTTP requests to avoid hanging scripts.
🛡️ Secure Coding: Safe External Data
External data is untrusted by default. Defensive scripts validate inputs, verify responses, and limit what they accept.
External data safety checklist:
- Validate URLs and allowed domains
- Enforce timeouts and retries
- Verify TLS certificates (avoid verify=False)
- Parse responses defensively and check status codes
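As a sketch of the first checklist item, a pre-flight gate can reject URLs before they ever reach requests.get; the allowlist below is hypothetical and would be tailored to the APIs your script actually uses:

```python
from urllib.parse import urlparse

# Hypothetical allowlist - replace with the hosts your script genuinely needs
ALLOWED_HOSTS = {"ip-api.com", "api.abuseipdb.com"}

def passes_checklist(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs pointing at an allowlisted host.
    The actual fetch would then call requests.get(url, timeout=10),
    leaving TLS verification at its default (verify=True)."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.hostname in allowed_hosts
    )

print(passes_checklist("https://ip-api.com/json/8.8.8.8"))  # True
print(passes_checklist("https://evil.example/payload"))     # False
```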
📚 Building on CSY101 Week-13: Threat model how attacker-controlled endpoints can abuse your client.
Resources
Complete the required resources to build your foundation.
- Requests Library - Quickstart Guide · 30-45 min · 50 XP · Resource ID: csy103_w9_r1 (Required)
- Real Python - Python Requests Library · 45-60 min · 50 XP · Resource ID: csy103_w9_r2 (Required)
- Python Docs - URL Parsing · 20-30 min · 25 XP · Resource ID: csy103_w9_r3 (Optional)
Lab: Threat Feed Aggregator
Goal: Build a tool that fetches IOCs from multiple public threat feeds and consolidates them.
Linux/Windows Path (same for both)
- Create feed_aggregator.py
- Implement functions to fetch from at least 2 public feeds:
  - Example: Feodo Tracker C2 IPs
  - Example: URLhaus malicious URLs
  - Or other free, public threat feeds
- Parse each feed's format (may be CSV, JSON, or plain text)
- Consolidate IOCs into a unified format
- Deduplicate entries
- Export to JSON with source attribution
- Include proper error handling for network failures
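The consolidation and deduplication steps can be sketched independently of any particular feed; the function below assumes the feeds have already been fetched and parsed into lists of IOC strings (the feed names are placeholders):

```python
def consolidate(feeds):
    """feeds: dict mapping feed name -> iterable of raw IOC strings.
    Returns a deduplicated list of records with source attribution."""
    merged = {}
    for source, iocs in feeds.items():
        for raw in iocs:
            ioc = raw.strip()  # minimal normalization; domains could also be lowercased
            if not ioc or ioc.startswith("#"):  # skip blanks and feed comment lines
                continue
            entry = merged.setdefault(ioc, {"indicator": ioc, "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())

feeds = {
    "feodo_tracker": ["1.2.3.4", "5.6.7.8", "# comment line"],
    "urlhaus": ["1.2.3.4 ", ""],
}
for record in consolidate(feeds):
    print(record)
```

Keying the merged dict on the indicator itself makes deduplication free, while the sources list preserves attribution for the JSON export step.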
Deliverable (submit):
- Your feed_aggregator.py script
- Sample output JSON file
- One paragraph: How would you schedule this to run automatically?
Checkpoint Questions
- What HTTP method is used for retrieving data from an API?
- What does status code 429 indicate and how should you handle it?
- Why is a timeout important for network requests?
- What is the difference between response.text and response.json()?
- Why should you validate URLs before fetching them?
- What is rate limiting and why do APIs enforce it?
Weekly Reflection
Reflection Prompt (200-300 words):
This week you learned to connect your scripts to external data sources. APIs and web resources extend your capabilities beyond local data, tapping into the collective intelligence of the security community.
Reflect on these questions:
- How does automated IOC enrichment change the workflow of a security analyst compared to manual lookups?
- What are the risks of depending on external APIs for security operations?
- How would you design a system that gracefully handles API outages?
- What ethical considerations exist when querying threat intelligence APIs (rate limits, terms of service, data privacy)?
A strong reflection will consider both the power and the responsibilities of connecting to external security intelligence.
Verified Resources & Videos
- HTTP Status Codes: MDN - HTTP Status Codes
- Free Threat Intel APIs: AbuseIPDB API Documentation
- Security perspective (MITRE ATT&CK): MITRE ATT&CK — Web Service (T1102)
External data transforms local scripts into connected intelligence tools. With API skills, you can tap into VirusTotal, Shodan, threat feeds, and countless other resources. Next week: pattern matching with regular expressions.