Week Overview
This week transitions from raw sockets to high-level HTTP interactions—the foundation of modern web security testing. You'll learn to:
- Master HTTP protocol mechanics (requests, responses, headers, cookies)
- Build web security scanners and reconnaissance tools
- Interact with security APIs (VirusTotal, Shodan, URLhaus)
- Implement web scraping for OSINT gathering
- Automate directory bruteforcing and subdomain enumeration
Section 1: HTTP Protocol Deep Dive
Understanding HTTP: The Web's Foundation
HTTP (HyperText Transfer Protocol) is a request-response protocol operating at Layer 7 (Application) of the OSI model. Every web interaction—from browsing Google to exploiting XSS—uses HTTP.
HTTP Request Anatomy
When your browser visits https://example.com/login?user=admin, it sends:
GET /login?user=admin HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/json
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: sessionid=abc123; csrftoken=xyz789
Request Components:
- Request Line:
  GET /login?user=admin HTTP/1.1
  - GET: HTTP method (also POST, PUT, DELETE, OPTIONS, etc.)
  - /login?user=admin: Resource path + query string
  - HTTP/1.1: Protocol version
- Headers: Key-value metadata
  - Host: Target domain (required in HTTP/1.1)
  - User-Agent: Client identifier (scanners often change this)
  - Cookie: Session/authentication tokens (critical for testing)
- Body: (in POST/PUT requests) - Form data, JSON payloads, file uploads
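This anatomy maps directly onto the raw sockets from last week: building the request by hand makes each piece visible. A minimal sketch (host and path are placeholders, and it speaks plain HTTP on port 80, not HTTPS):

```python
import socket

def build_get_request(host: str, path: str = '/') -> bytes:
    """Assemble a minimal HTTP/1.1 GET request by hand."""
    return (
        f'GET {path} HTTP/1.1\r\n'      # request line
        f'Host: {host}\r\n'             # required in HTTP/1.1
        'User-Agent: raw-socket-demo\r\n'
        'Connection: close\r\n'
        '\r\n'                          # blank line ends the header section
    ).encode()

def raw_http_get(host: str, path: str = '/', port: int = 80) -> bytes:
    """Send the hand-built request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks)
```

Compare the bytes from build_get_request() against the capture above: request line, Host header, blank line, in that order.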
HTTP Response Anatomy
Server responds with:
HTTP/1.1 200 OK
Date: Sat, 18 Jan 2026 12:00:00 GMT
Server: nginx/1.18.0
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
Set-Cookie: sessionid=abc123; Path=/; HttpOnly; Secure
X-Frame-Options: DENY
Strict-Transport-Security: max-age=31536000
<!DOCTYPE html>
<html><head><title>Login</title></head>...
Response Components:
- Status Line:
  HTTP/1.1 200 OK
  - 200: Status code (2xx = success, 3xx = redirect, 4xx = client error, 5xx = server error)
- Headers: Server metadata
  - Server: Web server software (can leak version info)
  - Set-Cookie: Sends cookies to client
  - Content-Type: Response format (HTML, JSON, etc.)
  - Security headers: X-Frame-Options, Content-Security-Policy, HSTS
- Body: Actual content (HTML, JSON, binary data)
HTTP Status Codes (Security Perspective)
#!/usr/bin/env python3
"""
HTTP status codes relevant for security testing
"""
# 2xx Success
STATUS_CODES = {
200: 'OK - Request succeeded',
201: 'Created - Resource created (POST success)',
204: 'No Content - Success but no response body',
# 3xx Redirection
301: 'Moved Permanently - Resource relocated (follow redirect)',
302: 'Found - Temporary redirect',
304: 'Not Modified - Cached version is current',
# 4xx Client Errors (Important for scanning)
400: 'Bad Request - Malformed request (may indicate WAF/filtering)',
401: 'Unauthorized - Authentication required',
403: 'Forbidden - Access denied (resource exists but restricted)',
404: 'Not Found - Resource does not exist',
405: 'Method Not Allowed - HTTP method rejected (e.g., DELETE blocked)',
429: 'Too Many Requests - Rate limiting active',
# 5xx Server Errors (Potential vulnerabilities)
500: 'Internal Server Error - Application crash (SQL errors, exceptions)',
502: 'Bad Gateway - Proxy/gateway error',
503: 'Service Unavailable - Server overloaded/down',
}
# Security insights from status codes:
# - 403 on /admin: Directory exists but requires authentication
# - 404 vs 403: Reveals if resource exists (information disclosure)
# - 500 errors: May leak stack traces, SQL errors (inject payloads to trigger)
# - 401 + 403: Test for authentication vs authorization flaws
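Those insights can be rolled into a small helper that tags each response during a scan. A sketch; the labels are illustrative, not standard terminology:

```python
def classify_status(code: int) -> str:
    """Map a status code to a quick security-relevant label for scan output."""
    if code == 401:
        return 'auth-required'
    if code == 403:
        return 'exists-but-restricted'   # resource present, access denied
    if code == 404:
        return 'not-found'
    if code == 429:
        return 'rate-limited'            # back off before continuing
    if 500 <= code < 600:
        return 'server-error'            # may leak stack traces / SQL errors
    if 400 <= code < 500:
        return 'client-error'
    if 300 <= code < 400:
        return 'redirect'
    if 200 <= code < 300:
        return 'success'
    return 'other'
```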
Common HTTP Methods
| Method | Purpose | Security Testing Use |
|---|---|---|
| GET | Retrieve resource | Directory enumeration, XSS in URL params |
| POST | Submit data (forms, APIs) | SQLi, XSS, authentication bypass |
| PUT | Upload/replace resource | File upload vulns, unauthorized modifications |
| DELETE | Remove resource | Authorization bypass, IDOR testing |
| OPTIONS | Query allowed methods | CORS misconfig, method enumeration |
| HEAD | Get headers only (no body) | Fast resource existence checks |
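One quick application of this table: probe OPTIONS and read the Allow header to enumerate methods. A hedged sketch; many servers omit the Allow header or misreport it, so treat the result as a hint rather than ground truth:

```python
import requests

def parse_allow_header(allow: str) -> list:
    """Split an Allow header value into individual method names."""
    return [m.strip().upper() for m in allow.split(',') if m.strip()]

def enumerate_methods(url: str) -> list:
    """Ask the server which HTTP methods it claims to accept via OPTIONS."""
    try:
        response = requests.options(url, timeout=5)
    except requests.exceptions.RequestException as e:
        print(f"[!] OPTIONS request failed: {e}")
        return []
    # Servers advertise supported methods in the Allow header (if at all)
    return parse_allow_header(response.headers.get('Allow', ''))
```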
Section 2: Python's requests Library
Why requests (Not urllib)?
Python's built-in urllib works but is verbose. The third-party requests library (created by Kenneth Reitz) is the de facto standard for HTTP work in Python.
# Install requests
pip install requests
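For contrast, here is roughly the same GET using only the standard library. It works, but the extra ceremony is why requests wins for day-to-day tooling (httpbin.org is just a convenient echo service):

```python
from urllib.request import Request, urlopen

def urllib_get(url: str, user_agent: str = 'Mozilla/5.0') -> str:
    """Roughly requests.get(url, headers=...).text, standard library only."""
    # Headers must be attached to a Request object up front
    req = Request(url, headers={'User-Agent': user_agent})
    with urlopen(req, timeout=5) as resp:
        # Decoding is manual: pick the declared charset or fall back to UTF-8
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)

# body = urllib_get('https://httpbin.org/get')
```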
Basic GET Request
#!/usr/bin/env python3
"""
Simple HTTP GET request with requests library
"""
import requests
def basic_get_request(url: str) -> None:
"""
Perform GET request and display response.
Args:
url: Target URL
"""
try:
# Send GET request
response = requests.get(url, timeout=5)
# Status code
print(f"[*] Status: {response.status_code}")
# Headers (dict)
print(f"[*] Server: {response.headers.get('Server', 'Unknown')}")
print(f"[*] Content-Type: {response.headers.get('Content-Type')}")
# Response body
print(f"\n[*] Response body (first 500 chars):")
print(response.text[:500])
# Or access raw bytes
# print(response.content[:500])
except requests.exceptions.Timeout:
print(f"[!] Request timeout for {url}")
except requests.exceptions.ConnectionError:
print(f"[!] Connection error to {url}")
except requests.exceptions.RequestException as e:
print(f"[!] Request failed: {e}")
# Usage
if __name__ == '__main__':
basic_get_request('https://httpbin.org/get')
Custom Headers and User-Agents
Many web apps block default Python user-agents. Customize headers to bypass simple filters:
#!/usr/bin/env python3
"""
Custom headers for stealthy requests
"""
import requests
def request_with_custom_headers(url: str) -> None:
"""
Send request with custom headers.
"""
# Custom headers dict
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
}
response = requests.get(url, headers=headers, timeout=5)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response length: {len(response.content)} bytes")
# Check if our User-Agent was accepted
if 'User-Agent' in response.request.headers:
print(f"[*] User-Agent sent: {response.request.headers['User-Agent']}")
# Common User-Agents for testing
USER_AGENTS = {
'chrome_windows': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'firefox_linux': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
'mobile_ios': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15',
'googlebot': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
'python_default': 'python-requests/2.31.0', # Default (often blocked)
}
if __name__ == '__main__':
request_with_custom_headers('https://httpbin.org/headers')
POST Requests: Form Data and JSON
#!/usr/bin/env python3
"""
POST requests with form data and JSON payloads
"""
import requests
import json
# Example 1: Form data (like HTML form submission)
def post_form_data(url: str, username: str, password: str) -> None:
"""
Submit login form (application/x-www-form-urlencoded).
"""
data = {
'username': username,
'password': password,
'submit': 'Login'
}
response = requests.post(url, data=data, timeout=5)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response:\n{response.text[:500]}")
# Example 2: JSON payload (modern APIs)
def post_json_data(url: str, payload: dict) -> None:
"""
Send JSON data (application/json).
"""
# requests automatically sets Content-Type: application/json
response = requests.post(url, json=payload, timeout=5)
print(f"[*] Status: {response.status_code}")
# Parse JSON response
try:
response_json = response.json()
print(f"[*] JSON Response: {json.dumps(response_json, indent=2)}")
except json.JSONDecodeError:
print(f"[*] Non-JSON response: {response.text[:200]}")
# Example 3: Multipart form data (file uploads)
def post_file_upload(url: str, file_path: str) -> None:
"""
Upload file (multipart/form-data).
"""
with open(file_path, 'rb') as f:
files = {
'file': ('upload.txt', f, 'text/plain')
}
# Additional form fields
data = {
'description': 'Test upload'
}
response = requests.post(url, files=files, data=data, timeout=10)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response: {response.text[:300]}")
if __name__ == '__main__':
# Test form login
post_form_data('https://httpbin.org/post', 'admin', 'password123')
# Test JSON API
payload = {
'action': 'search',
'query': 'vulnerability',
'limit': 10
}
post_json_data('https://httpbin.org/post', payload)
Session Management and Cookies
Sessions persist cookies and connection pooling across requests:
#!/usr/bin/env python3
"""
Session management for authenticated testing
"""
import requests
def session_based_requests() -> None:
"""
Use Session object to maintain cookies across requests.
"""
# Create session
session = requests.Session()
# Set default headers for all requests in this session
session.headers.update({
'User-Agent': 'SecurityScanner/1.0'
})
# Step 1: Login (server sets session cookie)
login_url = 'https://httpbin.org/cookies/set/sessionid/abc123xyz'
response = session.get(login_url)
print(f"[*] Login response: {response.status_code}")
print(f"[*] Cookies set: {session.cookies.get_dict()}")
# Step 2: Access protected resource (session cookie automatically sent)
protected_url = 'https://httpbin.org/cookies'
response = session.get(protected_url)
print(f"\n[*] Protected resource response:")
print(response.text)
# Step 3: Manual cookie manipulation
session.cookies.set('custom_token', 'my_value', domain='httpbin.org')
response = session.get('https://httpbin.org/cookies')
print(f"\n[*] After adding custom cookie:")
print(response.text)
def cookie_extraction_example() -> None:
"""
Extract and analyze cookies from response.
"""
# Don't follow the redirect: /cookies/set answers with a 302 that carries Set-Cookie,
# and following it would leave response.cookies empty
response = requests.get('https://httpbin.org/cookies/set?token=secret123', allow_redirects=False)
# Get all cookies
cookies = response.cookies
print("[*] Cookies received:")
for cookie in cookies:
print(f" - {cookie.name} = {cookie.value}")
print(f" Domain: {cookie.domain}")
print(f" Path: {cookie.path}")
print(f" Secure: {cookie.secure}")
print(f" HttpOnly: {cookie.has_nonstandard_attr('HttpOnly')}")
if __name__ == '__main__':
session_based_requests()
print("\n" + "="*60 + "\n")
cookie_extraction_example()
Handling Redirects and SSL
#!/usr/bin/env python3
"""
Redirect handling and SSL verification
"""
import requests
def handle_redirects(url: str) -> None:
"""
Control redirect behavior.
"""
# Follow redirects (default behavior)
response = requests.get(url, allow_redirects=True)
print(f"[*] Final URL: {response.url}")
print(f"[*] Status: {response.status_code}")
print(f"[*] Redirect history: {[r.status_code for r in response.history]}")
print("\n" + "-"*60 + "\n")
# Don't follow redirects (useful for detecting redirects)
response = requests.get(url, allow_redirects=False)
print(f"[*] Status without following: {response.status_code}")
if 300 <= response.status_code < 400:
redirect_location = response.headers.get('Location')
print(f"[*] Redirects to: {redirect_location}")
def ssl_verification_options(url: str) -> None:
"""
SSL/TLS certificate verification options.
"""
# Verify SSL certificate (default, recommended)
try:
response = requests.get(url, verify=True, timeout=5)
print(f"[✓] SSL verification passed: {response.status_code}")
except requests.exceptions.SSLError as e:
print(f"[!] SSL verification failed: {e}")
# Disable SSL verification (NOT recommended for production)
# Useful for testing self-signed certs in labs; note this triggers
# urllib3's InsecureRequestWarning unless silenced with urllib3.disable_warnings()
try:
response = requests.get(url, verify=False, timeout=5)
print(f"[*] Request with verify=False: {response.status_code}")
except Exception as e:
print(f"[!] Request failed: {e}")
if __name__ == '__main__':
# Test redirect handling
handle_redirects('http://github.com') # Redirects to https
print("\n" + "="*60 + "\n")
# Test SSL verification
ssl_verification_options('https://self-signed.badssl.com/')
Section 3: Web Scraping with BeautifulSoup
Why Web Scraping for Security?
Web scraping automates OSINT gathering:
- Extract subdomains from certificate transparency logs
- Scrape public employee data from LinkedIn for social engineering
- Parse vulnerability databases (CVE, ExploitDB)
- Monitor paste sites for leaked credentials
- Collect threat intelligence from security blogs
# Install BeautifulSoup4 and parser
pip install beautifulsoup4 lxml
Basic HTML Parsing
#!/usr/bin/env python3
"""
Basic web scraping with BeautifulSoup
"""
import requests
from bs4 import BeautifulSoup
def scrape_basic_example(url: str) -> None:
"""
Scrape and parse HTML content.
"""
# Fetch page
response = requests.get(url, timeout=10)
if response.status_code != 200:
print(f"[!] Failed to fetch {url}: {response.status_code}")
return
# Parse HTML
soup = BeautifulSoup(response.content, 'lxml')
# Extract title
title = soup.title.string if soup.title else "No title"
print(f"[*] Page Title: {title}")
# Find all links
print(f"\n[*] Links found:")
links = soup.find_all('a', href=True)
for link in links[:10]: # First 10 links
href = link['href']
text = link.get_text(strip=True)
print(f" - {text[:50]}: {href}")
# Find all images
print(f"\n[*] Images found:")
images = soup.find_all('img', src=True)
for img in images[:5]:
src = img['src']
alt = img.get('alt', 'No alt text')
print(f" - {alt}: {src}")
def extract_forms(url: str) -> None:
"""
Extract all forms from a page (useful for testing).
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
forms = soup.find_all('form')
print(f"[*] Found {len(forms)} forms on {url}")
for i, form in enumerate(forms, 1):
print(f"\n[Form {i}]")
print(f" Action: {form.get('action', 'None')}")
print(f" Method: {form.get('method', 'GET').upper()}")
# Extract input fields
inputs = form.find_all('input')
print(f" Input fields:")
for inp in inputs:
name = inp.get('name', 'unnamed')
input_type = inp.get('type', 'text')
value = inp.get('value', '')
print(f" - {name} (type={input_type}, value={value})")
def scrape_security_blog() -> None:
"""
Example: Scrape latest security advisories.
"""
url = 'https://www.exploit-db.com/'
try:
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Find exploit titles (example structure, may change)
# This demonstrates technique, actual selectors depend on site
exploits = soup.find_all('a', class_='exploit-link')[:5]
print(f"[*] Recent exploits from ExploitDB:")
for exploit in exploits:
title = exploit.get_text(strip=True)
link = exploit['href']
print(f" - {title}")
print(f" {link}\n")
except Exception as e:
print(f"[!] Scraping failed: {e}")
if __name__ == '__main__':
# Example: Scrape httpbin test page
scrape_basic_example('https://httpbin.org/html')
print("\n" + "="*60 + "\n")
# Extract forms
extract_forms('https://httpbin.org/forms/post')
CSS Selectors for Precise Extraction
#!/usr/bin/env python3
"""
Using CSS selectors for precise data extraction
"""
from bs4 import BeautifulSoup
import requests
def css_selector_examples(url: str) -> None:
"""
Demonstrate CSS selector usage.
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Select by ID
element = soup.select_one('#specific-id')
if element:
print(f"[*] Element with ID: {element.get_text(strip=True)}")
# Select by class
elements = soup.select('.class-name')
print(f"[*] Elements with class: {len(elements)}")
# Select by tag and class
divs = soup.select('div.container')
print(f"[*] Div containers: {len(divs)}")
# Select nested elements
links_in_nav = soup.select('nav a')
print(f"[*] Links in navigation: {len(links_in_nav)}")
# Select by attribute
external_links = soup.select('a[target="_blank"]')
print(f"[*] External links: {len(external_links)}")
# Select specific children
first_paragraph = soup.select_one('div.content > p:first-child')
if first_paragraph:
print(f"[*] First paragraph: {first_paragraph.get_text(strip=True)[:100]}")
# Common CSS selectors for security testing
"""
Selector | Matches
----------------------|----------------------------------
#id | Element with id="id"
.class | Elements with class="class"
tag | All elements
tag.class | with class="class"
tag#id | with id="id"
parent > child | Direct child
ancestor descendant | Any descendant
[attribute] | Elements with attribute
[attribute=value] | Specific attribute value
:first-child | First child element
:nth-child(n) | Nth child element
"""
def scrape_table_data(url: str) -> None:
"""
Extract data from HTML tables.
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Find all tables
tables = soup.find_all('table')
for i, table in enumerate(tables, 1):
print(f"\n[Table {i}]")
# Extract headers
headers = []
header_row = table.find('thead')
if header_row:
headers = [th.get_text(strip=True) for th in header_row.find_all('th')]
print(f" Headers: {headers}")
# Extract rows
rows = table.find_all('tr')
print(f" Rows: {len(rows)}")
for row in rows[:3]: # First 3 rows
cells = [td.get_text(strip=True) for td in row.find_all('td')]
if cells:
print(f" {cells}")
if __name__ == '__main__':
css_selector_examples('https://example.com')
Handling Dynamic Content
BeautifulSoup only parses static HTML. For JavaScript-rendered content, use Selenium:
#!/usr/bin/env python3
"""
Scraping JavaScript-heavy sites with Selenium
(For when BeautifulSoup can't access dynamic content)
"""
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
def scrape_with_selenium(url: str) -> None:
"""
Use Selenium for JavaScript-rendered pages.
"""
# Setup headless Chrome
chrome_options = Options()
chrome_options.add_argument('--headless') # Run without GUI
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Initialize driver
driver = webdriver.Chrome(options=chrome_options)
try:
# Load page
driver.get(url)
# Wait for specific element to load (max 10 seconds)
wait = WebDriverWait(driver, 10)
element = wait.until(
EC.presence_of_element_located((By.CLASS_NAME, 'dynamic-content'))
)
# Extract data
print(f"[*] Page title: {driver.title}")
# Find elements
links = driver.find_elements(By.TAG_NAME, 'a')
print(f"[*] Links found: {len(links)}")
for link in links[:5]:
print(f" - {link.text}: {link.get_attribute('href')}")
# Execute JavaScript
result = driver.execute_script("return document.body.innerHTML;")
print(f"\n[*] Page HTML length: {len(result)} chars")
except Exception as e:
print(f"[!] Selenium error: {e}")
finally:
driver.quit()
# Note: Install selenium and chromedriver:
# pip install selenium
# brew install --cask chromedriver # macOS
# apt install chromium-chromedriver # Linux
Section 4: REST APIs and Security Data Sources
Working with Security APIs
Many security platforms provide APIs for threat intelligence, vulnerability data, and scanning:
- VirusTotal: File/URL reputation, malware analysis
- Shodan: Internet-connected device search
- URLhaus: Malware URL database
- AbuseIPDB: IP reputation and abuse reports
- Have I Been Pwned: Breach notification
VirusTotal API Example
#!/usr/bin/env python3
"""
VirusTotal API integration for URL/file reputation checks
"""
import requests
import time
import hashlib
class VirusTotalClient:
"""
VirusTotal API client for security checks.
"""
def __init__(self, api_key: str):
"""
Initialize with API key (get free key from virustotal.com).
"""
self.api_key = api_key
self.base_url = 'https://www.virustotal.com/api/v3'
self.headers = {
'x-apikey': self.api_key,
'Accept': 'application/json'
}
def check_url(self, url: str) -> dict:
"""
Check URL reputation.
Returns:
API response dict
"""
# Submit URL for scanning
scan_url = f'{self.base_url}/urls'
data = {'url': url}
response = requests.post(scan_url, headers=self.headers, data=data)
if response.status_code != 200:
return {'error': f'API error: {response.status_code}'}
result = response.json()
# Get scan ID
scan_id = result['data']['id']
# Wait for analysis to complete
print(f"[*] Scanning {url}... (ID: {scan_id})")
time.sleep(15) # VirusTotal needs time to analyze
# Get results
analysis_url = f'{self.base_url}/analyses/{scan_id}'
response = requests.get(analysis_url, headers=self.headers)
if response.status_code == 200:
return response.json()
else:
return {'error': f'Failed to get results: {response.status_code}'}
def check_file_hash(self, file_hash: str) -> dict:
"""
Check file hash (MD5, SHA1, or SHA256).
Returns:
Detection results
"""
url = f'{self.base_url}/files/{file_hash}'
response = requests.get(url, headers=self.headers)
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
return {'error': 'File not found in VirusTotal database'}
else:
return {'error': f'API error: {response.status_code}'}
def parse_results(self, results: dict) -> None:
"""
Parse and display scan results.
"""
if 'error' in results:
print(f"[!] {results['error']}")
return
try:
stats = results['data']['attributes']['stats']
print(f"\n[*] Scan Results:")
print(f" Malicious: {stats.get('malicious', 0)}")
print(f" Suspicious: {stats.get('suspicious', 0)}")
print(f" Harmless: {stats.get('harmless', 0)}")
print(f" Undetected: {stats.get('undetected', 0)}")
# Verdict
if stats.get('malicious', 0) > 0:
print(f"\n[!] WARNING: Detected as malicious by {stats['malicious']} engines!")
else:
print(f"\n[✓] No malicious detections")
except KeyError as e:
print(f"[!] Error parsing results: {e}")
# Usage example
if __name__ == '__main__':
# Get your free API key from: https://www.virustotal.com/gui/join-us
API_KEY = 'YOUR_API_KEY_HERE'
vt = VirusTotalClient(API_KEY)
# Check suspicious URL
# results = vt.check_url('http://malware-traffic-analysis.net')
# vt.parse_results(results)
# Check file hash
# malware_hash = 'd41d8cd98f00b204e9800998ecf8427e' # Example MD5
# results = vt.check_file_hash(malware_hash)
# vt.parse_results(results)
print("[!] Set API_KEY to use this example")
Shodan API Example
#!/usr/bin/env python3
"""
Shodan API for internet-connected device reconnaissance
"""
import requests
class ShodanClient:
"""
Shodan API client for device search.
"""
def __init__(self, api_key: str):
"""
Initialize with Shodan API key (get from shodan.io).
"""
self.api_key = api_key
self.base_url = 'https://api.shodan.io'
def search(self, query: str, limit: int = 10) -> dict:
"""
Search Shodan for devices matching query.
Args:
query: Search query (e.g., 'apache', 'port:22', 'country:US')
limit: Max results to return
Returns:
Search results dict
"""
url = f'{self.base_url}/shodan/host/search'
params = {
'key': self.api_key,
'query': query,
'limit': limit
}
try:
response = requests.get(url, params=params, timeout=10)
if response.status_code == 200:
return response.json()
else:
return {'error': f'API error: {response.status_code}'}
except requests.exceptions.RequestException as e:
return {'error': f'Request failed: {e}'}
def get_host_info(self, ip: str) -> dict:
"""
Get detailed information about an IP address.
Args:
ip: Target IP address
Returns:
Host information dict
"""
url = f'{self.base_url}/shodan/host/{ip}'
params = {'key': self.api_key}
try:
response = requests.get(url, params=params, timeout=10)
if response.status_code == 200:
return response.json()
else:
return {'error': f'API error: {response.status_code}'}
except requests.exceptions.RequestException as e:
return {'error': f'Request failed: {e}'}
def parse_search_results(self, results: dict) -> None:
"""
Parse and display Shodan search results.
"""
if 'error' in results:
print(f"[!] {results['error']}")
return
matches = results.get('matches', [])
total = results.get('total', 0)
print(f"[*] Total results: {total}")
print(f"[*] Showing: {len(matches)}\n")
for i, match in enumerate(matches, 1):
ip = match.get('ip_str', 'Unknown')
port = match.get('port', 0)
org = match.get('org', 'Unknown')
location = match.get('location', {})
country = location.get('country_name', 'Unknown')
print(f"[{i}] {ip}:{port}")
print(f" Organization: {org}")
print(f" Location: {country}")
# Display banner (first 200 chars)
banner = match.get('data', '')
if banner:
print(f" Banner: {banner[:200]}")
print()
# Usage
if __name__ == '__main__':
API_KEY = 'YOUR_SHODAN_API_KEY'
shodan = ShodanClient(API_KEY)
# Search for Apache servers
# results = shodan.search('apache', limit=5)
# shodan.parse_search_results(results)
# Get info about specific IP
# info = shodan.get_host_info('8.8.8.8')
# print(info)
print("[!] Set API_KEY to use this example")
Lab 6: HTTP Security Tools
Part 1: Directory Bruteforcer (35 minutes)
Objective: Build a tool to discover hidden directories and files on web servers.
Requirements:
- Create dirbrute.py that:
  - Accepts target URL and wordlist file as arguments
  - Tests each word as a directory/file path
  - Identifies valid paths (200, 301, 302, 403 status codes)
  - Supports custom file extensions (e.g., .php, .html, .txt)
  - Implements rate limiting to avoid DoS
  - Displays progress and results in real-time
- Test against DVWA or local test server
- Generate report of discovered paths
Example Usage:
python dirbrute.py http://testphp.vulnweb.com wordlist.txt -e php,html -t 10
Success Criteria:
- Discovers at least 5 valid paths from wordlist
- Handles timeouts and errors gracefully
- Rate limiting prevents overwhelming server
- Output shows status codes and response sizes
Hint: Wordlist Structure
import time

import requests

def load_wordlist(file_path: str) -> list:
    """
    Load wordlist from file.
    """
    try:
        with open(file_path, 'r') as f:
            words = [line.strip() for line in f if line.strip()]
        return words
    except FileNotFoundError:
        print(f"[!] Wordlist not found: {file_path}")
        return []

def test_path(base_url: str, path: str, extensions: list = None) -> None:
    """
    Test if path exists on server.
    """
    paths_to_test = [path]
    # Add extension variants
    if extensions:
        for ext in extensions:
            paths_to_test.append(f"{path}.{ext}")
    for candidate in paths_to_test:  # renamed so it doesn't shadow test_path()
        url = f"{base_url.rstrip('/')}/{candidate.lstrip('/')}"
        try:
            response = requests.get(url, timeout=5, allow_redirects=False)
            # Interesting status codes
            if response.status_code in [200, 201, 301, 302, 403]:
                size = len(response.content)
                print(f"[{response.status_code}] {url} ({size} bytes)")
        except requests.exceptions.Timeout:
            pass  # Ignore timeouts
        except requests.exceptions.RequestException:
            pass  # Ignore other errors
        # Rate limiting
        time.sleep(0.1)  # 100ms delay between requests
# Common wordlist sources:
# - SecLists: https://github.com/danielmiessler/SecLists
# - DirBuster wordlists (built into Kali)
# - Custom wordlists based on target technology
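The -t 10 flag in the usage example implies concurrency. One way to sketch it, assuming the test_path helper above is bound to a base URL first (each worker keeps its own per-request delay, so the aggregate request rate is roughly threads divided by the delay):

```python
from concurrent.futures import ThreadPoolExecutor

def run_bruteforce(words: list, check, threads: int = 10) -> None:
    """Run `check` (e.g. a test_path partial) across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so any exception inside a worker surfaces here
        list(pool.map(check, words))

# Example wiring (hypothetical):
# from functools import partial
# run_bruteforce(load_wordlist('wordlist.txt'),
#                partial(test_path, 'http://testphp.vulnweb.com'),
#                threads=10)
```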
Part 2: Subdomain Enumerator (40 minutes)
Objective: Discover subdomains of a target domain for reconnaissance.
Requirements:
- Create subenum.py with multiple discovery methods:
  - Wordlist bruteforcing: Test common subdomain names
  - Certificate Transparency logs: Query the crt.sh API
  - DNS enumeration: Test for wildcard DNS
- Verify each subdomain resolves (DNS lookup)
- Attempt HTTP/HTTPS connection to active subdomains
- Extract server headers and technologies
- Export results to JSON/CSV
Example Usage:
python subenum.py example.com -w subdomains.txt -o results.json
Success Criteria:
- Discovers at least 3 subdomains via wordlist
- Queries crt.sh and extracts subdomains from certificates
- Detects wildcard DNS if present
- Outputs structured report with subdomain details
Hint: Certificate Transparency API
import requests
import json
def query_crt_sh(domain: str) -> set:
"""
Query crt.sh for subdomains from certificate transparency logs.
"""
url = f'https://crt.sh/?q=%.{domain}&output=json'
try:
response = requests.get(url, timeout=15)
if response.status_code == 200:
data = response.json()
subdomains = set()
for entry in data:
name = entry.get('name_value', '')
# Handle wildcard and multi-line entries
for subdomain in name.split('\n'):
subdomain = subdomain.strip().replace('*.', '')
if subdomain.endswith(domain):
subdomains.add(subdomain)
return subdomains
except Exception as e:
print(f"[!] crt.sh query failed: {e}")
return set()
def verify_subdomain(subdomain: str) -> dict:
"""
Verify subdomain exists and is accessible.
"""
import socket
result = {
'subdomain': subdomain,
'resolves': False,
'ip': None,
'http_status': None,
'https_status': None,
}
# DNS resolution
try:
ip = socket.gethostbyname(subdomain)
result['resolves'] = True
result['ip'] = ip
except socket.gaierror:
return result # DNS failed
# HTTP check
for protocol in ['http', 'https']:
url = f'{protocol}://{subdomain}'
try:
response = requests.get(url, timeout=5, allow_redirects=True)
result[f'{protocol}_status'] = response.status_code
except requests.exceptions.RequestException:
pass
return result
# Usage
subdomains = query_crt_sh('example.com')
print(f"[*] Found {len(subdomains)} subdomains from CT logs")
for sub in list(subdomains)[:10]:
print(f" - {sub}")
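The wildcard-DNS requirement can be checked by resolving a label that almost certainly does not exist; if it resolves anyway, every bruteforced name will appear to "hit" and results need filtering. A sketch:

```python
import random
import socket
import string

def has_wildcard_dns(domain: str) -> bool:
    """Detect wildcard DNS by resolving a random, almost-certainly-absent label."""
    label = ''.join(random.choices(string.ascii_lowercase, k=20))
    try:
        socket.gethostbyname(f'{label}.{domain}')
        return True   # a nonsense name resolved: wildcard record present
    except socket.gaierror:
        return False  # normal case: the random name does not resolve
```

If this returns True, compare each candidate's IP against the wildcard IP before reporting it as a real subdomain.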
Part 3: HTTP Header Analyzer (30 minutes)
Objective: Analyze HTTP response headers for security misconfigurations.
Requirements:
- Create header_analyzer.py that:
  - Fetches target URL and extracts all response headers
  - Checks for missing security headers:
    - Strict-Transport-Security (HSTS)
    - X-Frame-Options (clickjacking protection)
    - X-Content-Type-Options (MIME sniffing)
    - Content-Security-Policy (CSP)
    - X-XSS-Protection (legacy XSS filter)
  - Identifies information disclosure headers (Server, X-Powered-By)
  - Analyzes cookie security flags (HttpOnly, Secure, SameSite)
  - Generates security score and recommendations
Success Criteria:
- Detects all missing security headers
- Flags insecure cookies
- Provides actionable remediation advice
- Outputs color-coded report (green=secure, yellow=warning, red=critical)
Hint: Security Header Checks
def analyze_security_headers(url: str) -> dict:
"""
Analyze security headers and return findings.
"""
try:
response = requests.get(url, timeout=10)
except Exception as e:
return {'error': f'Request failed: {e}'}
headers = response.headers
findings = {
'url': url,
'status': response.status_code,
'missing_headers': [],
'present_headers': [],
'information_disclosure': [],
'cookie_issues': [],
}
# Required security headers
security_headers = {
'Strict-Transport-Security': 'HSTS not set - site vulnerable to SSL stripping',
'X-Frame-Options': 'Clickjacking protection missing',
'X-Content-Type-Options': 'MIME sniffing protection missing',
'Content-Security-Policy': 'CSP not implemented - XSS risk higher',
'Referrer-Policy': 'Referrer leakage possible',
}
for header, description in security_headers.items():
if header not in headers:
findings['missing_headers'].append({
'header': header,
'risk': description
})
else:
findings['present_headers'].append({
'header': header,
'value': headers[header]
})
# Information disclosure
disclosure_headers = ['Server', 'X-Powered-By', 'X-AspNet-Version']
for header in disclosure_headers:
if header in headers:
findings['information_disclosure'].append({
'header': header,
'value': headers[header],
'risk': 'Version information may help attackers'
})
# Cookie analysis
if 'Set-Cookie' in headers:
cookies = headers.get('Set-Cookie', '')
if 'HttpOnly' not in cookies:
findings['cookie_issues'].append('HttpOnly flag missing - XSS can steal cookies')
if 'Secure' not in cookies:
findings['cookie_issues'].append('Secure flag missing - cookies sent over HTTP')
if 'SameSite' not in cookies:
findings['cookie_issues'].append('SameSite not set - CSRF risk')
return findings
def print_report(findings: dict) -> None:
    """
    Print color-coded security report.
    """
    print(f"\n{'='*60}")
    print(f"Security Header Analysis: {findings['url']}")
    print(f"{'='*60}\n")

    # Missing headers (RED)
    if findings['missing_headers']:
        print(f"[!] MISSING SECURITY HEADERS ({len(findings['missing_headers'])}):")
        for item in findings['missing_headers']:
            print(f"  ❌ {item['header']}: {item['risk']}")
        print()

    # Present headers (GREEN)
    if findings['present_headers']:
        print(f"[✓] SECURITY HEADERS PRESENT ({len(findings['present_headers'])}):")
        for item in findings['present_headers']:
            print(f"  ✅ {item['header']}: {item['value'][:50]}")
        print()

    # Information disclosure (YELLOW)
    if findings['information_disclosure']:
        print(f"[!] INFORMATION DISCLOSURE ({len(findings['information_disclosure'])}):")
        for item in findings['information_disclosure']:
            print(f"  ⚠️ {item['header']}: {item['value']}")
        print()

    # Cookie issues (RED)
    if findings['cookie_issues']:
        print(f"[!] COOKIE SECURITY ISSUES:")
        for issue in findings['cookie_issues']:
            print(f"  ❌ {issue}")
        print()

    # Security score: total checked headers = missing + present
    # (the security_headers dict is local to analyze_security_headers,
    # so we derive the count from the findings themselves)
    total_checks = len(findings['missing_headers']) + len(findings['present_headers'])
    passed = len(findings['present_headers'])
    score = (passed / total_checks) * 100 if total_checks else 0.0
    print(f"{'='*60}")
    print(f"Security Score: {score:.1f}% ({passed}/{total_checks} headers)")
    print(f"{'='*60}\n")


# Usage
findings = analyze_security_headers('https://example.com')
if 'error' in findings:
    print(findings['error'])
else:
    print_report(findings)
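One caveat on the cookie check above: `requests` folds multiple `Set-Cookie` headers into a single comma-joined string, so a flag present on one cookie can mask its absence on another. A sketch of per-cookie flag checking using the stdlib `SimpleCookie` (the helper name is my own; in practice feed it one `Set-Cookie` value at a time, e.g. from `response.raw.headers.getlist('Set-Cookie')`):

```python
from http.cookies import SimpleCookie


def cookie_flag_issues(set_cookie_value: str) -> dict:
    """Report which of HttpOnly/Secure/SameSite each cookie is missing."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    issues = {}
    for name, morsel in jar.items():
        # Morsel stores these attributes under lowercase reserved keys
        missing = [flag for flag, key in [('HttpOnly', 'httponly'),
                                          ('Secure', 'secure'),
                                          ('SameSite', 'samesite')]
                   if not morsel[key]]
        issues[name] = missing
    return issues


print(cookie_flag_issues('sessionid=abc123; Path=/; HttpOnly; Secure'))
# → {'sessionid': ['SameSite']}
```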
Part 4: Web Login Bruteforcer (30 minutes)
Objective: Build a form-based authentication bruteforcer (for authorized testing only).
Requirements:
- Create `loginbrute.py` that:
  - Accepts target login URL, username, and password wordlist
  - Extracts form fields automatically (username, password, submit)
  - Handles CSRF tokens (if present)
  - Tests each password from wordlist
  - Detects successful login (status code, response size, redirect)
  - Implements rate limiting and backoff on failures
  - Stops on successful authentication
- Test against DVWA login form (set up in a local VM)
⚠️ CRITICAL: Only test on systems you own or have explicit written permission to test.
Success Criteria:
- Successfully logs in with correct credentials from wordlist
- Handles CSRF token extraction and submission
- Rate limiting prevents account lockout
- Detects login success based on response analysis
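The rate-limiting and backoff requirement can be met with a small helper. This is an illustrative sketch (the function name and constants are my own, not from any library): the delay doubles after each consecutive failure and is capped so a long run never stalls.

```python
def backoff_delay(failures: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before the next attempt: base doubled per consecutive failure, capped."""
    return min(cap, base * (2 ** failures))


# In the attempt loop you would call time.sleep(backoff_delay(n)),
# incrementing n on errors or lockout responses and resetting it on success.
for n in range(4):
    print(f"after {n} consecutive failures: wait {backoff_delay(n)}s")
```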
Hint: Form Parsing and CSRF Handling
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import requests
import time


def extract_form_fields(url: str, form_id: str = None) -> dict:
    """
    Extract form fields from login page.
    """
    session = requests.Session()
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'lxml')

    # Find login form
    if form_id:
        form = soup.find('form', id=form_id)
    else:
        form = soup.find('form')  # First form

    if not form:
        return None

    fields = {}

    # Extract all input fields
    for input_tag in form.find_all('input'):
        name = input_tag.get('name')
        value = input_tag.get('value', '')
        if name:
            fields[name] = value

    return {
        'session': session,
        'action': form.get('action', ''),
        'method': form.get('method', 'post').upper(),
        'fields': fields
    }


def bruteforce_login(url: str, username: str, passwords: list) -> None:
    """
    Bruteforce login form.
    """
    # Get form structure
    form_data = extract_form_fields(url)
    if not form_data:
        print("[!] Could not find login form")
        return

    session = form_data['session']
    fields = form_data['fields']
    action = form_data['action']

    # Determine username and password field names
    # (Usually 'username'/'user' and 'password'/'pass')
    username_field = None
    password_field = None

    for field_name in fields.keys():
        if 'user' in field_name.lower():
            username_field = field_name
        elif 'pass' in field_name.lower():
            password_field = field_name

    if not username_field or not password_field:
        print("[!] Could not identify username/password fields")
        print(f"[*] Available fields: {list(fields.keys())}")
        return

    print(f"[*] Username field: {username_field}")
    print(f"[*] Password field: {password_field}")
    print(f"[*] Testing {len(passwords)} passwords...\n")

    # Build full action URL
    if action.startswith('http'):
        submit_url = action
    elif action.startswith('/'):
        parsed = urlparse(url)
        submit_url = f"{parsed.scheme}://{parsed.netloc}{action}"
    else:
        submit_url = url

    # Try each password
    for password in passwords:
        # Update credential fields
        fields[username_field] = username
        fields[password_field] = password

        # Re-extract CSRF token if needed
        # (the field name varies by application - DVWA calls it 'user_token')
        response = session.get(url)
        soup = BeautifulSoup(response.content, 'lxml')
        csrf_input = soup.find('input', {'name': 'csrf_token'})
        if csrf_input:
            fields['csrf_token'] = csrf_input.get('value')

        # Submit form. allow_redirects=False is needed so a 302 is
        # visible - by default requests follows redirects transparently.
        try:
            response = session.post(submit_url, data=fields,
                                    timeout=10, allow_redirects=False)

            # Check for successful login
            # (Customize based on application behavior)
            if response.status_code == 302:  # Redirect
                print(f"[✓] SUCCESS! Password: {password}")
                print(f"[*] Redirect to: {response.headers.get('Location')}")
                break
            elif 'welcome' in response.text.lower() or 'dashboard' in response.text.lower():
                print(f"[✓] SUCCESS! Password: {password}")
                break
            elif 'incorrect' in response.text.lower() or 'invalid' in response.text.lower():
                print(f"[-] Failed: {password}")
            else:
                print(f"[?] Unknown response for: {password} (status: {response.status_code})")

            # Rate limiting
            time.sleep(0.5)  # 500ms between attempts

        except requests.RequestException as e:
            print(f"[!] Error with password '{password}': {e}")


# Usage
passwords = ['password', '123456', 'admin', 'letmein', 'welcome']
bruteforce_login('http://localhost/dvwa/login.php', 'admin', passwords)
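The keyword checks above ('welcome', 'incorrect') break as soon as an application words its pages differently. A more portable heuristic is to record a baseline response for a known-bad password first, then flag any attempt whose status or body length deviates from it. A sketch (the helper name and the 10% tolerance are my own choices, not a standard):

```python
def looks_successful(baseline_status: int, baseline_len: int,
                     status: int, body_len: int,
                     tolerance: float = 0.1) -> bool:
    """True if a response deviates from the known-failure baseline."""
    if status != baseline_status:
        return True  # e.g. a 302 redirect when failed logins return 200
    # A body length outside ±10% of the failure baseline suggests a different page
    return abs(body_len - baseline_len) > baseline_len * tolerance


print(looks_successful(200, 5000, 302, 300))   # → True (status changed)
print(looks_successful(200, 5000, 200, 5100))  # → False (same failure page)
```

Capture the baseline with one deliberate failure (e.g. a random password) before entering the main loop.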
📤 Deliverables:
- `dirbrute.py` - Directory bruteforcing tool
- `subenum.py` - Subdomain enumeration tool
- `header_analyzer.py` - Security header analyzer
- `loginbrute.py` - Form-based login bruteforcer
- Sample outputs/reports from each tool
- Screenshots of tools in action
Additional Resources
Python Libraries
- requests - HTTP for Humans
- BeautifulSoup4 - HTML/XML parsing
- Selenium - Browser automation
- requests Advanced Usage
Security Testing Resources
- SecLists - Wordlists for security testing
- OWASP Web Security Testing Guide
- PortSwigger Web Security Academy
- Shodan - Internet device search engine
- VirusTotal - File/URL reputation
Practice Targets (Authorized)
- DVWA - Damn Vulnerable Web Application
- Hacksplaining - Interactive web security lessons
- HackTheBox - Pentesting labs
- PortSwigger Labs - Free web security practice
Further Learning
- Black Hat Python, 2nd Edition - Chapter 5 (Web Hacking)
- SANS SEC573 - Modules 2-3 (Web Automation)
- Web Scraping with Python (freeCodeCamp)
Key Takeaways
- ✅ HTTP is a request-response protocol with headers, methods, and status codes
- ✅ `requests` library simplifies HTTP interactions vs raw sockets
- ✅ Sessions maintain cookies and connection pooling across requests
- ✅ BeautifulSoup parses HTML for web scraping and OSINT
- ✅ Security APIs (VirusTotal, Shodan) provide threat intelligence programmatically
- ✅ Directory bruteforcing discovers hidden web resources
- ✅ Subdomain enumeration expands attack surface mapping
- ✅ HTTP header analysis reveals security misconfigurations
- ✅ Form bruteforcing tests authentication strength
- ⚠️ Always obtain authorization before testing web applications
Week 06 Quiz
Test your understanding of HTTP and web interactions in Python.
Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.