Week Overview
This week transitions from raw sockets to high-level HTTP interactions—the foundation of modern web security testing. You'll learn to:
- Master HTTP protocol mechanics (requests, responses, headers, cookies)
- Build web security scanners and reconnaissance tools
- Interact with security APIs (VirusTotal, Shodan, URLhaus)
- Implement web scraping for OSINT gathering
- Automate directory bruteforcing and subdomain enumeration
Section 1: HTTP Protocol Deep Dive
Understanding HTTP: The Web's Foundation
HTTP (HyperText Transfer Protocol) is a request-response protocol operating at Layer 7 (Application) of the OSI model. Every web interaction—from browsing Google to exploiting XSS—uses HTTP.
HTTP Request Anatomy
When your browser visits https://example.com/login?user=admin, it sends:
GET /login?user=admin HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/json
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: sessionid=abc123; csrftoken=xyz789
Request Components:
- Request Line:
  GET /login?user=admin HTTP/1.1
  - GET: HTTP method (also POST, PUT, DELETE, OPTIONS, etc.)
  - /login?user=admin: Resource path + query string
  - HTTP/1.1: Protocol version
- Headers: Key-value metadata
  - Host: Target domain (required in HTTP/1.1)
  - User-Agent: Client identifier (scanners often change this)
  - Cookie: Session/authentication tokens (critical for testing)
- Body: (in POST/PUT requests) - Form data, JSON payloads, file uploads
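This anatomy maps directly onto the raw sockets from last week: building the request by hand makes each piece visible. A minimal sketch (host and path are placeholders, and it speaks plain HTTP on port 80, not HTTPS):

```python
import socket

def build_get_request(host: str, path: str = '/') -> bytes:
    """Assemble a minimal HTTP/1.1 GET request by hand."""
    return (
        f'GET {path} HTTP/1.1\r\n'      # request line
        f'Host: {host}\r\n'             # required in HTTP/1.1
        'User-Agent: raw-socket-demo\r\n'
        'Connection: close\r\n'
        '\r\n'                          # blank line ends the header section
    ).encode()

def raw_http_get(host: str, path: str = '/', port: int = 80) -> bytes:
    """Send the hand-built request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks)
```

Compare the bytes from build_get_request() against the capture above: request line, Host header, blank line, in that order.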
HTTP Response Anatomy
Server responds with:
HTTP/1.1 200 OK
Date: Sat, 18 Jan 2026 12:00:00 GMT
Server: nginx/1.18.0
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
Set-Cookie: sessionid=abc123; Path=/; HttpOnly; Secure
X-Frame-Options: DENY
Strict-Transport-Security: max-age=31536000
<!DOCTYPE html>
<html><head><title>Login</title></head>...
Response Components:
- Status Line:
  HTTP/1.1 200 OK
  - 200: Status code (2xx = success, 3xx = redirect, 4xx = client error, 5xx = server error)
- Headers: Server metadata
  - Server: Web server software (can leak version info)
  - Set-Cookie: Sends cookies to client
  - Content-Type: Response format (HTML, JSON, etc.)
  - Security headers: X-Frame-Options, Content-Security-Policy, HSTS
- Body: Actual content (HTML, JSON, binary data)
HTTP Status Codes (Security Perspective)
#!/usr/bin/env python3
"""
HTTP status codes relevant for security testing
"""
# 2xx Success
STATUS_CODES = {
200: 'OK - Request succeeded',
201: 'Created - Resource created (POST success)',
204: 'No Content - Success but no response body',
# 3xx Redirection
301: 'Moved Permanently - Resource relocated (follow redirect)',
302: 'Found - Temporary redirect',
304: 'Not Modified - Cached version is current',
# 4xx Client Errors (Important for scanning)
400: 'Bad Request - Malformed request (may indicate WAF/filtering)',
401: 'Unauthorized - Authentication required',
403: 'Forbidden - Access denied (resource exists but restricted)',
404: 'Not Found - Resource does not exist',
405: 'Method Not Allowed - HTTP method rejected (e.g., DELETE blocked)',
429: 'Too Many Requests - Rate limiting active',
# 5xx Server Errors (Potential vulnerabilities)
500: 'Internal Server Error - Application crash (SQL errors, exceptions)',
502: 'Bad Gateway - Proxy/gateway error',
503: 'Service Unavailable - Server overloaded/down',
}
# Security insights from status codes:
# - 403 on /admin: Directory exists but requires authentication
# - 404 vs 403: Reveals if resource exists (information disclosure)
# - 500 errors: May leak stack traces, SQL errors (inject payloads to trigger)
# - 401 + 403: Test for authentication vs authorization flaws
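Those insights can be rolled into a small helper that tags each response during a scan. A sketch; the labels are illustrative, not standard terminology:

```python
def classify_status(code: int) -> str:
    """Map a status code to a quick security-relevant label for scan output."""
    if code == 401:
        return 'auth-required'
    if code == 403:
        return 'exists-but-restricted'   # resource present, access denied
    if code == 404:
        return 'not-found'
    if code == 429:
        return 'rate-limited'            # back off before continuing
    if 500 <= code < 600:
        return 'server-error'            # may leak stack traces / SQL errors
    if 400 <= code < 500:
        return 'client-error'
    if 300 <= code < 400:
        return 'redirect'
    if 200 <= code < 300:
        return 'success'
    return 'other'
```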
Common HTTP Methods
| Method | Purpose | Security Testing Use |
|---|---|---|
| GET | Retrieve resource | Directory enumeration, XSS in URL params |
| POST | Submit data (forms, APIs) | SQLi, XSS, authentication bypass |
| PUT | Upload/replace resource | File upload vulns, unauthorized modifications |
| DELETE | Remove resource | Authorization bypass, IDOR testing |
| OPTIONS | Query allowed methods | CORS misconfig, method enumeration |
| HEAD | Get headers only (no body) | Fast resource existence checks |
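One quick application of this table: probe OPTIONS and read the Allow header to enumerate methods. A hedged sketch; many servers omit the Allow header or misreport it, so treat the result as a hint rather than ground truth:

```python
import requests

def parse_allow_header(allow: str) -> list:
    """Split an Allow header value into individual method names."""
    return [m.strip().upper() for m in allow.split(',') if m.strip()]

def enumerate_methods(url: str) -> list:
    """Ask the server which HTTP methods it claims to accept via OPTIONS."""
    try:
        response = requests.options(url, timeout=5)
    except requests.exceptions.RequestException as e:
        print(f"[!] OPTIONS request failed: {e}")
        return []
    # Servers advertise supported methods in the Allow header (if at all)
    return parse_allow_header(response.headers.get('Allow', ''))
```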
Section 2: Python's requests Library
Why requests (Not urllib)?
Python's built-in urllib works but is verbose. The third-party requests library (created by Kenneth Reitz) is the de facto standard for HTTP work in Python.
# Install requests
pip install requests
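For contrast, here is roughly the same GET using only the standard library. It works, but the extra ceremony is why requests wins for day-to-day tooling (httpbin.org is just a convenient echo service):

```python
from urllib.request import Request, urlopen

def urllib_get(url: str, user_agent: str = 'Mozilla/5.0') -> str:
    """Roughly requests.get(url, headers=...).text, standard library only."""
    # Headers must be attached to a Request object up front
    req = Request(url, headers={'User-Agent': user_agent})
    with urlopen(req, timeout=5) as resp:
        # Decoding is manual: pick the declared charset or fall back to UTF-8
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)

# body = urllib_get('https://httpbin.org/get')
```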
Basic GET Request
#!/usr/bin/env python3
"""
Simple HTTP GET request with requests library
"""
import requests
def basic_get_request(url: str) -> None:
"""
Perform GET request and display response.
Args:
url: Target URL
"""
try:
# Send GET request
response = requests.get(url, timeout=5)
# Status code
print(f"[*] Status: {response.status_code}")
# Headers (dict)
print(f"[*] Server: {response.headers.get('Server', 'Unknown')}")
print(f"[*] Content-Type: {response.headers.get('Content-Type')}")
# Response body
print(f"\n[*] Response body (first 500 chars):")
print(response.text[:500])
# Or access raw bytes
# print(response.content[:500])
except requests.exceptions.Timeout:
print(f"[!] Request timeout for {url}")
except requests.exceptions.ConnectionError:
print(f"[!] Connection error to {url}")
except requests.exceptions.RequestException as e:
print(f"[!] Request failed: {e}")
# Usage
if __name__ == '__main__':
basic_get_request('https://httpbin.org/get')
Custom Headers and User-Agents
Many web apps block default Python user-agents. Customize headers to bypass simple filters:
#!/usr/bin/env python3
"""
Custom headers for stealthy requests
"""
import requests
def request_with_custom_headers(url: str) -> None:
"""
Send request with custom headers.
"""
# Custom headers dict
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
}
response = requests.get(url, headers=headers, timeout=5)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response length: {len(response.content)} bytes")
# Check if our User-Agent was accepted
if 'User-Agent' in response.request.headers:
print(f"[*] User-Agent sent: {response.request.headers['User-Agent']}")
# Common User-Agents for testing
USER_AGENTS = {
'chrome_windows': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'firefox_linux': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
'mobile_ios': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15',
'googlebot': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
'python_default': 'python-requests/2.31.0', # Default (often blocked)
}
if __name__ == '__main__':
request_with_custom_headers('https://httpbin.org/headers')
POST Requests: Form Data and JSON
#!/usr/bin/env python3
"""
POST requests with form data and JSON payloads
"""
import requests
import json
# Example 1: Form data (like HTML form submission)
def post_form_data(url: str, username: str, password: str) -> None:
"""
Submit login form (application/x-www-form-urlencoded).
"""
data = {
'username': username,
'password': password,
'submit': 'Login'
}
response = requests.post(url, data=data, timeout=5)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response:\n{response.text[:500]}")
# Example 2: JSON payload (modern APIs)
def post_json_data(url: str, payload: dict) -> None:
"""
Send JSON data (application/json).
"""
# requests automatically sets Content-Type: application/json
response = requests.post(url, json=payload, timeout=5)
print(f"[*] Status: {response.status_code}")
# Parse JSON response
try:
response_json = response.json()
print(f"[*] JSON Response: {json.dumps(response_json, indent=2)}")
except json.JSONDecodeError:
print(f"[*] Non-JSON response: {response.text[:200]}")
# Example 3: Multipart form data (file uploads)
def post_file_upload(url: str, file_path: str) -> None:
"""
Upload file (multipart/form-data).
"""
with open(file_path, 'rb') as f:
files = {
'file': ('upload.txt', f, 'text/plain')
}
# Additional form fields
data = {
'description': 'Test upload'
}
response = requests.post(url, files=files, data=data, timeout=10)
print(f"[*] Status: {response.status_code}")
print(f"[*] Response: {response.text[:300]}")
if __name__ == '__main__':
# Test form login
post_form_data('https://httpbin.org/post', 'admin', 'password123')
# Test JSON API
payload = {
'action': 'search',
'query': 'vulnerability',
'limit': 10
}
post_json_data('https://httpbin.org/post', payload)
Session Management and Cookies
Sessions persist cookies and connection pooling across requests:
#!/usr/bin/env python3
"""
Session management for authenticated testing
"""
import requests
def session_based_requests() -> None:
"""
Use Session object to maintain cookies across requests.
"""
# Create session
session = requests.Session()
# Set default headers for all requests in this session
session.headers.update({
'User-Agent': 'SecurityScanner/1.0'
})
# Step 1: Login (server sets session cookie)
login_url = 'https://httpbin.org/cookies/set/sessionid/abc123xyz'
response = session.get(login_url)
print(f"[*] Login response: {response.status_code}")
print(f"[*] Cookies set: {session.cookies.get_dict()}")
# Step 2: Access protected resource (session cookie automatically sent)
protected_url = 'https://httpbin.org/cookies'
response = session.get(protected_url)
print(f"\n[*] Protected resource response:")
print(response.text)
# Step 3: Manual cookie manipulation
session.cookies.set('custom_token', 'my_value', domain='httpbin.org')
response = session.get('https://httpbin.org/cookies')
print(f"\n[*] After adding custom cookie:")
print(response.text)
def cookie_extraction_example() -> None:
"""
Extract and analyze cookies from response.
"""
# Don't follow the redirect: /cookies/set answers with a 302 that carries Set-Cookie,
# and following it would leave response.cookies empty
response = requests.get('https://httpbin.org/cookies/set?token=secret123', allow_redirects=False)
# Get all cookies
cookies = response.cookies
print("[*] Cookies received:")
for cookie in cookies:
print(f" - {cookie.name} = {cookie.value}")
print(f" Domain: {cookie.domain}")
print(f" Path: {cookie.path}")
print(f" Secure: {cookie.secure}")
print(f" HttpOnly: {cookie.has_nonstandard_attr('HttpOnly')}")
if __name__ == '__main__':
session_based_requests()
print("\n" + "="*60 + "\n")
cookie_extraction_example()
Handling Redirects and SSL
#!/usr/bin/env python3
"""
Redirect handling and SSL verification
"""
import requests
def handle_redirects(url: str) -> None:
"""
Control redirect behavior.
"""
# Follow redirects (default behavior)
response = requests.get(url, allow_redirects=True)
print(f"[*] Final URL: {response.url}")
print(f"[*] Status: {response.status_code}")
print(f"[*] Redirect history: {[r.status_code for r in response.history]}")
print("\n" + "-"*60 + "\n")
# Don't follow redirects (useful for detecting redirects)
response = requests.get(url, allow_redirects=False)
print(f"[*] Status without following: {response.status_code}")
if 300 <= response.status_code < 400:
redirect_location = response.headers.get('Location')
print(f"[*] Redirects to: {redirect_location}")
def ssl_verification_options(url: str) -> None:
"""
SSL/TLS certificate verification options.
"""
# Verify SSL certificate (default, recommended)
try:
response = requests.get(url, verify=True, timeout=5)
print(f"[✓] SSL verification passed: {response.status_code}")
except requests.exceptions.SSLError as e:
print(f"[!] SSL verification failed: {e}")
# Disable SSL verification (NOT recommended for production)
# Useful for testing self-signed certs in labs; note this triggers
# urllib3's InsecureRequestWarning unless silenced with urllib3.disable_warnings()
try:
response = requests.get(url, verify=False, timeout=5)
print(f"[*] Request with verify=False: {response.status_code}")
except Exception as e:
print(f"[!] Request failed: {e}")
if __name__ == '__main__':
# Test redirect handling
handle_redirects('http://github.com') # Redirects to https
print("\n" + "="*60 + "\n")
# Test SSL verification
ssl_verification_options('https://self-signed.badssl.com/')
Section 3: Web Scraping with BeautifulSoup
Why Web Scraping for Security?
Web scraping automates OSINT gathering:
- Extract subdomains from certificate transparency logs
- Scrape public employee data from LinkedIn for social engineering
- Parse vulnerability databases (CVE, ExploitDB)
- Monitor paste sites for leaked credentials
- Collect threat intelligence from security blogs
# Install BeautifulSoup4 and parser
pip install beautifulsoup4 lxml
Basic HTML Parsing
#!/usr/bin/env python3
"""
Basic web scraping with BeautifulSoup
"""
import requests
from bs4 import BeautifulSoup
def scrape_basic_example(url: str) -> None:
"""
Scrape and parse HTML content.
"""
# Fetch page
response = requests.get(url, timeout=10)
if response.status_code != 200:
print(f"[!] Failed to fetch {url}: {response.status_code}")
return
# Parse HTML
soup = BeautifulSoup(response.content, 'lxml')
# Extract title
title = soup.title.string if soup.title else "No title"
print(f"[*] Page Title: {title}")
# Find all links
print(f"\n[*] Links found:")
links = soup.find_all('a', href=True)
for link in links[:10]: # First 10 links
href = link['href']
text = link.get_text(strip=True)
print(f" - {text[:50]}: {href}")
# Find all images
print(f"\n[*] Images found:")
images = soup.find_all('img', src=True)
for img in images[:5]:
src = img['src']
alt = img.get('alt', 'No alt text')
print(f" - {alt}: {src}")
def extract_forms(url: str) -> None:
"""
Extract all forms from a page (useful for testing).
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
forms = soup.find_all('form')
print(f"[*] Found {len(forms)} forms on {url}")
for i, form in enumerate(forms, 1):
print(f"\n[Form {i}]")
print(f" Action: {form.get('action', 'None')}")
print(f" Method: {form.get('method', 'GET').upper()}")
# Extract input fields
inputs = form.find_all('input')
print(f" Input fields:")
for inp in inputs:
name = inp.get('name', 'unnamed')
input_type = inp.get('type', 'text')
value = inp.get('value', '')
print(f" - {name} (type={input_type}, value={value})")
def scrape_security_blog() -> None:
"""
Example: Scrape latest security advisories.
"""
url = 'https://www.exploit-db.com/'
try:
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Find exploit titles (example structure, may change)
# This demonstrates technique, actual selectors depend on site
exploits = soup.find_all('a', class_='exploit-link')[:5]
print(f"[*] Recent exploits from ExploitDB:")
for exploit in exploits:
title = exploit.get_text(strip=True)
link = exploit['href']
print(f" - {title}")
print(f" {link}\n")
except Exception as e:
print(f"[!] Scraping failed: {e}")
if __name__ == '__main__':
# Example: Scrape httpbin test page
scrape_basic_example('https://httpbin.org/html')
print("\n" + "="*60 + "\n")
# Extract forms
extract_forms('https://httpbin.org/forms/post')
CSS Selectors for Precise Extraction
#!/usr/bin/env python3
"""
Using CSS selectors for precise data extraction
"""
from bs4 import BeautifulSoup
import requests
def css_selector_examples(url: str) -> None:
"""
Demonstrate CSS selector usage.
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Select by ID
element = soup.select_one('#specific-id')
if element:
print(f"[*] Element with ID: {element.get_text(strip=True)}")
# Select by class
elements = soup.select('.class-name')
print(f"[*] Elements with class: {len(elements)}")
# Select by tag and class
divs = soup.select('div.container')
print(f"[*] Div containers: {len(divs)}")
# Select nested elements
links_in_nav = soup.select('nav a')
print(f"[*] Links in navigation: {len(links_in_nav)}")
# Select by attribute
external_links = soup.select('a[target="_blank"]')
print(f"[*] External links: {len(external_links)}")
# Select specific children
first_paragraph = soup.select_one('div.content > p:first-child')
if first_paragraph:
print(f"[*] First paragraph: {first_paragraph.get_text(strip=True)[:100]}")
# Common CSS selectors for security testing
"""
Selector | Matches
----------------------|----------------------------------
#id | Element with id="id"
.class | Elements with class="class"
tag | All elements
tag.class | with class="class"
tag#id | with id="id"
parent > child | Direct child
ancestor descendant | Any descendant
[attribute] | Elements with attribute
[attribute=value] | Specific attribute value
:first-child | First child element
:nth-child(n) | Nth child element
"""
def scrape_table_data(url: str) -> None:
"""
Extract data from HTML tables.
"""
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')
# Find all tables
tables = soup.find_all('table')
for i, table in enumerate(tables, 1):
print(f"\n[Table {i}]")
# Extract headers
headers = []
header_row = table.find('thead')
if header_row:
headers = [th.get_text(strip=True) for th in header_row.find_all('th')]
print(f" Headers: {headers}")
# Extract rows
rows = table.find_all('tr')
print(f" Rows: {len(rows)}")
for row in rows[:3]: # First 3 rows
cells = [td.get_text(strip=True) for td in row.find_all('td')]
if cells:
print(f" {cells}")
if __name__ == '__main__':
css_selector_examples('https://example.com')
Handling Dynamic Content
BeautifulSoup only parses static HTML. For JavaScript-rendered content, use Selenium:
#!/usr/bin/env python3
"""
Scraping JavaScript-heavy sites with Selenium
(For when BeautifulSoup can't access dynamic content)
"""
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
def scrape_with_selenium(url: str) -> None:
"""
Use Selenium for JavaScript-rendered pages.
"""
# Setup headless Chrome
chrome_options = Options()
chrome_options.add_argument('--headless') # Run without GUI
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Initialize driver
driver = webdriver.Chrome(options=chrome_options)
try:
# Load page
driver.get(url)
# Wait for specific element to load (max 10 seconds)
wait = WebDriverWait(driver, 10)
element = wait.until(
EC.presence_of_element_located((By.CLASS_NAME, 'dynamic-content'))
)
# Extract data
print(f"[*] Page title: {driver.title}")
# Find elements
links = driver.find_elements(By.TAG_NAME, 'a')
print(f"[*] Links found: {len(links)}")
for link in links[:5]:
print(f" - {link.text}: {link.get_attribute('href')}")
# Execute JavaScript
result = driver.execute_script("return document.body.innerHTML;")
print(f"\n[*] Page HTML length: {len(result)} chars")
except Exception as e:
print(f"[!] Selenium error: {e}")
finally:
driver.quit()
# Note: Install selenium and chromedriver:
# pip install selenium
# brew install --cask chromedriver # macOS
# apt install chromium-chromedriver # Linux
Section 4: REST APIs and Security Data Sources
Working with Security APIs
Many security platforms provide APIs for threat intelligence, vulnerability data, and scanning:
- VirusTotal: File/URL reputation, malware analysis
- Shodan: Internet-connected device search
- URLhaus: Malware URL database
- AbuseIPDB: IP reputation and abuse reports
- Have I Been Pwned: Breach notification
VirusTotal API Example
#!/usr/bin/env python3
"""
VirusTotal API integration for URL/file reputation checks
"""
import requests
import time
import hashlib
class VirusTotalClient:
"""
VirusTotal API client for security checks.
"""
def __init__(self, api_key: str):
"""
Initialize with API key (get free key from virustotal.com).
"""
self.api_key = api_key
self.base_url = 'https://www.virustotal.com/api/v3'
self.headers = {
'x-apikey': self.api_key,
'Accept': 'application/json'
}
def check_url(self, url: str) -> dict:
"""
Check URL reputation.
Returns:
API response dict
"""
# Submit URL for scanning
scan_url = f'{self.base_url}/urls'
data = {'url': url}
response = requests.post(scan_url, headers=self.headers, data=data)
if response.status_code != 200:
return {'error': f'API error: {response.status_code}'}
result = response.json()
# Get scan ID
scan_id = result['data']['id']
# Wait for analysis to complete
print(f"[*] Scanning {url}... (ID: {scan_id})")
time.sleep(15) # VirusTotal needs time to analyze
# Get results
analysis_url = f'{self.base_url}/analyses/{scan_id}'
response = requests.get(analysis_url, headers=self.headers)
if response.status_code == 200:
return response.json()
else:
return {'error': f'Failed to get results: {response.status_code}'}
def check_file_hash(self, file_hash: str) -> dict:
"""
Check file hash (MD5, SHA1, or SHA256).
Returns:
Detection results
"""
url = f'{self.base_url}/files/{file_hash}'
response = requests.get(url, headers=self.headers)
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
return {'error': 'File not found in VirusTotal database'}
else:
return {'error': f'API error: {response.status_code}'}
def parse_results(self, results: dict) -> None:
"""
Parse and display scan results.
"""
if 'error' in results:
print(f"[!] {results['error']}")
return
try:
stats = results['data']['attributes']['stats']
print(f"\n[*] Scan Results:")
print(f" Malicious: {stats.get('malicious', 0)}")
print(f" Suspicious: {stats.get('suspicious', 0)}")
print(f" Harmless: {stats.get('harmless', 0)}")
print(f" Undetected: {stats.get('undetected', 0)}")
# Verdict
if stats.get('malicious', 0) > 0:
print(f"\n[!] WARNING: Detected as malicious by {stats['malicious']} engines!")
else:
print(f"\n[✓] No malicious detections")
except KeyError as e:
print(f"[!] Error parsing results: {e}")
# Usage example
if __name__ == '__main__':
# Get your free API key from: https://www.virustotal.com/gui/join-us
API_KEY = 'YOUR_API_KEY_HERE'
vt = VirusTotalClient(API_KEY)
# Check suspicious URL
# results = vt.check_url('http://malware-traffic-analysis.net')
# vt.parse_results(results)
# Check file hash
# malware_hash = 'd41d8cd98f00b204e9800998ecf8427e' # Example MD5
# results = vt.check_file_hash(malware_hash)
# vt.parse_results(results)
print("[!] Set API_KEY to use this example")
Shodan API Example
#!/usr/bin/env python3
"""
Shodan API for internet-connected device reconnaissance
"""
import requests
class ShodanClient:
"""
Shodan API client for device search.
"""
def __init__(self, api_key: str):
"""
Initialize with Shodan API key (get from shodan.io).
"""
self.api_key = api_key
self.base_url = 'https://api.shodan.io'
def search(self, query: str, limit: int = 10) -> dict:
"""
Search Shodan for devices matching query.
Args:
query: Search query (e.g., 'apache', 'port:22', 'country:US')
limit: Max results to return
Returns:
Search results dict
"""
url = f'{self.base_url}/shodan/host/search'
params = {
'key': self.api_key,
'query': query,
'limit': limit
}
try:
response = requests.get(url, params=params, timeout=10)
if response.status_code == 200:
return response.json()
else:
return {'error': f'API error: {response.status_code}'}
except requests.exceptions.RequestException as e:
return {'error': f'Request failed: {e}'}
def get_host_info(self, ip: str) -> dict:
"""
Get detailed information about an IP address.
Args:
ip: Target IP address
Returns:
Host information dict
"""
url = f'{self.base_url}/shodan/host/{ip}'
params = {'key': self.api_key}
try:
response = requests.get(url, params=params, timeout=10)
if response.status_code == 200:
return response.json()
else:
return {'error': f'API error: {response.status_code}'}
except requests.exceptions.RequestException as e:
return {'error': f'Request failed: {e}'}
def parse_search_results(self, results: dict) -> None:
"""
Parse and display Shodan search results.
"""
if 'error' in results:
print(f"[!] {results['error']}")
return
matches = results.get('matches', [])
total = results.get('total', 0)
print(f"[*] Total results: {total}")
print(f"[*] Showing: {len(matches)}\n")
for i, match in enumerate(matches, 1):
ip = match.get('ip_str', 'Unknown')
port = match.get('port', 0)
org = match.get('org', 'Unknown')
location = match.get('location', {})
country = location.get('country_name', 'Unknown')
print(f"[{i}] {ip}:{port}")
print(f" Organization: {org}")
print(f" Location: {country}")
# Display banner (first 200 chars)
banner = match.get('data', '')
if banner:
print(f" Banner: {banner[:200]}")
print()
# Usage
if __name__ == '__main__':
API_KEY = 'YOUR_SHODAN_API_KEY'
shodan = ShodanClient(API_KEY)
# Search for Apache servers
# results = shodan.search('apache', limit=5)
# shodan.parse_search_results(results)
# Get info about specific IP
# info = shodan.get_host_info('8.8.8.8')
# print(info)
print("[!] Set API_KEY to use this example")
Lab 6: HTTP Security Tools
Part 1: Directory Bruteforcer (35 minutes)
Objective: Build a tool to discover hidden directories and files on web servers.
Requirements:
- Create dirbrute.py that:
  - Accepts target URL and wordlist file as arguments
  - Tests each word as a directory/file path
  - Identifies valid paths (200, 301, 302, 403 status codes)
  - Supports custom file extensions (e.g., .php, .html, .txt)
  - Implements rate limiting to avoid DoS
  - Displays progress and results in real-time
- Test against DVWA or local test server
- Generate report of discovered paths
Example Usage:
python dirbrute.py http://testphp.vulnweb.com wordlist.txt -e php,html -t 10
Success Criteria:
- Discovers at least 5 valid paths from wordlist
- Handles timeouts and errors gracefully
- Rate limiting prevents overwhelming server
- Output shows status codes and response sizes
Hint: Wordlist Structure
import time

import requests

def load_wordlist(file_path: str) -> list:
    """
    Load wordlist from file.
    """
    try:
        with open(file_path, 'r') as f:
            words = [line.strip() for line in f if line.strip()]
        return words
    except FileNotFoundError:
        print(f"[!] Wordlist not found: {file_path}")
        return []

def test_path(base_url: str, path: str, extensions: list = None) -> None:
    """
    Test if path exists on server.
    """
    paths_to_test = [path]
    # Add extension variants
    if extensions:
        for ext in extensions:
            paths_to_test.append(f"{path}.{ext}")
    for candidate in paths_to_test:  # renamed so it doesn't shadow test_path()
        url = f"{base_url.rstrip('/')}/{candidate.lstrip('/')}"
        try:
            response = requests.get(url, timeout=5, allow_redirects=False)
            # Interesting status codes
            if response.status_code in [200, 201, 301, 302, 403]:
                size = len(response.content)
                print(f"[{response.status_code}] {url} ({size} bytes)")
        except requests.exceptions.Timeout:
            pass  # Ignore timeouts
        except requests.exceptions.RequestException:
            pass  # Ignore other errors
        # Rate limiting
        time.sleep(0.1)  # 100ms delay between requests
# Common wordlist sources:
# - SecLists: https://github.com/danielmiessler/SecLists
# - DirBuster wordlists (built into Kali)
# - Custom wordlists based on target technology
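The -t 10 flag in the usage example implies concurrency. One way to sketch it, assuming the test_path helper above is bound to a base URL first (each worker keeps its own per-request delay, so the aggregate request rate is roughly threads divided by the delay):

```python
from concurrent.futures import ThreadPoolExecutor

def run_bruteforce(words: list, check, threads: int = 10) -> None:
    """Run `check` (e.g. a test_path partial) across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so any exception inside a worker surfaces here
        list(pool.map(check, words))

# Example wiring (hypothetical):
# from functools import partial
# run_bruteforce(load_wordlist('wordlist.txt'),
#                partial(test_path, 'http://testphp.vulnweb.com'),
#                threads=10)
```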
Part 2: Subdomain Enumerator (40 minutes)
Objective: Discover subdomains of a target domain for reconnaissance.
Requirements:
- Create subenum.py with multiple discovery methods:
  - Wordlist bruteforcing: Test common subdomain names
  - Certificate Transparency logs: Query the crt.sh API
  - DNS enumeration: Test for wildcard DNS
- Verify each subdomain resolves (DNS lookup)
- Attempt HTTP/HTTPS connection to active subdomains
- Extract server headers and technologies
- Export results to JSON/CSV
Example Usage:
python subenum.py example.com -w subdomains.txt -o results.json
Success Criteria:
- Discovers at least 3 subdomains via wordlist
- Queries crt.sh and extracts subdomains from certificates
- Detects wildcard DNS if present
- Outputs structured report with subdomain details
Hint: Certificate Transparency API
import requests
import json
def query_crt_sh(domain: str) -> set:
"""
Query crt.sh for subdomains from certificate transparency logs.
"""
url = f'https://crt.sh/?q=%.{domain}&output=json'
try:
response = requests.get(url, timeout=15)
if response.status_code == 200:
data = response.json()
subdomains = set()
for entry in data:
name = entry.get('name_value', '')
# Handle wildcard and multi-line entries
for subdomain in name.split('\n'):
subdomain = subdomain.strip().replace('*.', '')
if subdomain.endswith(domain):
subdomains.add(subdomain)
return subdomains
except Exception as e:
print(f"[!] crt.sh query failed: {e}")
return set()
def verify_subdomain(subdomain: str) -> dict:
"""
Verify subdomain exists and is accessible.
"""
import socket
result = {
'subdomain': subdomain,
'resolves': False,
'ip': None,
'http_status': None,
'https_status': None,
}
# DNS resolution
try:
ip = socket.gethostbyname(subdomain)
result['resolves'] = True
result['ip'] = ip
except socket.gaierror:
return result # DNS failed
# HTTP check
for protocol in ['http', 'https']:
url = f'{protocol}://{subdomain}'
try:
response = requests.get(url, timeout=5, allow_redirects=True)
result[f'{protocol}_status'] = response.status_code
except requests.exceptions.RequestException:
pass
return result
# Usage
subdomains = query_crt_sh('example.com')
print(f"[*] Found {len(subdomains)} subdomains from CT logs")
for sub in list(subdomains)[:10]:
print(f" - {sub}")
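The wildcard-DNS requirement can be checked by resolving a label that almost certainly does not exist; if it resolves anyway, every bruteforced name will appear to "hit" and results need filtering. A sketch:

```python
import random
import socket
import string

def has_wildcard_dns(domain: str) -> bool:
    """Detect wildcard DNS by resolving a random, almost-certainly-absent label."""
    label = ''.join(random.choices(string.ascii_lowercase, k=20))
    try:
        socket.gethostbyname(f'{label}.{domain}')
        return True   # a nonsense name resolved: wildcard record present
    except socket.gaierror:
        return False  # normal case: the random name does not resolve
```

If this returns True, compare each candidate's IP against the wildcard IP before reporting it as a real subdomain.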
Part 3: HTTP Header Analyzer (30 minutes)
Objective: Analyze HTTP response headers for security misconfigurations.
Requirements:
- Create header_analyzer.py that:
  - Fetches target URL and extracts all response headers
  - Checks for missing security headers:
    - Strict-Transport-Security (HSTS)
    - X-Frame-Options (clickjacking protection)
    - X-Content-Type-Options (MIME sniffing)
    - Content-Security-Policy (CSP)
    - X-XSS-Protection (legacy XSS filter)
  - Identifies information disclosure headers (Server, X-Powered-By)
  - Analyzes cookie security flags (HttpOnly, Secure, SameSite)
  - Generates security score and recommendations
Success Criteria:
- Detects all missing security headers
- Flags insecure cookies
- Provides actionable remediation advice
- Outputs color-coded report (green=secure, yellow=warning, red=critical)
Hint: Security Header Checks
def analyze_security_headers(url: str) -> dict:
"""
Analyze security headers and return findings.
"""
try:
response = requests.get(url, timeout=10)
except Exception as e:
return {'error': f'Request failed: {e}'}
headers = response.headers
findings = {
'url': url,
'status': response.status_code,
'missing_headers': [],
'present_headers': [],
'information_disclosure': [],
'cookie_issues': [],
}
# Required security headers
security_headers = {
'Strict-Transport-Security': 'HSTS not set - site vulnerable to SSL stripping',
'X-Frame-Options': 'Clickjacking protection missing',
'X-Content-Type-Options': 'MIME sniffing protection missing',
'Content-Security-Policy': 'CSP not implemented - XSS risk higher',
'Referrer-Policy': 'Referrer leakage possible',
}
for header, description in security_headers.items():
if header not in headers:
findings['missing_headers'].append({
'header': header,
'risk': description
})
else:
findings['present_headers'].append({
'header': header,
'value': headers[header]
})
# Information disclosure
disclosure_headers = ['Server', 'X-Powered-By', 'X-AspNet-Version']
for header in disclosure_headers:
if header in headers:
findings['information_disclosure'].append({
'header': header,
'value': headers[header],
'risk': 'Version information may help attackers'
})
# Cookie analysis
if 'Set-Cookie' in headers:
cookies = headers.get('Set-Cookie', '')
if 'HttpOnly' not in cookies:
findings['cookie_issues'].append('HttpOnly flag missing - XSS can steal cookies')
if 'Secure' not in cookies:
findings['cookie_issues'].append('Secure flag missing - cookies sent over HTTP')
if 'SameSite' not in cookies:
findings['cookie_issues'].append('SameSite not set - CSRF risk')
return findings
def print_report(findings: dict) -> None:
    """
    Print color-coded security report.
    """
    print(f"\n{'='*60}")
    print(f"Security Header Analysis: {findings['url']}")
    print(f"{'='*60}\n")

    # Missing headers (RED)
    if findings['missing_headers']:
        print(f"[!] MISSING SECURITY HEADERS ({len(findings['missing_headers'])}):")
        for item in findings['missing_headers']:
            print(f"  ❌ {item['header']}: {item['risk']}")
        print()

    # Present headers (GREEN)
    if findings['present_headers']:
        print(f"[✓] SECURITY HEADERS PRESENT ({len(findings['present_headers'])}):")
        for item in findings['present_headers']:
            print(f"  ✅ {item['header']}: {item['value'][:50]}")
        print()

    # Information disclosure (YELLOW)
    if findings['information_disclosure']:
        print(f"[!] INFORMATION DISCLOSURE ({len(findings['information_disclosure'])}):")
        for item in findings['information_disclosure']:
            print(f"  ⚠️ {item['header']}: {item['value']}")
        print()

    # Cookie issues (RED)
    if findings['cookie_issues']:
        print(f"[!] COOKIE SECURITY ISSUES:")
        for issue in findings['cookie_issues']:
            print(f"  ❌ {issue}")
        print()

    # Security score: total checked headers = missing + present
    # (the security_headers dict is local to analyze_security_headers,
    # so we derive the count from the findings themselves)
    total_checks = len(findings['missing_headers']) + len(findings['present_headers'])
    passed = len(findings['present_headers'])
    score = (passed / total_checks) * 100 if total_checks else 0.0
    print(f"{'='*60}")
    print(f"Security Score: {score:.1f}% ({passed}/{total_checks} headers)")
    print(f"{'='*60}\n")


# Usage
findings = analyze_security_headers('https://example.com')
if 'error' in findings:
    print(findings['error'])
else:
    print_report(findings)
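One caveat on the cookie check above: `requests` folds multiple `Set-Cookie` headers into a single comma-joined string, so a flag present on one cookie can mask its absence on another. A sketch of per-cookie flag checking using the stdlib `SimpleCookie` (the helper name is my own; in practice feed it one `Set-Cookie` value at a time, e.g. from `response.raw.headers.getlist('Set-Cookie')`):

```python
from http.cookies import SimpleCookie


def cookie_flag_issues(set_cookie_value: str) -> dict:
    """Report which of HttpOnly/Secure/SameSite each cookie is missing."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    issues = {}
    for name, morsel in jar.items():
        # Morsel stores these attributes under lowercase reserved keys
        missing = [flag for flag, key in [('HttpOnly', 'httponly'),
                                          ('Secure', 'secure'),
                                          ('SameSite', 'samesite')]
                   if not morsel[key]]
        issues[name] = missing
    return issues


print(cookie_flag_issues('sessionid=abc123; Path=/; HttpOnly; Secure'))
# → {'sessionid': ['SameSite']}
```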
Part 4: Web Login Bruteforcer (30 minutes)
Objective: Build a form-based authentication bruteforcer (for authorized testing only).
Requirements:
- Create `loginbrute.py` that:
  - Accepts target login URL, username, and password wordlist
  - Extracts form fields automatically (username, password, submit)
  - Handles CSRF tokens (if present)
  - Tests each password from wordlist
  - Detects successful login (status code, response size, redirect)
  - Implements rate limiting and backoff on failures
  - Stops on successful authentication
- Test against DVWA login form (set up in a local VM)
⚠️ CRITICAL: Only test on systems you own or have explicit written permission to test.
Success Criteria:
- Successfully logs in with correct credentials from wordlist
- Handles CSRF token extraction and submission
- Rate limiting prevents account lockout
- Detects login success based on response analysis
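The rate-limiting and backoff requirement can be met with a small helper. This is an illustrative sketch (the function name and constants are my own, not from any library): the delay doubles after each consecutive failure and is capped so a long run never stalls.

```python
def backoff_delay(failures: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before the next attempt: base doubled per consecutive failure, capped."""
    return min(cap, base * (2 ** failures))


# In the attempt loop you would call time.sleep(backoff_delay(n)),
# incrementing n on errors or lockout responses and resetting it on success.
for n in range(4):
    print(f"after {n} consecutive failures: wait {backoff_delay(n)}s")
```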
Hint: Form Parsing and CSRF Handling
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import requests
import time


def extract_form_fields(url: str, form_id: str = None) -> dict:
    """
    Extract form fields from login page.
    """
    session = requests.Session()
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'lxml')

    # Find login form
    if form_id:
        form = soup.find('form', id=form_id)
    else:
        form = soup.find('form')  # First form

    if not form:
        return None

    fields = {}

    # Extract all input fields
    for input_tag in form.find_all('input'):
        name = input_tag.get('name')
        value = input_tag.get('value', '')
        if name:
            fields[name] = value

    return {
        'session': session,
        'action': form.get('action', ''),
        'method': form.get('method', 'post').upper(),
        'fields': fields
    }


def bruteforce_login(url: str, username: str, passwords: list) -> None:
    """
    Bruteforce login form.
    """
    # Get form structure
    form_data = extract_form_fields(url)
    if not form_data:
        print("[!] Could not find login form")
        return

    session = form_data['session']
    fields = form_data['fields']
    action = form_data['action']

    # Determine username and password field names
    # (Usually 'username'/'user' and 'password'/'pass')
    username_field = None
    password_field = None

    for field_name in fields.keys():
        if 'user' in field_name.lower():
            username_field = field_name
        elif 'pass' in field_name.lower():
            password_field = field_name

    if not username_field or not password_field:
        print("[!] Could not identify username/password fields")
        print(f"[*] Available fields: {list(fields.keys())}")
        return

    print(f"[*] Username field: {username_field}")
    print(f"[*] Password field: {password_field}")
    print(f"[*] Testing {len(passwords)} passwords...\n")

    # Build full action URL
    if action.startswith('http'):
        submit_url = action
    elif action.startswith('/'):
        parsed = urlparse(url)
        submit_url = f"{parsed.scheme}://{parsed.netloc}{action}"
    else:
        submit_url = url

    # Try each password
    for password in passwords:
        # Update credential fields
        fields[username_field] = username
        fields[password_field] = password

        # Re-extract CSRF token if needed
        # (the field name varies by application - DVWA calls it 'user_token')
        response = session.get(url)
        soup = BeautifulSoup(response.content, 'lxml')
        csrf_input = soup.find('input', {'name': 'csrf_token'})
        if csrf_input:
            fields['csrf_token'] = csrf_input.get('value')

        # Submit form. allow_redirects=False is needed so a 302 is
        # visible - by default requests follows redirects transparently.
        try:
            response = session.post(submit_url, data=fields,
                                    timeout=10, allow_redirects=False)

            # Check for successful login
            # (Customize based on application behavior)
            if response.status_code == 302:  # Redirect
                print(f"[✓] SUCCESS! Password: {password}")
                print(f"[*] Redirect to: {response.headers.get('Location')}")
                break
            elif 'welcome' in response.text.lower() or 'dashboard' in response.text.lower():
                print(f"[✓] SUCCESS! Password: {password}")
                break
            elif 'incorrect' in response.text.lower() or 'invalid' in response.text.lower():
                print(f"[-] Failed: {password}")
            else:
                print(f"[?] Unknown response for: {password} (status: {response.status_code})")

            # Rate limiting
            time.sleep(0.5)  # 500ms between attempts

        except requests.RequestException as e:
            print(f"[!] Error with password '{password}': {e}")


# Usage
passwords = ['password', '123456', 'admin', 'letmein', 'welcome']
bruteforce_login('http://localhost/dvwa/login.php', 'admin', passwords)
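The keyword checks above ('welcome', 'incorrect') break as soon as an application words its pages differently. A more portable heuristic is to record a baseline response for a known-bad password first, then flag any attempt whose status or body length deviates from it. A sketch (the helper name and the 10% tolerance are my own choices, not a standard):

```python
def looks_successful(baseline_status: int, baseline_len: int,
                     status: int, body_len: int,
                     tolerance: float = 0.1) -> bool:
    """True if a response deviates from the known-failure baseline."""
    if status != baseline_status:
        return True  # e.g. a 302 redirect when failed logins return 200
    # A body length outside ±10% of the failure baseline suggests a different page
    return abs(body_len - baseline_len) > baseline_len * tolerance


print(looks_successful(200, 5000, 302, 300))   # → True (status changed)
print(looks_successful(200, 5000, 200, 5100))  # → False (same failure page)
```

Capture the baseline with one deliberate failure (e.g. a random password) before entering the main loop.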
📤 Deliverables:
- `dirbrute.py` - Directory bruteforcing tool
- `subenum.py` - Subdomain enumeration tool
- `header_analyzer.py` - Security header analyzer
- `loginbrute.py` - Form-based login bruteforcer
- Sample outputs/reports from each tool
- Screenshots of tools in action
Additional Resources
Python Libraries
- requests - HTTP for Humans
- BeautifulSoup4 - HTML/XML parsing
- Selenium - Browser automation
- requests Advanced Usage
Security Testing Resources
- SecLists - Wordlists for security testing
- OWASP Web Security Testing Guide
- PortSwigger Web Security Academy
- Shodan - Internet device search engine
- VirusTotal - File/URL reputation
Practice Targets (Authorized)
- DVWA - Damn Vulnerable Web Application
- Hacksplaining - Interactive web security lessons
- HackTheBox - Pentesting labs
- PortSwigger Labs - Free web security practice
Further Learning
- Black Hat Python, 2nd Edition - Chapter 5 (Web Hacking)
- SANS SEC573 - Modules 2-3 (Web Automation)
- Web Scraping with Python (freeCodeCamp)
Key Takeaways
- ✅ HTTP is a request-response protocol with headers, methods, and status codes
- ✅ `requests` library simplifies HTTP interactions vs raw sockets
- ✅ Sessions maintain cookies and connection pooling across requests
- ✅ BeautifulSoup parses HTML for web scraping and OSINT
- ✅ Security APIs (VirusTotal, Shodan) provide threat intelligence programmatically
- ✅ Directory bruteforcing discovers hidden web resources
- ✅ Subdomain enumeration expands attack surface mapping
- ✅ HTTP header analysis reveals security misconfigurations
- ✅ Form bruteforcing tests authentication strength
- ⚠️ Always obtain authorization before testing web applications
Week 06 Quiz
Test your understanding of HTTP and web interactions in Python.
Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.