CSY203 Week 02 - Week Content

Opening Framing: The Foundation of Testing

Before exploiting vulnerabilities, you must find them. Before finding vulnerabilities, you must understand the application. Information gathering and mapping is where professional testing begins—and where it differs most from amateur attempts.

A rushed tester jumps straight to SQL injection payloads. A professional tester first maps every endpoint, identifies every parameter, understands the technology stack, and discovers hidden functionality. This systematic approach finds vulnerabilities that automated scanners miss.

This week covers passive and active reconnaissance, application mapping, content discovery, and technology fingerprinting, which together build a complete picture of your target.

Key insight: The quality of your reconnaissance determines the quality of your findings.

1) Passive Reconnaissance

Gathering information without touching the target:

Passive Recon Sources:

Search Engines:
- Google dorking
- Bing, DuckDuckGo
- Cached pages
- Indexed files

Domain Information:
- WHOIS records
- DNS records (A, CNAME, MX, TXT)
- Certificate Transparency logs
- Historical DNS (SecurityTrails)

Code Repositories:
- GitHub (organization accounts)
- GitLab, Bitbucket
- Leaked credentials in commits
- API keys, secrets

Archive Services:
- Wayback Machine
- Archive.today
- Historical versions
- Removed content

Job Postings:
- Technology stack hints
- Security tools in use
- Infrastructure details

Google Dorking for Web Apps:

# Find login pages
site:target.com inurl:login
site:target.com inurl:admin
site:target.com intitle:"login"

# Find exposed files
site:target.com filetype:pdf
site:target.com filetype:xlsx
site:target.com filetype:sql
site:target.com filetype:log
site:target.com filetype:env

# Find configuration files
site:target.com filetype:xml
site:target.com filetype:conf
site:target.com filetype:config

# Find backup files
site:target.com filetype:bak
site:target.com filetype:old
site:target.com inurl:backup

# Find error messages
site:target.com "sql syntax"
site:target.com "mysql_fetch"
site:target.com "warning" "error"
site:target.com "stack trace"

# Find directories
site:target.com intitle:"index of"
site:target.com intitle:"directory listing"

# Exclude the main www host to surface other subdomains
site:target.com -site:www.target.com
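
The dork patterns above are repetitive, so they are easy to generate per target. A minimal sketch, assuming a hypothetical helper named `build_dorks` (not part of any tool) that only builds query strings and sends nothing to the search engine:

```python
def build_dorks(domain: str) -> dict[str, list[str]]:
    """Return Google dork queries for `domain`, grouped by goal."""
    # Clauses are taken from the lists above; the grouping labels are illustrative.
    templates = {
        "login pages": ["inurl:login", "inurl:admin", 'intitle:"login"'],
        "exposed files": [f"filetype:{ext}" for ext in ("pdf", "xlsx", "sql", "log", "env")],
        "backups": ["filetype:bak", "filetype:old", "inurl:backup"],
        "directory listings": ['intitle:"index of"'],
    }
    return {
        goal: [f"site:{domain} {clause}" for clause in clauses]
        for goal, clauses in templates.items()
    }


if __name__ == "__main__":
    for goal, queries in build_dorks("target.com").items():
        print(f"# {goal}")
        for query in queries:
            print(query)
```

Paste the generated queries into the search engine by hand and record which ones return results.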

Certificate Transparency:

# Certificate Transparency logs reveal subdomains

# crt.sh
https://crt.sh/?q=%.target.com

# Returns all certificates issued for domain
# Reveals:
# - Subdomains (including internal!)
# - dev.target.com
# - staging.target.com
# - api-internal.target.com

# Automate with curl (jq -r prints raw strings, so entries holding several names split onto separate lines)
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Tools:
# - Amass
# - Subfinder
# - Assetfinder
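
The crt.sh JSON output needs some cleanup before it is a usable subdomain list: a single `name_value` can hold several names separated by newlines, and wildcard entries start with `*.`. A minimal sketch of that cleanup, assuming the JSON was already saved from the curl command above; `extract_hostnames` and the sample data are hypothetical:

```python
import json


def extract_hostnames(crtsh_json: str, domain: str) -> list[str]:
    """Return sorted, unique hostnames under `domain` from crt.sh JSON output."""
    hosts = set()
    for entry in json.loads(crtsh_json):
        # One certificate can cover several names, separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith("." + domain):
                hosts.add(name)
    return sorted(hosts)


# Made-up sample in the shape crt.sh returns.
SAMPLE = json.dumps([
    {"name_value": "dev.target.com\n*.target.com"},
    {"name_value": "staging.target.com"},
    {"name_value": "DEV.target.com"},
])

if __name__ == "__main__":
    for host in extract_hostnames(SAMPLE, "target.com"):
        print(host)
```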

Key insight: Passive recon sends no traffic to the target, so the target cannot detect it. It often reveals more than expected, including internal assets and forgotten systems.

2) Active Application Mapping

Systematically exploring the application:

Application Mapping Goals:

1. Enumerate all functionality
   - Every page
   - Every form
   - Every feature

2. Identify entry points
   - URL parameters
   - POST body parameters
   - Headers (cookies, auth)
   - File uploads

3. Understand data flow
   - Input → Processing → Output
   - Where does data go?
   - How is it transformed?

4. Map user roles
   - Anonymous
   - Authenticated
   - Admin
   - Different permission levels

Manual Crawling with Burp:

Systematic browsing:

1. Configure scope in Burp
   Target → Scope → Add target URL

2. Browse the application manually
   - Click every link
   - Submit every form
   - Test every feature
   - Try different user roles

3. Review Burp Site Map
   - Hierarchical view of application
   - Identify all endpoints
   - Note parameters

4. Examine each request
   - Parameters (GET, POST)
   - Cookies
   - Custom headers
   - Request body formats (JSON, XML)

5. Note interesting responses
   - Error messages
   - Different response lengths
   - Redirects
   - Set-Cookie headers

Automated Crawling:

# Burp crawler (Burp Suite Professional; replaces the legacy Spider tool)
Target → Site map → Right-click → Scan → Crawl

# Configure:
# - Crawl depth
# - Form submission
# - Scope limitations

# Limitations:
# - JavaScript-heavy apps need manual help
# - Form logic may not be followed correctly
# - Auth flows often break

# Best approach:
# 1. Manual browse with auth
# 2. Crawl from authenticated state
# 3. Review and fill gaps manually

Creating Application Map:

Document your findings:

APPLICATION MAP
===============
Domain: target.com

Authentication:
├── /login (POST username, password)
├── /logout
├── /register (POST email, username, password)
├── /forgot-password (POST email)
└── /reset-password (GET token, POST new_password)

User Dashboard:
├── /dashboard
├── /profile (GET, POST - update profile)
├── /settings (GET, POST - change settings)
└── /notifications

API Endpoints:
├── /api/v1/users (GET - list, POST - create)
├── /api/v1/users/{id} (GET, PUT, DELETE)
├── /api/v1/products (GET, POST)
└── /api/v1/orders (GET, POST)

Admin (requires admin role):
├── /admin/dashboard
├── /admin/users
└── /admin/settings

Entry Points per Endpoint:
/api/v1/users/{id}
  - URL parameter: id (integer)
  - Headers: Authorization, Content-Type
  - Body (PUT): name, email, role
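
Keeping the map as structured data rather than free text makes it easier to update as new endpoints turn up. A minimal sketch, assuming a hypothetical `Endpoint` record and `render_map` helper; the sample entries come from the map above:

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    path: str
    methods: list[str]
    area: str  # grouping label, e.g. "Authentication"
    params: list[str] = field(default_factory=list)


def render_map(domain: str, endpoints: list[Endpoint]) -> str:
    """Render endpoints as a plain-text tree grouped by area."""
    areas: dict[str, list[Endpoint]] = {}
    for ep in endpoints:
        areas.setdefault(ep.area, []).append(ep)

    lines = [f"Domain: {domain}"]
    for area, eps in areas.items():
        lines += ["", f"{area}:"]
        for i, ep in enumerate(eps):
            branch = "└──" if i == len(eps) - 1 else "├──"
            params = f": {', '.join(ep.params)}" if ep.params else ""
            lines.append(f"{branch} {ep.path} ({'/'.join(ep.methods)}{params})")
    return "\n".join(lines)


ENDPOINTS = [
    Endpoint("/login", ["POST"], "Authentication", ["username", "password"]),
    Endpoint("/forgot-password", ["POST"], "Authentication", ["email"]),
    Endpoint("/api/v1/users", ["GET", "POST"], "API Endpoints"),
]

if __name__ == "__main__":
    print(render_map("target.com", ENDPOINTS))
```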

Key insight: Complete mapping takes time but reveals every potential vulnerability point.

3) Content Discovery

Finding hidden files and directories:

Why Content Discovery Matters:

Applications often have:
- Backup files (.bak, .old, ~)
- Configuration files (.config, .env)
- Development artifacts (.git, .svn)
- Admin interfaces
- API documentation
- Debug endpoints
- Forgotten functionality

These are often:
- Not linked from anywhere
- Not protected
- Full of sensitive information

Directory Brute Forcing:

# Gobuster
gobuster dir -u https://target.com -w /usr/share/wordlists/dirb/common.txt
gobuster dir -u https://target.com -w /usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt

# With extensions
gobuster dir -u https://target.com -w wordlist.txt -x php,asp,aspx,jsp,html,js

# With cookies (authenticated)
gobuster dir -u https://target.com -w wordlist.txt -c "session=abc123"

# Ffuf (faster)
ffuf -u https://target.com/FUZZ -w wordlist.txt
ffuf -u https://target.com/FUZZ -w wordlist.txt -e .php,.html,.txt

# With filtering
ffuf -u https://target.com/FUZZ -w wordlist.txt -fc 404  # Filter 404s
ffuf -u https://target.com/FUZZ -w wordlist.txt -fs 1234 # Filter by size

# Feroxbuster (recursive)
feroxbuster -u https://target.com -w wordlist.txt
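
The tools above share two ideas: they expand each wordlist entry with optional extensions, then filter out uninteresting responses. A rough sketch of both steps with no network access. `candidates`, `Hit`, and `keep` are hypothetical names, and the real tools differ in details:

```python
from dataclasses import dataclass


def candidates(words, extensions=()):
    """Yield each word bare, then with each extension appended."""
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and wordlist comments
        yield word
        for ext in extensions:
            yield f"{word}.{ext.lstrip('.')}"


@dataclass
class Hit:
    path: str
    status: int
    size: int


def keep(hits, filter_codes=(404,), filter_sizes=()):
    """Drop hits by status code or body size, similar in spirit to ffuf -fc / -fs."""
    return [h for h in hits if h.status not in filter_codes and h.size not in filter_sizes]


if __name__ == "__main__":
    print(list(candidates(["admin", "backup"], ["php", ".txt"])))
```

Filtering by size matters when an application answers every unknown path with a 200 and the same error page. Measure that page's size once and filter it out.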

Wordlist Selection:

# SecLists - essential wordlists
/usr/share/seclists/Discovery/Web-Content/

Common choices:
- common.txt (quick scan)
- raft-medium-directories.txt
- raft-large-directories.txt
- directory-list-2.3-medium.txt

Technology-specific:
- /Discovery/Web-Content/CMS/
- /Discovery/Web-Content/api/
- /Discovery/Web-Content/CGIs.txt

Custom wordlist generation:
# CeWL - scrape words from target
cewl https://target.com -d 2 -m 5 -w custom.txt

# Add common patterns
admin, backup, config, debug, dev, test, staging

Finding Sensitive Files:

# Common sensitive files to check:

Configuration:
/.env
/config.php
/wp-config.php
/web.config
/application.yml
/.htaccess

Version Control:
/.git/HEAD
/.git/config
/.svn/entries
/.hg/

Backup Files:
/backup.sql
/database.sql
/site.zip
/backup.tar.gz
/*.bak

Development:
/phpinfo.php
/info.php
/test.php
/debug
/.DS_Store
/Thumbs.db

Documentation:
/swagger.json
/api-docs
/openapi.yaml
/README.md
/CHANGELOG.md

Server Status:
/server-status
/server-info
/.well-known/

Git Repository Exposure:

# Check for exposed .git
curl https://target.com/.git/HEAD

# If accessible, dump entire repo:
# git-dumper
git-dumper https://target.com/.git/ ./git-dump

# Or manually (only works when directory listing is enabled on /.git/):
wget --mirror -I .git https://target.com/.git/

# Then:
cd git-dump
git checkout -- .
git log
git show [commit]

# Often contains:
# - Source code
# - Credentials
# - Configuration
# - Development history
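
A 200 response for `/.git/HEAD` does not prove exposure, because many sites return a custom error page for any path. A minimal sketch of validating the body instead, assuming it was already fetched with the curl command above; `looks_like_git_head` is a hypothetical helper and only handles SHA-1 repositories:

```python
import re

# A HEAD file is either a symbolic ref or, when detached, a raw commit hash.
_SYMBOLIC_REF = re.compile(r"ref: refs/heads/\S+")
_SHA1 = re.compile(r"[0-9a-f]{40}")


def looks_like_git_head(body: str) -> bool:
    """Return True if `body` has the shape of a real .git/HEAD file."""
    line = body.strip()
    return bool(_SYMBOLIC_REF.fullmatch(line) or _SHA1.fullmatch(line))


if __name__ == "__main__":
    print(looks_like_git_head("ref: refs/heads/main\n"))
    print(looks_like_git_head("<html><body>Not Found</body></html>"))
```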

Key insight: Hidden content often contains the most critical vulnerabilities—backup files with credentials, exposed git repos with secrets.

4) Technology Fingerprinting

Identifying the application's technology stack:

Why Fingerprinting Matters:

Knowing the stack reveals:
- Known CVEs for specific versions
- Default credentials
- Common misconfigurations
- Attack techniques that apply

Technology Stack Components:
- Web server (Apache, Nginx, IIS)
- Programming language (PHP, Java, Python, .NET)
- Framework (Laravel, Spring, Django, Express)
- CMS (WordPress, Drupal, Joomla)
- Frontend framework (React, Angular, Vue)
- Database (MySQL, PostgreSQL, MongoDB)
- WAF (Cloudflare, AWS WAF, ModSecurity)

Fingerprinting Methods:

# HTTP Headers
curl -I https://target.com

# Example headers (a real server returns only the ones matching its stack)
Server: nginx/1.19.0
X-Powered-By: PHP/7.4.3
X-AspNet-Version: 4.0.30319

# Cookies
PHPSESSID → PHP
JSESSIONID → Java
ASP.NET_SessionId → .NET
connect.sid → Node.js/Express

# File Extensions
.php → PHP
.asp/.aspx → ASP.NET
.jsp → Java
.py (rare) → Python

# Response Patterns
- Error messages (framework-specific)
- Default pages
- URL structures (/wp-admin → WordPress)
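
The header and cookie checks above amount to a small lookup table. A minimal sketch, assuming the response headers were already captured (for example from `curl -I`) into a dict; `fingerprint` is a hypothetical helper, and a plain dict holds only one `Set-Cookie` value:

```python
# Cookie names and header names come from the lists above.
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java",
    "ASP.NET_SessionId": ".NET",
    "connect.sid": "Node.js/Express",
}
DISCLOSING_HEADERS = ("Server", "X-Powered-By", "X-AspNet-Version")


def fingerprint(headers: dict[str, str]) -> list[str]:
    """Return technology hints found in response headers and cookies."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    hints = []
    for name in DISCLOSING_HEADERS:
        value = lowered.get(name.lower())
        if value:
            hints.append(f"{name}: {value}")
    cookies = lowered.get("set-cookie", "").lower()
    for cookie, tech in COOKIE_HINTS.items():
        if cookie.lower() + "=" in cookies:
            hints.append(f"cookie {cookie} -> {tech}")
    return hints


if __name__ == "__main__":
    print(fingerprint({"Server": "nginx/1.19.0", "Set-Cookie": "PHPSESSID=abc; path=/"}))
```

Treat every hint as a lead to confirm. Headers and cookie names can be renamed or removed by the operator.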

Automated Fingerprinting:

# Wappalyzer (browser extension)
# Shows technologies as you browse

# WhatWeb (command line)
whatweb https://target.com
whatweb -v https://target.com  # Verbose

# Webanalyze
webanalyze -host https://target.com

# Nmap HTTP scripts
nmap -sV -p 80,443 --script=http-headers,http-server-header target.com

# Nuclei technology detection (recent nuclei-templates releases keep these under http/technologies/)
nuclei -u https://target.com -t http/technologies/

CMS-Specific Enumeration:

# WordPress
wpscan --url https://target.com
wpscan --url https://target.com --enumerate u  # Users
wpscan --url https://target.com --enumerate p  # Plugins
wpscan --url https://target.com --enumerate t  # Themes

# Manual WordPress checks
/wp-admin/
/wp-login.php
/wp-content/plugins/
/wp-content/themes/
/xmlrpc.php
/wp-json/wp/v2/users

# Drupal
droopescan scan drupal -u https://target.com

# Joomla
joomscan --url https://target.com

# Generic
cmseek -u https://target.com

WAF Detection:

# Detecting WAF presence

# wafw00f
wafw00f https://target.com

# Manual detection
# Send an obviously malicious request (-G with --data-urlencode encodes the payload into the query string):
curl -G "https://target.com/" --data-urlencode "id=1' OR '1'='1"

# WAF indicators:
# - 403 Forbidden
# - Custom error pages
# - Different response headers
# - Request blocked message

# Common WAFs:
# - Cloudflare (cf-ray header)
# - AWS WAF
# - Akamai
# - ModSecurity
# - Imperva

# Why it matters:
# - Need to craft bypass payloads
# - Different WAFs have different weaknesses
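
The manual indicators above can be checked mechanically once a response has been captured. A minimal sketch using only the indicators listed in this section; `waf_indicators` is a hypothetical helper, and the "blocked" keyword match is a crude assumption about block-page wording:

```python
def waf_indicators(status: int, headers: dict[str, str], body: str) -> list[str]:
    """Return the WAF indicators present in a single HTTP response."""
    found = []
    header_names = {name.lower() for name in headers}
    if status == 403:
        found.append("403 Forbidden")
    if "cf-ray" in header_names:
        found.append("cf-ray header (Cloudflare)")
    if "blocked" in body.lower():
        found.append("block message in body")
    return found


if __name__ == "__main__":
    print(waf_indicators(403, {"CF-RAY": "example"}, "Request blocked"))
```

Run it on both a benign request and the malicious probe. Indicators that appear only for the probe suggest active filtering, while those present in both (such as `cf-ray`) only show that the WAF vendor sits in front of the site.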

Key insight: Technology stack knowledge focuses your testing. PHP apps need different tests than Java apps.

5) Identifying Entry Points

Entry points are where attackers inject malicious input:

Entry Point Categories:

URL Parameters:
https://target.com/page?id=123&action=view
- id, action are parameters
- Test each for injection

POST Body:
username=admin&password=secret
- Form submissions
- JSON/XML payloads

HTTP Headers:
- Cookie: session=abc
- Authorization: Bearer xyz
- User-Agent
- Referer
- X-Forwarded-For
- Custom headers

File Uploads:
- Filename
- File content
- MIME type

Path Parameters:
/api/users/123/orders/456
- 123 (user ID)
- 456 (order ID)
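
URL-based entry points can be pulled out with the standard library. A minimal sketch, assuming a hypothetical `url_entry_points` helper that treats all-digit path segments as likely IDs (a heuristic; UUIDs and slugs need their own rules):

```python
from urllib.parse import parse_qsl, urlsplit


def url_entry_points(url: str) -> dict[str, list[str]]:
    """Return query parameter names and numeric path segments found in `url`."""
    parts = urlsplit(url)
    query_params = [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    segments = [segment for segment in parts.path.split("/") if segment]
    path_ids = [segment for segment in segments if segment.isdigit()]
    return {"query_params": query_params, "path_ids": path_ids}


if __name__ == "__main__":
    print(url_entry_points("https://target.com/page?id=123&action=view"))
    print(url_entry_points("https://target.com/api/users/123/orders/456"))
```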

Documenting Entry Points:

Entry Point Documentation:

ENDPOINT: POST /api/users
Entry Points:
┌─────────────────┬────────────┬──────────────────────┐
│ Parameter       │ Location   │ Type                 │
├─────────────────┼────────────┼──────────────────────┤
│ name            │ Body       │ String               │
│ email           │ Body       │ String (email format)│
│ role            │ Body       │ String (enum?)       │
│ Authorization   │ Header     │ Bearer token         │
│ Content-Type    │ Header     │ application/json     │
└─────────────────┴────────────┴──────────────────────┘

Testing Priority:
1. email - SQL injection, format validation
2. role - Privilege escalation
3. name - XSS, length limits
4. Authorization - Token validation

Using Burp to Identify Parameters:

# Burp Proxy → HTTP History

For each request, examine:
1. URL parameters (visible in URL)
2. Body parameters (view in Raw or Params tab)
3. Cookies (Cookie header)
4. Other headers

# Burp feature (Professional): Engagement tools → Analyze target
# Summarizes the parameters discovered across the site map

# Param Miner extension
# Discovers hidden parameters:
# - Guesses common parameter names
# - Detects parameters that change behavior
# - Finds headers that affect application

Right-click request → Extensions → Param Miner → Guess params

Hidden Parameter Discovery:

# Arjun - parameter discovery
arjun -u https://target.com/page
arjun -u https://target.com/page -m POST

# Common hidden parameters:
debug=true
test=1
admin=1
source=1
id, user_id, account_id
role, privilege, permission
redirect, next, return_url
callback, jsonp
_method (method override)
page, limit, offset
sort, order
format (json, xml)
version, v, api_version
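
To try hidden parameters by hand, append one candidate at a time to a known URL and compare each response with the original. A minimal sketch that only builds the URLs; `with_param`, `probe_urls`, and the short `CANDIDATES` list are hypothetical and cover only the flag-style parameters above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CANDIDATES = [("debug", "true"), ("test", "1"), ("admin", "1"), ("source", "1")]


def with_param(url: str, name: str, value: str) -> str:
    """Return `url` with one extra query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def probe_urls(url: str) -> list[str]:
    """Return one URL per candidate parameter, keeping existing parameters."""
    return [with_param(url, name, value) for name, value in CANDIDATES]


if __name__ == "__main__":
    for candidate in probe_urls("https://target.com/page?id=123"):
        print(candidate)
```

A change in status code, response length, or content compared with the baseline suggests the parameter is being read by the application.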

Key insight: Every entry point is a potential vulnerability. Missing one means missing potential findings.

Real-World Context: Recon in Bug Bounty

How reconnaissance differentiates successful hunters:

Surface Area Competition: In bug bounty, thousands of researchers test the same applications. Low-hanging fruit on main domains is found quickly. Success comes from finding assets others miss—subdomains, legacy systems, APIs.

Automation vs. Manual: The best hunters combine automated tools with manual analysis. Tools find breadth; humans find depth. For example, run automated subdomain enumeration, then manually review each discovered asset.

Continuous Monitoring: Top hunters monitor targets continuously. New subdomains, new functionality, new versions—each is an opportunity before others notice.

MITRE ATT&CK Mapping:

T1595 - Active Scanning: Content discovery, fingerprinting
T1592 - Gather Victim Host Information: Technology identification
T1589 - Gather Victim Identity Information: User enumeration

Key insight: Exceptional recon is often the difference between finding critical vulnerabilities and finding nothing.

Guided Lab: Comprehensive Reconnaissance

Perform complete reconnaissance of a target application.

Step 1: Passive Reconnaissance

# Choose target (your lab app or authorized target)

# Google dorking
site:target.com filetype:pdf
site:target.com inurl:admin
site:target.com "error"

# Certificate transparency
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Wayback Machine
https://web.archive.org/web/*/target.com/*

Step 2: Application Mapping

# Configure Burp scope
# Browse entire application manually
# Click every link, submit every form

# Review Target → Site Map
# Document all endpoints found

# Create application map diagram

Step 3: Content Discovery

# Directory brute forcing
ffuf -u https://target.com/FUZZ -w /usr/share/seclists/Discovery/Web-Content/common.txt

# Check for sensitive files
curl https://target.com/.git/HEAD
curl https://target.com/.env
curl https://target.com/robots.txt
curl https://target.com/sitemap.xml

Step 4: Technology Fingerprinting

# Automated fingerprinting
whatweb https://target.com

# Manual analysis
curl -I https://target.com
# Note Server, X-Powered-By, cookies

# WAF detection
wafw00f https://target.com

Step 5: Entry Point Documentation

# For each endpoint found:
# - List all parameters
# - Note parameter types
# - Identify testing priorities

# Use Burp → Target → Site Map → select endpoint → view params

Reflection (mandatory)

What did you discover that wasn't obvious from normal browsing?
Which content discovery technique found the most interesting results?
How would knowing the technology stack change your testing approach?
What entry points look most promising for vulnerability testing?

Week 02 Quiz

Test your understanding of Information Gathering and Application Mapping.

Format: 10 multiple-choice questions. Passing score: 70%. Time: Untimed.


Week 2 Outcome Check

By the end of this week, you should be able to:

Perform passive reconnaissance without touching the target
Systematically map an application using Burp Suite
Discover hidden content using directory brute forcing
Identify technology stacks through fingerprinting
Document all entry points for testing
Create comprehensive application documentation

Next week: Authentication Vulnerabilities—attacking the identity verification systems.

🎯 Hands-On Labs (Free & Essential)

Apply what you learned through practical reconnaissance and information gathering exercises. Complete these labs before moving to reading resources.

🎮 TryHackMe: Passive Reconnaissance

What you'll do: Learn passive information gathering techniques that don't directly interact with the target. Practice WHOIS lookups, DNS enumeration, search engine reconnaissance, and social media intelligence gathering.

Why it matters: Passive recon is stealthy and low-risk because you only view public information. Master these techniques to gather intelligence without alerting the target or triggering IDS/IPS.
Time estimate: 1.5-2 hours


🎮 TryHackMe: Active Reconnaissance

What you'll do: Learn active information gathering through direct interaction with the target. Practice port scanning with Nmap, service enumeration, banner grabbing, and vulnerability scanning.

Why it matters: Active recon reveals the attack surface—open ports, running services, software versions. This intel drives your entire testing strategy and identifies initial entry points.
Time estimate: 2-3 hours


🎮 TryHackMe: Content Discovery

What you'll do: Learn techniques to discover hidden web content—directories, files, subdomains, and parameters. Practice with tools like dirb, gobuster, and ffuf.

Why it matters: Hidden admin panels, backup files, and undocumented APIs are goldmines for vulnerability discovery. Content discovery often finds the most critical vulnerabilities.
Time estimate: 1.5-2 hours


💡 Lab Strategy: Start with Passive Recon (no direct contact with the target), then Active Recon (direct interaction), and finish with Content Discovery (finding hidden attack surface). This progression mirrors real-world pentesting methodology. Total: 500 XP and 5-7 hours of practice.

🛡️ Defensive Architecture & Secure Design Patterns

Recon shows attackers how to map your application. Defensive design removes what they can see and limits what they can learn.

Attack Surface Reduction

Every exposed endpoint is a potential risk. Minimize public-facing functionality and harden what must remain.

Attack surface reduction checklist:
- Remove debug routes, sample apps, and test endpoints
- Disable directory listing and verbose server banners
- Keep configs, backups, and secrets out of web roots
- Restrict admin paths to VPN, allowlists, or SSO
- Separate dev/staging from production
- Maintain a living asset inventory

Security Headers as Defensive Baseline

Headers reduce information disclosure and limit browser abuse before an attacker reaches the application logic.

Baseline headers:
Content-Security-Policy: default-src 'self'
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Referrer-Policy: no-referrer
Permissions-Policy: geolocation=()
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
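
Checking a response against this baseline is a simple set comparison. A minimal sketch, assuming the response headers were already captured (for example from `curl -I`) into a dict; `missing_headers` is a hypothetical helper that checks presence only, not whether the values are sensible:

```python
BASELINE = (
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
    "Permissions-Policy",
    "Strict-Transport-Security",
)


def missing_headers(headers: dict[str, str]) -> list[str]:
    """Return the baseline headers that are absent from a response."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return [name for name in BASELINE if name.lower() not in present]


if __name__ == "__main__":
    response_headers = {"x-frame-options": "DENY", "Content-Security-Policy": "default-src 'self'"}
    print(missing_headers(response_headers))
```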

Real-World Breach: Uber 2016 (Exposed Credentials)

Attackers used stolen credentials to access a private GitHub repository used by Uber engineers, found hard-coded AWS credentials there, and used them to access internal systems and S3 data. Lessons learned: secret scanning, least-privilege access, and rapid key rotation prevent small leaks from becoming breaches.

Defensive Labs

Lab: Implement Security Headers Baseline

Configure headers in a test app or web server and verify with `curl -I` and SecurityHeaders.com. Document before/after.

Lab: Create an Attack Surface Inventory

Build a complete endpoint inventory, classify each by exposure level, and propose hardening actions (remove, restrict, or monitor).

📚 Building on CSY101 Week-13: Use threat modeling to identify recon-driven abuse cases. CSY101 Week-14: Map controls to CIS Controls and NIST 800-53. CSY104 Week-11: Use CVSS scoring to prioritize remediation of exposed endpoints.

Reading Resources (Free + Authoritative)

Complete the required resources to build your foundation.

PortSwigger - Information Disclosure · 45-60 min · 50 XP · Resource ID: csy203_w2_r1 (Required)
SecLists - Security Wordlists · 30-45 min · 50 XP · Resource ID: csy203_w2_r2 (Required)
HackTricks - Web Methodology · Reference · 25 XP · Resource ID: csy203_w2_r3 (Optional)

Lab: Full Reconnaissance Assessment

Goal: Produce comprehensive reconnaissance documentation for a target application.

Part 1: Passive Reconnaissance

Perform Google dorking (document 10+ queries)
Check certificate transparency
Search Wayback Machine
Look for code repositories
Document all findings

Part 2: Application Mapping

Manually browse application through Burp
Create visual site map
Identify all user roles and functionality
Document authentication flow

Part 3: Content Discovery

Run directory brute forcing with 2+ wordlists
Check for sensitive files (list of 20+)
Test for exposed version control
Document all discovered content

Part 4: Technology Analysis

Fingerprint web server
Identify programming language/framework
Check for CMS
Detect WAF presence
Research known vulnerabilities for versions found

Part 5: Entry Point Inventory

Document all endpoints
List parameters for each endpoint
Categorize by input type
Prioritize for vulnerability testing

Deliverable (submit):

Passive recon findings document
Application map diagram
Content discovery results
Technology stack analysis
Complete entry point inventory
Testing priority recommendations

Checkpoint Questions

What is the difference between passive and active reconnaissance?
How can certificate transparency reveal subdomains?
What tool would you use for directory brute forcing?
Why is an exposed .git directory dangerous?
How do you identify a PHP application from HTTP headers?
What are three categories of entry points in web applications?

Weekly Reflection

Reflection Prompt (200-300 words):

This week you learned systematic reconnaissance—the foundation of professional web application testing. You discovered hidden content, identified technologies, and mapped attack surfaces.

Reflect on these questions:

How does thorough reconnaissance change the testing process compared to immediately trying attacks?
What surprised you about what's discoverable through passive reconnaissance alone?
If you were a developer, what would you do differently after seeing how easily hidden content can be found?
How would you explain the importance of reconnaissance to someone who thinks security testing is just "running scanners"?

A strong reflection will connect reconnaissance methodology to real-world testing effectiveness and defensive implications.

Verified Resources & Videos

Subdomain Enumeration: OWASP Amass
Directory Fuzzing: ffuf - Fast Web Fuzzer
Fingerprinting: Wappalyzer

Reconnaissance is where professional testing begins. The documentation and mapping skills you develop now form the foundation for all vulnerability testing. A well-documented attack surface leads to systematic, comprehensive testing. Next week: authentication vulnerabilities—attacking the front door.