The Black Box Problem: In traditional software security, you might have access to
source code or at least a standardized operating system environment. In IoT and embedded systems,
you often start with a "black box"—a proprietary device with no documentation, custom hardware, and
a single binary blob representing the entire firmware image.
Why This Matters: Firmware Reverse Engineering (RE) is the art of turning this
binary blob back into understanding. It is the primary method for discovering vulnerabilities in IoT
devices, as vendors rarely publish source code. By analyzing firmware, you can uncover hardcoded
credentials, weak encryption keys, insecure API endpoints, and memory corruption vulnerabilities
that would be invisible from the network perspective.
Real-World Relevance: The Mirai Botnet (2016) exploited simple hardcoded
credentials found in firmware. More recently, the Ripple20 vulnerabilities (2020) in the
Treck TCP/IP stack were discovered by reverse-engineering the networking library used in millions of
devices, from infusion pumps to printers.
Week Learning Outcomes:
Analyze firmware structure to identify bootloaders, kernels, and
filesystems.
Extract filesystems from monolithic binary blobs using Binwalk and custom
scripts.
Disassemble ARM and MIPS binaries to understand control flow and logic.
Identify vulnerability patterns (strcpy, command injection) in assembly.
Analyzing firmware for version numbers and hardware details.
T1211: Exploitation for Defense Evasion
Finding flaws to bypass secure boot or authentication.
1) Firmware Anatomy & Extraction
Before you can analyze code, you must unpack the package. Firmware is rarely a single executable; it
is usually a complex image containing a bootloader, a kernel, and a root filesystem, all packed
together.
TYPICAL EMBEDDED LINUX FLASH LAYOUT:
+----------------+ <--- 0x00000000
| Bootloader | (e.g., U-Boot)
| (u-boot) | Initializes hardware, loads kernel
+----------------+
| U-Boot Env | Config variables (bootargs, IPs)
+----------------+
| Kernel | (e.g., uImage)
| (Linux) | The OS kernel, usually compressed (LZMA/GZIP)
+----------------+
| Filesystem | (e.g., SquashFS, JFFS2, UBI)
| (RootFS) | Contains /bin, /etc, /var, web server, binaries
+----------------+ <--- End of Flash
Common Components
1. Bootloaders (The Gatekeepers)
The bootloader is the first code to run. U-Boot is the industry standard.
Function: Initializes RAM, Serial (UART), and Network, then loads the kernel.
Security Critical: Often contains a "console" mode that allows attackers to
dump memory or bypass passwords.
Signatures: Look for strings like U-Boot 2020.01 or header magic
bytes 27 05 19 56 (uImage).
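Once you have spotted the uImage magic bytes, the header that follows them can be decoded to recover the kernel's load address and entry point. The following is a minimal sketch, assuming the legacy 64-byte uImage header layout (big-endian fields); the function name and the dictionary keys are illustrative choices, not part of any tool.

```python
import struct
from datetime import datetime, timezone

UIMAGE_MAGIC = 0x27051956
# Legacy uImage header: seven big-endian 32-bit fields, four 1-byte fields,
# and a 32-byte image name (64 bytes in total).
HEADER = struct.Struct(">7I4B32s")


def parse_uimage_header(blob: bytes, offset: int = 0) -> dict:
    """Decode a legacy uImage header found at `offset`, or raise ValueError."""
    if len(blob) - offset < HEADER.size:
        raise ValueError("not enough data for a uImage header")
    (magic, _hcrc, timestamp, size, load, entry, _dcrc,
     os_id, arch, img_type, comp, name) = HEADER.unpack_from(blob, offset)
    if magic != UIMAGE_MAGIC:
        raise ValueError("uImage magic 27 05 19 56 not found")
    return {
        "created": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "data_size": size,
        "load_address": load,
        "entry_point": entry,
        "os": os_id, "arch": arch, "type": img_type, "compression": comp,
        "name": name.split(b"\x00", 1)[0].decode("ascii", "replace"),
    }
```

The load address reported here is often a good first guess for the base address you will later need when importing the kernel into a disassembler.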
2. Filesystems (The Data)
Embedded systems use specialized read-only or compressed filesystems to save space and reduce flash wear.
Filesystem | Description              | Characteristics                                  | Extraction Tool
-----------+--------------------------+--------------------------------------------------+---------------------
SquashFS   | Compressed Read-Only FS  | Standard for Linux firmware. Highly compressed.  | unsquashfs / binwalk
JFFS2      | Journaling Flash FS      | Read/Write, wear-leveling. Older devices.        | jefferson
UBIFS      | Unsorted Block Image FS  | Modern successor to JFFS2. Handles raw flash.    | ubi_reader
CramFS     | Compressed ROM FS        | Older, simple, read-only.                        | cramfsck
Extraction Techniques (Binwalk)
Binwalk is the de facto standard tool for analyzing and extracting firmware images.
It searches the binary for "magic signatures" (headers) of known file types.
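To make the idea of a signature scan concrete, here is a minimal sketch of the same technique in Python. It is not how Binwalk is implemented; the signature table is a tiny hand-picked subset, and the function name is hypothetical.

```python
# Magic bytes for a few common firmware components (a small illustrative subset).
SIGNATURES = {
    "uImage header": bytes.fromhex("27051956"),
    "SquashFS (little endian)": b"hsqs",
    "gzip stream": bytes.fromhex("1f8b08"),
    "UBI erase block": b"UBI#",
}


def scan_signatures(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every signature match, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)
```

Short signatures produce false positives in large images, which is why real tools validate the structure behind each match before reporting it.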
If binwalk returns nothing, plot the entropy
(binwalk -E firmware.bin). A flat line near 1.0 (high entropy) across the whole image means the data
is either encrypted or compressed; when no signatures are found at all, encryption is the likely cause.
Standard extraction will fail; you must find the decryption
mechanism (often in the bootloader or a previous, unencrypted update).
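The entropy measurement itself is simple to reproduce. The sketch below computes Shannon entropy per block, normalised to the 0.0 to 1.0 scale used above; the block size of 1024 bytes is an arbitrary assumption, and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy normalised to 0.0..1.0 (1.0 means 8 bits of entropy per byte)."""
    if not block:
        return 0.0
    total = len(block)
    bits = -sum((n / total) * math.log2(n / total) for n in Counter(block).values())
    return bits / 8.0


def entropy_profile(blob: bytes, block_size: int = 1024) -> list[float]:
    """Entropy of each consecutive block, in file order."""
    return [shannon_entropy(blob[i:i + block_size]) for i in range(0, len(blob), block_size)]
```

A profile that stays close to 1.0 from the first block to the last, with no low-entropy header region, is the pattern to be suspicious of.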
2) Introduction to Ghidra
Ghidra is a software reverse engineering (SRE) suite developed by the NSA. Unlike a
simple disassembler, it includes a powerful decompiler that attempts to reconstruct C code
from assembly, making analysis significantly faster.
The Critical Step: Loading the Binary
In desktop software (EXE/ELF), the file format tells the OS where to load code. In embedded
firmware, you are often dealing with a raw binary blob. You must tell Ghidra where
to put it in memory, or all the addresses—and thus all the jumps and function calls—will be wrong.
Finding the Base Address:
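Absolute pointers embedded in the image (for example, pointers to strings) can hint at the base address.
If you see many 32-bit values that look like 0x80001234, the base address is likely 0x80000000.
Confirm the guess against the SoC datasheet's memory map when one is available.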
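This pointer-counting heuristic can be automated. The following is a rough sketch, assuming a 32-bit target and a base address aligned to 16 MiB; it only counts values that would land inside the image if it were loaded at the candidate base. The function name is hypothetical, and the result is a hint to verify, not an answer.

```python
import struct
from collections import Counter
from typing import Optional


def guess_base_address(blob: bytes, little_endian: bool = True) -> Optional[int]:
    """Vote on the upper byte of pointer-like 32-bit words and return the winner."""
    fmt = "<I" if little_endian else ">I"
    usable = len(blob) - len(blob) % 4
    votes = Counter()
    for (word,) in struct.iter_unpack(fmt, blob[:usable]):
        candidate = word & 0xFF000000
        # Small constants and 0xFFxxxxxx fill values are rarely pointers.
        if candidate in (0x00000000, 0xFF000000):
            continue
        # Keep only values that would point inside the image at this base.
        if word - candidate < len(blob):
            votes[candidate] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Note that this sketch cannot detect a base address of 0x00000000, and instruction encodings can still add noise, so treat the output as one data point alongside the datasheet.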
GHIDRA PROJECT WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
│ 1. NEW PROJECT │
│ File -> New Project -> Non-Shared Project │
│ │
│ 2. IMPORT FILE │
│ - Drag & Drop binary │
│ - Format: Raw Binary (for firmware blobs) │
│ - Language: ARM Cortex Little Endian (typical) │
│ - Options: Click "Options..." │
│ * Base Address: 0x00000000 (Check datasheets!) │
│ * Block Name: RAM_FLASH │
│ │
│ 3. ANALYZE │
│ - Double click the file to open CodeBrowser │
│ - "Analyze" -> Select default options │
│ - Wait for bottom-right progress bar to finish! │
└─────────────────────────────────────────────────────────────┘
The Interface
1. Program Trees (Left Top)
Shows how the binary is organized in memory (sections like .text, .data, .bss).
2. Symbol Tree (Left Middle)
List of identified functions, labels, and imports. This is your navigation
menu.
3. Listing View (Center)
The raw assembly (Disassembly). Shows memory addresses, opcodes, and
instructions.
4. Decompiler (Right)
The "Magic". Pseudo-C representation of the current function. Renaming
variables here updates the Listing view.
Common Shortcuts
Key    | Action
-------+------------------------------------------------------
L      | Rename Label/Variable (use this constantly!)
;      | Add Comment (Pre/Post/EOL)
T      | Set Data Type (transform undefined bytes to structs)
Ctrl+L | Retype Variable (in Decompiler)
3) Analyzing Firmware Binaries
Analyzing a binary without symbols is like navigating a city without street signs. You have to use
landmarks to orient yourself.
Strategy 1: The "String Reference" Trick
The fastest way to find interesting code is to follow the strings.
Open Window -> Defined Strings.
Filter for interesting keywords: "Password", "Admin", "Login", "Error", "Success".
Double-click a string to go to its location in memory (e.g., in the .rodata
section).
Right-click -> References -> Show References to Address.
This will jump you to the code that uses that string.
Logic: If you find the function that uses the string "Password Incorrect", you have
likely found the authentication routine. The condition before that print statement is the
password check.
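You can run the same keyword hunt outside Ghidra to decide which binaries deserve a closer look. Here is a minimal sketch that extracts printable ASCII runs and keeps those containing one of the keywords; the keyword list mirrors the one above, and the function name is illustrative.

```python
import re

KEYWORDS = ("password", "admin", "login", "error", "success")


def find_keyword_strings(blob: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (file offset, string) for printable ASCII runs that contain a keyword."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    hits = []
    for match in pattern.finditer(blob):
        text = match.group().decode("ascii")
        if any(word in text.lower() for word in KEYWORDS):
            hits.append((match.start(), text))
    return hits
```

For a raw binary, adding the file offset to the base address you chose at import time gives the address of the string in the Listing view, where you can then look up its references.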
Strategy 2: Finding "main" in Startup Code
In bare-metal firmware, there is no "main" symbol. The processor just starts executing at the Reset
Vector.
Look for the Loop: Startup code usually initializes hardware (GPIO, Clocks) and
then calls a large function before entering an infinite loop. That large function is your
main().
Look for libc init: If the firmware uses standard C libraries, look for calls
to __libc_start_main or setup of argc / argv.
Strategy 3: Identifying Library Functions
Since symbols are stripped, you won't see strcpy or printf by name. You
have to recognize them by behavior.
Function Behavior                              | Likely Identity
-----------------------------------------------+---------------------
Takes 2 arguments. Copies bytes until 0x00.    | strcpy (Dangerous!)
Takes 3 arguments. Copies N bytes.             | memcpy or strncpy
Takes 2 arguments. Returns 0 if equal.         | strcmp
Takes format string ("%s", "%d") and varargs.  | printf / sprintf
4) Finding Vulnerabilities
Deep in the assembly, high-level logic flaws look like specific patterns of instructions.
Case Study: The Stack Buffer Overflow
A classic vulnerability in older firmware is the stack buffer overflow caused by an unbounded `strcpy` into a fixed-size local buffer.
Vendors are aware that researchers reverse their products. They employ techniques to make analysis
Difficult, Annoying, or Impossible.
1. Symbol Stripping
Compilers generate "symbols" (names of functions and variables) to help debuggers. Production
firmware is usually "stripped" (strip --strip-all binary).
Impact: Instead of check_password(), you see
FUN_000105a4().
Counter-Tactic: Look for debug strings ("Password incorrect") and trace where
they are used (XREFs).
2. Logic Obfuscation
Spaghetti Code: Obfuscating toolchains can insert useless jumps, dead code, and
convoluted control flow specifically to confuse decompilers.
3. Firmware Encryption
The firmware update file is encrypted (AES) and only decrypted by the bootloader in RAM.
Bypassing Encryption:
If you can't find the key in an older, unencrypted firmware version, you may need to perform a
hardware attack (side-channel analysis or fault injection such as voltage glitching) to recover the key
or dump the decrypted image from RAM while the device is running.
4. Secure Boot (Chain of Trust)
The CPU holds the public key. The bootloader is signed. The kernel is signed.
If you modify a single byte of the firmware (e.g., to add a backdoor or enable a root shell), the
signature check fails, and the device refuses to boot.
SECURE BOOT FLOW:
[CPU ROM] ensures [Bootloader] is signed.
↓
[Bootloader] ensures [Kernel] is signed.
↓
[Kernel] ensures [Filesystem] is valid.
* Attack Surface: If the signature verification code itself has a bug (e.g., buffer overflow in the signature parser), you can bypass the chain.
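The chain of trust can be modelled in a few lines. The sketch below is a deliberate simplification: it replaces public-key signature checks with pinned SHA-256 digests, assuming each earlier stage holds a trusted digest of the stage that follows. The stage names and function names are hypothetical.

```python
import hashlib
import hmac


def digest(image: bytes) -> str:
    """SHA-256 digest of a stage image, as a hex string."""
    return hashlib.sha256(image).hexdigest()


def boot(stages: list[tuple[str, bytes]], expected: dict[str, str]) -> list[str]:
    """Verify each stage in order before 'running' it; stop at the first mismatch.

    `expected` stands in for the signatures: it maps each stage name to the
    digest that the previous stage trusts.
    """
    booted = []
    for name, image in stages:
        if not hmac.compare_digest(digest(image), expected[name]):
            raise RuntimeError(f"verification failed for {name}; refusing to boot")
        booted.append(name)
    return booted
```

Changing a single byte of any stage changes its digest, so the stage before it refuses to hand over control, which is the behavior described above.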
8) Advanced Concept: Hardware Extraction
Sometimes the firmware isn't available online. You have to go get it yourself, directly from the
chip.
Method A: The Serial Console (UART)
Many devices have a "debug port" left on the circuit board by developers. It speaks
UART (Universal Asynchronous Receiver-Transmitter).
Hardware Connection
[ PCB Board ]          [ USB-to-TTL Adapter ]
     TX -------------> RX
     RX <------------- TX
    GND <------------- GND
WARNING: Leave VCC disconnected; the board is powered by its own supply. Confirm that your adapter's
logic level matches the board (3.3V vs 5V) before wiring TX and RX.
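The Attack: Connect with screen or minicom (for example, screen /dev/ttyUSB0 115200).
Interrupt the boot process to get a U-Boot console, then use `md` (memory display) to stream the
flash content out over the serial cable. If the flash is not memory-mapped, copy it into RAM first
(for example with U-Boot's sf read) and dump that region.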
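The captured serial log is a hex dump, not a firmware image, so it has to be converted back to binary. Here is a minimal sketch, assuming the log was produced with the byte-wise form of the command (md.b) and that each line looks like an 8-digit address, a colon, up to 16 hex bytes, and an ASCII column. The function name is hypothetical, and the sketch does not check that addresses are contiguous.

```python
import re

# Matches lines such as: "9f000000: 27 05 19 56 00 00 00 00 ...    '..V...."
MD_LINE = re.compile(r"^([0-9a-fA-F]{8}):((?: [0-9a-fA-F]{2}){1,16})")


def md_log_to_bytes(log_text: str) -> bytes:
    """Rebuild binary data from captured md.b output, ignoring prompts and other noise."""
    out = bytearray()
    for line in log_text.splitlines():
        match = MD_LINE.match(line.strip())
        if match:
            out += bytes.fromhex(match.group(2).replace(" ", ""))
    return bytes(out)
```

Serial captures drop characters more often than you would expect, so dump the same region twice and compare the results before trusting the image.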
Method B: SPI Flash Dumping
If the console is locked, you can talk directly to the storage chip (SPI Flash).
Risk: You must power the chip correctly (usually 3.3V) without powering the
main CPU, or they will fight over the bus.
Guided Lab: Firmware Analysis Workflow
Objective: Perform a complete analysis cycle: Extract firmware, analyze filesystem,
and reverse engineer a vulnerable binary.
Required Tools: Linux VM (Kali/Ubuntu), Binwalk, Ghidra, Strings.
Scenario: You have obtained a firmware image router_firmware.bin from a
vendor's support site.
Part 1: Extraction & Enumeration (20 min)
Identify the file:
  $ binwalk router_firmware.bin
  Result: Detected SquashFS filesystem at offset 0x120000.
Extract contents:
  $ binwalk -eM router_firmware.bin
  Result: Created `_router_firmware.bin.extracted` directory.
Explore the filesystem:
  $ cd _router_firmware.bin.extracted/squashfs-root
  $ ls -R etc/
  Task: Look for passwd, shadow, or configuration files with credentials.
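The credential search in the last step can be scripted once you have an extracted root filesystem. The following is a small sketch, assuming the standard colon-separated passwd and shadow formats; the function name is illustrative, and it only inspects etc/passwd and etc/shadow, not other configuration files.

```python
from pathlib import Path


def find_password_hashes(rootfs: str) -> list[tuple[str, str, str]]:
    """Return (file, user, hash) for accounts that carry a password hash."""
    findings = []
    for rel in ("etc/passwd", "etc/shadow"):
        path = Path(rootfs) / rel
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            fields = line.split(":")
            if len(fields) < 2:
                continue
            user, secret = fields[0], fields[1]
            # "x" defers to shadow; "*" and "!" mark locked accounts;
            # an empty field means no password is set.
            if secret not in ("x", "*", "!", "!!", ""):
                findings.append((rel, user, secret))
    return findings
```

An account with an empty password field is skipped by this sketch but is worth noting by hand, since it may allow login without any password.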
XP REWARD: +150 XP (Extraction Expert)
Part 2: String Analysis (15 min)
Before loading Ghidra, look for low-hanging fruit.
Identify dangerous binaries:
  $ cd bin
  $ strings login_manager | grep "%s"
  Context: Frequent use of %s format specifiers might suggest sprintf usage.
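A complementary check is to look for the names of risky libc functions, which remain visible in dynamically linked binaries even after stripping. This is a rough sketch, assuming the names appear as NUL-delimited entries in the binary's string table; the list of functions and the function name are illustrative choices.

```python
RISKY_FUNCTIONS = ("strcpy", "strcat", "sprintf", "gets", "system", "popen")


def risky_imports(binary: bytes) -> list[str]:
    """List risky libc function names that appear as NUL-delimited strings in the binary."""
    found = []
    for name in RISKY_FUNCTIONS:
        # Match the exact token so "snprintf" or "fgets" do not count as hits.
        if b"\x00" + name.encode() + b"\x00" in binary:
            found.append(name)
    return found
```

A hit only tells you the function is referenced somewhere; whether a given call is exploitable still has to be confirmed in the disassembly. Statically linked binaries will show nothing here.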
Part 3: Ghidra Vulnerability Hunt (45 min)
Load `login_manager` into Ghidra:
  Drag binary into project. Select language (likely ARM or MIPS). Analyze.
Find the entry point:
  Go to Symbol Tree -> Exports -> main.
Trace the user input:
  Find where the username/password is read (look for recv or read).
Identify the vulnerability:
  Look for a strcpy(local_buffer, input_buffer).
  Question: Is local_buffer smaller than the potential input? If yes, it's a Buffer Overflow.
XP REWARD: +300 XP (Bug Hunter)
Building on Prior Knowledge
This week connects concepts from CSY101 (Linux CLI) and CSY202
(Network Protocols) to the embedded world.
From CSY203 (Web Sec)
The "Command Injection" you learned in web apps works exactly the same in
firmware C code: using system() with unsanitized input.
From CSY104 (Networking)
When you analyze the "init" scripts in firmware, you are essentially looking at
how the device sets up its Routing Table and Firewall (iptables), just as you configured
them manually.
Outcome Check
Navigate Ghidra's interface and analyze binaries
Load and analyze ARM/MIPS firmware binaries
Identify common vulnerability patterns in decompiled code