Opening Framing
The Black Box Problem: In traditional software security, you might have access to source code or at least a standardized operating system environment. In IoT and embedded systems, you often start with a "black box"—a proprietary device with no documentation, custom hardware, and a single binary blob representing the entire firmware image.
Why This Matters: Firmware Reverse Engineering (RE) is the art of turning this binary blob back into understanding. It is the primary method for discovering vulnerabilities in IoT devices, as vendors rarely publish source code. By analyzing firmware, you can uncover hardcoded credentials, weak encryption keys, insecure API endpoints, and memory corruption vulnerabilities that would be invisible from the network perspective.
Real-World Relevance: The Mirai Botnet (2016) exploited simple hardcoded credentials found in firmware. More recently, the Ripple20 vulnerabilities (2020) in the Treck TCP/IP stack were discovered by reverse-engineering the networking library used in millions of devices, from infusion pumps to printers.
- Analyze firmware structure to identify bootloaders, kernels, and filesystems.
- Extract filesystems from monolithic binary blobs using Binwalk and custom scripts.
- Disassemble ARM and MIPS binaries to understand control flow and logic.
- Identify vulnerability patterns (strcpy, command injection) in assembly.
- Defeat basic anti-reverse engineering techniques (stripped symbols).
Enumerating filesystems extracted from firmware.
Analyzing firmware for version numbers and hardware details.
Finding flaws to bypass secure boot or authentication.
1) Firmware Anatomy & Extraction
Before you can analyze code, you must unpack the package. Firmware is rarely a single executable; it is usually a complex image containing a bootloader, a kernel, and a root filesystem, all packed together.
TYPICAL EMBEDDED LINUX FLASH LAYOUT:
+----------------+ <--- 0x00000000
| Bootloader | (e.g., U-Boot)
| (u-boot) | Initializes hardware, loads kernel
+----------------+
| U-Boot Env | Config variables (bootargs, IPs)
+----------------+
| Kernel | (e.g., uImage)
| (Linux) | The OS kernel, usually compressed (LZMA/GZIP)
+----------------+
| Filesystem | (e.g., SquashFS, JFFS2, UBI)
| (RootFS) | Contains /bin, /etc, /var, web server, binaries
+----------------+ <--- End of Flash
Common Components
1. Bootloaders (The Gatekeepers)
The bootloader is the first code to run. U-Boot is the industry standard.
- Function: Initializes RAM, Serial (UART), and Network, then loads the kernel.
- Security Critical: Often contains a "console" mode that allows attackers to dump memory or bypass passwords.
- Signatures: Look for strings like U-Boot 2020.01 or the header magic bytes 27 05 19 56 (uImage).
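The uImage magic bytes mentioned above open a 64-byte header that also records where the kernel expects to be loaded. The following is a minimal sketch, assuming the legacy U-Boot image header layout (big-endian fields); `parse_uimage_header` is a hypothetical helper name, not part of any tool.

```python
import struct

UIMAGE_MAGIC = 0x27051956
# Legacy uImage header: 7 big-endian uint32 fields, 4 uint8 fields, 32-byte name.
UIMAGE_HEADER = struct.Struct(">7I4B32s")


def parse_uimage_header(blob, offset=0):
    """Return a dict of selected header fields, or None if the magic does not match."""
    if len(blob) - offset < UIMAGE_HEADER.size:
        return None
    (magic, _hcrc, _timestamp, size, load, entry, _dcrc,
     _os_id, _arch, _img_type, comp, name) = UIMAGE_HEADER.unpack_from(blob, offset)
    if magic != UIMAGE_MAGIC:
        return None
    return {
        "data_size": size,
        "load_address": load,
        "entry_point": entry,
        "compression": comp,
        "name": name.split(b"\x00", 1)[0].decode("ascii", "replace"),
    }
```

The load address field is a useful hint when you later have to choose a base address in a disassembler.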
2. Filesystems (The Data)
Embedded systems use specialized read-only or compressed filesystems to save space and wear.
| Filesystem | Description | Characteristics | Extraction Tool |
|---|---|---|---|
| SquashFS | Compressed Read-Only FS | Standard for Linux firmware. Highly compressed. | unsquashfs / binwalk |
| JFFS2 | Journaling Flash FS | Read/Write, wear-leveling. Older devices. | jefferson |
| UBIFS | Unsorted Block Image FS | Modern successor to JFFS2. Handles raw flash. | ubi_reader |
| CramFS | Compressed ROM FS | Older, simple, read-only. | cramfsck |
Extraction Techniques (Binwalk)
Binwalk is the de facto standard tool for analyzing and extracting firmware images. It searches the binary for "magic signatures" (headers) of known file types.
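To make the idea of a signature scan concrete, here is a rough sketch of the core loop. It is not how Binwalk is implemented; the signature table is a small hand-picked assumption, and real tools validate the header that follows each magic value to reduce false positives.

```python
# A few well-known magic values (illustrative subset, not a complete database).
SIGNATURES = {
    b"\x27\x05\x19\x56": "uImage header",
    b"hsqs": "SquashFS (little-endian)",
    b"UBI#": "UBI erase block",
    b"\x1f\x8b\x08": "gzip stream",
}


def scan_signatures(blob):
    """Return a sorted list of (offset, description) for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, description))
            start = blob.find(magic, start + 1)
    return sorted(hits)
```

Short magic values such as the 3-byte gzip prefix will match by chance in large images, so treat each hit as a lead to verify rather than a confirmed offset.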
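If binwalk returns nothing, calculate the entropy
(binwalk -E firmware.bin). A flat line near 1.0 (high entropy) across the whole image,
with no recognizable signatures, suggests the firmware is encrypted. Compressed regions
also sit near 1.0, but they usually start with an identifiable header. If the image is
encrypted, standard extraction will fail; you must find the decryption
mechanism (often in the bootloader or a previous update).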
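The entropy figure above is Shannon entropy measured per block and scaled so that 8 bits per byte equals 1.0. A small sketch of that calculation follows; the block size of 1024 is an arbitrary assumption.

```python
import math
from collections import Counter


def normalized_entropy(block):
    """Shannon entropy of a byte block scaled to 0.0-1.0 (8 bits/byte = 1.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    bits = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return bits / 8.0


def entropy_profile(blob, block_size=1024):
    """Yield (offset, entropy) for each consecutive block of the image."""
    for offset in range(0, len(blob), block_size):
        yield offset, normalized_entropy(blob[offset:offset + block_size])
```

Plotting the profile shows padding and plain code as dips, while compressed or encrypted regions stay close to 1.0.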
2) Introduction to Ghidra
Ghidra is a software reverse engineering (SRE) suite developed by the NSA. Unlike a simple disassembler, it includes a powerful decompiler that attempts to reconstruct C code from assembly, making analysis significantly faster.
The Critical Step: Loading the Binary
In desktop software (EXE/ELF), the file format tells the OS where to load code. In embedded firmware, you are often dealing with a raw binary blob. You must tell Ghidra where to put it in memory, or all the addresses—and thus all the jumps and function calls—will be wrong.
Tip: if the binary is full of absolute pointers such as 0x80001234, the base address is likely 0x80000000.
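One rough way to apply this heuristic is to count which upper address bits occur most often among the 32-bit words in the blob. The sketch below assumes a 32-bit little-endian target and a 64 KiB granularity; `guess_base_address` is a hypothetical helper, and its output is a list of candidates to check against datasheets, not a definitive answer.

```python
import struct
from collections import Counter


def guess_base_address(blob, mask=0xFFFF0000, top=3):
    """Rank the most common upper halves of aligned 32-bit little-endian words."""
    counts = Counter()
    usable = len(blob) - (len(blob) % 4)
    for (word,) in struct.iter_unpack("<I", blob[:usable]):
        region = word & mask
        # Skip small constants and sign-extended negatives, which are not pointers.
        if region not in (0, mask):
            counts[region] += 1
    return counts.most_common(top)
```

On real firmware, instruction encodings add noise to the histogram, so the heuristic works best on regions that hold pointer tables or literal pools.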
GHIDRA PROJECT WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
│ 1. NEW PROJECT │
│ File -> New Project -> Non-Shared Project │
│ │
│ 2. IMPORT FILE │
│ - Drag & Drop binary │
│ - Format: Raw Binary (for firmware blobs) │
│ - Language: ARM Cortex Little Endian (typical) │
│ - Options: Click "Options..." │
│ * Base Address: 0x00000000 (Check datasheets!) │
│ * Block Name: RAM_FLASH │
│ │
│ 3. ANALYZE │
│ - Double click the file to open CodeBrowser │
│ - "Analyze" -> Select default options │
│ - Wait for bottom-right progress bar to finish! │
└─────────────────────────────────────────────────────────────┘
The Interface
Program Trees: Shows how the binary is organized in memory (sections like .text, .data, .bss).
Symbol Tree: List of identified functions, labels, and imports. This is your navigation menu.
Listing: The raw assembly (disassembly). Shows memory addresses, opcodes, and instructions.
Decompiler: The "Magic". Pseudo-C representation of the current function. Renaming variables here updates the Listing view.
Common Shortcuts
| Key | Action |
|---|---|
| L | Rename Label/Variable (use this constantly!) |
| ; | Add Comment (Pre/Post/EOL) |
| T | Set Data Type (transform undefined bytes to structs) |
| Ctrl+L | Retype Variable (in Decompiler) |
3) Analyzing Firmware Binaries
Analyzing a binary without symbols is like navigating a city without street signs. You have to use landmarks to orient yourself.
Strategy 1: The "String Reference" Trick
The fastest way to find interesting code is to follow the strings.
- Open Window -> Defined Strings.
- Filter for interesting keywords: "Password", "Admin", "Login", "Error", "Success".
- Double-click a string to go to its location in memory (e.g., in the .rodata section).
- Right-click -> References -> Show References to Address.
- This will jump you to the code that uses that string.
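Outside Ghidra, the first two steps can be approximated on the raw file to decide which binaries deserve a closer look. This is a minimal sketch, assuming ASCII strings of at least four characters and the keyword list from the steps above; the recorded offsets are file offsets, not loaded memory addresses.

```python
import re

KEYWORDS = re.compile(rb"password|admin|login|error|success", re.IGNORECASE)


def find_interesting_strings(blob, min_length=4):
    """Return (offset, text) for printable ASCII runs that contain a keyword."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    results = []
    for match in pattern.finditer(blob):
        if KEYWORDS.search(match.group()):
            results.append((match.start(), match.group().decode("ascii")))
    return results
```

Adding the base address to an offset gives the address to jump to in the Listing view when the blob was imported as a single raw block.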
Strategy 2: Finding "main" in Startup Code
In bare-metal firmware, there is no "main" symbol. The processor just starts executing at the Reset Vector.
- Look for the Loop: Startup code usually initializes hardware (GPIO, Clocks) and then calls a large function before entering an infinite loop. That large function is your main().
- Look for libc init: If the firmware uses standard C libraries, look for calls to __libc_start_main or setup of argc/argv.
Strategy 3: Identifying Library Functions
Since symbols are stripped, you won't see strcpy or printf by name. You
have to recognize them by behavior.
| Function Behavior | Likely Identity |
|---|---|
| Takes 2 arguments. Copies bytes until 0x00. | strcpy (Dangerous!) |
| Takes 3 arguments. Copies N bytes. | memcpy or strncpy |
| Takes 2 arguments. Returns 0 if equal. | strcmp |
| Takes format string ("%s", "%d") and varargs. | printf / sprintf |
4) Finding Vulnerabilities
Deep in the assembly, high-level logic flaws look like specific patterns of instructions.
Case Study: The Stack Buffer Overflow
The most common vulnerability in older firmware is the stack overflow due to `strcpy`.
void auth(char *input) {
char buf[64];
// No bounds check!
strcpy(buf, input);
}
If input is 100 bytes, `strcpy` keeps writing past the end of
`buf`. It overwrites the Saved Return Address (LR) on the stack.
Recognizing it in Assembly (ARM32)
In Ghidra, you'll see a call to a copy function with a stack buffer as the destination.
0x10408 SUB SP, SP, #0x40 ; Allocate 64 bytes (0x40) for 'buf'
0x1040c MOV R0, SP ; R0 = Destination (buf aka Stack Pointer)
0x10410 MOV R1, R4 ; R1 = Source (User Input)
0x10414 BL strcpy ; Branch Link to strcpy
; PROBLEM: strcpy doesn't know buf is only 64 bytes!
0x10418 ADD SP, SP, #0x40 ; Clean up stack
0x1041c POP {R4, PC} ; Return (Pop PC). If stack was smashed, PC is now 0x41414141.
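The effect of the listing can be modeled without hardware. The sketch below simulates the frame implied by the epilogue (64-byte buf, then saved R4, then the saved return address) and performs the same unchecked copy. The saved register values are made-up placeholders, and a matching `PUSH {R4, LR}` prologue is assumed.

```python
import struct

BUF_SIZE = 0x40  # matches SUB SP, SP, #0x40


def simulate_auth(user_input, saved_r4=0x11111111, saved_lr=0x00010500):
    """Model the frame as buf[64] | saved R4 | saved LR and copy without bounds."""
    frame = bytearray(BUF_SIZE) + struct.pack("<II", saved_r4, saved_lr)
    data = user_input.split(b"\x00", 1)[0] + b"\x00"  # strcpy stops at NUL
    # Unchecked copy: writes len(data) bytes starting at buf[0].
    frame[0:len(data)] = data[:len(frame)]
    # POP {R4, PC} reloads both registers from just above the buffer.
    r4, pc = struct.unpack_from("<II", frame, BUF_SIZE)
    return r4, pc


print([hex(v) for v in simulate_auth(b"admin")])
print([hex(v) for v in simulate_auth(b"A" * 72)])
```

With a short input both saved values survive; with 72 bytes of "A" the value popped into PC becomes 0x41414141, which is the crash signature described in the listing.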
Case Study: Command Injection
Look for snprintf or sprintf constructing a string that is later passed to
system() or popen().
sprintf(cmd_buf, "ping -c 1 %s", user_ip);
system(cmd_buf);
If user_ip is 1.2.3.4; cat /etc/shadow, the device executes:
ping -c 1 1.2.3.4; cat /etc/shadow.
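The string handling behind this bug is easy to reproduce safely, because the command is only built and never executed. The sketch below mirrors the vulnerable formatting and shows one possible check a fixed firmware might apply; `is_plain_ip` is a hypothetical validation helper, not something taken from a real device.

```python
import ipaddress


def build_ping_command(user_ip):
    """Mirror the vulnerable sprintf: the input is pasted into the command as-is."""
    return "ping -c 1 %s" % user_ip


def is_plain_ip(user_ip):
    """Accept only strings that parse as a single IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user_ip)
    except ValueError:
        return False
    return True


payload = "1.2.3.4; cat /etc/shadow"
print(build_ping_command(payload))
print(is_plain_ip(payload))
```

When reversing, the absence of any comparable check between the input source and the `system()` call is what marks the path as exploitable.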
5) Ghidra Scripting
Ghidra's real power comes from scripting. You can write Java or Python scripts to automate tedious tasks.
Example: XOR String Decoder
Malware authors often "obfuscate" strings by XORing them with a key byte to hide them from
strings commands.
# Decodes XOR-encoded bytes at the cursor location
# @keybinding Ctrl-Shift-X
# @category CSY304
from ghidra.program.model.data import StringDataType

def xor_decode(addr, length, key):
    # Read `length` bytes starting at addr (flat API helper; returns signed Java bytes)
    b = getBytes(addr, length)
    # Decode: mask each signed byte to 0-255 before XORing
    decoded = bytearray()
    for byte in b:
        decoded.append((byte & 0xFF) ^ key)
    # Print result
    print("Decoded > " + "".join(map(chr, decoded)))
    # Option: Patch memory with decoded bytes (for analysis)
    # setBytes(addr, decoded)
    # createData(addr, StringDataType())

# Usage: Put cursor on start of bytes, run script
addr = currentAddress
key = 0xAA  # Example key found in code
length = 16
xor_decode(addr, length, key)
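The script above assumes the key has already been found in the code. When it has not, a single-byte key can be brute-forced offline on bytes copied out of the binary. This is a standalone sketch (plain Python, no Ghidra API) that assumes the plaintext is printable ASCII; it may return several candidate keys that a human has to judge.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def guess_xor_keys(data):
    """Return (key, text) pairs whose decoding is entirely printable ASCII."""
    candidates = []
    for key in range(1, 256):
        decoded = xor_bytes(data, key)
        if all(0x20 <= b <= 0x7E for b in decoded):
            candidates.append((key, decoded.decode("ascii")))
    return candidates
```

Longer samples narrow the candidate list quickly, since every byte must land in the printable range for a key to survive.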
Useful API calls
- currentProgram: The main object for accessing functions, memory, symbols.
- getFunctionContaining(address): Returns the function object for an address.
- getReferencesTo(address): Find XREFs (who calls this?).
6) Binary Diffing
COMPARING FIRMWARE VERSIONS:
WHY DIFF FIRMWARE:
┌─────────────────────────────────────────────────────────────┐
│ - Identify patched vulnerabilities │
│ - Find newly introduced bugs │
│ - Understand what changed between versions │
│ - "Patch diffing" to find silent security fixes │
└─────────────────────────────────────────────────────────────┘
TOOLS:
┌─────────────────────────────────────────────────────────────┐
│ BinDiff: │
│ - Google's binary comparison tool │
│ - Works with Ghidra and IDA │
│ - Function matching and similarity scores │
│ │
│ Diaphora: │
│ - Open-source alternative │
│ - IDA plugin (Ghidra port available) │
└─────────────────────────────────────────────────────────────┘
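Before running a function-level tool on every binary, it helps to narrow down which files changed between two extracted root filesystems. The sketch below is a coarse file-level first pass and is not how BinDiff or Diaphora work internally; the directory paths passed to `hash_tree` are whatever your extraction produced.

```python
import hashlib
import os


def hash_tree(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # symlinks may point outside the extracted tree
            with open(path, "rb") as handle:
                digests[os.path.relpath(path, root)] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def diff_digests(old, new):
    """Return sorted lists of added, removed, and changed paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Feeding `hash_tree()` results for the old and new versions into `diff_digests()` yields the short list of binaries worth loading into a function-level differ.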
7) Defensive Architecture: Anti-Reverse Engineering
Vendors are aware that researchers reverse their products. They employ techniques to make analysis difficult, annoying, or impossible.
1. Symbol Stripping
Compilers generate "symbols" (names of functions and variables) to help debuggers. Production
firmware is usually "stripped" (strip --strip-all binary).
- Impact: Instead of check_password(), you see FUN_000105a4().
- Counter-Tactic: Look for debug strings ("Password incorrect") and trace where they are used (XREFs).
2. Logic Obfuscation
Spaghetti Code: Compilers can be configured to insert useless jumps, dead code, and convoluted control flow specifically to confuse decompilers.
3. Firmware Encryption
The firmware update file is encrypted (AES) and only decrypted by the bootloader in RAM.
4. Secure Boot (Chain of Trust)
The CPU holds the public key. The bootloader is signed. The kernel is signed. If you modify a single byte of the firmware (e.g., to add a backdoor or enable a root shell), the signature check fails, and the device refuses to boot.
SECURE BOOT FLOW:
[CPU ROM] ensures [Bootloader] is signed.
↓
[Bootloader] ensures [Kernel] is signed.
↓
[Kernel] ensures [Filesystem] is valid.
* Attack Surface: If the signature verification code itself has a bug (e.g., buffer overflow in the signature parser), you can bypass the chain.
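The chain can be modeled in a few lines to see why a single modified byte stops the boot. This is a deliberately simplified sketch: it compares SHA-256 digests against a trusted table, standing in for the public-key signature checks a real device performs, and the stage names are illustrative.

```python
import hashlib


def digest(image):
    """SHA-256 hex digest of a stage image."""
    return hashlib.sha256(image).hexdigest()


def verify_chain(stages, trusted_digests):
    """Walk stages in boot order; return the first stage name that fails, or None.

    stages: list of (name, image_bytes) in boot order.
    trusted_digests: dict mapping name -> expected SHA-256 hex digest.
    """
    for name, image in stages:
        if digest(image) != trusted_digests.get(name):
            return name  # boot halts here; later stages never run
    return None
```

Because verification is sequential, a flaw in an early verifier undermines every stage after it, which is why the bootloader's parsing code is such a valuable target.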
8) Advanced Concept: Hardware Extraction
Sometimes the firmware isn't available online. You have to go get it yourself, directly from the chip.
Method A: The Serial Console (UART)
Many devices have a "debug port" left on the board by developers. It speaks UART (Universal Asynchronous Receiver-Transmitter).
The Attack: Connect with screen or minicom (e.g., screen /dev/ttyUSB0 115200), interrupt the boot process to get a U-Boot console, then use `md` (memory dump) to stream the
flash content out over the serial cable.
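The captured console log then has to be turned back into a binary. The sketch below assumes the byte-wide variant of the command (`md.b`) and an output shape of an 8-digit hex address, a colon, up to 16 hex bytes, and an ASCII column; check a few lines of your own capture first, since the format may differ between U-Boot builds.

```python
import re

# Assumed line shape: "80000000: 27 05 19 56 ...    '..V...."
MD_LINE = re.compile(r"^([0-9a-fA-F]{8}):((?: [0-9a-fA-F]{2}){1,16})")


def parse_md_bytes(console_log):
    """Rebuild raw bytes from captured `md.b` output, checking address continuity."""
    image = bytearray()
    expected = None
    for line in console_log.splitlines():
        match = MD_LINE.match(line.strip())
        if not match:
            continue  # prompts, echoes, and other console noise
        address = int(match.group(1), 16)
        chunk = bytes.fromhex(match.group(2).replace(" ", ""))
        if expected is not None and address != expected:
            raise ValueError("gap in dump at 0x%08x" % address)
        image += chunk
        expected = address + len(chunk)
    return bytes(image)
```

The continuity check matters in practice: serial captures drop characters, and a silent gap shifts every later offset in the reconstructed image.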
Method B: SPI Flash Dumping
If the console is locked, you can talk directly to the storage chip (SPI Flash).
- Tool: SOIC8 Clip (pomona clip) + Flashrom + Raspberry Pi / BusPirate.
- Command: flashrom -p linux_spi:dev=/dev/spidev0.0 -r firmware_dump.bin
- Risk: You must power the chip correctly (usually 3.3V) without powering the main CPU, or they will fight over the bus.
Guided Lab: Firmware Analysis Workflow
Objective: Perform a complete analysis cycle: Extract firmware, analyze filesystem, and reverse engineer a vulnerable binary.
Required Tools: Linux VM (Kali/Ubuntu), Binwalk, Ghidra, Strings.
Scenario: You have obtained a firmware image router_firmware.bin from a
vendor's support site.
Part 1: Extraction & Enumeration (20 min)
- Identify the file:
$ binwalk router_firmware.bin
Result: Detected SquashFS filesystem at offset 0x120000.
- Extract contents:
$ binwalk -eM router_firmware.bin
Result: Created `_router_firmware.bin.extracted` directory.
- Explore the filesystem:
$ cd _router_firmware.bin.extracted/squashfs-root
$ ls -R etc/
Task: Look for passwd, shadow, or configuration files with credentials.
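Once a shadow file turns up, a quick triage shows which accounts are worth attention. This is a small sketch, assuming the standard colon-separated shadow format; pass it the text of etc/shadow from the extracted tree.

```python
def audit_shadow(text):
    """Classify each account in shadow-format text as 'empty', 'locked', or 'hash'."""
    findings = {}
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 2 or line.startswith("#"):
            continue
        user, secret = fields[0], fields[1]
        if secret == "":
            findings[user] = "empty"   # login without a password
        elif secret.startswith(("!", "*")):
            findings[user] = "locked"  # password login disabled
        else:
            findings[user] = "hash"    # candidate for offline cracking
    return findings
```

Accounts reported as "empty" or "hash" are the ones to cross-check against the login binaries analyzed later in the lab.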
Part 2: String Analysis (15 min)
Before loading Ghidra, look for low-hanging fruit.
- Search for hardcoded secrets:
$ grep -rEi "password|admin|root|key" .
Note: -r (recursive), -E (extended regex), -i (case insensitive).
- Identify dangerous binaries:
$ cd bin
$ strings login_manager | grep "%s"
Context: Frequent use of %s format specifiers might suggest sprintf usage.
Part 3: Ghidra Vulnerability Hunt (45 min)
- Load `login_manager` into Ghidra:
Drag binary into project. Select language (likely ARM or MIPS). Analyze.
- Find the entry point:
Go to Symbol Tree -> Exports -> main.
- Trace the user input:
Find where the username/password is read (look for recv or read).
- Identify the vulnerability:
Look for a strcpy(local_buffer, input_buffer).
Question: Is local_buffer smaller than the potential input? If yes, it's a Buffer Overflow.
Building on Prior Knowledge
This week connects concepts from CSY101 (Linux CLI) and CSY202 (Network Protocols) to the embedded world.
The "Command Injection" you learned in web apps works exactly the same in
firmware C code: using system() with unsanitized input.
When you analyze the "init" scripts in firmware, you are essentially looking at how the device sets up its Routing Table and Firewall (iptables), just like you configured manually.
Outcome Check
- Navigate Ghidra's interface and analyze binaries
- Load and analyze ARM/MIPS firmware binaries
- Identify common vulnerability patterns in decompiled code
- Use cross-references to trace function calls
- Write basic Ghidra scripts for automated analysis
Resources & Cheatsheets
Essential Assembly Instructions (ARM)
| Inst | Meaning | C Equivalent |
|---|---|---|
| MOV R0, R1 | Move R1 into R0 | r0 = r1; |
| LDR R0, [R1] | Load Register | r0 = *r1; |
| STR R0, [R1] | Store Register | *r1 = r0; |
| BL func | Branch with Link | func(); |
| CMP R0, #0 | Compare | if (r0 == 0) |
| BEQ loc | Branch if Equal | goto loc; |
Glossary of Terms
- Base Address: The memory address where the firmware expects to be loaded. If wrong, all absolute jumps point to garbage.
- Endianness: The order of bytes. Little Endian (Least Significant Byte first) is standard for ARM/x86. Network traffic is Big Endian.
- GOT (Global Offset Table): A table used by dynamic linkers to resolve functions in shared libraries.
- XREF (Cross Reference): A list of all locations in the code that call a function or access a data variable.
- Strings: ASCII or Unicode text embedded in the binary. The easiest starting point for RE.