
CSY304 Week 05 Advanced


IoT & Embedded Systems Security


Opening Framing

The Black Box Problem: In traditional software security, you might have access to source code or at least a standardized operating system environment. In IoT and embedded systems, you often start with a "black box"—a proprietary device with no documentation, custom hardware, and a single binary blob representing the entire firmware image.

Why This Matters: Firmware Reverse Engineering (RE) is the art of turning this binary blob back into understanding. It is the primary method for discovering vulnerabilities in IoT devices, as vendors rarely publish source code. By analyzing firmware, you can uncover hardcoded credentials, weak encryption keys, insecure API endpoints, and memory corruption vulnerabilities that would be invisible from the network perspective.

Real-World Relevance: The Mirai Botnet (2016) exploited simple hardcoded credentials found in firmware. More recently, the Ripple20 vulnerabilities (2020) in the Treck TCP/IP stack were discovered by reverse-engineering the networking library used in millions of devices, from infusion pumps to printers.

Week Learning Outcomes:
  • Analyze firmware structure to identify bootloaders, kernels, and filesystems.
  • Extract filesystems from monolithic binary blobs using Binwalk and custom scripts.
  • Disassemble ARM and MIPS binaries to understand control flow and logic.
  • Identify vulnerability patterns (strcpy, command injection) in assembly.
  • Defeat basic anti-reverse engineering techniques (stripped symbols).
MITRE ATT&CK Mapping (Enterprise techniques, applied to IoT):
T1083: File and Directory Discovery

Enumerating filesystems extracted from firmware.

T1592: Gather Victim Host Information

Analyzing firmware for version numbers and hardware details.

T1211: Exploitation for Defense Evasion

Finding flaws to bypass secure boot or authentication.

1) Firmware Anatomy & Extraction

Before you can analyze code, you must unpack the package. Firmware is rarely a single executable; it is usually a complex image containing a bootloader, a kernel, and a root filesystem, all packed together.

TYPICAL EMBEDDED LINUX FLASH LAYOUT:
+----------------+ <--- 0x00000000
|   Bootloader   |  (e.g., U-Boot)
|    (u-boot)    |  Initializes hardware, loads kernel
+----------------+
|   U-Boot Env   |  Config variables (bootargs, IPs)
+----------------+
|     Kernel     |  (e.g., uImage)
|    (Linux)     |  The OS kernel, usually compressed (LZMA/GZIP)
+----------------+
|   Filesystem   |  (e.g., SquashFS, JFFS2, UBI)
|    (RootFS)    |  Contains /bin, /etc, /var, web server, binaries
+----------------+ <--- End of Flash

Common Components

1. Bootloaders (The Gatekeepers)

The bootloader is the first code loaded from flash (on most SoCs, a small on-chip boot ROM runs before it). U-Boot is the de facto standard for embedded Linux.

  • Function: Initializes RAM, Serial (UART), and Network, then loads the kernel.
  • Security Critical: Often contains a "console" mode that allows attackers to dump memory or bypass passwords.
  • Signatures: Look for strings like U-Boot 2020.01 or header magic bytes 27 05 19 56 (uImage).
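The legacy uImage header behind the 27 05 19 56 magic is a fixed 64-byte, big-endian structure, and its load-address field is a useful hint when you later choose a base address in Ghidra. Below is a minimal sketch of decoding it. It assumes the legacy (non-FIT) layout: magic, header CRC, timestamp, data size, load address, entry point, data CRC, four one-byte codes (OS, architecture, image type, compression), and a 32-byte name. The function name is illustrative.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956
# Legacy header layout (assumed): 7 big-endian u32 fields, 4 u8 codes, 32-byte name = 64 bytes
HEADER_FMT = ">7I4B32s"

def parse_uimage_header(blob, offset=0):
    """Decode a legacy uImage header found at `offset` in a firmware dump."""
    raw = blob[offset:offset + 64]
    (magic, hcrc, timestamp, size, load, entry, dcrc,
     os_code, arch, img_type, comp, name) = struct.unpack(HEADER_FMT, raw)
    if magic != UIMAGE_MAGIC:
        raise ValueError("no uImage magic at offset 0x%x" % offset)
    # The header CRC32 is computed over the 64-byte header with the CRC field zeroed
    zeroed = raw[:4] + b"\x00" * 4 + raw[8:]
    return {
        "name": name.rstrip(b"\x00").decode("ascii", "replace"),
        "data_size": size,
        "load_addr": load,
        "entry_point": entry,
        "os": os_code,
        "arch": arch,
        "type": img_type,
        "compression": comp,
        "header_crc_ok": zlib.crc32(zeroed) == hcrc,
    }
```

The codes are returned as raw numbers on purpose. Map them against U-Boot's own headers rather than trusting a hard-coded table.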

2. Filesystems (The Data)

Embedded systems use specialized read-only or compressed filesystems to save space and reduce flash wear.

Filesystem | Description | Characteristics | Extraction Tool
SquashFS | Compressed read-only FS | Standard for Linux firmware; highly compressed | unsquashfs / binwalk
JFFS2 | Journaling Flash File System | Read/write, wear-leveling; older devices | jefferson
UBIFS | UBI File System (runs on UBI, Unsorted Block Images) | Modern successor to JFFS2; works on raw flash | ubi_reader
CramFS | Compressed ROM FS | Older, simple, read-only | cramfsck

Extraction Techniques (Binwalk)

Binwalk is the de facto standard tool for analyzing and extracting firmware images. It searches the binary for "magic signatures" (headers) of known file types.

Terminal - Firmware Analysis
$ binwalk firmware.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             TRX firmware header, little endian, ...
28            0x1C            LZMA compressed data, properties: 0x5D, ...
2190100       0x216B14        Squashfs filesystem, little endian, version 4.0...

$ binwalk -eM firmware.bin
# -e: Extract known file types
# -M: Matryoshka mode (recursively scan extracted files)

$ ls _firmware.bin.extracted/
squashfs-root  0.trk  1C.lzma
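The core idea of a signature scan is simple: look for known magic bytes at every offset. Below is a minimal sketch of that idea using a small, illustrative subset of signatures (TRX "HDR0", uImage, little-endian SquashFS "hsqs", gzip, ELF). It is not binwalk's actual signature database. Binwalk also validates the fields that follow each magic, which this sketch does not do.

```python
# Illustrative subset of magic signatures; real tools use far larger, validated lists
SIGNATURES = {
    b"HDR0": "TRX firmware header",
    b"\x27\x05\x19\x56": "uImage header",
    b"hsqs": "Squashfs filesystem, little endian",
    b"\x1f\x8b\x08": "gzip compressed data",
    b"\x7fELF": "ELF executable",
}

def signature_scan(blob):
    """Return (offset, description) for every magic match, sorted by offset."""
    hits = []
    for magic, description in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, description))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)
```

Because only the magic is checked, short signatures such as gzip's three bytes will produce false positives on real images. Treat each hit as a lead, not a verdict.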
Warning: Entropy Analysis

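If binwalk finds nothing, plot the entropy (binwalk -E firmware.bin). A flat line near 1.0 (high entropy) across the whole image suggests the firmware is encrypted. Compressed sections are also high-entropy, but they normally sit behind recognizable headers. Standard extraction will fail on an encrypted image; you must find the decryption mechanism (often in the bootloader or a previous update).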
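The entropy value behind that plot is Shannon entropy per block, normalized so that 8 bits per byte maps to 1.0. Below is a minimal sketch of computing the same kind of profile yourself. The 1024-byte block size is an arbitrary choice for illustration, not binwalk's setting.

```python
import math
from collections import Counter

def shannon_entropy(block):
    """Shannon entropy of a byte block, normalized to 0.0-1.0 (8 bits/byte = 1.0)."""
    if not block:
        return 0.0
    total = len(block)
    bits = sum(-(c / total) * math.log2(c / total) for c in Counter(block).values())
    return bits / 8.0

def entropy_profile(blob, block_size=1024):
    """(offset, entropy) for each block; a long run near 1.0 hints at encryption."""
    return [(off, shannon_entropy(blob[off:off + block_size]))
            for off in range(0, len(blob), block_size)]
```

Zero-filled padding scores 0.0, and a block where every byte value appears equally often scores 1.0.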

2) Introduction to Ghidra

Ghidra is a software reverse engineering (SRE) suite developed by the NSA. Unlike a simple disassembler, it includes a powerful decompiler that attempts to reconstruct C code from assembly, making analysis significantly faster.

The Critical Step: Loading the Binary

In desktop software (EXE/ELF), the file format tells the OS where to load code. In embedded firmware, you are often dealing with a raw binary blob with no headers, so you must tell Ghidra where to place it in memory. Otherwise every absolute address (function pointers, jump tables, string references) resolves to the wrong location, and calls and cross-references break.

Finding the Base Address: Pointers and strings give it away. If you see many pointer-sized values that look like 0x80001234, and they land on the start of known strings once 0x80000000 is subtracted, the base address is likely 0x80000000.
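That reasoning can be automated as a voting heuristic. For every aligned 32-bit little-endian word (a candidate pointer) and every string offset, compute pointer minus offset. A correct base address will be "voted for" by many pointer/string pairs. Below is a minimal sketch, assuming a 32-bit little-endian target, 4-byte-aligned pointers, and a base aligned to 0x1000. All three are assumptions to adjust per device, and the function names are hypothetical.

```python
import re
import struct
from collections import Counter

def string_offsets(blob, min_len=4):
    """File offsets of NUL-terminated runs of printable ASCII (likely C strings)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return [m.start() for m in pattern.finditer(blob)]

def aligned_words(blob):
    """Every 4-byte-aligned little-endian 32-bit word (candidate pointers)."""
    count = len(blob) // 4
    return set(struct.unpack("<%dI" % count, blob[:count * 4]))

def guess_base(blob, align=0x1000, top=3):
    """Vote for base = pointer - string_offset; the true base tends to win."""
    strings = string_offsets(blob)
    votes = Counter()
    for ptr in aligned_words(blob):
        for off in strings:
            base = ptr - off
            if base >= 0 and base % align == 0:
                votes[base] += 1
    return votes.most_common(top)
```

The loop is quadratic in pointers times strings, so on a multi-megabyte image you may want to sample the strings first. Confirm the winner against the datasheet's memory map before importing.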
GHIDRA PROJECT WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
│ 1. NEW PROJECT                                              │
│    File -> New Project -> Non-Shared Project                │
│                                                             │
│ 2. IMPORT FILE                                              │
│    - Drag & Drop binary                                     │
│    - Format: Raw Binary (for firmware blobs)                │
│    - Language: ARM Cortex Little Endian (typical)           │
│    - Options: Click "Options..."                            │
│      * Base Address: 0x00000000 (Check datasheets!)         │
│      * Block Name: RAM_FLASH                                │
│                                                             │
│ 3. ANALYZE                                                  │
│    - Double click the file to open CodeBrowser              │
│    - "Analyze" -> Select default options                    │
│    - Wait for bottom-right progress bar to finish!          │
└─────────────────────────────────────────────────────────────┘

The Interface

1. Program Trees (Left Top)

Shows how the binary is organized in memory (sections like .text, .data, .bss).

2. Symbol Tree (Left Middle)

List of identified functions, labels, and imports. This is your navigation menu.

3. Listing View (Center)

The raw assembly (Disassembly). Shows memory addresses, opcodes, and instructions.

4. Decompiler (Right)

The "Magic". Pseudo-C representation of the current function. Renaming variables here also updates the Listing view.

Common Shortcuts

Key | Action
L | Rename Label/Variable (use this constantly!)
; | Add Comment (Pre/Post/EOL)
T | Set Data Type (transform undefined bytes to structs)
Ctrl+L | Retype Variable (in Decompiler)

3) Analyzing Firmware Binaries

Analyzing a binary without symbols is like navigating a city without street signs. You have to use landmarks to orient yourself.

Strategy 1: The "String Reference" Trick

The fastest way to find interesting code is to follow the strings.

  1. Open Window -> Defined Strings.
  2. Filter for interesting keywords: "Password", "Admin", "Login", "Error", "Success".
  3. Double-click a string to go to its location in memory (e.g., in the .rodata section).
  4. Right-click -> References -> Show References to Address.
  5. This will jump you to the code that uses that string.
Logic: If you find the function that uses the string "Password Incorrect", you have likely found the authentication routine. The condition before that print statement is the password check.
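On a raw ARM blob you can run the same "find the code that uses this string" step before Ghidra has analyzed anything. ARM code usually loads a string's absolute address from a nearby literal pool, so searching for that 32-bit value locates the pool and therefore the function next to it. Below is a minimal sketch, assuming a little-endian 32-bit image and a known base address. It will not find MIPS references, which are typically built from lui/addiu instruction pairs rather than stored as one word. The helper names are hypothetical.

```python
import struct

def find_string_offset(blob, text):
    """File offset of a NUL-terminated ASCII string, or -1 if absent."""
    return blob.find(text.encode("ascii") + b"\x00")

def find_pointer_refs(blob, base, target_addr):
    """Addresses of aligned little-endian 32-bit words equal to target_addr."""
    needle = struct.pack("<I", target_addr)
    return [base + off for off in range(0, len(blob) - 3, 4)
            if blob[off:off + 4] == needle]
```

Usage: `find_pointer_refs(blob, base, base + find_string_offset(blob, "Password Incorrect"))` returns candidate literal-pool addresses. The authentication routine should sit just before one of them.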

Strategy 2: Finding "main" in Startup Code

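In bare-metal firmware, there is no "main" symbol. The processor just starts executing at the Reset Vector. The reset handler it points to typically initializes RAM (copying .data, zeroing .bss) and then calls main, often as its last call before an infinite loop. Following the reset handler is therefore the quickest way to main.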
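On ARM Cortex-M parts, the Reset Vector lives in the vector table at the start of the image. Word 0 is the initial stack pointer and word 1 is the reset handler's address, with bit 0 set to mark Thumb code. Below is a minimal sketch of reading it. It assumes the table sits at offset 0 of the dump, which is true for typical flash images, although software can relocate the table at runtime via VTOR. The function name is hypothetical.

```python
import struct

def parse_cortex_m_vectors(blob, base):
    """Read the initial SP and Reset_Handler from a Cortex-M vector table at offset 0."""
    initial_sp, reset = struct.unpack_from("<II", blob, 0)
    handler = reset & ~1  # bit 0 is the Thumb bit, not part of the address
    return {
        "initial_sp": initial_sp,
        "reset_handler": handler,
        "thumb": bool(reset & 1),
        # Sanity check: a plausible handler points back into the image itself
        "reset_in_image": base <= handler < base + len(blob),
    }
```

If "reset_in_image" is False for every base you try, the dump may not start at the vector table, or the base address is wrong.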

Strategy 3: Identifying Library Functions

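Since symbols are stripped, you won't see strcpy or printf by name in bare-metal or statically linked firmware; you have to recognize them by behavior. (Dynamically linked ELF binaries are different: imported library functions keep their names in the dynamic symbol table even after stripping.)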

Function Behavior | Likely Identity
Takes 2 arguments. Copies bytes until 0x00. | strcpy (Dangerous!)
Takes 3 arguments. Copies N bytes. | memcpy or strncpy
Takes 2 arguments. Returns 0 if equal. | strcmp
Takes format string ("%s", "%d") and varargs. | printf / sprintf

4) Finding Vulnerabilities

Deep in the assembly, high-level logic flaws look like specific patterns of instructions.

Case Study: The Stack Buffer Overflow

One of the most common vulnerabilities in older firmware is the stack buffer overflow caused by `strcpy`.

C Source (Vulnerable)
#include <string.h>

void auth(char *input) {
    char buf[64];
    // No bounds check: copies until input's NUL terminator, whatever its length
    strcpy(buf, input);
}
What Happens

If input is 100 bytes, `strcpy` keeps writing past the end of `buf`. It overwrites the Saved Return Address (LR) on the stack.

Recognizing it in Assembly (ARM32)

In Ghidra, you'll see a call to a copy function with a stack buffer as the destination.

Ghidra Listing View
0x10408   SUB   SP, SP, #0x40        ; Allocate 64 bytes (0x40) for 'buf'
0x1040c   MOV   R0, SP               ; R0 = Destination (buf aka Stack Pointer)
0x10410   MOV   R1, R4               ; R1 = Source (User Input)
0x10414   BL    strcpy               ; Branch with Link (call) strcpy
                                     ; PROBLEM: strcpy doesn't know buf is only 64 bytes!
0x10418   ADD   SP, SP, #0x40        ; Clean up stack
0x1041c   POP   {R4, PC}             ; Return (Pop PC). If stack was smashed, PC is now 0x41414141.
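The effect of that listing can be modeled with a plain byte array. Below is a minimal sketch that assumes the POP {R4, PC} epilogue implies a PUSH {R4, LR} prologue. Under that assumption, the frame from low to high address is [buf: 64 bytes][saved R4][saved LR]. The saved register values are hypothetical placeholders.

```python
import struct

BUF_SIZE = 0x40  # from SUB SP, SP, #0x40

def simulate_strcpy(user_input, saved_r4=0x11111111, saved_lr=0x00010500):
    """Return (R4, PC) as POP {R4, PC} would restore them after strcpy(buf, input)."""
    # Assumed frame: [buf: 64 bytes][saved R4][saved LR], little endian
    frame = bytearray(BUF_SIZE) + struct.pack("<II", saved_r4, saved_lr)
    data = user_input + b"\x00"  # strcpy also copies the terminating NUL
    frame[:len(data)] = data     # no bounds check, exactly like strcpy
    return struct.unpack_from("<II", frame, BUF_SIZE)
```

For example, `simulate_strcpy(b"A" * 100)` returns `(0x41414141, 0x41414141)`, matching the listing's comment. A 64-byte input already clobbers the low byte of saved R4 with the terminating NUL (an off-by-one). Because strcpy stops at the first 0x00, an injected return address cannot contain NUL bytes.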

Case Study: Command Injection

Look for snprintf or sprintf constructing a string that is later passed to system() or popen().

sprintf(cmd_buf, "ping -c 1 %s", user_ip);
system(cmd_buf);

If user_ip is 1.2.3.4; cat /etc/shadow, the device executes: ping -c 1 1.2.3.4; cat /etc/shadow.
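The string-building step is easy to reproduce, which helps when you are checking whether a binary sanitizes input before calling system(). Below is a minimal Python mirror of the sprintf call, plus a check for characters that /bin/sh treats as command syntax. The metacharacter set is illustrative rather than exhaustive, and the function names are hypothetical. When reviewing the decompiled code, the question is whether anything equivalent to has_shell_metachars runs between the user input and system().

```python
SHELL_METACHARS = set(";|&`$<>()\n")  # illustrative, not exhaustive

def build_ping_cmd(user_ip):
    """Python mirror of sprintf(cmd_buf, "ping -c 1 %s", user_ip)."""
    return "ping -c 1 %s" % user_ip

def has_shell_metachars(value):
    """True if value contains characters the shell interprets as syntax."""
    return any(ch in SHELL_METACHARS for ch in value)
```

`build_ping_cmd("1.2.3.4; cat /etc/shadow")` yields exactly the command line shown above, and `has_shell_metachars` flags it because of the `;`.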

5) Ghidra Scripting

Ghidra's real power comes from scripting. You can write Java or Python scripts to automate tedious tasks.

Example: XOR String Decoder

Malware authors often "obfuscate" strings by XORing them with a key byte to hide them from the `strings` command.

Ghidra Script (Python)
# Decodes XOR-encoded bytes at the cursor location
# @keybinding Ctrl-Shift-X
# @category CSY304

from ghidra.program.model.data import StringDataType

def xor_decode(addr, length, key):
    # Read bytes with the FlatProgramAPI; returns a Java byte[] of SIGNED values
    b = getBytes(addr, length)

    # Decode: mask each signed byte to 0..255 before XORing with the key
    decoded = bytearray()
    for byte in b:
        decoded.append((byte & 0xFF) ^ key)

    # Print result
    print("Decoded > " + "".join(map(chr, decoded)))

    # Option: Patch memory with decoded bytes (for analysis)
    # clearListing(addr, addr.add(length - 1))
    # setBytes(addr, decoded)
    # createData(addr, StringDataType())

# Usage: Put cursor on start of bytes, run script
addr = currentAddress
key = 0xAA  # Example key found in code
length = 16

xor_decode(addr, length, key)
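If the key has not been found in the code yet, a single-byte XOR can be brute-forced outside Ghidra. Scoring candidates by "is the output printable" produces ties, because flipping low bits often keeps text printable. Below is a minimal sketch that instead uses a known-plaintext crib, a substring you expect in the decoded data (for example "admin" or "http"). The crib choice is an assumption you make per target, and the function names are hypothetical.

```python
def xor_bytes(data, key):
    """XOR every byte with a single-byte key (encoding and decoding are identical)."""
    return bytes(b ^ key for b in data)

def find_xor_keys(encoded, crib):
    """Known-plaintext search: every key whose output contains the crib."""
    return [key for key in range(256) if crib in xor_bytes(encoded, key)]
```

Usage: dump the suspicious bytes (for example with the script above), run `find_xor_keys(encoded, b"admin")`, then set `key` in the Ghidra script to the result.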

Useful API calls

6) Binary Diffing

COMPARING FIRMWARE VERSIONS:

WHY DIFF FIRMWARE:
┌─────────────────────────────────────────────────────────────┐
│ - Identify patched vulnerabilities                          │
│ - Find newly introduced bugs                                │
│ - Understand what changed between versions                  │
│ - "Patch diffing" to find silent security fixes             │
└─────────────────────────────────────────────────────────────┘

TOOLS:
┌─────────────────────────────────────────────────────────────┐
│ BinDiff:                                                    │
│ - Google's binary comparison tool                           │
│ - Works with Ghidra and IDA                                 │
│ - Function matching and similarity scores                   │
│                                                             │
│ Diaphora:                                                   │
│ - Open-source alternative                                   │
│ - IDA plugin (Ghidra port available)                        │
└─────────────────────────────────────────────────────────────┘

7) Defensive Architecture: Anti-Reverse Engineering

Vendors are aware that researchers reverse their products. They employ techniques to make analysis Difficult, Annoying, or Impossible.

1. Symbol Stripping

Compilers generate "symbols" (names of functions and variables) to help debuggers. Production firmware is usually "stripped" (strip --strip-all binary).

2. Logic Obfuscation

Spaghetti Code: Obfuscating compilers and post-processing tools can insert useless jumps, dead code, and convoluted control flow specifically to confuse decompilers.

3. Firmware Encryption

The firmware update file is encrypted (AES) and only decrypted by the bootloader in RAM.

Bypassing Encryption: If you can't find the key (or the decryption routine) in an older, unencrypted firmware version, you may need a hardware attack (side-channel analysis or fault injection/glitching) to dump the decrypted firmware from RAM while the device is running.

4. Secure Boot (Chain of Trust)

The SoC holds the vendor's public key (or its hash) in ROM or one-time-programmable fuses. The bootloader is signed. The kernel is signed. If you modify a single byte of the firmware (e.g., to add a backdoor or enable a root shell), the signature check fails, and the device refuses to boot.

SECURE BOOT FLOW:
[CPU ROM] ensures [Bootloader] is signed.
   ↓
[Bootloader] ensures [Kernel] is signed.
   ↓
[Kernel] ensures [Filesystem] is valid.

* Attack Surface: If the signature verification code itself has a bug (e.g., buffer overflow in the signature parser), you can bypass the chain.
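The chain-of-trust logic can be shown in a few lines. Below is a simplified sketch that swaps public-key signatures for SHA-256 digests pinned in the previous stage. Real secure boot uses signatures so the vendor can ship new images without changing ROM; digests just keep the sketch dependency-free. In this model every stage except the last ends with the digest of the next stage, so each pinned value is itself covered by the earlier check. Stage contents and function names are hypothetical.

```python
import hashlib

DIGEST_LEN = 32  # SHA-256

def build_chain(payloads):
    """Build stages back to front; returns (stages, digest to pin in ROM)."""
    stages = [payloads[-1]]
    for payload in reversed(payloads[:-1]):
        stages.insert(0, payload + hashlib.sha256(stages[0]).digest())
    return stages, hashlib.sha256(stages[0]).digest()

def verify_chain(stages, rom_digest):
    """Index of the first stage that fails verification, or None if all pass."""
    expected = rom_digest
    for index, image in enumerate(stages):
        if hashlib.sha256(image).digest() != expected:
            return index
        expected = image[-DIGEST_LEN:]  # next stage's pinned digest
    return None
```

Patching a single byte of the kernel stage makes `verify_chain` return 1, which corresponds to the boot halting at that stage. Note that the attack described above targets the verification code itself, which this model assumes is correct.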

8) Advanced Concept: Hardware Extraction

Sometimes the firmware isn't available online. You have to go get it yourself, directly from the chip.

Method A: The Serial Console (UART)

Many devices still have a "debug port" left on the PCB by developers. It speaks UART (Universal Asynchronous Receiver-Transmitter).

Hardware Connection
[ PCB Board ]        [ USB-to-TTL Adapter ]
     TX  ------------->  RX
     RX  <-------------  TX
     GND <------------>  GND

WARNING: Do NOT connect VCC; the board supplies its own power. Confirm the adapter's logic level matches the board (3.3V vs 5V) before wiring TX/RX.

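The Attack: Connect with screen or minicom (e.g., `screen /dev/ttyUSB0 115200`). Interrupt the boot process (usually by pressing a key during the autoboot countdown) to get a U-Boot console, then use `md` (memory display) to print memory as hex over the serial cable. For SPI flash, `sf probe` and `sf read` first copy the flash contents into RAM.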
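Because `md` prints text, a captured serial log must be converted back into bytes before binwalk or Ghidra can use it. Below is a minimal sketch that assumes the log came from the byte-wise form `md.b`, with lines shaped like `9f000000: 27 05 19 56 ...` followed by an ASCII column. The caller passes the expected length so that a short last line cannot absorb ASCII-column characters that happen to look like hex. The parser also refuses gaps, since serial captures occasionally drop lines. The function name is hypothetical.

```python
import re

# Assumed md.b line shape: "<hex address>: <up to 16 hex bytes>    <ascii column>"
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(.*)$")

def parse_md_dump(text, length):
    """Rebuild raw bytes from captured U-Boot `md.b` output."""
    out = bytearray()
    expected_addr = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip prompts and echoed commands
        addr = int(m.group(1), 16)
        if expected_addr is not None and addr != expected_addr:
            raise ValueError("gap in dump at 0x%x" % expected_addr)
        row = bytearray()
        for tok in m.group(2).split()[:16]:
            if len(tok) != 2:
                break  # reached the ASCII column
            try:
                row.append(int(tok, 16))
            except ValueError:
                break
        out += row
        expected_addr = addr + len(row)
    return bytes(out[:length])
```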

Method B: SPI Flash Dumping

If the console is locked, you can talk directly to the storage chip (SPI Flash).
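Talking to the chip means sending it standard SPI NOR commands. Read JEDEC ID (0x9F) identifies the part, and READ (0x03) followed by a 24-bit address streams data out. Below is a minimal sketch of building those command bytes and planning a full-chip read. It assumes 3-byte addressing, which covers chips up to 16 MiB; larger parts use 4-byte addressing commands. In practice a tool such as flashrom with a USB SPI programmer sends these commands for you. The function names and chunk size are illustrative.

```python
READ_JEDEC_ID = 0x9F  # chip replies with manufacturer and device ID bytes
READ_DATA = 0x03      # followed by a 24-bit address; data is then clocked out

def read_command(address):
    """Command bytes for a standard SPI NOR READ using 3-byte addressing."""
    if not 0 <= address < 1 << 24:
        raise ValueError("3-byte addressing covers only the first 16 MiB")
    return bytes([READ_DATA, (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF])

def read_plan(flash_size, chunk=4096):
    """(address, length) pairs covering the whole chip in fixed-size chunks."""
    return [(addr, min(chunk, flash_size - addr)) for addr in range(0, flash_size, chunk)]
```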

Guided Lab: Firmware Analysis Workflow

Objective: Perform a complete analysis cycle: Extract firmware, analyze filesystem, and reverse engineer a vulnerable binary.

Required Tools: Linux VM (Kali/Ubuntu), Binwalk, Ghidra, Strings.

Scenario: You have obtained a firmware image router_firmware.bin from a vendor's support site.

Part 1: Extraction & Enumeration (20 min)

  1. Identify the file:
    $ binwalk router_firmware.bin
    Result: Detected SquashFS filesystem at offset 0x120000.
  2. Extract contents:
    $ binwalk -eM router_firmware.bin
    Result: Created `_router_firmware.bin.extracted` directory.
  3. Explore the filesystem:
    $ cd _router_firmware.bin.extracted/squashfs-root
    $ ls -R etc/
    Task: Look for passwd, shadow, or configuration files with credentials.
XP REWARD: +150 XP (Extraction Expert)

Part 2: String Analysis (15 min)

Before loading Ghidra, look for low-hanging fruit.

  1. Search for hardcoded secrets:
    $ grep -rEi "password|admin|root|key" .
    Note: -r (recursive), -E (extended regex), -i (case insensitive).
  2. Identify dangerous binaries:
    $ cd bin
    $ strings login_manager | grep "%s"
    Context: Frequent use of %s format specifiers might suggest sprintf usage.

Part 3: Ghidra Vulnerability Hunt (45 min)

  1. Load `login_manager` into Ghidra:
    Drag binary into project. Select language (likely ARM or MIPS). Analyze.
  2. Find the entry point:
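    Go to Symbol Tree -> Functions -> main (if symbols survive). In a stripped binary, start at entry: the first argument it passes to __libc_start_main (or __uClibc_main on uClibc-based devices) is main.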
  3. Trace the user input:
    Find where the username/password is read (look for recv or read).
  4. Identify the vulnerability:
    Look for a strcpy(local_buffer, input_buffer).
    Question: Is local_buffer smaller than the potential input? If yes, it's a Buffer Overflow.
XP REWARD: +300 XP (Bug Hunter)

Building on Prior Knowledge

This week connects concepts from CSY101 (Linux CLI) and CSY202 (Network Protocols) to the embedded world.

From CSY203 (Web Sec)

The "Command Injection" you learned in web apps works exactly the same in firmware C code: using system() with unsanitized input.

From CSY104 (Networking)

When you analyze the "init" scripts in firmware, you are essentially looking at how the device sets up its Routing Table and Firewall (iptables), just as you configured them manually.

Outcome Check

Resources & Cheatsheets

Essential Assembly Instructions (ARM)

Instruction | Meaning | C Equivalent
MOV R0, R1 | Move R1 into R0 | r0 = r1;
LDR R0, [R1] | Load Register | r0 = *r1;
STR R0, [R1] | Store Register | *r1 = r0;
BL func | Branch with Link (call) | func();
CMP R0, #0 | Compare (sets condition flags) | (sets up the next conditional)
BEQ loc | Branch if Equal | if (r0 == 0) goto loc;  (after CMP R0, #0)
Ghidra Official Cheat Sheet

Keybindings and shortcuts for efficiency.

+10 XP
Ghidra Source Code

Reading the source explains the features.

+20 XP
Azeria Labs (ARM Exploit)

The gold standard for ARM reverse engineering tutorials.

+50 XP

Glossary of Terms

Base Address
The memory address where the firmware expects to be loaded. If wrong, all absolute jumps point to garbage.
Endianness
The order of bytes. Little Endian (Least Significant Byte first) is standard for x86 and most ARM systems; many MIPS-based routers are Big Endian. Network traffic (network byte order) is Big Endian.
GOT (Global Offset Table)
A table used by dynamic linkers to resolve functions in shared libraries.
XREF (Cross Reference)
A list of all locations in the code that call a function or access a data variable.
Strings
ASCII or Unicode text embedded in the binary. The easiest starting point for RE.