CSY101 Week 02 Beginner

Cybersecurity Essentials

Week Introduction

💡 Mental Model

Computers are interpretation machines. They don't "understand" data; they execute precise instructions on patterns of 1s and 0s. Security vulnerabilities often emerge when data is interpreted in ways developers didn't anticipate.

This week builds your technical foundation: how computers represent information, why representation matters for security, and how attackers exploit the gap between "what data is" and "how it's interpreted."

Learning Outcomes (Week 2 Focus)

By the end of this week, you should be able to:

  • LO2 - Technical Foundations: Explain how computers represent data at the binary level and why this matters for security
  • LO6 - Software Security: Understand how data representation errors become vulnerabilities (buffer overflows, injection attacks)
  • LO4 - Risk Reasoning: Connect technical concepts to real security risks

Lesson 2.1 · Bits, Bytes, and Binary (The Foundation)

Core principle: Computers are machines that manipulate electrical signals. At the hardware level, everything reduces to two states, conventionally a high voltage (1) or a low voltage (0).

Key terminology:

  • Bit: A single binary digit (0 or 1), the smallest unit of information
  • Byte: A group of 8 bits, which can represent 256 different values (2^8)
  • Binary: The base-2 number system computers use (only digits 0 and 1)
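To make these terms concrete, here is a minimal Python sketch that reads an 8-bit pattern by place value. The variable names are illustrative only, not part of the course material.

```python
# One byte written as 8 bits, most significant bit first.
bits = "01000001"

# Each position is worth a power of two: 128, 64, 32, 16, 8, 4, 2, 1.
place_values = [2 ** i for i in range(7, -1, -1)]

# Add up the place values wherever the bit is 1.
total = sum(pv for pv, bit in zip(place_values, bits) if bit == "1")

# A byte has 8 bits, so it can hold 2**8 distinct values (0 through 255).
values_per_byte = 2 ** len(bits)

print(total, values_per_byte)
```

Only the 64s place and the 1s place are set, so the pattern adds up to 64 + 1.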

Example: The decimal number 65 is 01000001 in binary. That same byte can also represent:

  • The letter 'A' in ASCII encoding
  • Part of an instruction in machine code
  • A pixel intensity value in an image

Security insight: Because the same bytes can have multiple meanings, attackers exploit contexts where data is interpreted differently than intended. This is the foundation of injection attacks and memory corruption exploits.

Layers of data representation: Physical (Voltage) -> Binary (01000001) -> Interpretation (ASCII 'A', Int 65, Instruction 0x41)
The same binary pattern can be interpreted in multiple ways.
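A small Python sketch of the idea, assuming nothing beyond the standard library: the same single byte is read as a number, as text, and as a pixel intensity. The grayscale scaling (dividing by 255) is one common convention, chosen here only for illustration.

```python
# One byte with the bit pattern 01000001 (0x41).
raw = bytes([0b01000001])

as_integer = raw[0]              # read as an unsigned integer
as_text = raw.decode("ascii")    # read as an ASCII character
as_hex = raw.hex()               # the same byte written in hexadecimal
as_pixel = raw[0] / 255          # read as a grayscale intensity between 0 and 1

print(as_integer, as_text, as_hex, round(as_pixel, 3))
```

Nothing about the byte changes between these lines; only the rule used to read it does.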

Lesson 2.2 · Encoding: From Data to Meaning

Critical concept: Raw bytes have no inherent meaning. Meaning comes from encoding schemes: agreed-upon rules for interpreting bit patterns.
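As a rough illustration of what happens when sender and receiver disagree on the rules, the sketch below encodes a word as UTF-8 and then decodes the same bytes two ways. The sample word is an arbitrary choice.

```python
text = "café"

# The sender encodes the text as UTF-8; the accented letter becomes two bytes.
encoded = text.encode("utf-8")

# A receiver that also assumes UTF-8 recovers the original text.
decoded_correctly = encoded.decode("utf-8")

# A receiver that assumes Latin-1 reads each byte as its own character
# and gets garbled text instead.
decoded_wrongly = encoded.decode("latin-1")

print(encoded, decoded_correctly, decoded_wrongly)
```

The bytes on the wire are identical in both cases; only the receiver's assumption differs.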

Common encoding schemes:

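Security-critical example: Consider the byte sequence 3C 73 63 72 69 70 74 3E. Read as ASCII text, it spells <script>. Whether that is harmless text or the start of executable code depends entirely on where it ends up.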
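You can check the decoding yourself. This short Python sketch uses only built-in functions to turn the hex dump into bytes and then read those bytes as numbers and as ASCII text.

```python
hex_dump = "3C 73 63 72 69 70 74 3E"

# bytes.fromhex ignores the spaces between the two-digit hex pairs.
raw = bytes.fromhex(hex_dump)

as_numbers = list(raw)           # the bytes as plain integers
as_text = raw.decode("ascii")    # the bytes as ASCII characters

print(as_numbers)
print(as_text)
```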

Why this matters: Cross-site scripting (XSS) attacks work by injecting data that gets interpreted as code. SQL injection exploits the same principle. The vulnerability isn't the bytes themselves; it's the context in which they're interpreted.
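The SQL injection case can be sketched with Python's built-in sqlite3 module and an in-memory database. The table, the rows, and the malicious input are all made up for this illustration.

```python
import sqlite3

# Hypothetical table with two users, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

user_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so its quote characters
# are interpreted as SQL syntax and change the meaning of the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "' ORDER BY name"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder keeps the input as data, so it is only ever
# compared against the name column.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ? ORDER BY name", (user_input,)
).fetchall()

print(unsafe_rows)
print(safe_rows)
```

The input bytes are the same in both queries. In the first they are read as part of the command; in the second they are read as a value.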

Defender principle: Always validate data based on how it will be interpreted, not just what it contains. Context determines risk.
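One way to apply this principle to an HTML context is output escaping. The sketch below uses `html.escape` from Python's standard library; the page template and the input string are hypothetical.

```python
import html

user_input = "<script>alert(1)</script>"

# Unsafe: the input lands in the page as-is, so a browser would read it as markup.
unsafe_page = "<p>Hello, " + user_input + "</p>"

# Safer for an HTML context: special characters become entities,
# so a browser displays them as text instead of parsing them.
escaped = html.escape(user_input)
safe_page = "<p>Hello, " + escaped + "</p>"

print(unsafe_page)
print(safe_page)
```

This escaping is specific to HTML. The same input headed for a SQL query, a shell command, or a URL needs the defense that matches that context.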

Lesson 2.3 · Memory, Storage, and the Security Boundary

💡 Mental model

Think of computer memory as a giant array of numbered boxes (addresses), each holding one byte. Programs request boxes, fill them with data, and eventually return them. Security problems arise when programs access the wrong boxes.

Storage hierarchy (fastest to slowest):

Figure: Memory hierarchy pyramid, from registers (fastest, smallest) down to disk (slowest, largest), with the security boundary at RAM/disk. The memory hierarchy creates a trade-off between speed and persistence.

Security-critical concept: Memory safety

When a program requests memory for 10 bytes but writes 20, and nothing checks the length (C, for example, performs no automatic bounds checking), the extra 10 bytes overwrite adjacent memory. This is called a buffer overflow.

Example scenario:

Figure: Data spilling from an 8-byte buffer into the memory that holds the return address. A buffer overflow occurs when data exceeds its allocated space and corrupts adjacent memory.
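Python itself is memory-safe, so the following is only a simulation of the scenario, not a real exploit. It models memory as a flat `bytearray` with an 8-byte buffer sitting directly before a 4-byte return address. The layout, the address value, and the helper names are assumptions made for illustration; real stack layouts vary and often include extra data and protections.

```python
# Simulated flat memory: an 8-byte buffer followed by a 4-byte "return address".
BUFFER_START, BUFFER_SIZE = 0, 8
RETURN_START, RETURN_SIZE = 8, 4

memory = bytearray(BUFFER_SIZE + RETURN_SIZE)
memory[RETURN_START:RETURN_START + RETURN_SIZE] = (0x00401000).to_bytes(4, "little")

def unchecked_copy(mem, start, data):
    """Copy data byte by byte without comparing its length to the buffer size."""
    for offset, byte in enumerate(data):
        mem[start + offset] = byte

def read_return_address(mem):
    """Read the 4 bytes after the buffer as a little-endian integer."""
    return int.from_bytes(mem[RETURN_START:RETURN_START + RETURN_SIZE], "little")

before = read_return_address(memory)

# Write 12 bytes of "A" (0x41) into the 8-byte buffer.
unchecked_copy(memory, BUFFER_START, b"A" * 12)

after = read_return_address(memory)

print(hex(before), hex(after))
```

The last four "A" bytes land on top of the stored address, so the value the program would later jump to is now chosen by whoever supplied the input.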

Why persistence matters: Data in RAM disappears on reboot. Data on disk survives. This distinction affects incident response (volatile evidence vs persistent artifacts) and data protection (encryption at rest vs encryption in transit).

Defender takeaway: Memory safety vulnerabilities remain among the most dangerous classes of bugs. Modern defenses include bounds checking, memory-safe languages (Rust, Go), and Address Space Layout Randomization (ASLR).
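Bounds checking, the first defense listed, can be sketched in a few lines of Python. The `checked_write` helper is hypothetical; the point is only that the length is compared to the buffer size before any byte is written.

```python
def checked_write(buffer, data):
    """Write data into buffer only if it fits; otherwise refuse and raise."""
    if len(data) > len(buffer):
        raise ValueError(
            f"input is {len(data)} bytes but the buffer holds only {len(buffer)}"
        )
    buffer[:len(data)] = data

buffer = bytearray(8)

# Five bytes fit in the 8-byte buffer.
checked_write(buffer, b"hello")

# Twenty bytes do not fit, so the write is refused and nothing is modified.
try:
    checked_write(buffer, b"A" * 20)
    rejected = False
except ValueError:
    rejected = True

print(bytes(buffer), rejected)
```

Memory-safe languages perform a check like this automatically on every access, which is why an out-of-range write there fails with an error instead of silently corrupting a neighbor.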

Lesson 2.4 · From Representation Errors to Exploits

Core insight: Computers execute instructions with perfect precision but zero judgment. They cannot detect "unreasonable" data; they simply process what they're given.
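Fixed-width integers are one place where this shows up. Python's own integers do not overflow, so the sketch below imitates an 8-bit register by keeping only the low 8 bits; the `add_u8` helper and the length values are hypothetical.

```python
def add_u8(a, b):
    """Add two numbers as an 8-bit unsigned register would: keep only the low 8 bits."""
    return (a + b) & 0xFF

# 200 + 100 is 300, which does not fit in 8 bits, so the result wraps around.
total_len = add_u8(200, 100)

# The same byte read as an unsigned value and as a signed value.
byte = b"\xff"
as_unsigned = int.from_bytes(byte, "big", signed=False)
as_signed = int.from_bytes(byte, "big", signed=True)

print(total_len, as_unsigned, as_signed)
```

The machine reports the wrapped value without complaint. If a program used it to size a buffer, the buffer would be far smaller than the data that follows.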

Common vulnerability patterns rooted in representation:

Real-world impact: The 2017 Equifax breach, which exposed records on roughly 147 million people (143 million in the initial disclosure), began with an Apache Struts vulnerability (CVE-2017-5638): essentially a failure to safely handle user-provided data in an HTTP header (Content-Type).

Defender principle: Never trust data at boundaries. Validate length, type, encoding, and range before processing. Assume all input is hostile until proven safe.
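As a minimal sketch of those four checks, consider a hypothetical "age" field that arrives as raw bytes. The field, its limits, and the function names are assumptions for illustration; real validators depend on what each field is for.

```python
MAX_LEN = 3                   # assumed limit: at most three characters
MIN_AGE, MAX_AGE = 0, 150     # assumed sensible range for this field

def validate_age_field(raw):
    """Return the age as an int, or raise ValueError if any check fails."""
    # Type: accept only raw bytes at the boundary.
    if not isinstance(raw, bytes):
        raise ValueError("expected bytes")
    # Length: reject empty or oversized input before doing anything else.
    if not 1 <= len(raw) <= MAX_LEN:
        raise ValueError("bad length")
    # Encoding: the field must be plain ASCII.
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("not ASCII") from None
    # Format: digits only, so no signs, spaces, or markup.
    if not text.isdigit():
        raise ValueError("not a number")
    # Range: the value must make sense for the field.
    age = int(text)
    if not MIN_AGE <= age <= MAX_AGE:
        raise ValueError("out of range")
    return age

def is_accepted(raw):
    """Convenience wrapper: True if the input passes every check."""
    try:
        validate_age_field(raw)
        return True
    except ValueError:
        return False

for sample in [b"42", b"9999", b"-1", b"\xff", b"200"]:
    print(sample, is_accepted(sample))
```

The checks run from cheapest to most specific, and the input is rejected at the first failure rather than being repaired.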

Self-Check Questions (Test Your Understanding)

Answer these in your own words (2-3 sentences each):

  1. Why do computers need encoding schemes? What happens if sender and receiver use different encodings?
  2. Explain why the same byte sequence (e.g., 01000001) can have different meanings. Give two examples.
  3. What is a buffer overflow? Why is it dangerous from a security perspective?
  4. How does the memory hierarchy (RAM vs disk) affect security incident response?
  5. Why is "trusting input" a fundamental security mistake? Connect this to data representation.

Lab 2 · Representation Risks in Real Systems

Time estimate: 30-45 minutes

Objective: Analyze how data representation errors create security vulnerabilities in real systems. You will identify where interpretation mismatches create risk and propose mitigations.

Step 1: Choose Your Context (5 minutes)

Select one scenario from this list (or propose your own):

Why it matters: Every boundary where data enters a system is a potential vulnerability point.

Step 2: Identify the Data Flow (10 minutes)

For your chosen scenario, trace how data moves through the system:

Example for web form username:

Step 3: Find the Dangerous Interpretation (10 minutes)

Identify where the same data could be interpreted two different ways:

Example for username:

Step 4: Map the Attack Scenario (10 minutes)

Describe a realistic attack exploiting the representation mismatch:

Example attack:

Step 5: Propose Defenses (5 minutes)

Identify at least two controls that would prevent the attack:

Example defenses:

Step 6: Synthesis (5 minutes)

Write a short paragraph (3-5 sentences) answering:

"How does this vulnerability connect to data representation concepts from this week? Why can't the system automatically detect the malicious intent?"

Example answer:

This XSS vulnerability exists because computers cannot distinguish "text meant for display" from "text that happens to contain code syntax." The byte sequence <script> is just data until a browser interprets it in an HTML context. The system can't detect malicious intent because intent isn't encoded in the bytes; only humans understand context and purpose. Defense requires explicitly telling the computer how to interpret data safely in each context.

Success Criteria (What "Good" Looks Like)

Your lab is successful if you:

Extension (For Advanced Students)

If you finish early, explore these questions:

🎯 Hands-On Labs (Free & Essential)

Practice how data is encoded, stored, and interpreted. Complete these labs before moving to reading resources.

🎮 TryHackMe: CyberChef - The Basics

What you'll do: Decode and transform data using common representations (hex, base64, ASCII) and learn how simple encoding changes meaning.
Why it matters: Most security issues begin with data being interpreted in the wrong context. CyberChef is a practical way to see those transformations.
Time estimate: 1-1.5 hours

PicoCTF Practice: General Skills (Bases + Strings)

What you'll do: Solve beginner challenges like Bases, Strings, and Warmed Up to decode data and understand binary/ASCII representations.
Why it matters: These challenges train you to recognize how the same bytes can represent text, numbers, or encoded payloads.
Time estimate: 1-2 hours

💡 Lab Tip: Keep a small cheat sheet of common encodings (binary, hex, base64, URL encoding). Being able to spot them quickly will save you hours later.
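One way to start that cheat sheet is to generate it. This sketch uses Python's standard library to show a single sample string in each of the four encodings; the sample string is an arbitrary choice.

```python
import base64
from urllib.parse import quote

text = "<script>"
raw = text.encode("ascii")

# The same eight bytes written four different ways.
as_binary = " ".join(format(b, "08b") for b in raw)
as_hex = " ".join(format(b, "02x") for b in raw)
as_base64 = base64.b64encode(raw).decode("ascii")
as_url = quote(text, safe="")

print("binary:", as_binary)
print("hex:   ", as_hex)
print("base64:", as_base64)
print("url:   ", as_url)
```

Quick tells when you meet these in the wild: base64 often ends in "=" padding, URL encoding is full of "%" followed by two hex digits, and hex dumps use only 0-9 and a-f.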

Resources (Free + Authoritative)

Work through these in order. Each builds technical foundation for security reasoning.

📘 CS50 - Binary, ASCII, and Unicode

What to read: The sections on "Binary," "ASCII," and "Unicode" from CS50's Week 0 notes.
Why it matters: Harvard's intro CS course explains encoding clearly. Focus on why computers need multiple encoding schemes.
Time estimate: 20 minutes

🎥 Computerphile - How Memory Works (Video)

What to watch: First 15 minutes on memory addressing and storage hierarchy.
Why it matters: Visual explanation of how programs access memory, which is essential for understanding buffer overflows.
Time estimate: 15 minutes

📘 OWASP - Input Validation Cheat Sheet

What to read: Introduction and "Syntactic Validation" sections only.
Why it matters: Shows how professionals defend against representation-based attacks. Don't memorize; understand the principles.
Time estimate: 15 minutes

📘 Cloudflare Learning - What is UTF-8?

What to read: Entire article (short, accessible).
Why it matters: UTF-8 is the dominant text encoding. Understanding it prevents encoding-based bypasses.
Time estimate: 10 minutes

Week 02 Quiz

Test your understanding of the CIA triad, security principles, and defense in depth.

Format: 10 multiple-choice questions · Passing score: 70% · Time: Untimed

Weekly Reflection Prompt

Aligned to LO2 (Technical Foundations) and LO6 (Software Security)

Write 200-300 words answering this prompt:

Choose one vulnerability type from this week (buffer overflow, injection, integer overflow, encoding mismatch). Explain how it connects to data representation concepts.

In your answer, include:

What good looks like: You explain the mechanism (how representation creates vulnerability), not just the outcome. You show understanding that computers process bytes literally, without semantic understanding.