Computer Architecture
CPU Design, Memory Systems & Instruction Processing
1. CPU Components
Control Unit (CU)
The "brain" of the CPU that directs all operations.
Functions:
- Fetches instructions from memory
- Decodes instructions
- Controls data flow between components
- Generates timing signals
- Manages instruction sequencing
Types:
- Hardwired: Fixed logic gates, fast but inflexible
- Microprogrammed: Uses microcode, flexible but slower
Arithmetic Logic Unit (ALU)
Performs all arithmetic and logical operations.
Arithmetic Operations:
- Addition (+)
- Subtraction (-)
- Multiplication (×)
- Division (÷)
- Increment/Decrement
Logical Operations:
- AND, OR, NOT, XOR
- Shift (left, right)
- Rotate
- Comparison
- Complement
CPU Registers
Program Counter (PC)
Holds address of the next instruction to execute
Instruction Register (IR)
Holds the current instruction being executed
Memory Address Register (MAR)
Holds address for memory read/write operations
Memory Data Register (MDR/MBR)
Holds data being transferred to/from memory
Accumulator (ACC)
Stores intermediate arithmetic/logical results
Stack Pointer (SP)
Points to top of the stack in memory
Status/Flag Register (PSW)
Contains condition flags: Zero (Z), Carry (C), Sign (S), Overflow (V)
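As a rough sketch of how these flags are typically derived (not tied to any specific ISA), the C snippet below computes Z, C, S, and V after an 8-bit addition using the usual definitions: zero result, unsigned carry-out, sign bit, and signed overflow.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: derive Z, C, S, V after an 8-bit addition. */
typedef struct { int Z, C, S, V; } Flags;

static uint8_t add8(uint8_t a, uint8_t b, Flags *f) {
    uint16_t wide = (uint16_t)a + (uint16_t)b;   /* keep the carry-out */
    uint8_t  r    = (uint8_t)wide;

    f->Z = (r == 0);                    /* Zero: result is all zeros */
    f->C = (wide > 0xFF);               /* Carry: unsigned overflow  */
    f->S = (r >> 7) & 1;                /* Sign: MSB of the result   */
    /* Overflow: operands share a sign but the result's sign differs */
    f->V = (~(a ^ b) & (a ^ r) & 0x80) != 0;
    return r;
}

int main(void) {
    Flags f;
    uint8_t r = add8(0x7F, 0x01, &f);   /* 127 + 1 overflows signed 8-bit */
    printf("result=0x%02X Z=%d C=%d S=%d V=%d\n", r, f.Z, f.C, f.S, f.V);
    return 0;
}
```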
2. Instruction Cycle
Fetch-Decode-Execute Cycle
Fetch
Read instruction from memory address in PC
MAR ← PC; MDR ← Memory[MAR]; IR ← MDR; PC ← PC + 1
Decode
Interpret opcode and determine operation
CU decodes IR, identifies operands
Execute
Perform the operation using ALU
ALU performs operation, results stored
Store (Write-back)
Write results to register or memory
Results → Register/Memory
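The cycle can be illustrated with a tiny software simulator. The sketch below assumes a made-up one-address (accumulator) machine with 8-bit instructions, where the upper 4 bits are the opcode and the lower 4 bits are a memory address; the opcode values are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 1-address machine: upper 4 bits = opcode, lower 4 bits = address. */
enum { OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3, OP_HALT = 0xF };

int main(void) {
    uint8_t mem[16] = {
        (OP_LOAD  << 4) | 14,   /* ACC <- mem[14]       */
        (OP_ADD   << 4) | 15,   /* ACC <- ACC + mem[15] */
        (OP_STORE << 4) | 13,   /* mem[13] <- ACC       */
        (OP_HALT  << 4),
    };
    mem[14] = 5; mem[15] = 7;

    uint8_t pc = 0, ir = 0, mar = 0, mdr = 0, acc = 0;
    for (;;) {
        /* Fetch: MAR <- PC; MDR <- mem[MAR]; IR <- MDR; PC <- PC + 1 */
        mar = pc; mdr = mem[mar]; ir = mdr; pc++;
        /* Decode: split IR into opcode and operand address */
        uint8_t op = ir >> 4, addr = ir & 0x0F;
        /* Execute / write-back */
        if      (op == OP_LOAD)  acc = mem[addr];
        else if (op == OP_ADD)   acc = (uint8_t)(acc + mem[addr]);
        else if (op == OP_STORE) mem[addr] = acc;
        else if (op == OP_HALT)  break;
    }
    printf("mem[13] = %u\n", mem[13]);   /* prints 12 */
    return 0;
}
```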
Instruction Format
Zero-address (Stack)
Operations use stack; PUSH, POP, ADD
One-address (Accumulator)
One operand + accumulator; LOAD X, ADD X
Two-address
Two operands; ADD R1, R2 (R1 ← R1 + R2)
Three-address
Dest + two sources; ADD R1, R2, R3
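To make the formats concrete, here is a sketch that packs and unpacks a hypothetical fixed 32-bit three-address instruction; the field widths and opcode value are assumptions for illustration, not a real ISA.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed 32-bit three-address format (not a real ISA):
   [31:24] opcode | [23:19] rd | [18:14] rs1 | [13:9] rs2 | [8:0] unused */
static uint32_t encode3(uint8_t opcode, uint8_t rd, uint8_t rs1, uint8_t rs2) {
    return ((uint32_t)opcode << 24) |
           ((uint32_t)(rd  & 0x1F) << 19) |
           ((uint32_t)(rs1 & 0x1F) << 14) |
           ((uint32_t)(rs2 & 0x1F) <<  9);
}

int main(void) {
    uint32_t word = encode3(0x20, 1, 2, 3);   /* e.g., ADD R1, R2, R3 */
    printf("encoded: 0x%08X\n", (unsigned)word);
    printf("opcode=%u rd=%u rs1=%u rs2=%u\n",
           (unsigned)(word >> 24) & 0xFF, (unsigned)(word >> 19) & 0x1F,
           (unsigned)(word >> 14) & 0x1F, (unsigned)(word >>  9) & 0x1F);
    return 0;
}
```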
Addressing Modes
- Immediate: Operand is part of the instruction
- Direct: Instruction holds the operand's memory address
- Indirect: Instruction holds the address of the operand's address
- Register: Operand is in a register
- Register Indirect: Register holds the operand's memory address
- Indexed/Base: Effective address = base register + offset (arrays)
- Relative: Effective address = PC + offset (branches)
3. Memory Hierarchy
Memory Pyramid
From fastest/smallest/most expensive to slowest/largest/cheapest: Registers → Cache (L1/L2/L3) → Main Memory (RAM) → SSD → HDD/Tape.
Memory Types
RAM (Volatile)
SRAM (Static)
- Uses flip-flops
- Faster, more expensive
- Used for cache
DRAM (Dynamic)
- Uses capacitors
- Needs refresh
- Used for main memory
ROM (Non-Volatile)
ROM: Read-only, factory programmed
PROM: One-time programmable
EPROM: UV erasable
EEPROM: Electrically erasable
Flash: Block erasable, SSDs/USB
Memory Performance
Key Metrics:
- Access Time: Time to read/write
- Cycle Time: Minimum time between accesses
- Bandwidth: Data transfer rate
- Latency: Delay before transfer begins
Typical Access Times:
- Registers: < 1 ns
- L1 Cache: 1-2 ns
- L2 Cache: 3-10 ns
- RAM: 50-100 ns
- SSD: 50-100 μs
- HDD: 5-10 ms
4. Cache Memory
Cache Concepts
Cache Hit
Data found in cache (fast access)
Cache Miss
Data not in cache (fetch from memory)
Hit Rate
h = Hits / Total Accesses
Miss Rate
m = 1 - h
Average Memory Access Time (AMAT)
AMAT = Hit_Time + (Miss_Rate × Miss_Penalty)
Example: 2ns + (0.05 × 100ns) = 2 + 5 = 7ns
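The AMAT formula and the worked example above translate directly into code; a minimal sketch:

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty (single-level cache). */
static double amat_ns(double hit_time_ns, double miss_rate, double miss_penalty_ns) {
    return hit_time_ns + miss_rate * miss_penalty_ns;
}

int main(void) {
    /* The worked example above: 2 ns hit, 5% miss rate, 100 ns penalty. */
    printf("AMAT = %.1f ns\n", amat_ns(2.0, 0.05, 100.0));   /* 7.0 ns */
    return 0;
}
```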
Cache Mapping Techniques
Direct Mapping
Each memory block maps to exactly one cache line.
Cache Line = (Memory Block Number) MOD (Number of Cache Lines)
- Pros: Simple, fast
- Cons: High conflict misses
Fully Associative
Any block can go to any cache line.
- Pros: No conflict misses
- Cons: Complex hardware (parallel tag search); expensive for large caches
Set-Associative (n-way)
Cache divided into sets; block maps to a set, can go in any line within set.
Set = (Memory Block Number) MOD (Number of Sets), as shown in the address-breakdown sketch below
- 2-way: 2 lines per set
- 4-way: 4 lines per set
- Balances simplicity and hit rate
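A sketch of how an address is split for cache lookup, assuming 32-bit addresses, 64-byte blocks, and 256 sets (arbitrary illustrative parameters); direct mapping is just the 1-way case where sets equal lines.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address breakdown for a set-associative cache. */
#define BLOCK_SIZE 64u     /* bytes per block   -> 6 offset bits */
#define NUM_SETS   256u    /* sets in the cache -> 8 index bits  */

int main(void) {
    uint32_t addr   = 0x1234ABCD;
    uint32_t offset = addr % BLOCK_SIZE;    /* byte within the block     */
    uint32_t block  = addr / BLOCK_SIZE;    /* memory block number       */
    uint32_t set    = block % NUM_SETS;     /* (block) MOD (sets)        */
    uint32_t tag    = block / NUM_SETS;     /* identifies which block it is */

    printf("addr=0x%08X  tag=0x%X  set=%u  offset=%u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)set, (unsigned)offset);
    return 0;
}
```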
Replacement Policies
LRU (Least Recently Used)
Replace block not used for longest time
FIFO (First In First Out)
Replace oldest block
Random
Replace randomly selected block
LFU (Least Frequently Used)
Replace least accessed block
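A minimal sketch of LRU bookkeeping for a single 4-way set, using per-line timestamps; real hardware typically uses cheaper approximations such as pseudo-LRU bits.

```c
#include <stdio.h>

/* Minimal LRU bookkeeping for one 4-way set (illustrative only). */
#define WAYS 4

static unsigned last_used[WAYS];   /* per-line "last touched" time */
static unsigned now = 0;

static void touch(int way) { last_used[way] = ++now; }

static int lru_victim(void) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_used[w] < last_used[victim]) victim = w;
    return victim;                 /* line with the oldest timestamp */
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3);  /* fill the set     */
    touch(0); touch(2);                      /* reuse lines 0, 2 */
    printf("evict way %d\n", lru_victim());  /* way 1 is LRU     */
    return 0;
}
```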
Write Policies
Write-Through
Write to both cache and memory simultaneously
- Pros: Simple, consistent
- Cons: Slower writes
Write-Back
Write to cache only; memory updated on eviction
- Pros: Faster writes
- Cons: More complex (dirty bit needed)
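A sketch of write-back behavior for a single cache line, showing how the dirty bit defers the memory update until eviction; the line structure and 16-word memory are illustrative, not a real cache controller.

```c
#include <stdbool.h>
#include <stdio.h>

/* Write-back: writes set a dirty bit; memory is updated only on eviction.
   Write-through would instead update memory on every write. */
typedef struct { unsigned data; bool valid, dirty; } Line;

static unsigned memory[16];

static void write_hit(Line *l, unsigned value) {
    l->data  = value;
    l->dirty = true;                       /* memory is now stale */
}

static void evict(Line *l, unsigned block_addr) {
    if (l->valid && l->dirty)
        memory[block_addr] = l->data;      /* write back on eviction only */
    l->valid = l->dirty = false;
}

int main(void) {
    Line l = { .data = 0, .valid = true, .dirty = false };
    write_hit(&l, 42);                     /* cached copy changes, memory does not */
    printf("before eviction: memory[3] = %u\n", memory[3]);   /* 0  */
    evict(&l, 3);
    printf("after  eviction: memory[3] = %u\n", memory[3]);   /* 42 */
    return 0;
}
```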
5. Pipelining
Pipeline Concept
Pipelining overlaps execution of multiple instructions, like an assembly line.
Classic 5-Stage Pipeline (MIPS):
Fetch (IF)
Decode (ID)
Execute (EX)
Memory (MEM)
Write-back (WB)
Pipeline Performance
Speedup: S = n / (1 + (n-1)/k) ≈ k (for large n)
Throughput: 1 instruction per cycle (ideal)
Latency: k cycles for one instruction
Where n = number of instructions, k = pipeline stages
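Plugging numbers into the speedup formula (rewritten as S = nk / (k + n - 1), the same expression as above) shows how it approaches k for large n; a small sketch:

```c
#include <stdio.h>

/* Ideal pipeline speedup: S = n*k / (k + n - 1), approaching k as n grows. */
static double speedup(double n, double k) {
    return (n * k) / (k + n - 1.0);
}

int main(void) {
    double k = 5.0;   /* classic 5-stage pipeline */
    printf("n=10      -> S = %.2f\n", speedup(10.0, k));     /* ~3.57 */
    printf("n=1000    -> S = %.2f\n", speedup(1000.0, k));   /* ~4.98 */
    printf("n=1000000 -> S = %.2f\n", speedup(1e6, k));      /* ~5.00 */
    return 0;
}
```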
Pipeline Hazards
Structural Hazards
Hardware resource conflict (e.g., one memory port)
Solution: Duplicate resources, separate I/D caches
Data Hazards
Instruction depends on result of previous instruction
RAW (Read After Write): Most common
WAR (Write After Read): Anti-dependency
WAW (Write After Write): Output dependency
Solutions: Forwarding/bypassing, stalling, reordering
Control Hazards
Branch instruction changes program flow
Solutions:
- Stall (wait for branch resolution)
- Branch prediction (static/dynamic)
- Delayed branch (execute delay slot)
- Speculative execution
Branch Prediction
Static Prediction
- Always predict not taken
- Always predict taken
- Backward taken, forward not taken (BTFNT)
Dynamic Prediction
- 1-bit predictor
- 2-bit saturating counter
- Branch History Table (BHT)
- Branch Target Buffer (BTB)
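A sketch of a single 2-bit saturating counter predictor; a real Branch History Table indexes many of these by branch address, and the outcome pattern below is made up to mimic a short loop.

```c
#include <stdio.h>

/* 2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken. */
static int counter = 1;   /* start weakly not-taken */

static int predict(void) { return counter >= 2; }   /* 1 = predict taken */

static void update(int taken) {
    if (taken  && counter < 3) counter++;   /* saturate at strongly taken     */
    if (!taken && counter > 0) counter--;   /* saturate at strongly not taken */
}

int main(void) {
    /* Loop-style pattern: taken three times, then not taken, repeated. */
    int outcomes[] = {1, 1, 1, 0, 1, 1, 1, 0};
    int correct = 0, n = (int)(sizeof outcomes / sizeof outcomes[0]);
    for (int i = 0; i < n; i++) {
        correct += (predict() == outcomes[i]);
        update(outcomes[i]);
    }
    printf("correct predictions: %d / %d\n", correct, n);
    return 0;
}
```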
6. RISC vs CISC
| Feature | RISC | CISC |
|---|---|---|
| Instruction Set | Small, simple | Large, complex |
| Instruction Length | Fixed (32-bit) | Variable (1-15 bytes) |
| Execution | 1 cycle per instruction | Multiple cycles |
| Addressing Modes | Few (Load/Store) | Many |
| Registers | Many (32+) | Few (8-16) |
| Control Unit | Hardwired | Microprogrammed |
| Memory Access | Load/Store only | Direct memory operations |
| Examples | ARM, MIPS, SPARC | x86, x64, VAX |
Modern Reality
Modern processors blur the lines between RISC and CISC. Intel x86 processors decode complex CISC instructions into simpler micro-operations (RISC-like) internally for pipelining efficiency.
7. Bus Architecture
Bus Types
Data Bus
Carries actual data
Width = word size (32-bit, 64-bit)
Bidirectional
Address Bus
Specifies memory/I/O location
Width determines addressable memory
Unidirectional (CPU → Memory)
Control Bus
Carries control signals
Read/Write, Clock, Interrupt
Bidirectional
Bus Standards
Internal Buses:
- FSB (Front Side Bus) - Legacy
- QPI (QuickPath) - Intel
- HyperTransport - AMD
I/O Buses:
- PCIe (PCI Express)
- USB (Universal Serial Bus)
- SATA (Serial ATA)
- NVMe (Non-Volatile Memory Express)
8. Key Takeaways for CpE Students
Essential Formulas & Concepts
Performance
- CPU Time = IC × CPI × T
- CPI = Cycles / Instruction
- MIPS = IC / (CPU Time × 10⁶)
- Speedup = Time_old / Time_new
Cache
- AMAT = Hit_Time + Miss_Rate × Miss_Penalty
- Hit Rate + Miss Rate = 1
- n-way set-associative: n lines per set
Memory
- Address bits = log₂(Memory Size)
- 32-bit address → 4 GB addressable
- 64-bit address → 16 EB addressable
Pipeline
- Ideal CPI = 1
- Speedup ≈ k (number of stages)
- Hazards reduce throughput
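A quick sketch that plugs made-up numbers (IC = 2×10⁹ instructions, CPI = 1.5, 2 GHz clock, 4 GB memory) into the performance and memory formulas above:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double ic  = 2e9;             /* instruction count (assumed)   */
    double cpi = 1.5;             /* cycles per instruction        */
    double t   = 1.0 / 2e9;       /* clock period for a 2 GHz CPU  */

    double cpu_time  = ic * cpi * t;                     /* CPU Time = IC x CPI x T */
    double mips      = ic / (cpu_time * 1e6);            /* MIPS                    */
    double addr_bits = log2(4.0 * 1024 * 1024 * 1024);   /* 4 GB -> 32 bits         */

    printf("CPU time  = %.2f s\n", cpu_time);   /* 1.50 s */
    printf("MIPS      = %.0f\n", mips);         /* ~1333  */
    printf("addr bits = %.0f\n", addr_bits);    /* 32     */
    return 0;
}
```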
Key Architecture Differences
Von Neumann: Single memory for instructions and data (bottleneck)
Harvard: Separate instruction and data memories (faster, used in embedded)
Modified Harvard: Separate caches, shared main memory (modern CPUs)