πŸ“š Computer Organization & Architecture

Unit 3: I/O Architecture and Performance Enhancement

✨ Created by Ankush Raj ✨

πŸ”Œ 1. Peripheral Devices

Definition: Peripheral devices are external hardware components connected to the computer to perform input, output, storage, communication, or controlling functions. They are not part of the CPU or main memory but work as an extension to the computer system.

Types & Examples

| Type | Function | Examples |
|---|---|---|
| Input Devices | Send data to the computer | Keyboard, Mouse, Scanner, Webcam, Microphone |
| Output Devices | Receive data from the computer | Monitor, Printer, Speakers, Projector |
| Storage Devices | Store data permanently | HDD, SSD, USB Drive, SD Card |
| Communication Devices | Enable data transmission | Modem, Router, Bluetooth Adapter, NIC |
[Peripheral Device] ↔ [I/O Interface] ↔ [System Bus] ↔ [CPU/Memory]

πŸ”— 2. I/O Interface

Definition: An I/O Interface is a hardware module that acts as a communication bridge between slow I/O devices and the fast CPU/memory. It handles signal conversion, buffering, and timing mismatches, and provides control/status information to the CPU.

Key Functions

  1. Command Decoding – Interprets control signals from CPU
  2. Data Buffering – Temporarily stores data to match speed differences
  3. Signal Conversion – Converts digital/analog signals
  4. Timing & Control – Synchronizes operations between CPU and devices
  5. Error Detection – Identifies transmission errors using parity bits
  6. Status Reporting – Informs CPU about device readiness

Interface Types

Serial Interface

Data transmitted bit-by-bit sequentially

Examples: USB, RS-232, SATA

Parallel Interface

Multiple bits transmitted simultaneously

Examples: Printer Port, IDE

πŸ”„ 3. Asynchronous Data Transfer

Definition: Asynchronous data transfer is a method where CPU and I/O device do not share a common clock. Both operate at different speeds and synchronize using control signals to ensure data integrity.

Methods

A) Strobe Control

A strobe signal indicates when data is valid on the data bus.

Data:   _____|β–ˆβ–ˆβ–ˆβ–ˆ|_____
Strobe: ______|β€Ύβ€Ύ|_______
βœ” Single control line
βœ” Unidirectional communication
βœ” Simpler implementation
βœ– Less reliable (no acknowledgment from the receiver)

B) Handshaking Method

Two-way acknowledgment system for reliable data transfer.

Source                 Destination
  |-----Data Ready------>|
  |<----Data Accepted----|
  |-----Remove Data----->|
Advantages:
βœ… No clock synchronization needed
βœ… Flexible timing
βœ… Better error detection
βœ… More reliable data transfer
Example: USB communication, printer data transfer
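The two-way handshake above can be sketched in code. This is a toy model, not real bus hardware: two `threading.Event` flags stand in for the "data ready" and "data accepted" control lines, and a list stands in for the data bus.

```python
# Minimal sketch of two-wire handshaking between a source and a
# destination. Event flags model the control lines; illustrative only.
import threading

data_ready = threading.Event()      # source -> destination control line
data_accepted = threading.Event()   # destination -> source control line
bus = []                            # stands in for the data bus

def source(byte):
    bus.append(byte)        # place data on the bus
    data_ready.set()        # raise "data ready"
    data_accepted.wait()    # wait for the acknowledgment
    data_ready.clear()      # remove data / drop the ready line

def destination(received):
    data_ready.wait()       # wait until data is valid
    received.append(bus[-1])
    data_accepted.set()     # raise "data accepted"

received = []
t = threading.Thread(target=destination, args=(received,))
t.start()
source(0x41)
t.join()
print(received)  # [65]
```

Note how neither side needs a shared clock: each simply waits on the other's control signal, which is exactly why handshaking tolerates devices of different speeds.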

⚑ 4. Interrupts

Definition: An interrupt is a hardware or software signal that requests immediate attention from the CPU, temporarily suspending current execution and shifting control to an Interrupt Service Routine (ISR).

Interrupt Processing Steps

  1. Device sends interrupt signal
  2. CPU completes current instruction
  3. CPU saves Program Counter (PC) and registers
  4. Control transfers to ISR
  5. ISR executes
  6. Restore saved context
  7. Resume interrupted program
Normal β†’ Interrupt β†’ Save β†’ ISR β†’ Restore β†’ Resume
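The seven steps above can be sketched as a toy simulation. All names here (the `cpu` dictionary, `isr_keyboard`, the entry address 500) are invented for illustration and do not correspond to a real ISA.

```python
# Toy sketch of the interrupt cycle: save the program counter and
# registers, run the ISR, then restore context and resume.
stack = []                      # stands in for the system stack

cpu = {"PC": 100, "R1": 5}      # current program context

def interrupt(cpu, isr):
    stack.append(dict(cpu))     # step 3: save PC and registers
    cpu["PC"] = isr["entry"]    # step 4: transfer control to the ISR
    isr["body"](cpu)            # step 5: ISR executes
    cpu.update(stack.pop())     # step 6: restore saved context
                                # step 7: interrupted program resumes

isr_keyboard = {"entry": 500, "body": lambda c: c.update(R1=99)}
interrupt(cpu, isr_keyboard)
print(cpu)  # {'PC': 100, 'R1': 5} -- original context restored
```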

Types of Interrupts

| Type | Description | Example |
|---|---|---|
| Hardware | Raised by external devices | Keyboard, Timer, Mouse |
| Software | Raised by program instructions | System call, INT 21H |
| Maskable | Can be disabled | I/O device interrupts |
| Non-Maskable | Cannot be disabled | Power failure, Memory error |
| Vectored | ISR address supplied via interrupt vector | 8086 interrupts |
| Non-Vectored | Fixed branch address; device identified by polling | Requires polling |

πŸ“€ 5. Modes of Data Transfer

1. Programmed I/O (Polling)

Definition: CPU continuously checks device status in a loop and transfers data by itself. The CPU wastes time waiting β†’ also called busy waiting.
Process:
  1. CPU sends read command
  2. CPU checks status (busy wait)
  3. If ready, CPU reads data
  4. Repeat for next byte
Advantages:
βœ… Simple implementation
βœ… No additional hardware
Disadvantages:
❌ CPU time wasted
❌ Inefficient for slow devices
❌ Cannot multitask
Example: Reading character from keyboard
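The busy-wait loop can be sketched as follows. The `FakeDevice` class is invented for illustration; the point is that the CPU burns cycles in `status()` checks and moves every byte itself.

```python
# Minimal sketch of programmed I/O (polling): the CPU loops on the
# device status register until READY, then transfers the data itself.
class FakeDevice:
    def __init__(self, data):
        self._data = list(data)
        self._ticks = 0
    def status(self):
        self._ticks += 1            # every check costs CPU time
        return "READY" if self._ticks % 3 == 0 else "BUSY"
    def read(self):
        return self._data.pop(0)

def polled_read(dev, n):
    out = []
    for _ in range(n):
        while dev.status() != "READY":   # busy waiting
            pass
        out.append(dev.read())           # CPU moves the byte itself
    return out

out = polled_read(FakeDevice(b"HI"), 2)
print(out)  # [72, 73]
```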

2. Interrupt-Driven I/O

Definition: CPU issues a request β†’ continues its own work β†’ gets interrupted only when device becomes ready. CPU time is saved.
Process:
  1. CPU sends I/O command
  2. CPU continues other tasks
  3. Device completes operation
  4. Device sends interrupt
  5. CPU services interrupt
Advantages:
βœ… CPU not wasted
βœ… Better efficiency
βœ… Supports multitasking
Disadvantages:
❌ Interrupt overhead
❌ Complex implementation
Example: Keyboard input, mouse events

3. Direct Memory Access (DMA)

Definition: DMA allows high-speed devices to transfer data directly between I/O device and main memory without using CPU for every byte. CPU only initializes DMA.

DMA Controller Components

  • Address Register – Memory address pointer
  • Word Count Register – Number of bytes to transfer
  • Control Register – R/W mode, transfer type
  • Status Register – Transfer status

DMA Transfer Process

Step 1: CPU initializes DMA
  β”œβ”€ Source address
  β”œβ”€ Destination address
  β”œβ”€ Byte count
  └─ Control settings
Step 2: DMA requests bus control
Step 3: DMA transfers data (Memory ↔ DMA ↔ I/O Device)
Step 4: DMA sends completion interrupt
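A sketch of how the CPU might program the controller registers before handing off a block transfer. The register names follow the component list above; the controller itself is simulated, so none of this corresponds to real hardware register layouts.

```python
# Simulated DMA block transfer: CPU fills the registers once, then the
# controller moves the whole block without per-byte CPU involvement.
memory = [0] * 16
device_buffer = [10, 20, 30, 40]

dma = {
    "address": 4,                      # Address Register: destination
    "word_count": len(device_buffer),  # Word Count Register
    "control": "READ",                 # Control Register: device -> memory
    "status": "IDLE",                  # Status Register
}

def dma_transfer(dma, memory, device_buffer):
    """Moves the whole block; the CPU is free during this loop."""
    dma["status"] = "BUSY"
    for i in range(dma["word_count"]):
        memory[dma["address"] + i] = device_buffer[i]
    dma["status"] = "DONE"             # would raise an interrupt here

dma_transfer(dma, memory, device_buffer)
print(memory[4:8], dma["status"])  # [10, 20, 30, 40] DONE
```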

DMA Transfer Modes

Burst Mode

DMA takes complete bus control and transfers entire block at once

Cycle Stealing

DMA steals one bus cycle at a time, minimal CPU interruption

Transparent Mode

DMA transfers only when CPU is not using the bus

Advantages:
βœ… Fastest data transfer
βœ… CPU completely free
βœ… Suitable for high-speed devices
βœ… Efficient for bulk data
Examples: Disk to RAM transfer, Video streaming, Network data transfer

Comparison of Data Transfer Modes

| Feature | Programmed I/O | Interrupt I/O | DMA |
|---|---|---|---|
| CPU Usage | Continuous | Minimal | Initialization only |
| Speed | Slowest | Medium | Fastest |
| Efficiency | Very Low | Medium | High |
| Hardware Cost | Low | Medium | High |
| Best For | Simple devices | Keyboards, mice | Disk, video |

πŸ–₯️ 6. I/O Processor (IOP)

Definition: An I/O Processor is a dedicated processor responsible for controlling I/O operations independently of the CPU. It can execute its own instructions related to I/O tasks.

Functions of IOP

  1. Executes I/O channel programs
  2. Manages multiple I/O devices simultaneously
  3. Performs data buffering and formatting
  4. Handles error detection and correction
  5. Sends completion interrupts to CPU
  6. Reduces CPU workload significantly
        CPU
         |
    System Bus
         |
   I/O Processor
     /   |   \
  Dev1 Dev2 Dev3
Advantages:
βœ… CPU freed from I/O management
βœ… Parallel I/O operations possible
βœ… Better system throughput
βœ… Improved overall performance

βš™οΈ 1. Parallel Processing

Definition: Parallel processing is a technique where multiple operations are performed simultaneously using multiple functional units or processors. Instead of completing tasks one-by-one, the system processes multiple instructions/data streams at the same time.

Goals of Parallel Processing

  • ⚑ Increase execution speed
  • ⚑ Improve throughput
  • ⚑ Better resource utilization
  • ⚑ Handle complex computations efficiently

Types of Parallelism

Bit-Level

Processing multiple bits together (8-bit β†’ 64-bit)

Instruction-Level

Multiple instructions executed simultaneously

Task-Level

Different tasks on different processors

Data-Level

Same operation on multiple data elements

πŸ“Š 2. Flynn's Classification

Definition: Flynn's taxonomy classifies computer architectures based on the number of concurrent instruction streams (I) and data streams (D) that can be processed.

1. SISD (Single Instruction, Single Data)


Concept: Traditional von Neumann architecture

Working: One instruction on one data at a time

Control β†’ Processing β†’ Data

Examples: Intel 8086, Early processors

2. SIMD (Single Instruction, Multiple Data)

Concept: One instruction operates on multiple data simultaneously

Working: Vector/array processing

Single Instruction
↓
[PE1] [PE2] [PE3]
↓ ↓ ↓
Data1 Data2 Data3

Examples: GPUs, Intel SSE/AVX

Applications: Image processing, scientific computing
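SIMD's "same operation on multiple data elements" idea can be shown conceptually. Real SIMD hardware (SSE/AVX lanes, GPU threads) performs the operation on all elements in parallel; plain Python only models the idea sequentially.

```python
# Conceptual SIMD sketch: one "instruction" (multiply by 2) applied
# across every element of the data, as parallel lanes would do it.
data = [1, 2, 3, 4]
result = [x * 2 for x in data]   # same operation, multiple data elements
print(result)  # [2, 4, 6, 8]
```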

3. MISD (Multiple Instruction, Single Data)

Concept: Multiple instructions on same data

Working: Rarely implemented

Examples: Fault-tolerant systems

Note: Theoretical, not commonly used

4. MIMD (Multiple Instruction, Multiple Data)

Concept: Multiple processors, different instructions on different data

Working: True parallel computing

[CPU1] [CPU2] [CPU3]
↓ ↓ ↓
Data1 Data2 Data3

Examples: Multi-core CPUs, distributed systems

Applications: Web servers, databases

πŸ”€ 3. Instruction Level Parallelism (ILP)

Definition: ILP refers to the capability of CPU to execute multiple independent instructions simultaneously by overlapping their execution stages.

Techniques to Achieve ILP

  1. Pipelining – Overlapping instruction stages
  2. Superscalar – Multiple instructions per cycle
  3. Out-of-Order Execution – Execute ready instructions first
  4. Branch Prediction – Predict branch outcomes
  5. Register Renaming – Eliminate false dependencies
  6. Speculative Execution – Execute before knowing if needed
Example of Independent Instructions:
I1: R1 = R2 + R3  ┐ I1 and I2 are independent
I2: R4 = R5 - R6  β”˜ β†’ can execute in parallel
I3: R7 = R1 * 2   β†’ must wait for I1 (uses R1)
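The independence check behind the I1/I2/I3 example can be sketched as a register-overlap test. The instruction encoding here (dicts with `dst`/`src` fields) is invented for illustration.

```python
# Sketch: two instructions can issue in parallel if the second does not
# read a register the first writes (no RAW dependency between them).
def raw_hazard(producer, consumer):
    """True if consumer reads a register that producer writes."""
    return producer["dst"] in consumer["src"]

i1 = {"dst": "R1", "src": {"R2", "R3"}}  # I1: R1 = R2 + R3
i2 = {"dst": "R4", "src": {"R5", "R6"}}  # I2: R4 = R5 - R6
i3 = {"dst": "R7", "src": {"R1"}}        # I3: R7 = R1 * 2

print(raw_hazard(i1, i2))  # False -> I1, I2 can execute in parallel
print(raw_hazard(i1, i3))  # True  -> I3 must wait for I1
```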

🏭 4. Pipeline Processing

Definition: Pipeline processing divides instruction execution into several stages. Each stage works in parallel on different instructions, increasing total throughput.

Classic 5-Stage Pipeline

Stage 1: IF  - Instruction Fetch
Stage 2: ID  - Instruction Decode
Stage 3: EX  - Execute
Stage 4: MEM - Memory Access
Stage 5: WB  - Write Back

Pipeline Execution Diagram

Time→  1   2   3   4   5   6
I1:    IF  ID  EX  ME  WB
I2:        IF  ID  EX  ME  WB
I3:            IF  ID  EX  ME
I4:                IF  ID  EX

Analogy

Factory Assembly Line: Like car manufacturing where:
  • Worker 1: Installs chassis
  • Worker 2: Installs engine
  • Worker 3: Paints car
  • Worker 4: Installs wheels
  • Worker 5: Final inspection
All work simultaneously on different cars!

Performance Metrics

1. Throughput

Throughput = Instructions completed per unit time

2. Speedup

Speedup = Non-pipelined time / Pipelined time

3. Efficiency

Efficiency = (Speedup / Number of stages) Γ— 100%
Example Calculation:
Given: 5-stage pipeline, each stage = 10 ns
Non-pipelined: 5 Γ— 10 ns = 50 ns per instruction β†’ 10 instructions = 500 ns
Pipelined: first instruction = 50 ns, next 9 = 9 Γ— 10 ns = 90 ns β†’ total = 140 ns
Speedup = 500/140 = 3.57Γ—
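The worked example above follows from the standard pipeline timing formulas (non-pipelined time = n Γ— k Γ— tp, pipelined time = (k + n βˆ’ 1) Γ— tp), and can be checked in a few lines:

```python
# Pipeline timing for k stages, n instructions, tp ns per stage.
def pipeline_times(k, n, tp):
    non_pipelined = n * k * tp        # each instruction takes k stages
    pipelined = (k + n - 1) * tp      # fill time + one result per cycle
    return non_pipelined, pipelined, non_pipelined / pipelined

non_pipe, pipe, speedup = pipeline_times(k=5, n=10, tp=10)
print(non_pipe, pipe, round(speedup, 2))  # 500 140 3.57
```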
Advantages:
βœ… Increased throughput
βœ… Better CPU utilization
βœ… Faster execution
βœ… No hardware duplication
Limitations:
❌ Pipeline hazards cause stalls
❌ Complex control logic
❌ Branch instructions create problems

⚠️ 5. Pipeline Hazards

Definition: Hazards are conditions that delay or disrupt smooth instruction flow through the pipeline, causing stalls or bubbles.

1. Structural Hazards

What is it?

Hardware resource conflict - two instructions need same resource at same time.

Example:
Problem: I1 needs memory (data read)
         I2 needs memory (instruction fetch)
β†’ Only one memory port β†’ CONFLICT!

Solutions:

  • βœ… Duplicate hardware resources
  • βœ… Separate instruction & data caches
  • βœ… Pipeline reorganization
  • βœ… Resource scheduling

2. Data Hazards

What is it?

Instruction depends on result of previous instruction that hasn't completed yet.

Types:

A) RAW (Read After Write) - True Dependency
I1: R2 = R1 + R3
I2: R4 = R2 + R5  ← needs R2 (not ready yet!)
MOST COMMON hazard
B) WAR (Write After Read) - Anti-Dependency
I1: R4 = R1 + R5
I2: R1 = R2 + R3  ← writes R1 after I1 reads it
C) WAW (Write After Write) - Output Dependency
I1: R2 = R1 + R3
I2: R2 = R4 + R5  ← both write to R2

Solutions:

1. Data Forwarding/Bypassing
Pass the result directly from one pipeline stage to another
EX/MEM β†’ directly to β†’ EX (result available without waiting for WB)
2. Pipeline Stalling
Insert NOPs (bubbles) to create a delay
I1: R2 = R1 + R3
[NOP - bubble]
I2: R4 = R2 + R5  ← R2 ready!
3. Compiler Scheduling
Reorder instructions to avoid dependencies
Original:
I1: R2 = R1 + R3
I2: R4 = R2 + R5  ← depends on I1
Reordered:
I1: R2 = R1 + R3
I3: R6 = R7 + R8  ← independent instruction fills the gap
I2: R4 = R2 + R5  ← I1 complete!

3. Control Hazards (Branch Hazards)

What is it?

Due to branch/jump instructions - pipeline doesn't know which instruction to fetch next.

Example:
I1: BEQ R1, R2, Label  ← branch instruction
I2: ADD R3, R4, R5     ← may not execute
I3: SUB R6, R7, R8
...
Label: MUL R9, R10, R11  ← jump target
Problem: fetch I2 next, or Label? Not known until I1 completes!

Solutions:

1. Branch Prediction
Static:
  • Always predict taken/not taken
  • Backward branches β†’ taken (loops)
  • Forward branches β†’ not taken
Dynamic:
  • Use branch history
  • 1-bit: Remember last outcome
  • 2-bit: Change after 2 mispredictions
2. Branch Delay Slot
Place useful instruction after branch
BEQ R1, R2, Label
ADD R3, R4, R5  ← delay slot (always executes)
3. Multiple Streams
Fetch from both paths simultaneously
4. Pipeline Flushing
If misprediction, flush wrong instructions and restart
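The 2-bit dynamic predictor from the branch prediction solution above can be sketched as a saturating counter that only flips its prediction after two consecutive mispredictions. This is simplified: a single counter with no branch history table.

```python
# 2-bit saturating-counter branch predictor (simplified sketch).
class TwoBitPredictor:
    # States 0,1 predict NOT TAKEN; states 2,3 predict TAKEN.
    def __init__(self, state=1):
        self.state = state
    def predict(self):
        return self.state >= 2          # True = predict taken
    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]   # loop-like branch pattern
hits = 0
for taken in outcomes:
    hits += p.predict() == taken
    p.update(taken)
print(hits, "of", len(outcomes), "predicted correctly")  # 3 of 5
```

Note that the single not-taken outcome in the middle does not flip the prediction, which is exactly why 2-bit predictors handle loop exits better than 1-bit ones.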

Comparison of Hazards

| Hazard | Cause | Main Solution | Impact |
|---|---|---|---|
| Structural | Resource conflict | Duplicate resources | Medium |
| Data (RAW) | Data dependency | Forwarding | Low-Medium |
| Control | Branch instructions | Branch prediction | High |

πŸ“ˆ 6. Performance Metrics

1. Throughput

Throughput is the number of instructions completed per unit time. Higher = Better.
Formula: Throughput = Instructions / Time
Example: 100 instructions in 150 ns β†’ 100/150 β‰ˆ 0.67 instructions/ns

2. Speedup

Speedup is ratio of performance improvement.
Formula: Speedup = Time(non-pipelined) / Time(pipelined)
Ideal Speedup = Number of stages (k)
Example: Non-pipelined = 500 ns, Pipelined = 140 ns β†’ Speedup = 500/140 = 3.57Γ—

3. Efficiency

Efficiency measures how effectively pipeline stages are utilized.
Formula: Efficiency = (Actual Speedup / Ideal Speedup) Γ— 100%
Example: Actual Speedup = 3.57, Ideal = 5 (5 stages) β†’ Efficiency = (3.57/5) Γ— 100% = 71.4%

4. CPI (Cycles Per Instruction)

CPI indicates average cycles needed per instruction.
Formula: CPI = Total Cycles / Number of Instructions
Ideal pipelined CPI = 1 (one instruction completed per cycle once the pipeline is full)
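The four metrics above can be collected into small functions and applied to the running 5-stage / 500 ns vs 140 ns example from these notes:

```python
# The four performance metrics as functions of their definitions.
def throughput(instructions, time):
    return instructions / time

def speedup(t_non_pipelined, t_pipelined):
    return t_non_pipelined / t_pipelined

def efficiency(actual_speedup, stages):
    return actual_speedup / stages * 100   # ideal speedup = stage count

def cpi(total_cycles, instructions):
    return total_cycles / instructions

s = speedup(500, 140)
print(round(s, 2))                  # 3.57
print(round(efficiency(s, 5), 1))  # 71.4
```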

⚑ Quick Revision - Unit 3

πŸ”Ή I/O Subsystems - Flow

Peripheral β†’ Interface β†’ Async β†’ Interrupt β†’ Transfer Modes β†’ DMA β†’ IOP

πŸ”Ή Peripheral Devices

| Type | Examples |
|---|---|
| Input | Keyboard, Mouse, Scanner |
| Output | Monitor, Printer, Speakers |
| Storage | HDD, SSD, USB |
| Communication | Modem, Router, NIC |

πŸ”Ή I/O Interface Functions

Remember: CBSTES
  • Command decoding
  • Buffering (data)
  • Signal conversion
  • Timing & control
  • Error detection
  • Status reporting

πŸ”Ή Asynchronous Transfer Methods

Strobe Control

βœ” Single signal

βœ” Simpler

βœ– Less reliable

Handshaking

βœ” Two-way acknowledgment

βœ” More reliable

βœ” Used in USB

πŸ”Ή Interrupt Types

Hardware ───── External devices
Software ───── Program instructions
Maskable ───── Can be disabled
Non-Maskable ─ Cannot be disabled
Vectored ───── ISR address from interrupt vector
Non-Vectored ─ Device identified by polling

πŸ”Ή Data Transfer Modes - MOST IMPORTANT!

| Mode | CPU Usage | Speed | Best For |
|---|---|---|---|
| Programmed I/O | CPU busy (polling) | Slowest | Simple devices |
| Interrupt-Driven | CPU free between events | Medium | Keyboard, mouse |
| DMA | CPU only initializes | Fastest | Disk, video |
Remember: Programmed I/O = CPU busy | Interrupt = CPU free | DMA = CPU not involved

πŸ”Ή DMA Key Points

  • βœ… Direct Memory Access
  • βœ… CPU only initializes
  • βœ… Data: Device ↔ Memory (no CPU)
  • βœ… Interrupt on completion
  • βœ… 3 modes: Burst, Cycle Stealing, Transparent

πŸ”Ή I/O Processor (IOP)

Dedicated processor for I/O
β”œβ”€ Executes I/O instructions
β”œβ”€ Manages multiple devices
β”œβ”€ Performs buffering
└─ Reduces CPU workload

πŸ”Ή Parallel Processing

Goal: Many operations at same time β†’ Faster execution

πŸ”Ή Flynn's Classification - Super Important!

| Type | Example | Use Case |
|---|---|---|
| SISD | Old processors | Traditional |
| SIMD | GPU | Image processing |
| MISD | Rare | Fault-tolerant |
| MIMD | Multicore CPUs | Parallel computing |
Easy Memory Trick: SISD = Simple | SIMD = GPU | MISD = Rare | MIMD = Multicore

πŸ”Ή Pipeline Processing

5 Stages: IF β†’ ID β†’ EX β†’ MEM β†’ WB
  • IF  = Instruction Fetch
  • ID  = Instruction Decode
  • EX  = Execute
  • MEM = Memory Access
  • WB  = Write Back
Analogy: Assembly line in factory - each worker does one task, all work simultaneously!

πŸ”Ή Pipeline Hazards - Must Know!

| Hazard | Cause | Solution |
|---|---|---|
| Structural | Resource conflict | Duplicate resources |
| Data (RAW) | Data dependency | Forwarding / Stalling |
| Control | Branch instructions | Branch prediction |

πŸ”Ή Data Hazards Types

RAW (Read After Write) ← MOST COMMON, true dependency
WAR (Write After Read) ← anti-dependency
WAW (Write After Write) ← output dependency

πŸ”Ή Performance Terms

Throughput

Instructions per time

Higher = Better

Speedup

Non-pipelined / Pipelined

Ideal = Stages

Efficiency

(Actual/Ideal) Γ— 100%

Closer to 100% = Better

CPI

Cycles Per Instruction

Ideal pipelined = 1

πŸ”Ή Last-Minute Formula Sheet

βœ… Speedup = Time(non-pipe) / Time(pipe)
βœ… Throughput = Instructions / Time
βœ… Efficiency = (Actual Speedup / Ideal Speedup) Γ— 100%
βœ… CPI = Total Cycles / Instructions
βœ… Pipeline Time = (k + n - 1) Γ— tp (k = stages, n = instructions, tp = time per stage)
βœ… Non-pipeline Time = n Γ— k Γ— tp

πŸ”Ή Important One-Liners for Exam

βœ” DMA is fastest data transfer
βœ” NMI cannot be disabled
βœ” Handshaking more reliable than strobe
βœ” SIMD used in GPUs
βœ” RAW is most common data hazard
βœ” Branch prediction solves control hazards
βœ” Forwarding solves data hazards
βœ” Ideal speedup = number of stages
βœ” IOP reduces CPU workload
βœ” Interrupt-driven better than programmed I/O

πŸ”Ή Exam Tips

If question asks:
  • "Fastest transfer?" β†’ DMA
  • "CPU free?" β†’ Interrupt/DMA
  • "Used in GPUs?" β†’ SIMD
  • "Most common hazard?" β†’ Data (RAW)
  • "Branch solution?" β†’ Branch prediction
  • "Pipeline stages?" β†’ IF, ID, EX, MEM, WB