πŸ“š Computer Organization & Architecture

Unit 3: I/O Architecture and Performance Enhancement

✨ Created by Ankush Raj ✨

πŸ”Œ 1. Peripheral Devices

Definition: Peripheral devices are external hardware components connected to the computer to perform input, output, storage, communication, or controlling functions. They are not part of the CPU or main memory but work as an extension to the computer system.

Types & Examples

| Type | Function | Examples |
|---|---|---|
| Input Devices | Send data to the computer | Keyboard, Mouse, Scanner, Webcam, Microphone |
| Output Devices | Receive data from the computer | Monitor, Printer, Speakers, Projector |
| Storage Devices | Store data permanently | HDD, SSD, USB Drive, SD Card |
| Communication Devices | Enable data transmission | Modem, Router, Bluetooth Adapter, NIC |
[Peripheral Device] ↔ [I/O Interface] ↔ [System Bus] ↔ [CPU/Memory]

πŸ”— 2. I/O Interface

Definition: An I/O Interface is a hardware module that acts as a communication bridge between slow I/O devices and the fast CPU/memory. It handles signal conversion, buffering, and timing mismatches, and provides control/status information to the CPU.

Key Functions

  1. Command Decoding – Interprets control signals from CPU
  2. Data Buffering – Temporarily stores data to match speed differences
  3. Signal Conversion – Converts digital/analog signals
  4. Timing & Control – Synchronizes operations between CPU and devices
  5. Error Detection – Identifies transmission errors using parity bits
  6. Status Reporting – Informs CPU about device readiness

Interface Types

Serial Interface

Data transmitted bit-by-bit sequentially

Examples: USB, RS-232, SATA

Parallel Interface

Multiple bits transmitted simultaneously

Examples: Printer Port, IDE

πŸ”„ 3. Asynchronous Data Transfer

Definition: Asynchronous data transfer is a method where CPU and I/O device do not share a common clock. Both operate at different speeds and synchronize using control signals to ensure data integrity.

Methods

A) Strobe Control

A strobe signal indicates when data is valid on the data bus.

Data:   _____|β–ˆβ–ˆβ–ˆβ–ˆ|_____
Strobe: ______|β€Ύβ€Ύ|_______
βœ” Single control line
βœ” Unidirectional communication
βœ” Simpler implementation
βœ– Less reliable (no acknowledgment from the receiver)

B) Handshaking Method

Two-way acknowledgment system for reliable data transfer.

Source                 Destination
  |-----Data Ready------>|
  |<----Data Accepted----|
  |-----Remove Data----->|
Advantages:
βœ… No clock synchronization needed
βœ… Flexible timing
βœ… Better error detection
βœ… More reliable data transfer
Example: USB communication, printer data transfer
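The two-way handshake above can be sketched in code. This is a toy model, not real bus hardware: two `threading.Event` flags stand in for the "data ready" and "data accepted" control lines, and a list stands in for the data bus.

```python
# Minimal sketch of two-wire handshaking between a source and a
# destination. Event flags model the control lines; illustrative only.
import threading

data_ready = threading.Event()      # source -> destination control line
data_accepted = threading.Event()   # destination -> source control line
bus = []                            # stands in for the data bus

def source(byte):
    bus.append(byte)        # place data on the bus
    data_ready.set()        # raise "data ready"
    data_accepted.wait()    # wait for the acknowledgment
    data_ready.clear()      # remove data / drop the ready line

def destination(received):
    data_ready.wait()       # wait until data is valid
    received.append(bus[-1])
    data_accepted.set()     # raise "data accepted"

received = []
t = threading.Thread(target=destination, args=(received,))
t.start()
source(0x41)
t.join()
print(received)  # [65]
```

Note how neither side needs a shared clock: each simply waits on the other's control signal, which is exactly why handshaking tolerates devices of different speeds.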

⚑ 4. Interrupts

Definition: An interrupt is a hardware or software signal that requests immediate attention from the CPU, temporarily suspending current execution and shifting control to an Interrupt Service Routine (ISR).

Interrupt Processing Steps

  1. Device sends interrupt signal
  2. CPU completes current instruction
  3. CPU saves Program Counter (PC) and registers
  4. Control transfers to ISR
  5. ISR executes
  6. Restore saved context
  7. Resume interrupted program
Normal β†’ Interrupt β†’ Save β†’ ISR β†’ Restore β†’ Resume
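The seven steps above can be sketched as a toy simulation. All names here (the `cpu` dictionary, `isr_keyboard`, the entry address 500) are invented for illustration and do not correspond to a real ISA.

```python
# Toy sketch of the interrupt cycle: save the program counter and
# registers, run the ISR, then restore context and resume.
stack = []                      # stands in for the system stack

cpu = {"PC": 100, "R1": 5}      # current program context

def interrupt(cpu, isr):
    stack.append(dict(cpu))     # step 3: save PC and registers
    cpu["PC"] = isr["entry"]    # step 4: transfer control to the ISR
    isr["body"](cpu)            # step 5: ISR executes
    cpu.update(stack.pop())     # step 6: restore saved context
                                # step 7: interrupted program resumes

isr_keyboard = {"entry": 500, "body": lambda c: c.update(R1=99)}
interrupt(cpu, isr_keyboard)
print(cpu)  # {'PC': 100, 'R1': 5} -- original context restored
```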

Types of Interrupts

| Type | Description | Example |
|---|---|---|
| Hardware | Raised by external devices | Keyboard, Timer, Mouse |
| Software | Raised by program instructions | System call, INT 21H |
| Maskable | Can be disabled | I/O device interrupts |
| Non-Maskable | Cannot be disabled | Power failure, Memory error |
| Vectored | ISR address supplied via interrupt vector | 8086 interrupts |
| Non-Vectored | Fixed branch address; device identified by polling | Requires polling |

πŸ“€ 5. Modes of Data Transfer

1. Programmed I/O (Polling)

Definition: CPU continuously checks device status in a loop and transfers data by itself. The CPU wastes time waiting β†’ also called busy waiting.
Process:
  1. CPU sends read command
  2. CPU checks status (busy wait)
  3. If ready, CPU reads data
  4. Repeat for next byte
Advantages:
βœ… Simple implementation
βœ… No additional hardware
Disadvantages:
❌ CPU time wasted
❌ Inefficient for slow devices
❌ Cannot multitask
Example: Reading character from keyboard
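The busy-wait loop can be sketched as follows. The `FakeDevice` class is invented for illustration; the point is that the CPU burns cycles in `status()` checks and moves every byte itself.

```python
# Minimal sketch of programmed I/O (polling): the CPU loops on the
# device status register until READY, then transfers the data itself.
class FakeDevice:
    def __init__(self, data):
        self._data = list(data)
        self._ticks = 0
    def status(self):
        self._ticks += 1            # every check costs CPU time
        return "READY" if self._ticks % 3 == 0 else "BUSY"
    def read(self):
        return self._data.pop(0)

def polled_read(dev, n):
    out = []
    for _ in range(n):
        while dev.status() != "READY":   # busy waiting
            pass
        out.append(dev.read())           # CPU moves the byte itself
    return out

out = polled_read(FakeDevice(b"HI"), 2)
print(out)  # [72, 73]
```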

2. Interrupt-Driven I/O

Definition: CPU issues a request β†’ continues its own work β†’ gets interrupted only when device becomes ready. CPU time is saved.
Process:
  1. CPU sends I/O command
  2. CPU continues other tasks
  3. Device completes operation
  4. Device sends interrupt
  5. CPU services interrupt
Advantages:
βœ… CPU not wasted
βœ… Better efficiency
βœ… Supports multitasking
Disadvantages:
❌ Interrupt overhead
❌ Complex implementation
Example: Keyboard input, mouse events

3. Direct Memory Access (DMA)

Definition: DMA allows high-speed devices to transfer data directly between I/O device and main memory without using CPU for every byte. CPU only initializes DMA.

DMA Controller Components

  • Address Register – Memory address pointer
  • Word Count Register – Number of bytes to transfer
  • Control Register – R/W mode, transfer type
  • Status Register – Transfer status

DMA Transfer Process

Step 1: CPU initializes DMA
  β”œβ”€ Source address
  β”œβ”€ Destination address
  β”œβ”€ Byte count
  └─ Control settings
Step 2: DMA requests bus control
Step 3: DMA transfers data (Memory ↔ DMA ↔ I/O Device)
Step 4: DMA sends completion interrupt
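A sketch of how the CPU might program the controller registers before handing off a block transfer. The register names follow the component list above; the controller itself is simulated, so none of this corresponds to real hardware register layouts.

```python
# Simulated DMA block transfer: CPU fills the registers once, then the
# controller moves the whole block without per-byte CPU involvement.
memory = [0] * 16
device_buffer = [10, 20, 30, 40]

dma = {
    "address": 4,                      # Address Register: destination
    "word_count": len(device_buffer),  # Word Count Register
    "control": "READ",                 # Control Register: device -> memory
    "status": "IDLE",                  # Status Register
}

def dma_transfer(dma, memory, device_buffer):
    """Moves the whole block; the CPU is free during this loop."""
    dma["status"] = "BUSY"
    for i in range(dma["word_count"]):
        memory[dma["address"] + i] = device_buffer[i]
    dma["status"] = "DONE"             # would raise an interrupt here

dma_transfer(dma, memory, device_buffer)
print(memory[4:8], dma["status"])  # [10, 20, 30, 40] DONE
```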

DMA Transfer Modes

Burst Mode

DMA takes complete bus control and transfers entire block at once

Cycle Stealing

DMA steals one bus cycle at a time, minimal CPU interruption

Transparent Mode

DMA transfers only when CPU is not using the bus

Advantages:
βœ… Fastest data transfer
βœ… CPU completely free
βœ… Suitable for high-speed devices
βœ… Efficient for bulk data
Examples: Disk to RAM transfer, Video streaming, Network data transfer

Comparison of Data Transfer Modes

| Feature | Programmed I/O | Interrupt I/O | DMA |
|---|---|---|---|
| CPU Usage | Continuous | Minimal | Initialization only |
| Speed | Slowest | Medium | Fastest |
| Efficiency | Very Low | Medium | High |
| Hardware Cost | Low | Medium | High |
| Best For | Simple devices | Keyboards, mice | Disk, video |

πŸ–₯️ 6. I/O Processor (IOP)

Definition: An I/O Processor is a dedicated processor responsible for controlling I/O operations independently of the CPU. It can execute its own instructions related to I/O tasks.

Functions of IOP

  1. Executes I/O channel programs
  2. Manages multiple I/O devices simultaneously
  3. Performs data buffering and formatting
  4. Handles error detection and correction
  5. Sends completion interrupts to CPU
  6. Reduces CPU workload significantly
        CPU
         |
    System Bus
         |
   I/O Processor
     /   |   \
  Dev1 Dev2 Dev3
Advantages:
βœ… CPU freed from I/O management
βœ… Parallel I/O operations possible
βœ… Better system throughput
βœ… Improved overall performance

βš™οΈ 1. Parallel Processing

Definition: Parallel processing is a technique where multiple operations are performed simultaneously using multiple functional units or processors. Instead of completing tasks one-by-one, the system processes multiple instructions/data streams at the same time.

Goals of Parallel Processing

  • ⚑ Increase execution speed
  • ⚑ Improve throughput
  • ⚑ Better resource utilization
  • ⚑ Handle complex computations efficiently

Types of Parallelism

Bit-Level

Processing multiple bits together (8-bit β†’ 64-bit)

Instruction-Level

Multiple instructions executed simultaneously

Task-Level

Different tasks on different processors

Data-Level

Same operation on multiple data elements

πŸ“Š 2. Flynn's Classification

Definition: Flynn's taxonomy classifies computer architectures based on the number of concurrent instruction streams (I) and data streams (D) that can be processed.

1. SISD (Single Instruction, Single Data)


Concept: Traditional von Neumann architecture

Working: One instruction on one data at a time

Control β†’ Processing β†’ Data

Examples: Intel 8086, Early processors

2. SIMD (Single Instruction, Multiple Data)

Concept: One instruction operates on multiple data simultaneously

Working: Vector/array processing

Single Instruction
↓
[PE1] [PE2] [PE3]
↓ ↓ ↓
Data1 Data2 Data3

Examples: GPUs, Intel SSE/AVX

Applications: Image processing, scientific computing
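SIMD's "same operation on multiple data elements" idea can be shown conceptually. Real SIMD hardware (SSE/AVX lanes, GPU threads) performs the operation on all elements in parallel; plain Python only models the idea sequentially.

```python
# Conceptual SIMD sketch: one "instruction" (multiply by 2) applied
# across every element of the data, as parallel lanes would do it.
data = [1, 2, 3, 4]
result = [x * 2 for x in data]   # same operation, multiple data elements
print(result)  # [2, 4, 6, 8]
```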

3. MISD (Multiple Instruction, Single Data)

Concept: Multiple instructions on same data

Working: Rarely implemented

Examples: Fault-tolerant systems

Note: Theoretical, not commonly used

4. MIMD (Multiple Instruction, Multiple Data)

Concept: Multiple processors, different instructions on different data

Working: True parallel computing

[CPU1] [CPU2] [CPU3]
↓ ↓ ↓
Data1 Data2 Data3

Examples: Multi-core CPUs, distributed systems

Applications: Web servers, databases

πŸ”€ 3. Instruction Level Parallelism (ILP)

Definition: ILP refers to the capability of CPU to execute multiple independent instructions simultaneously by overlapping their execution stages.

Techniques to Achieve ILP

  1. Pipelining – Overlapping instruction stages
  2. Superscalar – Multiple instructions per cycle
  3. Out-of-Order Execution – Execute ready instructions first
  4. Branch Prediction – Predict branch outcomes
  5. Register Renaming – Eliminate false dependencies
  6. Speculative Execution – Execute before knowing if needed
Example of Independent Instructions:
I1: R1 = R2 + R3  ┐ I1 and I2 are independent
I2: R4 = R5 - R6  β”˜ β†’ can execute in parallel
I3: R7 = R1 * 2   β†’ must wait for I1 (uses R1)
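The independence check behind the I1/I2/I3 example can be sketched as a register-overlap test. The instruction encoding here (dicts with `dst`/`src` fields) is invented for illustration.

```python
# Sketch: two instructions can issue in parallel if the second does not
# read a register the first writes (no RAW dependency between them).
def raw_hazard(producer, consumer):
    """True if consumer reads a register that producer writes."""
    return producer["dst"] in consumer["src"]

i1 = {"dst": "R1", "src": {"R2", "R3"}}  # I1: R1 = R2 + R3
i2 = {"dst": "R4", "src": {"R5", "R6"}}  # I2: R4 = R5 - R6
i3 = {"dst": "R7", "src": {"R1"}}        # I3: R7 = R1 * 2

print(raw_hazard(i1, i2))  # False -> I1, I2 can execute in parallel
print(raw_hazard(i1, i3))  # True  -> I3 must wait for I1
```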

🏭 4. Pipeline Processing

Definition: Pipeline processing divides instruction execution into several stages. Each stage works in parallel on different instructions, increasing total throughput.

Classic 5-Stage Pipeline

Stage 1: IF  - Instruction Fetch
Stage 2: ID  - Instruction Decode
Stage 3: EX  - Execute
Stage 4: MEM - Memory Access
Stage 5: WB  - Write Back

Pipeline Execution Diagram

Time→  1   2   3   4   5   6
I1:    IF  ID  EX  ME  WB
I2:        IF  ID  EX  ME  WB
I3:            IF  ID  EX  ME
I4:                IF  ID  EX

Analogy

Factory Assembly Line: Like car manufacturing where:
  • Worker 1: Installs chassis
  • Worker 2: Installs engine
  • Worker 3: Paints car
  • Worker 4: Installs wheels
  • Worker 5: Final inspection
All work simultaneously on different cars!

Performance Metrics

1. Throughput

Throughput = Instructions completed per unit time

2. Speedup

Speedup = Non-pipelined time / Pipelined time

3. Efficiency

Efficiency = (Speedup / Number of stages) Γ— 100%
Example Calculation:
Given: 5-stage pipeline, each stage = 10 ns
Non-pipelined: 5 Γ— 10 ns = 50 ns per instruction β†’ 10 instructions = 500 ns
Pipelined: first instruction = 50 ns, next 9 = 9 Γ— 10 ns = 90 ns β†’ total = 140 ns
Speedup = 500/140 = 3.57Γ—
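The worked example above follows from the standard pipeline timing formulas (non-pipelined time = n Γ— k Γ— tp, pipelined time = (k + n βˆ’ 1) Γ— tp), and can be checked in a few lines:

```python
# Pipeline timing for k stages, n instructions, tp ns per stage.
def pipeline_times(k, n, tp):
    non_pipelined = n * k * tp        # each instruction takes k stages
    pipelined = (k + n - 1) * tp      # fill time + one result per cycle
    return non_pipelined, pipelined, non_pipelined / pipelined

non_pipe, pipe, speedup = pipeline_times(k=5, n=10, tp=10)
print(non_pipe, pipe, round(speedup, 2))  # 500 140 3.57
```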
Advantages:
βœ… Increased throughput
βœ… Better CPU utilization
βœ… Faster execution
βœ… No hardware duplication
Limitations:
❌ Pipeline hazards cause stalls
❌ Complex control logic
❌ Branch instructions create problems

⚠️ 5. Pipeline Hazards

Definition: Hazards are conditions that delay or disrupt smooth instruction flow through the pipeline, causing stalls or bubbles.

1. Structural Hazards

What is it?

Hardware resource conflict - two instructions need same resource at same time.

Example:
Problem: I1 needs memory (data read)
         I2 needs memory (instruction fetch)
β†’ Only one memory port β†’ CONFLICT!

Solutions:

  • βœ… Duplicate hardware resources
  • βœ… Separate instruction & data caches
  • βœ… Pipeline reorganization
  • βœ… Resource scheduling

2. Data Hazards

What is it?

Instruction depends on result of previous instruction that hasn't completed yet.

Types:

A) RAW (Read After Write) - True Dependency
I1: R2 = R1 + R3
I2: R4 = R2 + R5  ← needs R2 (not ready yet!)
MOST COMMON hazard
B) WAR (Write After Read) - Anti-Dependency
I1: R4 = R1 + R5
I2: R1 = R2 + R3  ← writes R1 after I1 reads it
C) WAW (Write After Write) - Output Dependency
I1: R2 = R1 + R3
I2: R2 = R4 + R5  ← both write to R2

Solutions:

1. Data Forwarding/Bypassing
Pass the result directly from one pipeline stage to another
EX/MEM β†’ directly to β†’ EX (result available without waiting for WB)
2. Pipeline Stalling
Insert NOPs (bubbles) to create a delay
I1: R2 = R1 + R3
[NOP - bubble]
I2: R4 = R2 + R5  ← R2 ready!
3. Compiler Scheduling
Reorder instructions to avoid dependencies
Original:
I1: R2 = R1 + R3
I2: R4 = R2 + R5  ← depends on I1
Reordered:
I1: R2 = R1 + R3
I3: R6 = R7 + R8  ← independent instruction fills the gap
I2: R4 = R2 + R5  ← I1 complete!

3. Control Hazards (Branch Hazards)

What is it?

Due to branch/jump instructions - pipeline doesn't know which instruction to fetch next.

Example:
I1: BEQ R1, R2, Label  ← branch instruction
I2: ADD R3, R4, R5     ← may not execute
I3: SUB R6, R7, R8
...
Label: MUL R9, R10, R11  ← jump target
Problem: fetch I2 next, or Label? Not known until I1 completes!

Solutions:

1. Branch Prediction
Static:
  • Always predict taken/not taken
  • Backward branches β†’ taken (loops)
  • Forward branches β†’ not taken
Dynamic:
  • Use branch history
  • 1-bit: Remember last outcome
  • 2-bit: Change after 2 mispredictions
2. Branch Delay Slot
Place useful instruction after branch
BEQ R1, R2, Label
ADD R3, R4, R5  ← delay slot (always executes)
3. Multiple Streams
Fetch from both paths simultaneously
4. Pipeline Flushing
If misprediction, flush wrong instructions and restart
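The 2-bit dynamic predictor from the branch prediction solution above can be sketched as a saturating counter that only flips its prediction after two consecutive mispredictions. This is simplified: a single counter with no branch history table.

```python
# 2-bit saturating-counter branch predictor (simplified sketch).
class TwoBitPredictor:
    # States 0,1 predict NOT TAKEN; states 2,3 predict TAKEN.
    def __init__(self, state=1):
        self.state = state
    def predict(self):
        return self.state >= 2          # True = predict taken
    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]   # loop-like branch pattern
hits = 0
for taken in outcomes:
    hits += p.predict() == taken
    p.update(taken)
print(hits, "of", len(outcomes), "predicted correctly")  # 3 of 5
```

Note that the single not-taken outcome in the middle does not flip the prediction, which is exactly why 2-bit predictors handle loop exits better than 1-bit ones.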

Comparison of Hazards

| Hazard | Cause | Main Solution | Impact |
|---|---|---|---|
| Structural | Resource conflict | Duplicate resources | Medium |
| Data (RAW) | Data dependency | Forwarding | Low-Medium |
| Control | Branch instructions | Branch prediction | High |

πŸ“ˆ 6. Performance Metrics

1. Throughput

Throughput is the number of instructions completed per unit time. Higher = Better.
Formula: Throughput = Instructions / Time
Example: 100 instructions in 150 ns β†’ 100/150 β‰ˆ 0.67 instructions/ns

2. Speedup

Speedup is ratio of performance improvement.
Formula: Speedup = Time(non-pipelined) / Time(pipelined)
Ideal Speedup = Number of stages (k)
Example: Non-pipelined = 500 ns, Pipelined = 140 ns β†’ Speedup = 500/140 = 3.57Γ—

3. Efficiency

Efficiency measures how effectively pipeline stages are utilized.
Formula: Efficiency = (Actual Speedup / Ideal Speedup) Γ— 100%
Example: Actual Speedup = 3.57, Ideal = 5 (5 stages) β†’ Efficiency = (3.57/5) Γ— 100% = 71.4%

4. CPI (Cycles Per Instruction)

CPI indicates average cycles needed per instruction.
Formula: CPI = Total Cycles / Number of Instructions
Ideal pipelined CPI = 1 (one instruction completed per cycle once the pipeline is full)
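The four metrics above can be collected into small functions and applied to the running 5-stage / 500 ns vs 140 ns example from these notes:

```python
# The four performance metrics as functions of their definitions.
def throughput(instructions, time):
    return instructions / time

def speedup(t_non_pipelined, t_pipelined):
    return t_non_pipelined / t_pipelined

def efficiency(actual_speedup, stages):
    return actual_speedup / stages * 100   # ideal speedup = stage count

def cpi(total_cycles, instructions):
    return total_cycles / instructions

s = speedup(500, 140)
print(round(s, 2))                  # 3.57
print(round(efficiency(s, 5), 1))  # 71.4
```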

⚑ Quick Revision - Unit 3

πŸ”Ή I/O Subsystems - Flow

Peripheral β†’ Interface β†’ Async β†’ Interrupt β†’ Transfer Modes β†’ DMA β†’ IOP

πŸ”Ή Peripheral Devices

| Type | Examples |
|---|---|
| Input | Keyboard, Mouse, Scanner |
| Output | Monitor, Printer, Speakers |
| Storage | HDD, SSD, USB |
| Communication | Modem, Router, NIC |

πŸ”Ή I/O Interface Functions

Remember: CBSTES
  • Command decoding
  • Buffering (data)
  • Signal conversion
  • Timing & control
  • Error detection
  • Status reporting

πŸ”Ή Asynchronous Transfer Methods

Strobe Control

βœ” Single signal

βœ” Simpler

βœ– Less reliable

Handshaking

βœ” Two-way acknowledgment

βœ” More reliable

βœ” Used in USB

πŸ”Ή Interrupt Types

Hardware ───── External devices
Software ───── Program instructions
Maskable ───── Can be disabled
Non-Maskable ─ Cannot be disabled
Vectored ───── ISR address from interrupt vector
Non-Vectored ─ Device identified by polling

πŸ”Ή Data Transfer Modes - MOST IMPORTANT!

| Mode | CPU Usage | Speed | Best For |
|---|---|---|---|
| Programmed I/O | CPU busy (polling) | Slowest | Simple devices |
| Interrupt-Driven | CPU free between events | Medium | Keyboard, mouse |
| DMA | CPU only initializes | Fastest | Disk, video |
Remember: Programmed I/O = CPU busy | Interrupt = CPU free | DMA = CPU not involved

πŸ”Ή DMA Key Points

  • βœ… Direct Memory Access
  • βœ… CPU only initializes
  • βœ… Data: Device ↔ Memory (no CPU)
  • βœ… Interrupt on completion
  • βœ… 3 modes: Burst, Cycle Stealing, Transparent

πŸ”Ή I/O Processor (IOP)

Dedicated processor for I/O
β”œβ”€ Executes I/O instructions
β”œβ”€ Manages multiple devices
β”œβ”€ Performs buffering
└─ Reduces CPU workload

πŸ”Ή Parallel Processing

Goal: Many operations at same time β†’ Faster execution

πŸ”Ή Flynn's Classification - Super Important!

| Type | Example | Use Case |
|---|---|---|
| SISD | Old processors | Traditional |
| SIMD | GPU | Image processing |
| MISD | Rare | Fault-tolerant |
| MIMD | Multicore CPUs | Parallel computing |
Easy Memory Trick: SISD = Simple | SIMD = GPU | MISD = Rare | MIMD = Multicore

πŸ”Ή Pipeline Processing

5 Stages: IF β†’ ID β†’ EX β†’ MEM β†’ WB
  • IF  = Instruction Fetch
  • ID  = Instruction Decode
  • EX  = Execute
  • MEM = Memory Access
  • WB  = Write Back
Analogy: Assembly line in factory - each worker does one task, all work simultaneously!

πŸ”Ή Pipeline Hazards - Must Know!

| Hazard | Cause | Solution |
|---|---|---|
| Structural | Resource conflict | Duplicate resources |
| Data (RAW) | Data dependency | Forwarding / Stalling |
| Control | Branch instructions | Branch prediction |

πŸ”Ή Data Hazards Types

RAW (Read After Write) ← MOST COMMON, true dependency
WAR (Write After Read) ← anti-dependency
WAW (Write After Write) ← output dependency

πŸ”Ή Performance Terms

Throughput

Instructions per time

Higher = Better

Speedup

Non-pipelined / Pipelined

Ideal = Stages

Efficiency

(Actual/Ideal) Γ— 100%

Closer to 100% = Better

CPI

Cycles Per Instruction

Ideal pipelined = 1

πŸ”Ή Last-Minute Formula Sheet

βœ… Speedup = Time(non-pipe) / Time(pipe)
βœ… Throughput = Instructions / Time
βœ… Efficiency = (Actual Speedup / Ideal Speedup) Γ— 100%
βœ… CPI = Total Cycles / Instructions
βœ… Pipeline Time = (k + n - 1) Γ— tp (k = stages, n = instructions, tp = time per stage)
βœ… Non-pipeline Time = n Γ— k Γ— tp

πŸ”Ή Important One-Liners for Exam

βœ” DMA is fastest data transfer
βœ” NMI cannot be disabled
βœ” Handshaking more reliable than strobe
βœ” SIMD used in GPUs
βœ” RAW is most common data hazard
βœ” Branch prediction solves control hazards
βœ” Forwarding solves data hazards
βœ” Ideal speedup = number of stages
βœ” IOP reduces CPU workload
βœ” Interrupt-driven better than programmed I/O

πŸ”Ή Exam Tips

If question asks:
  • "Fastest transfer?" β†’ DMA
  • "CPU free?" β†’ Interrupt/DMA
  • "Used in GPUs?" β†’ SIMD
  • "Most common hazard?" β†’ Data (RAW)
  • "Branch solution?" β†’ Branch prediction
  • "Pipeline stages?" β†’ IF, ID, EX, MEM, WB