Device Strategies

12 Jun 2019

The Problem with Polling I/O

With polling, the OS spins in a loop at two points:

  1. Waiting for the device to become idle, so a request can be issued.
  2. Waiting for the device to finish the I/O request, so that the results can be retrieved.

We call this busy waiting: CPU cycles that could be devoted to executing applications are wasted instead. We would rather overlap the CPU and I/O, freeing up the CPU while the I/O device is processing a read/write.

Device Manager I/O Strategies

Underneath the blocking/non-blocking, synchronous/asynchronous system call API, the OS can implement several strategies for I/O with devices:

  • Direct I/O with polling - the device manager busy-waits.
  • Direct I/O with interrupts - more efficient than busy waiting.
  • DMA (direct memory access) with interrupts

Hardware Interrupts

The CPU incorporates a hardware interrupt flag. Whenever a device finishes a read/write, it signals the CPU and raises the flag. This frees the CPU to execute other tasks without needing to keep polling devices. Upon an interrupt, the CPU suspends normal execution and invokes the OS’s interrupt handler. Eventually, after the interrupt is handled and the I/O results are processed, the OS resumes normal execution.

Interrupt Handler

  1. Disable interrupts
  2. Save the processor state
    • Save the executing app’s program counter (PC) and CPU register data.
  3. Find the device causing the interrupt
    • Ask the interrupt controller to find the interrupt offset, or poll the devices.
  4. Jump to the appropriate device handler
    • Index into the Interrupt Vector using the interrupt offset.
    • The term Interrupt Service Routine (ISR) may refer either to the interrupt handler or to the device handler.
  5. Re-enable interrupts

When is polling better than interrupt handling?

Setting up and handling interrupts carries overhead, and so does the process scheduling it triggers. If the wait for I/O is always short, or is predictable, polling is the better option.

The problem with interrupt-driven I/O is that data transfer from disk can become a bottleneck when a lot of I/O is copying data back and forth between memory and devices.


Say we read a 1 MB file from disk into memory. The disk is only capable of delivering 1 KB blocks so, every time a 1 KB block is ready to be copied, an interrupt is raised, interrupting the CPU. This slows down the execution of normal programs and the OS. In the worst case, the CPU could be interrupted after the transfer of every byte/character.

Direct Memory Access (DMA)

The idea behind DMA is to bypass the CPU for large data copies, and only raise an interrupt at the very end of the data transfer, instead of at every intermediate block. Modern systems offload some of this work to a special-purpose processor: the Direct-Memory-Access (DMA) controller. The DMA controller operates the memory bus directly, placing addresses on the bus to perform transfers without the help of the main CPU.

Since both the CPU and the DMA controller have to move data to/from main memory, how do they share main memory?

  • Burst mode
    • While the DMA controller is transferring, the CPU is blocked from accessing memory.
  • Interleaved mode or “cycle stealing”
    • The DMA controller transfers one word to/from memory, then the CPU accesses memory, then DMA again, alternating word by word.
  • Transparent mode
    • DMA only transfers when the CPU is not using the system bus. This is the most efficient, but it is difficult to detect when the bus is actually free.

Port-Mapped I/O

Port-mapped (non-memory mapped) I/O typically requires special I/O machine instructions to read/write from/to device controller registers.

e.g. Intel x86 CPUs have IN/OUT

  • OUT dest, src - writes to a device port dest from CPU register src
  • IN dest, src - reads from a device port src into CPU register dest

Only an OS in kernel mode can execute these instructions. Later, Intel introduced INS and OUTS (for strings), and INSB/INSW/INSD (for different word widths, etc.).

The disadvantage of port-mapped I/O is that it is limited: IN and OUT can only load and store, and they lack the full range of addressing modes and operations available to normal memory instructions.

Memory-Mapped I/O

Memory-mapped I/O is where device registers and device memory are mapped to the system address space (the system’s memory). With memory-mapped I/O, we can address memory directly using normal instructions to speak to an I/O address.

  • e.g. load R3, 0xC0FF01 -> the memory address 0xC0FF01 is mapped to an I/O device’s register.

The Memory Management Unit (MMU) maps reads and writes of those addresses to/from device registers. Device registers are assigned a block of the address space; when a value is written into that I/O-mapped memory, the device sees the value and executes the appropriate command.

Typically, devices are mapped into lower memory. Frame buffers for displays take the most memory; most other devices have much smaller buffers. Even a large display might only take 10-100 MB of memory, which is modest in modern address spaces measured in GBs.

What is the difference between Port and Memory Mapped I/O?

  • Port-mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.

  • Memory-mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the normal way.