Threads

26 Jun 2019

What is a thread?

A thread is a logical flow or unit of execution that runs within the context of a process. It has its own program counter (PC), register state, and stack. In addition, it shares the memory address space with other threads in the same process (shares the same code, data, and resources e.g. open files). A thread is also called a lightweight process; it has low overhead compared to a separate process.

Thread Libraries

There are three main thread libraries in use today:

POSIX pthreads
- May be provided either as user-level or kernel-level.
- A POSIX standard API for thread creation and synchronization.
- The API specifies the behavior of the thread library; the implementation is up to the developers of the library.
Win32
- Kernel-level library on Windows.
Java
- Java threads are managed by the JVM.
- Typically implemented using the threads model provided by underlying OS.

The pthreads API

Thread management: functions that work directly on threads (creating, terminating, joining, etc.).
Semaphores: functions to create, destroy, wait on, and post to semaphores. POSIX semaphores are declared in <semaphore.h> rather than <pthread.h>.
Mutexes: functions to create, destroy, lock, and unlock mutexes.
Condition variables: functions to create, destroy, wait on, and signal condition variables, based upon specified variable values.

Thread Creation

pthread_create(tid, attr, start_routine, arg);

Returns the new thread ID via the tid argument.
The attr parameter is used to set thread attributes; pass NULL for the default values.
The start_routine is the C routine that the thread will execute once it is created.
A single argument may be passed to start_routine via arg. It must be passed by reference, as a pointer cast to void *. To pass several values, bundle them in a struct and pass a pointer to that struct.

Thread Termination and Join

pthread_exit(value);

Used by a thread to terminate.
The return value is passed as a void * pointer and is made available to any thread that joins the terminating thread.

pthread_join(tid, value_ptr);

The pthread_join() subroutine blocks the calling thread until the thread identified by tid terminates.
Returns 0 on success and a nonzero error number on failure. If value_ptr is not NULL, the pointer that the terminating thread passed to pthread_exit() is stored in the location it points to. If you don’t care about the return value, you can pass NULL for the second argument.

User-Space Threads

User-space threads are usually cooperatively multitasked, i.e. user threads within a process voluntarily give up the CPU to each other. The threads synchronize with each other via the user-space threading package or library, which provides an interface to create and delete threads in the same process. The OS is unaware of user-space threads; it only sees user-space processes. In a many-to-one scenario, if one user-space thread blocks, the entire process blocks.

pthreads is a POSIX threading API

Implementations of the pthreads API differ underneath the API. They could be user space threads, but there is also pthreads support for kernel threads.
A user-space thread is also called a fiber (cooperatively multitasked)
A kernel-space thread is also called a lightweight process (preemptively multitasked)

Kernel Threads

Kernel threads are supported by the OS. The kernel sees these threads and schedules at the granularity of threads. Most modern operating systems, such as Linux, Mac OS X, and Windows, support kernel threads. The mapping of user-level threads to kernel threads is usually one-to-one (e.g. on Linux and Windows), but could be many-to-one or many-to-many.

Benefits of Multithreaded Architecture

Responsiveness
- Inter-thread communication is easier and faster than inter-process communication.
- Useful for interactive applications.
Resource sharing
- Memory and resources are shared within a process.
Low context-switching overhead
- Low overhead for creating and managing threads.
Scalability
- Threads can run in parallel with a multicore system.

These benefits are accompanied by costs that include increased difficulty for the programmer/system designer, who has to identify the tasks to divide, balance the load on different cores, figure out how to split the data, and identify synchronization mechanisms. Testing and debugging also become difficult because many different paths of execution need to be considered.

So, why are processes still used when threads bring so many advantages?

Some tasks are sequential and not easily parallelizable, and hence are single-threaded by nature.
There is no fault isolation between threads.
- If a thread crashes, it can bring down other threads.
- If a process crashes, other processes continue to execute, because each process operates within its own address space. One process crashing has limited effect on another process.
  - Caveat: a crashed process may fail to release synchronization locks, open files, etc., thus affecting other processes. However, the OS can use the PCB’s information to help cleanly recover from a crash and free up resources.
Writing thread-safe/reentrant code is difficult. Processes can avoid this by having separate address spaces and separate copies of the data and heap.

Thread Safety

Code is thread safe if it behaves correctly during simultaneous or concurrent execution by multiple threads. It must satisfy the need for multiple threads to access the same shared data, and the need for a shared piece of data to be accessed by only one thread at any given time.

If two threads share and execute the same code, then unprotected use of shared global, static, or heap variables is not thread safe.

Code is reentrant if it behaves correctly when a single thread is interrupted in the middle of executing it and then reenters the same code before the first invocation has finished.