[TriLUG] Questions about Threading

John Franklin franklin at elfie.org
Tue Mar 26 14:16:29 EST 2002


On Tue, Mar 26, 2002 at 12:08:06PM -0500, Jeff Bollinger wrote:
> I'm not a programmer, but I am curious about threading and the advantage 
> to having a multi-processor linux box (I guess running an smp kernel). 
> Do programs have to be written to take advantage of a dual-processor, or 
> will cycles be distributed evenly during a process?  How do you know if 
> your program will support multiple processors?

The short answers:
1. Yes, programs must be written to take advantage of multiple processors.
Generally, this is done with child processes or kernel-scheduled
thread libraries; POSIX threads is the most common API for
threading.  Cycles are not automagically distributed in the sense that a
single-threaded program could run on both processors at once.  The
process may, however, switch back and forth between processors.

2. Linux does threading by using clone() to duplicate a process for
threading.  This creates a new process with its own PID that shares the
bulk of its resources with the parent process.  If you see a program
spawn multiple processes with the same name (e.g. all those soffice.bin
processes), then you know it's using threads.
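
To make that concrete, here's a stripped-down, untested sketch of the
kind of thing a clone()-based thread library does under the hood (the
flag set is simplified; a real library like LinuxThreads passes more,
e.g. CLONE_SIGHAND):

    /* The child shares the parent's address space (CLONE_VM), so both
     * see the same `shared' variable, yet it shows up as its own PID. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared = 0;

    static int thread_fn(void *arg)
    {
        shared = 42;                     /* visible to the parent */
        printf("child  pid=%d\n", getpid());
        return 0;
    }

    int main(void)
    {
        char *stack = malloc(64 * 1024); /* the child needs its own stack */
        int pid;

        if (stack == NULL)
            return 1;

        /* The stack grows down on most architectures, so pass the top. */
        pid = clone(thread_fn, stack + 64 * 1024,
                    CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
        if (pid == -1)
            return 1;

        waitpid(pid, NULL, 0);           /* SIGCHLD lets waitpid() work */
        printf("parent pid=%d, shared=%d\n", getpid(), shared);
        free(stack);
        return 0;
    }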


The much longer answer:
A single threaded program won't be able to take advantage of multiple
processors in the way you seek.  At best, it may get small gains from
the kernel's I/O threads.  For example, if the program writes data to a
file, the actual write() call will modify some file buffers in the
kernel and then return to processing while another processor handles the
flushing of those buffers to disk.  I say these gains are small because
the actual amount of time the processor spends on flushing buffers is
small: setting up and scheduling the DMA transfers, yielding the
processor to others while the DMA and drive controller hardware do their
thing, and then cleaning up when an interrupt signals the task is done.
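
If you want to see that buffering in action, here's a quick sketch
(purely illustrative; the timings will vary wildly with hardware and
filesystem, and the scratch file name is arbitrary):

    /* write() normally returns as soon as the data is in kernel
     * buffers; fsync() blocks until it has actually reached the disk. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static char buf[64 * 1024];
        double t0, t1, t2;
        int fd;

        memset(buf, 'x', sizeof buf);
        fd = open("scratch.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            return 1;

        t0 = now();
        write(fd, buf, sizeof buf);  /* usually just a copy into the kernel */
        t1 = now();
        fsync(fd);                   /* wait for the buffers to hit the disk */
        t2 = now();

        printf("write(): %.6fs   fsync(): %.6fs\n", t1 - t0, t2 - t1);
        close(fd);
        return 0;
    }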

If the program uses threads, it may or may not benefit from additional
processors, depending on the implementation of the thread library.
There are, to my knowledge, three styles of threading: process
scheduled, kernel scheduled, and hybrid.

Process scheduled threads are more efficient to schedule because the
scheduler runs in user space.  Threads don't have to transition into the
kernel to communicate with each other or to get time on the processor.
The downside is that all the threads run inside a single process: when
one of the threads makes a blocking system call, every thread in the
process is blocked.
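
To illustrate the idea (this is not how any particular library is
implemented), here are two cooperative tasks multiplexed inside one
process with the ucontext(3) calls.  Every switch happens in user
space, and a blocking system call in either task would stall both:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, ctx[2];
    static char stacks[2][16384];

    static void task(int id)
    {
        int step;

        for (step = 0; step < 3; step++) {
            printf("task %d, step %d\n", id, step);
            /* Hand the processor to the other task; no kernel involved. */
            swapcontext(&ctx[id], &ctx[1 - id]);
        }
    }

    int main(void)
    {
        int i;

        for (i = 0; i < 2; i++) {
            getcontext(&ctx[i]);
            ctx[i].uc_stack.ss_sp   = stacks[i];
            ctx[i].uc_stack.ss_size = sizeof stacks[i];
            ctx[i].uc_link = &main_ctx;      /* come back here when done */
            makecontext(&ctx[i], (void (*)(void))task, 1, i);
        }
        swapcontext(&main_ctx, &ctx[0]);     /* start task 0 */
        return 0;
    }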

The second way is to have the kernel schedule them.  In effect, the
kernel creates new processes which it independently schedules across all
processors.  This means that when one thread blocks the others are still
runnable, but scheduling them through full context switches is
considerably more expensive.  Communication, too, can be more expensive,
requiring pages in processor caches to be invalidated and reloaded.
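
Linux pthreads (the LinuxThreads library) are of this kernel-scheduled
kind: every pthread_create() hands the kernel another schedulable
entity, so on an SMP box CPU-bound threads really can run in parallel.
A minimal sketch (compile with -lpthread):

    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        long id = (long)arg;
        double sum = 0.0;
        long i;

        /* CPU-bound loop so there is actually work to spread out. */
        for (i = 1; i < 50000000; i++)
            sum += 1.0 / i;

        printf("worker %ld done, sum=%f\n", id, sum);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }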

The third way is to hybridize the kernel and the userland scheduling.
The one I've been reading about recently is called "Scheduler
Activations." [1]  In effect, the kernel and the userland cooperate in
scheduling operations by letting the userland do the thread scheduling
and the kernel do the processor allocating.

In a nutshell, here's how it works.  The program, when it starts up, is
given a processor context.  This allows the process to run on one
processor.  When the process starts additional threads, it has the
option of asking the kernel for additional processor contexts.  The
additional processor contexts allow the program to schedule its threads
across multiple processors in the system.  If the program has no thread
to assign to a processor context, the context is returned to the kernel.

When one of the program's threads makes a blocking system call, the
kernel notifies the program that the call is blocking and returns the
processor context to the program for other threads to use.  When the
blocked call returns, the kernel interrupts one of the program's
processor contexts and notifies the program that a) this system call is
now returning, and b) this processor context was interrupted to bring
you this message.  The program is now free to reschedule its threads as
it sees fit.
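
Linux has no scheduler activations interface, so the following is only
a toy simulation with made-up names, but it shows the shape of the
conversation between the kernel and the user-level scheduler described
above:

    #include <stdio.h>

    /* Hypothetical upcall types, paraphrasing the interface in [1]. */
    enum upcall {
        UPCALL_ADD_PROCESSOR,        /* kernel granted another processor context */
        UPCALL_ACTIVATION_BLOCKED,   /* a thread blocked in the kernel */
        UPCALL_ACTIVATION_UNBLOCKED  /* the blocked call has completed */
    };

    /* The user-level scheduler's upcall handler (again, hypothetical). */
    static void upcall_handler(enum upcall what, int thread_id)
    {
        switch (what) {
        case UPCALL_ADD_PROCESSOR:
            printf("new processor context; pick a runnable thread for it\n");
            break;
        case UPCALL_ACTIVATION_BLOCKED:
            printf("thread %d blocked; run another thread on this context\n",
                   thread_id);
            break;
        case UPCALL_ACTIVATION_UNBLOCKED:
            printf("thread %d runnable again; reschedule as we see fit\n",
                   thread_id);
            break;
        }
    }

    int main(void)
    {
        /* Simulate the sequence described above for one blocking call. */
        upcall_handler(UPCALL_ACTIVATION_BLOCKED, 1);
        upcall_handler(UPCALL_ACTIVATION_UNBLOCKED, 1);
        return 0;
    }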

The advantage here is that scheduler activations cover the most common
reason for kernel threads (system calls that block) while keeping the
benefits of process scheduled threads (inexpensive and more intelligent
thread switching).

This is a simplification of the process.  I highly recommend reading the
papers below before commenting on it.

jf
[1] ftp://deas-ftp.harvard.edu/techreports/tr-31-95.ps.gz
[2] http://www.elfie.org/sa/p95-anderson.pdf
-- 
John Franklin
franklin at elfie.org
ICBM: N37 12'54", W80 27'14" Z+2100'


