[TriLUG] Questions about Threading

Tue Mar 26 13:43:02 EST 2002

On Tue, 2002-03-26 at 10:50, Jeremy P wrote:
> On Tue, 26 Mar 2002, Jeff Bollinger wrote:
> 
> > I'm not a programmer, but I am curious about threading and the
> > advantage to having a multi-processor linux box (I guess running an
> > smp kernel).  Do programs have to be written to take advantage of a
> > dual-processor, or will cycles be distributed evenly during a process?  
> > How do you know if your program will support multiple processors?
> 
> This is my general, non-programming understanding of this; hopefully
> someone can correct any misconceptions.
> 
> Programs can be written to take direct advantage of multiple processors,
> but as far as I know, most for Linux don't.  There seems to be some
> argument about different threading models, and until that's sorted out, we
> probably won't see much multithreaded stuff.  The problem is that these
> things are very different among the Unix platforms, and most open source
> projects want their software to work on Linux, Solaris, BSD, etc. -- and
> each has very different systems for multithreaded processes.

Yes and no.  Essentially all Unixes, including Linux, now have fairly
decent support for the POSIX threading model (pthreads).

Note that Linus defined a thread as a CoE (Context of Execution), and
threads in Linux are implemented using the clone() system call, a
generalization of fork() that lets the child share its parent's address
space, so Linux threads essentially *are* processes.  The reasons for
this choice were:

 - full processes in Linux are already *very* lightweight and
   incur a (provably) minimal penalty due to context switches
   on many architectures (e.g. a context switch on Linux for x86
   is something like 50-100x faster than on Solaris for x86)

 - treating threads as processes within the kernel allows them
   to be easily and cheaply scheduled on both uni- and
   multi-processor systems

 - treating threads within the kernel as processes helps keep
   the kernel (scheduling, memory management, etc.) code much simpler
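
To see the "threads are processes" idea from user space, here is a
minimal sketch in Python (my choice of language, not anything from the
kernel side).  It assumes Python 3.8 or later for
threading.get_native_id(), and thread_ids() is just a name made up for
this illustration.  On a current Linux system every thread should report
the same PID but its own kernel task ID, and those task IDs come from
the same number space as PIDs:

```python
import os
import threading

def thread_ids(n_threads=4):
    """Start n_threads threads; return a list of (pid, kernel_task_id) pairs."""
    barrier = threading.Barrier(n_threads)  # keeps all threads alive at once
    results = []
    lock = threading.Lock()

    def worker():
        # get_native_id() is the ID the kernel assigned to this thread.
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            results.append(ids)
        # Do not exit (and free the ID for reuse) until every thread has
        # recorded its own pair.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    for pid, tid in thread_ids():
        print("pid", pid, "kernel task id", tid)
```

Each of those task IDs is something the scheduler can place on any CPU
independently of the others, which is the practical upshot of the list
above.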

> My understanding is that "symmetric" multi-processing, the S in SMP,
> implies that each heavyweight (non-threaded) process is assigned to a

No.  "Symmetric" means the processors are "equal" in that they all have
equal access to the system memory.  An example of a system without that
uniform access is NUMA (Non-Uniform Memory Access), where each
processor has its own local memory and reaches memory attached to the
other processors more slowly, over an interconnect.

> given processor.  If a computer only runs one single-process program, this
> does you no good.  But since programs like Apache often use many
> sub-processes, the scheduler distributes the load pretty evenly by
> assigning some to one processor, and some to another.  Also, the kernel
> apparently can easily switch the processes around to balance the load.  
> If you run "top" on a multi-processor system, you can monitor this; on a
> loaded system the CPU% will add up to n*100%, with 100% for each
> processor.

A multi-processor box *will* help a little even if your code is not
threaded.  The OS and daemon CPU load will tend to run on whichever
processor isn't, at any given moment, running your non-threaded
app, thus giving it a (slight?) performance boost.
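
If you want to watch the scheduler spread separate processes across
CPUs (the n*100% effect in "top" mentioned in the quoted text), a rough
sketch like the following may help.  It is Python, POSIX-only because
it uses os.fork(), and burn() and run_workers() are names invented for
this example.  It forks a few CPU-bound children and compares the
wall-clock time with the total CPU time the children used:

```python
import os
import time

def burn(seconds):
    """Spin until this process has used `seconds` of its own CPU time."""
    start = time.process_time()
    while time.process_time() - start < seconds:
        pass

def run_workers(n_workers, cpu_seconds_each):
    """Fork CPU-bound children; return (wall_seconds, child_cpu_seconds)."""
    before = os.times()
    wall_start = time.monotonic()
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: burn CPU, then exit without running the parent's code.
            try:
                burn(cpu_seconds_each)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    wall = time.monotonic() - wall_start
    after = os.times()
    # CPU time of waited-for children is accumulated in the parent's totals.
    child_cpu = ((after.children_user - before.children_user)
                 + (after.children_system - before.children_system))
    return wall, child_cpu

if __name__ == "__main__":
    wall, cpu = run_workers(2, 1.0)
    print("wall %.2fs, child CPU %.2fs, ratio %.2f" % (wall, cpu, cpu / wall))
```

On an otherwise idle box with at least as many CPUs as workers, I would
expect the ratio to approach the number of workers; on a uniprocessor
it should stay near 1.  Treat the numbers as indicative only, since
anything else running on the machine will skew them.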

And note that processes are not assigned to CPUs in any lasting sense.
The OS scheduler may try to keep a process on a particular CPU (cache
affinity) but, in general, processes are rapidly switched in and out
of context on both uniprocessor and SMP systems.
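
For the curious, the set of CPUs a process is allowed to run on can be
inspected, and narrowed, from user space.  The sketch below is Python
and leans on os.sched_getaffinity() and os.sched_setaffinity(), which
as far as I know are only available on Linux; allowed_cpus() and
pin_to() are hypothetical helper names for this example.  By default
the set normally contains every CPU, which matches the point that
nothing is pinned unless you ask for it:

```python
import os

def allowed_cpus():
    """Return the set of CPU numbers this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))  # 0 means the calling process
    # Fallback where the affinity calls are missing: assume all CPUs.
    return set(range(os.cpu_count() or 1))

def pin_to(cpu):
    """Restrict the calling process to one CPU; return the previous set."""
    previous = allowed_cpus()
    os.sched_setaffinity(0, {cpu})  # Linux-only call
    return previous

if __name__ == "__main__":
    print("may run on CPUs:", sorted(allowed_cpus()))
```

Calling pin_to(min(allowed_cpus())) and later passing the returned set
back to os.sched_setaffinity(0, ...) would pin the process and then
undo it.  Pinning is rarely needed; the scheduler's own preference for
keeping a process where its cache is warm usually does well enough.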

Ed

-- 
Edward H. Hill III, PhD
Post-Doctoral Researcher   |  Email:       ed at eh3.com, ehill at mines.edu
Division of ESE            |  URL:         http://www.eh3.com
Colorado School of Mines   |  Phone:       303-273-3483
Golden, CO  80401          |  Fax:         303-273-3311
Key fingerprint = 5BDE 4DA1 66BE 4F7B BC17  3A0C 932B 7266 1E76 F123