CSC/ECE 506 Fall 2007/wiki2 helperThreads
A helper thread is a thread that does some of the work of the main thread in advance of the main thread so that the main thread can work more quickly. The Olukotun text only scratches the surface on all the different ways that helper threads can be used. Survey these ways, making sure to include the Slipstream approach developed at NCSU.
What is a helper thread?
The potential of chip multiprocessors (CMPs) can be exploited in a better way if the applications are divided into semi - independent parts, or threads that can operate simultaneously across the processors within a system. The simplest way to use parallel threads within a CMP to increase the performance of a single thread is to have helper threads. A 'helper' thread performs work on behalf of main thread in an effort to accelerate its performance.
Uses of helper threads
Predicting branch at early stages
Helper threads are made up of program copies of the main thread stripped of all unessential parts which are not of absolute necessity for the helper thread to achieve it tasks. Hence helper threads runs ahead of main thread. Hence helper threads run ahead of main thread. Hence helper threads can do computations that are necessary to decide the direction of the branch. This will inturn remove branch mispredictions which may occur when branch predictions are used.
Prefetching of Data Since helper threads run ahead of main thread, they can predict the which data in memory is needed in future by the main thread.they prefetch the data required by the main thread. The helper thread prefetches the data and places it in the nearest cache accessible to the main thread. This avoids most of the L1 cache misses that would have been encountered by the main thread had if helper thread was not present. As the memory latency is reduced, the execution of main thread speeds up.
Memory Bug Reduction:
Memory related bugs such as reads from unitialised memory, read or writes using dangling pointers or
memory leaks are difficult to detect by code inspection because they may invoive different code fragments and exist in differnt modules or source code files. Compilers are of little help becuase it fails to disambiguate pointers. Hence, in practice memory bug detection relies on run time checkers that insert monitor code to the application during testing. In CMPs, the program which detects the memory bugs can run as a helper thread.
Ex: Heapmon, a memory bug checker monitors application heap space to detect heap memory bugs. The memory
access events in the application thread are automatically forwarded to helper thread using appropriate hardware mechanisms. Also the redundant and unnecessary memory accessess are filtered out. Hence the helper thread approach completely decouples bug monitoring from appication execution and the filtering mechanism reduces bug - check frequency. Conseqently, HeapMon achieves a very low performance overhead.
Fault tolerance:
The helper thread is a subset of main program. This partial redundancy can be used for detecting and recovering
from transient hardware faults.
Slipstream approach for helper threads:
Disadvantages of helper threads:
A very tight synchronisation is needed between the main thread and the helper threads in oder to keep them the proper distance ahead of main thread. If the helper threads are too far ahead, then they will cause cache thrashing by prefetching data and then replaing it with subsequent prefetches beofre the main thread can even use it. If they are not far enough ahead, they might not be able to prefetch cache lines in time.
Only certain types of single threaded programs can be sped up with these techniques. Most integer applications have regular data access patterns and hence there will be only few misses to eliminate. The applications with larger memory footprints are easily parellelisable and hence doesn't run on a single main thread. I floating point applications the data access patterns are fairly regular and hence can be prefetched easily using hardware or software prefetch mechanisms. Hence the range of programs that can be accelerated using helper threads is fairly limited.