CSC/ECE 506 Fall 2007/wiki4 7 jp07
Helper Threads
One of the problems with parallel machines is that many programs consist only of sequential code, so much of the benefit of being able to run multiple threads simultaneously is lost. This is true of many multithreading paradigms, including Simultaneous Multithreading systems (SMTs), Symmetric Multiprocessors (SMPs), and Chip Multiprocessors (CMPs).
The natural solution, it seems, would be to rewrite or recompile programs to make use of parallel execution. In some cases, however, this may be too time-consuming or even infeasible due to the nature of the program. There is a middle ground in which the program is not truly parallelized, but the multithreading capabilities are still used to improve execution time. This technique is known as helper threads.
Helper threads run in parallel with the main thread and do work on its behalf to improve its performance [Olokuton]. Typically these threads execute parts of the program "ahead" of the main thread, in an attempt to predict branches and/or values before the main thread needs them. This helps hide the penalty of long-latency instructions. The figure below illustrates the basic concepts of helper thread execution.
Note that in a CMP the two contexts would be separate processor cores on the same chip, while in an SMT they would be separate thread contexts within one core. The main sequential program has some knowledge, from previous training or via the compiler, that a potential long-latency instruction, such as a load that misses in the cache, is upcoming. Using this history or prior knowledge, the helper thread runs ahead of the main thread, executing only the instructions vital to the long-latency instruction. The helper thread completes ahead of the main thread, so that when the main thread finally reaches the long-latency instruction, the helper thread can forward the computed result.
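The run-ahead effect described above can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the 100-cycle miss latency, the one-cycle cost of every other instruction, the trace format, and the names `run_main` and `run_helper` are all hypothetical and are not taken from any of the cited schemes. The helper is assumed to start at the same time as the main thread and to issue non-blocking loads.

```python
MISS_LATENCY = 100  # assumed cache-miss penalty in cycles (illustrative only)


def run_helper(trace):
    """Model a helper thread that executes only the loads (its slice).

    Each load costs the helper one cycle to issue and does not block it.
    Returns a dict mapping address -> cycle at which the data arrives.
    """
    ready_at = {}
    cycle = 0
    for op in trace:
        if op[0] == "load":
            cycle += 1
            ready_at.setdefault(op[1], cycle + MISS_LATENCY)
    return ready_at


def run_main(trace, ready_at=None):
    """Return the cycle count for the main thread to finish the trace.

    Every instruction takes one cycle. A load stalls until its data
    arrives: either at the time recorded by the helper, or a full miss
    latency after the main thread itself issues the load.
    """
    ready_at = ready_at or {}
    cycle = 0
    for op in trace:
        cycle += 1
        if op[0] == "load":
            arrival = ready_at.get(op[1], cycle + MISS_LATENCY)
            cycle = max(cycle, arrival)
    return cycle


# Four blocks of 50 simple instructions, each followed by a missing load.
trace = []
for block in range(4):
    trace += [("alu",)] * 50
    trace.append(("load", 0x1000 + 64 * block))

baseline = run_main(trace)                   # no helper thread
helped = run_main(trace, run_helper(trace))  # helper runs ahead
print(baseline, helped)
```

In this model the helper skips the 50 non-essential instructions in each block, so it reaches every load long before the main thread does, and most of the miss latency overlaps with useful work in the main thread.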
Applications of Helper Threads
Speculative Branch Prediction
One class of helper thread applications uses helper threads to predict branches early [1]. The proposed technique uses Simultaneous Subordinate Microthreading (SSMT) to run helper threads called "microthreads". These threads consist of microcode, which is code written specifically to manipulate hardware structures within the processor. A SPAWN instruction within the program indicates when a microthread should be initiated. For branch prediction, the scheme dynamically identifies branches that are likely to be mispredicted and constructs microthreads to predict them.
Difficult-to-predict branches are identified using a path cache, which stores information about previous branch mispredictions and tracks how difficult each branch is to predict. Before a microthread is created for a branch, the branch must go through a training interval during which its difficulty is determined.
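A minimal sketch of this kind of difficulty tracking is shown below. It is not the hardware structure from [1]: the class layout, the key used to identify a branch, the 32-occurrence training interval, and the 10% misprediction threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


class PathCache:
    """Toy model of a table that tracks how hard each branch is to predict."""

    def __init__(self, training_interval=32, threshold=0.10):
        self.training_interval = training_interval  # assumed value
        self.threshold = threshold                  # assumed value
        # key -> [occurrences, mispredictions]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, key, mispredicted):
        """Update the counters when a branch retires."""
        entry = self.stats[key]
        entry[0] += 1
        if mispredicted:
            entry[1] += 1

    def is_difficult(self, key):
        """A branch qualifies only after its training interval has elapsed."""
        seen, missed = self.stats.get(key, (0, 0))
        if seen < self.training_interval:
            return False
        return missed / seen > self.threshold


cache = PathCache(training_interval=4, threshold=0.25)
for outcome in (True, True, True):
    cache.record(0x40, mispredicted=outcome)
still_training = cache.is_difficult(0x40)   # only 3 occurrences so far
cache.record(0x40, mispredicted=False)
after_training = cache.is_difficult(0x40)   # 3 of 4 mispredicted
```

Only branches that pass this check would go on to have a microthread built for them, which keeps the cost of microthread construction away from branches the ordinary predictor already handles well.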
The microthreads are constructed using a post-retirement buffer (PRB). Upon retirement, instructions are inserted into the PRB along with their dependence information. When a difficult branch retires, a scanner searches the PRB for the recent instructions on which the branch depended and builds a microthread from them. This microthread is stored and then spawned before the next occurrence of the branch. In this way the helper thread computes the branch target before the main thread reaches the branch.
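The PRB scan amounts to collecting a backward dependence slice. The sketch below shows that idea under simplifying assumptions: each retired instruction is represented as a hypothetical `(dest, sources, text)` tuple with register dependences only, the buffer holds eight entries, and `build_microthread` is an illustrative name rather than anything defined in [1].

```python
from collections import deque


def build_microthread(prb, branch_sources):
    """Walk the PRB from youngest to oldest and keep only the instructions
    that the branch (transitively) depends on."""
    needed = set(branch_sources)   # registers whose producers we still need
    selected = []
    for dest, sources, text in reversed(prb):
        if dest in needed:
            needed.discard(dest)   # producer found
            needed.update(sources) # now its inputs are needed instead
            selected.append(text)
    selected.reverse()             # restore program order
    return selected


# Retired instructions in program order; the bounded deque drops the oldest.
prb = deque(maxlen=8)
prb.extend([
    ("r1", ("r0",), "r1 = load [r0]"),
    ("r5", ("r6",), "r5 = r6 + 1"),
    ("r2", ("r1",), "r2 = r1 & 0xff"),
    ("r7", ("r5",), "r7 = r5 * 2"),
    ("r3", ("r2", "r4"), "r3 = r2 - r4"),
])

# A difficult branch that tests r3 has just retired.
microthread = build_microthread(prb, ["r3"])
for line in microthread:
    print(line)
```

The instructions that write r5 and r7 do not feed the branch, so they are left out. This is why the resulting microthread is shorter than the original code and can finish ahead of the main thread.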
Speculative Value Prediction
Slipstream Technology
Pre-computation Slices
Additional Links
- Dynamic Speculative Precomputation
- Slipstream Processors
- A Survey on Helper Threads
- HeapMON -- Helper Thread Bug Detection Scheme with contributions from NCSU
- Helper Threads via Virtual Multithreading
- Difficult Path Branch Prediction using Helper Threads
- Compiler Support for Helper Threading