CSC/ECE 506 Fall 2007: Difference between revisions

* '''''Sections 1.1 and 1.1.2'''''
** Update performance trends in multiprocessors.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1_Introduction Introduction to Parallel Programming] - This summary gives a brief introduction to Parallel Programming.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_1_ab Performance trends in multiprocessors] - This summary discusses Moore's Law in the future and multiprocessor architecture's price vs. performance. It also concludes on how the relationship between the development of microprocessors and Moore's Law will be affected in the future.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/Chapter_6_Multi_Core_Architecture Multi-Core Architecture]


* '''''Section 1.1.1, first half: Scientific/engineering application trends'''''   
** How much memory, processor time, etc.?   
** How high is the speedup?
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_2_3K8i Scientific/engineering application trends] - This summary discusses trends in scientific and engineering computing, hardware trends, software trends, and applications for HPC.  
 


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_2_cv High Performance Computing Trends in Scientific and Engineering Applications] - This summary discusses high performance computing trends in scientific and engineering applications, hardware used in high performance computing, and software applications.  


* '''''Section 1.1.1, second half: Commercial application trends '''''   
** What characterizes present-day applications?   
** How much memory, processor time, etc.?   
** How high is the speedup?
** How has data parallelism found its way into shared-memory and message-passing machines?  An early example would be MMX.   
** Would you change the number of layers in Fig. 1.13?
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.2_Message_Passing Introduction to Message Passing] - This summary gives a brief introduction to Message Passing.
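As a concrete, minimal sketch of the message-passing model introduced above (an assumed example using the standard MPI interface, not taken from the summary), two processes exchange a single integer:

<pre>
/* Minimal MPI point-to-point sketch: rank 0 sends an integer to rank 1.
   Build (assumed toolchain): mpicc mp_sketch.c -o mp_sketch
   Run: mpirun -np 2 ./mp_sketch */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* data to communicate */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
</pre>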


** Are blade servers an extension of message passing?   
** How have blade architectures evolved over the past 10 years?
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_8_s5 Latest Developments in Message Passing] - This summary defines the message passing model, the latest developments in message passing.  It also highlights the advantages and implementation of  Message Passing Interface (MPI).  It also explores the blade servers' architecture, evolution, and future.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/2.1_Introduction Introduction to Blade Servers] - This summary gives a brief introduction to blade servers.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/2.2_Evolution Evolution of Blade Servers] - This summary simply focuses on the evolution from standalone conventional server to the blade servers that have become popular today.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/2.4_Blade_Enclosure Enclosure] - This summary simply focuses on the technology of blade enclosures.  It explores the aspects of power, cooling, networking, and storage for a blade enclosure.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/2.5_Advantages_of_Blade_Servers Advantages of Blade Servers] - This summary simply focuses on the numerous advantages of blade servers: Reduced Space Requirements, Reduced Power Consumption and Improved Power Management, Lower Management Cost, Simplified Cabling, Future Proofing Through Modularity, and Easier Physical Deployment.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/2.6_Blade_Servers_and_Message_passing How Blade Servers are an Extension of Message Passing] - This summary simply focuses on how blade servers are an extension of message passing.




* '''''Section 1.2.5:  Trends in vector processing and array processing.'''''   
** New machines have recently been announced.  Why will this be an important architectural dimension in the coming years?
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_9_arubha Trends in vector processing and array processing - Summary 2] - This summary highlights current trends, past trends and emerging trends in vector processing and array processing. It also discusses the advantages of vector processing and the pitfalls of vector processing as well.


 
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_9_vr Vector processing and array processing] - This summary provides a detailed description, the history, and definition of vector and array processors. It highlights the future and new trends in vector processing and array processing.
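Not part of either summary above, but as a hedged illustration of the short-vector, data-parallel style they describe (the successor to the MMX example mentioned earlier), the following C sketch uses x86 SSE intrinsics to add four floats per instruction:

<pre>
/* Illustrative sketch: adding two float arrays four elements at a time
   with x86 SSE intrinsics.  Assumes an SSE-capable x86 compiler. */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

#define N 8              /* assume N is a multiple of 4 for brevity */

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);    /* one instruction, 4 additions */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < N; i++) printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}
</pre>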




** Or if not, why are these styles not evolving with time?


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_10_aj New Developments in Dataflow and Systolic Architectures] - This summary gives a detailed description of the new developments in dataflow and systolic architectures. It even explores why systolic architecture has not truly evolved with time (to the extent of other architectures).


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_10_mt Dataflow & Systolic Architectures] - This summary gives a detailed description of the new developments in dataflow and systolic architectures. It also looks at the current state of both dataflow architectures and systolic architectures. It even explores several papers that propose different applications for systolic architecture.




** I doubt that other topics covered in these sections have changed much, but do check.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki_11_e4 Communication and programming model] - This summary provides a detailed description of ordering and synchronization in the communication and programming model. It also provides multiple external references.




* '''''Sections 1.3.3 and 1.3.4:  Most changes here are probably related to performance metrics. '''''  
** Cite other models for measuring artifacts such as data-transfer time, overhead, occupancy, and communication cost. Focus on the models that are most useful in practice.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_1.3.3_1.3.4_chase2007 Fundamental Design Issues - Communication and Replication] - This summary gives a detailed description of communication and replication. It also explores computer performance, overhead and occupancy, performance metrics, data transfer time, and communication cost.
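As a baseline for comparing newer models, the communication-cost model of Sections 1.3.3 and 1.3.4 can be paraphrased roughly as:

<math>T_{msg} \approx \text{overhead} + \text{occupancy} + \text{network delay} + \frac{\text{size}}{\text{bandwidth}} + \text{contention}</math>

<math>\text{communication cost} \approx \text{frequency} \times (T_{msg} - \text{overlap})</math>

where overlap is the portion of the per-message cost hidden behind useful computation or other communication. This is only a rough paraphrase of the text's model, included for orientation.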




=== Topics ===
* '''''Animations'''''
** Do an animation of how consistency can be violated given a particular code sequence from the textbook (which sequence will be named on the signup sheet).
** Show how multilevel inclusion in caches interacts with cache coherence.  Specifically, take a code sequence that results in level-2 cache misses in at least two processors, and show what information is transferred to which cache levels, assuming that the L1 cache is direct mapped and the L2 cache is two-way associative.


* '''''Parallelizing an application'''''
** Pick another parallel application, not covered in the text, and less than 7 years old, and describe the various steps in parallelizing it (decomposition, assignment, orchestration, and mapping).  You may use an example from the peer-reviewed literature, or a Web page.  You do not have to go into great detail, but you should describe enough about these four stages to make the algorithm interesting.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_4_LA LAMMPS and a Flowchart of Molecular Dynamics Sequential code] - This summary explores LAMMPS (Large Scale Atomic/Molecular Massively Parallel System) algorithm, the sequential algorithm. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the LAMMPS programming model.

[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_4_BV MapReduce] - This summary explores MapReduce, a programming model. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the MapReduce programming model.

[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_4_md Shuffled Complex Evolution Metropolis (SCEM-UA)] - This summary explores Shuffled Complex Evolution Metropolis (SCEM-UA), a programming model. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the Shuffled Complex Evolution Metropolis programming model.
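As a generic illustration of the four stages named above (an assumed toy example, a vector sum with POSIX threads, not drawn from LAMMPS, MapReduce, or SCEM-UA), the sketch below splits the loop into chunks (decomposition), gives each chunk to one thread (assignment), combines partial sums after a join (orchestration), and leaves thread placement to the OS (mapping):

<pre>
/* Illustrative sketch of decomposition/assignment/orchestration/mapping
   for a simple vector sum using POSIX threads.  Link with -lpthread. */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];

struct chunk { int id, lo, hi; };

static void *worker(void *arg)
{
    struct chunk *c = arg;               /* assignment: one chunk per thread */
    double sum = 0.0;
    for (int i = c->lo; i < c->hi; i++)  /* decomposition: contiguous range  */
        sum += data[i];
    partial[c->id] = sum;                /* each thread writes its own slot  */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];

    for (int i = 0; i < N; i++) data[i] = 1.0;

    for (int t = 0; t < NTHREADS; t++) {
        chunks[t].id = t;
        chunks[t].lo = t * (N / NTHREADS);
        chunks[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, worker, &chunks[t]);
    }

    double total = 0.0;                  /* orchestration: join, then reduce */
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %.0f\n", total);       /* mapping: left to the OS scheduler */
    return 0;
}
</pre>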




* '''''Cache sizes in multicore architectures '''''
** Create a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used.  Compare this with two or three recent single-core designs.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_5_as Basic Summary of Cache Sizes in Multicore Architectures] - This summary creates a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/0321-2670057/wiki2_5_089321 Cache Organization in Multicore Processor] - This summary defines a multi-core processor and cache organization in multicore, creates a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_5_as Cache Sizes in Multicore Architectures] - This summary highlights cache sizes in multicore architectures. It creates a table containing details of the current multi-core processor architectures along with their intricate details like number of levels, cache size, etc. It also has multiple external references.
* '''''MSIMD architectures and applications'''''
** MSIMD architectures have garnered quite a bit of contention recently.  Read a few papers on these architectures and write a survey of applications for which they would be suitable.  If possible, talk about the steps in parallelizing these applications (decomposition, assignment, orchestration, and mapping).
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_3_pb MSIMD architectures] - This summary introduces MSIMD architectures and the problems that lie within this architecture. It also explores 3-D Wafer Stack Neurocomputing, Neural Network Vision application, resource optimization of a parallel computer for multiple vector processing, a simulation of a MSIMD System with resequencing, and SIMD Architecture for feature tracking.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_6_sbh Applications of the MSIMD architecture] - This summary highlights applications of the MSIMD architecture (The GPA Machine and The Warwick Pyramid Machine). It also explores the Artificial Neural Networks.




 
* '''''Cache-to-cache sharing'''''
** On p. 300 of the text, cache-to-cache sharing is introduced. If a cache has an up-to-date copy of a block, should it supply it, or should it wait for memory to do it?  What do current multiprocessors do?  In current machines, is cache-to-cache sharing faster or slower than waiting for memory to respond?
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_7_amassg Cache-to-cache sharing] - This summary introduces the need for cache-to-cache sharing and examines its disadvantages.  It also explores the current uses of cache-to-cache sharing.


= Peer-reviewed Assignment 3 =


* '''''True and false sharing'''''
** True and false sharing.  In Lectures 9 and 10, we covered performance results for true- and false-sharing misses. The results showed that some applications experienced degradation due to false sharing, and that this problem was greater with larger cache lines.  But these data are at least 9 years old, and for multiprocessors that are smaller than those in use today.  Comb the ACM Digital Library, IEEE Xplore, and the Web for more up-to-date results.  What strategies have proven successful in combating false sharing? Is there any research into ways of diminishing true-sharing misses, e.g., by locating communicating processes on the same processor?  Wouldn't this diminish parallelism and thus hurt performance?


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_1_satkar Diminishing True and False Sharing] - This summary gives a detailed description of true sharing and false sharing. It discusses the problem with false sharing, strategies to combat false sharing, and diminishing true-sharing misses.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_1_ncdt Techniques to reduce false sharing misses and techniques to reduce true sharing misses] - This summary introduces the concepts of both true sharing and false sharing. It also explores the effects of false sharing, techniques to reduce false sharing misses, and techniques to reduce true sharing misses. 
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_7_qaz False Sharing Miss and Miss Caused due to True Sharing] - This summary explores false sharing misses and misses caused due to true sharing.
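For readers unfamiliar with the effect, here is a minimal sketch (an assumed example, not taken from the summaries above) of how false sharing arises and the padding fix usually applied:

<pre>
/* Illustrative sketch of false sharing: two threads update different
   counters that happen to live on the same cache line.  Padding each
   counter out to a full (assumed 64-byte) line removes the ping-ponging.
   Link with -lpthread and time both layouts to see the difference. */
#include <pthread.h>
#include <stdio.h>

#define ITERS (100 * 1000 * 1000)

/* Shared layout: both counters fit in one cache line -> false sharing. */
struct { long a; long b; } shared_counters;

/* Padded layout (the usual fix): swap this in and re-time the run. */
struct { long value; char pad[64 - sizeof(long)]; } padded[2];

static void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) shared_counters.a++;
    return NULL;
}
static void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) shared_counters.b++;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", shared_counters.a, shared_counters.b);
    return 0;
}
</pre>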






* '''''Simple Scalable Coherent Interface (SSCI) and the Scalable Coherent Interface (SCI)'''''
** SCI.  The IEEE Scalable Coherent Interface is a superset of the SSCI protocol we have been considering in class.  A lot has been written about it, but it is still difficult to comprehend.  Using SSCI as a starting point, explain why additional states are necessary, and give (or cite) examples that demonstrate how they work.  Ideally, this would still be an overview of the working of the protocol, referencing more detailed documentation on the Web.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_2_tl Simple Scalable Coherent Interface (SSCI)/Scalable Coherent Interface (SCI)/Directory-based cache coherence] - This summary gives a detailed description of directory-based cache coherence. It also explores Simple Scalable Coherent Interface (SSCI) and the Scalable Coherent Interface (SCI).  


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_2_aY3w SSCI Protocol/SCI Protocol/Discussion of necessary additional states] - This summary gives a brief overview of the SSCI Protocol, a brief overview of the SCI Protocol, and discusses why additional states are needed.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_2_1r Examples of SSCI Protocol] - This summary gives a brief overview of the Simple Scalable Coherent Interface (SSCI) Protocol, a brief overview of the Scalable Coherent Interface (SCI) Protocol, and discusses why additional states are needed. It also explores the SSCI and the SCI protocols.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki3_8_a1 SSCI and IEEE SCI Protocol Similarities] - This summary gives a brief overview of the Simple Scalable Coherent Interface (SSCI) Protocol, explores the 2 main configurations of the Scalable Coherent Interface (SCI) Protocol, and discusses why additional states are needed. It also discusses the similarities between the SSCI and the SCI protocols.


= Peer-reviewed Assignment 4 =
* 12/03/2007 Peer-reviewed 1 Resubmission
* 12/05/2007 Peer-reviewed 1 Final review
* 12/07/2007 Peer-reviewed 1 Review of review


=== Topics ===
* '''''Helper'''''
** A helper thread is a thread that does some of the work of the main thread in advance of the main thread so that the main thread can work more quickly. The Olukotun text only scratches the surface on all the different ways that helper threads can be used. Survey these ways, making sure to include the Slipstream approach developed at NCSU.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki2_4_helperThreads Helper Threads] - This summary defines what a helper thread is and a slipstream approach for helper threads. It also examines the advantages and disadvantages of helper threads. It also explores the uses of helper threads for: predicting a branch at early stages, prefetching of data, and a memory bug reduction.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki4_2_helperThreads Helper Threads (with illustration of how they work)] - This summary defines what a helper thread is and a slipstream approach for helper threads. It also examines the advantages and disadvantages of helper threads. It also explores the uses of helper threads for: predicting a branch at early stages, prefetching of data, and a memory bug reduction.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki4_5_1008 Helper Threads (with multiple illustrations)] - This summary defines what a helper thread is and terms associated with helper threads. It also has 3 figures that illustrate 3 respective concepts: Additions to the Superscalar Processor, Speedup of Two Tasks per CMP versus One Task per CMP, Slipstream-based Self-invalidation. It also explores use of helper threads, types of helper threads, examples of helper threads, and the issues with using helper threads.


[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki4_7_2815 Helper Threads (with illustration of how they work and multiple external references)] - This summary introduces the concept of helper threads. It also explores applications of helper threads: speculative branch prediction, speculative prefetching, and the slipstream technology.
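As a rough sketch of the prefetching use of helper threads described above (a simplified assumed example, not the Slipstream mechanism itself), the code below runs a helper thread over the same data slightly ahead of the main thread so that the data is likely to be in cache when the main thread needs it:

<pre>
/* Illustrative sketch of a prefetching helper thread: the helper walks the
   same linked list as the main thread, warming the cache as it goes.
   __builtin_prefetch is a GCC/Clang builtin (an assumption about the
   compiler).  Link with -lpthread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NODES 1000000

struct node { long payload; struct node *next; };

static struct node *head;

static void *helper(void *arg)
{
    (void)arg;
    for (struct node *p = head; p != NULL; p = p->next)
        __builtin_prefetch(p->next);       /* touch data ahead of the main thread */
    return NULL;
}

int main(void)
{
    /* Build a simple list (sequential layout; a real test would scramble it). */
    struct node *nodes = malloc(NODES * sizeof *nodes);
    for (long i = 0; i < NODES; i++) {
        nodes[i].payload = i;
        nodes[i].next = (i + 1 < NODES) ? &nodes[i + 1] : NULL;
    }
    head = nodes;

    pthread_t tid;
    pthread_create(&tid, NULL, helper, NULL);   /* run-ahead helper thread */

    long sum = 0;                               /* main thread's real work  */
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->payload;

    pthread_join(tid, NULL);
    printf("sum = %ld\n", sum);
    free(nodes);
    return 0;
}
</pre>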




* '''''Interconnection'''''
** A helper thread is a thread that does some of the work of the main thread in advance of the main thread so that the main thread can work more quickly. The Olukotun text only scratches the surface on all the different ways that helper threads can be used. Survey these ways, making sure to include the Slipstream approach developed at NCSU.
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki4_001_a1 Current Supercomputer Interconnect Topologies] - This summary highlights current supercomputer interconnect topologies. It also examines the Gigabit Ethernet, Infiniband, Myrinet, and their pros and cons.
 
[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki4_8_xk Interconnect Networks] - This summary provides a basic overview of interconnect networks. It also examines the basic topologies and real-world implementations of those topologies.
 
=== Summary of links ===
 
* [[CSC/ECE 506 Fall 2007/PR1|PR1]]
* [[CSC/ECE 506 Fall 2007/wiki1 1.3.3 1.3.4 chase2007|wiki1 1.3.3 1.3.4 chase2007]]
* [[CSC/ECE 506 Fall 2007/wiki1 10 aj|wiki1 10 aj]]
* [[CSC/ECE 506 Fall 2007/wiki1 10 mt|wiki1 10 mt]]
* [[CSC/ECE 506 Fall 2007/wiki1 11|wiki1 11]]
* [[CSC/ECE 506 Fall 2007/wiki1 12 dp3|wiki1 12 dp3]]
* [[CSC/ECE 506 Fall 2007/wiki1 14 pp3|wiki1 14 pp3]]
* [[CSC/ECE 506 Fall 2007/wiki1 1 11|wiki1 1 11]]
* [[CSC/ECE 506 Fall 2007/wiki1 1 ab|wiki1 1 ab]]
* [[CSC/ECE 506 Fall 2007/wiki1 2 3K8i|wiki1 2 3K8i]]
* [[CSC/ECE 506 Fall 2007/wiki1 2 cv|wiki1 2 cv]]
* [[CSC/ECE 506 Fall 2007/wiki1 2 sk|wiki1 2 sk]]
* [[CSC/ECE 506 Fall 2007/wiki1 3 as1506|wiki1 3 as1506]]
* [[CSC/ECE 506 Fall 2007/wiki1 4 01|wiki1 4 01]]
* [[CSC/ECE 506 Fall 2007/wiki1 4 JHSL|wiki1 4 JHSL]]
* [[CSC/ECE 506 Fall 2007/wiki1 4 a1|wiki1 4 a1]]
* [[CSC/ECE 506 Fall 2007/wiki1 4 la|wiki1 4 la]]
* [[CSC/ECE 506 Fall 2007/wiki1 5 1008|wiki1 5 1008]]
* [[CSC/ECE 506 Fall 2007/wiki1 5 jp07|wiki1 5 jp07]]
* [[CSC/ECE 506 Fall 2007/wiki1 6 bn|wiki1 6 bn]]
* [[CSC/ECE 506 Fall 2007/wiki1 6 r8e|wiki1 6 r8e]]
* [[CSC/ECE 506 Fall 2007/wiki1 7 2281|wiki1 7 2281]]
* [[CSC/ECE 506 Fall 2007/wiki1 7 2815|wiki1 7 2815]]
* [[CSC/ECE 506 Fall 2007/wiki1 7 a1|wiki1 7 a1]]
* [[CSC/ECE 506 Fall 2007/wiki1 8 perash1|wiki1 8 perash1]]
* [[CSC/ECE 506 Fall 2007/wiki1 8 s5|wiki1 8 s5]]
* [[CSC/ECE 506 Fall 2007/wiki1 9 arubha|wiki1 9 arubha]]
* [[CSC/ECE 506 Fall 2007/wiki1 9 vr|wiki1 9 vr]]
* [[CSC/ECE 506 Fall 2007/wiki2-5-as|wiki2-5-as]]
* [[CSC/ECE 506 Fall 2007/wiki2 05 sa|wiki2 05 sa]]
* [[CSC/ECE 506 Fall 2007/wiki2 2 helperThreads|wiki2 2 helperThreads]]
* [[CSC/ECE 506 Fall 2007/wiki2 3 pa|wiki2 3 pa]]
* [[CSC/ECE 506 Fall 2007/wiki2 3 pb|wiki2 3 pb]]
* [[CSC/ECE 506 Fall 2007/wiki2 4 BV|wiki2 4 BV]]
* [[CSC/ECE 506 Fall 2007/wiki2 4 LA|wiki2 4 LA]]
* [[CSC/ECE 506 Fall 2007/wiki2 4 helperThreads|wiki2 4 helperThreads]]
* [[CSC/ECE 506 Fall 2007/wiki2 4 md|wiki2 4 md]]
* [[CSC/ECE 506 Fall 2007/wiki2 5 as|wiki2 5 as]]
* [[CSC/ECE 506 Fall 2007/wiki2 6 cv|wiki2 6 cv]]
* [[CSC/ECE 506 Fall 2007/wiki2 6 pb|wiki2 6 pb]]
* [[CSC/ECE 506 Fall 2007/wiki2 6 sbh|wiki2 6 sbh]]
* [[CSC/ECE 506 Fall 2007/wiki2 7 amassg|wiki2 7 amassg]]
* [[CSC/ECE 506 Fall 2007/wiki2 7 ss|wiki2 7 ss]]
* [[CSC/ECE 506 Fall 2007/wiki2 aY3w|wiki2 aY3w]]
* [[CSC/ECE 506 Fall 2007/wiki2 helperThreads|wiki2 helperThreads]]
* [[CSC/ECE 506 Fall 2007/wiki3 1 ncdt|wiki3 1 ncdt]]
* [[CSC/ECE 506 Fall 2007/wiki3 1 satkar|wiki3 1 satkar]]
* [[CSC/ECE 506 Fall 2007/wiki3 2 1r|wiki3 2 1r]]
* [[CSC/ECE 506 Fall 2007/wiki3 2 aY3w|wiki3 2 aY3w]]
* [[CSC/ECE 506 Fall 2007/wiki3 2 tl|wiki3 2 tl]]
* [[CSC/ECE 506 Fall 2007/wiki3 4 sm|wiki3 4 sm]]
* [[CSC/ECE 506 Fall 2007/wiki3 7 qaz|wiki3 7 qaz]]
* [[CSC/ECE 506 Fall 2007/wiki3 7 tl|wiki3 7 tl]]
* [[CSC/ECE 506 Fall 2007/wiki3 8 38|wiki3 8 38]]
* [[CSC/ECE 506 Fall 2007/wiki3 8 a1|wiki3 8 a1]]
* [[CSC/ECE 506 Fall 2007/wiki3 9 sm|wiki3 9 sm]]
* [[CSC/ECE 506 Fall 2007/wiki4 001 a1|wiki4 001 a1]]
* [[CSC/ECE 506 Fall 2007/wiki4 2 helperThreads|wiki4 2 helperThreads]]
* [[CSC/ECE 506 Fall 2007/wiki4 5 1008|wiki4 5 1008]]
* [[CSC/ECE 506 Fall 2007/wiki4 7 2815|wiki4 7 2815]]
* [[CSC/ECE 506 Fall 2007/wiki4 7 jp07|wiki4 7 jp07]]
* [[CSC/ECE 506 Fall 2007/wiki4 8 xk|wiki4 8 xk]]
* [[CSC/ECE 506 Fall 2007/wiki4 helperThreads|wiki4 helperThreads]]
* [[CSC/ECE 506 Fall 2007/wiki8 4 xk|wiki8 4 xk]]
* [[CSC/ECE 506 Fall 2007/wiki 11 e4|wiki 11 e4]]
* [[CSC/ECE 506 Fall 2007/wiki 2 5 2281|wiki 2 5 2281]]

Latest revision as of 02:39, 23 February 2010

Formatting Resources

Formatting Help Guide from MetaWiki


Peer-reviewed Assignment 1

Important Dates

  • 08/31/2007 Peer-reviewed 1 Selection
  • 09/05/2007 Peer-reviewed 1 Submission
  • 09/07/2007 Peer-reviewed 1 First feedback
  • 09/10/2007 Peer-reviewed 1 Resubmission
  • 09/12/2007 Peer-reviewed 1 Final review
  • 09/14/2007 Peer-reviewed 1 Review of review

Topics

  • Sections 1.1 and 1.1.2
    • Update performance trends in multiprocessors.

Introduction to Parallel Programming - This summary gives a brief introduction to Parallel Programming.

Performance trends in multiprocessors - This summary discusses Moore's Law in the future and multiprocessor architecture's price vs. performance. It also concludes on how the relationship between the development of microprocessors and Moore's Law will be affected in the future.

[http://pg-server.csc.ncsu.edu/mediawiki/index.php/Chapter_6_Multi_Core_Architecture Multi-Core Architecture]

  • Section 1.1.1, first half: Scientific/engineering application trends
    • What characterizes present-day applications?
    • How much memory, processor time, etc.?
    • How high is the speedup?

Scientific/engineering application trends - This summary discusses trends in scientific and engineering computing, hardware trends, software trends, and applications for HPC.

High Performance Computing Trends in Scientific and Engineering Applications - This summary discusses high performance computing trends in scientific and engineering applications, hardware used in high performance computing, and software applications.


  • Section 1.1.1, second half: Commercial application trends
    • What characterizes present-day applications?
    • How much memory, processor time, etc.?
    • How high is the speedup?

Commercial application trends - This summary gives an overview of commercial applications of parallel computing architecture. It also highlights who is doing parallel computing and what they are using it for.


  • Section 1.1.3: Technology trends

[http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Fall_2007/wiki1_1_11 Technology Trends] - This summary gives a general overview of technology trends for processor technology (multi-core processors and cell processors), memory technology (caches, DDR and DDR2, and on-chip memory controllers), disk technology (RAID and Caches), clusters and Direct Memory Access (DMA).


  • Section 1.1.3: Architectural trends
    • How have architectures changed in the past 10 years?
    • Update Figs. 1.8 and 1.9 with new points, for 2000, 2002, 2004, 2006, and 2007.

Summary of Architectural Trends - This summary gives a detailed observation of architectural trends. It also highlights the concepts of VLIW (very long instruction word) processors, multi-threading, multi-core CPUs, and speculative execution. It also updates Figs. 1.8 and 1.9 with new points, for 2000, 2002, 2004, 2006, and 2007.

General Overview of Architectural Trends - This summary gives a general overview of architectural trends. It also highlights "My dual quad-core with quad-SLI", the use of silicon/carbon, and buses and memory.

Microprocessor and System Design Trends - This summary gives a general overview of architectural trends. It also highlights microprocessor design trends and system design trends.


  • Section 1.1.4: Supercomputers
    • Compare current supercomputers with those of 10 yrs. ago.
    • Update Figures 1.10 to 1.12 with new data points. For 1.12, consult top500.org.

Supercomputers - This summary details what a supercomputer is, the evolution of supercomputer architecture and performance, and explores the metric (LINPACK Benchmark Suite) most commonly used for evaluating the effectiveness of supercomputers. It also takes a look at the most dominant supercomputers of the last 10 years.

Supercomputers with a section on Cluster Computing - This summary details what a supercomputer is and explores the main metric (LINPACK Benchmark Suite) for evaluating the effectiveness of supercomputers. It also illustrates current trends in the industry by exploring the types of systems used in the 500 fastest computer systems in the world. It explores the concept of cluster computing.

TPC-C (Transaction Processing Performance Council) benchmarks - This summary defines TPC-C (Transaction Processing Performance Council) benchmarks, lists the Top 10 supercomputers according to TPC-C benchmarking performance, lists TPC-C's Top 10 supercomputers according to performance per unit price, and graphs the throughput versus the number of processors for each vendor. It also highlights processor and memory speeds, commercial computers, and the concept of speedup.


  • Sections 1.2.1 and 1.2.4: Communication architecture
    • Trends in last 10 years.
    • How has data parallelism found its way into shared-memory and message-passing machines? An early example would be MMX.
    • Would you change the number of layers in Fig. 1.13?

Introduction to Message Passing - This summary gives a brief introduction to Message Passing.

Message Passing - This summary highlights the typical structure of message-passing machines, advantages of using message passing, and gives a detailed introduction of what message passing is.

Typical Structures of Message-Passing Machines - This summary focuses on the typical structure of message-passing machines.

Sections 1.2.1 and 1.2.4: Communication architecture - The summary provides detailed information on communication architecture, communication in multiprocessing, and the trends of communication architecture in the last 10 years.

Communication architecture - The summary provides detailed information on communication architecture, communication in multiprocessing, parallel programming models, the concept of convergence, layers of abstraction, and external links to trends in parallel computing.


  • Section 1.2.2: Shared address space
    • Any changes in the organization of address spaces in the last 10 years?
    • Are the interconnection structures different in new computers now than they were 10 years ago?
    • What is the size and capacity of current SMPs?
    • How have supercomputers evolved since the Cray T3E?

Shared address space - This summary highlights the recent design trends in shared address spaces, evolution of interconnect technology, current high end SMPs, and explores the evolution of supercomputers since the Cray T3E.

Shared address space with a figure with the Top 10 Fastest Supercomputers - This summary highlights the recent design trends in shared address spaces, evolution of interconnect technology, current high end SMPs, and explores the evolution of supercomputers since the Cray T3E.


  • Section 1.2.3: Message passing
    • Are blade servers an extension of message passing?
    • How have blade architectures evolved over the past 10 years?

Latest Developments in Message Passing - This summary defines the message passing model, the latest developments in message passing. It also highlights the advantages and implementation of Message Passing Interface (MPI). It also explores the blade servers' architecture, evolution, and future.

Introduction to Blade Servers - This summary gives a brief introduction to blade servers.

General Blade Server Architecture - This summary highlights the general blade-server architecture. It also gives a detailed figure that defines the different components within a general blade-server architecture.

Blade Servers - This summary introduces the general blade-server and highlights the advantages of blade servers. It also explores its evolution, its architecture, blade enclosures, and if blade servers are an extension of message passing.

Evolution of Blade Servers - This summary simply focuses on the evolution from standalone conventional server to the blade servers that have become popular today.

Enclosure - This summary simply focuses on the technology of blade enclosures. It explores the aspects of power, cooling, networking, and storage for a blade enclosure.

Advantages of Blade Servers - This summary simply focuses on the numerous advantages of blade servers: Reduced Space Requirements, Reduced Power Consumption and Improved Power Management, Lower Management Cost, Simplified Cabling, Future Proofing Through Modularity, and Easier Physical Deployment.

How Blade Servers are an Extension of Message Passing - This summary simply focuses on how blade servers are an extension of message passing.


  • Section 1.2.5: Trends in vector processing and array processing.
    • New machines have recently been announced. Why will this be an important architectural dimension in the coming years?

Trends in vector processing and array processing - Summary 2 - This summary highlights current trends, past trends and emerging trends in vector processing and array processing. It also discusses the advantages of vector processing and the pitfalls of vector processing as well.

Vector processing and array processing - This summary provides a detailed description, the history, and definition of vector and array processors. It highlights the future and new trends in vector processing and array processing.


  • Section 1.2.6
    • New developments in dataflow and systolic architectures, if any.
    • Or if not, why are these styles not evolving with time?

New Developments in Dataflow and Systolic Architectures - This summary gives a detailed description of the new developments in dataflow and systolic architectures. It even explores why systolic architecture has not truly evolved with time (to the extent of other architectures).

Dataflow & Systolic Architectures - This summary gives a detailed description of the new developments in dataflow and systolic architectures. It also looks at the current state of both dataflow architectures and systolic architectures. It even explores several papers that propose different applications for systolic architecture.


  • Sections 1.3.1 and 1.3.2: Communication and programming model
    • How have reordering strategies evolved to accommodate larger multicomputers?
    • Have new kinds of synchronization operations been developed?
    • I doubt that other topics covered in these sections have changed much, but do check.

Communication and programming model - This summary provides a detailed description of ordering and synchronization in the communication and programming model. It also provides multiple external references.


  • Sections 1.3.3 and 1.3.4: Most changes here are probably related to performance metrics.
    • Cite other models for measuring artifacts such as data-transfer time, overhead, occupancy, and communication cost. Focus on the models that are most useful in practice.

Fundamental Design Issues - Communication and Replication - This summary gives a detailed description of communication and replication. It also explores computer performance, overhead and occupancy, performance metrics, data transfer time, and communication cost.

Performance metrics - This summary gives a detailed description of communication and replication. It also looks at the artifacts of measuring performance, overhead and occupancy, communication cost, and scalability.

Peer-reviewed Assignment 2

Important Dates

  • 09/17/2007 Peer-reviewed 1 Selection
  • 09/24/2007 Peer-reviewed 1 Submission
  • 09/26/2007 Peer-reviewed 1 First feedback
  • 09/28/2007 Peer-reviewed 1 Resubmission
  • 10/03/2007 Peer-reviewed 1 Final review
  • 10/05/2007 Peer-reviewed 1 Review of review

Topics

  • Parallelizing an application
    • Pick another parallel application, not covered in the text, and less than 7 years old, and describe the various steps in parallelizing it (decomposition, assignment, orchestration, and mapping). You may use an example from the peer-reviewed literature, or a Web page. You do not have to go into great detail, but you should describe enough about these four stages to make the algorithm interesting.

LAMMPS and a Flowchart of Molecular Dynamics Sequential code - This summary explores LAMMPS (Large Scale Atomic/Molecular Massively Parallel System) algorithm, the sequential algorithm. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the LAMMPS programming model.

MapReduce - This summary explores MapReduce, a programming model. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the MapReduce programming model.

Shuffled Complex Evolution Metropolis (SCEM-UA) - This summary explores Shuffled Complex Evolution Metropolis (SCEM-UA), a programming model. It also explores the concepts of Decomposition & Assignment, Orchestration, and Mapping for the Shuffled Complex Evolution Metropolis programming model.


  • Cache sizes in multicore architectures
    • Create a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used. Compare this with two or three recent single-core designs.

Basic Summary of Cache Sizes in Multicore Architectures - This summary creates a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used.

Cache Organization in Multicore Processor - This summary defines a multi-core processor and cache organization in multicore, and creates a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used.

Cache Sizes in Multicore Architectures - This summary highlights cache sizes in multicore architectures. It creates a table containing details of the current multi-core processor architectures along with their intricate details like number of levels, cache size, etc. It also has multiple external references.


  • MSIMD architectures and applications
    • MSIMD architectures have garnered quite a bit of contention recently. Read a few papers on these architectures and write a survey of applications for which they would be suitable. If possible, talk about the steps in parallelizing these applications (decomposition, assignment, orchestration, and mapping).

MSIMD architectures - This summary introduces MSIMD architectures and the problems that lie within this architecture. It also explores 3-D Wafer Stack Neurocomputing, Neural Network Vision application, resource optimization of a parallel computer for multiple vector processing, a simulation of a MSIMD System with resequencing, and SIMD Architecture for feature tracking.

Applications of the MSIMD architecture - This summary highlights applications of the MSIMD architecture (The GPA Machine and The Warwick Pyramid Machine). It also explores the Artificial Neural Networks.


  • Cache-to-cache sharing
    • On p. 300 of the text, cache-to-cache sharing is introduced. If a cache has an up-to-date copy of a block, should it supply it, or should it wait for memory to do it? What do current multiprocessors do? In current machines, is cache-to-cache sharing faster or slower than waiting for memory to respond?

Cache-to-cache sharing - This summary introduces the need for cache-to-cache sharing and examines its disadvantages. It also explores the current uses of cache-to-cache sharing.

Peer-reviewed Assignment 3

Important Dates

  • 10/12/2007 Peer-reviewed 1 Selection
  • 10/17/2007 Peer-reviewed 1 Submission
  • 10/19/2007 Peer-reviewed 1 First feedback
  • 10/22/2007 Peer-reviewed 1 Resubmission
  • 10/24/2007 Peer-reviewed 1 Final review
  • 10/26/2007 Peer-reviewed 1 Review of review



Topics

  • True and false sharing
    • True and false sharing. In Lectures 9 and 10, we covered performance results for true- and false-sharing misses. The results showed that some applications experienced degradation due to false sharing, and that this problem was greater with larger cache lines. But these data are at least 9 years old, and for multiprocessors that are smaller than those in use today. Comb the ACM Digital Library, IEEE Xplore, and the Web for more up-to-date results. What strategies have proven successful in combating false sharing? Is there any research into ways of diminishing true-sharing misses, e.g., by locating communicating processes on the same processor? Wouldn't this diminish parallelism and thus hurt performance?

Diminishing True and False Sharing - This summary gives a detailed description of true sharing and false sharing. It discusses the problem with false sharing, strategies to combat false sharing, and diminishing true-sharing misses.

Techniques to reduce false sharing misses and techniques to reduce true sharing misses - This summary introduces the concepts of both true sharing and false sharing. It also explores the effects of false sharing, techniques to reduce false sharing misses, and techniques to reduce true sharing misses.

False Sharing Miss and Miss Caused due to True Sharing - This summary explores false sharing misses and misses caused due to true sharing.


  • Simple Scalable Coherent Interface (SSCI) and the Scalable Coherent Interface (SCI)
    • SCI. The IEEE Scalable Coherent Interface is a superset of the SSCI protocol we have been considering in class. A lot has been written about it, but it is still difficult to comprehend. Using SSCI as a starting point, explain why additional states are necessary, and give (or cite) examples that demonstrate how they work. Ideally, this would still be an overview of the working of the protocol, referencing more detailed documentation on the Web.

Simple Scalable Coherent Interface (SSCI)/Scalable Coherent Interface (SCI)/Directory-based cache coherence - This summary gives a detailed description of directory-based cache coherence. It also explores Simple Scalable Coherent Interface (SSCI) and the Scalable Coherent Interface (SCI).

SSCI Protocol/SCI Protocol/Discussion of necessary additional states - This summary gives a brief overview of the SSCI Protocol, a brief overview of the SCI Protocol, and discusses why additional states are needed.

Examples of SSCI Protocol - This summary gives a brief overview of the Simple Scalable Coherent Interface (SSCI) Protocol, a brief overview of the Scalable Coherent Interface (SCI) Protocol, and discusses why additional states are needed. It also explores the SSCI and the SCI protocols.

SSCI and IEEE SCI Protocol Similarities - This summary gives a brief overview of the Simple Scalable Coherent Interface (SSCI) Protocol, explores the 2 main configurations of the Scalable Coherent Interface (SCI) Protocol, and discusses why additional states are needed. It also discusses the similarities between the SSCI and the SCI protocols.

Peer-reviewed Assignment 4

Important Dates

  • 11/23/2007 Peer-reviewed 1 Selection
  • 11/28/2007 Peer-reviewed 1 Submission
  • 11/30/2007 Peer-reviewed 1 First feedback
  • 12/03/2007 Peer-reviewed 1 Resubmission
  • 12/05/2007 Peer-reviewed 1 Final review
  • 12/07/2007 Peer-reviewed 1 Review of review

Topics

  • Helper
    • A helper thread is a thread that does some of the work of the main thread in advance of the main thread so that the main thread can work more quickly. The Olukotun text only scratches the surface on all the different ways that helper threads can be used. Survey these ways, making sure to include the Slipstream approach developed at NCSU.

Helper Threads - This summary defines what a helper thread is and a slipstream approach for helper threads. It also examines the advantages and disadvantages of helper threads. It also explores the uses of helper threads for: predicting a branch at early stages, prefetching of data, and a memory bug reduction.

Helper Threads (with illustration of how they work) - This summary defines what a helper thread is and a slipstream approach for helper threads. It also examines the advantages and disadvantages of helper threads. It also explores the uses of helper threads for: predicting a branch at early stages, prefetching of data, and a memory bug reduction.

Helper Threads (with multiple illustrations) - This summary defines what a helper thread is and terms associated with helper threads. It also has 3 figures that illustrate 3 respective concepts: Additions to the Superscalar Processor, Speedup of Two Tasks per CMP versus One Task per CMP, Slipstream-based Self-invalidation. It also explores use of helper threads, types of helper threads, examples of helper threads, and the issues with using helper threads.

Helper Threads (with illustration of how they work and multiple external references) - This summary introduces the concept of helper threads. It also explores applications of helper threads: speculative branch prediction, speculative prefetching, and the slipstream technology.


  • Interconnection
    • A helper thread is a thread that does some of the work of the main thread in advance of the main thread so that the main thread can work more quickly. The Olukotun text only scratches the surface on all the different ways that helper threads can be used. Survey these ways, making sure to include the Slipstream approach developed at NCSU.

Current Supercomputer Interconnect Topologies - This summary highlights current supercomputer interconnect topologies. It also examines the Gigabit Ethernet, Infiniband, Myrinet, and their pros and cons.

Interconnect Networks - This summary provides a basic overview of interconnect networks. It also examines the basic topologies and real-world implementations of those topologies.

Summary of links