CSC/ECE 506 Spring 2012/preface
This preface browses through the wiki pages written during the Spring 2011 and Spring 2012 offerings of CSC/ECE 506 and, taking Solihin's FUNDAMENTALS OF PARALLEL COMPUTER ARCHITECTURE<ref>FUNDAMENTALS OF PARALLEL COMPUTER ARCHITECTURE by Solihin</ref> as a base and following the book's Table of Contents<ref>FUNDAMENTALS OF PARALLEL COMPUTER ARCHITECTURE TOC</ref>, travels through the study topics, summing them up and picking out the best approaches.
Chapter 1: Perspectives <ref>Spring_2012/1a_hv</ref><ref>Spring_2012/1a_mw</ref> <ref>Spring_2012/1a_ry</ref> <ref>Spring_2012/1b_ap</ref> <ref>Spring_2012/1b_as </ref> <ref> Spring_2012/1b_ps</ref> <ref>Spring_2012/1c_12</ref> <ref> Spring_2012/1c_cl </ref> <ref>Spring_2012/1c_dm </ref>
This chapter includes pages on three different subjects: supercomputer comparisons, Moore's Law and MISD architectures. On the first topic, the evolution of supercomputers, two of the three pages, [5] and [6], are very similar, probably because they come from the same source ([6] actually specifies what was added to the Spring 2011 original), and both are remarkable. They start by analyzing the evolution of the supercomputer (ENIAC, Cray, the Japanese machines, IBM...) and then move on to the different designs and architectures ([5] adds here a section on the reasons for the decline of vector-based machines). Chapter 7 of [6] is an extensive excursion over the operating systems used on these platforms and the ways of interconnecting them ([5] also introduces InfiniBand as an alternative to Ethernet), and includes some comparisons between brands and different architectures. Next, the fastest current supercomputer implementations are presented, together with programming models and tools such as Fortran, C, PVM or OpenMP. Finally, power consumption and cooling techniques are presented, followed by a discussion of trends and the future of these machines.
The second set of wiki pages tries to briefly answer the question "Does Moore's Law still hold?". [7] and [8] are also very similar in their approach: both pages introduce the Law and its proposer, give a history of its evolution through the years, and present a list of computers with their transistor counts as evidence of the Law and of its particularities, including the fact that it refers to the number of transistors, not performance. They then turn to the future of the Law, and [7] includes some future alternatives to the transistor, like memristors, quantum computing and ballistic deflection transistors.
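Neither page states the Law as a formula, but the claim they examine can be written compactly (a doubling period of about two years is assumed here; 18 months is also commonly quoted): <math>N(t) \approx N_0 \cdot 2^{(t - t_0)/2}</math>, where <math>N_0</math> is the transistor count of a chip in a reference year <math>t_0</math> and <math>N(t)</math> is the expected count <math>t - t_0</math> years later.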
The third set is about MISD architectures, and the wiki pages describe Flynn's taxonomy in general and MISD machines in particular. [9], after introducing the MISD architecture and giving examples (or explaining the lack of them, as in theory they do not exist), gives a good description of systolic arrays and their applications, describing, for instance, discrete and fast Fourier transform implementations. It also discusses whether systolic arrays really are MISD machines or not, as the classification is a matter of debate.
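To make the systolic-array idea concrete, here is a minimal C model of my own (a generic illustration, not code from [9]): a linear array of processing elements computes a matrix-vector product, with the vector elements flowing through the array one PE per clock tick while every PE performs one multiply-accumulate in lockstep.
<pre>
#include <stdio.h>

#define N 4

/* Software model of a linear systolic array computing y = A * x.
 * Each processing element (PE) i keeps one accumulator y[i]; the elements
 * of x flow through the array one PE per clock tick, and all PEs perform
 * one multiply-accumulate per tick in lockstep.  (In hardware the matrix
 * entries would be streamed in on a second channel; here they are indexed.) */
int main(void) {
    double A[N][N] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    double x[N] = {1, 1, 2, 2};
    double y[N] = {0}, pipe[N] = {0};   /* pipe[i]: x value currently at PE i */

    for (int t = 0; t < 2 * N - 1; t++) {       /* global clock tick */
        for (int i = N - 1; i > 0; i--)         /* x values advance one PE */
            pipe[i] = pipe[i - 1];
        pipe[0] = (t < N) ? x[t] : 0.0;         /* feed x, then padding */
        for (int i = 0; i < N; i++) {           /* every PE works this tick */
            int j = t - i;                      /* index of the x now at PE i */
            if (j >= 0 && j < N)
                y[i] += A[i][j] * pipe[i];
        }
    }
    for (int i = 0; i < N; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
</pre>
Each PE touches only its own accumulator and the value handed to it by its neighbor, which is exactly the local-communication property that makes systolic designs attractive in hardware.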
Chapter 2: Parallel Programming Models <ref>Spring_2012/2a_bm </ref> <ref> Spring_2012/2a_va </ref><ref> Spring_2012/ch2b_cm </ref><ref>Spring_2011/ch2_cl </ref><ref> Spring_2011/ch2_dm</ref><ref>Spring_2011/ch2_JR </ref><ref> Spring_2011/ch2a_mc </ref>
This chapter is also divided into three different topics. The first one describes Data Parallel models in general, and its wiki pages were written in Spring 2011. Of the three pages on the subject, [16] does a good job describing the model and its parts, decomposition and orchestration, but [15] treats the Task Parallel model better, even comparing the two graphically, and also offers a history of the evolution of these techniques.
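The distinction the pages draw can be summed up in a short C/OpenMP sketch of my own (not an example from [15] or [16]): a data-parallel decomposition runs the same operation on different pieces of the data, while a task-parallel one runs different operations concurrently.
<pre>
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N], sum;

int main(void) {
    /* Data parallelism: every thread executes the same operation on its
     * own slice of the array. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    /* Task parallelism: different threads execute different operations. */
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < N; i++) b[i] = a[i] + 1.0; }  /* task 1 */

        #pragma omp section
        { for (int i = 0; i < N; i++) sum += a[i]; }        /* task 2 */
    }
    printf("b[10] = %.1f  sum = %.1f\n", b[10], sum);
    return 0;
}
</pre>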
Coming to the Spring 2012 wiki pages, we have two topics. The first is Shared Address Space (SAS) programming on distributed-memory machines. [12] first goes through the origins and background and introduces the two main issues of shared memory: coherence and synchronization. It then moves on to Distributed Shared Memory (DSM) and cache coherence, page management and memory mapping in MOME, and also node communication and programming environments, describing how locks and barriers are introduced to ensure synchronization. It continues with some examples of DSM implementations and algorithms, introducing consistency models. Finally, it describes some evaluation suites used to measure performance (SPLASH) and some case studies, and closes with comments on the outlook for the evolution of the topic.
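As a reminder of what the locks and barriers mentioned in [12] look like at the programming level, here is a minimal POSIX-threads sketch (a generic shared-memory illustration of my own, not code from MOME or any of the DSM systems the page covers): a mutex serializes updates to a shared counter and a barrier keeps the threads in step before they read the result.
<pre>
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                       /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);                 /* mutual exclusion */
    counter++;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier);            /* wait until every thread is done */

    /* After the barrier, all increments are visible to every thread. */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);        /* always NTHREADS */
    pthread_barrier_destroy(&barrier);
    return 0;
}
</pre>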
The last topic in this chapter is data parallelism in GPUs. Pages [14] and [18] treat the subject in a very similar way, getting into the details of the massively parallel computation that takes place in graphics processors and how they are geared toward data processing, rather than control, compared with other kinds of processors. [18] opens with OpenCL and its programming models, talking about the platform, execution and memory models and the programming approaches. Then it discusses two implementations from AMD, the Radeon HD5000 and HD6000 series, and NVIDIA's GeForce 400 series. The Compute Unified Device Architecture (CUDA) is then introduced and described, focusing on threads and programming. [14] is more focused on the different NVIDIA processor implementations and dedicates more space to stream processors and to the CUDA architecture, programming and examples.
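The programming model both pages describe boils down to writing the work of a single thread and launching one logical thread per data element. The sketch below is plain C of my own, not actual CUDA or OpenCL: the function body is what one GPU thread would execute, and the loop in main stands in for the grid of threads a GPU would launch in parallel.
<pre>
#include <stdio.h>

#define N 1024

/* What a single GPU thread would do: process exactly one element, chosen by
 * its thread index.  In CUDA this body would be a __global__ kernel and tid
 * would be derived from the block and thread indices. */
static void saxpy_element(int tid, float alpha, const float *x, float *y) {
    if (tid < N)                       /* guard, as in a real kernel */
        y[tid] = alpha * x[tid] + y[tid];
}

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

    /* The GPU would run these "threads" in parallel; here they run in a loop. */
    for (int tid = 0; tid < N; tid++)
        saxpy_element(tid, 2.0f, x, y);

    printf("y[10] = %.1f\n", y[10]);   /* 2*10 + 1 = 21 */
    return 0;
}
</pre>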
Chapter 3: Shared Memory Parallel Programming <ref>Spring_2012/3a_df</ref><ref>Spring_2012/3a_kp</ref><ref>Spring_2012/3a_yw</ref><ref>Spring_2012/3b_sk</ref><ref>Spring_2011/ch3_ab</ref>
This chapter also has three topics. In the first one, Types of Parallelism, the authors of [23] define the concepts of DOALL, DOACROSS, DOPIPE, functional parallelism and reduction. They then introduce synchronization and its mechanisms and describe how these are applied in several architectural examples, namely IA-64, IA-32, the Linux kernel, CUDA, PowerPC and the Cell Broadband Engine (CBE), discussing which techniques each of these examples uses to maintain synchronization (spin-locks, barriers...).
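To ground the terminology, here is a minimal C/OpenMP sketch of my own (the chapter's own examples are tied to the specific architectures listed above): the first loop is a DOALL, whose iterations are fully independent, and the second is a reduction; the implicit barrier at the end of each parallel loop is one of the synchronization mechanisms the page discusses.
<pre>
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double a[N];

int main(void) {
    /* DOALL: no iteration depends on any other, so all may run in parallel;
     * an implicit barrier at the end of the loop synchronizes the threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 0.5 * i;

    /* Reduction: each thread accumulates a private partial sum; the partial
     * sums are combined when the threads synchronize at the end of the loop. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f\n", sum);
    return 0;
}
</pre>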
The next topic in this chapter is Patterns of Parallel Programming, where we have a couple of different approaches. [20] first attempts a classification and presents three different ones: Massingill's, Keutzer and Mattson's, and Siu et al.'s. It then divides the list of patterns across four different levels, namely Decomposition, Algorithm, Implementation and Parallel Execution, and describes common routines and algorithms used at each level, with some examples. It finishes with a glossary of all of the patterns that can be useful for a quick lookup. [21], on the other hand, focuses on the Algorithm level and, after introducing the need for these patterns as building blocks to get the best performance and repeatability out of our systems, divides the patterns into two groups: Functional Parallelism Patterns, like the Embarrassingly Parallel pattern or Divide and Conquer; and Data Parallelism Patterns, which include the Replicable, Repository, Recursive Data, Geometric and Irregular Mesh, and Pipeline patterns [15].
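As an illustration of one of the Functional Parallelism Patterns named above, here is a small Divide and Conquer sketch of my own in C with OpenMP tasks (not an example taken from [20] or [21]): the array sum is split recursively and the two halves are computed as parallel tasks before being combined.
<pre>
#include <omp.h>
#include <stdio.h>

#define N      1000000
#define CUTOFF 10000            /* below this size, just sum sequentially */

/* Divide and Conquer: split the range in half, solve the halves as parallel
 * tasks, then combine the partial results. */
static long range_sum(const int *a, int lo, int hi) {
    if (hi - lo <= CUTOFF) {
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)
    left = range_sum(a, lo, mid);
    #pragma omp task shared(right)
    right = range_sum(a, mid, hi);
    #pragma omp taskwait                  /* the combine step needs both halves */
    return left + right;
}

int main(void) {
    static int a[N];
    for (int i = 0; i < N; i++) a[i] = 1;
    long total;
    #pragma omp parallel
    #pragma omp single                    /* one thread starts the recursion */
    total = range_sum(a, 0, N);
    printf("total = %ld\n", total);       /* prints 1000000 */
    return 0;
}
</pre>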
Chapter 3b: Map Reduce
Chapter 4a: Automatic parallelism and its limitations
Chapter 4b: The limits to speedup
Chapter 5:Parallel Programming for Linked Data Structures <ref>Spring_2012/ch5a_ja</ref><ref>Spring_2011/ch5_LL</ref><ref>Spring_2011/ch1_LL </ref>
Chapter 5a: Other linked data structures
Chapter 6: Introduction to Memory Hierarchy Organization <ref>Spring_2011/6a_ms</ref><ref>Spring_2012/6b_am</ref><ref>Spring_2012/6b_pa</ref><ref>Spring_2011/ch6a_ep</ref><ref>Spring_2011/ch6a_jp</ref><ref>Spring_2011/ch6b_ab</ref><ref>Spring_2011/ch6b_df</ref>
Chapter 6b: Multiprocessor issues with write buffers
Chapter 7b: TLB coherence
Chapter 8: Bus-Based Coherent Multiprocessors <ref>Spring_2011/ch8_cl</ref><ref>Spring_2011/ch8_mc</ref> <ref>Spring_2012/8a_cj</ref> <ref>Spring_2012/8a_cm</ref> <ref>Spring_2012/8a_fu</ref> <ref>Spring_2012/8b_va</ref>
Chapter 8a: MSI, MESI, MESIF, and MOESI protocols on real architectures
Chapter 8b: Update and adaptive coherence protocols on real architectures, and power considerations
Chapter 9: Hardware Support for Synchronization <ref>Spring_2011/ch9_ms</ref><ref>Spring_2012/9a_mm</ref>
Chapter 9a: Reducing locking overhead
Chapter 10: Memory Consistency Models<ref>Spring_2012/10a_dr</ref><ref>Spring_2012/10a_jp</ref><ref>Spring_2012/10a_vm</ref><ref>Spring_2012/10b_sr</ref><ref>Spring_2012/ch10_sj</ref><ref>Spring_2011/ch10_MC</ref><ref>Spring_2011/ch10_sb</ref><ref>Spring_2011/ch10_sj</ref><ref>Spring_2011/ch10a_dc</ref>
Chapter 10a: Prefetching and consistency models
Chapter 10b: Use of consistency models in current multiprocessors
Chapter 11a: Performance of DSM systems
Chapter 11b: Improvements to directory-based cache-coherence animations
Chapter 12: Interconnection Network Architecture <ref>Spring_2011/ch12 </ref><ref>Spring_2011/ch12_aj</ref><ref>Spring_2011/ch12_ar</ref><ref>Spring_2011/ch12_ob</ref><ref>Spring_2012/12b_jh</ref><ref>Spring_2012/12b_sb</ref>
Chapter 12a: New interconnection topologies
Chapter 12b: On-chip interconnects
References
<references/>