CSC/ECE 506 Spring 2010/summary
style='font-family:"Times New Roman"'>ECE633 Independent Study: Architecture of
Parallel Computers<o:p></o:p></span></p><p class=MsoNormal align=center style='text-align:center'><span
style='font-family:"Times New Roman"'>Karishma Navalakha<o:p></o:p></span></p><p class=MsoNormal style='text-align:justify'><b style='mso-bidi-font-weight:
normal'><span style='font-family:"Times New Roman"'>Abstract:<span
There has been tremendous research and development in the field of multi-core architecture in the last decade. In such a dynamic environment it is very difficult for textbooks to cover the latest developments in the field. A wiki-written textbook is an extremely handy tool for students to get acquainted and interested in ongoing research. In this independent study we explored an academic learning technique in which students learn the fundamental concepts of the subject through the textbook and the lectures delivered by Prof. Gehringer in class. They can then build on this foundation, gather the latest information from varied online resources and technical papers, and summarize their findings in the form of wiki pages. Software to assist the students is also under development and was adopted in this course. We tried to enhance the quality of student-submitted wiki pages through peer reviewing. Professor Gehringer and I constantly provided input to students to improve both the quality of their wiki pages and the quality of their reviewing. The software being developed under the guidance of Professor Gehringer has been vital in overcoming the administrative hurdles involved in assigning topics to students, maintaining updates and tracking the progress of their writing, gathering feedback through peer reviewing, and handling resubmitted work. All of this has been managed via the software in an organized fashion.
Experience with the wiki-written textbook:

The software was first deployed in CSC/ECE 506, Architecture of Parallel Computers. This is a beginning masters-level course taken by all Computer Engineering masters students. It is optional for Computer Science students, but since it is one way to fulfill a core requirement, it is popular with them too. The recently adopted textbook for this course is the locally written Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems [Solihin 2009]. It did not make sense to have the students rewrite this excellent text, but the book concentrates on theory and design fundamentals, without detailed application to current parallel machines. We felt that students would benefit from learning how the principles are applied in current architectures. Furthermore, they would learn about the newest machines in this fast-changing field.
After every chapter covered in class, two individuals or pairs of students were required to sign up to write the wiki supplement for that particular chapter. (That is, we solicited two supplements for each chapter, each of which could be authored by one or two students.) They were asked to add specific types of information that were not included in the chapter.
Initially, students were not clear about the purpose of their wiki pages. The first pages they wrote had substantial duplication of topics covered in the textbook; students were attempting to give complete coverage of the issues discussed in the chapter. We wanted them to concentrate instead on recent developments. Upon seeing this, we established the practice of having the first two authors of this paper (Gehringer and Navalakha) review the student work, along with three peer reviews from fellow students. A lot of review time was spent providing guidance on how to revise.
At the beginning we gave the students complete freedom to explore resources for the topic they had chosen to write on. This was not very successful, as the students seemingly chose to read the first few search hits, which tended to provide an overview of the topic rather than in-depth information on particular implementations. Sometimes students were not aware that the information they found was already covered in the next chapter, which they had not yet read. The first review we gave students was therefore mainly just making them aware of topics covered in later chapters, so a lot of the effort spent on the initial draft was wasted. After the first two sets of topics, we began to provide links to material that we wanted the students to pay attention to. Gehringer and Navalakha met weekly to discuss what to provide to students. We regularly consulted other textbooks, technology news, and the Web sites of major processor manufacturers such as Intel and AMD. As the semester progressed, the quality of the initial submissions improved, and the students realized better returns for their effort.
The quality of work seemed to improve as the semester progressed. A comparison of the grades for the wiki pages revealed that the average score for the first chapter written by each student was 82.8%, while the average for the second submission was 82.7%. The quality of the wiki pages had improved, but at the same time the peer reviewers became more demanding. Students were given more input to improve their work via peer reviewing, so the improvement showed up in the final wiki pages produced rather than in the grades received by students. The initial wiki pages presented randomly collected data and were cluttered with diagrams and graphs; this information largely restated facts given in the textbook. The later wiki pages focused on a comparative study of present-day supercomputers produced by Intel, AMD and IBM.
For example, while writing the wiki for cache-coherence protocols, the students examined which protocol was favored by which company and why. They also discussed protocols that have been introduced in the last two years, e.g., Intel's MESIF protocol. Such in-depth analysis made the wiki more appealing to readers. Gehringer and Navalakha provided additional reviews, which helped constantly improve the quality of the wiki pages. These reviews gave the students insight into what was expected of them. This led to an increasing focus on current developments during peer reviewing; it was observed that later rounds of reviews included guidance similar to that received from Gehringer and Navalakha. The organization of the wiki pages and the volume of relevant data collected by students improved as the semester progressed.
Electronic peer-review systems have been widely used to review student work, but never before, to our knowledge, have they been applied to assignments consisting of multiple interrelated parts with precedence constraints. The growing interest in large collaborative projects, such as wiki textbooks, has led to a need for electronic support for the process, lest the administrative burden on instructor and TA grow too large.
Chapter-wise learning from this independent study:

Chapter 1:
This chapter covered the interesting topic of supercomputer evolution. The wiki pages written for this topic included a lot of data from the literature. Students came up with interesting topics that were not covered in the textbook, such as Timeline of Supercomputers, First Supercomputer (ENIAC), Cray History, Supercomputer Hierarchical Architecture, Supercomputer Operating Systems, Cooling Supercomputers, and Processor Family. From their research we could see the increasing dominance of Intel's processors in the consumer market. We also conclude that Unix has been the platform for most of these supercomputers. Massively Parallel Processing (MPP) and Symmetric Multiprocessing (SMP) were the earliest styles of widely used multiprocessor architectures; they were displaced by constellation computing in the 2000s, and the field is currently dominated by cluster computing.

References:
style='font-family:"Times New Roman"'>http://www.top500.org/</span></a><span
style='font-family:"Times New Roman"'><o:p></o:p></span></p><p class=MsoNormal style='text-align:justify'><a
href="http://books.google.com/books?id=wx4kNh8ArH8C&pg=PA3&lpg=PA3&dq=evolution+of+supercomputers&source=bl&ots=7DVWaEYsZ4&sig=WKRWRuqtM-UfPoB-Wdka5ZWTgng&hl=en&ei=xAleS-TmDpqutgfcj_2jAg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAoQ6AEwADgK#v=onepage&q=evolution20supercomputers&f=false"
title="http://books.google.com/books?id=wx4kNh8ArH8C&pg=PA3&lpg=PA3&dq=evolution+of+supercomputers&source=bl&ots=7DVWaEYsZ4&sig=WKRWRuqtM-UfPoB-Wdka5ZWTgng&hl=en&ei=xAleS-TmDpqutgfcj_2jAg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAoQ6AEwADgK#v=onepage&q=evolu"><span
style='font-family:"Times New Roman";color:#3366BB'>The future of
supercomputing: an interim report By National Research Council (U.S.).
Committee on the Future of Supercomputing</span></a><span
class=apple-style-span><o:p></o:p></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><b
Chapter 2:

Data Parallel Programming: The students provided comparisons between data parallelism and task parallelism. Haveraaen (2000) notes that data-parallel codes typically bear a strong resemblance to sequential codes, making them easier to read and write. Students noted that the data-parallel model may be used with either the shared-memory or the message-passing model without conflict. In their comparisons they concluded that combining the data-parallel and message-passing models reduces the amount and complexity of the communication required relative to a task-parallel approach. Similarly, combining the data-parallel and shared-memory models tends to simplify and reduce the amount of synchronization required. SIMD (single-instruction, multiple-data) processors are specifically designed to run data-parallel algorithms. Modern examples include CUDA processors developed by NVIDIA and the Cell processor developed by STI (Sony, Toshiba, and IBM).
References:
1. W. Daniel Hillis and Guy L. Steele, Jr., "Data parallel algorithms," Communications of the ACM, 29(12):1170-1183, December 1986. http://portal.acm.org/citation.cfm?id=7903
2. Alexander C. Klaiber and Henry M. Levy, "A comparison of message passing and shared memory architectures for data parallel programs," in Proceedings of the 21st Annual International Symposium on Computer Architecture, April 1994, pp. 94-105. http://portal.acm.org/citation.cfm?id=192020
normal'><span style='font-family:"Times New Roman"'>Chapter3: <o:p></o:p></span></b></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'>In this wiki supplement, the three kinds of parallelisms,
i.e. DOALL, DOACROSS and DOPIPE were discussed. These three parallelism
techniques were discussed with examples in the form of Open MP code as
discussed in the text book. Besides the students provided additional depth in
this topic by discussing parallel_for, parallel_reduce,<span
style='mso-spacerun:yes'></span>parallel_scan,<span
style='mso-spacerun:yes'></span>pipeline,<span style='mso-spacerun:yes'></span>Reduction,<span style='mso-spacerun:yes'></span>DOALL,<span
style='mso-spacerun:yes'></span>DOACROSS, DOPIPE with respect to Intel Thread
Building Blocks. They also compared DOPIPE, DOACROSS, DOALL in POSIX Threads.
Finally they conclude : Pthreads works for all the parallelism and could
express functional parallelism easily, but it needs to build specialized
synchronization primitives and explicitly privatize variables, makes it more
effort needed to switch a serial program in to parallel mode.<o:p></o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
OpenMP provides many performance-enhancing features, such as atomic, barrier and flush synchronization primitives. It is very simple to use OpenMP to exploit DOALL parallelism, but the syntax for expressing functional parallelism is awkward.
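A minimal sketch of this contrast (ours, with illustrative array names): DOALL parallelism needs only a parallel for directive, while functional parallelism has to be spelled out with sections, which is where the syntax becomes awkward.

 #include <cmath>
 // DOALL: every iteration is independent, so a single directive suffices.
 void doall(double* a, const double* b, int n) {
 #pragma omp parallel for
     for (int i = 0; i < n; ++i)
         a[i] = std::sqrt(b[i]);
 }
 // Functional parallelism: each unrelated task needs its own section block.
 void functional(double* a, double* b, int n) {
 #pragma omp parallel sections
     {
 #pragma omp section
         { for (int i = 0; i < n; ++i) a[i] *= 2.0; }
 #pragma omp section
         { for (int i = 0; i < n; ++i) b[i] += 1.0; }
     }
 }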
Intel TBB relies on generic programming, so it performs better with custom iteration spaces or complex reduction operations. It also provides generic parallel patterns for parallel while-loops, data-flow pipeline models, parallel sorts and prefix operations, so it is better suited to cases that go beyond loop-based parallelism.
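For comparison, a small sketch of a reduction written with the lambda form of tbb::parallel_reduce (our example; the vector being summed is illustrative):

 #include <tbb/parallel_reduce.h>
 #include <tbb/blocked_range.h>
 #include <vector>
 // Sum a vector with Intel TBB: parallel_reduce recursively splits the
 // blocked_range, accumulates partial sums, and joins them with the final lambda.
 double tbb_sum(const std::vector<double>& v) {
     return tbb::parallel_reduce(
         tbb::blocked_range<std::size_t>(0, v.size()), 0.0,
         [&](const tbb::blocked_range<std::size_t>& r, double acc) {
             for (std::size_t i = r.begin(); i != r.end(); ++i) acc += v[i];
             return acc;
         },
         [](double a, double b) { return a + b; });
 }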
References:
1. An Optimal Abstraction Model for Hardware Multithreading in Modern Processor Architectures
2. Intel Threading Building Blocks 2.2 for Open Source Reference Manual. http://www.threadingbuildingblocks.org/uploads/81/91/Latest20Source%20Documentation/Reference.pdf
3. POSIX Threads Programming, by Blaise Barney, Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/pthreads/#Joining
Chapter 6:

Cache Structures of Multi-Core Architectures: Students added insight on this topic by discussing shared-memory multiprocessors, write policies and replacement policies. The Greedy Dual Size (GDS) and Priority Cache (PC) replacement policies were additional subtopics the students shed light on. Students also defined Intel's Trace Cache and Smart Cache techniques. The most important takeaway from this topic was the discussion of the write policies used in recent multi-core architectures. For example, the Intel IA-32/IA-64 architecture implements write combining, write collapsing, weakly ordered, uncacheable, write-no-allocate and non-temporal techniques in its caches. AMD uses cache exclusion, unlike Intel's cache inclusion. Sun's Niagara and SPARC use write-through L1 caches, with allocate-on-load and no-allocate-on-store.
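To make the write-policy terms above concrete, here is a toy sketch of how a store is handled under write-back/write-allocate versus write-through/no-allocate-on-store; this is our illustration of the general policies, not a model of any particular processor.

 #include <unordered_map>
 #include <cstdint>
 struct ToyCache {
     std::unordered_map<uint64_t, uint64_t> lines;    // cached blocks
     std::unordered_map<uint64_t, uint64_t>& memory;  // backing store
     bool write_back;                                 // true: write-back + write-allocate
     void store(uint64_t addr, uint64_t value) {
         if (write_back) {
             if (lines.find(addr) == lines.end())
                 lines[addr] = memory[addr];          // write-allocate: fetch the block on a miss
             lines[addr] = value;                     // memory is updated later, on eviction
         } else {
             auto hit = lines.find(addr);
             if (hit != lines.end()) hit->second = value;  // update the cached copy if present
             memory[addr] = value;                    // write-through: memory is always updated
             // no-allocate: a store miss does not bring the block into the cache
         }
     }
 };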
References:
1. http://download.intel.com/technology/architecture/sma.pdf
2. http://www.intel.com/Assets/PDF/manual/248966.pdf
3. http://www.intel.com/design/intarch/papers/cache6.pdf
Chapter 7:

Shared-memory multiprocessors run into several problems that are more pronounced than in their uniprocessor counterparts. The Solihin text used in this course goes into detail on three of these issues: cache coherence, memory consistency and synchronization. The goal of this wiki supplement was to discuss these three issues and what can be done to ensure that instructions are handled in a timely and efficient manner, and in a manner consistent with what the programmer might desire. Memory consistency was discussed by comparing ordering on a uniprocessor with ordering on a multiprocessor. The students concluded that on a multiprocessor much more care must be taken to ensure that all of the loads and stores are committed to memory in a valid order. Synchronization was discussed as it applies to OpenMP and fence insertion. Other methods, such as test-and-set and direct interrupts to another core, were also briefly discussed. The programmer (or compiler) is responsible for knowing which synchronization directives are available on a given architecture and implementing them in an efficient manner. The students also discussed the instructions commonly used for synchronization in popular processor architectures. For example, SPARC V8 uses a store barrier, Alpha uses a memory barrier and a write memory barrier, whereas Intel x86 uses lfence (load) and sfence (store).
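A minimal C++ sketch of why such barriers matter (our example; a compiler maps the fences below to instructions like those named above, depending on the architecture): without the two fences, the consumer could observe the flag as set while still reading a stale value of data on a weakly ordered machine.

 #include <atomic>
 #include <thread>
 #include <cassert>
 int data = 0;
 std::atomic<int> flag{0};
 void producer() {
     data = 42;                                            // ordinary store
     std::atomic_thread_fence(std::memory_order_release);  // "store barrier": earlier stores become visible first
     flag.store(1, std::memory_order_relaxed);
 }
 void consumer() {
     while (flag.load(std::memory_order_relaxed) == 0) { } // wait for the flag
     std::atomic_thread_fence(std::memory_order_acquire);  // "load barrier": later loads see the earlier stores
     assert(data == 42);                                    // guaranteed only because of the two fences
 }
 int main() {
     std::thread t1(producer), t2(consumer);
     t1.join(); t2.join();
 }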
References:
1. https://wiki.ittc.ku.edu/ittc/images/0/0f/Loghi.pdf
2. http://portal.acm.org/citation.cfm?id=782854&dl=GUIDE&coll=GUIDE&CFID=84866326&CFTOKEN=84791790
Chapter 8:

Students discussed existing bus-based cache coherence in real machines. They classified the cache-coherence protocols based on the year they were introduced and the processors that use them. The MSI protocol was first used in the SGI IRIS 4D series. In the Synapse protocol the M state is called D (Dirty), but it works the same way as MSI. MSI has a major drawback in that each read-write sequence incurs two bus transactions, irrespective of whether the cache line is stored in only one cache or not. The Pentium Pro microprocessor, introduced in 1995, was the first Intel-architecture microprocessor to support SMP and MESI. The MESIF protocol, used in the latest Intel multi-core processors, was introduced to accommodate the point-to-point links used in the QuickPath Interconnect. MESI has the drawback of using much time and bandwidth; MOESI was AMD's answer to this problem, and it has become one of the most popular snoop-based protocols, supported in the AMD64 architecture. The AMD dual-core Opteron can maintain cache coherence in systems of up to 8 processors using this protocol. The Dragon protocol is an update-based coherence protocol which does not invalidate other cached copies. It was developed by the Xerox Palo Alto Research Center (Xerox PARC), a subsidiary of Xerox Corporation, and was used in the Xerox PARC Dragon multiprocessor workstation.
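A compact sketch of the MSI states (ours; event names follow common textbook usage rather than any particular machine), showing why a read followed by a write to the same line costs two bus transactions even when no other cache holds it, the drawback noted above that MESI's Exclusive state removes:

 enum class MsiState { Invalid, Shared, Modified };
 struct Bus { int transactions = 0; };
 // Processor-side events for one cache line under MSI.
 MsiState on_processor_read(MsiState s, Bus& bus) {
     if (s == MsiState::Invalid) {        // read miss: issue BusRd
         bus.transactions++;
         return MsiState::Shared;
     }
     return s;                            // S or M: hit, no bus traffic
 }
 MsiState on_processor_write(MsiState s, Bus& bus) {
     if (s != MsiState::Modified) {       // I or S: must gain ownership (BusRdX/upgrade)
         bus.transactions++;
         return MsiState::Modified;
     }
     return s;                            // already M: hit
 }
 // A read-then-write sequence starting from Invalid always costs two bus
 // transactions, even if this is the only cache holding the line.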
References:
1. Cache consistency with MESI on Intel processor. http://www.zak.ict.pwr.wroc.pl/nikodem/ak_materialy/Cache20&%20MESI.pdf
2. AMD dual core Architecture. http://techreport.com/articles.x/8236/2
3. Silicon Graphics Computer Systems. http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/stamp/stamp.jsp?tp=&arnumber=4913
4. Synapse tightly coupled multiprocessors: a new approach to solve old problems. http://portal.acm.org/citation.cfm?id=1499317&dl=GUIDE&coll=GUIDE&CFID=83027384&CFTOKEN=95680533
5. Dragon Protocol. http://en.wikipedia.org/wiki/Dragon_protocol
Chapter 9:

Synchronization: Students classified synchronization techniques based on their implementation. Hardware synchronization uses locks, barriers and mutual exclusion. Software synchronization examples include ticket locks and queue-based MCS locks. Mutex implementation relies on the execution of atomic instructions.

Some common examples include test-and-set, fetch-and-increment, exchange, and compare-and-swap. Another type of lock that is not discussed in the text, the "hand-off" lock, was discussed in detail by the students. They also discussed reasons why a programmer should attempt to write programs in a way that avoids locks. APIs exist for parallel architectures that provide specific types of synchronization; if these APIs are used the way they were designed, performance can be maximized while minimizing overhead. Load Locked (LL) and Store Conditional (SC) are a pair of improved hardware primitives used for lock-free read-modify-write operations.
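As a small illustration of the test-and-set primitive listed above (our sketch, not taken from the student pages), a spinlock can be built directly on std::atomic_flag, whose test_and_set member performs an atomic test-and-set:

 #include <atomic>
 // Minimal test-and-set spinlock. Every acquire attempt is an atomic
 // read-modify-write, which is why naive test-and-set locks generate heavy
 // coherence traffic under contention (one reason ticket/MCS locks exist).
 class TasLock {
     std::atomic_flag locked = ATOMIC_FLAG_INIT;
 public:
     void lock()   { while (locked.test_and_set(std::memory_order_acquire)) { /* spin */ } }
     void unlock() { locked.clear(std::memory_order_release); }
 };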
A detailed description of the combining tree barrier, tournament barrier and dissemination barrier was included. One of the interesting topics discussed in this wiki supplement was the performance evaluation of different barrier implementations. The students showed that the centralized blocking barrier does not scale with the number of threads, and that contention increases as the number of threads increases.
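To make the contention point concrete, a minimal sense-reversing centralized barrier is sketched below (our assumption about the kind of centralized blocking barrier meant here): all N threads decrement and then spin on the same shared variables, so traffic on that one cache line grows with the thread count.

 #include <atomic>
 // Centralized sense-reversing barrier for a fixed number of threads.
 class CentralBarrier {
     std::atomic<int> count;
     std::atomic<bool> sense{false};
     const int nthreads;
 public:
     explicit CentralBarrier(int n) : count(n), nthreads(n) {}
     void wait() {
         bool my_sense = !sense.load(std::memory_order_relaxed);
         if (count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
             count.store(nthreads, std::memory_order_relaxed);  // last arriver resets the counter...
             sense.store(my_sense, std::memory_order_release);  // ...and releases everyone
         } else {
             while (sense.load(std::memory_order_acquire) != my_sense) { /* spin: all threads poll one line */ }
         }
     }
 };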
References:
1. http://www2.cs.uh.edu/~hpctools/pub/iwomp-barrier.pdf
2. http://www.statemaster.com/encyclopedia/Deadlock
3. http://www.ukhec.ac.uk/publications/reports/synch_java.pdf
normal'><span style='font-family:"Times New Roman";color:black'>Chapter 10: <o:p></o:p></span></b></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'>Students discussed the existing bus-based cache coherence
in real machines. They went ahead and classified the cache coherence protocols
based on the year they were introduced and they processors which uses them. MSI
protocol was first used in SGI IRIS 4D series. In Synapse protocol M state is
called D (Dirty) but works the same as MSI protocol works. MSI has a major
drawback in that each read-write sequence incurs 2 bus transactions
irrespective of whether the cache line is stored in only one cache or not. The
Pentium Pro microprocessor, introduced in 1992 was the first Intel architecture
microprocessor to support SMP and MESI. The MESIF protocol, used in the latest
Intel multi-core processors was introduced to accommodate the point-to-point
links used in the QuickPath Interconnect.<span style='mso-spacerun:yes'></span>MESI came with the drawback of using much time and bandwidth. MOESI was
the AMD’s answer to this problem . MOESI' has become one of the most popular
snoop-based protocols supported in the AMD64 architecture. The AMD dual-core
Opteron can maintain cache coherence in systems up to 8 processors using this
protocol. The Dragon Protocol is an update based coherence protocol which does
not invalidate other cached copies. The Dragon Protocol , was developed by
Xerox Palo Alto Research Center(Xerox PARC), a subsidiary of Xerox Corporation.
This protocol was used in the Xerox PARC Dragon multiprocessor workstation.
References: <o:p></o:p></span></span></p><p class=ListParagraphCxSpFirst style='text-align:justify;text-indent:-.25in;
1. Shared Memory Consistency Models. http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
2. Designing Memory Consistency Models For Shared-Memory Multiprocessors. http://portal.acm.org/citation.cfm?id=193889&dl=GUIDE&coll=GUIDE&CFID=84028355&CFTOKEN=32262273
3. Consistency Models. http://cs.gmu.edu/cne/modules/dsm/green/memcohe.html
Chapter 11:

The cache-coherence protocol presented in Chapter 11 of Solihin 2008 is simpler than most real directory-based protocols. This textbook supplement presents the directory-based protocols used by the DASH multiprocessor and the Alewife multiprocessor, and concludes with an argument for why complexity might be undesirable in cache-coherence protocols. The DASH multiprocessor uses a two-level coherence protocol, relying on a snoopy bus to ensure cache coherence within a cluster and a directory-based protocol to ensure coherence across clusters. The protocol uses a Remote Access Cache (RAC) at each cluster, which essentially consolidates memory blocks from remote clusters into a single cache on the local snoopy bus. When a request is issued for a block from a remote cluster that is not in the RAC, the request is denied, but it is also forwarded to the owner, which supplies the block to the RAC. Eventually, when the requestor retries, the block will be waiting in the RAC. Read and read-exclusive operations on a DASH processor were discussed in detail. The students also discussed two race conditions that arise mainly on the DASH processor. The first occurs when a Read from requester R is forwarded from home H to owner O, but O sends a Writeback to H before the forwarded Read arrives. Another possible race occurs when the home node H replies with data (ReplyD) to a Read from requester R but an invalidation (Inv) arrives first. LimitLESS is the cache-coherence protocol used by the Alewife multiprocessor. Unlike the DASH multiprocessor, the Alewife multiprocessor is not organized into clusters of nodes with local buses, and therefore cache coherence throughout the system is maintained through the directory.
style='color:black'>References: <o:p></o:p></span></span></p><p class=ListParagraphCxSpFirst style='text-align:justify;text-indent:-.25in;
mso-list:l0 level1 lfo15'><![if !supportLists]><span class=apple-style-span><i><span
style='color:black'><span style='mso-list:Ignore'>1.<span style='font:7.0pt "Times New Roman"'>     
</span></span></span></i></span><![endif]><span class=apple-style-span><span
style='color:black'>Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop
Gupta, and John Hennessy (1990).</span></span><span
class=apple-converted-space><span style='color:black'> </span></span><a
href="http://doi.acm.org/10.1145/325164.325132"
title="http://doi.acm.org/10.1145/325164.325132"><span style='font-family:"Times New Roman";
color:#3366BB;text-decoration:none;text-underline:none'>"The
directory-based cache coherence protocol for the DASH multiprocessor."</span></a><span
class=apple-converted-space><span style='color:black'> </span></span><span
class=apple-style-span><span style='color:black'>In</span></span><span
class=apple-converted-space><span style='color:black'> </span></span><span
class=apple-style-span><i><span style='color:black'>Proceedings of the 17th
Annual International Symposium on Computer Architecture.<o:p></o:p></span></i></span></p><p class=ListParagraphCxSpLast style='text-align:justify;text-indent:-.25in;
mso-list:l0 level1 lfo15'><![if !supportLists]><span class=apple-style-span><span
style='color:black'><span style='mso-list:Ignore'>2.<span style='font:7.0pt "Times New Roman"'>     
</span></span></span></span><![endif]><span class=apple-style-span><span
style='color:black'>David Chaiken, John Kubiatowicz, and Anant Agarwal (1991).</span></span><span
class=apple-converted-space><span style='color:black'> </span></span><a
href="http://groups.csail.mit.edu/cag/papers/pdf/asplos4.pdf"
title="http://groups.csail.mit.edu/cag/papers/pdf/asplos4.pdf"><span
style='font-family:"Times New Roman";color:#3366BB;text-decoration:none;
text-underline:none'>"LimitLESS directories: A scalable cache coherence
scheme."</span></a><span class=apple-converted-space><span
style='color:black'> </span></span><span class=apple-style-span><i><span
style='color:black'>ACM SIGPLAN Notices</span></i><span style='color:black'>.<o:p></o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><b
style='mso-bidi-font-weight:normal'><span style='color:black'>Chapter 12:<o:p></o:p></span></b></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'>Interconnection Networks: Advances in multiprocessors,
parallel computing & networking and parallel computer architectures demand
very high performance from interconnection networks. Due to this,
interconnection network structure has changed over time, trying to meet higher
bandwidths and performance. Students discussed criterion to be considered for
choosing the best Network. It included Performance Requirements, Scalability,
Incremental expandability, Partitionability, Simplicity, Distance Span,
Physical Constraints, Reliability and Reparability, Expected Workloads and Cost
Constraints. They provided in depth discussion on Classification of
Interconnection networks.<span style='mso-spacerun:yes'></span>Shared-Medium
Networks include Token Ring, Token Bus,<span style='mso-spacerun:yes'></span>Backplane Bus. Direct Networks include Mesh, Torus, Hypercube,<span
style='mso-spacerun:yes'></span>Tree, Cube-Connected Cycles and<span
style='mso-spacerun:yes'></span>de Bruijn and Star Graph Networks.<span
style='mso-spacerun:yes'></span>Indirect Networks include Regular Topologies
like Crossbar Network and Multistage Interconnection Network and Hybrid
Networks such as<span style='mso-spacerun:yes'></span>Multiple Backplane
Buses, Hierarchical Networks,<span style='mso-spacerun:yes'></span>Cluster-Based Networks and Hypergraph Topologies. They also discussed
routing algorithms and deadlock, starvation and livelock associated with it.
These topics were covered in in an extremely detailed way. The students
included a diagrammatic<span style='mso-spacerun:yes'></span>representation
for every topology. <o:p></o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'>References: <o:p></o:p></span></span></p><p class=ListParagraphCxSpFirst style='text-align:justify;text-indent:-.25in;
mso-list:l3 level1 lfo16'><![if !supportLists]><span class=apple-style-span><span
style='color:black'><span style='mso-list:Ignore'>1.<span style='font:7.0pt "Times New Roman"'>     
</span></span></span></span><![endif]><a
href="http://www.top500.org/2007_overview_recent_supercomputers/sci"
title="http://www.top500.org/2007_overview_recent_supercomputers/sci"><span
style='font-family:"Times New Roman";color:#3366BB'>http://www.top500.org/2007_overview_recent_supercomputers/sci</span></a><span
class=apple-style-span><span style='color:black'><o:p></o:p></span></span></p><p class=ListParagraphCxSpLast style='text-align:justify;text-indent:-.25in;
mso-list:l3 level1 lfo16'><![if !supportLists]><span style='font-family:"Times New Roman"'><span
style='mso-list:Ignore'>2.<span style='font:7.0pt "Times New Roman"'>     
</span></span></span><![endif]><a
href="http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/topology.html"><span
style='font-family:"Times New Roman"'>http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/topology.html</span></a><span
style='font-family:"Times New Roman"'><o:p></o:p></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'><o:p> </o:p></span></span></p><p class=MsoNormal style='text-align:justify'><b style='mso-bidi-font-weight:
normal'><span style='font-family:"Times New Roman";color:black'><o:p> </o:p></span></b></p><p class=MsoNormal style='text-align:justify'><b style='mso-bidi-font-weight:
normal'><span style='font-family:"Times New Roman";color:black'>Conclusion:<o:p></o:p></span></b></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'>This independent study helped me to increase my knowledge
to a great extent in the field of Architecture of parallel computers. There
were 4 students working on every chapter and came up with 2 wiki pages per
group. We collected a total of 18 wiki supplements. The data collected was
enormous. While reviewing their content I kept updating my knowledge base. I
also provided the resources from where they can collect data. This helped me to
come across latest developments in the field. Interacting with students helped
me to increase my communication skills. Constant discussions with Prof.
Gehringer helped me to understand key concepts. This idea of writing wiki
supplements got selected for KU Village presentation. I got an opportunity to
present this paper along with Prof. Gehringer. <o:p></o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'><o:p> </o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'><o:p> </o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'><o:p> </o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=apple-style-span><span
style='color:black'><o:p> </o:p></span></span></p><p class=MsoNormal style='text-align:justify'><span class=QuoteChar><span
style='font-style:normal;mso-bidi-font-style:italic'><o:p> </o:p></span></span></p></div>
<!-- END: body -->
<a href=http://www.milonic.com/><font color="#FFFFFF">JavaScript Menu Courtesy of Milonic.com</font></a>
</td>
    <td valign="top" width="10"><img src="/images/1x1.gif" width="10" height="1" alt=""></td>
    <td valign="top">
    <!-- INIT: left_body -->
   
<table border="0" cellpadding="0" cellspacing="0" width="148">
      <tr>
        <td bgcolor="#3399FF" rowspan="8" width="2"><img border="0" src="/images/1x1.gif" width="1" height="1" alt=""></td>
        <td bgcolor="#3399FF" height="2"><img border="0" src="/images/1x1.gif" width="1" height="1" alt=""></td>
      </tr>
      <tr><td height="1"><img border="0" src="/images/1x1.gif" width="1" height="1" alt=""></td></tr>
      <tr>
        <td bgcolor="#3399FF" height="2"><img border="0" src="/images/1x1.gif" width="1" height="1" alt=""></td>


      </tr>
<center>Karishma Navalakha</center>
'''Abstract: '''
There has been tremendous research and development in the field of multi-core architecture in the last decade. In such a dynamic environment it is very difficult for textbooks to cover the latest developments in the field. Wiki-written textbooks are an extremely handy tool for getting students acquainted with, and interested in, ongoing research. In this independent study we explored an academic learning technique in which students learn the fundamental concepts of the subject from the course textbook and the lectures delivered by Prof. Gehringer in class. They can then build on this foundation, gather the latest information from varied online resources and technical papers, and summarize their findings in the form of wiki pages. Software is also being developed to assist the students, and it was adopted in this course. We tried to enhance the quality of student-submitted wiki pages through peer reviewing. Professor Gehringer and I constantly provided input to students to improve both the quality of their wiki pages and the quality of their reviewing. The software being developed under the guidance of Professor Gehringer has been vital in overcoming the administrative hurdles involved in assigning topics to students, maintaining updates and tracking the progress of their writing, getting feedback through peer reviewing, and handling re-submitted work. All of this has been managed via the software in an organized fashion.
'''Experience with the wiki-written textbook:'''
The software was first deployed in CSC/ECE 506, Architecture of Parallel Computers. This is a beginning masters-level course that is taken by all Computer Engineering masters students. It is optional for Computer Science students, but as it is one way to fulfill a core requirement, it is popular with them too. The recently adopted textbook for this course is the locally written Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems [Solihin 2009]. It did not make sense to have the students rewrite this excellent text, but the book concentrates on theory and design fundamentals, without detailed application to current parallel machines. We felt that students would benefit from learning how the principles were applied in current architectures. Furthermore, they would learn about the newest machines in this fast-changing field.
After every chapter covered in class, two individuals or pairs of students were required to sign up to write the wiki supplement for that particular chapter. (That is, we solicited two supplements for each chapter, each of which could be authored by one or two students.) They were asked to add specific types of information that were not included in the chapter.
Initially, students were not clear about the purpose of their wiki pages. The first pages they wrote had substantial duplication of topics covered in the textbook, because students were attempting to give complete coverage of the issues discussed in the chapter. We wanted them to concentrate instead on recent developments. Upon seeing this, we established the practice of having the first two authors of this paper (Gehringer and Navalakha) review the student work, along with three peer reviews from fellow students. A lot of review time was spent providing guidance on how to revise.
At the beginning we gave the students complete freedom to explore resources for the topic they had chosen to write on. This was not very successful, as the students seemingly chose to read the first few search hits, which tended to provide an overview of the topic rather than in-depth information on particular implementations. Sometimes students were not aware that the information they found was already covered in the next chapter, which they had not yet read. The first review we gave students was mainly just making them aware of topics covered in later chapters, so a lot of the effort in writing the initial draft was wasted. After the first two sets of topics, we began to provide students with links to material that we wanted them to pay attention to. Gehringer and Navalakha met weekly to discuss what to provide to students. We regularly consulted other textbooks, technology news, and the Web sites of major processor manufacturers, such as Intel and AMD. As the semester progressed, the quality of the initial submissions improved, and the students realized better returns for their effort.
The quality of work seemed to improve as the semester progressed. A comparison of the grades for the wiki pages revealed that the average score for the first chapter written by each student was 82.8%, while the average for the second submission was 82.7%. The quality of the wiki pages had improved, but at the same time the peer reviewers became more demanding. Students were given more input for improving their work via peer reviewing. Thus the improvement showed in the final wiki pages produced rather than in the grades the students received. The initial wiki pages presented haphazardly collected data and were cluttered with diagrams and graphs; this information merely restated facts given in the textbook. The later wiki pages focused on a comparative study of present-day supercomputers produced by Intel, AMD and IBM.
For example, while writing the wiki for cache-coherence protocols, the students examined which protocol was favored by which company and why. They also discussed protocols introduced in the last two years, e.g., Intel's MESIF protocol. Such in-depth analysis made the wiki more appealing to readers. Gehringer and Navalakha provided additional reviews, which helped in constantly improving the quality of the wiki pages. These reviews gave the students insight into what was expected of them. This led to an increasing focus on current developments during peer reviewing, and it was observed that later rounds of reviews included guidance similar to that received from Gehringer and Navalakha. The organization of the wiki pages and the volume of relevant data collected by students improved as the semester progressed.
Electronic peer-review systems have been widely used to review student work, but never before, to our knowledge, have they been applied to assignments consisting of multiple interrelated parts with precedence constraints. The growing interest in large collaborative projects, such as wiki textbooks, has led to a need for electronic support for the process, lest the administrative burden on instructor and TA grow too large.


'''Chapter-wise learning from this independent study:'''
'''Chapter 1: '''
This chapter covered the interesting topic of supercomputer evolution. The wiki pages written for this topic included a lot of data from the literature. Students came up with interesting topics which were not covered in the textbook, such as [http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Spring_2010/ch1_lm#Timeline_of_supercomputers Timeline of supercomputers], [http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1#First_Supercomputer_.28_ENIAC_.29 First Supercomputer (ENIAC)], [http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1#Cray_History Cray History], [http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1#Supercomputer_Hierarchal_Architecture Supercomputer Hierarchical Architecture], [http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1#SuperComputer_Operating_System Supercomputer Operating System], [http://pg-server.csc.ncsu.edu/mediawiki/index.php/1.1#Cooling_Supercomputer Cooling Supercomputer] and [http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Spring_2010/ch1_lm#Processor_Family Processor Family]. From their research we could see the increasing dominance of Intel’s processors in the consumer market. We also concluded that Unix has been the platform for most of these supercomputers. Massively Parallel Processing (MPP) and Symmetric Multiprocessing (SMP) were the earliest widely used styles of multiprocessor architecture; they were displaced by constellation computing in the early 2000s, and the field is currently dominated by cluster computing.
References:
[http://www.top500.org/ http://www.top500.org/]
[http://books.google.com/books?id=wx4kNh8ArH8C&pg=PA3&lpg=PA3&dq=evolution+of+supercomputers&source=bl&ots=7DVWaEYsZ4&sig=WKRWRuqtM-UfPoB-Wdka5ZWTgng&hl=en&ei=xAleS-TmDpqutgfcj_2jAg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAoQ6AEwADgK#v=onepage&q=evolution%20of%20supercomputers&f=false The future of supercomputing: an interim report By National Research Council (U.S.). Committee on the Future of Supercomputing]
'''Chapter 2: '''
Data Parallel Programming: The students provided comparisons between data parallelism and task parallelism. [http://pg-server.csc.ncsu.edu/mediawiki/index.php/CSC/ECE_506_Spring_2010/ch_2_maf#References Haveraaen (2000)] notes that data-parallel codes typically bear a strong resemblance to sequential codes, making them easier to read and write. Students noted that the data-parallel model may be used with either the shared-memory or the message-passing model without conflict. In their comparisons they concluded that combining the data-parallel and message-passing models reduces the amount and complexity of communication required relative to a task-parallel approach. Similarly, combining the data-parallel and shared-memory models tends to simplify and reduce the amount of synchronization required. SIMD (single-instruction-multiple-data) processors are specifically designed to run data-parallel algorithms. Modern examples include NVIDIA's CUDA-capable GPUs and the Cell processor developed by STI (Sony, Toshiba, and IBM).
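 
To make the distinction concrete, a minimal C++ sketch is given below (not taken from the students' pages; the array size and thread count are illustrative): the data-parallel part applies the same operation to different halves of one array, while the task-parallel part runs two different operations concurrently.
 
<pre>
// Data parallelism vs. task parallelism, sketched with std::thread.
#include <thread>
#include <vector>
#include <numeric>
#include <cstdio>

int main() {
    std::vector<double> v(1u << 20, 1.0);

    // Data parallel: two threads apply the SAME operation to DIFFERENT data.
    auto scale = [&v](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) v[i] *= 2.0;
    };
    std::thread d1(scale, std::size_t(0), v.size() / 2);
    std::thread d2(scale, v.size() / 2, v.size());
    d1.join(); d2.join();

    // Task parallel: two threads perform DIFFERENT operations concurrently.
    double sum = 0.0, maxv = 0.0;
    std::thread t1([&] { sum = std::accumulate(v.begin(), v.end(), 0.0); });
    std::thread t2([&] { for (double x : v) if (x > maxv) maxv = x; });
    t1.join(); t2.join();

    std::printf("sum = %.0f, max = %.0f\n", sum, maxv);
    return 0;
}
</pre>
 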
References:
1. W. Daniel Hillis and Guy L. Steele, Jr., [http://portal.acm.org/citation.cfm?id=7903 "Data parallel algorithms,"] Communications of the ACM, 29(12):1170-1183, December 1986.
2. Alexander C. Klaiber and Henry M. Levy, [http://portal.acm.org/citation.cfm?id=192020 "A comparison of message passing and shared memory architectures for data parallel programs,"] in Proceedings of the 21st Annual International Symposium on Computer Architecture, April 1994, pp. 94-105.
'''Chapter 3: '''
In this wiki supplement, the three kinds of parallelism, i.e., DOALL, DOACROSS and DOPIPE, were discussed with examples in the form of OpenMP code, as in the textbook. In addition, the students provided more depth on this topic by discussing parallel_for, parallel_reduce, parallel_scan, pipeline, reduction, DOALL, DOACROSS and DOPIPE with respect to Intel Threading Building Blocks. They also compared DOPIPE, DOACROSS and DOALL in POSIX Threads. Finally, they concluded that Pthreads works for all three kinds of parallelism and can express functional parallelism easily, but it requires building specialized synchronization primitives and explicitly privatizing variables, so more effort is needed to convert a serial program into a parallel one.
 
OpenMP provides many performance-enhancing features, such as the atomic, barrier and flush synchronization primitives. It is very simple to use OpenMP to exploit DOALL parallelism, but the syntax for expressing functional parallelism is awkward.
 
Intel TBB relies on generic programming; it performs better with custom iteration spaces or complex reduction operations. It also provides generic parallel patterns for parallel while-loops, data-flow pipeline models, parallel sorts and prefix operations, so it is better suited to cases that go beyond loop-based parallelism.
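 
As a rough illustration of the effort difference the students describe (a sketch only, not their code; std::thread stands in for raw Pthreads and the loop body is arbitrary), the same DOALL loop is shown once with a single OpenMP pragma and once with explicit thread creation and joining:
 
<pre>
// One DOALL loop, expressed two ways.
#include <thread>
#include <vector>
#include <cstdio>

static void saxpy_chunk(std::vector<double>& y, const std::vector<double>& x,
                        double a, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) y[i] = a * x[i] + y[i];
}

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> x(n, 0.5), y(n, 1.0);

    // OpenMP: the runtime partitions the iterations and joins the threads.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i)
        y[i] = 2.0 * x[i] + y[i];

    // Explicit threading: the programmer partitions the index space,
    // launches the workers, and joins them by hand.
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 4;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t lo = n * w / workers, hi = n * (w + 1) / workers;
        pool.emplace_back(saxpy_chunk, std::ref(y), std::cref(x), 2.0, lo, hi);
    }
    for (auto& th : pool) th.join();

    std::printf("y[0] = %f\n", y[0]);
    return 0;
}
</pre>
 
Compiled with -fopenmp the first loop runs across threads; without it, the pragma is simply ignored and that loop runs serially, which is part of what makes the OpenMP route low-effort.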
 
References:
 
1. [https://docs.google.com/viewer?a=v&pid=gmail&attid=0.1&thid=126f8a391c11262c&mt=application%2Fpdf&url=https%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3D2%26ik%3Dd38b56c94f%26view%3Datt%26th%3D126f8a391c11262c%26attid%3D0.1%26disp%3Dattd%26realattid%3Df_g602ojwk0%26zw&sig=AHIEtbTeQDhK98IswmnVSfrPBMfmPLH5Nw An Optimal Abstraction Model for Hardware Multithreading in Modern Processor Architectures]
 
2. [http://www.threadingbuildingblocks.org/uploads/81/91/Latest%20Open%20Source%20Documentation/Reference.pdf Intel Threading Building Blocks 2.2 for Open Source Reference Manual]
 
3. [https://computing.llnl.gov/tutorials/pthreads/#Joining POSIX Threads Programming by Blaise Barney, Lawrence Livermore National Laboratory]
 
 
'''Chapter 6:'''
 
Cache Structures of Multi-Core Architectures: Students added additional insight on this topic by discussing shared-memory multiprocessors, write policies and replacement policies. The Greedy Dual Size (GDS) and Priority Cache (PC) replacement policies were additional subtopics the students shed light on. Students also gave definitions of Intel's Trace Cache and Smart Cache techniques. The most important takeaway from this topic was the students' discussion of the write policies used in recent multi-core architectures. For example, Intel's IA-32 and IA-64 architectures implement write combining, write collapsing, weakly ordered, uncacheable & write-no-allocate, and non-temporal techniques in their caches. AMD uses cache exclusion, unlike Intel's cache inclusion. Sun's Niagara and SPARC processors use write-through (WT) L1 caches, with allocate-on-load and no-allocate-on-store.
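 
As a concrete (and deliberately simplified) illustration of one of the replacement policies named above, the sketch below implements the core of Greedy Dual Size in C++; the key type, capacity and costs are illustrative assumptions, not drawn from the students' pages.
 
<pre>
// Minimal Greedy-Dual-Size (GDS) replacement sketch. Each block keeps a
// priority H = L + cost/size; the block with the smallest H is evicted and
// the global inflation value L is raised to that H, so recently touched or
// expensive-to-fetch blocks survive longer.
#include <map>
#include <string>
#include <cstdio>

struct GdsCache {
    struct Entry { double h; std::size_t size; };
    std::map<std::string, Entry> blocks;
    std::size_t capacity = 4, used = 0;   // capacity counted in size units
    double L = 0.0;                       // inflation value

    void access(const std::string& key, std::size_t size, double cost) {
        auto it = blocks.find(key);
        if (it != blocks.end()) {                    // hit: refresh priority
            it->second.h = L + cost / size;
            return;
        }
        while (used + size > capacity && !blocks.empty()) {  // evict min-H
            auto victim = blocks.begin();
            for (auto j = blocks.begin(); j != blocks.end(); ++j)
                if (j->second.h < victim->second.h) victim = j;
            L = victim->second.h;                    // remember evicted priority
            used -= victim->second.size;
            blocks.erase(victim);
        }
        blocks[key] = {L + cost / size, size};       // insert the new block
        used += size;
    }
};

int main() {
    GdsCache c;
    for (const char* k : {"a", "b", "c", "d", "e", "a"})
        c.access(k, 1, 1.0);
    std::printf("resident blocks: %zu\n", c.blocks.size());
    return 0;
}
</pre>
 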
 
References:
 
1. [http://download.intel.com/technology/architecture/sma.pdf http://download.intel.com/technology/architecture/sma.pdf]
 
2. [http://www.intel.com/Assets/PDF/manual/248966.pdf http://www.intel.com/Assets/PDF/manual/248966.pdf]
 
3. [http://www.intel.com/design/intarch/papers/cache6.pdf http://www.intel.com/design/intarch/papers/cache6.pdf]
 
'''Chapter 7: '''
 
Shared-memory multiprocessors run into several problems that are more pronounced than in their uniprocessor counterparts. The Solihin text used in this course goes into detail on three of these issues: cache coherence, memory consistency and synchronization. The goal of this wiki supplement was to discuss these three issues and also what can be done to ensure that instructions are handled in a timely and efficient manner that is consistent with what the programmer desires. Memory consistency was discussed by comparing ordering on a uniprocessor with ordering on a multiprocessor. The students concluded that in a multiprocessor much more care must be taken to ensure that all of the loads and stores are committed to memory in a valid order. Synchronization was discussed as it applies to OpenMP and fence insertion; other methods, such as test-and-set and direct interrupts to another core, were also briefly discussed. The programmer (or compiler) is responsible for knowing which synchronization directives are available on a given architecture and implementing them in an efficient manner. The students also discussed commonly used synchronization instructions in popular processor architectures. For example, SPARC V8 uses a store barrier, Alpha uses memory-barrier and write-memory-barrier instructions, whereas Intel x86 uses lfence (load) and sfence (store).
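 
To show where such fences matter, here is a minimal C++ sketch (a toy under stated assumptions, not the students' example) of the classic publish/consume pattern; the release and acquire fences are what a compiler lowers to barrier or fence instructions such as those named above on a weakly ordered machine:
 
<pre>
// Flag/data publication: without the release/acquire ordering, the reader
// could observe flag == true while still seeing a stale value of data.
#include <atomic>
#include <thread>
#include <cstdio>

int data = 0;
std::atomic<bool> flag{false};

void producer() {
    data = 42;                                        // ordinary store
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);      // publish
}

void consumer() {
    while (!flag.load(std::memory_order_relaxed)) { } // spin until published
    std::atomic_thread_fence(std::memory_order_acquire);
    std::printf("data = %d\n", data);                 // guaranteed to print 42
}

int main() {
    std::thread p(producer), c(consumer);
    p.join(); c.join();
    return 0;
}
</pre>
 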
 
References:
 
1. [https://wiki.ittc.ku.edu/ittc/images/0/0f/Loghi.pdf https://wiki.ittc.ku.edu/ittc/images/0/0f/Loghi.pdf]
 
2. [http://portal.acm.org/citation.cfm?id=782854&dl=GUIDE&coll=GUIDE&CFID=84866326&CFTOKEN=84791790 http://portal.acm.org/citation.cfm?id=782854&dl=GUIDE&coll=GUIDE&CFID=84866326&CFTOKEN=84791790]
 
 
'''Chapter 8:'''
 
Students discussed the existing bus-based cache coherence protocols in real machines. They went on to classify the cache coherence protocols based on the year they were introduced and the processors which use them. The MSI protocol was first used in the SGI IRIS 4D series. In the Synapse protocol the M state is called D (Dirty), but it works the same way as in MSI. MSI has a major drawback in that each read-write sequence incurs two bus transactions, irrespective of whether the cache line is stored in only one cache or not. The Pentium Pro microprocessor, introduced in 1995, was the first Intel architecture microprocessor to support SMP and MESI. The MESIF protocol, used in the latest Intel multi-core processors, was introduced to accommodate the point-to-point links used in the QuickPath Interconnect. MESI came with the drawback of using much time and bandwidth, and MOESI was AMD's answer to this problem. MOESI has become one of the most popular snoop-based protocols supported in the AMD64 architecture; the AMD dual-core Opteron can maintain cache coherence in systems of up to 8 processors using this protocol. The Dragon protocol is an update-based coherence protocol which does not invalidate other cached copies. It was developed by the Xerox Palo Alto Research Center (Xerox PARC), a subsidiary of Xerox Corporation, and was used in the Xerox PARC Dragon multiprocessor workstation.
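 
For readers who want the mechanics rather than the history, the sketch below encodes a simplified single-line MESI state machine (processor reads/writes plus snooped bus requests; no data movement is modeled, and the shared-signal handling is an assumption of this toy, not taken from the students' pages):
 
<pre>
// Simplified MESI transitions for one cache line. The Exclusive state is
// what lets a read-then-write by one core avoid the second bus transaction
// that the MSI drawback above refers to.
#include <cstdio>

enum class State { M, E, S, I };
enum class Event { PrRd, PrWr, BusRd, BusRdX };

// 'othersHaveCopy' stands in for the shared signal sampled on a read miss.
State next(State s, Event e, bool othersHaveCopy) {
    switch (s) {
    case State::I:
        if (e == Event::PrRd) return othersHaveCopy ? State::S : State::E;
        if (e == Event::PrWr) return State::M;            // BusRdX issued
        return State::I;
    case State::S:
        if (e == Event::PrWr)  return State::M;           // upgrade issued
        if (e == Event::BusRdX) return State::I;
        return State::S;
    case State::E:
        if (e == Event::PrWr)  return State::M;           // silent, no bus traffic
        if (e == Event::BusRd)  return State::S;
        if (e == Event::BusRdX) return State::I;
        return State::E;
    case State::M:
        if (e == Event::BusRd)  return State::S;          // supply block, downgrade
        if (e == Event::BusRdX) return State::I;          // supply block, invalidate
        return State::M;
    }
    return s;
}

int main() {
    State s = State::I;
    s = next(s, Event::PrRd, /*othersHaveCopy=*/false);   // I -> E
    s = next(s, Event::PrWr, false);                       // E -> M, no bus txn
    std::printf("final state = %d\n", static_cast<int>(s));
    return 0;
}
</pre>
 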
 
References:
 
1. [http://www.zak.ict.pwr.wroc.pl/nikodem/ak_materialy/Cache%20consistency%20&%20MESI.pdf Cache consistency with MESI on Intel processor]
 
2. [http://techreport.com/articles.x/8236/2 AMD dual core Architecture]
 
3. [http://ieeexplore.ieee.org.www.lib.ncsu.edu:2048/stamp/stamp.jsp?tp=&arnumber=4913 Silicon Graphics Computer Systems]
 
4. [http://portal.acm.org/citation.cfm?id=1499317&dl=GUIDE&coll=GUIDE&CFID=83027384&CFTOKEN=95680533 Synapse tightly coupled multiprocessors: a new approach to solve old problems]
 
5. [http://en.wikipedia.org/wiki/Dragon_protocol Dragon Protocol]
 
 
'''Chapter 9:'''
 
Synchronization: Students classified synchronization techniques based on their implementation. Hardware-supported synchronization uses locks, barriers and mutual exclusion. Software synchronization examples include ticket locks and queue-based MCS locks. Mutex implementations rely on the execution of atomic instructions.
 
Some common examples include test-and-set, fetch-and-increment, exchange, and compare-and-swap. Another type of lock not discussed in the text, known as the "hand-off" lock, was discussed in detail by the students. They also discussed reasons why a programmer should attempt to write programs in such a way as to avoid locks. APIs exist for parallel architectures that provide specific types of synchronization; if they are used the way they were designed, performance can be maximized while minimizing overhead. Load-Locked (LL) and Store-Conditional (SC) are a pair of improved hardware primitives used for lock-free read-modify-write operations.
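 
Two of the primitives named above are easy to show in portable C++ using std::atomic, which compiles down to the machine's atomic exchange and fetch-and-increment instructions; this is a hedged sketch, not code from the students' pages:
 
<pre>
// A test-and-set spinlock and a FIFO ticket lock built on std::atomic.
#include <atomic>
#include <thread>
#include <cstdio>

struct TasLock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    void lock()   { while (f.test_and_set(std::memory_order_acquire)) { } }
    void unlock() { f.clear(std::memory_order_release); }
};

struct TicketLock {
    std::atomic<unsigned> next{0}, serving{0};
    void lock() {
        unsigned my = next.fetch_add(1, std::memory_order_relaxed); // take a ticket
        while (serving.load(std::memory_order_acquire) != my) { }   // wait for turn
    }
    void unlock() { serving.fetch_add(1, std::memory_order_release); }
};

int main() {
    TicketLock lk;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) { lk.lock(); ++counter; lk.unlock(); }
    };
    std::thread a(work), b(work);
    a.join(); b.join();
    std::printf("counter = %ld\n", counter);   // 200000: no lost updates
    return 0;
}
</pre>
 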
 
A detailed description of the combining tree barrier, tournament barrier and dissemination barrier was included. One of the interesting topics discussed in this wiki supplement was the performance evaluation of different barrier implementations. They showed that the centralized blocking barrier does not scale with the number of threads, since contention increases as the number of threads grows.
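 
The scaling problem is easiest to see in code: in the centralized, sense-reversing barrier sketched below, every thread spins on the same shared flag, which is exactly the contention hot spot that the tree, tournament and dissemination barriers are designed to avoid (a toy following the usual textbook formulation, not the students' code):
 
<pre>
// Centralized sense-reversing barrier: all threads spin on one variable.
#include <atomic>
#include <thread>
#include <vector>
#include <cstdio>

struct CentralBarrier {
    std::atomic<int> count;
    std::atomic<bool> sense{false};
    const int n;
    explicit CentralBarrier(int threads) : count(threads), n(threads) {}

    void wait() {
        bool my_sense = !sense.load(std::memory_order_relaxed);
        if (count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            count.store(n, std::memory_order_relaxed);    // last arriver resets
            sense.store(my_sense, std::memory_order_release);
        } else {
            while (sense.load(std::memory_order_acquire) != my_sense) { }
        }
    }
};

int main() {
    const int threads = 4;
    CentralBarrier b(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            for (int phase = 0; phase < 3; ++phase) {
                b.wait();                                  // all threads sync here
                if (t == 0) std::printf("phase %d done\n", phase);
            }
        });
    for (auto& th : pool) th.join();
    return 0;
}
</pre>
 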
 
References:
 
1. [http://www2.cs.uh.edu/~hpctools/pub/iwomp-barrier.pdf http://www2.cs.uh.edu/~hpctools/pub/iwomp-barrier.pdf]
 
2. [http://www.statemaster.com/encyclopedia/Deadlock http://www.statemaster.com/encyclopedia/Deadlock]
 
3. [http://www.ukhec.ac.uk/publications/reports/synch_java.pdf http://www.ukhec.ac.uk/publications/reports/synch_java.pdf]
 
 
'''Chapter 10: '''
 
Students discussed the existing bus-based cache coherence protocols in real machines. They went on to classify the cache coherence protocols based on the year they were introduced and the processors which use them. The MSI protocol was first used in the SGI IRIS 4D series. In the Synapse protocol the M state is called D (Dirty), but it works the same way as in MSI. MSI has a major drawback in that each read-write sequence incurs two bus transactions, irrespective of whether the cache line is stored in only one cache or not. The Pentium Pro microprocessor, introduced in 1995, was the first Intel architecture microprocessor to support SMP and MESI. The MESIF protocol, used in the latest Intel multi-core processors, was introduced to accommodate the point-to-point links used in the QuickPath Interconnect. MESI came with the drawback of using much time and bandwidth, and MOESI was AMD's answer to this problem. MOESI has become one of the most popular snoop-based protocols supported in the AMD64 architecture; the AMD dual-core Opteron can maintain cache coherence in systems of up to 8 processors using this protocol. The Dragon protocol is an update-based coherence protocol which does not invalidate other cached copies. It was developed by the Xerox Palo Alto Research Center (Xerox PARC), a subsidiary of Xerox Corporation, and was used in the Xerox PARC Dragon multiprocessor workstation.
 
References:
 
1. [http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf Shared Memory Consistency Models]
 
2. [http://portal.acm.org/citation.cfm?id=193889&dl=GUIDE&coll=GUIDE&CFID=84028355&CFTOKEN=32262273 Designing Memory Consistency Models For Shared-Memory Multiprocessors]
 
3. [http://cs.gmu.edu/cne/modules/dsm/green/memcohe.html Consistency Models]
 
 
'''Chapter 11:'''
 
The cache coherence protocol presented in Chapter 11 of Solihin 2008 is simpler than most real directory-based protocols. This textbook supplement presents the directory-based protocols used by the DASH multiprocessor and the Alewife multiprocessor. It concludes with an argument for why complexity might be undesirable in cache coherence protocols. The DASH multiprocessor uses a two-level coherence protocol, relying on a snoopy bus to ensure cache coherence within a cluster and a directory-based protocol to ensure coherence across clusters. The protocol uses a Remote Access Cache (RAC) at each cluster, which essentially consolidates memory blocks from remote clusters into a single cache on the local snoopy bus. When a request is issued for a block from a remote cluster that is not in the RAC, the request is denied, but it is also forwarded to the owner. The owner supplies the block to the RAC, so eventually, when the requestor retries, the block will be waiting in the RAC. Read and read-exclusive (readx) operations on a DASH processor were discussed in detail. The students also discuss two race conditions which mainly arise on a DASH processor. The first occurs when a Read from requester R is forwarded from home H to owner O, but O sends a Writeback to H before the forwarded Read arrives. Another possible race occurs when the home node H replies with data (ReplyD) to a Read from requester R but an invalidation (Inv) arrives first. LimitLESS is the cache coherence protocol used by the Alewife multiprocessor. Unlike the DASH multiprocessor, the Alewife multiprocessor is not organized into clusters of nodes with local buses, and therefore cache coherence throughout the system is maintained through the directory.
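 
The deny-and-forward behavior described above is easy to misread in prose, so here is a deliberately tiny C++ toy model of it (the RAC as a map and the owner as a stub; every name and value is illustrative rather than taken from the DASH papers):
 
<pre>
// Toy model: a remote read that misses in the RAC is denied (nullopt) after
// the request is forwarded to the "owner", which deposits the block, so the
// requester's retry finds it waiting.
#include <map>
#include <optional>
#include <cstdio>

struct RemoteAccessCache {
    std::map<int, int> blocks;                       // block address -> data

    std::optional<int> remote_read(int addr) {
        auto it = blocks.find(addr);
        if (it != blocks.end())
            return it->second;                       // hit: supply the block
        forward_to_owner(addr);                      // home forwards the request
        return std::nullopt;                         // deny; requester must retry
    }

    void forward_to_owner(int addr) {
        // Stand-in for the owner cluster flushing the block back to the RAC;
        // here the "owner" simply produces a dummy value for the address.
        blocks[addr] = addr * 10;
    }
};

int main() {
    RemoteAccessCache rac;
    auto first = rac.remote_read(7);                 // miss: denied, owner notified
    auto retry = rac.remote_read(7);                 // retry: block is waiting
    std::printf("first: %s, retry: %d\n",
                first ? "hit" : "denied", retry.value_or(-1));
    return 0;
}
</pre>
 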
 
References:
 
1. Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy (1990). [http://doi.acm.org/10.1145/325164.325132 "The directory-based cache coherence protocol for the DASH multiprocessor."] In ''Proceedings of the 17th Annual International Symposium on Computer Architecture''.
 
2. David Chaiken, John Kubiatowicz, and Anant Agarwal (1991). [http://groups.csail.mit.edu/cag/papers/pdf/asplos4.pdf "LimitLESS directories: A scalable cache coherence scheme."] ''ACM SIGPLAN Notices''.
 
'''Chapter 12:'''
 
Interconnection Networks: Advances in multiprocessors, parallel computing and networking, and parallel computer architectures demand very high performance from interconnection networks. Because of this, interconnection network structure has changed over time, trying to meet higher bandwidth and performance targets. Students discussed the criteria to be considered for choosing the best network, including Performance Requirements, Scalability, Incremental Expandability, Partitionability, Simplicity, Distance Span, Physical Constraints, Reliability and Repairability, Expected Workloads and Cost Constraints. They provided an in-depth discussion of the classification of interconnection networks. Shared-Medium Networks include the Token Ring, Token Bus and Backplane Bus. Direct Networks include the Mesh, Torus, Hypercube, Tree, Cube-Connected Cycles, and de Bruijn and Star Graph networks. Indirect Networks include regular topologies like the Crossbar Network and Multistage Interconnection Network, and Hybrid Networks such as Multiple Backplane Buses, Hierarchical Networks, Cluster-Based Networks and Hypergraph Topologies. They also discussed routing algorithms and the deadlock, starvation and livelock problems associated with them. These topics were covered in an extremely detailed way, and the students included a diagrammatic representation of every topology.
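 
As one small, concrete example from this space (an illustrative sketch, not material from the students' pages), dimension-ordered or "e-cube" routing on a hypercube resolves the differing address bits in a fixed order, which is a classic way of keeping the routing deadlock-free:
 
<pre>
// Dimension-ordered (e-cube) routing on a hypercube: flip differing address
// bits from the lowest dimension to the highest; the hop count equals the
// Hamming distance between source and destination.
#include <cstdio>
#include <vector>

std::vector<int> ecube_route(int src, int dst, int dims) {
    std::vector<int> path{src};
    int node = src;
    for (int d = 0; d < dims; ++d) {          // fix one dimension at a time
        if (((node ^ dst) >> d) & 1) {
            node ^= (1 << d);                 // traverse the link in dimension d
            path.push_back(node);
        }
    }
    return path;
}

int main() {
    // 4-dimensional hypercube (16 nodes): route from 0b0101 to 0b1110.
    auto path = ecube_route(0b0101, 0b1110, 4);
    for (int n : path) std::printf("%x ", n);
    std::printf("(hops = %zu)\n", path.size() - 1);
    return 0;
}
</pre>
 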
 
References:
 
1. [http://www.top500.org/2007_overview_recent_supercomputers/sci http://www.top500.org/2007_overview_recent_supercomputers/sci]
 
2. [http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/topology.html http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/topology.html]
 
 
'''Conclusion:'''
 
This independent study helped me increase my knowledge to a great extent in the field of architecture of parallel computers. There were 4 students working on every chapter, and they came up with 2 wiki pages per group; we collected a total of 18 wiki supplements, and the data collected was enormous. While reviewing their content I kept updating my knowledge base. I also provided the resources from which they could collect data, which helped me come across the latest developments in the field. Interacting with students helped me improve my communication skills, and constant discussions with Prof. Gehringer helped me understand key concepts. This idea of writing wiki supplements was selected for a KU Village presentation, and I got the opportunity to present the paper along with Prof. Gehringer.

Revision as of 03:30, 8 November 2010

ECE633 Independent Study: Architecture of Parallel Computers
Karishma NavalakhaA

Abstract:

There has been tremendous research and development in the field of multi-core Architecture in the last decade. In such a dynamic environment it is very difficult to have text books covering latest developments in the field. Wiki written text books comes as an extremely handy tool for students to get acquainted and interested in ongoing research. In this independent study we explored an academic learning technique where students could learn the fundamental concepts of the subject through the text book available to students and lectures delivered by Prof. Gehringer in class. They can now build on this foundation and gather latest information from the varied online resources and technical papers and summarize their findings in the form of wiki pages. Software is also being currently developed to assist the students and was adopted in this course. We tried to enhance the quality of student submitted wiki pages through peer reviewing. Professor Gehringer and I constantly provided inputs to students to improve both their quality of wiki pages as well as quality of reviewing. The software being developed under the able guidance of professor Gehringer has been vital in overcoming administrative hurdles involved in assigning topics to students, maintaining the updates and tracking progress of their writings, getting feedbacks through peer reviewing and handling the re-submitted work. All this has been managed via the software in an organized fashion.

Experience with Wiki written text book:

The software was first deployed in CSC/ECE 506, Architecture of Parallel Computers. This is a beginning masters-level course that is taken by all Computer Engineering masters students. It is optional for Computer Science students, but as it is one way to fulfill a core requirement, it is popular with them too. The recently adopted textbook for this course is the locally written Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems [Solihin 2009]. It did not make sense to have the students rewrite this excellent text, but the book concentrates on theory and design fundamentals, without detailed application to current parallel machines. We felt that students would benefit from learning how the principles were applied in current architectures. Furthermore, they would learn about the newest machines in this fast-changing field.

After every chapter covered in class, two individuals, or pairs of students were required to sign up for writing the wiki supplement for that particular chapter. (That is, we solicited two supplements for each chapter, each of which could be authored by one or two students.) They were asked to add specific types of information which was not included in the chapter.

Initially, students were not clear about the purpose of their wiki pages. The first pages they wrote had substantial duplication of topics covered in the textbook. Students were attempting to give a complete coverage of issues discussed in the chapter. We wanted them to concentrate instead on recent developments. Upon seeing this, we established the practice of having the first two authors of this paper (Gehringer and Navalakha) review the student work, along with three peer reviews from fellow students. A lot of review time was spent providing guidance on how to revise.

At the beginning we gave the students complete freedom to explore resources for the topic they had chosen to write on. This was not very successful, as the students seemingly chose to read the first few search hits, which tended to provide an overview of the topic, rather than in-depth information on particular implementations. Sometimes students were not aware that the information they found was already covered in the next chapter, which they have not read yet. The first review which we gave students was mainly just making them aware of topics covered in later chapters. A lot of effort in writing the initial draft was thus wasted. After the first two sets of topics, we began to provide links for students to material that we wanted the students to pay attention to. Gehringer and Navalakha met weekly to discuss what to provide to students. We regularly consulted other textbooks, technology news, and Web sites of major processor manufacturers, such as Intel and AMD. As the semester progressed, the quality of the initial submissions improved, and the students realized better returns for their effort.

The quality of work seemed to improve as the semester progressed. A comparison of the grades for the wiki pages revealed that the average score for the first chapter written by each student was 82.8%, while the average for the second submission was 82.7%. The quality of the wiki pages had improved, but at the same time the peer reviewers became more demanding. Students were given more input to improve their work via peer review. Thus the improvement showed up in the final wiki pages produced rather than in the grades received by students. The initial wiki pages provided randomly collected data and were cluttered with diagrams and graphs; this information restated facts given in the textbook. The later wiki pages focused on a comparative study of present-day supercomputers produced by Intel, AMD and IBM.

For example, while writing the wiki for cache-coherence protocols, the students examined which protocol was favored by which company and why. They also discussed protocols that have been introduced in the last two years, e.g., Intel's MESIF protocol. Such in-depth analysis made the wiki more appealing to readers. Gehringer and Navalakha provided additional reviews, which helped constantly improve the quality of the wiki pages. These reviews gave the students insight into what was expected of them, and led to an increasing focus on current developments during peer review. It was observed that later rounds of reviews included guidance similar to that received from Gehringer and Navalakha. The organization of the wiki pages and the volume of relevant data collected by students improved as the semester progressed.

Electronic peer-review systems have been widely used to review student work, but never before, to our knowledge, have they been applied to assignments consisting of multiple interrelated parts with precedence constraints. The growing interest in large collaborative projects, such as wiki textbooks, has led to a need for electronic support for the process, lest the administrative burden on instructor and TA grow too large.

Chapter-wise learning from this independent study:

Chapter 1:

This chapter covered the interesting topic of supercomputer evolution. Wiki pages written for this topic included a lot of data from the literature. Students came up with interesting topics that were not covered in the textbook, such as a timeline of supercomputers, the first supercomputer (ENIAC), Cray history, supercomputer hierarchical architecture, supercomputer operating systems, supercomputer cooling, and processor families. From their research we could see the increasing dominance of Intel's processors in the consumer market. We also concluded that Unix has been the platform for most of these supercomputers. Massively Parallel Processing (MPP) and Symmetric Multiprocessing (SMP) were the earliest widely used multiprocessor machine architectures; they were displaced by constellation computing in the early 2000s, and the field is currently dominated by cluster computing.

References:

http://www.top500.org/

The Future of Supercomputing: An Interim Report, Committee on the Future of Supercomputing, National Research Council (U.S.).

Chapter 2:

Data Parallel Programming: The students provided comparisons between data parallelism and task parallelism. Haveraaen (2000) notes that data-parallel codes typically bear a strong resemblance to sequential codes, making them easier to read and write. The students noted that the data-parallel model may be used with either the shared memory or the message-passing model without conflict. In their comparisons they concluded that combining the data-parallel and message-passing models reduces the amount and complexity of communication required relative to a task-parallel approach. Similarly, combining the data-parallel and shared memory models tends to simplify and reduce the amount of synchronization required. SIMD (single-instruction-multiple-data) processors are specifically designed to run data-parallel algorithms. Modern examples include CUDA processors developed by NVIDIA and the Cell processor developed by STI (Sony, Toshiba, and IBM).
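
As a rough illustration of the data-parallel style the students contrasted with task parallelism (a minimal sketch of my own, not taken from their pages; the function and array names are hypothetical), the loop below applies the same operation to every element, so iterations are independent and can be spread across processors while the code still reads almost like its sequential version:

  #include <vector>

  // Data-parallel loop: every iteration performs the same operation on a
  // different element, so there are no cross-iteration dependences.
  // Compile with an OpenMP-capable compiler (e.g., g++ -fopenmp).
  void scale_and_add(std::vector<double>& y,
                     const std::vector<double>& x, double a) {
      #pragma omp parallel for
      for (long i = 0; i < static_cast<long>(y.size()); ++i)
          y[i] = a * x[i] + y[i];
  }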

References:

1. W. Daniel Hillis and Guy L. Steele, Jr., "Data parallel algorithms," Communications of the ACM, 29(12):1170-1183, December 1986.

2. Alexander C. Klaiber and Henry M. Levy, "A comparison of message passing and shared memory architectures for data parallel programs," in Proceedings of the 21st Annual International Symposium on Computer Architecture, April 1994, pp. 94-105.

Chapter 3:

In this wiki supplement, the three kinds of loop-level parallelism, i.e., DOALL, DOACROSS and DOPIPE, were discussed. These three parallelism techniques were discussed with examples in the form of OpenMP code, as in the textbook. In addition, the students provided further depth on this topic by discussing parallel_for, parallel_reduce, parallel_scan, pipeline, reduction, DOALL, DOACROSS and DOPIPE with respect to Intel Threading Building Blocks. They also compared DOPIPE, DOACROSS and DOALL in POSIX Threads. Finally they concluded that Pthreads works for all three kinds of parallelism and can express functional parallelism easily, but it requires building specialized synchronization primitives and explicitly privatizing variables, which makes it more effort to convert a serial program into a parallel one.

OpenMP provides many performance-enhancing features, such as the atomic, barrier and flush synchronization primitives. It is very simple to use OpenMP to exploit DOALL parallelism, but the syntax for expressing functional parallelism is awkward.
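
A minimal sketch of the contrast drawn here (my own example, not from the student pages): a DOALL loop maps directly onto OpenMP's work-sharing loop, while functional parallelism has to be expressed with the clumsier sections construct:

  // DOALL: no cross-iteration dependences, so one pragma suffices.
  void doall(double* a, const double* b, const double* c, int n) {
      #pragma omp parallel for
      for (int i = 0; i < n; ++i)
          a[i] = b[i] + c[i];
  }

  // Functional parallelism: each section runs a different task, which
  // OpenMP expresses less naturally than the loop above.
  void functional(void (*task1)(), void (*task2)()) {
      #pragma omp parallel sections
      {
          #pragma omp section
          { task1(); }
          #pragma omp section
          { task2(); }
      }
  }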

Intel TBB relies on generic programming, so it performs better with custom iteration spaces or complex reduction operations. It also provides generic parallel patterns for parallel while-loops, data-flow pipeline models, and parallel sorts and prefixes, so it is better suited to cases that go beyond loop-based parallelism.
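
For comparison, a sketch of how TBB's generic interface might express a reduction over a blocked range (based on the lambda-style parallel_reduce overload of later TBB releases; the exact interface in TBB 2.2 may differ, so treat this as illustrative rather than definitive):

  #include <cstddef>
  #include <tbb/parallel_reduce.h>
  #include <tbb/blocked_range.h>

  double parallel_sum(const double* x, std::size_t n) {
      // parallel_reduce recursively splits the iteration space and combines
      // the partial sums, with no user-written thread management.
      return tbb::parallel_reduce(
          tbb::blocked_range<std::size_t>(0, n),
          0.0,
          [=](const tbb::blocked_range<std::size_t>& r, double partial) {
              for (std::size_t i = r.begin(); i != r.end(); ++i)
                  partial += x[i];
              return partial;
          },
          [](double a, double b) { return a + b; });
  }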

References:

1. An Optimal Abstraction Model for Hardware Multithreading in Modern Processor Architectures

2. Intel Threading Building Blocks 2.2 for Open Source Reference Manual

3. POSIX Threads Programming by Blaise Barney, Lawrence Livermore National Laboratory


Chapter 6:

Cache Structures of Multi-Core Architectures: Students added additional insight on this topic by discussing shared-memory multiprocessors, write policies and replacement policies. The Greedy Dual Size (GDS) and Priority Cache (PC) replacement policies were additional subtopics the students shed light on. Students also gave definitions of Intel's Trace Cache and Smart Cache techniques. The most important take-away from this topic was the students' discussion of the write policies used in recent multi-core architectures. For example, Intel's IA-32 and IA-64 architectures implement write combining, write collapsing, weakly ordered, uncacheable & write no-allocate, and non-temporal techniques in their caches. AMD uses cache exclusion, unlike Intel's cache inclusion. Sun's Niagara and SPARC processors use write-through L1 caches, with allocate-on-load and no-allocate-on-store.
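
To make the write-policy vocabulary concrete, here is a small, hypothetical sketch (my own illustration, not any vendor's implementation) of how a store is treated under a write-back/write-allocate cache versus a write-through/no-write-allocate cache such as the WT L1 described above:

  #include <cstdint>

  enum class WritePolicy { WriteBackAllocate, WriteThroughNoAllocate };

  struct CacheLine { std::uint64_t tag = 0; bool valid = false; bool dirty = false; };

  // Hypothetical one-line "cache", used only to contrast the two policies.
  void handle_store(CacheLine& line, std::uint64_t tag, WritePolicy policy) {
      const bool hit = line.valid && line.tag == tag;
      if (policy == WritePolicy::WriteBackAllocate) {
          if (!hit) {            // write miss: fetch and allocate the line
              line.tag = tag;
              line.valid = true;
          }
          line.dirty = true;     // memory is updated only when the line is evicted
      } else {                   // write-through, no-write-allocate (e.g., a WT L1)
          // Every store is sent to the next level immediately, so the line is
          // never dirty, and a write miss does not allocate a line at all.
          if (hit)
              line.dirty = false;
      }
  }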

References:

1. http://download.intel.com/technology/architecture/sma.pdf

2. http://www.intel.com/Assets/PDF/manual/248966.pdf

3. http://www.intel.com/design/intarch/papers/cache6.pdf

Chapter 7:

Shared-memory multiprocessors run into several problems that are more pronounced than in their uniprocessor counterparts. The Solihin text used in this course goes into detail on three of these issues: cache coherence, memory consistency and synchronization. The goal of this wiki supplement was to discuss these three issues and also what can be done to ensure that instructions are handled in a timely and efficient manner, and in a manner that is consistent with what the programmer might desire. Memory consistency was discussed by comparing ordering on a uniprocessor with ordering on a multiprocessor. The students concluded that in a multiprocessor much more care must be taken to ensure that all of the loads and stores are committed to memory in a valid order. Synchronization was discussed as it applies to OpenMP and fence insertion. Other methods, such as test-and-set and a direct interrupt to another core, were also briefly discussed. The programmer (or compiler) is responsible for knowing which synchronization directives are available on a given architecture and for implementing them in an efficient manner. The students also discussed the instructions commonly used for synchronization in popular processor architectures. For example, SPARC V8 uses a store barrier, Alpha uses memory barrier and write memory barrier instructions, whereas Intel x86 uses lfence (load) and sfence (store).
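
As a concrete illustration (a sketch of my own, not from the student pages) of why ordering needs care on a multiprocessor, the classic flag-passing idiom below only works if the flag update is ordered after the data update; that ordering is exactly what fence instructions such as x86's sfence/lfence, the SPARC and Alpha barriers mentioned above, or the C++11 release/acquire operations used here enforce:

  #include <atomic>

  int payload = 0;
  std::atomic<bool> ready{false};

  void producer() {
      payload = 42;                                   // ordinary store
      ready.store(true, std::memory_order_release);   // behaves like a store fence
  }

  int consumer() {
      while (!ready.load(std::memory_order_acquire))  // behaves like a load fence
          ;                                           // spin until the flag is set
      return payload;                                 // guaranteed to observe 42
  }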

References:

1. https://wiki.ittc.ku.edu/ittc/images/0/0f/Loghi.pdf

2. http://portal.acm.org/citation.cfm?id=782854&dl=GUIDE&coll=GUIDE&CFID=84866326&CFTOKEN=84791790


Chapter 8:

Students discussed the existing bus-based cache coherence protocols in real machines. They went on to classify the cache-coherence protocols based on the year they were introduced and the processors that use them. The MSI protocol was first used in the SGI IRIS 4D series. In the Synapse protocol the M state is called D (Dirty), but it works the same way as in MSI. MSI has a major drawback in that each read-write sequence incurs two bus transactions, irrespective of whether the cache line is stored in only one cache or not. The Pentium Pro microprocessor, introduced in 1995, was the first Intel-architecture microprocessor to support SMP and MESI. The MESIF protocol, used in the latest Intel multi-core processors, was introduced to accommodate the point-to-point links used in the QuickPath Interconnect. MESI came with the drawback of consuming extra time and bandwidth, since modified data must be written back to memory before another processor can use it; MOESI was AMD's answer to this problem. MOESI has become one of the most popular snoop-based protocols and is supported in the AMD64 architecture. The AMD dual-core Opteron can maintain cache coherence in systems of up to 8 processors using this protocol. The Dragon protocol is an update-based coherence protocol that does not invalidate other cached copies. It was developed by the Xerox Palo Alto Research Center (Xerox PARC), a subsidiary of Xerox Corporation, and was used in the Xerox PARC Dragon multiprocessor workstation.
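
A bare-bones sketch (my own illustration, not any machine's implementation) of the MSI drawback mentioned above: starting from Invalid, a read followed by a write always costs two bus transactions, even if no other cache holds the line, which is what MESI's Exclusive state was introduced to avoid:

  enum class MSI { Invalid, Shared, Modified };

  // State transitions of one cache line in response to its own processor.
  // A return value of true means a transaction is placed on the shared bus.
  bool processor_read(MSI& state) {
      if (state == MSI::Invalid) { state = MSI::Shared; return true; }    // BusRd
      return false;                                                       // hit
  }

  bool processor_write(MSI& state) {
      if (state != MSI::Modified) { state = MSI::Modified; return true; } // BusRdX / upgrade
      return false;                                                       // hit
  }
  // A read-then-write sequence starting from Invalid therefore issues two bus
  // transactions even when no other cache has a copy.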

References:

1. Cache consistency with MESI on Intel processor

2. AMD dual core Architecture

3. Silicon Graphics Computer Systems

4. Synapse tightly coupled multiprocessors: a new approach to solve old problems

5. Dragon Protocol


Chapter 9:

Synchronization: Students classified synchronization techniques based on their implementation. Hardware synchronization uses locks, barriers and mutual exclusion. Software synchronization examples include ticket locks and queue-based MCS locks. Mutex implementations rely on the execution of atomic instructions.

Some common examples include Test-and-Set, Fetch-and-Increment, Exchange, and Compare-and-Swap. Another type of lock that was not discussed in the text, known as the "hand-off" lock, was discussed in detail by the students. They also discussed reasons why a programmer should attempt to write programs in such a way as to avoid locks. There are APIs for parallel architectures that provide specific types of synchronization; if these APIs are used the way they were designed, performance can be maximized while minimizing overhead. Load-Locked (LL) and Store-Conditional (SC) are a pair of improved hardware primitives that are used for lock-free read-modify-write operations.
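
As an illustration of the atomic primitives listed above, a minimal test-and-set spinlock can be written with C++11's std::atomic_flag (a sketch of my own; on LL/SC architectures such as Alpha, the compiler typically implements the atomic exchange with an LL/SC retry loop):

  #include <atomic>

  class SpinLock {
      std::atomic_flag flag = ATOMIC_FLAG_INIT;
  public:
      void lock() {
          // test_and_set atomically sets the flag and returns its old value;
          // spinning like this generates the bus traffic that motivates
          // test-and-test-and-set, ticket, and MCS locks.
          while (flag.test_and_set(std::memory_order_acquire))
              ;  // busy-wait
      }
      void unlock() {
          flag.clear(std::memory_order_release);
      }
  };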

Detailed descriptions of the combining tree barrier, tournament barrier and dissemination barrier were included. One of the interesting topics discussed in this wiki supplement was the performance evaluation of different barrier implementations. The students showed that the centralized blocking barrier does not scale: contention increases as the number of threads increases.
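
A centralized blocking barrier of the kind whose poor scaling the students measured might look like the following sense-reversing sketch (my own illustration); because every thread spins on the same shared variable, contention grows with the thread count, which is the problem combining-tree, tournament and dissemination barriers address:

  #include <atomic>

  class CentralBarrier {
      const int nthreads;
      std::atomic<int> count;
      std::atomic<bool> sense{false};
  public:
      explicit CentralBarrier(int n) : nthreads(n), count(n) {}

      void wait() {
          bool my_sense = !sense.load();        // sense-reversing barrier
          if (count.fetch_sub(1) == 1) {        // last thread to arrive
              count.store(nthreads);            // reset for the next phase
              sense.store(my_sense);            // release all waiting threads
          } else {
              while (sense.load() != my_sense)  // all other threads spin on one
                  ;                             // shared variable: contention
          }
      }
  };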

References:

1. http://www2.cs.uh.edu/~hpctools/pub/iwomp-barrier.pdf

2. http://www.statemaster.com/encyclopedia/Deadlock

3. http://www.ukhec.ac.uk/publications/reports/synch_java.pdf


Chapter 10:

Judging from the references below, this chapter's wiki supplement dealt with shared-memory consistency models.

References:

1. Shared Memory Consistency Models

2. Designing Memory Consistency Models For Shared-Memory Multiprocessors

3. Consistency Models


Chapter 11:

The cache coherence protocol presented in Chapter 11 of [Solihin 2009] is simpler than most real directory-based protocols. This textbook supplement presents the directory-based protocols used by the DASH multiprocessor and the Alewife multiprocessor. It concludes with an argument for why complexity might be undesirable in cache coherence protocols. The DASH multiprocessor uses a two-level coherence protocol, relying on a snoopy bus to ensure cache coherence within a cluster and a directory-based protocol to ensure coherence across clusters. The protocol uses a Remote Access Cache (RAC) at each cluster, which essentially consolidates memory blocks from remote clusters into a single cache on the local snoopy bus. When a request is issued for a block from a remote cluster that is not in the RAC, the request is denied, but it is also forwarded to the owner. The owner supplies the block to the RAC, so that eventually, when the requestor retries, the block will be waiting in the RAC. Read and read-exclusive (ReadX) operations on a DASH processor were discussed in detail. The students also discussed two race conditions that can arise on a DASH processor. The first occurs when a Read from requester R is forwarded from home H to owner O, but O sends a Writeback to H before the forwarded Read arrives. Another possible race occurs when the home node H replies with data (ReplyD) to a Read from requester R, but an invalidation (Inv) arrives first. LimitLESS is the cache coherence protocol used by the Alewife multiprocessor. Unlike the DASH multiprocessor, the Alewife multiprocessor is not organized into clusters of nodes with local buses, and therefore cache coherence throughout the system is maintained through the directory.
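
To make the directory terminology concrete, here is a small, hypothetical sketch of the full-bit-vector bookkeeping a home node might keep per memory block (my own illustration; it does not model DASH's RAC, request forwarding, or LimitLESS's software-extended directories):

  #include <bitset>
  #include <cstddef>

  constexpr std::size_t kNodes = 64;       // hypothetical system size

  enum class DirState { Uncached, Shared, Exclusive };

  struct DirectoryEntry {
      DirState state = DirState::Uncached;
      std::bitset<kNodes> sharers;         // one presence bit per node

      // Home-node bookkeeping for a read request from node `requester`.
      void record_read(std::size_t requester) {
          sharers.set(requester);
          if (state == DirState::Uncached)
              state = DirState::Shared;
          // If the state is Exclusive, a real protocol (e.g., DASH) would
          // first forward the request to the owner, as described above.
      }

      // Bookkeeping for a read-exclusive (write) request: invalidations go to
      // every node with a presence bit, then the new owner is recorded.
      void record_read_exclusive(std::size_t requester) {
          sharers.reset();
          sharers.set(requester);
          state = DirState::Exclusive;
      }
  };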

References:

1. Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy (1990). "The directory-based cache coherence protocol for the DASH multiprocessor." In Proceedings of the 17th Annual International Symposium on Computer Architecture.

2. David Chaiken, John Kubiatowicz, and Anant Agarwal (1991). "LimitLESS directories: A scalable cache coherence scheme." ACM SIGPLAN Notices.

Chapter 12:

Interconnection Networks: Advances in multiprocessors, parallel computing, networking, and parallel computer architectures demand very high performance from interconnection networks. Because of this, interconnection network structure has changed over time, trying to meet higher bandwidth and performance requirements. Students discussed the criteria to be considered for choosing the best network: performance requirements, scalability, incremental expandability, partitionability, simplicity, distance span, physical constraints, reliability and repairability, expected workloads, and cost constraints. They provided an in-depth discussion of the classification of interconnection networks. Shared-medium networks include Token Ring, Token Bus, and the backplane bus. Direct networks include the mesh, torus, hypercube, tree, cube-connected cycles, and de Bruijn and star graph networks. Indirect networks include regular topologies such as the crossbar network and multistage interconnection networks, and hybrid networks such as multiple backplane buses, hierarchical networks, cluster-based networks and hypergraph topologies. They also discussed routing algorithms and the deadlock, starvation and livelock problems associated with them. These topics were covered in an extremely detailed way, and the students included a diagrammatic representation of every topology.
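
As one concrete example of a routing algorithm and of deadlock avoidance (my own illustration; the students' pages may have used different examples), dimension-ordered X-Y routing on a 2-D mesh corrects the X coordinate first and then the Y coordinate, so packets never turn from the Y dimension back to the X dimension and no cycle of channel dependences can form:

  enum class Port { Local, East, West, North, South };

  // Dimension-ordered (X-Y) routing on a 2-D mesh: route fully in X, then in
  // Y.  Forbidding Y-to-X turns removes the cyclic channel dependences that
  // cause deadlock, so the algorithm is deadlock-free on a mesh.
  Port route_xy(int cur_x, int cur_y, int dst_x, int dst_y) {
      if (dst_x > cur_x) return Port::East;
      if (dst_x < cur_x) return Port::West;
      if (dst_y > cur_y) return Port::North;
      if (dst_y < cur_y) return Port::South;
      return Port::Local;  // arrived at the destination node
  }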

References:

1. http://www.top500.org/2007_overview_recent_supercomputers/sci

2. http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/topology.html


Conclusion:

This independent study helped me increase my knowledge of the architecture of parallel computers to a great extent. Four students worked on every chapter, in two groups, producing two wiki pages per chapter; we collected a total of 18 wiki supplements. The amount of data collected was enormous. While reviewing the students' content I kept updating my own knowledge base. I also provided the resources from which they could collect data, which helped me come across the latest developments in the field. Interacting with the students helped me improve my communication skills, and constant discussions with Prof. Gehringer helped me understand key concepts. This idea of writing wiki supplements was selected for a KU Village presentation, and I had the opportunity to present the paper along with Prof. Gehringer.