<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Remcelfr</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Remcelfr"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Remcelfr"/>
	<updated>2026-05-17T11:48:40Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108549</id>
		<title>E1735. UI changes for review and score reports</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108549"/>
		<updated>2017-04-13T03:46:55Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Test Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
This wiki provides details on the tasks that were undertaken as part of the continuous improvement to the Expertiza project.&lt;br /&gt;
===Background===&lt;br /&gt;
[[Expertiza_documentation|Expertiza]] is a web application where students can submit and peer-review learning objects (articles, code, websites, etc.). The application provides a complete system through which students and instructors collaborate on the learning objects as well as submit, review, and grade assignments for their courses. It is used in select courses at NC State and by professors at several other colleges and universities. The Expertiza project is supported by the National Science Foundation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Overview of Review Functionality===&lt;br /&gt;
&lt;br /&gt;
The Expertiza review system encompasses many types of reviews. Assignments can define multiple submission rounds, each with its own due date and criteria. Each round can have an associated questionnaire whereby peers are encouraged, or required, to review each other's submissions and rate them from 1 through 5 on each question. The scores for each question are averaged to find the rating that a single questionnaire response gives the submission. The average of all questionnaire responses determines the score for the submission. After the reviews are submitted, the recipient of those reviews can rate the reviewers using a similar questionnaire. On this questionnaire, the author of the submission rates the reviews based on the reviewer's understanding, helpfulness, and respectfulness. This is a review of the reviews, and thus is termed a &amp;quot;metareview.&amp;quot;&lt;br /&gt;
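The averaging just described can be sketched in a few lines of Ruby. This is only an illustration of the arithmetic; the method names and data shapes are hypothetical, not actual Expertiza code.

```ruby
# Illustrative sketch of the scoring arithmetic (names are hypothetical).
# Each questionnaire response is a list of per-question scores (1 through 5);
# a response's rating for the submission is the average of its scores.
def response_rating(scores)
  scores.sum.to_f / scores.size
end

# The submission's score is the average of all of its response ratings.
def submission_score(responses)
  ratings = responses.map { |scores| response_rating(scores) }
  ratings.sum / ratings.size
end

# Two reviewers rated a submission on three questions each.
puts submission_score([[5, 4, 4], [3, 5, 4]]).round(2)  # prints 4.17
```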
&lt;br /&gt;
Both students and instructors using Expertiza have the ability to view the reviews and the scores associated with the reviews for each assignment. These screens display review summary and detail information in various formats such as lists, graphs, and heatgrids. The instructor can see all review and score information for all teams on the assignment, whereas a student can only see the review and score information pertaining to them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reviews and associated scores are available on the scores report. To access the score reports in Expertiza follow these instructions:&lt;br /&gt;
&lt;br /&gt;
'''As a student'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Click on the 'Assignments' link on the top navigation bar.&lt;br /&gt;
# Find the assignment in the list and click the title of the assignment.&lt;br /&gt;
# Click on the 'Your scores' link to see the standard view or click on the 'Alternate View' link to see the heatgrid view.&lt;br /&gt;
&lt;br /&gt;
'''As an instructor'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Hover over the 'Manage' item on the top navigation bar, then click the 'Assignments' link.&lt;br /&gt;
# Find the assignment in the list and click the 'View scores' icon (a star with a magnifying glass) in the 'Actions' column.&lt;br /&gt;
# This will bring up the standard view. To see the heatgrid view, click the 'Alternate View' link on the team headings or click the 'view heatgrid' beneath the 'Final Score' when the team is expanded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Motivation===&lt;br /&gt;
By participating in the overall refactoring effort as part of the continuous improvement of Expertiza, students get an opportunity to work on an open-source software project. This helps them gain exposure to the technologies used in the project as well as much-needed experience in collaborating with peers as part of the software development process. This effort was undertaken as a final project for CSC 517 - Object-Oriented Design and Development at North Carolina State University in the spring of 2017.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Purpose==&lt;br /&gt;
&lt;br /&gt;
===Requirements Statement===&lt;br /&gt;
Expertiza displays reviews (i) to the team who was reviewed, and (ii) to the reviewer.  A student user can see all the reviews of his/her team’s project.  The instructor can see all the reviews of everyone’s project.  The instructor also has access to a review report, which shows, for each reviewer, all the reviews that (s)he wrote. Currently, the score report and review report use completely different code.  This makes the UI non-orthogonal and also causes DRY problems.  So, we would like to have a single way of displaying reviews that would be visible to students (reviews that they did, and reviews that their team received), and instructors (reviews that each team received, sorted by team; and reviews that each student did, sorted by student).&lt;br /&gt;
&lt;br /&gt;
===Required Tasks===&lt;br /&gt;
The tasks involved as part of this requirements change are as follows:&lt;br /&gt;
# Compact the review display&lt;br /&gt;
#* Eliminate the blank lines between items within a single review. Instead vary the background color from line to line to improve readability&lt;br /&gt;
#* With a single click, there should be a way to hide all the reviews, reveal just the headings (as at present), or expand all the reviews&lt;br /&gt;
# At the top of each review, it should say&lt;br /&gt;
#* Who submitted the review. The instructor should see the user’s name and user-ID.&lt;br /&gt;
#* A student should see&lt;br /&gt;
#** “Reviewer #k”, where k is an integer between 1 and n, the number of reviews that have been submitted for this project&lt;br /&gt;
#** The version number of the review&lt;br /&gt;
#** The time the review was submitted&lt;br /&gt;
# There should be a tabbed view to switch between various review views&lt;br /&gt;
#* One tab has overall statistics (averages, min, max, as the present “normal” view)&lt;br /&gt;
#* One tab has the heat map (current “alternate” view)&lt;br /&gt;
#* One tab has a grid view, with no scores, but text comments in the grid squares, and then a “More” link to display the whole comment (which will require expanding the row of the grid)&lt;br /&gt;
#* Switching between reviews from Reviewer k and Reviewer j might also be done by clicking on different tabs.  Or, it might be more convenient to keep the current score view, which lists the n reviews across the page.  Then the student should be able to click on the reviewer number (the instructor would instead click on the reviewer name) and see the review done by that reviewer&lt;br /&gt;
# To make it easy to focus on the reviewer’s feedback, there should be a way to hide and/or gray the criteria (“questions”), so the responses stand out more clearly&lt;br /&gt;
# There needs to be a way to search all reviews (of a particular project, or by a particular individual) for a given text string.  The user should be able to go from one instance of the text string to another by clicking down and up buttons&lt;br /&gt;
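As a sketch of how the search requirement might work, the following illustrative Ruby collects every occurrence of a query string across a set of review texts; the down and up buttons would then step an index through the resulting list. All names here are hypothetical, not actual Expertiza code.

```ruby
# Gather every occurrence of a query string across a set of review texts
# (names are hypothetical). Each match records which review it is in and
# the character offset, so the UI can jump between matches.
def find_matches(reviews, query)
  matches = []
  reviews.each_with_index do |text, review_idx|
    offset = 0
    while (pos = text.index(query, offset))
      matches.push([review_idx, pos])
      offset = pos + 1  # advance past this hit; allows overlapping matches
    end
  end
  matches
end

# The down/up buttons would simply move an index through this list.
find_matches(['great code style', 'the code needs tests'], 'code')
# each match is [review index, character offset]
```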
&lt;br /&gt;
===Problem Statement===&lt;br /&gt;
In the current state, the score reports for instructors and students are built differently even though they display the same information using similar UI elements. The application has multiple views into the same information, but the way in which those views are accessed, the code which populates them, and the layout of the screens differ unnecessarily between instructors and students and across the views themselves. This leads to redundant code in both the backend and frontend of the application. Furthermore, since the UI is not uniform between instructors and students, instructors may have difficulty assisting students in accessing their score information due to differences in the UI, which leads to confusion.&lt;br /&gt;
&lt;br /&gt;
Following are some of the issues with the current state UI which we seek to rectify.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Scores Report====&lt;br /&gt;
[[File:Problem_Statement_Diagram_1A_-_Instructor_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
The scores reports for students and instructors have very similar layouts despite being created by different controller methods and views. They both display graphs, reviews on team submissions, author feedback, and score metrics. The primary difference between them is that the instructor view (shown above) displays information for all teams in a collapsible accordion widget format while the student view (shown below) displays the information for only a single team. There are some further discontinuities between the two UIs. For example, the student cannot access the heatgrid view from within the scores report page. This view is only accessible from the assignment page for students. For instructors there are two ways to access the heatgrid view from a single page. The 'Alternate View' link is adjacent to the team name on the heading bar and there is also a link inexplicably placed beneath the 'Final Score' field.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_1B_-_Student_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Author Feedback====&lt;br /&gt;
Author feedback takes two different forms when being shown to instructors but only one form for students. Both students and instructors have access to the author feedback format shown on the left in the student view. This format is identical to the format of the review scores and is displayed on the scores report for both instructor and student. The format on the right is shown on the heatgrid view yet it is only available to instructors. The information conveyed by these two formats is nearly identical and not uniformly available to all consumers of this information.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_2_-_Author_Feedback_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Graphs and Charts====&lt;br /&gt;
The scores report view has a bar of graphs and charts at the top of the page for both students and instructors. Both views shown below use a donut chart and bar graphs, though their method of display is not uniform. The instructor view has titles beneath each item but the student view does not. The 'Submitted work' and 'Author Feedback' titles shown, despite being beneath the bar graphs, are actually headings for the metrics which are displayed beneath the graphs. Also, the graphs contain two labels on the y-axis: the maximum score and the average score. The graphs are so compact that the values on the axis overlap and become illegible.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_3_-_Squashed_Graphs_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Design==&lt;br /&gt;
&lt;br /&gt;
===High Level Design===&lt;br /&gt;
&lt;br /&gt;
====Design Overview====&lt;br /&gt;
The project requirements state that we need to create a standard UI for accessing and consuming review and score information. The emphasis will be on smart and purposeful code reuse as well as ease of navigation. To achieve these results we will redesign the scores report to work for instructors and students alike. There will be a single route, a single controller method, and a single view that is common to users of both role types. In some instances, such as with the graphs and charts, the data will be different enough to warrant separate methods to retrieve the needed values. In most instances, the data is identical and will be accessed identically.&lt;br /&gt;
&lt;br /&gt;
We will create a standard hierarchy which works for both students (who only need to see a single team's scores) and instructors (who need to see all teams' scores). This hierarchy will be rendered within a page in the form of a set of tabbed panes which contain the contents. We will separate the information among four tabs.&lt;br /&gt;
&lt;br /&gt;
# Reviews and scores&lt;br /&gt;
# Author Feedback&lt;br /&gt;
# Graphs and Charts&lt;br /&gt;
# Heatgrid&lt;br /&gt;
&lt;br /&gt;
Putting these components on different tabs will allow us to de-clutter the UI. The instructor scores report UI has multiple ways to expand and collapse sections of information, links are placed in some unusual places, and the page can get so cluttered that it is difficult to distinguish one thing from another. This will be cleaned up by removing some links, removing the author feedback and the charts, and placing the reviews and scores into more discernible sections. The graphs and charts will be on their own tab so they can be larger and easier to read. Since they are no longer confined to a single bar of fixed height, new graphs and charts can be added. The heatgrid will no longer be coupled with the author feedback, and there will be a standard author feedback view which encompasses all information needs. Each of these tabs will be rendered using its own partial.&lt;br /&gt;
&lt;br /&gt;
To prevent the application from pulling data for tabs which the user will not view, AJAX calls will be used to access the data on demand without reloading the page. These calls can also be expanded to request data for individual sections. Routes will be created to controller methods specifically to pull the data for each tab so that calls can be made to them to generate the necessary data structures. These will be accessed when a user expands a group or switches tabs.&lt;br /&gt;
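As an illustration of this per-tab, on-demand loading, each tab could map to its own endpoint so the client fetches only the data for the tab the user opens. The paths and names below are hypothetical, not actual Expertiza routes.

```ruby
# Hypothetical mapping of tabs to per-tab endpoints (paths are illustrative).
# The client issues an AJAX request to one of these when the user switches
# tabs, rather than loading data for every tab up front.
TAB_ENDPOINTS = {
  'reviews'  => 'grades/%d/reviews_tab',
  'feedback' => 'grades/%d/feedback_tab',
  'charts'   => 'grades/%d/charts_tab',
  'heatgrid' => 'grades/%d/heatgrid_tab'
}.freeze

# Build the URL for the tab the user just opened.
def tab_url(tab, assignment_id)
  format(TAB_ENDPOINTS.fetch(tab), assignment_id)
end

tab_url('heatgrid', 42)  # => "grades/42/heatgrid_tab"
```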
&lt;br /&gt;
====Technologies Used====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Technology&lt;br /&gt;
! Technology Type&lt;br /&gt;
! Use(s)&lt;br /&gt;
|-&lt;br /&gt;
| Ruby&lt;br /&gt;
| Language&lt;br /&gt;
| Core development of the application's backend system&lt;br /&gt;
|-&lt;br /&gt;
| Rails&lt;br /&gt;
| Framework&lt;br /&gt;
| Implements MVC; CRUD support; Web application support&lt;br /&gt;
|-&lt;br /&gt;
| RSpec (rspec)&lt;br /&gt;
| Gem&lt;br /&gt;
| Enables TDD; supports testing DSL&lt;br /&gt;
|-&lt;br /&gt;
| jQuery (jquery.ui.tabs)&lt;br /&gt;
| Library&lt;br /&gt;
| Enables creation of tabbed Web interfaces&lt;br /&gt;
|-&lt;br /&gt;
| jQuery (jquery.ui.accordion)&lt;br /&gt;
| Library&lt;br /&gt;
| Enables creation of accordion widgets for Web interfaces&lt;br /&gt;
|-&lt;br /&gt;
| Sass (sass-rails)&lt;br /&gt;
| Gem&lt;br /&gt;
| Sass style sheet pre-processor engine&lt;br /&gt;
|-&lt;br /&gt;
| Cascading Style Sheets (CSS)&lt;br /&gt;
| Language&lt;br /&gt;
| Web page presentation description language&lt;br /&gt;
|-&lt;br /&gt;
| Sassy CSS (SCSS)&lt;br /&gt;
| Language&lt;br /&gt;
| Superset of CSS style sheet language&lt;br /&gt;
|-&lt;br /&gt;
| rails-ajax&lt;br /&gt;
| Gem&lt;br /&gt;
| Enable AJAX to refresh containers within views without reloading the page&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
====Software Patterns====&lt;br /&gt;
&lt;br /&gt;
'''Model-View-Controller (MVC)''' - MVC is a software architectural pattern which divides an application into three parts, separating the data representation from the interface through which users access and operate on the data. The main components of this pattern are:&lt;br /&gt;
* Model - The underlying logical structure of the application's data along with the accesses and operators on the data in the persistent storage medium.&lt;br /&gt;
* View - The user facing representation of the data along with the means for the user to request, operate on, or view the data.&lt;br /&gt;
* Controller - The intermediary layer between the Model and the View which accepts requests from the View, translates that to the Model, receives the Model's response and formats the response to the View.&lt;br /&gt;
&lt;br /&gt;
'''Active Record''' - A software architectural pattern which wraps data from persistent storage, along with the methods that operate on the data, in a class or object to be used within an application. It uses the object-relational mapping (ORM) technique to create virtual database objects.&lt;br /&gt;
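A minimal plain-Ruby sketch of the Active Record pattern follows, with an in-memory hash standing in for the database table; all names are illustrative and this is not Rails' actual ActiveRecord implementation.

```ruby
# Sketch of the Active Record pattern: one class wraps a row of "stored"
# data together with the operations on it. An in-memory hash stands in
# for the database table; all names are illustrative.
class ReviewRecord
  TABLE = {}  # stand-in for a database table: id => attributes

  attr_reader :id, :score

  def initialize(id, score)
    @id = id
    @score = score
  end

  # Persist this object's state back to "storage".
  def save
    TABLE[id] = { score: score }
    self
  end

  # Reconstruct an object from its stored row.
  def self.find(id)
    row = TABLE.fetch(id)
    new(id, row[:score])
  end
end

ReviewRecord.new(1, 4).save
ReviewRecord.find(1).score  # => 4
```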
&lt;br /&gt;
===Low Level Design===&lt;br /&gt;
&lt;br /&gt;
====Screen Mockups====&lt;br /&gt;
&lt;br /&gt;
'''New Scores Report With Multiple Teams and Collapsed Reviews'''&lt;br /&gt;
&lt;br /&gt;
[[File:Expanded_Teams_With_Collapsed_Tabs_-_45.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''New Scores Report With Multiple Teams and Expanded Reviews'''&lt;br /&gt;
&lt;br /&gt;
[[File:Expanded_Teams_With_Expanded_Tabs_-_85.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Plan==&lt;br /&gt;
For testing, we built on the existing test framework, which uses RSpec. The goal of our testing is to verify that the existing functionality is still present in the new views and is accessible to both the student and the instructor. Below are the tests that have been modified or created to verify the changed functionality. Any other tests not listed below are assumed to be unmodified and are intended to still work with the new updates. Note that some requirements of this project, such as formatting changes, cannot be verified by script, but any functional modifications to the layout will be verified.&lt;br /&gt;
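The behavior these test cases verify can be summarized by a small predicate: the feedback button should be rendered only once a review is expanded. The following plain-Ruby sketch is illustrative only; the method name and control labels are hypothetical, not the actual display_as_html code.

```ruby
# Illustrative sketch (hypothetical names) of the conditional the test
# cases exercise: the feedback button is rendered only when the review
# has been expanded via "Show Review".
def review_controls(review_expanded)
  controls = ['Show Review']
  controls.push('Give feedback for Review 1') if review_expanded
  controls
end

review_controls(false)  # => ["Show Review"]
review_controls(true)   # => ["Show Review", "Give feedback for Review 1"]
```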
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 1&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that the modifications to the display_as_html function in response.rb correctly hide the &amp;quot;feedback review&amp;quot; button for a student&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the student to test with:&lt;br /&gt;
#Is assigned to one course&lt;br /&gt;
#Has one assignment as part of that course&lt;br /&gt;
#Has more than one review for that assignment&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Log in to the site as a student&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot; on the top menu&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#On the &amp;quot;Submit or Review work for Expertiza&amp;quot; screen, click on &amp;quot;Your scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is not present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 2&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that the modifications to the display_as_html function in response.rb correctly show the &amp;quot;feedback review&amp;quot; button for a student&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the student to test with:&lt;br /&gt;
#Is assigned to one course&lt;br /&gt;
#Has one assignment as part of that course&lt;br /&gt;
#Has more than one review for that assignment&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Log in to the site as a student&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot; on the top menu&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#On the &amp;quot;Submit or Review work for Expertiza&amp;quot; screen, click on &amp;quot;Your scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, click the &amp;quot;Show Review&amp;quot; button&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 3&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that the modifications to the display_as_html function in response.rb correctly hide the &amp;quot;feedback review&amp;quot; button for an instructor&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the following holds:&lt;br /&gt;
#At least one assignment exists&lt;br /&gt;
#That assignment has more than one review&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Log in to the site as an instructor&lt;br /&gt;
#Click on &amp;quot;Manage&amp;quot; on the top menu&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot;&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#Click on &amp;quot;View Scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is not present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 4&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that the modifications to the display_as_html function in response.rb correctly show the &amp;quot;feedback review&amp;quot; button for an instructor&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the following holds:&lt;br /&gt;
#At least one assignment exists&lt;br /&gt;
#That assignment has more than one review&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Log in to the site as an instructor&lt;br /&gt;
#Click on &amp;quot;Manage&amp;quot; on the top menu&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot;&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#Click on &amp;quot;View Scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, click the &amp;quot;Show Review&amp;quot; button&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108548</id>
		<title>E1735. UI changes for review and score reports</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108548"/>
		<updated>2017-04-13T03:45:50Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
This wiki provides details on the tasks that were undertaken as part of the continuous improvement to the Expertiza project.&lt;br /&gt;
===Background===&lt;br /&gt;
[[Expertiza_documentation|Expertiza]] is a web application where students can submit and peer-review learning objects (articles, code, web sites, etc). The application provides a complete system through which students and instructors collaborate on the learning objects as well as submit, review and grade assignments for the courses. It is used in select courses at NC State and by professors at several other colleges and universities. The Expertiza project is supported by the National Science Foundation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Overview of Review Functionality===&lt;br /&gt;
&lt;br /&gt;
The Expertiza review system encompasses many types of reviews. Assignments can have multiple submission rounds defined with their own due dates and criteria. Each round can have an associated questionnaire whereby peers are encouraged, or required, to review each others' submissions and rate those submissions using the scores 1 through 5 for each question. The scores for each question are averaged to find the rating for the submission for each  questionnaire response. The average of all questionnaire responses determine the score for the submission.After the reviews are submitted, the recipient of those reviews can rate the reviewers using a similar questionnaire. On this questionnaire, the author of the submission will rate the reviews based on the reviewers understanding, helpfulness, and respectfulness. This is a review of the reviews, thus it is termed a &amp;quot;metareview.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Both students and instructors using Expertiza have the ability to view the reviews and the scores associated with the reviews for each assignment. These screens will display review summary and detail information in various formats such as lists, graphs, and heatgrids. The instructor will be able to see all review and score information for all teams on the assignment whereas a student will only be able to see the review and score information pertaining to them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reviews and associated scores are available on the scores report. To access the score reports in Expertiza follow these instructions:&lt;br /&gt;
&lt;br /&gt;
'''As a student'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Click on the 'Assignments' link on the top navigation bar.&lt;br /&gt;
# Find the assignment in the list and click the title of the assignment.&lt;br /&gt;
# Click on the 'Your scores' link to see the standard view or click on the 'Alternate View' link to see the heatgrid view.&lt;br /&gt;
&lt;br /&gt;
'''As an instructor'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Hover over the 'Manage' item on the top navigation bar, then select click the 'Assignments' link.&lt;br /&gt;
# Find the assignment in the list and click the 'View scores' icon (a star with a magnifying glass) in the 'Actions' column.&lt;br /&gt;
# This will bring up the standard view. To see the heatgrid view, click the 'Alternate View' link on the team headings or click the 'view heatgrid' beneath the 'Final Score' when the team is expanded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Motivation===&lt;br /&gt;
By participating in the overall refactoring effort as part of the continuous improvement of Expertiza, students get an opportunity to work on a open source software project. This helps them gain exposure on the technologies used in the project as well as much needed experience in collaborating with peers as part of the software development process. This effort was undertaken as a final project for the CSC 517 - Object Oriented Design and Development course at North Carolina State University in the spring of 2017.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Purpose==&lt;br /&gt;
&lt;br /&gt;
===Requirements Statement===&lt;br /&gt;
Expertiza displays reviews (i) to the team who was reviewed, and (ii) to the reviewer.  A student user can see all the reviews of his/her team’s project.  The instructor can see all the reviews of everyone’s project.  The instructor also has access to a review report, which shows, for each reviewer, all the reviews that (s)he wrote. Currently, the score report and review report use completely different code.  This makes the UI non-orthogonal and also causes DRY problems.  So, we would like to have a single way of displaying reviews that would be visible to students (reviews that they did, and reviews that their team received), and instructors (reviews that each time received, sorted by team; and reviews that each student did, sorted by student).&lt;br /&gt;
&lt;br /&gt;
===Required Tasks===&lt;br /&gt;
The tasks involved as part of this requirements change are as follows:&lt;br /&gt;
# Compact the review display&lt;br /&gt;
#* Eliminate the blank lines between items within a single review. Instead vary the background color from line to line to improve readability&lt;br /&gt;
#* With a single click, there should be a way to hide all the reviews, reveal just the headings (as at present), or expand all the reviews&lt;br /&gt;
# At the top of each review, it should say&lt;br /&gt;
#* Who submitted the review. The instructor should see the user’s name and user-ID.&lt;br /&gt;
#* A student should see&lt;br /&gt;
#** “Reviewer #k”, where k is an integer between 1 and n, the number of reviews that have been submitted for this project&lt;br /&gt;
#** The version number of the review&lt;br /&gt;
#** The time the review was submitted&lt;br /&gt;
# There should be a tabbed view to switch between various review views&lt;br /&gt;
#* One tab has overall statistics (averages, min, max, as the present “normal” view)&lt;br /&gt;
#* One tab has the heat map (current “alternate” view)&lt;br /&gt;
#* One tab has a grid view, with no scores, but text comments in the grid squares, and then a “More” link to display the whole comment (which will require expanding the row of the grid)&lt;br /&gt;
#* Switching between reviews from Reviewer k and Reviewer j might also be done by clicking on different tabs.  Or, it might be more convenient to keep the current score view, which lists the n reviews across the page.  Then the student should be able to click on the reviewer number (the instructor would instead click on the reviewer name) and see the review done by that reviewer&lt;br /&gt;
# To make it easy to focus on the reviewer’s feedback, there should be a way to hide and/or gray the criteria (“questions”), so the responses stand out more clearly&lt;br /&gt;
# There needs to be a way to search all reviews (of a particular project, or by a particular individual) for a given text string.  The user should be able to go from one instance of the text string to another by clicking down and up buttons&lt;br /&gt;
&lt;br /&gt;
===Problem Statement===&lt;br /&gt;
In the current state, the score report for instructors and students are built differently though they display the same information using similar UI elements. The application has multiple views into the same information but the way in which those views are accessed, the code which populates them, and the layout of the screens differs unnecessarily between instructors and students and across the views themselves. This leads to redundant code in both the backend and frontend of the application. Furthermore, since the UI is not uniform between instructors and students, instructors may have difficulty assisting students in accessing their score information due to the differences which are present in the UI that leads to confusion.&lt;br /&gt;
&lt;br /&gt;
Following are some of the issues with the current state UI which we seek to rectify.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Scores Report====&lt;br /&gt;
[[File:Problem_Statement_Diagram_1A_-_Instructor_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
The scores report for students and instructors have very similar layouts despite being created by different controller methods and views. They both display graphs, reviews on team submissions, author feedback, and score metrics. The primary different between them is that the instructor view (shown above) displays information for all teams in a collapsible accordion widget format while the student view (shown below) display the information only for a single team. There are some further discontinuities between the two UIs. For example, the student cannot access the heatgrid view from within the scores report page. This view is only accessible from the assignment page for students. For instructors there are two ways to access the heatgrid view from a single page. The 'Alternate View' link is adjacent to the team name on the heading bar and there is also a link inexplicably placed beneath the 'Final Score' field.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_1B_-_Student_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Author Feedback====&lt;br /&gt;
Author feedback takes two different forms when being shown to instructors but only one form for students. Both students and instructors have access to the author feedback format shown on the left in the student view. This format is identical to the format of the review scores and is displayed on the scores report for both instructor and student. The format on the right is shown on the heatgrid view yet it is only available to instructors. The information conveyed by these two formats is nearly identical and not uniformly available to all consumers of this information.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_2_-_Author_Feedback_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Graphs and Charts====&lt;br /&gt;
The scores report view has a bar of graphs and charts at the top of the page for both students and instructors. Both views shown below use a donut chart and bar graphs though their method of display is not uniform. The instructor view has titles beneath each item but the student view does not. The 'Submitted work' and 'Author Feedback' titles shown, despite being beneath the bar graphs, are actually headings for the metrics which are displayed beneath the graphs. Also, the graphs contain two labels on the y-axis: the maximum score and the average score. The graphs are so compact that the values on the axis overlap and make them illegible.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_3_-_Squashed_Graphs_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Design==&lt;br /&gt;
&lt;br /&gt;
===High Level Design===&lt;br /&gt;
&lt;br /&gt;
====Design Overview====&lt;br /&gt;
The project requirements state that we need to create a standard UI for accessing and consuming review and score information. The emphasis will be on smart and purposeful code reuse as well as ease of navigation to access information. To achieve these results we will redesign the scores report to work for both instructors and students alike. There will be a single route, a single controller method, and a single view that is common to users of both role types. In some instances, such as with the graphs and charts, the data will be different enough to warrant separate methods to retrieve the values needed. In most instances, the data is identical and will be accessed identically.&lt;br /&gt;
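The shared-route idea can be sketched in plain Ruby (the class and method names below are illustrative assumptions, not the actual Expertiza code): one entry point scopes the visible teams by the caller's role, so a single controller method and view can serve both instructors and students.&lt;br /&gt;

```ruby
# Minimal stand-in for a team with a name and member list.
Team = Struct.new(:name, :members)

# Returns the teams whose scores the given user may see:
# instructors see every team, a student sees only his or her own team.
def visible_teams(teams, user, role)
  return teams if role == :instructor
  teams.select { |t| t.members.include?(user) }
end

teams = [
  Team.new('Team Alpha', ['alice', 'bob']),
  Team.new('Team Beta',  ['carol', 'dave'])
]

visible_teams(teams, nil, :instructor).map(&:name)   # => ["Team Alpha", "Team Beta"]
visible_teams(teams, 'carol', :student).map(&:name)  # => ["Team Beta"]
```

With the data scoped this way, the same view can simply iterate over whatever team list it is handed.&lt;br /&gt;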
&lt;br /&gt;
We will create a standard hierarchy which works for both students (who only need to see a single team's scores) and instructors (who need to see all teams' scores). This hierarchy will be rendered within a page in the form of a set of tabbed panes which contain the contents. We will separate the information among four tabs.&lt;br /&gt;
&lt;br /&gt;
# Reviews and scores&lt;br /&gt;
# Author Feedback&lt;br /&gt;
# Graphs and Charts&lt;br /&gt;
# Heatgrid&lt;br /&gt;
&lt;br /&gt;
Putting these components on different tabs will allow us to de-clutter the UI. The instructor scores report UI has multiple ways to expand and collapse sections of information, links are placed in some unusual places, and the page can get so cluttered that it is difficult to distinguish one thing from another. This will be cleaned up by removing some links, removing the author feedback and the charts, and placing the reviews and scores into more discernible sections. The graphs and charts will be on their own tab so they can be larger and easier to read. Since they are not confined to a single bar of a fixed height, new graphs and charts can be added. The heatgrid will no longer be coupled with the author feedback, and there will be a standard author feedback view which encompasses all information needs. Each of these tabs will be rendered using its own partial. &lt;br /&gt;
&lt;br /&gt;
To prevent the application from pulling data for tabs which the user will not view, AJAX calls will be used to access the data on demand without reloading the page. These calls can also be extended to request data for individual sections. Dedicated routes and controller methods will be created to pull the data for each tab and generate the necessary data structures. These will be invoked when a user expands a group or switches tabs.&lt;br /&gt;
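As a rough sketch of this per-tab, on-demand loading (the tab names and builders below are assumptions for illustration, not the real Expertiza endpoints), each tab maps to its own data builder, so only the requested tab's data is ever computed:&lt;br /&gt;

```ruby
# Hypothetical per-tab dispatch: in Rails each entry would be its own route
# and controller method, hit via AJAX when the user switches tabs.
TAB_BUILDERS = {
  'reviews'  => ->(id) { "reviews data for assignment #{id}" },
  'feedback' => ->(id) { "author feedback for assignment #{id}" },
  'graphs'   => ->(id) { "chart data for assignment #{id}" },
  'heatgrid' => ->(id) { "heatgrid data for assignment #{id}" }
}

# Builds the data for exactly one tab; unknown tabs raise rather than
# silently returning everything.
def tab_data(tab, assignment_id)
  builder = TAB_BUILDERS.fetch(tab) { raise ArgumentError, "unknown tab #{tab}" }
  builder.call(assignment_id)
end

tab_data('graphs', 7)  # => "chart data for assignment 7"
```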
&lt;br /&gt;
====Technologies Used====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Technology&lt;br /&gt;
! Technology Type&lt;br /&gt;
! Use(s)&lt;br /&gt;
|-&lt;br /&gt;
| Ruby&lt;br /&gt;
| Language&lt;br /&gt;
| Core development of the application's backend system&lt;br /&gt;
|-&lt;br /&gt;
| Rails&lt;br /&gt;
| Framework&lt;br /&gt;
| Implements MVC; CRUD support; Web application support&lt;br /&gt;
|-&lt;br /&gt;
| RSpec (rspec)&lt;br /&gt;
| Gem&lt;br /&gt;
| Enables TDD; supports testing DSL&lt;br /&gt;
|-&lt;br /&gt;
| JQuery (jquery.ui.tabs)&lt;br /&gt;
| Library&lt;br /&gt;
| Enables creation of tabbed Web interfaces&lt;br /&gt;
|-&lt;br /&gt;
| JQuery (jquery.ui.accordion)&lt;br /&gt;
| Library&lt;br /&gt;
| Enables creation of accordion widgets for Web interfaces&lt;br /&gt;
|-&lt;br /&gt;
| Sass (sass-rails)&lt;br /&gt;
| Gem&lt;br /&gt;
| Sass style sheet pre-processor engine&lt;br /&gt;
|-&lt;br /&gt;
| Cascading Style Sheets (CSS)&lt;br /&gt;
| Language&lt;br /&gt;
| Web page presentation description language&lt;br /&gt;
|-&lt;br /&gt;
| Sassy CSS (SCSS)&lt;br /&gt;
| Language&lt;br /&gt;
| Superset of CSS style sheet language&lt;br /&gt;
|-&lt;br /&gt;
| rails-ajax&lt;br /&gt;
| Gem&lt;br /&gt;
| Enables AJAX refreshes of containers within views without reloading the page&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
====Software Patterns====&lt;br /&gt;
&lt;br /&gt;
'''Model-View-Controller (MVC)''' - MVC is a software architectural pattern which divides an application into three parts which separate the data representation from the interface through which users will access and operate on the data. The main components of this pattern are the:&lt;br /&gt;
* Model - The underlying logical structure of the application's data along with the accesses and operators on the data in the persistent storage medium.&lt;br /&gt;
* View - The user facing representation of the data along with the means for the user to request, operate on, or view the data.&lt;br /&gt;
* Controller - The intermediary layer between the Model and the View which accepts requests from the View, translates that to the Model, receives the Model's response and formats the response to the View.&lt;br /&gt;
&lt;br /&gt;
'''Active Record''' - A software architectural pattern which wraps data from persistent storage, along with the method to operate on the data, in a class or object of a class to be used within an application. It uses the object-relation mapping (ORM) technique to create virtual database objects.&lt;br /&gt;
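To make the pattern concrete, here is a toy plain-Ruby illustration of the Active Record idea (this is not Rails' ActiveRecord, and the names are ours): each object wraps one row of an in-memory &amp;quot;table&amp;quot; and carries the methods that read and write it.&lt;br /&gt;

```ruby
class Record
  TABLE = {}  # stands in for persistent storage: id => attributes hash

  attr_reader :id
  attr_accessor :attributes

  def initialize(id, attributes)
    @id = id
    @attributes = attributes
  end

  # Wrap an existing row in an object (the "find" half of the pattern).
  def self.find(id)
    new(id, TABLE.fetch(id).dup)
  end

  # Write the object's state back to its row (the "save" half).
  def save
    TABLE[@id] = @attributes.dup
  end
end

Record::TABLE[1] = { score: 3 }
r = Record.find(1)
r.attributes[:score] = 5
r.save
Record::TABLE[1][:score]  # => 5
```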
&lt;br /&gt;
===Low Level Design===&lt;br /&gt;
&lt;br /&gt;
====Screen Mockups====&lt;br /&gt;
&lt;br /&gt;
'''New Scores Report With Multiple Teams and Collapsed Reviews'''&lt;br /&gt;
&lt;br /&gt;
[[File:Expanded_Teams_With_Collapsed_Tabs_-_45.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''New Scores Report With Multiple Teams and Expanded Reviews'''&lt;br /&gt;
&lt;br /&gt;
[[File:Expanded_Teams_With_Expanded_Tabs_-_85.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Plan==&lt;br /&gt;
In order to test the functionality of our updates, we will focus on reusing the existing test cases which validate the current views for the score report and the reviews report.  Since the goal is to make the code which generates these views common, the tests will include generating the pages from a student login and an instructor login to verify that the common elements are present.  This will also ensure that the underlying code used to generate the two views works as expected.  For the sections of the views which differ, such as the display of the reviewer name, we will write independent tests that ensure the correct name is listed based on the logged-in user type.&lt;br /&gt;
&lt;br /&gt;
In addition to explicitly testing our updates, we will update the other tests which rely on the contents of the pages to invoke various functionality. An example of this is the addition of the tabbed views.  The existing tests will be updated to account for transitioning to the appropriate tab before selecting links as appropriate.&lt;br /&gt;
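One of the role-dependent differences mentioned above, the reviewer label, can be captured in a small testable function (the method name is our own, not Expertiza's); the independent tests would assert on exactly this behavior:&lt;br /&gt;

```ruby
# The instructor sees the reviewer's name and user ID, while the student
# sees only an anonymized "Reviewer #k" label.
def reviewer_label(role, name:, user_id:, k:)
  role == :instructor ? "#{name} (#{user_id})" : "Reviewer ##{k}"
end

reviewer_label(:instructor, name: 'Jane Doe', user_id: 'jdoe', k: 1)
# => "Jane Doe (jdoe)"
reviewer_label(:student, name: 'Jane Doe', user_id: 'jdoe', k: 1)
# => "Reviewer #1"
```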
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 1&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that modifications to the display_as_html function in response.rb, made in order to hide the &amp;quot;feedback review&amp;quot; button, behave correctly for a student&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the student to test with has the following:&lt;br /&gt;
#Has one course assigned&lt;br /&gt;
#Has one assignment as part of that course&lt;br /&gt;
#Has more than one review for that assignment&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Login to the site as a student&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot; on the top menu&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#On the &amp;quot;Submit or Review work for Expertiza&amp;quot; screen, click on &amp;quot;Your scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is not present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 2&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that modifications to the display_as_html function in response.rb, made in order to show the &amp;quot;feedback review&amp;quot; button, behave correctly for a student&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the student to test with has the following:&lt;br /&gt;
#Has one course assigned&lt;br /&gt;
#Has one assignment as part of that course&lt;br /&gt;
#Has more than one review for that assignment&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Login to the site as a student&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot; on the top menu&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#On the &amp;quot;Submit or Review work for Expertiza&amp;quot; screen, click on &amp;quot;Your scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, select the &amp;quot;Show Review&amp;quot; button&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 3&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that modifications to the display_as_html function in response.rb, made in order to hide the &amp;quot;feedback review&amp;quot; button, behave correctly for an instructor&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the instructor to test with has the following:&lt;br /&gt;
#At least one assignment exists&lt;br /&gt;
#That assignment has more than one review&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Login to the site as an instructor&lt;br /&gt;
#Click on &amp;quot;Manage&amp;quot; on the top menu&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot;&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#Click on &amp;quot;View Scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is not present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | Test Case 4&lt;br /&gt;
|-&lt;br /&gt;
! Test Type&lt;br /&gt;
| Functional&lt;br /&gt;
|-&lt;br /&gt;
! Scenario&lt;br /&gt;
| Test that modifications to the display_as_html function in response.rb, made in order to show the &amp;quot;feedback review&amp;quot; button, behave correctly for an instructor&lt;br /&gt;
|-&lt;br /&gt;
! Pre-Conditions&lt;br /&gt;
| &lt;br /&gt;
Make sure the instructor to test with has the following:&lt;br /&gt;
#At least one assignment exists&lt;br /&gt;
#That assignment has more than one review&lt;br /&gt;
|-&lt;br /&gt;
! Description&lt;br /&gt;
| &lt;br /&gt;
#Login to the site as an instructor&lt;br /&gt;
#Click on &amp;quot;Manage&amp;quot; on the top menu&lt;br /&gt;
#Click on &amp;quot;Assignments&amp;quot;&lt;br /&gt;
#Select an assignment from the list&lt;br /&gt;
#Click on &amp;quot;View Scores&amp;quot;&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, select the &amp;quot;Show Review&amp;quot; button&lt;br /&gt;
#On the &amp;quot;Summary Report for assignment&amp;quot; screen, verify that the &amp;quot;Give feedback for Review 1&amp;quot; button is present&lt;br /&gt;
|-&lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;amp;nbsp;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108220</id>
		<title>E1735. UI changes for review and score reports</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108220"/>
		<updated>2017-04-08T02:58:08Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Test Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
This wiki provides details on the tasks that were undertaken as part of the continuous improvement to the Expertiza project.&lt;br /&gt;
===Background===&lt;br /&gt;
[[Expertiza_documentation|Expertiza]] is a web application where students can submit and peer-review learning objects (articles, code, web sites, etc). The Expertiza project is supported by the National Science Foundation.&lt;br /&gt;
&lt;br /&gt;
The application provides a complete system through which students and instructors collaborate on the learning objects as well as submit, review and grade assignments for the courses.&lt;br /&gt;
&lt;br /&gt;
===Overview of Review Functionality===&lt;br /&gt;
&lt;br /&gt;
The Expertiza review system encompasses many types of reviews. Assignments can have multiple submission rounds, each with its own due dates and criteria. Each round can have an associated questionnaire whereby peers are encouraged, or required, to review each other's submissions and rate those submissions on a scale of 1 through 5 for each question. The scores for each question are averaged to find the rating for the submission for each questionnaire response. The average of all questionnaire responses determines the score for the submission. After the reviews are submitted, the recipient of those reviews can rate the reviewers using a similar questionnaire. On this questionnaire, the author of the submission rates the reviews based on the reviewers' understanding, helpfulness, and respectfulness. This is a review of the reviews, thus it is termed a &amp;quot;metareview.&amp;quot;&lt;br /&gt;
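The scoring rule described above can be worked through in a few lines of Ruby (a sketch of the arithmetic only, not Expertiza's actual implementation):&lt;br /&gt;

```ruby
# Average the 1-5 scores within one questionnaire response.
def response_rating(scores)
  scores.sum.to_f / scores.size
end

# Average the per-response ratings to get the submission's score.
def submission_score(responses)
  ratings = responses.map { |scores| response_rating(scores) }
  ratings.sum / ratings.size
end

responses = [
  [5, 4, 4],  # reviewer 1's answers: rating 13/3
  [3, 3, 3]   # reviewer 2's answers: rating 3.0
]
submission_score(responses)  # => (13/3 + 3) / 2, about 3.67
```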
&lt;br /&gt;
Both students and instructors using Expertiza have the ability to view the reviews and the scores associated with the reviews for each assignment. These screens will display review summary and detail information in various formats such as lists, graphs, and heatgrids. The instructor will be able to see all review and score information for all teams on the assignment whereas a student will only be able to see the review and score information pertaining to them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reviews and associated scores are available on the scores report. To access the score reports in Expertiza follow these instructions:&lt;br /&gt;
&lt;br /&gt;
'''As a student'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Click on the 'Assignments' link on the top navigation bar.&lt;br /&gt;
# Find the assignment in the list and click the title of the assignment.&lt;br /&gt;
# Click on the 'Your scores' link to see the standard view or click on the 'Alternate View' link to see the heatgrid view.&lt;br /&gt;
&lt;br /&gt;
'''As an instructor'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Hover over the 'Manage' item on the top navigation bar, then click the 'Assignments' link.&lt;br /&gt;
# Find the assignment in the list and click the 'View scores' icon (a star with a magnifying glass) in the 'Actions' column.&lt;br /&gt;
# This will bring up the standard view. To see the heatgrid view, click the 'Alternate View' link on the team headings or click the 'view heatgrid' link beneath the 'Final Score' field when the team is expanded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Motivation===&lt;br /&gt;
By participating in the overall refactoring effort as part of the continuous improvement of Expertiza, students get an opportunity to work on an open source software project. This helps them gain exposure to the technologies used in the project as well as much-needed experience in collaborating with peers as part of the software development process.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Purpose==&lt;br /&gt;
&lt;br /&gt;
===Requirements Statement===&lt;br /&gt;
Expertiza displays reviews (i) to the team that was reviewed, and (ii) to the reviewer.  A student user can see all the reviews of his/her team’s project.  The instructor can see all the reviews of everyone’s project.  The instructor also has access to a review report, which shows, for each reviewer, all the reviews that (s)he wrote. Currently, the score report and review report use completely different code.  This makes the UI non-orthogonal and also causes DRY problems.  So, we would like to have a single way of displaying reviews that would be visible to students (reviews that they did, and reviews that their team received) and instructors (reviews that each team received, sorted by team; and reviews that each student did, sorted by student).&lt;br /&gt;
&lt;br /&gt;
===Required Tasks===&lt;br /&gt;
The tasks involved as part of this requirements change are as follows:&lt;br /&gt;
# Compact the review display&lt;br /&gt;
#* Eliminate the blank lines between items within a single review. Instead vary the background color from line to line to improve readability&lt;br /&gt;
#* With a single click, there should be a way to hide all the reviews, reveal just the headings (as at present), or expand all the reviews&lt;br /&gt;
# At the top of each review, it should say&lt;br /&gt;
#* Who submitted the review. The instructor should see the user’s name and user-ID.&lt;br /&gt;
#* A student should see&lt;br /&gt;
#** “Reviewer #k”, where k is an integer between 1 and n, the number of reviews that have been submitted for this project&lt;br /&gt;
#** The version number of the review&lt;br /&gt;
#** The time the review was submitted&lt;br /&gt;
# There should be a tabbed view to switch between various review views&lt;br /&gt;
#* One tab has overall statistics (averages, min, max, as the present “normal” view)&lt;br /&gt;
#* One tab has the heat map (current “alternate” view)&lt;br /&gt;
#* One tab has a grid view, with no scores, but text comments in the grid squares, and then a “More” link to display the whole comment (which will require expanding the row of the grid)&lt;br /&gt;
#* Switching between reviews from Reviewer k and Reviewer j might also be done by clicking on different tabs.  Or, it might be more convenient to keep the current score view, which lists the n reviews across the page.  Then the student should be able to click on the reviewer number (the instructor would instead click on the reviewer name) and see the review done by that reviewer&lt;br /&gt;
# To make it easy to focus on the reviewer’s feedback, there should be a way to hide and/or gray the criteria (“questions”), so the responses stand out more clearly&lt;br /&gt;
# There needs to be a way to search all reviews (of a particular project, or by a particular individual) for a given text string.  The user should be able to go from one instance of the text string to another by clicking down and up buttons&lt;br /&gt;
&lt;br /&gt;
===Problem Statement===&lt;br /&gt;
In the current state, the score reports for instructors and students are built differently even though they display the same information using similar UI elements. The application has multiple views into the same information, but the way in which those views are accessed, the code which populates them, and the layout of the screens differ unnecessarily between instructors and students and across the views themselves. This leads to redundant code in both the backend and frontend of the application. Furthermore, since the UI is not uniform between instructors and students, the differences can cause confusion, and instructors may have difficulty assisting students in accessing their score information.&lt;br /&gt;
&lt;br /&gt;
Following are some of the issues with the current state UI which we seek to rectify.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Scores Report====&lt;br /&gt;
[[File:Problem_Statement_Diagram_1A_-_Instructor_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
The scores reports for students and instructors have very similar layouts despite being created by different controller methods and views. Both display graphs, reviews of team submissions, author feedback, and score metrics. The primary difference between them is that the instructor view (shown above) displays information for all teams in a collapsible accordion widget format, while the student view (shown below) displays the information for only a single team. There are some further discontinuities between the two UIs. For example, the student cannot access the heatgrid view from within the scores report page; for students, this view is only accessible from the assignment page. For instructors there are two ways to access the heatgrid view from a single page: the 'Alternate View' link is adjacent to the team name on the heading bar, and there is also a link inexplicably placed beneath the 'Final Score' field.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_1B_-_Student_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Author Feedback====&lt;br /&gt;
Author feedback takes two different forms when being shown to instructors but only one form for students. Both students and instructors have access to the author feedback format shown on the left in the student view. This format is identical to the format of the review scores and is displayed on the scores report for both instructor and student. The format on the right is shown on the heatgrid view yet it is only available to instructors. The information conveyed by these two formats is nearly identical and not uniformly available to all consumers of this information.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_2_-_Author_Feedback_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Graphs and Charts====&lt;br /&gt;
The scores report view has a bar of graphs and charts at the top of the page for both students and instructors. Both views shown below use a donut chart and bar graphs though their method of display is not uniform. The instructor view has titles beneath each item but the student view does not. The 'Submitted work' and 'Author Feedback' titles shown, despite being beneath the bar graphs, are actually headings for the metrics which are displayed beneath the graphs. Also, the graphs contain two labels on the y-axis: the maximum score and the average score. The graphs are so compact that the values on the axis overlap and make them illegible.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_3_-_Squashed_Graphs_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Design==&lt;br /&gt;
&lt;br /&gt;
===High Level Design===&lt;br /&gt;
The project requirements state that we need to create a standard UI for accessing and consuming review and score information. The emphasis will be on smart and purposeful code reuse as well as ease of navigation to access information. To achieve these results we will redesign the scores report to work for both instructors and students alike. There will be a single route, a single controller method, and a single view that is common to users of both role types. In some instances, such as with the graphs and charts, the data will be different enough to warrant separate methods to retrieve the values needed. In most instances, the data is identical and will be accessed identically.&lt;br /&gt;
&lt;br /&gt;
We will create a standard hierarchy which works for both students (who only need to see a single team's scores) and instructors (who need to see all teams' scores). This hierarchy will be rendered within a page in the form of a set of tabbed panes which contain the contents. We will separate the information among four tabs.&lt;br /&gt;
&lt;br /&gt;
# Reviews and scores&lt;br /&gt;
# Author Feedback&lt;br /&gt;
# Graphs and Charts&lt;br /&gt;
# Heatgrid&lt;br /&gt;
&lt;br /&gt;
Putting these components on different tabs will allow us to de-clutter the UI. The instructor scores report UI has multiple ways to expand and collapse sections of information, links are placed in some unusual places, and the page can get so cluttered that it is difficult to distinguish one thing from another. This will be cleaned up by removing some links, removing the author feedback and the charts, and placing the reviews and scores into more discernible sections. The graphs and charts will be on their own tab so they can be larger and easier to read. Since they are not confined to a single bar of a fixed height, new graphs and charts can be added. The heatgrid will no longer be coupled with the author feedback, and there will be a standard author feedback view which encompasses all information needs. Each of these tabs will be rendered using its own partial. &lt;br /&gt;
&lt;br /&gt;
To prevent the application from pulling data for tabs which the user will not view, AJAX calls will be used to access the data on demand without reloading the page. These calls can also be extended to request data for individual sections. Dedicated routes and controller methods will be created to pull the data for each tab and generate the necessary data structures. These will be invoked when a user expands a group or switches tabs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Plan==&lt;br /&gt;
In order to test the functionality of our updates, we will focus on reusing the existing test cases which validate the current views for the score report and the reviews report.  Since the goal is to make the code which generates these views common, the tests will include generating the pages from a student login and an instructor login to verify that the common elements are present.  This will also ensure that the underlying code used to generate the two views works as expected.  For the sections of the views which differ, such as the display of the reviewer name, we will write independent tests that ensure the correct name is listed based on the logged-in user type.&lt;br /&gt;
&lt;br /&gt;
In addition to explicitly testing our updates, we will update the other tests which rely on the contents of the pages to invoke various functionality. An example of this is the addition of the tabbed views.  The existing tests will be updated to account for transitioning to the appropriate tab before selecting links as appropriate.&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108219</id>
		<title>E1735. UI changes for review and score reports</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=E1735._UI_changes_for_review_and_score_reports&amp;diff=108219"/>
		<updated>2017-04-08T02:56:58Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
This wiki provides details on the tasks that were undertaken as part of the continuous improvement to the Expertiza project.&lt;br /&gt;
===Background===&lt;br /&gt;
[[Expertiza_documentation|Expertiza]] is a web application where students can submit and peer-review learning objects (articles, code, web sites, etc). The Expertiza project is supported by the National Science Foundation.&lt;br /&gt;
&lt;br /&gt;
The application provides a complete system through which students and instructors collaborate on the learning objects as well as submit, review and grade assignments for the courses.&lt;br /&gt;
&lt;br /&gt;
===Overview of Review Functionality===&lt;br /&gt;
&lt;br /&gt;
The Expertiza review system encompasses many types of reviews. Assignments can have multiple submission rounds, each with its own due dates and criteria. Each round can have an associated questionnaire whereby peers are encouraged, or required, to review each other's submissions and rate those submissions on a scale of 1 through 5 for each question. The scores for each question are averaged to find the rating for the submission for each questionnaire response. The average of all questionnaire responses determines the score for the submission. After the reviews are submitted, the recipient of those reviews can rate the reviewers using a similar questionnaire. On this questionnaire, the author of the submission rates the reviews based on the reviewers' understanding, helpfulness, and respectfulness. This is a review of the reviews, thus it is termed a &amp;quot;metareview.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Both students and instructors using Expertiza have the ability to view the reviews and the scores associated with the reviews for each assignment. These screens will display review summary and detail information in various formats such as lists, graphs, and heatgrids. The instructor will be able to see all review and score information for all teams on the assignment whereas a student will only be able to see the review and score information pertaining to them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reviews and associated scores are available on the scores report. To access the score reports in Expertiza follow these instructions:&lt;br /&gt;
&lt;br /&gt;
'''As a student'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Click on the 'Assignments' link on the top navigation bar.&lt;br /&gt;
# Find the assignment in the list and click the title of the assignment.&lt;br /&gt;
# Click on the 'Your scores' link to see the standard view or click on the 'Alternate View' link to see the heatgrid view.&lt;br /&gt;
&lt;br /&gt;
'''As an instructor'''&lt;br /&gt;
&lt;br /&gt;
# Log into Expertiza.&lt;br /&gt;
# Hover over the 'Manage' item on the top navigation bar, then click the 'Assignments' link.&lt;br /&gt;
# Find the assignment in the list and click the 'View scores' icon (a star with a magnifying glass) in the 'Actions' column.&lt;br /&gt;
# This will bring up the standard view. To see the heatgrid view, click the 'Alternate View' link on the team headings or click the 'view heatgrid' link beneath the 'Final Score' field when the team is expanded.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Motivation===&lt;br /&gt;
By participating in the overall refactoring effort as part of the continuous improvement of Expertiza, students get an opportunity to work on an open source software project. This helps them gain exposure to the technologies used in the project as well as much-needed experience in collaborating with peers as part of the software development process.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Purpose==&lt;br /&gt;
&lt;br /&gt;
===Requirements Statement===&lt;br /&gt;
Expertiza displays reviews (i) to the team who was reviewed, and (ii) to the reviewer.  A student user can see all the reviews of his/her team’s project.  The instructor can see all the reviews of everyone’s project.  The instructor also has access to a review report, which shows, for each reviewer, all the reviews that (s)he wrote. Currently, the score report and review report use completely different code.  This makes the UI non-orthogonal and also causes DRY problems.  So, we would like to have a single way of displaying reviews that would be visible to students (reviews that they did, and reviews that their team received), and instructors (reviews that each team received, sorted by team; and reviews that each student did, sorted by student).&lt;br /&gt;
&lt;br /&gt;
===Required Tasks===&lt;br /&gt;
The tasks involved as part of this requirements change are as follows:&lt;br /&gt;
# Compact the review display&lt;br /&gt;
#* Eliminate the blank lines between items within a single review. Instead, vary the background color from line to line to improve readability&lt;br /&gt;
#* With a single click, there should be a way to hide all the reviews, reveal just the headings (as at present), or expand all the reviews&lt;br /&gt;
# At the top of each review, it should say&lt;br /&gt;
#* Who submitted the review. The instructor should see the user’s name and user-ID.&lt;br /&gt;
#* A student should see&lt;br /&gt;
#** “Reviewer #k”, where k is an integer between 1 and n, the number of reviews that have been submitted for this project&lt;br /&gt;
#** The version number of the review&lt;br /&gt;
#** The time the review was submitted&lt;br /&gt;
# There should be a tabbed view to switch between various review views&lt;br /&gt;
#* One tab has overall statistics (averages, min, max, as the present “normal” view)&lt;br /&gt;
#* One tab has the heat map (current “alternate” view)&lt;br /&gt;
#* One tab has a grid view, with no scores, but text comments in the grid squares, and then a “More” link to display the whole comment (which will require expanding the row of the grid)&lt;br /&gt;
#* Switching between reviews from Reviewer k and Reviewer j might also be done by clicking on different tabs.  Or, it might be more convenient to keep the current score view, which lists the n reviews across the page.  Then the student should be able to click on the reviewer number (the instructor would instead click on the reviewer name) and see the review done by that reviewer&lt;br /&gt;
# To make it easy to focus on the reviewer’s feedback, there should be a way to hide and/or gray the criteria (“questions”), so the responses stand out more clearly&lt;br /&gt;
# There needs to be a way to search all reviews (of a particular project, or by a particular individual) for a given text string.  The user should be able to go from one instance of the text string to another by clicking down and up buttons&lt;br /&gt;
&lt;br /&gt;
===Problem Statement===&lt;br /&gt;
In the current state, the score reports for instructors and students are built differently even though they display the same information using similar UI elements. The application has multiple views into the same information, but the way in which those views are accessed, the code which populates them, and the layout of the screens differ unnecessarily between instructors and students and across the views themselves. This leads to redundant code in both the backend and frontend of the application. Furthermore, since the UI is not uniform between instructors and students, instructors may have difficulty assisting students in accessing their score information because of the confusing differences in the UI.&lt;br /&gt;
&lt;br /&gt;
Following are some of the issues with the current state UI which we seek to rectify.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Scores Report====&lt;br /&gt;
[[File:Problem_Statement_Diagram_1A_-_Instructor_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
The scores reports for students and instructors have very similar layouts despite being created by different controller methods and views. They both display graphs, reviews on team submissions, author feedback, and score metrics. The primary difference between them is that the instructor view (shown above) displays information for all teams in a collapsible accordion widget format while the student view (shown below) displays the information only for a single team. There are some further discontinuities between the two UIs. For example, the student cannot access the heatgrid view from within the scores report page. This view is only accessible from the assignment page for students. For instructors there are two ways to access the heatgrid view from a single page. The 'Alternate View' link is adjacent to the team name on the heading bar and there is also a link inexplicably placed beneath the 'Final Score' field.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_1B_-_Student_View_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Author Feedback====&lt;br /&gt;
Author feedback takes two different forms when being shown to instructors but only one form for students. Both students and instructors have access to the author feedback format shown on the left in the student view. This format is identical to the format of the review scores and is displayed on the scores report for both instructor and student. The format on the right is shown on the heatgrid view yet it is only available to instructors. The information conveyed by these two formats is nearly identical and not uniformly available to all consumers of this information.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_2_-_Author_Feedback_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Graphs and Charts====&lt;br /&gt;
The scores report view has a bar of graphs and charts at the top of the page for both students and instructors. Both views shown below use a donut chart and bar graphs, though their method of display is not uniform. The instructor view has titles beneath each item but the student view does not. The 'Submitted work' and 'Author Feedback' titles, despite appearing beneath the bar graphs, are actually headings for the metrics displayed below the graphs. Also, the graphs carry two labels on the y-axis, the maximum score and the average score, and the graphs are so compact that the values on the axis overlap and become illegible.&lt;br /&gt;
&lt;br /&gt;
[[File:Problem_Statement_Diagram_3_-_Squashed_Graphs_-_65.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Project Design==&lt;br /&gt;
&lt;br /&gt;
===High Level Design===&lt;br /&gt;
The project requirements state that we need to create a standard UI for accessing and consuming review and score information. The emphasis will be on smart and purposeful code reuse as well as ease of navigation to access information. To achieve these results we will redesign the scores report to work for both instructors and students alike. There will be a single route, a single controller method, and a single view that is common to users of both role types. In some instances, such as with the graphs and charts, the data will be different enough to warrant separate methods to retrieve the values needed. In most instances, the data is identical and will be accessed identically.&lt;br /&gt;
&lt;br /&gt;
We will create a standard hierarchy which works for both students (who only need to see a single team's scores) and instructors (who need to see all teams' scores). This hierarchy will be rendered within a page in the form of a set of tabbed panes which contain the contents. We will separate the information among four tabs.&lt;br /&gt;
&lt;br /&gt;
# Reviews and scores&lt;br /&gt;
# Author Feedback&lt;br /&gt;
# Graphs and Charts&lt;br /&gt;
# Heatgrid&lt;br /&gt;
&lt;br /&gt;
Putting these components on different tabs will allow us to de-clutter the UI. The instructor scores report UI has multiple ways to expand and collapse sections of information, links are placed in some unusual places, and the page can get so cluttered that it is difficult to distinguish one thing from another. This will be cleaned up by removing some links, moving the author feedback and the charts to their own tabs, and placing the reviews and scores into more discernible sections. The graphs and charts will be on their own tab so they can be larger and easier to read. Since they are not confined to a single bar of a fixed height, new graphs and charts can be added. The heatgrid will no longer be coupled with the author feedback, and there will be a standard author feedback view which encompasses all information needs. Each of these tabs will be rendered using its own partial. &lt;br /&gt;
&lt;br /&gt;
To prevent the application from pulling data for tabs which the user will not view, AJAX calls will be used to access the data on demand without reloading the page. These calls can also be expanded to request data for individual sections. Routes will be created to controller methods specifically to pull the data for each tab so that calls can be made to them to generate the necessary data structures. These will be accessed when a user expands a group or switches tabs.&lt;br /&gt;
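&lt;br /&gt;
As a rough illustration of this on-demand pattern (the class and method names below are hypothetical, not actual Expertiza code), each tab's payload can be built lazily the first time it is requested, which is what the per-tab AJAX endpoints accomplish at the HTTP level:&lt;br /&gt;

```ruby
# Hypothetical sketch, not actual Expertiza code: per-tab payloads are built
# lazily on first request, mirroring per-tab AJAX endpoints that only compute
# data when the user opens a tab.
class ScoresReport
  def initialize(builders)
    @builders = builders  # maps a tab name to a proc that builds its data
    @cache = {}
    @built = []           # records which tabs were actually computed
  end

  # Returns the tab's data, computing it only on the first request.
  def tab(name)
    @cache[name] ||= begin
      @built.push(name)
      @builders.fetch(name).call
    end
  end

  attr_reader :built
end
```

Requesting the reviews tab builds only that tab's data; the heatgrid data is never computed unless its tab is opened.&lt;br /&gt;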
&lt;br /&gt;
&lt;br /&gt;
==Test Plan==&lt;br /&gt;
In order to test the functionality of our updates, we will focus on reusing the existing test cases which validate the current views for the score report and the reviews report.  Since the goal is to make the code which generates these views common, the tests will include generation of pages from a student login and an instructor login to verify that the common elements are present.  This will also ensure that the underlying code which is used to generate the two views works as expected.  For the sections of the views which differ, such as the display of the reviewer name, we will write independent tests that ensure the correct name is listed based on the logged-in user type.&lt;br /&gt;
&lt;br /&gt;
In addition to explicitly testing our updates, we will update the other tests which rely on the contents of the pages to invoke various functionality. An example of this is the addition of the tabbed views.  The existing tests will be updated to account for transitioning to the appropriate tab before selecting links as appropriate.&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Spring_2017/E1724&amp;diff=107345</id>
		<title>CSC/ECE 517 Spring 2017/E1724</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Spring_2017/E1724&amp;diff=107345"/>
		<updated>2017-03-24T00:51:03Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''E1724 - Refactoring Feature Tests'''&lt;br /&gt;
&lt;br /&gt;
'''About Expertiza'''&lt;br /&gt;
&lt;br /&gt;
Expertiza is an open-source web application project based on the Ruby on Rails framework. It provides an online interactive platform for instructors to post and grade assignments, and for students to contribute to team-based projects as well as individual assignments.&lt;br /&gt;
&lt;br /&gt;
'''Problem Statement'''&lt;br /&gt;
&lt;br /&gt;
Remove duplicated code in feature tests and improve the overall Code Climate.&lt;br /&gt;
&lt;br /&gt;
'''Refactoring Delayed_mailer Method''' &lt;br /&gt;
&lt;br /&gt;
The delayed_mailer_spec covers testing scenarios for the email reminder feature targeting various users and tasks.&lt;br /&gt;
This spec took timestamps using Time.parse, which creates issues when users change their time zones. Calling the zone-aware Time.zone.parse instead solves the issue. The spec also had massive amounts of duplicated code across its test scenarios; a helper method enqueue_delayed_job(stage) now encapsulates the task being performed.&lt;br /&gt;
&lt;br /&gt;
After refactoring in delayed_mailer_spec.rb:&lt;br /&gt;
&lt;br /&gt;
    def enqueue_delayed_job(stage)&lt;br /&gt;
       #enqueue a delayed job using current stage’s timestamp&lt;br /&gt;
    end&lt;br /&gt;
&lt;br /&gt;
    describe '&amp;lt;stage&amp;gt; deadline reminder email' do&lt;br /&gt;
       it 'is able to send reminder email for &amp;lt;stage&amp;gt; deadline to &amp;lt;stage_users&amp;gt; ' do&lt;br /&gt;
         enqueue_delayed_job(stage)&lt;br /&gt;
         expect(Delayed::Job.count).to eq(1)&lt;br /&gt;
         expect(Delayed::Job.last.handler).to include(&amp;quot;deadline_type: &amp;lt;stage&amp;gt;&amp;quot;)&lt;br /&gt;
       end&lt;br /&gt;
    end&lt;br /&gt;
&lt;br /&gt;
'''Refactoring Scheduled_task_spec Method''' &lt;br /&gt;
&lt;br /&gt;
The scheduled_task_spec covers testing scenarios for scheduling the deadline reminder feature targeting various users and tasks.&lt;br /&gt;
Like delayed_mailer_spec, it took timestamps using Time.parse, which creates issues when users change their time zones; calling the zone-aware Time.zone.parse instead solves the issue. It also had massive amounts of duplicated code across its test scenarios; a helper method enqueue_scheduled_tasks(stage) now encapsulates the task being performed.&lt;br /&gt;
&lt;br /&gt;
After refactoring in scheduled_spec.rb:&lt;br /&gt;
&lt;br /&gt;
    def enqueue_scheduled_tasks(stage)&lt;br /&gt;
      #enqueue a delayed job using current stage’s timestamp&lt;br /&gt;
    end&lt;br /&gt;
&lt;br /&gt;
    describe '&amp;lt;stage&amp;gt; deadline reminder email' do&lt;br /&gt;
      it 'is able to send reminder email for &amp;lt;stage&amp;gt; deadline to &amp;lt;stage_users&amp;gt; ' do&lt;br /&gt;
        enqueue_scheduled_tasks(stage)&lt;br /&gt;
        expect(Delayed::Job.count).to eq(1)&lt;br /&gt;
        expect(Delayed::Job.last.handler).to include(&amp;quot;deadline_type: &amp;lt;stage&amp;gt;&amp;quot;)&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
&lt;br /&gt;
'''Refactoring Assignment_creation_spec''' &lt;br /&gt;
&lt;br /&gt;
Assignment_creation_spec covers testing scenarios that create public and private assignments as well as the various options in this assignment creation.&lt;br /&gt;
Using CodeClimate, it was identified that a large portion of the code was duplicated across multiple test cases, violating the DRY principle. The redundant code was generalized into methods instead of being repeated in each of the test cases, and redundant methods that were never called were removed from the class.&lt;br /&gt;
&lt;br /&gt;
The code below is a sample of the refactored code: instead of duplicating the same steps in each test case, handle_questionaire is called with a few parameters.&lt;br /&gt;
&lt;br /&gt;
  def validate_attributes(questionaire_name)&lt;br /&gt;
    questionnaire = get_questionnaire(questionaire_name).first&lt;br /&gt;
    expect(questionnaire).to have_attributes(&lt;br /&gt;
      questionnaire_weight: 50,&lt;br /&gt;
      notification_limit: 50&lt;br /&gt;
    )&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def validate_dropdown&lt;br /&gt;
    questionnaire = Questionnaire.where(name: &amp;quot;ReviewQuestionnaire2&amp;quot;).first&lt;br /&gt;
    assignment_questionnaire = AssignmentQuestionnaire.where(assignment_id: @assignment.id, questionnaire_id: questionnaire.id).first&lt;br /&gt;
    expect(assignment_questionnaire.dropdown).to eq(false)&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def fill_in_questionaire(questionaire_css, questionaire_name)&lt;br /&gt;
    within(:css, questionaire_css) do&lt;br /&gt;
      select questionaire_name, from: 'assignment_form[assignment_questionnaire][][questionnaire_id]'&lt;br /&gt;
      uncheck('dropdown')&lt;br /&gt;
      select &amp;quot;Scale&amp;quot;, from: 'assignment_form[assignment_questionnaire][][dropdown]'&lt;br /&gt;
      fill_in 'assignment_form[assignment_questionnaire][][questionnaire_weight]', with: '50'&lt;br /&gt;
      fill_in 'assignment_form[assignment_questionnaire][][notification_limit]', with: '50'&lt;br /&gt;
    end&lt;br /&gt;
    click_button 'Save'&lt;br /&gt;
    sleep 1&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def handle_questionaire(questionaire_css, questionaire_name, test_attributes)&lt;br /&gt;
    fill_in_questionaire(questionaire_css, questionaire_name)&lt;br /&gt;
    if test_attributes&lt;br /&gt;
      validate_attributes(questionaire_name)&lt;br /&gt;
    else&lt;br /&gt;
      validate_dropdown&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Refactoring Instructor_interface_spec''' &lt;br /&gt;
&lt;br /&gt;
Instructor_interface_spec covers testing scenarios like creating a course, importing tests, and viewing publishing rights.&lt;br /&gt;
Unlike assignment_creation_spec, the largest DRY violation (as determined by CodeClimate) was functionality that was exactly duplicated in questionnaire_spec. To fix this, /spec/helpers/instructor_interface_helper_spec was created as a module and then included in both of the other spec files as a mixin.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Refactoring questionnaire_spec'''&lt;br /&gt;
&lt;br /&gt;
The questionnaire_spec covers testing scenarios relating to creating questionnaires and filling them out.  What is unique about the questionnaire_spec is that it had many instances of similar, but not identical, code.  This is because there are many different types of questions and every variation has to be tested.  Aside from the specific question name, the process of testing the editing and deleting of each question type was the same.  To make this more generic and repeatable, a new method was created which takes the question type as an input, and an each iterator cycles through every type to test editing and deletion.  Below are the definitions which were created to test each question type for the ability to edit and delete.&lt;br /&gt;
&lt;br /&gt;
  question_type = %w(Criterion Scale Dropdown Checkbox TextArea TextField UploadFile SectionHeader TableHeader ColumnHeader)&lt;br /&gt;
&lt;br /&gt;
  def load_question question_type, verify_button&lt;br /&gt;
    load_questionnaire&lt;br /&gt;
    fill_in('question_total_num', with: '1')&lt;br /&gt;
    select(question_type, from: 'question_type')&lt;br /&gt;
    click_button &amp;quot;Add&amp;quot;&lt;br /&gt;
  &lt;br /&gt;
    expect(page).to have_content('Remove') if verify_button&lt;br /&gt;
  &lt;br /&gt;
    click_button &amp;quot;Save review questionnaire&amp;quot;&lt;br /&gt;
  &lt;br /&gt;
    expect(page).to have_content('All questions has been successfully saved!') if verify_button&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def edit_created_question&lt;br /&gt;
    first(&amp;quot;textarea[placeholder='Edit question content here']&amp;quot;).set &amp;quot;Question edit&amp;quot;&lt;br /&gt;
    click_button &amp;quot;Save review questionnaire&amp;quot;&lt;br /&gt;
    expect(page).to have_content('All questions has been successfully saved!')&lt;br /&gt;
    expect(page).to have_content('Question edit')&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def check_deleted_question&lt;br /&gt;
    click_on('Remove')&lt;br /&gt;
    expect(page).to have_content('You have successfully deleted the question!')&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def choose_check_type command_type&lt;br /&gt;
    if command_type == 'edit'&lt;br /&gt;
      edit_created_question&lt;br /&gt;
    else&lt;br /&gt;
      check_deleted_question&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  describe &amp;quot;Edit and delete a question&amp;quot; do&lt;br /&gt;
    question_type.each do |q_type|&lt;br /&gt;
      %w(edit delete).each do |q_command|&lt;br /&gt;
        it &amp;quot;is able to &amp;quot; + q_command + &amp;quot; &amp;quot; + q_type + &amp;quot; question&amp;quot; do&lt;br /&gt;
          load_question q_type, false&lt;br /&gt;
          choose_check_type q_command&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
'''Refactoring quiz_spec'''&lt;br /&gt;
The quiz_spec covers the testing of the creation and use of quizzes by instructors and students.  Similar to the other files which were refactored, this spec had several areas of duplicated code which were extracted and placed into individual definitions.  The interesting thing about updating this spec was its high ABC (Assignment, Branch, Condition) score.  To reduce this metric, several of the definitions needed to be split into smaller logical methods called by the refactored methods.  Below is an example where new definitions were made in order to reduce the ABC score.&lt;br /&gt;
&lt;br /&gt;
  def fill_in_choices&lt;br /&gt;
    # Fill in for all 4 choices&lt;br /&gt;
    fill_in 'new_choices_1_MultipleChoiceRadio_1_txt', with: 'Test Quiz 1'&lt;br /&gt;
    fill_in 'new_choices_1_MultipleChoiceRadio_2_txt', with: 'Test Quiz 2'&lt;br /&gt;
    fill_in 'new_choices_1_MultipleChoiceRadio_3_txt', with: 'Test Quiz 3'&lt;br /&gt;
    fill_in 'new_choices_1_MultipleChoiceRadio_4_txt', with: 'Test Quiz 4'&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
  def fill_in_quiz&lt;br /&gt;
    # Fill in the form for Name&lt;br /&gt;
    fill_in 'questionnaire_name', with: 'Quiz for test'&lt;br /&gt;
 &lt;br /&gt;
    # Fill in the form for Question 1&lt;br /&gt;
    fill_in 'text_area', with: 'Test Question 1'&lt;br /&gt;
 &lt;br /&gt;
    # Choose the quiz to be a single choice question&lt;br /&gt;
    page.choose('question_type_1_type_multiplechoiceradio')&lt;br /&gt;
 &lt;br /&gt;
    fill_in_choices&lt;br /&gt;
 &lt;br /&gt;
    # Choose the first one to be the correct answer&lt;br /&gt;
    page.choose('new_choices_1_MultipleChoiceRadio_1_iscorrect_1')&lt;br /&gt;
 &lt;br /&gt;
    # Save quiz&lt;br /&gt;
    click_on 'Create Quiz'&lt;br /&gt;
  end&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83939</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83939"/>
		<updated>2014-03-04T02:28:45Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked data structures (LDS) include several kinds of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One trait that links together these various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain the entries that collide in a given bucket, trees link their nodes through left and right child pointers, and graphs use linked adjacency lists in algorithms such as shortest path.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
# Copy the row 1 array to row 2.&lt;br /&gt;
# Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
# Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
# Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
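&lt;br /&gt;
The steps above can be sketched in Ruby (an illustration only, with plain arrays standing in for per-processor memories and each pass of the loop modeling one synchronous round):&lt;br /&gt;

```ruby
# Illustrative copy-scan broadcast: row 0 is copied outward in doubling
# steps until every 'processor' row holds its own copy of the data.
def copy_scan(source_row, num_rows)
  rows = Array.new(num_rows)
  rows[0] = source_row.dup
  step = 1
  rounds = num_rows.zero? ? 0 : Math.log2(num_rows).ceil
  rounds.times do
    # snapshot the rows so all copies within a round happen 'simultaneously'
    snapshot = rows.dup
    (0...num_rows).each do |i|
      next if snapshot[i].nil?
      dst = i + step
      rows[dst] = snapshot[i].dup if dst.between?(0, num_rows - 1)
    end
    step = step * 2
  end
  rows
end
```

After ceil(log2(n)) rounds every row holds its own copy, so a reduction (for example a sum) can then proceed on all rows in parallel.&lt;br /&gt;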
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
# Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
# Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
# This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
# Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One algorithm that pointer doubling can be used for is computing the partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to, repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
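&lt;br /&gt;
The rounds pictured above can be modeled in Ruby (an illustrative sketch, with arrays standing in for the nodes and synchronous rounds simulated by copying state before each pass):&lt;br /&gt;

```ruby
# Pointer-doubling partial sums on a linked list modeled with arrays:
# sums[i] holds node i's running total and nxt[i] the index of the node it
# currently points to (nil at the end of the chain).
def parallel_prefix_sums(values)
  n = values.length
  sums = values.dup
  nxt = (1...n).to_a.push(nil)
  rounds = Math.log2([n, 1].max).ceil
  rounds.times do
    new_sums = sums.dup
    new_nxt = nxt.dup
    n.times do |i|
      j = nxt[i]
      next if j.nil?
      # add this node's total into the node it points to, then jump the
      # pointer two steps ahead (pointer doubling)
      new_sums[j] = sums[j] + sums[i]
      new_nxt[i] = nxt[j]
    end
    sums = new_sums
    nxt = new_nxt
  end
  sums
end
```

For the input [3, 1, 4, 1, 5] this yields [3, 4, 8, 9, 14] after ceil(log2(5)) = 3 rounds, each node ending with the sum of itself and all preceding nodes.&lt;br /&gt;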
&lt;br /&gt;
The advantages of pointer doubling aren't limited to singly linked lists. The algorithm can also be used for finding the roots of trees, which we will discuss further in the next section.  Its use here is similar to the previous example: each node repeatedly replaces its pointer with the pointer of the node it points to, until every pointer reaches the end.  The result is that every node has a pointer to the root of the tree.  Below is an example of this process.[[#References|&amp;lt;sup&amp;gt;[24]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pointer_Jumping_Example.png]]&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search, update, and insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order still require visiting every node to generate all output.  This gives an opportunity for parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.  &lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_traversal is&lt;br /&gt;
::N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what a standard serial tree traversal is, we will look at how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism: each node/subtree is independent. As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor. The only duplicated data that must then be kept on all processors is the set of parent nodes above the subtrees. Mathematically speaking, for a binary tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n - 1' nodes in common, no matter how big the tree itself is, because each shared node can branch to at most two children. This is shown in the figure below: since we are using 4 processors, we only need 3 nodes in common. If both the size of the tree and the number of processors were increased, the number of shared nodes would grow accordingly to support the additional sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them straightforward. Properly done, the amount of parallelizable work grows as 2^n for an n-generation tree, while the processors need to synchronize only once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]). The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# That processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
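The steps above can be sketched in Java with the fork/join framework (the article's traversal code is in Ada; this Java translation, including the Node and PreorderTask names, is our own illustration, not the article's implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    // Minimal binary-tree node, mirroring the Ada Node record above.
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Each subtree is an independent task; results are combined in
    // preorder (root, then left subtree, then right subtree).
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node n;
        PreorderTask(Node n) { this.n = n; }
        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            PreorderTask left = new PreorderTask(n.left);
            PreorderTask right = new PreorderTask(n.right);
            left.fork();                       // left subtree on another worker
            out.add(n.data);                   // visit the root
            List<Integer> r = right.compute(); // right subtree runs locally
            out.addAll(left.join());           // single sync point per node
            out.addAll(r);
            return out;
        }
    }

    public static void main(String[] args) {
        // Same tree as the Ada example; preorder is 1 2 4 7 5 3 6 8 9.
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(new ForkJoinPool().invoke(new PreorderTask(root)));
    }
}
```

Because each subtree is independent, the tasks never contend for shared data; the only coordination is the `join()` that combines each subtree's result.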
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because, at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the Pre-, In-, and Post-Order traversals, and as the degree of parallelism increases, the speedup increases in accordance with Amdahl's law. One thing to remember in the parallelization, though, is that processor-to-processor data transmission adds a greater potential for delays, which can slow down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes each generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
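The level-by-level scheme above can be sketched in Java with a shared thread pool standing in for the separate processors (a minimal sketch using shared memory rather than message passing; the Node class and method names are our own):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LevelOrderParallel {
    // Minimal node type for illustration.
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Expand every node of the current generation in parallel, then
    // synchronize once per generation to combine children in order.
    static List<Integer> levelOrder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> out = new ArrayList<>();
        List<Node> level = (root == null) ? List.of() : List.of(root);
        while (!level.isEmpty()) {
            List<Future<List<Node>>> children = new ArrayList<>();
            for (Node n : level) {
                children.add(pool.submit(() -> {   // one task per node
                    List<Node> kids = new ArrayList<>();
                    if (n.left != null) kids.add(n.left);
                    if (n.right != null) kids.add(n.right);
                    return kids;
                }));
            }
            List<Node> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i++) { // per-generation sync
                out.add(level.get(i).data);
                next.addAll(children.get(i).get());
            }
            level = next;
        }
        pool.shutdown();
        return out;
    }
}
```

Note that no locks are needed: each task reads only its own node, and the combining step runs after the whole generation has finished, exactly as in the steps above.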
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT algorithm, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the tree's linked list into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves pushing vast amounts of data through the hash table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single lock on the whole hash table can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked-list structures is the chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only find the proper bucket for the data being searched for, but also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only to set the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
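The two-level locking scheme described above can be sketched in Java as follows (a minimal sketch; all class and method names are our own, not from the patent, and a real implementation would also store hash signatures and resize the table):

```java
import java.util.LinkedList;

// One global table lock is held only long enough to acquire a
// per-bucket flag; operations on different buckets then run in parallel.
public class TwoLevelLockTable<K, V> {
    private static final int BUCKETS = 16;

    private static final class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final Object tableLock = new Object();        // table-level lock
    private final boolean[] busy = new boolean[BUCKETS];  // bucket-level flags
    @SuppressWarnings("unchecked")
    private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[BUCKETS];

    public TwoLevelLockTable() {
        for (int i = 0; i < BUCKETS; i++) buckets[i] = new LinkedList<>();
    }

    private int lockBucket(K key) throws InterruptedException {
        int b = Math.floorMod(key.hashCode(), BUCKETS);
        synchronized (tableLock) {            // held only to set the flag
            while (busy[b]) tableLock.wait();
            busy[b] = true;
        }
        return b;
    }

    private void unlockBucket(int b) {
        synchronized (tableLock) { busy[b] = false; tableLock.notifyAll(); }
    }

    public void put(K key, V value) throws InterruptedException {
        int b = lockBucket(key);
        try {
            for (Entry<K, V> e : buckets[b])      // chained bucket
                if (e.key.equals(key)) { e.value = value; return; }
            buckets[b].add(new Entry<>(key, value));
        } finally { unlockBucket(b); }
    }

    public V get(K key) throws InterruptedException {
        int b = lockBucket(key);
        try {
            for (Entry<K, V> e : buckets[b])
                if (e.key.equals(key)) return e.value;
            return null;
        } finally { unlockBucket(b); }
    }
}
```

The table lock is released as soon as the bucket flag is set, so two threads touching different buckets block each other only for that brief flag update.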
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor fit for this problem because processors need to run very quickly, and having to wait for locks would severely degrade processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: to be used safely from multiple threads, the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any point in time.  With a really large number of threads, however, waiting to enter the synchronized method can become a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need to utilize the synchronizedMap function available in the Collections class.  Also, as you may notice, once the iteration begins we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through it at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequence has completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of hash designs with little or no locking have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block, and typically run in parallel with updates/deletes; a retrieval reflects the most recently completed update, even though it may not see values still in the process of being updated.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() rejects a null expected value, so the first insert uses putIfAbsent()&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that comes with synchronized methods or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some extra code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the current mapping still equals the previously read value; otherwise the loop retries.  This is much more efficient than locking the entire function, since we rarely expect a concurrent modification to the same key.&lt;br /&gt;
&lt;br /&gt;
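On Java 8 and later, the same atomic increment can be written more compactly with ConcurrentHashMap's merge() method, which performs the insert-or-update retry internally (a sketch; the QueryCounter wrapper class is our own):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    // merge() atomically inserts 1 for an absent key or applies
    // Integer::sum to the existing count, handling races internally.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```

This removes the hand-written retry loop entirely, while keeping the same lock-avoiding behavior.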
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block required&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using these connections, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to its directly connected nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices.  From a given vertex, one would typically want to rank the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their list of edges in a linked list.  Also, when attempting to create a shortest-path algorithm on the fly, the implementation will typically use a combination of a linked list to represent the path as it is being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
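A minimal sketch of this linked-list-of-edges representation (class and method names are our own; vertices are plain integers for brevity):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Each vertex keeps its edges in a linked list, as described above.
public class AdjacencyListGraph {
    private final List<LinkedList<Integer>> adj = new ArrayList<>();

    public AdjacencyListGraph(int vertices) {
        for (int i = 0; i < vertices; i++) adj.add(new LinkedList<>());
    }

    // Undirected edge: record each endpoint in the other's edge list.
    public void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    public List<Integer> neighbors(int u) { return adj.get(u); }
}
```

A traversal such as BFS then repeatedly pulls a vertex from a queue and walks its `neighbors` list, which is exactly the combination of structures that must be kept consistent under concurrency.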
Much like hash tables, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock each list of edges, or lock a shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the traversal begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-search step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Because of this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we show how the basic principles taught in the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y. Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;br /&gt;
#http://en.wikipedia.org/wiki/Pointer_jumping&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83938</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83938"/>
		<updated>2014-03-04T02:28:11Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links within a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs use linked lists to store each node's edges, as in shortest-path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
# Copy the row 1 array to row 2.&lt;br /&gt;
# Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
# Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
# Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
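The copy-scan schedule above can be sketched as a sequential simulation; the class and method names below are our own illustrative choices, and the inner copy loop stands in for what would be simultaneous processor-to-processor sends.&lt;br /&gt;

```java
import java.util.Arrays;

// Sequential simulation of the copy-scan schedule: in each step, every row
// that already holds the data copies it one "stride" further down, so after
// ceil(log2(n)) steps all n rows hold a copy of row 0.
class CopyScan {
    static int[][] copyScan(int[] row0, int n) {
        int[][] grid = new int[n][];
        grid[0] = row0.clone();
        for (int stride = 1; stride < n; stride *= 2) {
            // Rows 0..stride-1 already have the data; each copies one stride ahead.
            // On a real machine these copies would happen in parallel.
            for (int i = 0; i < stride && i + stride < n; i++) {
                grid[i + stride] = grid[i].clone();
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        for (int[] row : copyScan(new int[]{1, 2, 3, 4}, 8)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Once every row holds a copy of the data, a reduction (for example, summing across rows) can proceed fully in parallel.&lt;br /&gt;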
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
# Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
# Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
# This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
# Perform the parallel operations to generate the desired result.&lt;br /&gt;
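The pointer-doubling rounds above can be simulated sequentially as follows; `PointerDoubling` and `jumpToEnd` are hypothetical names, with each array slot standing in for the pointer held by one processor.&lt;br /&gt;

```java
// Pointer doubling over a linked list stored as a successor array: next[i]
// holds the index of node i's successor, and the last node points to itself.
// Each round, every node grabs the pointer held by the node it points to, so
// after about log2(n) rounds every node points at the end of the chain.
class PointerDoubling {
    static int[] jumpToEnd(int[] next) {
        int[] p = next.clone();
        while (true) {
            int[] q = new int[p.length];
            boolean changed = false;
            for (int i = 0; i < p.length; i++) {  // conceptually, all in parallel
                q[i] = p[p[i]];                   // jump two steps in one round
                if (q[i] != p[i]) changed = true;
            }
            if (!changed) return p;               // fixed point: all at the end
            p = q;
        }
    }
}
```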
&lt;br /&gt;
One algorithm that pointer doubling can be used for is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
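The partial-sums algorithm illustrated above can be sketched as a synchronous simulation. Here we assume, for illustration, that each node holds a pointer to its predecessor (-1 at the head); all reads in a round use the previous round's arrays, mimicking the barrier between parallel steps.&lt;br /&gt;

```java
// Pointer-jumping partial (prefix) sums.  pred[i] is the index of node i's
// predecessor in the list, or -1 for the head node.  Each round, every node
// adds its predecessor's partial sum, then jumps its pointer past it; after
// about log2(n) rounds every node holds the sum of itself and all
// preceding nodes.
class PartialSums {
    static int[] prefixSums(int[] values, int[] pred) {
        int n = values.length;
        int[] sum = values.clone(), p = pred.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nsum = sum.clone(), np = p.clone();
            for (int i = 0; i < n; i++) {         // conceptually, all in parallel
                if (p[i] != -1) {                 // still has values to absorb
                    nsum[i] = sum[i] + sum[p[i]]; // add predecessor's partial sum
                    np[i] = p[p[i]];              // pointer-double past it
                    active = true;
                }
            }
            sum = nsum; p = np;                   // "barrier" between rounds
        }
        return sum;
    }
}
```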
&lt;br /&gt;
The advantages of pointer doubling aren't limited to singly linked lists. This algorithm can also be used for finding the roots of trees, which we will talk more about in the next section.  The way in which this algorithm is used is similar to the previous example.  Each node repeatedly jumps its pointer further up its parent chain until it reaches the root.  The result is that every node has a pointer to the root of the tree.  Below is an example of this process.[[#References|&amp;lt;sup&amp;gt;[24]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pointer_Jumping_Example.png]]&lt;br /&gt;
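The root-finding variant above differs from the list version only in what the pointers mean; the sketch below (with names of our own choosing) stores the tree as a parent array.&lt;br /&gt;

```java
// Pointer jumping applied to a tree stored as a parent array: parent[i] is
// node i's parent, and the root points to itself.  Repeatedly replacing each
// pointer with the grandparent's pointer leaves every node pointing directly
// at the root after O(log depth) rounds.
class RootJumping {
    static int[] findRoot(int[] parent) {
        int[] p = parent.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] q = new int[p.length];
            for (int i = 0; i < p.length; i++) {  // conceptually, all in parallel
                q[i] = p[p[i]];                   // jump to the grandparent
                if (q[i] != p[i]) changed = true;
            }
            p = q;
        }
        return p;
    }
}
```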
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures and the parallelization opportunities, as well as the concurrency issues, that they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what the standard serial tree traversals look like, we will examine how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n – 1' nodes in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the additional sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree. Since the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
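The steps above can be sketched with Java's fork/join framework; this is a minimal illustration with hypothetical names, in which each subtree is an independent task whose results the parent combines in order, so no locks are needed.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelTraversal {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Each subtree is traversed as an independent fork/join task; the parent
    // stitches the sub-results together in in-order sequence.
    static class InorderTask extends RecursiveTask<List<Integer>> {
        final Node n;
        InorderTask(Node n) { this.n = n; }
        protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            InorderTask left = new InorderTask(n.left);
            left.fork();                              // left subtree on another worker
            List<Integer> rightPart = new InorderTask(n.right).compute();
            out.addAll(left.join());                  // combine: left, node, right
            out.add(n.data);
            out.addAll(rightPart);
            return out;
        }
    }

    static List<Integer> inorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new InorderTask(root));
    }
}
```

Pre-order and post-order versions differ only in where `out.add(n.data)` falls relative to the two sub-results.&lt;br /&gt;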
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system, because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the Pre-, In-, and Post-Order traversals.  As the degree of parallelism is increased, the speedup increases as per Amdahl's law. One thing to remember in the parallelization, though, is that the processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
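The generation-by-generation steps above can be sketched with Java parallel streams; this is a simplified, hypothetical illustration in which expanding one generation's children is the parallel part and collecting them in encounter order plays the role of the per-generation synchronization.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LevelOrderParallel {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Level-synchronized breadth-first traversal: each generation is expanded
    // in parallel, and the results are combined in order before the next
    // generation starts.
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        List<Node> level = (root == null) ? Collections.<Node>emptyList()
                                          : Collections.singletonList(root);
        while (!level.isEmpty()) {
            for (Node n : level) out.add(n.data);   // combine one generation in order
            level = level.parallelStream()          // expand children in parallel
                         .flatMap(n -> Stream.of(n.left, n.right))
                         .filter(Objects::nonNull)
                         .collect(Collectors.toList());  // preserves encounter order
        }
        return out;
    }
}
```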
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
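As a concrete sketch of the read-write-lock option mentioned above, the class below (our own illustrative example) guards a simple linked list with Java's ReentrantReadWriteLock: any number of readers may traverse concurrently, while a writer gets exclusive access.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Coarse-grained read-write locking around a singly linked list: traversals
// take the shared read lock, structural changes take the exclusive write lock.
class RwLockedList {
    static class Node { final int data; Node next; Node(int d) { data = d; } }

    private Node head;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void insertFront(int d) {
        lock.writeLock().lock();          // exclusive: the structure is changing
        try {
            Node n = new Node(d);
            n.next = head;
            head = n;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int sum() {
        lock.readLock().lock();           // shared: traversal only reads
        try {
            int s = 0;
            for (Node n = head; n != null; n = n.next) s += n.data;
            return s;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A fine-grained variant would instead hold a lock per node, trading lock overhead for more concurrency on long lists.&lt;br /&gt;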
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as it involves passing a vast amount of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One way that is relevant to linked-list structures is the chained hash table, in which a linked list is created holding all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at bucket level), but needs only one real lock for the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure.&lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operations on the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
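The two-level locking scheme described above can be sketched as follows. This is our own simplified illustration (a fixed 16 buckets and plain Object monitors rather than the lightweight flags and signatures of the design described above): the table lock is held only long enough to pin down a bucket's own lock, after which threads working on different buckets proceed concurrently.&lt;br /&gt;

```java
// Simplified two-level locking chained hash table: a short-lived table lock
// plus one lock per bucket, so operations on different buckets overlap.
class TwoLevelHash {
    static class Entry {
        final String key; int value; Entry next;
        Entry(String k, int v, Entry n) { key = k; value = v; next = n; }
    }

    private static final int BUCKETS = 16;
    private final Entry[] buckets = new Entry[BUCKETS];
    private final Object[] bucketLocks = new Object[BUCKETS];
    private final Object tableLock = new Object();

    TwoLevelHash() {
        for (int i = 0; i < BUCKETS; i++) bucketLocks[i] = new Object();
    }

    private int bucketOf(String key) {
        return (key.hashCode() & 0x7fffffff) % BUCKETS;
    }

    void put(String key, int value) {
        int b;
        Object bl;
        synchronized (tableLock) {   // held only long enough to pin the bucket lock
            b = bucketOf(key);
            bl = bucketLocks[b];
        }
        synchronized (bl) {          // serializes work on this chain only
            for (Entry e = buckets[b]; e != null; e = e.next) {
                if (e.key.equals(key)) { e.value = value; return; }
            }
            buckets[b] = new Entry(key, value, buckets[b]);
        }
    }

    Integer get(String key) {
        int b;
        Object bl;
        synchronized (tableLock) {
            b = bucketOf(key);
            bl = bucketLocks[b];
        }
        synchronized (bl) {
            for (Entry e = buckets[b]; e != null; e = e.next) {
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }
}
```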
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals involve no locking and will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even though it may not see values whose updates have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() rejects a null expected value, so a new key uses putIfAbsent&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, notice that we must add some code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used with concurrent code: the value is replaced only if the currently mapped value still equals the expected previous value; otherwise the loop retries.  This is much more efficient than locking the entire function, since in the common case no other thread has intervened.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block that a plain Hashtable or synchronized map would need, and iterate in the normal fashion; the iterator is weakly consistent rather than fail-fast.&lt;br /&gt;
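A minimal sketch of such an unsynchronized traversal, with generics added (the snippet above uses raw types) and illustrative contents of our own choosing:&lt;br /&gt;

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentTraversal {
    // Sum every value without a synchronized block: ConcurrentHashMap
    // iterators are weakly consistent, never throw
    // ConcurrentModificationException, and tolerate concurrent updates.
    static int sumValues(ConcurrentMap<String, Integer> m) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        System.out.println(sumValues(m)); // prints 6
    }
}
```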
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of (possibly ordered) pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the paths to another vertex using its list of edges or, more likely, find the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], each with a series of friend links that must be analyzed and examined, and this list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first traversal which proceeds from the city of Frankfurt toward Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of processing Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could fan out all 3 searches onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
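The fan-out described above can be sketched as a level-synchronous BFS in Java, with a parallel stream standing in for the per-processor split and an atomic visited-set replacing explicit locks.  This is an illustration under our own assumptions (the city graph is abridged from the figure), not the referenced paper's algorithm.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: the whole frontier is expanded in parallel,
    // and Set.add() acts as an atomic test-and-set so each vertex enters
    // the next frontier exactly once.
    static List<String> bfs(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        List<String> order = new ArrayList<>();
        List<String> frontier = List.of(root);
        visited.add(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = frontier.parallelStream()
                    .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)        // atomic: true only once per vertex
                    .collect(Collectors.toList());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"),
                "Mannheim", List.of("Karlsruhe"),
                "Wurzburg", List.of("Erfurt", "Nurnberg"));
        System.out.println(bfs(g, "Frankfurt"));
    }
}
```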
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor.  This set of vertices is called a regional set of vertices.  The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs are able to be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods pull heavily from the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we are able to show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;br /&gt;
#http://en.wikipedia.org/wiki/Pointer_jumping&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83937</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83937"/>
		<updated>2014-03-04T02:27:43Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the grid shown below as an example:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
# Copy the row 1 array to row 2.&lt;br /&gt;
# Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
# Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
# Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
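The four steps above can be sketched in Java as follows.  Here the doubling loop runs serially, but each pass copies a batch of rows that would all be copied simultaneously on a real data-parallel machine; the grid contents are illustrative.&lt;br /&gt;

```java
import java.util.Arrays;

public class CopyScan {
    // Broadcast row 0 to every other row in ceil(log2(n)) doubling steps:
    // 1 row holds the data, then 2, then 4, and so on.
    static void copyScan(int[][] grid) {
        int copied = 1;                          // rows holding the data so far
        while (copied < grid.length) {
            int step = Math.min(copied, grid.length - copied);
            for (int i = 0; i < step; i++) {     // these copies run in parallel
                grid[copied + i] = grid[i].clone();
            }
            copied += step;                      // doubles until all rows done
        }
    }

    public static void main(String[] args) {
        int[][] g = new int[4][];
        g[0] = new int[]{1, 2, 3};
        for (int i = 1; i < g.length; i++) g[i] = new int[3];
        copyScan(g);
        System.out.println(Arrays.deepToString(g));
        // every row now holds {1, 2, 3}, ready for a parallel reduction
    }
}
```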
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
# Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
# Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
# This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
# Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One algorithm that pointer doubling can be used for is computing partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to, and repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
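The rounds pictured above can be simulated with arrays standing in for per-node processors.  Note that this sketch follows successor pointers, so each node finishes with the sum of itself and all following nodes; running it over predecessor pointers instead yields the preceding-node sums described above.  The values are illustrative.&lt;br /&gt;

```java
import java.util.Arrays;

public class PointerDoubling {
    // Pointer-doubling partial sums: next[i] is the node i points to
    // (-1 marks the end of the list).  Each round every node adds its
    // successor's value and then jumps its pointer two hops ahead, so the
    // list collapses in O(log n) rounds instead of O(n) steps.
    static int[] partialSums(int[] val, int[] next) {
        int n = val.length;
        int[] sum = Arrays.copyOf(val, n);
        int[] ptr = Arrays.copyOf(next, n);
        boolean active = true;
        while (active) {
            active = false;
            int[] newSum = Arrays.copyOf(sum, n);
            int[] newPtr = Arrays.copyOf(ptr, n);
            for (int i = 0; i < n; i++) {        // each i on its own processor
                if (ptr[i] != -1) {
                    newSum[i] = sum[i] + sum[ptr[i]];
                    newPtr[i] = ptr[ptr[i]];
                    active = true;
                }
            }
            sum = newSum;
            ptr = newPtr;
        }
        return sum;
    }

    public static void main(String[] args) {
        // the list 1 -> 2 -> 3 -> 4, laid out in index order
        int[] sums = partialSums(new int[]{1, 2, 3, 4}, new int[]{1, 2, 3, -1});
        System.out.println(Arrays.toString(sums)); // prints [10, 9, 7, 4]
    }
}
```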
&lt;br /&gt;
The advantages of pointer doubling aren't just for singly linked lists. This algorithm can also be used for finding the roots of trees, which we will talk more about in the next section.  The algorithm works similarly to the previous example: each node repeatedly re-points to the node its current target points to, until it reaches the end.  The result is that every node has a pointer to the root of the tree.  Below is an example of this process.[[#References|&amp;lt;sup&amp;gt;[24]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pointer_Jumping_Example.png]]&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, and post-order can still require visiting the full sequence of nodes to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.  &lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what a standard serial tree traversal is, we will look at how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n – 1' nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common, because one node is capable of having two branches.  If the size of the tree were increased and the number of processors increased along with it, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree. Since the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been produced serially.&lt;br /&gt;
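The steps above can be sketched as follows for a pre-order traversal, with the shared root kept on the coordinating thread and each subtree handed to its own worker.  The thread-pool mechanics are our own choice for illustration, not prescribed by the reference.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTreeTraversal {
    record Node(int data, Node left, Node right) {}

    // Plain serial pre-order walk of one independent subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data());
        preorder(n.left(), out);
        preorder(n.right(), out);
    }

    // Step 1: visit the shared parent; step 2: each subtree runs on its own
    // thread; step 3: splice the per-processor results back in order.
    static List<Integer> parallelPreorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<Integer>> left = pool.submit(() -> {
            List<Integer> l = new ArrayList<>();
            preorder(root.left(), l);
            return l;
        });
        Future<List<Integer>> right = pool.submit(() -> {
            List<Integer> r = new ArrayList<>();
            preorder(root.right(), r);
            return r;
        });
        List<Integer> out = new ArrayList<>();
        out.add(root.data());          // the only node both workers share
        out.addAll(left.get());        // one synchronization at the end
        out.addAll(right.get());
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        Node root = new Node(1,
                new Node(2, new Node(4, null, null), new Node(5, null, null)),
                new Node(3, new Node(6, null, null), null));
        System.out.println(parallelPreorder(root)); // prints [1, 2, 4, 5, 3, 6]
    }
}
```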
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the Pre-, In-, and Post-Order traversals.  As the degree of parallelism is increased, the speedup increases as per Amdahl's Law. One thing to remember in the parallelization, though, is that the processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
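As an illustration of the first two rules (our own sketch, not the paper's code), given the sequence of node copies produced for a tree, the pre-order and post-order traversals can be recovered by keeping the first or last copy of each node:

```java
import java.util.*;

class TraversalFromNodeCopies {
    // Pre-order: keep the first copy of each node in the copy sequence.
    static List<String> preOrder(List<String> copies) {
        Set<String> seen = new LinkedHashSet<>();  // keeps first occurrences, in order
        for (String n : copies) seen.add(n);
        return new ArrayList<>(seen);
    }

    // Post-order: keep the last copy of each node.
    static List<String> postOrder(List<String> copies) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (int i = copies.size() - 1; i >= 0; i--)   // scan backwards: first seen = last copy
            if (seen.add(copies.get(i))) out.add(0, copies.get(i));
        return out;
    }

    public static void main(String[] args) {
        // Copy sequence for the tree A(B, C): A B A C A
        List<String> copies = Arrays.asList("A", "B", "A", "C", "A");
        System.out.println(preOrder(copies));   // [A, B, C]
        System.out.println(postOrder(copies));  // [B, C, A]
    }
}
```

Each "keep the first/last copy" pass scans its portion of the list independently, which is what makes the selection rules amenable to the divide-and-conquer treatment described below.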
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as vast amounts of data are passed through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
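To make the chaining idea concrete, here is a minimal, non-thread-safe sketch of a chained hash table; the class and method names are our own:

```java
import java.util.LinkedList;

class ChainedHashTable {
    static class Entry {
        final String key;
        int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;  // one chain per bucket

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketFor(String key) {
        return Math.abs(key.hashCode() % buckets.length);  // the hashing function picks the bucket
    }

    void put(String key, int value) {
        for (Entry e : buckets[bucketFor(key)])
            if (e.key.equals(key)) { e.value = value; return; }  // key exists: update in place
        buckets[bucketFor(key)].add(new Entry(key, value));      // else chain a new entry
    }

    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)])  // walk the chain for this bucket
            if (e.key.equals(key)) return e.value;
        return null;                             // not present
    }
}
```

Two keys that hash to the same bucket simply end up on the same chain; lookups for either key walk that chain until the matching entry is found.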
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
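A rough Java sketch of this two-level scheme follows. It is our own simplification: the patented design uses lightweight per-bucket flags and signature-pointer pairs, whereas here both levels are ordinary locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;     // one lock per bucket
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new Map[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketFor(String key) { return Math.abs(key.hashCode() % buckets.length); }

    void put(String key, int value) {
        int b = bucketFor(key);
        tableLock.lock();  // table lock is held just long enough to acquire the bucket lock
        try { bucketLocks[b].lock(); } finally { tableLock.unlock(); }
        try { buckets[b].put(key, value); }  // other buckets stay accessible to other threads
        finally { bucketLocks[b].unlock(); }
    }

    Integer get(String key) {
        int b = bucketFor(key);
        tableLock.lock();
        try { bucketLocks[b].lock(); } finally { tableLock.unlock(); }
        try { return buckets[b].get(key); }
        finally { bucketLocks[b].unlock(); }
    }
}
```

Because the table lock is released as soon as the bucket lock is acquired, two threads operating on different buckets serialize only briefly at the table lock rather than for the whole operation.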
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never lock and typically run in parallel with updates and deletes, while updates lock only a small portion of the map.  A retrieval reflects the results of the most recently completed updates, even though it cannot see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // putIfAbsent() handles the first insertion, since replace() does not accept a null expected value&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that several inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it still equals the value that was previously read, and the loop retries otherwise.  This is much more efficient than locking the entire function, as we rarely expect the value to have changed in between.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
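The node-with-linked-list-of-pointers idea above can be sketched as a simple adjacency-list graph (the class and method names are our own):

```java
import java.util.*;

class AdjacencyListGraph {
    // Each node keeps a linked list of the nodes reachable from it in one hop.
    private final Map<String, LinkedList<String>> adjacency = new HashMap<>();

    void addNode(String n) { adjacency.putIfAbsent(n, new LinkedList<>()); }

    void addEdge(String from, String to) {   // undirected: link both directions
        addNode(from);
        addNode(to);
        adjacency.get(from).add(to);
        adjacency.get(to).add(from);
    }

    List<String> neighbors(String n) {       // the linked list a traversal would walk
        return adjacency.getOrDefault(n, new LinkedList<>());
    }
}
```

A path-finding algorithm starts from one node's neighbor list and repeatedly follows the neighbor lists of the nodes it discovers, which is exactly the traversal the following sections parallelize.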
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
A graph[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consists of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all three searches in parallel onto three different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
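A level-synchronous sketch of this data-parallel BFS in Java might use a parallel stream for each frontier and a ConcurrentHashMap in place of the fetch_and_add locks. This is an illustration under our own naming, not the paper's algorithm:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class LevelSyncBFS {
    // Returns the distance (level) of every reachable vertex from the source.
    // All vertices on the current frontier are expanded in parallel; the
    // concurrent map's atomic putIfAbsent claims each vertex exactly once,
    // so no explicit locks are needed.
    static Map<String, Integer> bfs(Map<String, List<String>> adj, String source) {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        Set<String> frontier = Collections.singleton(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            frontier = frontier.parallelStream()                      // expand the whole level at once
                .flatMap(u -> adj.getOrDefault(u, Collections.<String>emptyList()).stream())
                .filter(v -> dist.putIfAbsent(v, next) == null)       // keep only newly claimed vertices
                .collect(Collectors.toSet());
            d = next;
        }
        return dist;
    }
}
```

Each iteration of the while loop corresponds to one distance level; within a level, the neighbor expansions are independent and can run on separate processors, mirroring the regional decomposition shown above.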
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Because of this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;br /&gt;
#http://en.wikipedia.org/wiki/Pointer_jumping&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83936</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83936"/>
		<updated>2014-03-04T02:27:19Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
# Copy the row 1 array to row 2.&lt;br /&gt;
# Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
# Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
# Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
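A sequential simulation of the copy-scan steps above might look like this in Java (the grid layout and names are our own; on a real machine, each copy within a step would run on a different processor):

```java
class CopyScan {
    // Broadcast row 0 of the grid to every other row in ceil(log2(rows)) steps:
    // in each step, every row that already holds the data copies it one block
    // further down, doubling the number of filled rows.
    static void broadcastRowZero(int[][] grid) {
        int have = 1;  // rows [0, have) hold the data so far
        while (have < grid.length) {
            for (int r = 0; r < have && r + have < grid.length; r++)
                grid[r + have] = grid[r].clone();  // independent copies: one per processor
            have = Math.min(grid.length, have * 2);
        }
    }
}
```

After the broadcast completes, every row holds its own copy of the data, so a parallel reduction (for example, a sum) can proceed with no further communication.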
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
# Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
# Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
# This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
# Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One algorithm that pointer doubling can be used for is computing the partial sums of a linked list.  This is accomplished by having each node add its value to the value stored in the node it points to, repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and those of all preceding nodes.  Below is an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
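The partial-sum example above can be simulated sequentially as follows. This is our own sketch: predecessor pointers are stored in an array (`-1` marks the head) so that each node ends with the sum of itself and all preceding nodes, and double buffering stands in for the simultaneous parallel update of all nodes.

```java
import java.util.Arrays;

class PointerJumping {
    // Pointer-doubling partial sums.  pred[i] points to the node before node i
    // in the list, or -1 for the head.  After ceil(log2(n)) rounds, sum[i]
    // holds val[i] plus the values of all preceding nodes.
    static int[] partialSums(int[] pred, int[] val) {
        int n = val.length;
        int[] sum = Arrays.copyOf(val, n);
        int[] p = Arrays.copyOf(pred, n);
        boolean active = true;
        while (active) {
            active = false;
            int[] nextSum = Arrays.copyOf(sum, n);  // double buffer: all nodes read the
            int[] nextP = Arrays.copyOf(p, n);      // old round, write the new round
            for (int i = 0; i < n; i++) {           // conceptually: for all i in parallel
                if (p[i] != -1) {
                    nextSum[i] = sum[i] + sum[p[i]];  // absorb the predecessor's partial sum
                    nextP[i] = p[p[i]];               // then jump the pointer two steps back
                    active = true;
                }
            }
            sum = nextSum;
            p = nextP;
        }
        return sum;
    }
}
```

For a four-node list the pointers all reach the head after two rounds rather than three, which is where the logarithmic speedup over a sequential scan comes from.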
&lt;br /&gt;
The advantages of pointer doubling aren't just for singly-linked lists. This algorithm can also be used for finding the roots of trees, which we will talk more about in the next section.  It is used in a way similar to the previous example: each node repeatedly jumps its pointer forward until it reaches the end, with the result that every node has a pointer to the root of the tree.  Below is an example of this process.[[#References|&amp;lt;sup&amp;gt;[24]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pointer_Jumping_Example.png]]&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms we will cover are pre-order, in-order, post-order, and level order.  &lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
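For readers more comfortable with Java (the language used in later sections of this page), the same four traversals on the same tree can be sketched as follows; the class and method names are our own, not from the references:&lt;br /&gt;

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class Traversals {
    static class Node {
        int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Visit root, then left subtree, then right subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data); preorder(n.left, out); preorder(n.right, out);
    }

    // Visit left subtree, then root, then right subtree.
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out); out.add(n.data); inorder(n.right, out);
    }

    // Visit left subtree, then right subtree, then root.
    static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out); postorder(n.right, out); out.add(n.data);
    }

    // Level order uses an explicit FIFO queue instead of recursion,
    // just as the Ada version above does.
    static List<Integer> levelorder(Node root) {
        List<Integer> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node next = queue.remove();
            out.add(next.data);
            if (next.left != null) queue.add(next.left);
            if (next.right != null) queue.add(next.right);
        }
        return out;
    }
}
```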
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what standard serial tree traversals look like, we will examine how trees can be parallelized. In many ways, a tree is an ideal candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  The only duplicated data that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have at most two branches.  If both the size of the tree and the number of processors were increased, the number of shared nodes would grow to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them straightforward.  Properly done, the parallelizable portion of these traversals grows at a rate of 2^n for an n-generation tree. Since the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return a result that can be used exactly as if it had been produced serially.&lt;br /&gt;
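The steps above can be sketched in Java by handing each independent subtree of the shared parent node to its own worker thread; the single synchronization point is the join at the end. The class and method names are our own illustration, not from the references:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPreorder {
    static class Node {
        int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Plain serial pre-order traversal of one subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data); preorder(n.left, out); preorder(n.right, out);
    }

    // Visit the shared parent node serially, then traverse each independent
    // subtree on its own worker; the partial results are concatenated in
    // order, so the workers synchronize only once, at the end.
    static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> leftPart = pool.submit(() -> {
                List<Integer> part = new ArrayList<>();
                preorder(root.left, part);
                return part;
            });
            Future<List<Integer>> rightPart = pool.submit(() -> {
                List<Integer> part = new ArrayList<>();
                preorder(root.right, part);
                return part;
            });
            List<Integer> out = new ArrayList<>();
            out.add(root.data);            // parent node held in common
            out.addAll(leftPart.get());    // join: the only synchronization
            out.addAll(rightPart.get());
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```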
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the pre-, in-, and post-order traversals, and as the degree of parallelism is increased, the speedup increases as per Amdahl's Law. One thing to remember in the parallelization, though, is that the processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is an output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. A global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as vast amounts of data can be passed through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
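A minimal sketch of such a chained hash table in Java (our own illustration, not code from the references): the hashing function selects a bucket, and collisions within a bucket are resolved by walking that bucket's linked list.&lt;br /&gt;

```java
import java.util.LinkedList;

// Chained hash table: each bucket holds a linked list of entries whose
// keys hash to the same bucket index.
class ChainedHashTable {
    static class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // The hashing function: maps a key to a bucket index.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; }  // update in chain
        }
        buckets[bucketFor(key)].add(new Entry(key, value));      // append to chain
    }

    // Constant-time bucket selection, then a (usually short) chain walk.
    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;  // key absent
    }
}
```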
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
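The two-level scheme can be sketched in Java as follows; this is our own hedged illustration of the idea, not the SMP implementation itself. The table lock is held only long enough to pick up the relevant bucket lock, after which other threads may enter other buckets.&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Two-level locking: one short-lived table lock plus one lock per bucket.
class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new Map[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Acquire the bucket lock under the table lock, then release the
    // table lock immediately so other buckets stay accessible.
    private ReentrantLock lockBucket(String key) {
        tableLock.lock();                       // table lock held just long enough...
        try {
            ReentrantLock bucketLock = bucketLocks[bucketFor(key)];
            bucketLock.lock();                  // ...to acquire the bucket lock
            return bucketLock;
        } finally {
            tableLock.unlock();                 // other threads may now proceed
        }
    }

    void put(String key, int value) {
        ReentrantLock bucketLock = lockBucket(key);
        try {
            buckets[bucketFor(key)].put(key, value);
        } finally {
            bucketLock.unlock();
        }
    }

    Integer get(String key) {
        ReentrantLock bucketLock = lockBucket(key);
        try {
            return buckets[bucketFor(key)].get(key);
        } finally {
            bucketLock.unlock();
        }
    }
}
```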
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem because processors need to run very quickly; waiting for locks would destroy application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals take no locks and typically run in parallel with updates and deletes; a retrieval reflects the most recently completed updates, though it may not see updates that have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// putIfAbsent handles the first insert for a key; replace() cannot take a null expected value&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, notice that we must add some code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it still equals the previously read value.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
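Since Java 8, the same retry loop can be expressed more compactly with ConcurrentMap.merge(), which performs the atomic insert-or-update internally. A brief sketch (the class name and getCount helper are our own additions):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    // merge() atomically inserts 1 for an absent key, or applies
    // Integer::sum to combine the existing count with 1.
    void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    int getCount(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```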
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in a normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a finite set of ordered pairs of those vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their list of edges in a linked list.  Also, when computing a shortest path on the fly, the graph will typically use a combination of a linked list to represent the path as it is being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
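As a small illustration of such an adjacency structure (the class and method names are our own, not from the references), each vertex's reachable neighbors can be kept in a linked list:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Adjacency-list graph: vertex i's reachable neighbors live in a linked list.
class AdjacencyGraph {
    private final List<List<Integer>> adjacency = new ArrayList<>();

    AdjacencyGraph(int nVertices) {
        for (int i = 0; i < nVertices; i++) adjacency.add(new LinkedList<>());
    }

    // Undirected edge: record each endpoint in the other's neighbor list.
    void addEdge(int u, int v) {
        adjacency.get(u).add(v);
        adjacency.get(v).add(u);
    }

    List<Integer> neighbors(int u) {
        return adjacency.get(u);
    }
}
```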
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
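Assuming a shared-memory machine, this data-parallel idea can be sketched as a level-synchronous BFS in Java: every vertex of the current frontier is expanded in parallel, and the workers synchronize only between levels. The code is our own illustration, not taken from the cited papers:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class LevelSyncBfs {
    // adjacency.get(u) lists the neighbors of vertex u.  Returns the
    // distance (in edges) from source to every vertex, -1 if unreachable.
    static int[] distances(List<List<Integer>> adjacency, int source) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            // Expand every frontier vertex in parallel; the concurrent set
            // deduplicates vertices discovered by several workers at once.
            Set<Integer> next = ConcurrentHashMap.newKeySet();
            final int level = d;
            frontier.parallelStream().forEach(u -> {
                for (int v : adjacency.get(u)) {
                    if (dist[v] == -1) {      // benign race: both writers store level+1
                        dist[v] = level + 1;
                        next.add(v);
                    }
                }
            });
            frontier = new ArrayList<>(next); // implicit barrier between levels
            d++;
        }
        return dist;
    }
}
```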
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Pointer_Jumping_Example.png&amp;diff=83935</id>
		<title>File:Pointer Jumping Example.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Pointer_Jumping_Example.png&amp;diff=83935"/>
		<updated>2014-03-04T02:26:39Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83934</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83934"/>
		<updated>2014-03-04T02:23:53Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures differ, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. This often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we cover other LDS such as trees, hash tables, and graphs, and show how those parallelization algorithms can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain colliding entries within a bucket, trees link nodes through left and right child pointers, and graphs use linked adjacency lists in algorithms such as shortest-path searches.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
# Copy the row 1 array to row 2.&lt;br /&gt;
# Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
# Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
# Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
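The four steps above can be simulated serially. The sketch below uses a simplified variant in which each &amp;quot;processor&amp;quot; holds a single value rather than a whole row: processor 0's value is broadcast to all p processors in ceil(log2 p) copy steps, after which every processor uses its copy in a data-parallel operation. The class and method names are invented for illustration.&lt;br /&gt;

```java
public class CopyScan {
    // Serial simulation of log-step copy-scan. data[i] is processor i's
    // local value; processor 0's value is broadcast in ceil(log2 p) steps,
    // then each processor adds the broadcast copy to its own element.
    public static int[] addFirstToAll(int[] data) {
        int p = data.length;
        int[] copy = new int[p];
        boolean[] has = new boolean[p];
        copy[0] = data[0];
        has[0] = true;
        // Copy phase: at each step, every processor that already holds the
        // value forwards it 'stride' positions away, doubling the coverage.
        for (int stride = 1; stride < p; stride *= 2) {
            boolean[] snapshot = has.clone();   // all copies in a step happen "at once"
            for (int i = 0; i + stride < p; i++)
                if (snapshot[i] && !snapshot[i + stride]) {
                    copy[i + stride] = copy[i];
                    has[i + stride] = true;
                }
        }
        // Data-parallel phase: each processor works on its own element.
        int[] out = new int[p];
        for (int i = 0; i < p; i++) out[i] = data[i] + copy[i];
        return out;
    }
}
```

A reduction for a sum would use the same log-step structure in reverse, combining partial results pairwise instead of copying.&lt;br /&gt;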
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
# Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
# Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
# This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
# Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One algorithm that pointer doubling can be used for is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
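A serial simulation of these partial-sum rounds is sketched below, assuming each array index stands for one processor that owns a node value and a pointer to the preceding node. The snapshot copies model all processors stepping together; the names are illustrative.&lt;br /&gt;

```java
public class PointerDoubling {
    // Pointer doubling (simulated serially): each index is one "processor"
    // holding a node value and a pointer to the preceding node (-1 = none).
    // Each round, every node adds the value of the node it points to, then
    // doubles its pointer to point two (then four, then eight...) nodes back.
    public static int[] partialSums(int[] values) {
        int n = values.length;
        int[] val = values.clone();
        int[] prev = new int[n];
        for (int i = 0; i < n; i++) prev[i] = i - 1;
        boolean active = true;
        while (active) {
            active = false;
            int[] v2 = val.clone();
            int[] p2 = prev.clone();   // snapshots: all processors step in lockstep
            for (int i = 0; i < n; i++)
                if (prev[i] != -1) {
                    v2[i] = val[i] + val[prev[i]];   // add the pointed-to value
                    p2[i] = prev[prev[i]];           // double the pointer
                    active = true;
                }
            val = v2;
            prev = p2;
        }
        return val;
    }
}
```

After ceil(log2 n) rounds every node holds the sum of itself and all preceding nodes, matching the figures above.&lt;br /&gt;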
&lt;br /&gt;
The advantages of pointer doubling aren't limited to singly linked lists. The algorithm can also be used for finding the roots of trees, which we will discuss further in the next section.  It is used in a manner similar to the previous example: each node repeatedly redirects its pointer further ahead until it reaches the end, with the result that every node holds a pointer to the root of the tree.  Below is an example of this process.[[#References|&amp;lt;sup&amp;gt;[22]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pointer_Jumping_Example.svg]]&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing different serial tree traversal algorithms, written in Ada, using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.  &lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what a standard serial tree traversal looks like, we will look at how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n - 1' nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because one node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows at a rate of 2^n for an n-generation tree.  Since the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor then returns its result, which can be used exactly as if it had been produced by a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
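The steps above can be sketched as follows, assuming a hypothetical binary Node type and one worker thread per subtree; each worker runs the ordinary serial preorder on its own subtree, and the results are spliced together at a single synchronization point, exactly as if they had come from one processor.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelPreorder {
    static final class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Ordinary serial preorder, run independently on each subtree.
    static List<Integer> preorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    // Visit the shared root, hand each subtree to its own "processor",
    // then splice the results together in traversal order.
    public static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> left = pool.submit(() -> preorder(root.left));
            Future<List<Integer>> right = pool.submit(() -> preorder(root.right));
            List<Integer> out = new ArrayList<>();
            out.add(root.data);
            out.addAll(left.get());   // the single synchronization point
            out.addAll(right.get());
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the two subtrees are disjoint, the workers never touch shared nodes and no locking is needed, as the text notes below.&lt;br /&gt;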
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as Pre-, In-, and Post-Order traversals, and as the degree of parallelism is increased, the speedup increases as per Amdahl's law. One thing to remember in the parallelization, though, is that the processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree is able to be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
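As a hedged illustration of the Read-Write Lock option, the sketch below guards a shared list (standing in for the linked list produced from the tree) with Java's ReentrantReadWriteLock: any number of traversals may read concurrently, while an edit takes the exclusive write lock. The class and its methods are invented for illustration.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.locks.*;

public class GuardedList {
    private final List<Integer> nodes = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers (tree edits) take the exclusive write lock.
    public void add(int v) {
        lock.writeLock().lock();
        try { nodes.add(v); } finally { lock.writeLock().unlock(); }
    }

    // Readers (traversals) share the read lock and may proceed concurrently.
    public int sum() {
        lock.readLock().lock();
        try {
            int s = 0;
            for (int v : nodes) s += v;
            return s;
        } finally { lock.readLock().unlock(); }
    }
}
```

A global lock would serialize the readers as well; the read-write variant only blocks readers while a write is in progress.&lt;br /&gt;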
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves pushing vast amounts of data through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys hash to the same bucket.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a collision that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list is created for each bucket holding all values that have been placed in it.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
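A minimal sketch of the two-level locking scheme described above, assuming chained buckets: the table-level monitor is held only long enough to set or clear a per-bucket busy flag, after which operations on different buckets proceed concurrently. All names are illustrative; this is not Larson et al.'s actual code.&lt;br /&gt;

```java
import java.util.*;

public class TwoLevelHashTable {
    private final Object tableLock = new Object();      // table-level lock
    private final boolean[] bucketBusy;                 // lightweight per-bucket flags
    private final List<Map.Entry<String, Integer>>[] buckets;

    @SuppressWarnings("unchecked")
    public TwoLevelHashTable(int nBuckets) {
        bucketBusy = new boolean[nBuckets];
        buckets = new List[nBuckets];
        for (int i = 0; i < nBuckets; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // The table lock is held just long enough to claim the bucket's flag.
    private void lockBucket(int b) {
        synchronized (tableLock) {
            boolean interrupted = false;
            while (bucketBusy[b]) {
                try { tableLock.wait(); }
                catch (InterruptedException e) { interrupted = true; }
            }
            bucketBusy[b] = true;
            if (interrupted) Thread.currentThread().interrupt();
        }
    }

    private void unlockBucket(int b) {
        synchronized (tableLock) {
            bucketBusy[b] = false;
            tableLock.notifyAll();
        }
    }

    public void put(String key, int value) {
        int b = bucketOf(key);
        lockBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b])
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            buckets[b].add(new AbstractMap.SimpleEntry<>(key, value));
        } finally { unlockBucket(b); }
    }

    public Integer get(String key) {
        int b = bucketOf(key);
        lockBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b])
                if (e.getKey().equals(key)) return e.getValue();
            return null;
        } finally { unlockBucket(b); }
    }
}
```

Threads working on different buckets only contend for the brief flag update, not for each other's chains.&lt;br /&gt;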
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem: processors need to run very quickly, and waiting on locks would destroy the application's processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness for concurrent use, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never block, so they typically run in parallel with updates and deletes; a retrieval reflects the results of the most recently ''completed'' update operations, even though it may not see updates that are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() does not accept a null expected value, so a new key must go through putIfAbsent()&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function acts much like a compare-and-set operation typically used in concurrent code: the value is replaced only if the currently mapped value is still equal to the value we read earlier, and otherwise the loop retries.  This is much more efficient than locking the entire function, since we rarely expect a conflicting update to have slipped in between the read and the replace.&lt;br /&gt;
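As a side note, since Java 8 the same retry loop can be written more compactly with ConcurrentHashMap's atomic merge() method. The following minimal sketch (the class and variable names are our own illustration) counts queries without an explicit compare-and-set loop:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>();

    // merge() atomically inserts 1 if the key is absent, otherwise
    // applies Integer::sum to the existing value and 1 -- no retry
    // loop is needed because the map performs the update atomically.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int getCount(String q) {
        return queryCounts.getOrDefault(q, 0);
    }

    public static void main(String[] args) {
        QueryCounter c = new QueryCounter();
        c.incrementCount("select");
        c.incrementCount("select");
        c.incrementCount("update");
        System.out.println(c.getCount("select")); // 2
        System.out.println(c.getCount("update")); // 1
    }
}
```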
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another linked data structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this web of connections, it is possible to find a path from one specific node to another.  This can be accomplished by having each node contain a linked list of pointers to all other directly reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a finite set of (ordered) pairs of vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to other vertices using its list of edges or, more than likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
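As a concrete illustration of this layout, a graph can keep one edge list per vertex in an adjacency structure. The following minimal Java sketch (class and method names are our own, not from any of the referenced sources) builds such a structure:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each vertex maps to a list of the vertices it has edges to (its edge list).
public class AdjacencyGraph {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    public void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new ArrayList<>()); // ensure vertex exists
    }

    public List<String> neighbors(String v) {
        return adjacency.getOrDefault(v, List.of());
    }

    public static void main(String[] args) {
        AdjacencyGraph g = new AdjacencyGraph();
        g.addEdge("Frankfurt", "Mannheim");
        g.addEdge("Frankfurt", "Wurzburg");
        System.out.println(g.neighbors("Frankfurt")); // [Mannheim, Wurzburg]
    }
}
```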
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
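One way to sketch this level-by-level data parallelism in Java is to expand each frontier with a parallel stream, using a concurrent set so that racing workers mark a vertex visited exactly once. This is our own minimal illustration, not the code from the referenced sources; the graph representation is an assumed adjacency map, and the parallel stream stands in for explicitly assigning frontier vertices to processors:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Returns the vertices in the order their level was discovered.
    public static List<String> bfs(Map<String, List<String>> graph, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        List<String> order = new ArrayList<>();
        List<String> frontier = List.of(root);
        visited.add(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            // Each frontier vertex can be expanded by a separate worker;
            // visited.add() is atomic, so each vertex joins the next
            // frontier exactly once even if two parents discover it.
            frontier = frontier.parallelStream()
                    .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)
                    .collect(Collectors.toList());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg"),
                "Mannheim", List.of("Karlsruhe"),
                "Wurzburg", List.of("Nurnberg"));
        System.out.println(bfs(g, "Frankfurt"));
    }
}
```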
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, exactly the same locking techniques are used, and it is the LDS that is manipulated so that it can be treated as a collection of singly linked lists.  In this way we are able to show how the basic principles taught in the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83933</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83933"/>
		<updated>2014-03-04T02:07:55Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
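The copy-scan steps above can be simulated serially. In this minimal Java sketch (our own illustration, not from the referenced lecture notes), each pass of the outer loop corresponds to one parallel step in which every processor that already holds a copy forwards it to one that does not, so the number of copies doubles each step and the broadcast finishes in O(log n) steps:

```java
import java.util.Arrays;

public class CopyScan {
    // Serial simulation of copy-scan: broadcast one row to n "processors".
    public static int[][] broadcast(int[] row, int n) {
        int[][] rows = new int[n][];
        rows[0] = row.clone();
        int have = 1; // processors currently holding a copy
        while (have < n) {
            int sending = Math.min(have, n - have);
            for (int i = 0; i < sending; i++) {   // these copies are independent,
                rows[have + i] = rows[i].clone(); // so a data-parallel machine
            }                                     // performs them simultaneously
            have += sending;                      // copies double each step
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] rows = broadcast(new int[]{1, 2, 3, 4}, 8);
        System.out.println(Arrays.toString(rows[7])); // [1, 2, 3, 4]
    }
}
```

After the broadcast, a reduction (for example a sum) can then run on all processors at once, as described above.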
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One algorithm that pointer doubling can be used for is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this algorithm in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
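The partial-sums procedure illustrated above can be simulated serially in Java as follows. This is our own sketch of the PRAM steps, with arrays standing in for per-processor state; on a real parallel machine every node performs each round's update simultaneously. Sums accumulate in the direction the pointers run, so if the pointers run from the tail of the list back toward the head, each node ends up with the sum of itself and all preceding nodes:

```java
import java.util.Arrays;

public class PointerDoubling {
    // value[i] is node i's value; next[i] is the index of the node i points
    // to, or -1 at the end of the chain. Each round, every node adds the
    // value of the node it points to and then doubles its pointer.
    public static int[] partialSums(int[] value, int[] next) {
        int[] val = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newVal = val.clone(); // snapshot: all nodes update "at once"
            int[] newNxt = nxt.clone();
            for (int i = 0; i < val.length; i++) {
                if (nxt[i] != -1) {
                    newVal[i] = val[i] + val[nxt[i]]; // add pointed-to value
                    newNxt[i] = nxt[nxt[i]];          // pointer doubling
                    active = true;
                }
            }
            val = newVal;
            nxt = newNxt;
        }
        return val;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2 -> 3 with values 1, 2, 3, 4.
        int[] sums = partialSums(new int[]{1, 2, 3, 4}, new int[]{1, 2, 3, -1});
        System.out.println(Arrays.toString(sums)); // [10, 9, 7, 4]
    }
}
```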
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing several serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what the standard serial tree traversals look like, we will examine how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n'&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold 'n – 1' nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree was increased and the number of processors was also increased, the number of shared nodes would increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent subtrees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the parallelizable portion of these traversals grows &lt;br /&gt;
as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree. Since the processors only need to synchronize once, at &lt;br /&gt;
the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).&lt;br /&gt;
The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return a result that can be used exactly as if it had been produced serially.&lt;br /&gt;
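The steps above can be sketched in Java on a shared-memory machine. This is a minimal illustration of our own (the Node class, the thread pool size, and the method names are all assumptions, not part of the referenced algorithm), with one worker per subtree and a single synchronization when the subtree results are spliced back together:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTreeTraversal {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Ordinary serial pre-order traversal of one subtree.
    static List<Integer> preorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    // Visit the shared parent node serially, traverse each subtree on its
    // own worker, then combine the results in traversal order.
    static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> leftF = pool.submit(() -> preorder(root.left));
            Future<List<Integer>> rightF = pool.submit(() -> preorder(root.right));
            List<Integer> out = new ArrayList<>();
            out.add(root.data);      // parent part, done serially
            out.addAll(leftF.get()); // one synchronization per subtree
            out.addAll(rightF.get());
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each worker owns an independent subtree, the result matches the serial traversal with no locking beyond the final join.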
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the Pre-, In-, and Post-Order traversals.  As the degree of parallelism is increased, the speedup increases as per Amdahl's law. One thing to remember in the parallelization, though, is that the processor-to-processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node, mi, in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
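&lt;br /&gt;
As a sketch, the pre-order and post-order selection rules can be applied to the node-copy list in Java (a hypothetical helper; the Euler-tour-style input list and node numbering are illustrative, not part of GEN-COMP-NEXT itself):&lt;br /&gt;

```java
import java.util.*;

// Derive traversals from a node-copy sequence in which each tree node may
// appear several times (once per visit), per the selection rules above.
public class TraversalSelect {
    // Pre-order: keep only the first copy of each node.
    static List<Integer> preorder(List<Integer> copies) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> out = new ArrayList<>();
        for (int v : copies) if (seen.add(v)) out.add(v);
        return out;
    }
    // Post-order: keep only the last copy of each node.
    static List<Integer> postorder(List<Integer> copies) {
        Map<Integer, Integer> last = new HashMap<>();
        for (int i = 0; i < copies.size(); i++) last.put(copies.get(i), i);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < copies.size(); i++)
            if (last.get(copies.get(i)) == i) out.add(copies.get(i));
        return out;
    }
    public static void main(String[] args) {
        // Copy list for a root 1 with children 2 and 3: 1 2 1 3 1
        System.out.println(preorder(Arrays.asList(1, 2, 1, 3, 1)));   // [1, 2, 3]
        System.out.println(postorder(Arrays.asList(1, 2, 1, 3, 1)));  // [2, 3, 1]
    }
}
```
&lt;br /&gt;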
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we can use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since a vast amount of data can be passed through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a collision that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked-list structures is the chained hash table, in which each bucket holds a linked list of all values that have been placed in that bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
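&lt;br /&gt;
A chained hash table can be sketched in Java as follows (a minimal, non-thread-safe illustration; the class and method names are ours, not from any library):&lt;br /&gt;

```java
import java.util.LinkedList;

// Minimal sketch of a chained hash table: each bucket holds a linked list of
// entries whose keys hash to that bucket, so colliding keys are resolved by
// walking the bucket's chain.
public class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketOf(K key) { return Math.floorMod(key.hashCode(), buckets.length); }

    public void put(K key, V value) {
        for (Entry<K, V> e : buckets[bucketOf(key)])
            if (e.key.equals(key)) { e.value = value; return; }   // update in place
        buckets[bucketOf(key)].add(new Entry<>(key, value));      // else chain a new node
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets[bucketOf(key)])              // walk the collision chain
            if (e.key.equals(key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable<String, Integer> t = new ChainedHashTable<>(2); // 2 buckets force collisions
        t.put("a", 1); t.put("b", 2); t.put("c", 3);
        System.out.println(t.get("c"));  // 3
    }
}
```
&lt;br /&gt;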
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
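&lt;br /&gt;
The two-level scheme can be sketched as follows (a simplified illustration of the idea, not the patented implementation; all names are ours):&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of two-level locking: the table lock is held only long
// enough to acquire a bucket lock, after which threads touching different
// buckets proceed concurrently.
public class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final List<String>[] buckets;

    @SuppressWarnings("unchecked")
    public TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new List[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new ArrayList<>();
        }
    }

    private int lockBucket(String key) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        tableLock.lock();       // serialize only the lock hand-off...
        bucketLocks[b].lock();
        tableLock.unlock();     // ...then let other threads into other buckets
        return b;
    }

    public void insert(String key) {
        int b = lockBucket(key);
        try { buckets[b].add(key); } finally { bucketLocks[b].unlock(); }
    }

    public boolean contains(String key) {
        int b = lockBucket(key);
        try { return buckets[b].contains(key); } finally { bucketLocks[b].unlock(); }
    }

    public static void main(String[] args) {
        TwoLevelLockedTable t = new TwoLevelLockedTable(8);
        t.insert("Mannheim");
        System.out.println(t.contains("Mannheim"));  // true
    }
}
```
&lt;br /&gt;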
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe alternative to the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never block, and typically run in parallel with updates and deletes; a retrieval reflects the results of the most recently completed update operations, even if it cannot see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() rejects a null expected value&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some extra code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the current mapping still equals the value we previously read.  This is much more efficient than locking the entire function, as we rarely expect the comparison to fail.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in a normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
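&lt;br /&gt;
The adjacency-list representation described above can be sketched in Java (a minimal illustration; the class and method names are ours):&lt;br /&gt;

```java
import java.util.*;

// Minimal sketch of an adjacency-list graph: each node keeps a linked list
// of pointers to the nodes reachable from it.
public class AdjacencyGraph {
    private final Map<Integer, LinkedList<Integer>> adj = new HashMap<>();

    public void addEdge(int u, int v) {       // undirected: record both directions
        adj.computeIfAbsent(u, k -> new LinkedList<>()).add(v);
        adj.computeIfAbsent(v, k -> new LinkedList<>()).add(u);
    }

    public List<Integer> neighbors(int u) {   // the node's linked list of edges
        return adj.getOrDefault(u, new LinkedList<>());
    }

    public static void main(String[] args) {
        AdjacencyGraph g = new AdjacencyGraph();
        g.addEdge(1, 2);
        g.addEdge(1, 5);
        System.out.println(g.neighbors(1));  // [2, 5]
    }
}
```
&lt;br /&gt;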
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more than likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outwardly to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outwards until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
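&lt;br /&gt;
The frontier-splitting idea can be sketched with Java parallel streams standing in for the separate processors (an illustrative sketch in which the graph, class, and method names are ours):&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous parallel BFS sketch: each frontier is expanded in
// parallel.  A concurrent set records visited nodes without a global lock;
// Set.add() atomically returns true only for the first thread to claim a node.
public class ParallelBFS {
    static Set<String> reachable(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            frontier = frontier.parallelStream()                 // one node per worker
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(visited::add)                            // claim unvisited nodes
                .collect(Collectors.toList());
        }
        return visited;
    }
    public static void main(String[] args) {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"));
        adj.put("Mannheim", List.of("Karlsruhe"));
        System.out.println(reachable(adj, "Frankfurt").size());  // 5
    }
}
```
&lt;br /&gt;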
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs are able to be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods pull heavily from the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS which is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83932</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83932"/>
		<updated>2014-03-04T02:00:02Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, it often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hashes and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
&lt;br /&gt;
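The copy-scan steps above can be sketched as follows (a sequential simulation; on a real data-parallel machine each row copy within a step would run on its own processor):&lt;br /&gt;

```java
// Sequential sketch of copy-scan on a 2-D grid: row 0 is propagated to every
// other row in ceil(log2(n)) steps.  In each step, every row that already
// holds the data copies it to one row that does not, so the number of copies
// doubles each step.
public class CopyScan {
    static void copyScan(int[][] grid) {
        int n = grid.length;
        for (int have = 1; have < n; have *= 2) {        // rows 0..have-1 hold the data
            int copies = Math.min(have, n - have);
            for (int r = 0; r < copies; r++)             // independent copies -> parallel
                grid[have + r] = grid[r].clone();
        }
    }
    public static void main(String[] args) {
        int[][] grid = new int[5][3];
        grid[0] = new int[]{1, 2, 3};
        copyScan(grid);
        System.out.println(java.util.Arrays.toString(grid[4]));  // [1, 2, 3]
    }
}
```
&lt;br /&gt;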
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
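The partial-sum computation shown above can be sketched with arrays standing in for pointers (a sequential simulation of the parallel rounds; the array layout and names are ours):&lt;br /&gt;

```java
import java.util.Arrays;

// Sketch of pointer doubling for partial sums.  The list is stored in
// arrays: value[i] is node i's value and pred[i] is the index of the node it
// points to (its predecessor), or -1 at the head.  In each round, every node
// adds in its predecessor's value and then jumps its pointer two hops back;
// the inner loop simulates what a PRAM would do simultaneously.
public class PointerDoubling {
    static int[] prefixSums(int[] value, int[] pred) {
        int n = value.length;
        int[] val = value.clone(), p = pred.clone();
        boolean done = false;
        while (!done) {
            done = true;
            int[] nv = val.clone(), np = p.clone();     // double-buffer: reads see the old round
            for (int i = 0; i < n; i++) {               // "for all i in parallel"
                if (p[i] != -1) {
                    nv[i] = val[i] + val[p[i]];
                    np[i] = p[p[i]];
                    done = false;
                }
            }
            val = nv; p = np;
        }
        return val;
    }
    public static void main(String[] args) {
        // List 1 -> 2 -> 3 -> 4 with values 1, 2, 3, 4
        System.out.println(Arrays.toString(prefixSums(
            new int[]{1, 2, 3, 4}, new int[]{-1, 0, 1, 2})));  // [1, 3, 6, 10]
    }
}
```
&lt;br /&gt;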
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen the standard serial tree traversals, we will look at how trees can be parallelized. In many ways, a tree is a perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n – 1' nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2^n for an n-generation tree. Since the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been produced serially.&lt;br /&gt;
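&lt;br /&gt;
As an illustration of these steps, the pre-order traversal above can be parallelized with Java's fork/join framework: each subtree is traversed as an independent task, and the per-subtree results are concatenated in (root, left, right) order, which preserves the pre-order sequence.  This is a minimal sketch in Java rather than the Ada of the serial code, and the class names (ParallelPreorder, PreorderTask) are our own.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    // Simple binary tree node, mirroring the Ada record in the serial code.
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Each subtree is an independent fork/join task; concatenating the results
    // in (root, left, right) order preserves the pre-order sequence.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node node;
        PreorderTask(Node node) { this.node = node; }

        @Override
        protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (node == null) return out;
            PreorderTask leftTask = new PreorderTask(node.left);
            PreorderTask rightTask = new PreorderTask(node.right);
            leftTask.fork();                                  // left subtree on another worker
            List<Integer> rightResult = rightTask.compute();  // right subtree in this thread
            out.add(node.data);
            out.addAll(leftTask.join());
            out.addAll(rightResult);
            return out;
        }
    }

    static List<Integer> preorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new PreorderTask(root));
    }

    // The example tree from the serial Ada code above.
    static Node sampleTree() {
        return new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
    }
}
```

On the example tree, this yields the same pre-order output as the serial Ada procedure, with the synchronization confined to the join() calls.&lt;br /&gt;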
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as the pre-, in-, and post-order traversals.  As the degree of parallelism is increased, the speedup increases as per Amdahl's Law. One thing to remember in the parallelization, though, is that processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.  In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as they allow vast amounts of data to be located through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked list structures is a chained hash table, in which each bucket holds a linked list of all the values that have been placed in it.  The developer must then not only find the proper bucket for the data being searched for, but also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, and lookup) while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
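&lt;br /&gt;
The two-level locking scheme described above can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not the actual SMP implementation; the class and method names (TwoLevelLockTable, lockBucket) are hypothetical.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Two-level locking: one table lock, held only long enough to acquire a
// per-bucket lock, so threads on different buckets proceed concurrently.
public class TwoLevelLockTable {
    private static final int BUCKETS = 16;
    private final ReentrantLock tableLock = new ReentrantLock();
    private final List<ReentrantLock> bucketLocks = new ArrayList<>();
    private final List<Map<String, Integer>> buckets = new ArrayList<>();

    public TwoLevelLockTable() {
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks.add(new ReentrantLock());
            buckets.add(new HashMap<>());
        }
    }

    private int bucketFor(String key) {
        return (key.hashCode() & 0x7fffffff) % BUCKETS;
    }

    // The table lock serializes only the acquisition of a bucket lock;
    // it is released before the bucket is actually read or written.
    private ReentrantLock lockBucket(int b) {
        ReentrantLock bucketLock = bucketLocks.get(b);
        tableLock.lock();
        try {
            bucketLock.lock();
        } finally {
            tableLock.unlock();
        }
        return bucketLock;
    }

    public void put(String key, int value) {
        int b = bucketFor(key);
        ReentrantLock lock = lockBucket(b);
        try {
            buckets.get(b).put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public Integer get(String key) {
        int b = bucketFor(key);
        ReentrantLock lock = lockBucket(b);
        try {
            return buckets.get(b).get(key);
        } finally {
            lock.unlock();
        }
    }
}
```

Note how lockBucket() mirrors the description above: the table lock is held just long enough to claim the bucket lock, after which other threads are free to work on any other bucket.&lt;br /&gt;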
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hash map.  Each processor sharing one common cache would be able to access the relevant information in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hash map.  Unfortunately, traditional locking would be a poor solution to this problem, as processors need to run very quickly, and waiting for locks would destroy application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never block, so they typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even though it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() rejects a null expected value, so new keys use putIfAbsent()&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts and updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is changed only if the key is still mapped to the previously read value, and otherwise the loop retries.  This is much more efficient than locking the entire function, as we rarely expect the value to have changed underneath us.&lt;br /&gt;
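&lt;br /&gt;
If the Java 8 API is available, the retry loop can be replaced by a single atomic call to ConcurrentMap's merge() method, which performs the whole read-modify-write step internally.  The class name MergeCounter below is our own; only the merge() call itself is part of the standard API.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Counter increment using merge(): the read-modify-write is atomic,
// so no explicit compare-and-set retry loop is needed.
public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    public void incrementCount(String q) {
        // Inserts 1 if q is absent; otherwise atomically adds 1 to the old value.
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```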
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in a normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  Those nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to all of its directly connected nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
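&lt;br /&gt;
A minimal Java sketch of this adjacency-list representation, in which each node keeps a linked list of its directly connected nodes, might look like the following (the class name AdjacencyGraph is our own):&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Adjacency-list graph: each node maps to a linked list of references
// to the nodes it is directly connected to.
public class AdjacencyGraph {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    // Add an undirected edge between two nodes.
    public void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedList<>()).add(a);
    }

    // The linked list of nodes directly reachable from this node.
    public List<String> neighbors(String node) {
        return adjacency.getOrDefault(node, new LinkedList<>());
    }
}
```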
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs, called edges or arcs, between entities called nodes or vertices.  From one given vertex, one would typically want to order the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of visiting Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
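&lt;br /&gt;
A level-synchronous version of this idea can be sketched in Java: all vertices in the current frontier (the regional set at the current distance) are expanded in parallel, with one synchronization per level.  This is an illustrative sketch rather than the algorithm from the figure; the class name ParallelBfs and the use of parallel streams are our own choices.&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous BFS: each frontier is expanded in parallel, and the
// workers synchronize once per level when the next frontier is collected.
public class ParallelBfs {
    public static Map<String, Integer> distances(Map<String, List<String>> graph,
                                                 String source) {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        Set<String> frontier = Set.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            final int d = level + 1;
            // Expand every frontier vertex in parallel; putIfAbsent ensures each
            // vertex is claimed exactly once even under concurrent discovery.
            frontier = frontier.parallelStream()
                .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                .filter(v -> dist.putIfAbsent(v, d) == null)
                .collect(Collectors.toSet());
            level = d;
        }
        return dist;
    }
}
```

The ConcurrentHashMap plays the role of the fetch_and_add visit flags in the locking pseudocode above, but with a single lock-free claim per vertex rather than repeated locked updates.&lt;br /&gt;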
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and the need to access them quickly.  Because of this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated so that it can be treated as a singly linked list.  In this way we have shown how the basic principles taught in the textbook can be extended into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83931</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83931"/>
		<updated>2014-03-04T01:54:19Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Serial Code Example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables, and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
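&lt;br /&gt;
As a concrete illustration, the reduction step that copy-scan enables can be simulated in Java.  Each array slot stands in for a processor's row, and each stride-doubling pass models one parallel step; the method name and the array representation are our own assumptions for this sketch.&lt;br /&gt;
&lt;br /&gt;

```java
public class CopyScanSum {
    // Simulates the log(n) combining passes enabled by copy-scan: at stride s,
    // every "processor" i adds in the value held by the processor s slots away.
    static int parallelSum(int[] data) {
        int[] a = data.clone();
        for (int s = 1; s < a.length; s *= 2) {
            int[] next = a.clone();             // all adds in a pass happen "at once"
            for (int i = 0; i + s < a.length; i++)
                next[i] = a[i] + a[i + s];
            a = next;
        }
        return a[0];                            // slot 0 ends up holding the total
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(new int[]{1, 2, 3, 4}));  // total after 2 passes
    }
}
```

On real data-parallel hardware the inner loop runs simultaneously on all processors, so the total is produced in log(n) passes rather than n-1 serial additions.&lt;br /&gt;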
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use of pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of the node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
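The partial-sums operation shown above can be simulated in Java.  Arrays stand in for the per-processor values and pointers, and each round models one parallel pointer-jumping step.  Note that with forward next-pointers this variant accumulates, at each node, the sum of that node and the nodes after it; reversing the links gives the preceding-node version.  The representation is our own sketch, not code from a particular library.&lt;br /&gt;
&lt;br /&gt;

```java
public class PointerDoubling {
    // Pointer jumping over a linked list: each node adds the value of the node
    // its pointer targets, then the pointer doubles to point two hops ahead.
    // After O(log n) rounds every pointer has run off the end of the list.
    static int[] partialSums(int[] val) {
        int n = val.length;
        int[] sum = val.clone();
        int[] next = new int[n];               // next[i] = i + 1; the tail holds -1
        for (int i = 0; i < n; i++) next[i] = (i + 1 < n) ? i + 1 : -1;
        boolean active = true;
        while (active) {
            active = false;
            int[] s2 = sum.clone();
            int[] n2 = next.clone();
            for (int i = 0; i < n; i++) {      // conceptually, all i run in parallel
                if (next[i] != -1) {
                    s2[i] = sum[i] + sum[next[i]];  // add the pointed-to value
                    n2[i] = next[next[i]];          // double the pointer
                    active = true;
                }
            }
            sum = s2;
            next = n2;
        }
        return sum;                            // sum[i] = val[i] + ... + val[n-1]
    }
}
```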
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This provides an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we begin by showing different serial tree traversal algorithms using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] The four ordering algorithms that we will cover are pre-order, in-order, post-order, and level order.  &lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen the standard serial tree traversals, we will look at how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the parent nodes of the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n - 1' nodes &lt;br /&gt;
in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the additional sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2^n for an n-generation tree. Since the processors only need to synchronize once, at &lt;br /&gt;
the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# That processor will return a result that can be used exactly as if it had been produced serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
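&lt;br /&gt;
The three steps above can be sketched in Java.  Here the two subtrees of a shared root are handed to separate threads, each runs the ordinary serial pre-order routine, and the partial results are spliced back together as if computed serially.  The thread-pool arrangement and class names are illustrative assumptions of ours.&lt;br /&gt;
&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelPreorder {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // The serial pre-order routine each processor runs on its own subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Visit the shared parent, farm each subtree out to its own task,
    // then splice the partial results back in serial order.
    static List<Integer> parallelPreorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> left = pool.submit(() -> {
                List<Integer> r = new ArrayList<>(); preorder(root.left, r); return r;
            });
            Future<List<Integer>> right = pool.submit(() -> {
                List<Integer> r = new ArrayList<>(); preorder(root.right, r); return r;
            });
            List<Integer> out = new ArrayList<>();
            out.add(root.data);          // the one node held in common
            out.addAll(left.get());      // the single end-of-traversal sync point
            out.addAll(right.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

The two get() calls are the only synchronization, matching the claim above that the processors need to synchronize just once, at the end.&lt;br /&gt;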
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at a rate of 2^n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, the same locking mechanisms serve multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the tree's linked list into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
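&lt;br /&gt;
As a small illustration of the read-write-lock option, the sketch below guards a simple linked list with Java's ReentrantReadWriteLock: any number of traversals may proceed concurrently, while an edit excludes everything else.  The list class itself is our own minimal example, not code from the cited sources.&lt;br /&gt;
&lt;br /&gt;

```java
import java.util.concurrent.locks.*;

public class RWLockedList {
    static class Node { int data; Node next; Node(int d) { data = d; } }

    private Node head;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many traversals may hold the read lock simultaneously
    // (the global-lock approach in its read-write variant).
    public boolean contains(int key) {
        lock.readLock().lock();
        try {
            for (Node n = head; n != null; n = n.next)
                if (n.data == key) return true;
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Edits take the write lock, excluding readers and other writers.
    public void addFirst(int key) {
        lock.writeLock().lock();
        try {
            Node n = new Node(key);
            n.next = head;
            head = n;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```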
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves moving vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare for any 2 keys to hash to the same bucket.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a collision that must be resolved in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list is built from all values that have been placed in a particular bucket.  The developer must then account not only for the proper bucket for the data being searched for, but also for the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
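&lt;br /&gt;
The two-level scheme can be sketched in Java.  A table-level lock is held only long enough to acquire the per-bucket lock, after which operations on different buckets run concurrently.  The class layout, the bucket count, and the use of ReentrantLock are illustrative assumptions of ours, not the patented implementation.&lt;br /&gt;
&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.locks.*;

public class TwoLevelLockTable {
    private static final int BUCKETS = 16;
    private final List<List<Map.Entry<String, Integer>>> table = new ArrayList<>();
    private final ReentrantLock tableLock = new ReentrantLock();            // high-level lock
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS]; // one per bucket

    public TwoLevelLockTable() {
        for (int i = 0; i < BUCKETS; i++) {
            table.add(new LinkedList<>());
            bucketLocks[i] = new ReentrantLock();
        }
    }

    private int bucketOf(String key) { return Math.floorMod(key.hashCode(), BUCKETS); }

    // The table lock is held only long enough to pick up the bucket lock,
    // so another thread can enter the table and touch a different bucket.
    private ReentrantLock acquireBucket(String key) {
        tableLock.lock();
        try {
            ReentrantLock b = bucketLocks[bucketOf(key)];
            b.lock();
            return b;
        } finally {
            tableLock.unlock();   // released before the bucket work begins
        }
    }

    public void put(String key, int value) {
        ReentrantLock b = acquireBucket(key);
        try {
            List<Map.Entry<String, Integer>> chain = table.get(bucketOf(key));
            for (Map.Entry<String, Integer> e : chain)
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            chain.add(new AbstractMap.SimpleEntry<>(key, value)); // chained collision handling
        } finally { b.unlock(); }
    }

    public Integer get(String key) {
        ReentrantLock b = acquireBucket(key);
        try {
            for (Map.Entry<String, Integer> e : table.get(bucketOf(key)))
                if (e.getKey().equals(key)) return e.getValue();
            return null;
        } finally { b.unlock(); }
    }
}
```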
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequence has completed properly before subsequent operations begin, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, retrievals are fully concurrent and the expected concurrency for updates is adjustable.  Retrievals do not block: they typically run in parallel with updates and deletes, and each retrieval reflects the results of the most recently completed update operations, even if later updates are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // putIfAbsent handles the first insert, since null values are not permitted&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial option presented in the previous section, one that avoids much of the locking imposed by synchronized methods or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that several inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the previously read value; otherwise the loop retries.  This is much more efficient than locking the entire method, because conflicting updates are expected to be rare.&lt;br /&gt;
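Since Java 8, ConcurrentMap also offers merge(), which performs the same atomic read-modify-write retry loop internally in a single call. Below is a minimal sketch of the counter written this way (merge() is standard ConcurrentMap API; the QueryCounter class name is our own illustration):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    // merge() performs the get/compute/replace retry loop atomically:
    // if q is absent it stores 1, otherwise it adds 1 to the old value.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int countFor(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```

Because merge() hides the retry loop, there is no way to forget the null-value case, which makes it less error-prone than a hand-written compare-and-set loop.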
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.&lt;br /&gt;
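To illustrate, the following minimal sketch traverses a ConcurrentHashMap while an update lands mid-iteration; the weakly consistent iterator simply tolerates the modification instead of throwing ConcurrentModificationException (the TraversalDemo class and visitAll helper are illustrative names of our own):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TraversalDemo {
    // Traverse without any synchronized block; ConcurrentHashMap's
    // weakly consistent iterator tolerates concurrent modification
    // rather than throwing ConcurrentModificationException.
    static List<String> visitAll(Map<String, Integer> m) {
        List<String> visited = new ArrayList<>();
        for (String key : m.keySet()) {
            m.putIfAbsent("extra", 99);  // update mid-traversal is safe
            visited.add(key);
        }
        return visited;
    }
}
```

Note that a weakly consistent iterator is only guaranteed to see the elements that existed when it was created; entries added during the traversal may or may not be visited.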
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked data structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]] and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes expands the search further until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
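The frontier-splitting idea can be sketched in Java as a level-synchronous BFS: every vertex in the current level is expanded in parallel (here via parallelStream as a stand-in for separate processors), and a thread-safe visited set guarantees each vertex is claimed by exactly one worker. This is a minimal illustration of the idea rather than a production traversal; the small city graph in the usage example is an assumption based on the Frankfurt figure above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: all vertices in the current frontier are
    // expanded in parallel; Set.add on a concurrent set is atomic, so a
    // vertex can never be enqueued by two workers at once.
    static List<String> bfs(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List<String> order = new ArrayList<>();
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = frontier.parallelStream()
                    .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)   // true only for the first visitor
                    .collect(Collectors.toList());
        }
        return order;
    }
}
```

The single synchronization point per level (building the next frontier) plays the role of the "cleanup" step described above, collecting whatever each worker discovered before the next level begins.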
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be applied to trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create a singly linked list.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83930</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83930"/>
		<updated>2014-03-04T01:51:20Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is different, LDS traversal shares a common characteristic: reading a node and then discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS, such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
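The partial-sums operation above can be sketched serially in Java by simulating the linked list with value and successor-pointer arrays, with the inner loop standing in for "one processor per node." This variant accumulates, for each node, the sum of itself and all nodes after it in the list (the computation over preceding nodes is symmetric, using predecessor pointers); the PointerDoubling class and suffixSums method are illustrative names of our own:

```java
import java.util.Arrays;

public class PointerDoubling {
    // next[i] = index of node i's successor, or -1 at the end of the list.
    // Each round, every node adds its successor's value to its own and
    // then jumps its pointer two hops ahead (pointer doubling).  After
    // about log2(n) rounds, every node holds the sum of itself and all
    // nodes after it in the list.
    static int[] suffixSums(int[] value, int[] next) {
        int[] sum = Arrays.copyOf(value, value.length);
        int[] ptr = Arrays.copyOf(next, next.length);
        boolean active = true;
        while (active) {
            active = false;
            // Snapshot arrays mimic all "processors" updating simultaneously.
            int[] newSum = Arrays.copyOf(sum, sum.length);
            int[] newPtr = Arrays.copyOf(ptr, ptr.length);
            for (int i = 0; i < sum.length; i++) {  // conceptually: one processor per node
                if (ptr[i] != -1) {
                    newSum[i] = sum[i] + sum[ptr[i]];
                    newPtr[i] = ptr[ptr[i]];
                    active = true;
                }
            }
            sum = newSum;
            ptr = newPtr;
        }
        return sum;
    }
}
```

On a machine with one processor per node, each round costs constant time, so the whole operation finishes in O(log n) rounds instead of the O(n) steps a serial scan would take.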
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing a serial tree traversal algorithm using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
Now that we have seen what a standard serial tree traversal looks like, we will examine how trees can be parallelized. In many ways, a tree is the perfect candidate for parallelism: each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n' processors (where n is a power of two), the processors only need to hold 'n - 1' nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the additional subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree. Because the processors only need to synchronize once, at the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl's Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
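The subtree-splitting idea can be sketched in Java using one worker thread per subtree of the root in place of separate processors. Each worker runs the ordinary serial traversal on its own subtree, and the coordinating thread handles only the shared root; the Node class and parallelInorder method are illustrative names of our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTreeTraversal {
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Ordinary serial in-order traversal, run independently on each subtree.
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.data);
        inorder(n.right, out);
    }

    // Each subtree of the root is handed to its own worker; only the root
    // is shared, so the subtrees need no locks at all.
    static List<Integer> parallelInorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> leftJob = pool.submit(() -> {
                List<Integer> out = new ArrayList<>();
                inorder(root.left, out);
                return out;
            });
            Future<List<Integer>> rightJob = pool.submit(() -> {
                List<Integer> out = new ArrayList<>();
                inorder(root.right, out);
                return out;
            });
            List<Integer> result = new ArrayList<>(leftJob.get()); // single sync point
            result.add(root.data);
            result.addAll(rightJob.get());
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the two get() calls are the only synchronization, matching the "synchronize once, at the end" property described above.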
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
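As an illustration of these selection rules, consider a list of node copies in which each node appears once per field (one copy per child, plus one for the last field), as GEN-COMP-NEXT produces. Keeping the first copy of each node yields the pre-order, and keeping the last copy yields the post-order. The sketch below is our own illustration (the class and method names are not from the cited algorithm):&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Selection rules over a list of node copies: first copies give pre-order,
// last copies give post-order. (Illustrative sketch, not the original code.)
public class TourSelection {
    public static List<Integer> firstCopies(List<Integer> copies) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int node : copies) seen.add(node); // a LinkedHashSet keeps only the first occurrence
        return new ArrayList<>(seen);
    }

    public static List<Integer> lastCopies(List<Integer> copies) {
        // The last copy of each node is the first copy in the reversed list.
        List<Integer> firstOfReversed = firstCopies(reversed(copies));
        return reversed(firstOfReversed);
    }

    private static List<Integer> reversed(List<Integer> xs) {
        List<Integer> out = new ArrayList<>(xs);
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        // Tree with root 1 and children 2, 3: copies [1, 2, 1, 3, 1]
        List<Integer> copies = List.of(1, 2, 1, 3, 1);
        System.out.println(firstCopies(copies)); // pre-order: [1, 2, 3]
        System.out.println(lastCopies(copies));  // post-order: [2, 3, 1]
    }
}
```

For the tree with root 1 and children 2 and 3, the copy list [1, 2, 1, 3, 1] yields the pre-order [1, 2, 3] from the first copies and the post-order [2, 3, 1] from the last copies.&lt;br /&gt;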
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin textbook. A global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple types of linked data structures.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves passing vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
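A minimal sketch of such a chained hash table, with each bucket holding a singly linked list of colliding entries (the class and method names here are our own, not a standard library API):&lt;br /&gt;

```java
// Chained hash table sketch: each bucket is the head of a singly linked
// list of entries whose keys hash to the same index.
public class ChainedHashTable {
    static class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets;

    public ChainedHashTable(int capacity) { buckets = new Node[capacity]; }

    // The hashing function determines the bucket for a key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(String key, int value) {
        int b = bucketFor(key);
        for (Node n = buckets[b]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // update existing entry in the chain
        }
        buckets[b] = new Node(key, value, buckets[b]); // otherwise prepend a new entry
    }

    public Integer get(String key) {
        // Walk the chained linked list for this bucket.
        for (Node n = buckets[bucketFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(4); // a small table forces collisions
        t.put("alpha", 1);
        t.put("beta", 2);
        t.put("gamma", 3);
        System.out.println(t.get("beta")); // prints 2
    }
}
```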
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
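A minimal sketch of this two-level locking scheme, using a simple per-bucket counter as a stand-in for the bucket data (the class and method names are our own illustration, not Larson et al.'s actual code):&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantLock;

// Two-level locking sketch: one table-level lock plus one lock per bucket.
// The table lock is held only long enough to acquire the bucket lock, so
// operations on different buckets proceed concurrently.
public class TwoLevelLockTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final int[] bucketCounts; // stand-in for per-bucket data

    public TwoLevelLockTable(int buckets) {
        bucketLocks = new ReentrantLock[buckets];
        for (int i = 0; i < buckets; i++) bucketLocks[i] = new ReentrantLock();
        bucketCounts = new int[buckets];
    }

    public void insert(int key) {
        int b = Math.floorMod(key, bucketLocks.length);
        tableLock.lock();            // high-level lock: only to reach the bucket
        ReentrantLock bucket = bucketLocks[b];
        bucket.lock();
        tableLock.unlock();          // released immediately; other buckets stay reachable
        try {
            bucketCounts[b]++;       // bucket-level work happens under the bucket lock only
        } finally {
            bucket.unlock();
        }
    }

    public int count(int bucket) { return bucketCounts[bucket]; }

    public static void main(String[] args) {
        TwoLevelLockTable t = new TwoLevelLockTable(8);
        t.insert(3);
        t.insert(11); // 11 mod 8 == 3, so this collides with the bucket for 3
        System.out.println(t.count(3)); // prints 2
    }
}
```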
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequence has completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and typically run in parallel with updates and deletes; updates lock only a small portion of the map at a time.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() does not accept a null expected value&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that several inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it still equals the previously read value; otherwise the loop retries.  This is much more efficient than locking the entire function, since conflicting updates are expected to be rare.&lt;br /&gt;
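Since Java 8, the retry loop can also be written more compactly: ConcurrentHashMap's merge() method performs the per-key read-modify-write atomically. A minimal sketch (the class and field names mirror the snippet above and are our own):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// merge() atomically inserts the given value for an absent key, or combines
// it with the existing value, replacing the explicit compare-and-set loop.
public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts = new ConcurrentHashMap<>(1000);

    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum); // insert 1, or add 1 to the existing count
    }

    public int count(String q) { return queryCounts.getOrDefault(q, 0); }

    public static void main(String[] args) {
        MergeCounter c = new MergeCounter();
        c.incrementCount("jaguar");
        c.incrementCount("jaguar");
        System.out.println(c.count("jaguar")); // prints 2
    }
}
```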
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
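A minimal sketch of a graph whose nodes keep their connections in linked lists, as described above (the names here are our own illustration, not a standard API):&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;

// Adjacency-list graph: each node holds a linked list of its neighbors.
public class LinkedGraph {
    private final List<Integer>[] adjacency;

    @SuppressWarnings("unchecked")
    public LinkedGraph(int nodes) {
        adjacency = new LinkedList[nodes];
        for (int i = 0; i < nodes; i++) adjacency[i] = new LinkedList<>();
    }

    public void connect(int from, int to) { // undirected edge: stored on both endpoints
        adjacency[from].add(to);
        adjacency[to].add(from);
    }

    public List<Integer> neighbors(int node) { return adjacency[node]; }

    public static void main(String[] args) {
        LinkedGraph g = new LinkedGraph(4);
        g.connect(0, 1);
        g.connect(0, 2);
        System.out.println(g.neighbors(0)); // prints [1, 2]
    }
}
```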
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to other vertices using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches out onto three different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
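A level-synchronous sketch of this data-parallel idea, assuming an adjacency-array graph representation (the names and representation are our own illustration): every vertex of the current frontier is expanded in parallel, and a concurrent set guarantees each vertex is claimed by only one worker.&lt;br /&gt;

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous parallel BFS sketch: each frontier level is expanded
// in parallel; a concurrent set makes each vertex visited exactly once.
public class ParallelBFS {
    // adjacency[v] lists the neighbors of vertex v
    public static int distance(int[][] adjacency, int source, int target) {
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        List<Integer> frontier = List.of(source);
        int depth = 0;
        while (!frontier.isEmpty()) {
            if (frontier.contains(target)) return depth;
            // Expand every frontier vertex on a separate worker.
            frontier = frontier.parallelStream()
                    .flatMap(u -> Arrays.stream(adjacency[u]).boxed())
                    .filter(visited::add)   // Set.add is atomic: true only for the first visitor
                    .collect(Collectors.toList());
            depth++;
        }
        return -1; // target unreachable
    }

    public static void main(String[] args) {
        int[][] adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
        System.out.println(distance(adj, 0, 4)); // prints 3
    }
}
```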
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, exactly the same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we show how the basic principles taught in the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83929</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83929"/>
		<updated>2014-03-04T01:49:07Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked data structures (LDS) include several different types of structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to, which often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
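The copy-scan steps above can be sketched as follows; the inner loop stands in for copies that would run in parallel, and the doubling of the count of populated rows gives the log(p) rounds (the names are our own illustration):&lt;br /&gt;

```java
// Copy-scan sketch: row 1 is propagated to all p rows in O(log p) doubling
// steps; in each round, every row that already holds the data copies it to
// the row that many positions ahead (conceptually all copies in parallel).
public class CopyScan {
    public static int[][] broadcastRow(int[] row, int p) {
        int[][] grid = new int[p][];
        grid[0] = row.clone();
        int have = 1;                                    // rows that currently hold the data
        while (have < p) {
            int step = have;
            for (int i = 0; i < step && i + step < p; i++) { // these copies would run in parallel
                grid[i + step] = grid[i].clone();
            }
            have = Math.min(p, have * 2);                // doubles each round: log(p) rounds
        }
        return grid;
    }

    public static void main(String[] args) {
        int[][] g = broadcastRow(new int[]{1, 2, 3}, 5);
        System.out.println(g.length); // prints 5: every row now holds a copy
    }
}
```

Once every row holds a copy, a parallel reduction can be performed over the rows to produce the desired result.&lt;br /&gt;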
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
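The partial-sums operation shown above can be sketched as follows, with next[i] holding the index of node i's successor (or -1 at the end of the list). In each round, every node adds in its successor's value and then doubles its pointer, so the list collapses in O(log n) rounds; on a PRAM each node would be handled by its own processor, and the inner loop here stands in for that parallelism (the names are our own illustration):&lt;br /&gt;

```java
import java.util.Arrays;

// Pointer-doubling (pointer-jumping) sketch for partial sums over a linked
// list: sum[i] ends up holding the value of node i plus all nodes after it.
public class PointerJumping {
    public static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] succ = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newSum = sum.clone();   // all nodes read old values, then write new ones,
            int[] newSucc = succ.clone(); // mimicking a synchronous parallel step
            for (int i = 0; i < n; i++) { // conceptually: for all nodes i in parallel
                if (succ[i] != -1) {
                    newSum[i] = sum[i] + sum[succ[i]]; // absorb the successor's value
                    newSucc[i] = succ[succ[i]];        // jump two steps ahead
                    active = true;
                }
            }
            sum = newSum;
            succ = newSucc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] value = {1, 2, 3, 4};
        int[] next = {1, 2, 3, -1};  // list 0 -> 1 -> 2 -> 3
        System.out.println(Arrays.toString(partialSums(value, next))); // prints [10, 9, 7, 4]
    }
}
```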
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes, in which each node has at most one parent and zero or more children.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming a balanced tree (a relatively equal number of nodes under each branch of the tree) and a proper ordering structure, searches, inserts, and deletes should occur far more quickly than traversing an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though searches, updates, and inserts can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing a serial tree traversal algorithm using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the set of parent nodes above the subtrees.  Mathematically speaking, for a tree divided among 'n'&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold 'n – 1' nodes &lt;br /&gt;
in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have at most two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the additional sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the parallelizable portion of these traversals grows &lt;br /&gt;
at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree. Since the processors only need to synchronize once, at &lt;br /&gt;
the end, the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# That processor will return a result that can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
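The three basic steps above can be sketched in Java using the fork/join framework.  This is only an illustrative sketch under our own naming (the Node and SumTask classes are hypothetical); it sums the 9-node example tree from the serial section by handing independent subtrees to separate workers and synchronizing once at the end:&lt;br /&gt;

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of the parallel tree traversal described above: independent
// subtrees are handed to separate workers, and the results are combined
// with a single synchronization at the end.
public class ParallelTreeSum {
    static class Node {
        int data; Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    static class SumTask extends RecursiveTask<Integer> {
        final Node n;
        SumTask(Node n) { this.n = n; }
        @Override protected Integer compute() {
            if (n == null) return 0;
            SumTask left = new SumTask(n.left); // fork the left subtree
            left.fork();
            int right = new SumTask(n.right).compute();
            return n.data + right + left.join(); // combine at the end
        }
    }

    static int parallelSum(Node root) {
        return new ForkJoinPool().invoke(new SumTask(root));
    }

    // The 9-node tree from the serial Ada example above.
    static Node exampleTree() {
        return new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(exampleTree())); // sums 1..9
    }
}
```

Each fork behaves exactly like the serial recursion would, so the combined result is identical to a serial traversal of the whole tree.&lt;br /&gt;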
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same complete parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at a rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it pause after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until no nodes remain.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list in which the input tree is represented as a &amp;quot;parent-of&amp;quot; relation with an explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing, as described in the LDS chapter of the Solihin textbook. A global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing because they can move vast amounts of data through the table while using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is the chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
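A minimal chained hash table might look like the following Java sketch (a hypothetical example of ours; production implementations such as java.util.HashMap are considerably more sophisticated).  Colliding keys land in the same bucket, and lookups must walk that bucket's linked list:&lt;br /&gt;

```java
import java.util.LinkedList;

// Sketch of a chained hash table: each bucket holds a linked list of
// the entries whose keys hash to that bucket.
public class ChainedHashTable {
    static class Entry {
        final String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // The hashing function picks the bucket for a key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        buckets[bucketFor(key)].add(new Entry(key, value));
    }

    // Lookup must walk the chain for the bucket, not just index it.
    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```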
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
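The two-level locking scheme might be sketched as follows in Java (a hedged illustration under our own class and method names, not the actual SMP implementation).  The table lock is held only long enough to acquire a bucket lock, after which other threads are free to enter the table and operate on other buckets:&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of two-level locking: a short-lived table lock guards the act
// of acquiring a per-bucket lock; the bucket lock then serializes only
// the operations on that one bucket.
public class TwoLevelLockedTable {
    private static final int BUCKETS = 16;
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    private final Map<Integer, String>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelLockedTable() {
        buckets = new Map[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    // The table lock is held just long enough to pick up the bucket
    // lock, so threads on different buckets proceed concurrently.
    private ReentrantLock lockBucket(int key) {
        tableLock.lock();
        ReentrantLock bucketLock;
        try {
            bucketLock = bucketLocks[Math.floorMod(key, BUCKETS)];
            bucketLock.lock();
        } finally {
            tableLock.unlock(); // released before the bucket work begins
        }
        return bucketLock;
    }

    void put(int key, String value) {
        ReentrantLock l = lockBucket(key);
        try { buckets[Math.floorMod(key, BUCKETS)].put(key, value); }
        finally { l.unlock(); }
    }

    String get(int key) {
        ReentrantLock l = lockBucket(key);
        try { return buckets[Math.floorMod(key, BUCKETS)].get(key); }
        finally { l.unlock(); }
    }
}
```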
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure that any update/insert/delete sequence has completed before subsequent operations begin, so that the data remains properly synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values whose updates are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() requires an existing mapping, so fall back to putIfAbsent() for the first insert&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the key is still mapped to the previously read value, and the loop retries otherwise.  This is much more efficient than locking the entire function, since the comparison rarely fails.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
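A minimal adjacency-list graph along these lines might look like this in Java (an illustrative sketch of ours; the class name is hypothetical).  Each vertex keeps a linked list of the vertices directly reachable from it:&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;

// Each vertex keeps a linked list of its neighbors, as in the
// networking example described above.
public class AdjacencyGraph {
    private final List<Integer>[] adj;

    @SuppressWarnings("unchecked")
    AdjacencyGraph(int vertices) {
        adj = new List[vertices];
        for (int v = 0; v < vertices; v++) adj[v] = new LinkedList<>();
    }

    void addEdge(int u, int v) { // undirected edge
        adj[u].add(v);
        adj[v].add(u);
    }

    List<Integer> neighbors(int v) { return adj[v]; }
}
```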
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a set of entities called nodes or vertices, together with a finite set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring and marking schemes to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
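The data-parallel approach just described can be sketched in Java as a level-synchronous BFS (an illustrative example under our own naming; it uses parallel streams, with an atomic per-vertex flag standing in for explicit fetch-and-add locking).  All vertices in the current frontier are expanded concurrently, and each newly discovered vertex is claimed exactly once by a compare-and-set:&lt;br /&gt;

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.Collectors;

// Level-synchronous parallel BFS sketch: every vertex in the current
// frontier is expanded concurrently; an atomic flag per vertex replaces
// a global lock.
public class ParallelBfs {
    // dist[v] = number of edges from source to v, or -1 if unreachable.
    static int[] distances(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        AtomicIntegerArray visited = new AtomicIntegerArray(n);
        visited.set(source, 1);
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        for (int d = 1; !frontier.isEmpty(); d++) {
            // Expand every frontier vertex in parallel; the
            // compare-and-set claims each neighbor exactly once.
            List<Integer> next = frontier.parallelStream()
                .flatMap(u -> adj.get(u).stream())
                .filter(v -> visited.compareAndSet(v, 0, 1))
                .collect(Collectors.toList());
            for (int v : next) dist[v] = d; // record level, then advance
            frontier = next;
        }
        return dist;
    }
}
```

The per-level synchronization point is the end of the parallel stream, mirroring the generation-by-generation wait described for the tree traversal.&lt;br /&gt;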
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki chapter we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83928</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83928"/>
		<updated>2014-03-04T01:45:40Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Serial Code Example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is different, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. This often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
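The four steps above can be simulated serially; the following Java sketch (a hypothetical illustration, not production parallel code, with class and method names of our own choosing) performs the log-step copies and a final reduction:&lt;br /&gt;

```java
class CopyScan {
    // Serial simulation of copy-scan: row 0 is copied outward in log2(rows) rounds,
    // doubling the number of rows holding the data each round. The copies within a
    // round are independent and would run in parallel on a real machine.
    public static int[][] copyScan(int[] row, int rows) {
        int[][] grid = new int[rows][];
        grid[0] = row.clone();
        int filled = 1;                               // rows that currently hold a copy
        while (filled < rows) {
            for (int i = 0; i < filled && i + filled < rows; i++) {
                grid[i + filled] = grid[i].clone();   // row i copies itself 'filled' rows down
            }
            filled *= 2;
        }
        return grid;
    }

    // Step 4: once every processor holds the row, a reduction produces the sum.
    public static int reduceSum(int[] row) {
        int sum = 0;
        for (int v : row) sum += v;
        return sum;
    }
}
```

Note that the number of copy rounds is logarithmic in the number of rows, which is the source of the technique's efficiency.&lt;br /&gt;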
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use of pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of that node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
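The partial-sum operation pictured above can be sketched as follows; this is a serial simulation of the lock-step pointer-doubling rounds, with predecessor pointers stored in an array (a hypothetical encoding of the linked list, chosen for illustration):&lt;br /&gt;

```java
class PointerDoubling {
    // pred[i] points to node i's predecessor in the list (-1 for the head node).
    // After log2(n) doubling rounds, val[i] holds the sum of node i and all
    // preceding nodes. Each round is computed from a snapshot, simulating the
    // lock-step behavior of the parallel processors.
    public static int[] partialSums(int[] values, int[] pred) {
        int n = values.length;
        int[] val = values.clone();
        int[] p = pred.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newVal = val.clone();
            int[] newP = p.clone();
            for (int i = 0; i < n; i++) {            // all nodes step simultaneously
                if (p[i] != -1) {
                    newVal[i] = val[i] + val[p[i]];  // add the predecessor's running sum
                    newP[i] = p[p[i]];               // then point two steps back
                    active = true;
                }
            }
            val = newVal;
            p = newP;
        }
        return val;
    }
}
```

For a four-node list the loop terminates after two doubling rounds, matching the log(n) behavior described above.&lt;br /&gt;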
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations can complete in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
In this section, we will begin by showing a serial tree traversal algorithm using the tree shown below.[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# That processor will return a result that can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
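A minimal Java sketch of this subtree-per-processor idea is shown below (the Node class and thread pool are our own illustration, not the referenced implementation); the shared "tip" here is just the root, and each child subtree is traversed on its own thread before the results are spliced together in serial order:&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

class ParallelTraversal {
    static class Node {
        int data; Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Plain serial preorder traversal, run independently on each subtree.
    public static List<Integer> preorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    // Visit the shared root, traverse each subtree on its own thread, then
    // combine the results exactly as a serial preorder would order them.
    public static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> left = pool.submit(() -> preorder(root.left));
            Future<List<Integer>> right = pool.submit(() -> preorder(root.right));
            List<Integer> out = new ArrayList<>();
            out.add(root.data);        // the shared tip of the tree
            out.addAll(left.get());    // left subtree's result
            out.addAll(right.get());   // right subtree's result
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each thread owns a disjoint subtree, no locks are needed; the only synchronization is the final join when the results are combined.&lt;br /&gt;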
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the complete parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing, as described in the LDS chapter of the Solihin book. A global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since vast amounts of data can be moved through the hash table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked-list structures is the chained hash table, in which a linked list is built from all values that have been placed in a particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared-memory multiprocessor (SMP) systems supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
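A minimal Java sketch of this two-level locking scheme might look as follows (class and method names are hypothetical, and full ReentrantLocks stand in for the lightweight per-bucket flags of the patented design):&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.locks.ReentrantLock;

class TwoLevelHashTable {
    private final Object tableLock = new Object();   // coarse table-level lock
    private final ReentrantLock[] bucketLocks;       // one lock per bucket
    private final List<Map.Entry<String, Integer>>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelHashTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new List[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new LinkedList<>();         // chained bucket
        }
    }

    // The table lock is held just long enough to acquire the bucket's lock;
    // afterwards, other threads may freely operate on the other buckets.
    private ReentrantLock lockBucket(int b) {
        ReentrantLock lock = bucketLocks[b];
        synchronized (tableLock) {
            lock.lock();
        }
        return lock;
    }

    public void put(String key, int value) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        ReentrantLock lock = lockBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b]) {
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            }
            buckets[b].add(new AbstractMap.SimpleEntry<>(key, value)); // chain on collision
        } finally {
            lock.unlock();
        }
    }

    public Integer get(String key) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        ReentrantLock lock = lockBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b]) {
                if (e.getKey().equals(key)) return e.getValue();
            }
            return null;
        } finally {
            lock.unlock();
        }
    }
}
```

Operations on different buckets proceed concurrently; only the brief handoff through the table lock is serialized.&lt;br /&gt;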
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need to utilize the synchronizedMap function available in the Collections class.  Also, as you may notice, once the iterator code begins we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through it at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock, and will typically run in parallel with updates/deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values from updates that have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() rejects a null expected value, so the first insertion of a key must use putIfAbsent()&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some new code to handle the fact that several inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the currently mapped value still equals the value read at the top of the loop; otherwise the loop retries.  This is much more efficient than locking the entire function, since we rarely expect a conflicting update to intervene.&lt;br /&gt;
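As a sanity check, the retry loop can be exercised from several threads.  The following minimal, self-contained sketch uses our own class and method names; it handles the first insertion of a key with putIfAbsent(), since replace() cannot compare against a missing value:&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Four threads each increment the same key 1000 times. With the
// compare-and-set retry loop, no increments are lost even though no
// explicit lock is ever taken.
public class QueryCounterDemo {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<String, Integer>(1000);

    void incrementCount(String q) {
        Integer oldVal, newVal;
        do {
            oldVal = queryCounts.get(q);
            newVal = (oldVal == null) ? 1 : (oldVal + 1);
        } while (oldVal == null
                ? queryCounts.putIfAbsent(q, newVal) != null  // first insert of this key
                : !queryCounts.replace(q, oldVal, newVal));   // CAS-style retry
    }

    int countFor(String q) {
        Integer v = queryCounts.get(q);
        return v == null ? 0 : v;
    }

    public static int run() throws InterruptedException {
        final QueryCounterDemo demo = new QueryCounterDemo();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) demo.incrementCount("query");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return demo.countFor("query");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());  // 4000: no lost updates
    }
}
```

If the replace()/putIfAbsent() loop were replaced by a bare put(), some of the 4000 increments would typically be lost to races.&lt;br /&gt;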
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block required&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.&lt;br /&gt;
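This lock-free traversal property can be demonstrated directly: a ConcurrentHashMap's iterators are weakly consistent, so the map may even be modified in mid-iteration without failing, whereas a plain HashMap throws ConcurrentModificationException.  A small sketch with illustrative names:&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Returns true if the map tolerates a structural modification made
// while an iteration over its key set is in progress.
public class TraversalDemo {
    static boolean survivesModification(Map<String, Integer> m) {
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        try {
            for (Iterator<String> i = m.keySet().iterator(); i.hasNext(); ) {
                i.next();
                m.put("extra", 99);   // structural modification mid-iteration
            }
            return true;              // weakly consistent iterator: no exception
        } catch (java.util.ConcurrentModificationException e) {
            return false;             // fail-fast iterator detected the change
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesModification(new ConcurrentHashMap<String, Integer>())); // true
        System.out.println(survivesModification(new HashMap<String, Integer>()));           // false
    }
}
```
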
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another pointer-based linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes in turn link to a variety of other nodes in the network.  Using this web of connections, it is possible to find a path from one specific node to another.  This can be accomplished by having each node contain a linked list of pointers to all other directly reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
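A minimal sketch of such an adjacency-list representation, with illustrative names and a six-vertex graph as sample data:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Each vertex keeps a linked list of the vertices directly reachable
// from it -- the "linked list of pointers" described above.
public class AdjacencyGraph {
    private final List<LinkedList<Integer>> adj = new ArrayList<LinkedList<Integer>>();

    public AdjacencyGraph(int vertices) {
        for (int i = 0; i < vertices; i++) adj.add(new LinkedList<Integer>());
    }

    public void addEdge(int u, int v) {   // undirected edge: stored on both endpoints
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    public List<Integer> neighbors(int u) {
        return adj.get(u);
    }

    public static void main(String[] args) {
        AdjacencyGraph g = new AdjacencyGraph(6);
        g.addEdge(0, 1); g.addEdge(0, 4); g.addEdge(1, 4);
        g.addEdge(1, 2); g.addEdge(2, 3); g.addEdge(3, 4); g.addEdge(3, 5);
        System.out.println(g.neighbors(3));  // [2, 4, 5]
    }
}
```
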
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of pairs of vertices called edges or arcs.  Starting from a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like hash tables, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or to lock a shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
The need for parallel processing becomes especially critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a set of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt toward Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes continues to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  The function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of using locks and synchronized regions, is to apply data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of running Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could fan out all 3 searches onto 3 different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
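The data-parallel, level-synchronous idea can be sketched with Java's parallel streams.  This is an illustration of the scheme, not the referenced algorithm; all names are ours, and the city graph follows the figure above:&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Level-synchronous BFS: all vertices of the current frontier are
// expanded in parallel; the next frontier and the distance table are
// collected in concurrent collections, so no explicit locks are needed.
public class ParallelBFS {
    public static Map<String, Integer> distances(Map<String, List<String>> adj, String root) {
        final Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(root, 0);
        Set<String> frontier = ConcurrentHashMap.newKeySet();
        frontier.add(root);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int nextD = d + 1;
            final Set<String> next = ConcurrentHashMap.newKeySet();
            final Set<String> current = frontier;
            // data-parallel step: each frontier vertex may run on its own core
            current.parallelStream().forEach(u -> {
                for (String v : adj.getOrDefault(u, List.of())) {
                    if (dist.putIfAbsent(v, nextD) == null) { // first visit wins
                        next.add(v);
                    }
                }
            });
            frontier = next;
            d = nextD;
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cities = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"),
                "Mannheim", List.of("Karlsruhe"),
                "Wurzburg", List.of("Erfurt", "Nurnberg"),
                "Kassel", List.of("Munchen"),
                "Karlsruhe", List.of("Augsburg"),
                "Nurnberg", List.of("Stuttgart", "Munchen"),
                "Augsburg", List.of("Munchen"));
        System.out.println(distances(cities, "Frankfurt").get("Munchen")); // 2 (via Kassel)
    }
}
```

The putIfAbsent() call plays the role of the fetch_and_add "lock" in the pseudocode below: it atomically decides which thread claims a vertex first.&lt;br /&gt;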
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each regional set of vertices being searched can now be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor.  Such a set is called a regional set of vertices.  The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and the need to access them quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated so that it can be treated as a singly-linked list.  In this way we have shown how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83870</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83870"/>
		<updated>2014-03-03T03:20:44Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) comprise different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain entries within a given bucket in order to resolve collisions, trees use linked lists with left and right child pointers, and graphs use linked lists of edges when computing shortest paths.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
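The copy-scan steps above can be simulated serially to check the log(n) step count.  The sketch below is illustrative: each "row" of the simulated grid stands for one processor's local memory, and one loop pass stands for one parallel step in which every row that already holds the data copies it to the row a power-of-two distance away.&lt;br /&gt;

```java
import java.util.Arrays;

// Serial simulation of copy-scan: row 0 starts with the data, and after
// ceil(log2 n) doubling steps every row holds its own copy.
public class CopyScan {
    public static int stepsToFill(int[][] grid) {
        int n = grid.length, steps = 0;
        int have = 1;                                  // rows that currently hold the data
        while (have < n) {
            for (int src = 0; src < have && have + src < n; src++) {
                grid[have + src] = grid[src].clone();  // these copies run concurrently in one step
            }
            have = Math.min(n, have * 2);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[][] grid = new int[8][];
        grid[0] = new int[]{1, 2, 3, 4};
        for (int i = 1; i < 8; i++) grid[i] = new int[4];
        System.out.println(stepsToFill(grid));               // 3 (= log2 8)
        System.out.println(Arrays.equals(grid[7], grid[0])); // true: every row has the data
    }
}
```

Once every processor holds the row, a reduction (for example, a sum) can proceed entirely in parallel.&lt;br /&gt;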
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to perform partial sums of a linked list. The way in which this is accomplished is by adding the value held by the node with the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list which contains the sum of the node and all preceding nodes.  Below shows an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
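The pointer-doubling partial-sum operation shown in the figures can be simulated serially, with each loop round standing in for one parallel step and the list represented as value/next arrays (an illustrative representation, not code from the text):&lt;br /&gt;

```java
// Pointer doubling on a linked list: next[i] is the node i points to
// (-1 at the end of the chain). In each round, every node reads its
// neighbor's sum and pointer "at once" (hence the snapshot copies),
// absorbs the neighbor's sum, and jumps its pointer two hops ahead.
// After ceil(log2 n) rounds each node holds the sum of itself and
// every node reachable after it.
public class PointerDoubling {
    public static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] ptr = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] s = sum.clone();   // snapshot: all reads happen simultaneously
            int[] p = ptr.clone();
            for (int i = 0; i < n; i++) {
                if (p[i] != -1) {
                    sum[i] = s[i] + s[p[i]];   // absorb neighbor's sum
                    ptr[i] = p[p[i]];          // pointer doubling
                    active = true;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] value = {1, 2, 3, 4};
        int[] next  = {1, 2, 3, -1};   // chain 0 -> 1 -> 2 -> 3
        System.out.println(java.util.Arrays.toString(partialSums(value, next)));
        // [10, 9, 7, 4] : each node's value plus all nodes after it
    }
}
```

Note that only two rounds of doubling are needed for four nodes, matching the logarithmic step count claimed above.&lt;br /&gt;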
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate their output.  This presents an opportunity for parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms, written in Ada,&lt;br /&gt;
with the behavior shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n - 1 nodes &lt;br /&gt;
in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it came from a serial traversal.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
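The three steps above can be sketched for two processors: the shared "tip" is just the root, each subtree is traversed on its own thread, and the partial results are combined with a single synchronization at the end.  The class and method names below are hypothetical:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Pre-order traversal where the two subtrees of the root are traversed
// in parallel; the combined output is identical to a serial pre-order.
public class ParallelTreeTraversal {
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    static List<Integer> preorder(Node n) {   // ordinary serial traversal
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    public static List<Integer> parallelPreorder(final Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<Integer>> left  = pool.submit(() -> preorder(root.left));
        Future<List<Integer>> right = pool.submit(() -> preorder(root.right));
        List<Integer> out = new ArrayList<>();
        out.add(root.data);          // the shared tip, visited serially
        out.addAll(left.get());      // synchronize once, at the end
        out.addAll(right.get());
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // same tree as in the Ada example above
        Node root = new Node(1,
                new Node(2, new Node(4, new Node(7, null, null), null), new Node(5, null, null)),
                new Node(3, new Node(6, new Node(8, null, null), new Node(9, null, null)), null));
        System.out.println(parallelPreorder(root)); // [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

Because each thread owns a disjoint subtree, no locks are needed; the only serial work is visiting the root and concatenating the two partial lists.&lt;br /&gt;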
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a breadth-first traversal can achieve the same complete parallelization as the pre-, in-, and post-order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial breadth-first (level-order) algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial breadth-first algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
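To illustrate these selection rules, here is a minimal Java sketch (the class and method names are our own, not from the algorithm's source; we assume the node sequence produced by GEN-COMP-NEXT, in which each node appears once per visit):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TraversalSelection {
    // Pre-order: keep the first copy of each node.
    static List<String> preOrder(List<String> tour) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String n : tour)
            if (seen.add(n)) out.add(n);   // add() is true only on first sight
        return out;
    }

    // Post-order: keep the last copy of each node
    // (equivalently, the first copy when scanning in reverse).
    static List<String> postOrder(List<String> tour) {
        List<String> rev = new ArrayList<>(tour);
        Collections.reverse(rev);
        List<String> out = preOrder(rev);
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        // Node sequence for the tree A(B, C): A appears before B,
        // between B and C, and after C.
        List<String> tour = Arrays.asList("A", "B", "A", "C", "A");
        System.out.println(preOrder(tour));   // [A, B, C]
        System.out.println(postOrder(tour));  // [B, C, A]
    }
}
```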
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, we can use the same locking mechanism across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they allow a vast amount of data to be located with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
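A minimal Java sketch of this two-level scheme follows. It is not Larson et al.'s actual implementation (their design also uses hash signatures and lightweight flags); all names here are our own. The key point is that the table lock is held only long enough to acquire the lock of the target bucket:

```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable<K, V> {
    private static final class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private final ReentrantLock[] bucketLocks;
    private final ReentrantLock tableLock = new ReentrantLock();

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int nBuckets) {
        buckets = (Node<K, V>[]) new Node[nBuckets];
        bucketLocks = new ReentrantLock[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLocks[i] = new ReentrantLock();
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Acquire the table lock, use it only to take the bucket lock,
    // then release it so other threads can work on other buckets.
    private int lockBucket(K key) {
        tableLock.lock();
        int i = indexFor(key);
        bucketLocks[i].lock();
        tableLock.unlock();
        return i;
    }

    public void put(K key, V value) {
        int i = lockBucket(key);
        try {
            for (Node<K, V> n = buckets[i]; n != null; n = n.next)
                if (n.key.equals(key)) { n.value = value; return; }
            buckets[i] = new Node<>(key, value, buckets[i]); // chain at head
        } finally {
            bucketLocks[i].unlock();
        }
    }

    public V get(K key) {
        int i = lockBucket(key);
        try {
            for (Node<K, V> n = buckets[i]; n != null; n = n.next)
                if (n.key.equals(key)) return n.value;
            return null;
        } finally {
            bucketLocks[i].unlock();
        }
    }
}
```

Two threads operating on different buckets serialize only briefly on the table lock, never for the duration of a whole operation.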
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even though it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, notice that we must add some code to handle the fact that several inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the current mapping is still equal to the previously read value, and the loop retries otherwise (when the key is absent, putIfAbsent() must be used instead, since replace() does not accept a null expected value).  This is much more efficient than locking the entire function, since we rarely expect a concurrent change to have occurred.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
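As an aside, since Java 8 the same counter can be written without a hand-rolled retry loop, because ConcurrentHashMap.merge performs the atomic read-modify-write internally (the class name here is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    // merge() atomically applies Integer::sum to the existing value,
    // or stores 1 if the key is absent -- no explicit retry loop needed.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }

    public static void main(String[] args) {
        MergeCounter c = new MergeCounter();
        c.incrementCount("a");
        c.incrementCount("a");
        c.incrementCount("b");
        System.out.println(c.count("a")); // 2
        System.out.println(c.count("b")); // 1
    }
}
```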
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked data structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of (possibly ordered) pairs of vertices called edges or arcs.  From a given vertex, one would typically want to order the different paths to other vertices using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt toward Augsburg and Stuttgart in Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all of its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-first step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches out onto three different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
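The level-synchronous, data-parallel idea can be sketched in Java as follows (the graph representation, pool size, and all names are our own illustrative choices, not taken from the cited papers). Every vertex on the current frontier is expanded by its own task, and an atomic putIfAbsent plays the role that fetch_and_add on the Visited array played in the locking pseudocode above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LevelSyncBFS {
    // Returns the BFS distance of every vertex reachable from root.
    public static Map<String, Integer> bfs(Map<String, List<String>> adj, String root)
            throws InterruptedException, ExecutionException {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(root, 0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> frontier = List.of(root);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int depth = d + 1;
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String u : frontier) {
                tasks.add(() -> {              // one task per frontier vertex
                    List<String> next = new ArrayList<>();
                    for (String v : adj.getOrDefault(u, List.of()))
                        // Atomic "visited for the first time?" test:
                        // only the first thread to reach v claims it.
                        if (dist.putIfAbsent(v, depth) == null)
                            next.add(v);
                    return next;
                });
            }
            // Level barrier: wait for every task before the next level.
            List<String> nextFrontier = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks))
                nextFrontier.addAll(f.get());
            frontier = nextFrontier;
            d = depth;
        }
        pool.shutdown();
        return dist;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> g = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"),
                "Mannheim", List.of("Karlsruhe"),
                "Wurzburg", List.of("Erfurt", "Nurnberg"));
        System.out.println(bfs(g, "Frankfurt").get("Nurnberg")); // 2
    }
}
```

The invokeAll call acts as the per-level barrier: all vertices at distance d are finished before any vertex at distance d+1 is expanded, which is exactly the regional, level-by-level structure described above.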
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83869</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83869"/>
		<updated>2014-03-03T02:29:46Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
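The loop-carried dependence is easy to see in a plain traversal loop. The following schematic fragment uses a hypothetical Node type of our own; the point is only that each iteration's node is unknown until the previous iteration's pointer read completes:

```java
public class ListTraversal {
    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    static int sum(Node head) {
        int total = 0;
        // Loop-carried dependence: the node visited in iteration i+1
        // is unknown until the pointer read (p = p.next) in iteration i
        // completes, so the iterations cannot be naively parallelized.
        for (Node p = head; p != null; p = p.next)
            total += p.data;
        return total;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(sum(list)); // 6
    }
}
```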
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to compute the partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to, repeating with doubled pointers until every pointer has reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and the values of all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
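The steps illustrated above can be simulated sequentially in Java. This is a sketch under our own conventions (pred[i] is the index each node points to, with -1 marking the head; the double-buffered arrays mimic the synchronous update a real parallel machine would perform, where the inner loop would run with one processor per index):

```java
import java.util.Arrays;

public class PointerDoublingSums {
    // value[i] holds node i's value; pred[i] is the index of the node
    // that i points to (its predecessor in the list), or -1 for the head.
    public static int[] partialSums(int[] value, int[] pred) {
        int n = value.length;
        int[] sum = value.clone();
        int[] p = pred.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] newSum = sum.clone();   // double-buffer to mimic the
            int[] newP = p.clone();       // synchronous parallel update
            for (int i = 0; i < n; i++) { // conceptually: all i at once
                if (p[i] != -1) {
                    newSum[i] = sum[i] + sum[p[i]]; // add pointed-to value
                    newP[i] = p[p[i]];              // pointer doubling
                    changed = true;
                }
            }
            sum = newSum;
            p = newP;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] value = {1, 2, 3, 4};
        int[] pred = {-1, 0, 1, 2};  // the list 0 -> 1 -> 2 -> 3
        System.out.println(Arrays.toString(partialSums(value, pred))); // [1, 3, 6, 10]
    }
}
```

Each round doubles the distance every pointer spans, so the loop runs O(log n) times, matching the logarithmic behavior described in the steps above.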
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
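A small sketch of why this matters for linked lists: head insertion is a read-then-write sequence, so without mutual exclusion two threads can both read the same old head and one insert is silently lost. The synchronized keyword below is one possible fix among the techniques the chapter discusses; all names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeListPush {
    static final class Node {
        final int data;
        final Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    private Node head;

    // head = new Node(x, head) reads head, then writes it. Without
    // synchronization, two threads could read the same old head and
    // one of the two new nodes would be lost (a race condition).
    public synchronized void push(int x) {
        head = new Node(x, head);
    }

    public synchronized int size() {
        int n = 0;
        for (Node p = head; p != null; p = p.next) n++;
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        SafeListPush list = new SafeListPush();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            final int v = i;
            pool.execute(() -> list.push(v));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(list.size()); // 1000 -- no inserts lost
    }
}
```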
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during the traversal process.  Even though search/update/insert operations complete in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate the full output.  This provides an opportunity for parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms,&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is an ideal candidate for parallelism.  Each &lt;br /&gt;
node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the small tip of the tree that is the ancestor of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n - 1 nodes &lt;br /&gt;
in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If both the size of the tree and the number of processors were increased, the number of shared nodes would grow to support the additional subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them &lt;br /&gt;
straightforward.  Properly done, the parallelizable portion of these traversals grows &lt;br /&gt;
as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# That processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
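The steps above can be sketched in Java with the fork/join framework (the serial code above is Ada; this is a hedged translation, and the `Node` and `ParallelPreorder` names are hypothetical).  The submitting thread walks the top of the tree while independent subtrees are forked to other workers, matching the idea that each subtree can live on its own processor.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical node type mirroring the Ada record above.
class Node {
    final int data;
    final Node left, right;
    Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
}

// Pre-order traversal: the left subtree is forked to another worker thread
// while the current thread handles the right subtree, then results are
// concatenated in root-left-right order.
class ParallelPreorder extends RecursiveTask<List<Integer>> {
    private final Node n;
    ParallelPreorder(Node n) { this.n = n; }

    @Override
    protected List<Integer> compute() {
        List<Integer> out = new ArrayList<>();
        if (n == null) return out;
        ParallelPreorder leftTask = new ParallelPreorder(n.left);
        leftTask.fork();                                        // left subtree on another worker
        List<Integer> rightPart = new ParallelPreorder(n.right).compute();
        out.add(n.data);                                        // visit root first (pre-order)
        out.addAll(leftTask.join());                            // then the left subtree's result
        out.addAll(rightPart);                                  // then the right subtree's result
        return out;
    }
}

class TreeDemo {
    public static void main(String[] args) {
        // Same shape as the Ada example: 1(2(4(7),5), 3(6(8,9)))
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        List<Integer> order = new ForkJoinPool().invoke(new ParallelPreorder(root));
        System.out.println(order);  // [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

The single synchronization point is the final `join()`, which is what keeps the serial fraction small for large trees.&lt;br /&gt;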
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds greater potential for delays, slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at a rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but pause after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of node i's parent if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT algorithm, it can be locked while editing as described in the LDS chapter of the Solihin textbook. A global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked-data-structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
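As a sketch of the read-write locking option mentioned above (the `GuardedList` wrapper and its methods are hypothetical names), a Java `ReentrantReadWriteLock` can guard the flattened node list: many traversals may hold the read lock at once, while an edit takes the write lock exclusively.&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: the flattened traversal list guarded by a read-write lock.
class GuardedList {
    private final List<Integer> nodes = new LinkedList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers can traverse concurrently under the shared read lock.
    int sum() {
        lock.readLock().lock();
        try {
            int s = 0;
            for (int v : nodes) s += v;
            return s;
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer gets exclusive access while editing the list.
    void append(int v) {
        lock.writeLock().lock();
        try {
            nodes.add(v);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This mirrors the read-mostly access pattern of traversals: read-write locks only pay off when reads dominate writes, which is exactly the case here.&lt;br /&gt;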
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in search algorithms for fast lookup operations. They are used extensively in data processing, which involves passing vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like accessing an array with a known index.  With a proper hashing function in place, it should be rare for any two keys to hash to the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be resolved to obtain the correct value.  One approach that is relevant to linked-list structures is the chained hash table, in which a linked list holds all values that have been placed in a particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
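A minimal chained table can illustrate this (the `ChainedTable` class is a hypothetical, single-threaded sketch, not a production structure): colliding keys share one bucket's linked list, and a lookup must walk that chain.&lt;br /&gt;

```java
import java.util.LinkedList;

// Minimal chained hash table (illustrative, not thread-safe):
// keys that hash to the same bucket share that bucket's linked list.
class ChainedTable {
    private static final int BUCKETS = 8;
    @SuppressWarnings("unchecked")
    private final LinkedList<int[]>[] table = new LinkedList[BUCKETS]; // {key, value} pairs

    private int bucketOf(int key) { return Math.floorMod(key, BUCKETS); }

    void put(int key, int value) {
        int b = bucketOf(key);
        if (table[b] == null) table[b] = new LinkedList<>();
        for (int[] pair : table[b])
            if (pair[0] == key) { pair[1] = value; return; } // key exists: update in place
        table[b].add(new int[] { key, value });              // otherwise chain a new pair
    }

    Integer get(int key) {
        LinkedList<int[]> chain = table[bucketOf(key)];
        if (chain == null) return null;
        for (int[] pair : chain)                              // walk the collision chain
            if (pair[0] == key) return pair[1];
        return null;                                          // not found in this bucket
    }
}
```

With 8 buckets, keys 3 and 11 collide (both hash to bucket 3), so both land in the same chain and are distinguished only by the chain walk.&lt;br /&gt;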
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This provides fine-grained mutual exclusion (concurrent operations at the bucket level) while needing only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
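The two-level scheme described above can be sketched as follows (a deliberate simplification of the patented design; `TwoLevelTable` and all member names are hypothetical).  The table lock is held only long enough to pick up the bucket lock; all list work happens under the bucket lock alone, so threads touching different buckets proceed concurrently.&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a two-level-locked hash table: one table lock, one lock per bucket.
class TwoLevelTable {
    private static final int BUCKETS = 16;
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    @SuppressWarnings("unchecked")
    private final LinkedList<Map.Entry<String, Integer>>[] buckets = new LinkedList[BUCKETS];

    TwoLevelTable() {
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new LinkedList<>();
        }
    }

    private ReentrantLock acquireBucket(int b) {
        tableLock.lock();            // table lock held only briefly...
        try {
            bucketLocks[b].lock();   // ...to pick up the bucket lock
        } finally {
            tableLock.unlock();      // released before any list work begins
        }
        return bucketLocks[b];
    }

    void put(String key, int value) {
        int b = Math.floorMod(key.hashCode(), BUCKETS);
        ReentrantLock l = acquireBucket(b);
        try {
            buckets[b].removeIf(e -> e.getKey().equals(key)); // drop any old entry
            buckets[b].add(Map.entry(key, value));            // chain the new pair
        } finally {
            l.unlock();
        }
    }

    Integer get(String key) {
        int b = Math.floorMod(key.hashCode(), BUCKETS);
        ReentrantLock l = acquireBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b])
                if (e.getKey().equals(key)) return e.getValue();
            return null;
        } finally {
            l.unlock();
        }
    }
}
```

Once `acquireBucket` returns, other threads can take the table lock and work on any other bucket, which is the concurrency property the description above calls out.&lt;br /&gt;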
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket of the shared cache hashmap.  Traditional locking is a poor solution here because the processors need to run very quickly, and waiting on locks would ruin processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals require no locking and typically run in parallel with updates and deletes.  A retrieval reflects the most recently completed update operations, even if it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() requires a non-null expected value, so an absent key must be handled with putIfAbsent()&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial version presented in the previous section that avoids the locking imposed by synchronized functions or blocks.  With ConcurrentHashMap, however, we must handle the fact that several inserts and updates may be running at the same time.  The replace() call acts much like a compare-and-set operation: the value is replaced only if it still equals the value we previously read, and otherwise the loop retries.  This is much more efficient than locking the entire function, since we rarely expect a conflicting update.&lt;br /&gt;
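On Java 8 and later, the same retry loop can be expressed atomically with ConcurrentHashMap's merge() method, which performs the compare-and-set internally (the `CountDemo` wrapper class is a hypothetical name for illustration).&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CountDemo {
    private final ConcurrentMap<String, Integer> queryCounts = new ConcurrentHashMap<>(1000);

    // merge() atomically inserts 1 for an absent key, or applies Integer::sum
    // to combine the existing count with 1 -- no explicit retry loop needed.
    void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```

This pushes the whole read-modify-write cycle into a single atomic map operation, so the caller cannot get the null-handling wrong.&lt;br /&gt;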
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block required&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of (possibly ordered) pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, find the fastest route from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their list of edges in a linked list.  Also, when building a shortest path on the fly, the algorithm typically uses a linked list to represent the path as it is being built, along with a queue that drives each step of the process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], each with a series of friend links that must be analyzed and examined, and this list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt to Augsburg and Stuttgart, Germany.  The traversal begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  The function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallelism into this breadth-first search?  The most effective way, instead of using locks and synchronized regions, is to apply data-parallel techniques during the traversal: the nodes discovered at each breadth step are sent to separate processors.  Using the above example, instead of visiting Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all 3 searches across 3 different processors.  Some cleanup code may then be needed at the end to visit any remaining untouched nodes.  In a network routing application, splitting up the search for each IP address makes searches significantly faster than letting one processor become a bottleneck.&lt;br /&gt;
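This data-parallel, level-by-level idea can be sketched in Java with parallel streams (a hedged sketch; the `ParallelBfs` class, the method names, and the small graph fragment are all hypothetical): each frontier level's vertices are expanded concurrently, and a concurrent set makes the visited check thread-safe without explicit locks.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class ParallelBfs {
    // Level-synchronous BFS: all vertices of the current frontier are expanded
    // in parallel; Set.add on a ConcurrentHashMap-backed set returns true only
    // for the first thread to claim a vertex, so no explicit locking is needed.
    static List<String> bfsOrder(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List<String> order = new ArrayList<>();
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(visited::add)            // claim each vertex exactly once
                .collect(Collectors.toList());   // next level, in encounter order
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical fragment inspired by the German-cities figure above.
        Map<String, List<String>> adj = Map.of(
            "Frankfurt", List.of("Mannheim", "Wuerzburg", "Kassel"),
            "Mannheim", List.of("Karlsruhe"),
            "Wuerzburg", List.of("Erfurt", "Nuernberg"),
            "Kassel", List.of("Muenchen"));
        System.out.println(bfsOrder(adj, "Frankfurt"));
    }
}
```

The per-level synchronization happens implicitly when the parallel stream finishes collecting the next frontier, which matches the "wait after each generation" steps listed earlier for trees.&lt;br /&gt;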
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and the need to access them quickly.  Because of this need for speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we show how the basic principles taught in the textbook can be extended to more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83868</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83868"/>
		<updated>2014-03-03T02:28:14Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Introduction to Linked-List Parallel Programming */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and then discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms already discussed can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists of edges over which shortest-path algorithms operate.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
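The four steps above can be simulated sequentially in Java (the class and method names here are ours, for illustration only): each pass doubles the number of rows holding a copy of row 0, so n rows are covered in ceil(log2(n)) passes, and on a real machine the copies within one pass would run on different processors concurrently.&lt;br /&gt;

```java
public class CopyScan {
    // After pass k, rows 0..2^k-1 all hold a copy of row 0, so n rows are
    // covered in ceil(log2(n)) passes. Within a pass, each destination row
    // has a distinct source row, so the copies of one pass are independent
    // and could be performed by different processors at the same time.
    public static int[][] broadcastRow0(int[][] grid) {
        int copied = 1;                       // rows already holding row 0
        while (copied < grid.length) {
            int batch = Math.min(copied, grid.length - copied);
            for (int i = 0; i < batch; i++)   // conceptually parallel copies
                grid[copied + i] = grid[i].clone();
            copied += batch;
        }
        return grid;
    }
}
```

Once every row holds the data, a reduction (for a sum or similar) can proceed in parallel as described above.&lt;br /&gt;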
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses of pointer doubling is to perform partial sums of a linked list.  This is accomplished by adding each node's value to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
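The rounds shown in the figures above can be sketched in Java (the class and method names are ours). Note that with successor pointers running toward the tail, each node finishes holding the sum of itself and every node after it; if the pointers run toward the head instead, the same procedure yields sums over the preceding nodes.&lt;br /&gt;

```java
public class PointerDoubling {
    // One value and one successor index per node; next[i] == -1 marks the
    // end of the chain. Each round, every node (conceptually one per
    // processor) adds in the value of the node it points to and then
    // doubles its pointer, so all sums finish in ceil(log2(n)) rounds.
    public static int[] partialSums(int[] value, int[] next) {
        int[] val = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newVal = val.clone();
            int[] newNxt = nxt.clone();
            for (int i = 0; i < val.length; i++) {   // one iteration per processor
                if (nxt[i] != -1) {
                    newVal[i] = val[i] + val[nxt[i]];
                    newNxt[i] = nxt[nxt[i]];
                    active = true;
                }
            }
            val = newVal;       // all "processors" advance in lock step
            nxt = newNxt;
        }
        return val;   // val[i] = sum of node i and every node it pointed toward
    }
}
```

For the list 1 -&amp;gt; 2 -&amp;gt; 3 -&amp;gt; 4, two rounds of doubling suffice and the head node ends up holding the total sum of the list.&lt;br /&gt;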
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node in the tree to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (written in Ada) for serial tree traversal algorithms&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common.  This is because one node is capable of having two branches.  If the size of the tree was increased and the number of processors was also increased, the number of shared nodes would also increase to support the increased number of sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate traversal algorithm detailed above.&lt;br /&gt;
# The processor will return a result that can be used exactly as if it came from a serial traversal.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
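The three steps above can be sketched in Java (a minimal illustration with our own class names, using two threads for a binary root): each subtree is traversed serially on its own thread, and the results are combined around the shared root exactly once.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTraversal {
    static class Node {
        int data; Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Plain serial in-order traversal, run independently on each subtree.
    static List<Integer> inorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n == null) return out;
        out.addAll(inorder(n.left));
        out.add(n.data);
        out.addAll(inorder(n.right));
        return out;
    }

    // The root is the shared "tip" of the tree; each subtree is handed to
    // its own task. The tasks synchronize only once, when their results
    // are concatenated around the root's value.
    static List<Integer> parallelInorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> leftF  = pool.submit(() -> inorder(root.left));
            Future<List<Integer>> rightF = pool.submit(() -> inorder(root.right));
            List<Integer> out = new ArrayList<>(leftF.get());
            out.add(root.data);
            out.addAll(rightF.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the subtrees share no nodes, no locks are needed; the only serial work is the final concatenation.&lt;br /&gt;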
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
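As one concrete option among the locking approaches named above, a read-write lock lets any number of traversals run concurrently while an editor gets exclusive access. A minimal Java sketch over the resulting linked list (the class name and the toy sum() traversal are our own illustration):&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockedList {
    // One global read-write lock over the list: any number of traversals
    // may hold the read lock at once, while an insert excludes both
    // readers and other writers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final LinkedList<Integer> list = new LinkedList<>();

    public void insert(int v) {
        lock.writeLock().lock();   // exclusive: no readers or writers allowed
        try {
            list.add(v);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int sum() {
        lock.readLock().lock();    // shared: concurrent readers allowed
        try {
            int s = 0;
            for (int v : list) s += v;
            return s;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A fine-grained variant would instead hold one such lock per node or per list segment, at the cost of more bookkeeping.&lt;br /&gt;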
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as processing involves moving vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must take into account not only the proper bucket for the data being searched for, but also the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
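The chaining scheme just described can be sketched in Java (a deliberately simple, unsynchronized illustration with our own class names; real implementations add resizing and, as discussed below, locking or lock-free coordination):&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainedHashTable {
    static class Entry {
        String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    // Each bucket holds a linked list (the "chain") of every entry whose
    // key hashes to that bucket index.
    private final List<LinkedList<Entry>> buckets;

    public ChainedHashTable(int nBuckets) {
        buckets = new ArrayList<>();
        for (int i = 0; i < nBuckets; i++) buckets.add(new LinkedList<>());
    }

    private int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    public void put(String key, int value) {
        LinkedList<Entry> chain = buckets.get(bucketOf(key));
        for (Entry e : chain)
            if (e.key.equals(key)) { e.value = value; return; }
        chain.add(new Entry(key, value));
    }

    // Lookup hashes to the right bucket, then walks that bucket's chain.
    public Integer get(String key) {
        for (Entry e : buckets.get(bucketOf(key)))
            if (e.key.equals(key)) return e.value;
        return null;
    }
}
```

With few buckets, many keys collide and the chains grow, which is exactly where the per-bucket traversal cost appears.&lt;br /&gt;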
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
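That two-level protocol can be sketched in Java. This is a simplified illustration under our own names and assumptions: a real implementation such as the one described above stores bucket nodes with signature-pointer pairs and uses lightweight per-bucket flags, whereas here a plain counter per bucket stands in for the payload and ReentrantLock for the flags.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable {
    // One table lock plus one lock per bucket: the table lock is held only
    // long enough to acquire a bucket lock, so threads working on
    // different buckets proceed concurrently.
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final int[] counts;   // toy payload: one counter per bucket

    public TwoLevelLockTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLocks[i] = new ReentrantLock();
        counts = new int[nBuckets];
    }

    public void increment(int key) {
        int b = Math.floorMod(key, counts.length);
        tableLock.lock();
        ReentrantLock bucketLock;
        try {
            bucketLock = bucketLocks[b];
            bucketLock.lock();      // acquired while holding the table lock
        } finally {
            tableLock.unlock();     // released immediately afterwards
        }
        try {
            counts[b]++;            // bucket-level critical section
        } finally {
            bucketLock.unlock();
        }
    }

    public int count(int key) {
        return counts[Math.floorMod(key, counts.length)];
    }
}
```

The key property is visible in increment(): the table lock never outlives the bucket-lock acquisition, so long-running bucket operations do not block access to other buckets.&lt;br /&gt;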
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem, since processors need to run very quickly and waiting for locks would severely degrade processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that when it is shared between threads the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequences have completed properly prior to attempting subsequent operations, so that the data stays synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a synchronized version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock the structure and will typically run in parallel with updates/deletes.  A retrieval reflects the most recently completed updates, even though it cannot see values whose updates have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::if (oldVal == null) {&lt;br /&gt;
:::// replace() fails on a missing key, so the first insert must use putIfAbsent()&lt;br /&gt;
:::oldVal = queryCounts.putIfAbsent(q, 1);&lt;br /&gt;
:::if (oldVal == null) return; // this thread performed the first insert&lt;br /&gt;
::}&lt;br /&gt;
::newVal = oldVal + 1;&lt;br /&gt;
:} while (!queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is replaced only if it is still equal to the previously read value.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block required&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized requirement entirely and iterate in the normal fashion; the iterator is weakly consistent, so concurrent updates never cause it to fail.&lt;br /&gt;
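As a concrete sketch, the lock-free traversal above can be written as follows.  The sumValues() helper is our own illustration, but the weakly consistent iterator behavior it relies on is part of the ConcurrentHashMap contract.&lt;br /&gt;

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeTraversal {
    // Sums all values while the map may be updated concurrently.
    // The weakly consistent iterator reflects some (not necessarily all)
    // updates made during the traversal, but it never throws
    // ConcurrentModificationException, so no synchronized block is needed.
    public static int sumValues(ConcurrentHashMap<String, Integer> m) {
        int total = 0;
        Iterator<Map.Entry<String, Integer>> it = m.entrySet().iterator();
        while (it.hasNext()) {
            total += it.next().getValue();
        }
        return total;
    }
}
```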
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or to lock a shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
The need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  These lists just keep growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  To do so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes expands the search further until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  It uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each regional set of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
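The level-synchronous scheme described above can be sketched in Java using parallel streams.  The adjacency-list representation and all names here are our own illustration rather than the pseudocode's exact structures; the concurrent visited set plays the role of the fetch_and_add marking in the locking pseudocode.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // adj.get(u) holds the neighbors of vertex u; returns the BFS
    // distance of every vertex from the source (-1 if unreachable).
    // Each frontier level is expanded in parallel; visited is a
    // concurrent set, so duplicate discoveries are filtered atomically.
    public static int[] distances(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        dist[source] = 0;
        List<Integer> frontier = new ArrayList<>();
        frontier.add(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            int d = ++level;
            // Expand every vertex of the current level on the common pool.
            List<Integer> next = frontier.parallelStream()
                    .flatMap(u -> adj.get(u).stream())
                    .filter(visited::add)   // true only on first discovery
                    .collect(Collectors.toList());
            for (int v : next) {
                dist[v] = d;
            }
            frontier = next;
        }
        return dist;
    }
}
```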
&lt;br /&gt;
As shown, graphs are able to be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Because of this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, exactly the same locking techniques are used, and it is the LDS which is manipulated into singly linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83863</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83863"/>
		<updated>2014-02-27T04:21:40Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
Topic Write-up: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include several different types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together these data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain colliding entries within a bucket, trees link nodes through left and right child pointers, and graphs keep linked lists of edges that shortest-path algorithms traverse.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction to generate a sum of all the data, all while working in a parallel fashion.  Consider the following example:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
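The reduction step that follows copy-scan can be sketched as a log(n) tree sum.  The sequential loop below merely simulates it; on a real parallel machine, every addition within a round would execute simultaneously on its own processor.  The class name is our own illustration.&lt;br /&gt;

```java
public class LogReduction {
    // Tree-style reduction: in each round, element i pulls in the value
    // held 'stride' positions away, halving the number of active elements.
    // On a parallel machine each round is one simultaneous step, so the
    // whole sum completes in log2(n) rounds instead of n additions.
    public static int sum(int[] data) {
        int[] a = data.clone();
        int n = a.length;
        for (int stride = 1; stride < n; stride *= 2) {
            for (int i = 0; i + stride < n; i += 2 * stride) {
                a[i] += a[i + stride];
            }
        }
        return a[0];
    }
}
```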
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node holds the sum of its own value and the values of all nodes along its chain of pointers.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
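The pointer-doubling partial-sum operation shown above can be sketched as follows.  The array next[] simulates each node's pointer (with -1 marking the end of the list); on a parallel machine, every index of the inner loop would be handled by its own processor in the same step.  All names here are our own illustration.&lt;br /&gt;

```java
public class PointerDoubling {
    // val[i] is node i's value and next[i] its successor (-1 at list end).
    // After log2(n) rounds of doubling, val[i] holds the sum of node i
    // and every node along its original chain of pointers.
    public static int[] partialSums(int[] val, int[] next) {
        int[] v = val.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newV = v.clone();
            int[] newNext = nxt.clone();
            // On a parallel machine, every i runs this step simultaneously.
            for (int i = 0; i < v.length; i++) {
                if (nxt[i] != -1) {
                    newV[i] = v[i] + v[nxt[i]];     // add pointed-to value
                    newNext[i] = nxt[nxt[i]];       // double the pointer
                    active = true;
                }
            }
            v = newV;
            nxt = newNext;
        }
        return v;
    }
}
```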
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures, along with the parallelization opportunities and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree (a relatively equal number of nodes under each branch of the tree) and a proper ordering structure, searches, inserts and deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search, update, and insert can run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it came from a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
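The steps above can be sketched with Java's fork/join framework.  We use a subtree sum as the per-processor operation, since an order-independent reduction parallelizes most cleanly; the Node and SumTask names are our own illustration.&lt;br /&gt;

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelTreeSum {
    public static class Node {
        final int data;
        final Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Each task traverses one independent subtree; the left subtree is
    // forked to another worker while the current worker handles the
    // right subtree, so no locks are needed.
    static class SumTask extends RecursiveTask<Integer> {
        private final Node node;
        SumTask(Node node) { this.node = node; }
        @Override
        protected Integer compute() {
            if (node == null) return 0;
            SumTask leftTask = new SumTask(node.left);
            leftTask.fork();                          // run in parallel
            int rightSum = new SumTask(node.right).compute();
            return node.data + rightSum + leftTask.join();
        }
    }

    public static int sum(Node root) {
        return ForkJoinPool.commonPool().invoke(new SumTask(root));
    }
}
```

Because every subtree is independent, the only synchronization is the final join of each forked task, matching the single end-of-traversal synchronization described above.&lt;br /&gt;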
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter (array[1..n] of integer); level is an input parameter (array[1..n] of integer); and preorder-list is an input parameter (array[1..n] of integer).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. A global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, the same locking mechanisms apply to multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they allow a vast amount of data to be located through the hash table with as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a collision that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked-list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only find the proper bucket for the data being searched for, but also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
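&lt;br /&gt;
A chained table can be sketched minimally as follows (hypothetical Java, not a production design; the class and method names are invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.LinkedList;

public class ChainedTable {
    static class Entry {
        final String key;
        final int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // The hashing function determines the bucket for each key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        buckets[bucketFor(key)].addFirst(new Entry(key, value));
    }

    // On a collision, walk the chained linked list within the bucket.
    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)])
            if (e.key.equals(key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(1);   // one bucket forces every key to collide
        t.put("alpha", 1);
        t.put("beta", 2);
        System.out.println(t.get("alpha") + " " + t.get("beta"));  // 1 2
    }
}
```
&lt;br /&gt;
With only one bucket, every lookup degenerates to a linear scan of the chain; a good hashing function keeps the chains short.&lt;br /&gt;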
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
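&lt;br /&gt;
The two-level scheme described above might be sketched as follows (hypothetical Java; the per-bucket data is reduced to a simple counter for brevity, and the class and method names are invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final int[] counts;                 // stand-in for per-bucket data

    TwoLevelLockTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLocks[i] = new ReentrantLock();
        counts = new int[nBuckets];
    }

    void increment(int key) {
        int b = Math.floorMod(key, counts.length);
        tableLock.lock();                       // table lock is held only long enough...
        bucketLocks[b].lock();                  // ...to acquire the bucket lock
        tableLock.unlock();                     // other buckets are now free for other threads
        try {
            counts[b]++;                        // bucket-level critical section
        } finally {
            bucketLocks[b].unlock();
        }
    }

    int count(int key) {
        int b = Math.floorMod(key, counts.length);
        bucketLocks[b].lock();
        try { return counts[b]; } finally { bucketLocks[b].unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        TwoLevelLockTable table = new TwoLevelLockTable(8);
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) table.increment(3);
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(table.count(3));     // 4000
    }
}
```
&lt;br /&gt;
Threads touching different buckets serialize only briefly on the table lock, while operations within one bucket serialize on that bucket's lock alone.&lt;br /&gt;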
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals require no locking and will typically run in parallel with updates and deletes; a retrieval reflects the results of the most recently completed update operations, even if it cannot see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() fails when the key is absent, so the first insert uses putIfAbsent()&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the key is still mapped to the previously read value.  This is much more efficient than locking the entire function, since concurrent modification of the same key is rare.&lt;br /&gt;
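&lt;br /&gt;
As a side note, on Java 8 and later the same read-modify-write can be expressed atomically with ConcurrentHashMap's merge() method, removing the retry loop entirely. A brief sketch (class and field names are invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    void incrementCount(String q) {
        // Atomically map q to 1, or add 1 to its existing count, with no explicit retry loop.
        queryCounts.merge(q, 1, Integer::sum);
    }

    int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }

    public static void main(String[] args) {
        MergeCounter c = new MergeCounter();
        c.incrementCount("route");
        c.incrementCount("route");
        System.out.println(c.count("route"));   // 2
    }
}
```
&lt;br /&gt;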
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
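&lt;br /&gt;
An adjacency representation of this kind can be sketched minimally (hypothetical Java; each node's reachable nodes are kept in a linked list, and the class name is invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class LinkedGraph {
    // Each node keeps a linked list of pointers to the nodes reachable from it.
    private final Map<String, LinkedList<String>> adjacency = new HashMap<>();

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new LinkedList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new LinkedList<>());  // make sure the target node exists
    }

    List<String> neighbors(String node) {
        return adjacency.getOrDefault(node, new LinkedList<>());
    }

    public static void main(String[] args) {
        LinkedGraph g = new LinkedGraph();
        g.addEdge("A", "B");
        g.addEdge("A", "C");
        System.out.println(g.neighbors("A"));   // [B, C]
    }
}
```
&lt;br /&gt;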
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first search that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  It uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of using locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of processing Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches out onto three different processors.  Possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
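&lt;br /&gt;
One way to sketch this data-parallel, level-synchronous expansion is with Java parallel streams (an illustration, not the pseudocode's exact machinery; a ConcurrentHashMap-backed distance map atomically claims unvisited vertices so no explicit locks are needed, and the edges in main are only an example):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: every vertex of the current frontier is expanded
    // in parallel; putIfAbsent() atomically "claims" unvisited vertices, so the
    // visited set needs no explicit locking.
    static Map<String, Integer> distances(Map<String, List<String>> adj, String source) {
        ConcurrentMap<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        List<String> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            frontier = frontier.parallelStream()                     // one task per frontier vertex
                    .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                    .filter(v -> dist.putIfAbsent(v, next) == null)  // claim each vertex exactly once
                    .collect(Collectors.toList());
            d = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        // Illustrative edges only.
        Map<String, List<String>> adj = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"),
                "Kassel", List.of("Munchen"));
        System.out.println(distances(adj, "Frankfurt").get("Munchen"));  // 2
    }
}
```
&lt;br /&gt;
Each frontier level completes before the next begins, mirroring the step-by-step expansion of the serial version while spreading each level's work across processors.&lt;br /&gt;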
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each regional set of vertices being searched can now be added to the path in parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we have shown how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83862</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83862"/>
		<updated>2014-02-27T04:21:25Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
Topic Writeup: http://courses.ncsu.edu/csc506/lec/001/homework/ch5_6.html&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, this often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
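&lt;br /&gt;
The steps above can be simulated sequentially (hypothetical Java; each outer iteration stands for one parallel step, in which all copies would happen simultaneously on a real machine):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.Arrays;

public class CopyScan {
    // Each outer iteration models one parallel step: every processor row that
    // already holds the data forwards a copy one stride further, so all p rows
    // are filled after ceil(log2(p)) steps.
    static int[][] copyScan(int[] row, int p) {
        int[][] grid = new int[p][];
        grid[0] = row.clone();
        for (int have = 1; have < p; have *= 2)
            for (int src = 0; src < have && src + have < p; src++)
                grid[src + have] = grid[src].clone();   // these copies happen "in parallel"
        return grid;
    }

    public static void main(String[] args) {
        int[][] grid = copyScan(new int[]{1, 2, 3}, 4);
        System.out.println(Arrays.deepToString(grid));  // four identical rows
    }
}
```
&lt;br /&gt;
After the copy phase, each processor holds its own row and a parallel reduction can proceed without further communication.&lt;br /&gt;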
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use of pointer doubling is to perform partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and the nodes reached through its pointer.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
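&lt;br /&gt;
The scan can be simulated as follows (hypothetical Java; the list is encoded in arrays with -1 marking the end, each while-loop iteration stands for one parallel round, and this direction of the scan gives each node the sum of itself and all nodes after it — reversing the pointers would give sums over preceding nodes):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.Arrays;

public class PointerDoubling {
    // next[i] is the index node i points to, or -1 at the end of the list.
    // Each round, every node adds its successor's partial sum and then doubles
    // its pointer; a real parallel machine would apply each round's updates
    // simultaneously, finishing in O(log n) rounds.
    static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] nxt = next.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] newSum = sum.clone();   // double-buffer so a round reads consistent values
            int[] newNxt = nxt.clone();
            for (int i = 0; i < n; i++) {
                if (nxt[i] != -1) {
                    newSum[i] = sum[i] + sum[nxt[i]];
                    newNxt[i] = nxt[nxt[i]];
                    changed = true;
                }
            }
            sum = newSum;
            nxt = newNxt;
        }
        return sum;   // sum[i] = value of node i plus the values of all nodes after it
    }

    public static void main(String[] args) {
        // List 0 -> 1 -> 2 -> 3 with values 1, 2, 3, 4.
        System.out.println(Arrays.toString(
                partialSums(new int[]{1, 2, 3, 4}, new int[]{1, 2, 3, -1})));  // [10, 9, 7, 4]
    }
}
```
&lt;br /&gt;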
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms,&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we need only 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of subtrees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return a result that can be used exactly as if it had been produced by a serial traversal.&lt;br /&gt;
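The steps above can be sketched in Java using the fork/join framework.  This is only a minimal illustration: the Node class mirrors the Ada record from earlier, the tree is the same nine-node example, and the class and method names are our own invention rather than part of any cited implementation.  Each subtree becomes an independent task, so the left and right children can be traversed on different workers and their results concatenated in pre-order.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    // Same shape as the Ada Node record: data plus left/right child pointers.
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // One task per subtree: subtrees are independent, so they can run on
    // separate workers; results are joined back together in pre-order.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node n;
        PreorderTask(Node n) { this.n = n; }

        @Override
        protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            PreorderTask leftTask = new PreorderTask(n.left);
            leftTask.fork();                                  // left subtree on another worker
            List<Integer> rightPart = new PreorderTask(n.right).compute();
            out.add(n.data);                                  // visit the root first
            out.addAll(leftTask.join());                      // then the whole left subtree
            out.addAll(rightPart);                            // then the whole right subtree
            return out;
        }
    }

    static List<Integer> preorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new PreorderTask(root));
    }

    public static void main(String[] args) {
        // The nine-node tree from the serial example above.
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(preorder(root)); // [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

The same task structure works for in-order and post-order traversals by changing where the node's own value is appended relative to the joined subtree results.&lt;br /&gt;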
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First algorithm detailed above, but have it wait after finishing one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
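The generation-by-generation steps above can be sketched as follows.  This Java sketch is serial, but it makes the synchronization point explicit: the loop over the frontier is the part that a parallel runtime would distribute across processors, and combining the per-node child lists in frontier order is the once-per-generation synchronization.  All names here are illustrative.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class LevelSyncTraversal {
    static class Node {
        final int data;
        final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Level-synchronized breadth-first traversal: the whole frontier (one
    // generation) is expanded before moving on to the next one.
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        List<Node> frontier = new ArrayList<>();
        if (root != null) frontier.add(root);
        while (!frontier.isEmpty()) {
            List<Node> next = new ArrayList<>();
            // This loop is the per-generation step.  In a parallel version,
            // each frontier node would be expanded on its own processor, and
            // the per-node child lists combined in frontier order -- that
            // combination is the synchronization point between generations.
            for (Node n : frontier) {
                out.add(n.data);
                if (n.left != null) next.add(n.left);
                if (n.right != null) next.add(n.right);
            }
            frontier = next;
        }
        return out;
    }

    public static void main(String[] args) {
        // The nine-node tree from the serial example above.
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(levelOrder(root)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```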
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree is able to be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they can push a vast amount of data through the storage structure using as few indirections as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to hash to the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list holds all values that have been placed in that particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but must also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
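A chained hash table as just described might look like the following minimal Java sketch.  The bucket array size, the Entry class, and the method names are illustrative assumptions, not taken from any of the cited sources.&lt;br /&gt;

```java
import java.util.LinkedList;

public class ChainedHashTable {
    // One entry in a bucket's chain: a key and its value.
    static class Entry {
        final String key;
        int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    // Each bucket holds a linked list (chain) of all entries whose keys
    // hash to that bucket, so colliding keys are resolved by walking the chain.
    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketFor(String key) {             // the hashing function picks the bucket
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        LinkedList<Entry> chain = buckets[bucketFor(key)];
        for (Entry e : chain)                       // update in place on a key match
            if (e.key.equals(key)) { e.value = value; return; }
        chain.add(new Entry(key, value));           // otherwise append to the chain
    }

    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)])     // search only this bucket's chain
            if (e.key.equals(key)) return e.value;
        return null;                                // key not present
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(4); // tiny table forces collisions
        t.put("alpha", 1);
        t.put("beta", 2);
        t.put("gamma", 3);
        System.out.println(t.get("beta"));          // found via its bucket's chain
    }
}
```

Even when several keys land in the same bucket of this four-slot table, lookups remain correct; they merely walk a slightly longer chain.&lt;br /&gt;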
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
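The two-level locking scheme described above can be sketched in Java as follows.  This is an illustrative approximation, not the patented implementation: the Map-based bucket representation and the ReentrantLock type are our own choices, but the key idea, holding the table lock only long enough to acquire the bucket lock, is the one described in the text.&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelLockTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new Map[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    void put(String key, int value) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        tableLock.lock();            // table lock: held only to acquire the bucket lock
        ReentrantLock bucketLock = bucketLocks[b];
        bucketLock.lock();
        tableLock.unlock();          // released at once, so other buckets stay reachable
        try {
            buckets[b].put(key, value);
        } finally {
            bucketLock.unlock();     // bucket lock held for the whole bucket operation
        }
    }

    Integer get(String key) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        tableLock.lock();
        ReentrantLock bucketLock = bucketLocks[b];
        bucketLock.lock();
        tableLock.unlock();
        try {
            return buckets[b].get(key);
        } finally {
            bucketLock.unlock();
        }
    }

    public static void main(String[] args) {
        TwoLevelLockTable t = new TwoLevelLockTable(8);
        t.put("key", 42);
        System.out.println(t.get("key"));
    }
}
```

Two threads touching different buckets serialize only briefly on the table lock; the bucket operations themselves proceed concurrently.&lt;br /&gt;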
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a bad solution to this problem because processors need to run very quickly, and waiting on locks would severely degrade processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that the entire map requires synchronization prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock, and will typically run in parallel with updates/deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() throws NullPointerException for a null expected value,&lt;br /&gt;
::// so the very first insert for a key must use putIfAbsent() instead&lt;br /&gt;
:} while (oldVal == null&lt;br /&gt;
:::? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
:::: !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that a number of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-swap operation typically used in concurrent code: the value is changed only if the current mapping still equals the previously read value, and otherwise the loop retries.  This is much more efficient than locking the entire function, since we rarely expect a conflicting update.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMaps require no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs called edges or arcs, of certain entities called nodes or vertices.  From one given vertex, one would typically want to order the different paths from one vertex to another using its list of edges or, more than likely, would be interested in the fastest means of getting from one of these vertexes to some sort of destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
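A graph whose vertices keep their edges in linked lists, as described above, might be represented as in the following minimal Java sketch; the vertex numbering and method names are illustrative.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class AdjListGraph {
    // One linked list of edge endpoints per vertex: this adjacency list is
    // the structure that traversal and shortest-path algorithms walk over.
    private final List<LinkedList<Integer>> adj = new ArrayList<>();

    int addVertex() {
        adj.add(new LinkedList<>());        // new vertex with an empty edge list
        return adj.size() - 1;              // vertices are numbered in creation order
    }

    void addEdge(int u, int v) {            // undirected edge stored in both lists
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    List<Integer> neighbors(int u) {        // the linked list a traversal iterates over
        return adj.get(u);
    }

    public static void main(String[] args) {
        AdjListGraph g = new AdjListGraph();
        int a = g.addVertex(), b = g.addVertex(), c = g.addVertex();
        g.addEdge(a, b);
        g.addEdge(a, c);
        System.out.println(g.neighbors(a)); // a's linked list of edges
    }
}
```

Each vertex's edge list is independent of the others, which is precisely what makes per-edge-list locking (or lock-free access) feasible when the graph is shared among processors.&lt;br /&gt;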
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major proponent of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]] and each user has series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the traversal begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of using locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches in a parallel fashion onto 3 different processors.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Because of this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, exactly the same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83861</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83861"/>
		<updated>2014-02-27T04:16:22Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables, and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc., on the next pass.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
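&lt;br /&gt;
The copy-scan steps above can be simulated in ordinary serial Java.  In this sketch each array row stands in for one processor's local memory, the snapshot of the flags mimics the simultaneous copies within a pass, and the class and variable names are illustrative, not taken from the referenced lecture notes:&lt;br /&gt;

```java
public class CopyScan {
    // Simulates copy-scan: row 0 starts with the data; on pass k, every row
    // that already held the data at the start of the pass copies it to the
    // row 2^k positions away. All p rows receive a copy in ceil(log2(p))
    // passes. Returns the number of passes taken.
    public static int copyScan(int[][] rows) {
        int p = rows.length;
        boolean[] hasData = new boolean[p];
        hasData[0] = true;
        int passes = 0;
        for (int stride = 1; stride < p; stride *= 2) {
            // Snapshot so that copies within one pass happen "in parallel"
            boolean[] before = hasData.clone();
            for (int i = 0; i < p; i++) {
                if (before[i] && i + stride < p) {
                    rows[i + stride] = rows[i].clone(); // copy the whole row
                    hasData[i + stride] = true;
                }
            }
            passes++;
        }
        return passes;
    }
}
```

After the final pass every row holds its own copy of the data, so a reduction (for example, summing each row) could then proceed with every processor working independently.&lt;br /&gt;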
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to, repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
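&lt;br /&gt;
The partial-sums operation illustrated above can likewise be simulated serially.  In this minimal sketch, double-buffered arrays stand in for the synchronous parallel steps and next[i] = -1 marks the end of the chain; with the pointers running toward the tail, each node ends up with the sum of itself and all nodes after it (reverse the pointers to sum preceding nodes instead).  Names are illustrative:&lt;br /&gt;

```java
import java.util.Arrays;

public class PointerJumping {
    // next[i] is the index of the node that node i points to, or -1 at the
    // end of the chain. Each round clones the arrays first so that every
    // "processor" reads the same snapshot, mimicking one synchronous
    // parallel step.
    public static int[] partialSums(int[] val, int[] next) {
        int n = val.length;
        int[] v = val.clone();
        int[] nx = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] v2 = v.clone();
            int[] nx2 = nx.clone();
            for (int i = 0; i < n; i++) {
                if (nx[i] != -1) {
                    v2[i] = v[i] + v[nx[i]];  // add the pointed-to node's value
                    nx2[i] = nx[nx[i]];       // pointer doubling
                    active = true;
                }
            }
            v = v2;
            nx = nx2;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] val = {1, 2, 3, 4, 5};
        int[] next = {1, 2, 3, 4, -1};  // a simple forward chain
        System.out.println(Arrays.toString(partialSums(val, next)));
    }
}
```

For the five-node chain above the result is [15, 14, 12, 9, 5]: each node holds its own value plus the values of every node it could reach, and the pointers converge on the end of the list in a logarithmic number of rounds.&lt;br /&gt;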
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]], written in Ada, for the serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we need only 3 shared nodes: the root and its two children, whose four subtrees are distributed to the four processors.  If the size of the tree and the number of processors were increased, the number of shared nodes would increase accordingly to support the larger number of sub-trees.[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
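&lt;br /&gt;
A minimal sketch of the three steps above, assuming a Java thread pool in place of separate processors: the shared parent node is visited first, each subtree is traversed serially on its own worker, and the futures provide the single end-of-traversal synchronization.  Class and method names are illustrative:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTraversal {
    public static class Node {
        int data;
        Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Ordinary serial preorder of one subtree, run in full by one worker.
    public static List<Integer> preorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    // Step 1: visit the shared parent; Step 2: hand each subtree to its own
    // worker; Step 3: collect the results as if they were computed serially.
    public static List<Integer> parallelPreorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> left = pool.submit(() -> preorder(root.left));
            Future<List<Integer>> right = pool.submit(() -> preorder(root.right));
            List<Integer> out = new ArrayList<>();
            out.add(root.data);
            out.addAll(left.get());   // the only synchronization point
            out.addAll(right.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the two subtrees never share nodes, the workers need no locks at all; only the final concatenation imposes an ordering.&lt;br /&gt;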
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
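&lt;br /&gt;
The generation-by-generation procedure above can be sketched as follows, with one loop iteration per generation.  The inner loop over the current generation is what would be distributed across processors, and the hand-off to the next generation is the per-generation synchronization point of Steps 3 and 4.  Names are illustrative:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class LevelSyncTraversal {
    public static class Node {
        int data;
        Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // One while-loop iteration = one generation. The for loop over `gen`
    // could be split across workers; replacing `gen` with `next` is the
    // barrier at which the one-generation results are combined in order.
    public static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        List<Node> gen = new ArrayList<>();
        if (root != null) gen.add(root);
        while (!gen.isEmpty()) {
            List<Node> next = new ArrayList<>();
            for (Node n : gen) {                 // parallelizable across workers
                out.add(n.data);
                if (n.left != null) next.add(n.left);
                if (n.right != null) next.add(n.right);
            }
            gen = next;                          // barrier: generation complete
        }
        return out;
    }
}
```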
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. A global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, where vast amounts of data must pass through the hash table with as few indirections into the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
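&lt;br /&gt;
A minimal sketch of this two-level locking scheme: the table lock is held only long enough to select the bucket lock, after which operations on other buckets can proceed concurrently.  Class and method names are illustrative and not taken from the Larson et al. patent:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TwoLevelLockTable {
    private final Object tableLock = new Object();
    private final Object[] bucketLocks;
    // Each bucket is a chained linked list of key/value entries.
    private final List<List<Map.Entry<String, Integer>>> buckets = new ArrayList<>();

    public TwoLevelLockTable(int nBuckets) {
        bucketLocks = new Object[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new Object();
            buckets.add(new LinkedList<>());
        }
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), bucketLocks.length);
    }

    public void insert(String key, int value) {
        int i;
        Object bucketLock;
        synchronized (tableLock) {       // table lock held only long enough
            i = index(key);              // to acquire the bucket lock
            bucketLock = bucketLocks[i];
        }
        synchronized (bucketLock) {      // other buckets remain accessible
            buckets.get(i).add(Map.entry(key, value));
        }
    }

    public Integer lookup(String key) {
        int i;
        Object bucketLock;
        synchronized (tableLock) {
            i = index(key);
            bucketLock = bucketLocks[i];
        }
        synchronized (bucketLock) {      // scan this bucket's chain
            for (Map.Entry<String, Integer> e : buckets.get(i))
                if (e.getKey().equals(key)) return e.getValue();
            return null;
        }
    }
}
```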
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem, because processors need to run very quickly and waiting on locks would severely degrade processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: the entire map must be synchronized prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and an adjustable expected concurrency for updates.  Retrievals take no locks and typically run in parallel with updates and deletes.  A retrieval reflects the most recently completed updates, even though it may not see updates that are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::if (oldVal == null) {&lt;br /&gt;
:::if (queryCounts.putIfAbsent(q, 1) == null) return; // first count for q&lt;br /&gt;
::} else if (queryCounts.replace(q, oldVal, oldVal + 1)) {&lt;br /&gt;
:::return; // compare-and-set succeeded&lt;br /&gt;
::}&lt;br /&gt;
:} while (true); // another thread raced us; retry&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that several inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is updated only if the map still holds the previously read value, and otherwise the loop retries.  This is much more efficient than locking the entire function, since contention is expected to be rare.&lt;br /&gt;
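&lt;br /&gt;
On JDK 8 and later (newer than this chapter's references), the same read-modify-retry cycle can be delegated to the map itself with the atomic merge() method of ConcurrentMap; a brief sketch, where the surrounding class and method names are illustrative but merge() and getOrDefault() are real ConcurrentMap API:&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    // merge() performs the get/compute/compare-and-set cycle atomically:
    // it maps q to 1 if absent, otherwise applies Integer::sum to the
    // existing value and 1.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```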
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
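The adjacency-list representation described above can be sketched in a few lines of Java (the class is a minimal illustration of ours, not from any of the referenced sources): each node keeps a linked list of pointers to its reachable neighbors.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class AdjListGraph {
    // Each node maps to a linked list of the nodes it can reach directly.
    private final Map<String, LinkedList<String>> adj = new HashMap<>();

    void addEdge(String u, String v) { // undirected network link
        adj.computeIfAbsent(u, k -> new LinkedList<>()).add(v);
        adj.computeIfAbsent(v, k -> new LinkedList<>()).add(u);
    }

    List<String> neighbors(String u) {
        return adj.getOrDefault(u, new LinkedList<>());
    }

    public static void main(String[] args) {
        AdjListGraph g = new AdjListGraph();
        g.addEdge("A", "B");
        g.addEdge("B", "C");
        System.out.println(g.neighbors("B")); // [A, C]
    }
}
```

A path-finding routine would walk these per-node lists, which is exactly where the traversal algorithms below come in.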
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of nodes, or vertices, together with a finite set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like hash tables, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or to lock a shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], each with a set of friend links that must be analyzed and examined, and this list keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes continues to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  The function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor.  Such a set of vertices is called a regional set of vertices.  The goal is to find the shortest path connecting each region.&lt;br /&gt;
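The level-synchronous idea above can be sketched as a runnable Java program (class and method names are our own; Java threads stand in for the separate processors, and ConcurrentHashMap.putIfAbsent plays the role of the fetch_and_add visited check): all vertices of the current frontier are expanded in parallel, and the workers synchronize once per level, mirroring the send/receive step of the pseudocode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class LevelSyncBFS {
    // Compute BFS distances from src; each frontier vertex is expanded by
    // its own worker thread, with one join (synchronization) per level.
    static Map<Integer, Integer> distances(Map<Integer, List<Integer>> adj, int src)
            throws InterruptedException {
        ConcurrentMap<Integer, Integer> dist = new ConcurrentHashMap<>();
        dist.put(src, 0);
        List<Integer> frontier = Collections.singletonList(src);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int level = d + 1;
            Queue<Integer> next = new ConcurrentLinkedQueue<>();
            List<Thread> workers = new ArrayList<>();
            for (int u : frontier) {                     // "for all vertices u at level d in parallel"
                Thread t = new Thread(() -> {
                    for (int v : adj.getOrDefault(u, Collections.emptyList()))
                        if (dist.putIfAbsent(v, level) == null) // first visit wins
                            next.add(v);
                });
                workers.add(t);
                t.start();
            }
            for (Thread t : workers) t.join();           // per-level synchronization
            frontier = new ArrayList<>(next);
            d = level;
        }
        return dist;
    }
}
```

A real implementation would reuse a thread pool rather than spawn a thread per vertex, but the per-level barrier structure is the same.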
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we are able to show how the basic principles taught in the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83860</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83860"/>
		<updated>2014-02-27T03:50:23Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node in order to discover the other nodes it points to. Hence, it often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables, and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
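The four steps above can be simulated sequentially in a short Java sketch (the class, array layout, and row count are illustrative assumptions of ours): at stride s, every row that already holds a copy of row 0 forwards it to the row s positions below, so all rows are filled after ceil(log2(n)) doubling steps, after which each row can reduce its private copy in parallel.

```java
public class CopyScan {
    // Fill all nRows rows with a copy of row0 in log2(nRows) doubling steps.
    static int[][] copyScan(int[] row0, int nRows) {
        int[][] grid = new int[nRows][];
        grid[0] = row0.clone();
        for (int stride = 1; stride < nRows; stride *= 2) {
            // every already-filled row r sends to row r + stride ("in parallel")
            for (int r = 0; r < stride && r + stride < nRows; r++) {
                grid[r + stride] = grid[r].clone();
            }
        }
        return grid;
    }

    // The per-processor reduction step (here: a sum).
    static int sum(int[] row) {
        int s = 0;
        for (int v : row) s += v;
        return s;
    }

    public static void main(String[] args) {
        int[][] grid = copyScan(new int[] {1, 2, 3, 4}, 5);
        System.out.println(sum(grid[4])); // every row now sums to 10
    }
}
```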
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  The step is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and the values of all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
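The pointer-doubling partial-sum steps above can be simulated sequentially as follows (the class, the array-based node encoding, and the end sentinel are illustrative choices of ours; the direction of accumulation follows the pointers, so each node ends with the sum of itself and every node its pointer chain reaches).

```java
public class PointerJumping {
    // val[i] is node i's value; next[i] is the index of its successor, with
    // `end` marking the end of the list. In a real parallel version, the inner
    // loop body would run on one processor per node.
    static int[] partialSums(int[] val, int[] next, int end) {
        int n = val.length;
        int[] sum = val.clone();
        int[] nxt = next.clone();
        boolean done = false;
        while (!done) {                        // log2(n) pointer-doubling rounds
            done = true;
            int[] newSum = sum.clone();        // double-buffer to mimic lockstep
            int[] newNxt = nxt.clone();
            for (int i = 0; i < n; i++) {      // "for all i in parallel"
                if (nxt[i] != end) {
                    newSum[i] = sum[i] + sum[nxt[i]]; // add the pointed-to node's sum
                    newNxt[i] = nxt[nxt[i]];          // jump the pointer two steps
                    done = false;
                }
            }
            sum = newSum;
            nxt = newNxt;
        }
        return sum;
    }

    public static void main(String[] args) {
        // list 0 -> 1 -> 2 -> 3 with values 1, 2, 3, 4 (4 = end sentinel)
        int[] sums = partialSums(new int[] {1, 2, 3, 4}, new int[] {1, 2, 3, 4}, 4);
        System.out.println(java.util.Arrays.toString(sums)); // [10, 9, 7, 4]
    }
}
```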
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl&amp;#8217;s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it came from a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
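The three steps above can be sketched as a minimal two-worker Java program (class and method names are our own, and Java threads stand in for separate processors): the coordinating thread visits the one shared root node, hands each independent subtree to its own worker, and combines the results in order, so no locks are needed and the only synchronization is the final join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPreorder {
    static class Node {
        final int data;
        final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // The serial pre-order algorithm, as in the listing above.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    static List<Integer> parallelPreorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Step 2: each subtree runs the serial algorithm on its own worker.
            Future<List<Integer>> leftF = pool.submit(() -> {
                List<Integer> l = new ArrayList<>();
                preorder(root.left, l);
                return l;
            });
            Future<List<Integer>> rightF = pool.submit(() -> {
                List<Integer> r = new ArrayList<>();
                preorder(root.right, r);
                return r;
            });
            // Steps 1 and 3: visit the shared root, then splice in the
            // workers' results; get() is the single synchronization point.
            List<Integer> out = new ArrayList<>();
            out.add(root.data);
            out.addAll(leftF.get());
            out.addAll(rightF.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```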
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms for multiple linked-data-structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves pushing a vast amount of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
When two keys do map to the same position, there is a collision that must be resolved in some fashion to obtain the correct value.  One approach relevant to linked-list structures is the chained hash table, in which each bucket holds a linked list of all values that have been placed in that bucket.  The developer must locate not only the proper bucket for the data being searched for, but also the correct entry in the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
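The chained hash table described above can be sketched as follows. This is a minimal, serial illustration with hypothetical names (ChainedHashTable, bucketFor), not production code:

```java
import java.util.LinkedList;

// Minimal chained hash table: each bucket holds a linked list of entries
// whose keys hash to the same index.
public class ChainedHashTable {
    private static final int BUCKETS = 16;

    private static class Entry {
        final String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    @SuppressWarnings("unchecked")
    private final LinkedList<Entry>[] table = new LinkedList[BUCKETS];

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS); // the hashing function
    }

    public void put(String key, int value) {
        int b = bucketFor(key);
        if (table[b] == null) table[b] = new LinkedList<>();
        for (Entry e : table[b]) {
            if (e.key.equals(key)) { e.value = value; return; } // update in chain
        }
        table[b].add(new Entry(key, value)); // collision: append to the chain
    }

    public Integer get(String key) {
        LinkedList<Entry> chain = table[bucketFor(key)];
        if (chain == null) return null;
        for (Entry e : chain) {              // walk the chained linked list
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```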
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock and a separate lightweight lock (a flag) for each bucket. The high-level lock is used only to set the bucket-level flags and is released right afterwards. This provides fine-grained mutual exclusion (concurrent operations at the bucket level) while needing only one real lock in the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items whose keys hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair consists of a hash signature computed from the key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table itself.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operations on the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
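A minimal sketch of this two-level locking scheme in Java follows. The class and its structure are our own illustration of the idea (a table lock held only long enough to acquire a per-bucket lock), not the patented implementation itself:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative two-level locked table: the table lock serializes only the
// acquisition of a bucket lock; bucket operations then run concurrently.
public class TwoLevelLockedTable {
    private static final int BUCKETS = 8;
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    @SuppressWarnings("unchecked")
    private final Map<String, Integer>[] buckets = new HashMap[BUCKETS];

    public TwoLevelLockedTable() {
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    private ReentrantLock lockBucket(String key) {
        int b = bucketFor(key);
        tableLock.lock();              // table lock: serializes bucket-lock acquisition
        try {
            bucketLocks[b].lock();     // acquire the bucket lock...
        } finally {
            tableLock.unlock();        // ...then release the table lock immediately
        }
        return bucketLocks[b];
    }

    public void put(String key, int value) {
        ReentrantLock l = lockBucket(key);
        try { buckets[bucketFor(key)].put(key, value); }
        finally { l.unlock(); }
    }

    public Integer get(String key) {
        ReentrantLock l = lockBucket(key);
        try { return buckets[bucketFor(key)].get(key); }
        finally { l.unlock(); }
    }
}
```

While one thread operates on a bucket, other threads can acquire the table lock and proceed to any of the other buckets.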
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: when shared between threads, the entire map must be synchronized prior to each access.  This causes heavy contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals involve no locking and will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // retry until the atomic update succeeds; putIfAbsent handles a key not yet present&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, notice that we must add some code to handle the fact that several inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the key is still mapped to the previously observed value; otherwise the loop retries.  This is much more efficient than locking the entire function, since conflicting updates are expected to be rare.&lt;br /&gt;
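As a side note, since Java 8 the ConcurrentHashMap class also offers an atomic merge() method that performs this read-compute-replace cycle in a single call, collapsing the retry loop into one line. A minimal sketch (the class name MergeCounter is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts = new ConcurrentHashMap<>();

    // merge() performs the get/compute/replace cycle atomically per key,
    // replacing the hand-written compare-and-set retry loop (Java 8+).
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```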
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
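A minimal sketch of such an adjacency-list graph in Java follows. The class name AdjacencyGraph and its methods are hypothetical, chosen only to illustrate each node holding a linked list of the nodes reachable from it:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Each node keeps a linked list of pointers to the nodes reachable from it.
public class AdjacencyGraph {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    public void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new LinkedList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new LinkedList<>()); // make sure 'to' exists
    }

    public List<String> neighbors(String node) {
        return adjacency.getOrDefault(node, new LinkedList<>());
    }
}
```

A path-finding algorithm would then walk these per-node lists, following neighbors of neighbors until the destination node is reached.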
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
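A rough sketch of this data-parallel, level-by-level expansion in Java is shown below. Here parallelStream() stands in for distributing each frontier vertex to a separate processor, and a concurrent set replaces per-node locks; all names are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous BFS: every vertex on the current frontier is expanded
// in parallel. A concurrent set records visited vertices; its atomic add()
// guarantees each vertex joins the next frontier exactly once.
public class ParallelBFS {
    public static int distance(Map<String, List<String>> graph, String src, String dst) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(src);
        Set<String> frontier = Set.of(src);
        int depth = 0;
        while (!frontier.isEmpty()) {
            if (frontier.contains(dst)) return depth;
            // expand the whole frontier in parallel
            frontier = frontier.parallelStream()
                    .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)   // add() is atomic: true only for the first visitor
                    .collect(Collectors.toSet());
            depth++;
        }
        return -1; // destination unreachable
    }
}
```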
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
:::dv = D[v];&lt;br /&gt;
:::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
::::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
::::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
:::::D[v] = d+1;&lt;br /&gt;
:::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
::::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
::::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
:::if(dv == d + 1)&lt;br /&gt;
::::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
::::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83859</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83859"/>
		<updated>2014-02-27T03:49:44Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
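The copy-scan steps above can be sketched as follows. This is an illustrative single-machine simulation of the log(n) copying pattern, where each pass doubles the number of rows holding the data; in a real machine each copy in a pass would run on its own processor:

```java
// Copy-scan sketch: row 1 is propagated to all p rows in ceil(log2(p)) passes.
// In pass s, every row that already holds the data copies it 2^(s-1) rows ahead,
// doubling the number of filled rows each pass.
public class CopyScan {
    public static int[][] broadcastRow(int[] row, int p) {
        int[][] grid = new int[p][];
        grid[0] = row.clone();
        int filled = 1;                          // rows holding the data so far
        while (filled < p) {                     // runs ceil(log2(p)) times
            for (int i = 0; i < filled && filled + i < p; i++) {
                grid[filled + i] = grid[i].clone(); // each copy is independent
            }
            filled = Math.min(filled * 2, p);
        }
        return grid;
    }
}
```

Once every row holds a copy, a parallel reduction can then be performed to produce the desired sum.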
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to perform partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all the nodes it points toward.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
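The pointer-doubling partial-sum operation illustrated above can be sketched in Java as follows. We simulate the linked list with index arrays (next[i] is the successor of node i, -1 at the tail), and the inner loop stands in for all processors acting at once; the names are our own:

```java
// Pointer doubling over an array representation of a linked list. Each round,
// every node adds its successor's running sum into its own total and then
// points two hops ahead, so after log2(n) rounds each node holds the sum of
// itself and every node it transitively points to.
public class PointerDoubling {
    public static int[] suffixSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newSum = sum.clone();   // double-buffered: one synchronous step
            int[] newNxt = nxt.clone();
            for (int i = 0; i < n; i++) { // conceptually, all nodes in parallel
                if (nxt[i] != -1) {
                    newSum[i] = sum[i] + sum[nxt[i]];
                    newNxt[i] = nxt[nxt[i]];  // pointer jumping: skip one node ahead
                    active = true;
                }
            }
            sum = newSum;
            nxt = newNxt;
        }
        return sum;
    }
}
```

For the list 1 -&gt; 2 -&gt; 3 -&gt; 4, two rounds suffice and the result is 10, 9, 7, 4.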
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This creates an opportunity for parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold a subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common.  This is because one node is capable of having two branches.  If the size of the tree was increased and the number of processors was also increased, the number of shared nodes would also increase to support the increased number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been produced by a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
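The three steps above can be sketched in Java using the fork/join framework.  This is a sketch only: the Node class and its field names are assumptions (mirroring the Ada Node_Access record above), and worker threads stand in for separate processors.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder extends RecursiveTask<List<Integer>> {
    // Hypothetical minimal binary-tree node; names are assumptions.
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
    }

    private final Node node;
    public ParallelPreorder(Node node) { this.node = node; }

    // Each task pre-order-traverses one independent subtree; disjoint
    // subtrees run on different workers, and join() is the single
    // synchronization point where a subtree's result comes back.
    @Override
    protected List<Integer> compute() {
        List<Integer> result = new ArrayList<>();
        if (node == null) return result;
        result.add(node.data);                                  // visit root first (pre-order)
        ParallelPreorder leftTask = new ParallelPreorder(node.left);
        leftTask.fork();                                        // left subtree on another worker
        List<Integer> rightResult = new ParallelPreorder(node.right).compute();
        result.addAll(leftTask.join());                         // left result precedes right
        result.addAll(rightResult);
        return result;
    }

    public static void main(String[] args) {
        // Same example tree as the Ada listing above.
        Node tree = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(new ForkJoinPool().invoke(new ParallelPreorder(tree)));
        // prints: [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

In practice one would stop forking below a cutoff depth, so that only the small shared top of the tree is split across workers, as described above.&lt;br /&gt;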
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
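These level-synchronized steps can be sketched in Java with parallel streams.  This is a sketch under the assumption of shared-memory threads rather than message-passing processors, and the Node class is hypothetical:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LevelOrderParallel {
    // Hypothetical minimal tree node; names are assumptions.
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // One generation (level) at a time: the whole frontier is expanded in
    // parallel, and the terminal collect() is the per-level synchronization
    // point where the partial results are combined in the correct order.
    static List<Integer> levelOrder(Node root) {
        List<Integer> visited = new ArrayList<>();
        List<Node> level = List.of(root);
        while (!level.isEmpty()) {
            for (Node n : level) visited.add(n.data);   // emit this generation in order
            level = level.parallelStream()              // expand next generation in parallel
                         .flatMap(n -> Stream.of(n.left, n.right))
                         .filter(Objects::nonNull)
                         .collect(Collectors.toList()); // barrier: wait for all workers
        }
        return visited;
    }
}
```

The collect() call preserves encounter order even on a parallel stream, which corresponds to combining the one-generation results from the different processors in the correct order.&lt;br /&gt;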
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing, as per the LDS chapter in the Solihin book.  Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as it involves passing a vast amount of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using their key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
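As an illustration, a minimal hashing function that maps a key to a bucket index might look like the following (illustrative only; a real table would use a stronger spread function):&lt;br /&gt;

```java
public class SimpleHash {
    // Maps a string key to a bucket index in a table of the given size.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int bucketFor(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }
}
```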
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
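A minimal sketch of such a two-level scheme follows (assumed details for illustration; this is not Larson et al.'s actual implementation).  The single global lock is held only long enough to claim or release a per-bucket flag, so operations on different buckets proceed concurrently:&lt;br /&gt;

```java
import java.util.LinkedList;

public class TwoLevelLockTable {
    static class Entry {
        final String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;      // chained buckets
    private final boolean[] busy;                   // lightweight per-bucket flags
    private final Object tableLock = new Object();  // the one real (global) lock

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int size) {
        buckets = new LinkedList[size];
        busy = new boolean[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketOf(String key) { return Math.floorMod(key.hashCode(), buckets.length); }

    // Hold the global lock only long enough to set the bucket flag,
    // then release it so the other buckets remain accessible.
    private int acquireBucket(String key) {
        int b = bucketOf(key);
        synchronized (tableLock) {
            while (busy[b]) {
                try { tableLock.wait(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
            busy[b] = true;
        }
        return b;
    }

    private void releaseBucket(int b) {
        synchronized (tableLock) { busy[b] = false; tableLock.notifyAll(); }
    }

    public void put(String key, int value) {
        int b = acquireBucket(key);
        try {
            for (Entry e : buckets[b]) if (e.key.equals(key)) { e.value = value; return; }
            buckets[b].add(new Entry(key, value));
        } finally { releaseBucket(b); }
    }

    public Integer get(String key) {
        int b = acquireBucket(key);
        try {
            for (Entry e : buckets[b]) if (e.key.equals(key)) return e.value;
            return null;
        } finally { releaseBucket(b); }
    }
}
```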
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals require no locking and will typically run in parallel with updates and deletes, while updates lock only a small portion of the map.  A retrieval reflects the results of the most recently completed update operations, even though it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace() does not accept a null expected value, so the first&lt;br /&gt;
::// insertion goes through putIfAbsent() instead&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is replaced only if the key is still mapped to the expected old value.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
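For reference, on Java 8 and later the same increment can be expressed as a single atomic call on ConcurrentHashMap, which performs the retry internally.  This is a sketch of the same counter, not code from the original example:&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts = new ConcurrentHashMap<>(1000);

    // merge() atomically maps q to 1 if absent, otherwise combines the old
    // value with 1 using Integer::sum; no explicit retry loop is needed.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```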
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in a normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
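The node-with-a-linked-list-of-pointers idea described above can be sketched as follows (hypothetical class and field names, for illustration only):&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;

// Minimal adjacency-list node: each graph node keeps a linked list of
// pointers (references) to every node directly reachable from it.
public class GraphNode {
    final String name;
    final List<GraphNode> neighbors = new LinkedList<>();

    public GraphNode(String name) { this.name = name; }

    // Add a directed edge from this node to other.
    public void connect(GraphNode other) { neighbors.add(other); }
}
```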
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices.  From a given vertex, one would typically want to order the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]] and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outwards until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-search step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches in a parallel fashion onto 3 different processors.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In network routing applications, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
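In shared memory, the same level-parallel expansion can be sketched with a concurrent visited set in place of message passing (a sketch only; the adjacency-map representation and method names are assumptions):&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelGraphBFS {
    // Frontier-at-a-time BFS: every vertex in the current frontier is
    // expanded in parallel, and the ConcurrentHashMap-backed set makes
    // marking a vertex as visited safe without a global lock.
    static Set<String> reachable(Map<String, List<String>> adjacency, String source) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        List<String> frontier = List.of(source);
        while (!frontier.isEmpty()) {
            frontier = frontier.parallelStream()
                .flatMap(u -> adjacency.getOrDefault(u, List.of()).stream())
                .filter(visited::add)           // add() is atomic: true only on first visit
                .collect(Collectors.toList());  // barrier: one synchronization per level
        }
        return visited;
    }
}
```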
&lt;br /&gt;
As shown, graphs are able to be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83858</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83858"/>
		<updated>2014-02-27T03:47:25Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: a node must be read before the nodes it points to can be discovered, which often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together these various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain together the entries that collide in a given bucket, trees link their nodes through left and right child pointers, and graphs keep each node's edges in linked lists used by algorithms such as shortest-path searches.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) On the next round, copy the row 1 array to row 3, row 2 to row 4, and so on.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
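The copy-scan steps above can be sketched in code. The following Java simulation is our own illustration (the class and method names are hypothetical, and a real implementation would run the copies on separate processors): the first row's data is broadcast to all rows in a doubling fashion, after which each row could perform its share of a reduction with no further communication.&lt;br /&gt;

```java
import java.util.Arrays;

public class CopyScan {
    // Simulates the copy-scan broadcast: in each round, every row that
    // already holds a copy of row 0's data copies it to a row that does
    // not, so the number of filled rows doubles each round and all
    // nRows rows are filled after ceil(log2(nRows)) rounds.
    static int[][] copyScan(int[] row, int nRows) {
        int[][] grid = new int[nRows][];
        grid[0] = row.clone();
        int filled = 1;
        while (filled < nRows) {
            // rows 0..filled-1 copy to rows filled..2*filled-1
            // (conceptually all at once, one processor per row)
            for (int i = 0; i < filled && filled + i < nRows; i++) {
                grid[filled + i] = grid[i].clone();
            }
            filled = Math.min(nRows, filled * 2);
        }
        return grid;
    }

    public static void main(String[] args) {
        int[][] grid = copyScan(new int[]{1, 2, 3}, 4);
        // Every row now holds its own copy; a per-row reduction
        // (e.g. a sum) could proceed independently on each processor.
        for (int[] r : grid) {
            System.out.println(Arrays.toString(r) + " sum=" + Arrays.stream(r).sum());
        }
    }
}
```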
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and those of all preceding nodes.  An example of this operation in action is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
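The partial-sum operation pictured above can be simulated sequentially. In this sketch (our own illustration; the method name is hypothetical), each node's pointer runs toward the head of the list, which yields the "sum of the node and all preceding nodes" result described above. Every round, each node adds the value of the node it points to and then doubles its pointer, so only O(log n) rounds are needed.&lt;br /&gt;

```java
import java.util.Arrays;

public class PointerDoubling {
    // val[i]  : value stored at node i
    // prev[i] : index of the node that node i points to (-1 = none).
    // Pointers run toward the head of the list, so each node ends up
    // holding the sum of its own value and all preceding values.
    static int[] partialSums(int[] val, int[] prev) {
        int[] v = val.clone();
        int[] p = prev.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nv = v.clone();
            int[] np = p.clone();
            for (int i = 0; i < v.length; i++) {   // conceptually, one
                if (p[i] != -1) {                  // processor per node
                    nv[i] = v[i] + v[p[i]];        // add pointed-to value
                    np[i] = p[p[i]];               // pointer doubling
                    active = true;
                }
            }
            v = nv;
            p = np;
        }
        return v;
    }

    public static void main(String[] args) {
        // List 1 -> 2 -> 3 -> 4, with prev[i] pointing at node i-1.
        int[] sums = partialSums(new int[]{1, 2, 3, 4}, new int[]{-1, 0, 1, 2});
        System.out.println(Arrays.toString(sums)); // prints [1, 3, 6, 10]
    }
}
```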
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (written in Ada) for the serial tree traversal algorithms,&lt;br /&gt;
with behavior as the figure below shows:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common – no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been produced serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
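The three steps above can be sketched with Java threads standing in for processors. This is our own minimal illustration (the Node class and method names are hypothetical), not the algorithm from the referenced source: the shared tip of the tree is handled serially, each independent subtree is traversed by its own task, and the results are combined with a single synchronization point at the end.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTraversal {
    static class Node {
        int data;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // The shared parent (tip of the tree) is visited serially; each
    // independent subtree is handed to its own task, standing in for
    // the processor that owns it. Combining the partial results in
    // preorder makes the output match the serial algorithm exactly.
    static List<Integer> parallelPreorder(Node root, ExecutorService pool)
            throws Exception {
        List<Integer> out = new ArrayList<>();
        out.add(root.data);                                        // shared tip
        Future<List<Integer>> l = pool.submit(() -> serialPreorder(root.left));
        Future<List<Integer>> r = pool.submit(() -> serialPreorder(root.right));
        out.addAll(l.get());     // the only synchronization happens
        out.addAll(r.get());     // here, at the very end
        return out;
    }

    static List<Integer> serialPreorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n == null) return out;
        out.add(n.data);
        out.addAll(serialPreorder(n.left));
        out.addAll(serialPreorder(n.right));
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Same tree as the Ada example above.
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(parallelPreorder(root, pool)); // prints [1, 2, 4, 7, 5, 3, 6, 8, 9]
        pool.shutdown();
    }
}
```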
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission introduces a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing, as per the LDS chapter in the Solihin book. Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
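As a concrete illustration of the read-write-lock option just mentioned, the sketch below (our own; the class and method names are hypothetical) guards a simple linked list so that any number of traversals can proceed concurrently while insertions get exclusive access.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedList {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Node head;

    // Writers (insertions) take the exclusive write lock.
    public void insertFront(int value) {
        lock.writeLock().lock();
        try {
            Node n = new Node(value);
            n.next = head;
            head = n;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers (traversals) take the shared read lock, so many threads
    // may search the list at once as long as no writer is active.
    public boolean contains(int value) {
        lock.readLock().lock();
        try {
            for (Node n = head; n != null; n = n.next) {
                if (n.data == value) return true;
            }
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A global-lock variant would simply use one mutual-exclusion lock for both operations; the read-write lock trades a little overhead for concurrency among readers, which dominates in traversal-heavy workloads.&lt;br /&gt;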
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, where vast amounts of data must be pushed through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list holds all values that have been placed in a particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but must also traverse that bucket's chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, plus one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
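A minimal sketch of this two-level scheme follows. This is our own simplification (class and method names are hypothetical, and real designs such as Larson et al.'s are considerably more refined): the single table lock is held only long enough to claim a per-bucket flag, after which work on different buckets proceeds concurrently.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class TwoLevelHashTable {
    private final Object tableLock = new Object();  // the one real lock
    private final boolean[] bucketBusy;             // lightweight per-bucket flags
    private final List<List<Integer>> buckets;      // chained buckets

    public TwoLevelHashTable(int nBuckets) {
        bucketBusy = new boolean[nBuckets];
        buckets = new ArrayList<>();
        for (int i = 0; i < nBuckets; i++) buckets.add(new ArrayList<>());
    }

    // Table lock is held only while setting the bucket flag.
    private int acquireBucket(int key) throws InterruptedException {
        int b = Math.floorMod(key, bucketBusy.length);
        synchronized (tableLock) {
            while (bucketBusy[b]) tableLock.wait();
            bucketBusy[b] = true;
        }
        return b;
    }

    private void releaseBucket(int b) {
        synchronized (tableLock) {
            bucketBusy[b] = false;
            tableLock.notifyAll();
        }
    }

    public void insert(int key) throws InterruptedException {
        int b = acquireBucket(key);
        try {
            buckets.get(b).add(key);   // bucket-level work runs without the
        } finally {                    // table lock, so operations on other
            releaseBucket(b);          // buckets can proceed concurrently
        }
    }

    public boolean lookup(int key) throws InterruptedException {
        int b = acquireBucket(key);
        try {
            return buckets.get(b).contains(key);
        } finally {
            releaseBucket(b);
        }
    }
}
```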
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent alternative to the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even though it cannot see updates that are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
::// replace(q, null, v) would throw a NullPointerException, so the first insert uses putIfAbsent&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code in order to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the new value is stored only if the key is still mapped to the value we previously read, and otherwise the loop retries.  This is much more efficient than locking the entire function, since conflicting updates are expected to be rare.&lt;br /&gt;
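On Java 8 and later, the whole retry loop can also be collapsed into a single atomic update: ConcurrentHashMap's merge() performs the read-modify-write internally. A brief sketch (our own; the class name is hypothetical, and the map mirrors the queryCounts example above):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
            new ConcurrentHashMap<>(1000);

    // merge() atomically inserts 1 if the key is absent, or combines the
    // existing count with 1 using Integer::sum otherwise, so no explicit
    // retry loop or null check is needed.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int getCount(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```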
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in a normal fashion.  (The iterator is weakly consistent: it reflects the state of the map when it was created, and may or may not show later concurrent modifications.)&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to the nodes it is directly connected to.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or the shortest-path list itself, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes continues to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
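One way to sketch this frontier-splitting idea in Java (our own illustrative code, not taken from a reference) is a level-synchronous BFS that expands each frontier with a parallel stream and records visits in a concurrent set:&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Each frontier (all nodes at the current distance) is expanded in parallel,
// one node per worker, then the new frontier is formed for the next level.
public class ParallelBFS {
    public static Set<String> reachable(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            // expand every node of the current level in parallel
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(visited::add)  // add() is atomic: true only on first visit
                .collect(Collectors.toList());
        }
        return visited;
    }
}
```

The atomic add() on the concurrent set plays the role of the fetch_and_add marking in the pseudocode below: a node enters the next frontier exactly once, with no explicit synchronized region.&lt;br /&gt;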
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:for all vertices u at level d in parallel do&lt;br /&gt;
::for all adjacencies v of u in parallel do&lt;br /&gt;
::dv = D[v];&lt;br /&gt;
::if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
:::vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
:::if(vis == 0) // v is added to a stack only once&lt;br /&gt;
::::D[v] = d+1;&lt;br /&gt;
::::pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
::if(dv == d + 1)&lt;br /&gt;
:::fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
:::fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Because of that need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS which is manipulated to produce singly-linked lists.  In this way we have shown how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83857</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83857"/>
		<updated>2014-02-27T03:45:48Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Breadth First Search - Serial Version */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, this often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hashes and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
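The steps above can be sketched as a sequential simulation (the class and method names are our own); each round doubles the number of rows holding a copy, giving the log(n) behavior described:&lt;br /&gt;

```java
// A sequential simulation of the copy-scan broadcast sketched above: rows
// are copied outward in roughly log2(p) rounds until every "processor"
// (array row) holds the data, after which a parallel reduction could run.
public class CopyScan {
    public static int[][] broadcastRow(int[] row, int p) {
        int[][] grid = new int[p][];
        grid[0] = row.clone();
        int have = 1;               // number of rows that already hold a copy
        while (have < p) {
            // every holder copies its row one block further: copies double
            int copy = Math.min(have, p - have);
            for (int i = 0; i < copy; i++) {   // conceptually: in parallel
                grid[have + i] = grid[i].clone();
            }
            have += copy;
        }
        return grid;
    }
}
```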
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of that node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
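A sequential simulation of the pointer-doubling partial-sum steps (illustrative code, with arrays standing in for the per-processor pointers) might look like this:&lt;br /&gt;

```java
// Pointer doubling for partial sums: in each round every node adds in the
// value of the node its pointer targets, then doubles the pointer, so all
// pointers reach the head in ceil(log2(n)) rounds. Here pred[i] points to
// node i's predecessor and -1 marks the head of the list.
public class PointerDoubling {
    public static int[] partialSums(int[] values) {
        int n = values.length;
        int[] sum = values.clone();
        int[] pred = new int[n];
        for (int i = 0; i < n; i++) pred[i] = i - 1;
        boolean anyPointer = true;
        while (anyPointer) {
            anyPointer = false;
            int[] s = sum.clone();   // simulate one synchronous parallel step
            int[] p = pred.clone();
            for (int i = 0; i < n; i++) {      // conceptually: all i at once
                if (pred[i] != -1) {
                    s[i] = sum[i] + sum[pred[i]]; // add value of pointed-to node
                    p[i] = pred[pred[i]];         // double the pointer
                    anyPointer |= p[i] != -1;
                }
            }
            sum = s;
            pred = p;
        }
        return sum;   // sum[i] == values[0] + ... + values[i]
    }
}
```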
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during the traversal process.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting the full sequence of nodes to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common.  This is because one node is capable of having two branches.  If the size of the tree was increased and the number of processors was also increased, the number of shared nodes would also increase to support the increased number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate C algorithm detailed above.&lt;br /&gt;
# The processor will return its result that can be used exactly as if it was a serial processor.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
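A sketch of this subtree-per-processor idea using Java's fork/join framework (the Node class and task names are our own; the original discussion assumes C algorithms on separate processors) computes a tree-wide result with a single synchronization at the end:&lt;br /&gt;

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Each subtree is traversed as an independent task, mirroring the
// independent-subtree argument above; results combine only at the join.
public class ParallelTreeSum {
    public static class Node {
        public final int data;
        public final Node left, right;
        public Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    static class SumTask extends RecursiveTask<Long> {
        final Node n;
        SumTask(Node n) { this.n = n; }
        protected Long compute() {
            if (n == null) return 0L;
            SumTask l = new SumTask(n.left);
            l.fork();                           // left subtree on another worker
            long right = new SumTask(n.right).compute();
            return n.data + right + l.join();   // single synchronization point
        }
    }

    public static long sum(Node root) {
        return ForkJoinPool.commonPool().invoke(new SumTask(root));
    }
}
```

Because each subtree is independent, no locks are needed; the only shared work is combining the per-subtree results, just as the step list above describes.&lt;br /&gt;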
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First C algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First C algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree is able to be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
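As an illustrative sketch (the class and method names are our own), the read-write lock option can be applied to the resulting linked list so that many traversals proceed concurrently while edits remain exclusive:&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-write locking on the linked list produced from the tree: many
// readers may traverse at once, while an edit takes the exclusive lock.
public class LockedList<T> {
    private final LinkedList<T> list = new LinkedList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void add(T value) {
        lock.writeLock().lock();    // exclusive: blocks readers and writers
        try { list.add(value); } finally { lock.writeLock().unlock(); }
    }

    public int size() {
        lock.readLock().lock();     // shared: concurrent readers allowed
        try { return list.size(); } finally { lock.readLock().unlock(); }
    }
}
```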
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves pushing vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single hash-table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
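A minimal chained hash table along these lines (illustrative names, a fixed bucket count, and no resizing) might look like:&lt;br /&gt;

```java
import java.util.LinkedList;

// Each bucket holds a linked list of {key, value} entries whose keys hash
// to the same slot; lookups walk the chain after hashing to a bucket.
@SuppressWarnings("unchecked")
public class ChainedTable<K, V> {
    private static final int BUCKETS = 16;
    private final LinkedList<Object[]>[] buckets = new LinkedList[BUCKETS];

    private int slot(K key) { return Math.floorMod(key.hashCode(), BUCKETS); }

    public void put(K key, V value) {
        int i = slot(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Object[] e : buckets[i]) {
            if (e[0].equals(key)) { e[1] = value; return; }  // update in chain
        }
        buckets[i].add(new Object[]{key, value});            // append to chain
    }

    public V get(K key) {
        int i = slot(key);
        if (buckets[i] != null) {
            for (Object[] e : buckets[i]) {
                if (e[0].equals(key)) return (V) e[1];       // walk the chain
            }
        }
        return null;
    }
}
```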
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used just for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete, and lookup) while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items whose keys hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operations on the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. It is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any of the other buckets.&lt;br /&gt;
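A minimal sketch of this two-level scheme in Java follows. The class and field names are our own invention, not taken from the patent; the point is only the lock hand-off: the table lock is held just long enough to acquire one bucket lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Two-level locking sketch: one global table lock plus one lock per bucket.
// The table lock serializes only the acquisition of a bucket lock, so
// threads working on different buckets proceed concurrently afterwards.
class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final int[] bucketData; // stand-in for per-bucket contents

    TwoLevelLockedTable(int buckets) {
        bucketLocks = new ReentrantLock[buckets];
        bucketData = new int[buckets];
        for (int i = 0; i < buckets; i++) bucketLocks[i] = new ReentrantLock();
    }

    void increment(int key) {
        int b = Math.floorMod(key, bucketLocks.length);
        tableLock.lock();            // held just long enough...
        try {
            bucketLocks[b].lock();   // ...to acquire the bucket lock
        } finally {
            tableLock.unlock();      // other threads may now enter the table
        }
        try {
            bucketData[b]++;         // bucket-level critical section
        } finally {
            bucketLocks[b].unlock();
        }
    }

    int get(int key) {
        int b = Math.floorMod(key, bucketLocks.length);
        bucketLocks[b].lock();
        try { return bucketData[b]; } finally { bucketLocks[b].unlock(); }
    }
}
```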
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem because processors need to run very quickly, and waiting for locks would cripple application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness for concurrent use, however: the entire map must be synchronized externally prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need to utilize the synchronizedMap function available in the Collections class.  Also, as you may notice, once the iterator code begins we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to ensure that any update/insert/delete sequences have completed properly before subsequent operations begin, so that the data stays synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock and will typically run in parallel with updates and deletes; a retrieval reflects the most recently completed update operations, even if it cannot see values from updates still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() rejects a null expected value, so the first insert must use putIfAbsent()&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some new code to handle the fact that several inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the current mapped value still equals the expected previous value.  This is much more efficient than locking the entire function, as we rarely expect the comparison to fail.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
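A small self-contained sketch of such a traversal (our own illustration, using a hypothetical sumValues helper) relies on the fact that ConcurrentHashMap iterators are weakly consistent and never throw ConcurrentModificationException:

```java
import java.util.Map;

// Sum all values without any synchronized block: ConcurrentHashMap's
// iterators are weakly consistent, so no external locking is needed even
// if other threads update the map mid-traversal, and reads take no lock.
class ConcurrentTraversal {
    static int sumValues(Map<String, Integer> m) {
        int sum = 0;
        for (String key : m.keySet()) { // iterates without holding a lock
            sum += m.get(key);          // lock-free read
        }
        return sum;
    }
}
```

Contrast this with the synchronizedMap version earlier, where the same loop had to sit inside a block synchronized on the whole map.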
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
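The node-with-edge-list idea above can be sketched in Java as follows (a minimal illustration of ours, with illustrative names, not a production graph library):

```java
import java.util.LinkedList;
import java.util.List;

// Minimal graph node: each node keeps a linked list of pointers to the
// other nodes reachable from it, exactly as described in the text.
class GraphNode {
    final String label;
    final List<GraphNode> neighbors = new LinkedList<>(); // this node's edge list

    GraphNode(String label) { this.label = label; }

    // add an undirected edge between this node and another
    void connect(GraphNode other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }
}
```

Finding a path from one node to another then amounts to repeatedly following these neighbor lists, which is what the traversal algorithms below do.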
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
A graph[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consists of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those vertices called edges or arcs.  From a given vertex, one would typically want to order the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt to Augsburg and Stuttgart, Germany.  The search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes continues to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
public void search(Graph g)&lt;br /&gt;
{&lt;br /&gt;
:g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
:g.mark(false);          // unmark the whole graph&lt;br /&gt;
:refresh(null);          // and redraw it&lt;br /&gt;
:Vertex r = g.root();    // the root is painted grey&lt;br /&gt;
:g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
:java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
:queue.addElement(r);    // and put in a queue&lt;br /&gt;
&lt;br /&gt;
:while(!queue.isEmpty())&lt;br /&gt;
:{&lt;br /&gt;
::Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
::queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
::g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
::int dp = g.degreePlus(u);&lt;br /&gt;
::for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
::{&lt;br /&gt;
:::Vertex v = g.ithSucc(i, u);&lt;br /&gt;
:::if(Color.white == g.color(v))&lt;br /&gt;
:::{		    &lt;br /&gt;
::::queue.addElement(v);		    &lt;br /&gt;
::::g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
:::}&lt;br /&gt;
::}&lt;br /&gt;
::g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
::g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches out onto three different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
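A level-synchronous version of this idea can be sketched with Java parallel streams. This is our own illustration, not the paper's implementation: every vertex in the current frontier is expanded on its own worker, the workers claim unvisited vertices atomically, and the threads synchronize once per level to form the next frontier.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous parallel BFS: expand the whole frontier in parallel,
// using putIfAbsent as an atomic "visited" check so each vertex is
// claimed by exactly one worker, then sync to build the next frontier.
class ParallelBFS {
    static Map<String, Integer> distances(Map<String, List<String>> adj, String root) {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(root, 0);
        Set<String> frontier = Set.of(root);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            frontier = frontier.parallelStream()                       // one task per vertex
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream()) // its adjacencies
                .filter(v -> dist.putIfAbsent(v, next) == null)        // claim unvisited atomically
                .collect(Collectors.toSet());                          // per-level sync point
            d = next;
        }
        return dist;
    }
}
```

The adjacency map passed in can use any vertex names; the German city names from the figure are just one example.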
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we have shown how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83856</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83856"/>
		<updated>2014-02-27T03:34:57Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* Parallel Code Solution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked data structures (LDS) include several types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. This often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
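As a rough illustration of the steps above (our own sequential simulation, not code from the referenced lecture), each pass of the loop below stands in for one parallel copy step, and the number of rows holding the data doubles per pass:

```java
// Copy-scan sketch: row 0 of the grid is propagated to all other rows in
// about log2(n) doubling steps, modeling one processor per row. After
// pass k, rows 0 .. 2^k - 1 hold the copy, so coverage doubles each pass.
class CopyScan {
    static int[][] broadcastRow0(int[][] grid) {
        int n = grid.length;
        int copied = 1;                          // rows that already hold row 0
        while (copied < n) {
            int step = Math.min(copied, n - copied);
            for (int r = 0; r < step; r++) {     // these copies are independent,
                grid[copied + r] = grid[r].clone(); // so they could run in parallel
            }
            copied += step;
        }
        return grid;
    }
}
```

Once every row holds a copy, a parallel reduction (for a sum, etc.) can proceed exactly as step 4 describes.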
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use of pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to, and repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and those of all following nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
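The operation pictured above can be simulated sequentially. The sketch below is our own model, representing each node as an array slot with next[i] holding the index it points to (-1 for the end of the list); the inner loop is conceptually one parallel step across all nodes.

```java
// Pointer-doubling partial sums: each round, every node adds in the value
// of the node it points to and then doubles its pointer (jumps two steps
// ahead). After about log2(n) rounds, val[i] holds the sum of node i and
// all nodes that follow it in the list.
class PointerDoubling {
    static int[] partialSums(int[] val, int[] next) {
        int[] v = val.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nv = v.clone();
            int[] nn = nxt.clone();
            for (int i = 0; i < v.length; i++) {  // conceptually parallel step
                if (nxt[i] != -1) {
                    nv[i] = v[i] + v[nxt[i]];     // add the pointed-to value
                    nn[i] = nxt[nxt[i]];          // pointer doubling
                    active = true;
                }
            }
            v = nv;
            nxt = nn;
        }
        return v;
    }
}
```

For the four-node list 1 -> 2 -> 3 -> 4, two rounds suffice, and the head ends up holding the total sum of the list.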
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations can run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals must still visit every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because one node is capable of having two branches.  If the size of the tree were increased and the number of processors increased as well, the number of shared nodes would also increase to support the larger number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
as 2^n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate C algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
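The three steps above can be sketched as follows (in Java rather than the Ada of the serial examples; the Node class and the use of a thread pool are illustrative assumptions, not part of the original example). The parent node is visited serially, each independent subtree is handed to its own worker, and the partial results are spliced back together in serial order:&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelTreeTraversal {
    // Minimal binary tree node (hypothetical; the serial example uses Ada's Node_Access)
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
    }

    // Serial in-order traversal of one subtree
    static List<Integer> inorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n == null) return out;
        out.addAll(inorder(n.left));
        out.add(n.data);
        out.addAll(inorder(n.right));
        return out;
    }

    // Steps 1-3: traverse the shared parent node serially, delegate each
    // independent subtree to its own task, then splice results back in order.
    static List<Integer> parallelInorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<Integer>> leftPart  = pool.submit(() -> inorder(root.left));
        Future<List<Integer>> rightPart = pool.submit(() -> inorder(root.right));
        List<Integer> out = new ArrayList<>(leftPart.get()); // left subtree result
        out.add(root.data);                                  // the shared parent node
        out.addAll(rightPart.get());                         // right subtree result
        pool.shutdown();
        return out;
    }
}
```

Because the two subtrees are independent, the only synchronization is the final join when their results are combined.&lt;br /&gt;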
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor&lt;br /&gt;
data transmission introduces a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2^n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First C algorithm detailed above, but wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First C algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
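The generation-by-generation scheme above can be sketched in Java (an illustrative Node class; parallel streams stand in for the separate processors, and the implicit join at the end of each stream serves as the once-per-generation synchronization point):&lt;br /&gt;

```java
import java.util.*;
import java.util.stream.*;

public class ParallelLevelOrder {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        List<Node> level = new ArrayList<>();
        if (root != null) level.add(root);
        while (!level.isEmpty()) {
            // Visit the whole generation in parallel; collect() preserves the
            // left-to-right encounter order, so results combine correctly.
            out.addAll(level.parallelStream().map(n -> n.data).collect(Collectors.toList()));
            // Build the next generation, again in parallel and in order.
            level = level.parallelStream()
                         .flatMap(n -> Stream.of(n.left, n.right))
                         .filter(Objects::nonNull)
                         .collect(Collectors.toList());
        }
        return out;
    }
}
```

On the example tree constructed earlier, this produces the same 1 2 3 4 5 6 7 8 9 ordering as the serial level-order traversal.&lt;br /&gt;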
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most one node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they allow vast amounts of data to be located through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
When two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is a chained hash table, in which a linked list is created of all values that have been placed in that particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
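A minimal chained hash table might look like this (illustrative Java; the Entry class and method names are invented for this sketch). Each bucket holds the head of a linked list of all entries whose keys hashed to that index:&lt;br /&gt;

```java
public class ChainedHashTable {
    // Each bucket holds a linked list of entries that hashed to the same index.
    static class Entry {
        final String key; int value; Entry next;
        Entry(String k, int v, Entry n) { key = k; value = v; next = n; }
    }

    private final Entry[] buckets;

    ChainedHashTable(int capacity) { buckets = new Entry[capacity]; }

    // The hashing function determines the bucket for a given key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int b = bucketFor(key);
        for (Entry e = buckets[b]; e != null; e = e.next)
            if (e.key.equals(key)) { e.value = value; return; }   // update in place
        buckets[b] = new Entry(key, value, buckets[b]);           // prepend to the chain
    }

    Integer get(String key) {
        // First find the bucket, then walk the chained linked list for a match.
        for (Entry e = buckets[bucketFor(key)]; e != null; e = e.next)
            if (e.key.equals(key)) return e.value;
        return null;
    }
}
```

With a good hash function the chains stay short, so lookups remain close to constant time even when collisions occur.&lt;br /&gt;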
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
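The two-level locking scheme just described can be sketched as follows (a simplified illustration, not the patented implementation; the class and method names are invented, and plain HashMaps stand in for the signature-pointer bucket nodes). The table lock is held only long enough to acquire a bucket lock, after which other threads are free to work on any other bucket:&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable<K, V> {
    private final Object tableLock = new Object();   // the single high-level table lock
    private final ReentrantLock[] bucketLocks;       // one lightweight lock per bucket
    private final HashMap<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new HashMap[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // The table lock is held just long enough to pick up the bucket lock;
    // once it is released, other threads may access any other bucket.
    private ReentrantLock lockBucket(int i) {
        synchronized (tableLock) {
            bucketLocks[i].lock();
        }
        return bucketLocks[i];
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        ReentrantLock lock = lockBucket(i);
        try { buckets[i].put(key, value); } finally { lock.unlock(); }
    }

    public V get(K key) {
        int i = indexFor(key);
        ReentrantLock lock = lockBucket(i);
        try { return buckets[i].get(key); } finally { lock.unlock(); }
    }
}
```

Only threads targeting the same bucket contend with each other for any significant time; the table lock itself is a very short critical section.&lt;br /&gt;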
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a poor solution to this problem, as processors need to run very quickly; having to wait for locks would cripple application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: the entire map must be synchronized prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked for updates, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is ensuring that any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays appropriately synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe alternative to the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock and typically run in parallel with updates and deletes; a retrieval reflects the most recently completed update operations, even if it cannot see values that are still being updated.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts = new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private void incrementCount(String q) {&lt;br /&gt;
:Integer oldVal, newVal;&lt;br /&gt;
:do {&lt;br /&gt;
::oldVal = queryCounts.get(q);&lt;br /&gt;
::newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
:} while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null : !queryCounts.replace(q, oldVal, newVal)); // replace() rejects a null expected value, so first insertion uses putIfAbsent()&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping is still equal to the previously read value.  This is much more efficient than locking the entire function, as we rarely expect a concurrent modification.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
:Map m = new ConcurrentHashMap();&lt;br /&gt;
:Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
:Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMaps require no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked data structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
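A minimal sketch of this arrangement (illustrative Java; the class, the city names, and the methods are invented for this example) keeps each vertex's edges in a linked list and builds a shortest path with a queue-driven BFS, reconstructing the path as its own linked list:&lt;br /&gt;

```java
import java.util.*;

public class AdjacencyGraph {
    // Each vertex keeps its outgoing edges in a linked list, as described above.
    private final Map<String, LinkedList<String>> adj = new HashMap<>();

    void addEdge(String from, String to) {
        adj.computeIfAbsent(from, k -> new LinkedList<>()).add(to);
        adj.computeIfAbsent(to, k -> new LinkedList<>());   // ensure the vertex exists
    }

    // Shortest path (fewest edges) via BFS: a queue drives the search and a
    // predecessor map lets us rebuild the path as a linked list afterwards.
    List<String> shortestPath(String src, String dst) {
        Map<String, String> pred = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(src);
        pred.put(src, src);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            if (u.equals(dst)) break;
            for (String v : adj.getOrDefault(u, new LinkedList<>()))
                if (!pred.containsKey(v)) { pred.put(v, u); queue.add(v); }
        }
        if (!pred.containsKey(dst)) return List.of();       // unreachable
        LinkedList<String> path = new LinkedList<>();
        for (String v = dst; !v.equals(src); v = pred.get(v)) path.addFirst(v);
        path.addFirst(src);
        return path;
    }
}
```

Synchronizing the edge lists, the path list, and the queue all at once is exactly the challenge described above when this structure is shared between processors.&lt;br /&gt;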
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  To do so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel run on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
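A simplified data-parallel version of this idea can be sketched in Java (illustrative only: parallel streams stand in for explicit processors, and the class and method names are invented). Every vertex of the current frontier expands its adjacency list in parallel, and a concurrent visited set replaces per-vertex locks:&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class DataParallelBFS {
    // Frontier-by-frontier BFS: all vertices of the current frontier expand
    // in parallel; a concurrent set records visits, so no per-vertex lock is
    // needed, and the join at the end of each parallel stream is the only
    // synchronization per level.
    static List<String> bfsOrder(Map<String, List<String>> adj, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List<String> order = new ArrayList<>();
        List<String> frontier = new ArrayList<>(List.of(root));
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(visited::add)   // Set.add is atomic: the first claimer wins
                .collect(Collectors.toList());
        }
        return order;
    }
}
```

The atomic add() on the concurrent set plays the role of the fetch_and_add "visited" check in the locking pseudocode below, without a lock per vertex.&lt;br /&gt;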
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations will eventually synchronize the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83855</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83855"/>
		<updated>2014-02-27T03:32:29Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* HashMap Code with Locking */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, this often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hashes and graphs, and how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance, at some level, on an internal pointer-based linked list.  For example, hash tables use linked lists to chain colliding entries within a bucket, trees link their nodes through left and right child pointers, and graphs commonly keep each vertex's edges in a linked list for use by shortest-path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
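The copy-scan steps above can be simulated sequentially in Java; the sketch below is our own illustration (the grid contents are made up), showing how each pass doubles the number of rows holding a copy of row 1, giving the log(n) behavior, after which a reduction can run on every row independently.&lt;br /&gt;

```java
import java.util.Arrays;

public class CopyScan {
    // Sequential simulation of copy-scan: row 0 is broadcast to every other
    // row in a doubling fashion (1 copy, then 2, then 4, ...), taking only
    // ceil(log2(rows)) passes. On a real machine each row would live on its
    // own processor, and every copy within a pass would happen concurrently.
    static int[][] copyScan(int[] row0, int rows) {
        int[][] grid = new int[rows][];
        grid[0] = row0.clone();
        int have = 1;                          // rows that already hold a copy
        while (have < rows) {
            int copies = Math.min(have, rows - have);
            for (int i = 0; i < copies; i++)   // these copies would run in parallel
                grid[have + i] = grid[i].clone();
            have += copies;
        }
        return grid;
    }

    // Reduction step: once every row has the data, each "processor" can sum
    // its own copy independently; here we just sum one row.
    static int reduceSum(int[] row) {
        return Arrays.stream(row).sum();
    }

    public static void main(String[] args) {
        int[][] grid = copyScan(new int[]{3, 1, 4, 1}, 4);
        for (int[] row : grid)
            System.out.println(Arrays.toString(row) + "  sum=" + reduceSum(row));
    }
}
```

Note that the doubling loop does the copies in batches of 1, 2, 4, ..., which is exactly the log(n) schedule described in the steps above.&lt;br /&gt;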
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to, and repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of that node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
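A sequential Java simulation of this pointer-doubling partial-sum scheme is sketched below (our own encoding, with the list stored in arrays rather than spread across processors): each node adds in its predecessor's value and then jumps its pointer back two steps, so after ceil(log2(n)) rounds every node holds the sum of itself and all preceding nodes.&lt;br /&gt;

```java
import java.util.Arrays;

public class PointerDoubling {
    // value[i] is node i's value; pred[i] is the index of node i's
    // predecessor in the list (-1 for the head). Each round is computed
    // synchronously (into fresh arrays) to mimic every processor acting
    // at once on the previous round's state.
    static int[] partialSums(int[] value, int[] pred) {
        int[] val = value.clone(), p = pred.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nextVal = val.clone(), nextP = p.clone();
            for (int i = 0; i < val.length; i++) {
                if (p[i] != -1) {
                    nextVal[i] = val[i] + val[p[i]]; // add the pointed-to value
                    nextP[i] = p[p[i]];              // pointer jumps 2 steps
                    active = true;
                }
            }
            val = nextVal;
            p = nextP;
        }
        return val;
    }

    public static void main(String[] args) {
        // List 10 -> 20 -> 30 -> 40 (node i's predecessor is node i-1)
        int[] sums = partialSums(new int[]{10, 20, 30, 40}, new int[]{-1, 0, 1, 2});
        System.out.println(Arrays.toString(sums)); // [10, 30, 60, 100]
    }
}
```

With 4 nodes, only 2 rounds are needed: after round one each node holds the sum of itself and one predecessor, and after round two the sum of itself and all predecessors.&lt;br /&gt;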
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert operations can complete in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate their output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is Ada code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms, whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_Traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is a perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common, because each node can have two branches.  If both the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had come from a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
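The steps above can be sketched in Java using the fork/join framework as a stand-in for explicit processors (the Node class below is our own, mirroring the Ada record in the serial example): the shared tip of the tree is walked while the two independent subtrees at each level are forked, and below a chosen cutoff depth each worker runs the plain serial preorder, with results spliced in exactly where a serial traversal would place them.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    // A binary tree node, mirroring the Ada record in the serial example.
    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Walks the shared "tip" of the tree, forking the two independent
    // subtrees until the cutoff depth, below which serial preorder runs.
    static class Preorder extends RecursiveTask<List<Integer>> {
        final Node n; final int depth;
        Preorder(Node n, int depth) { this.n = n; this.depth = depth; }

        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            out.add(n.data);                       // visit root first (preorder)
            if (depth == 0) {                      // one processor's own subtree
                serial(n.left, out);
                serial(n.right, out);
            } else {                               // subtrees are independent
                Preorder left = new Preorder(n.left, depth - 1);
                Preorder right = new Preorder(n.right, depth - 1);
                left.fork();                       // run left subtree elsewhere
                List<Integer> r = right.compute();
                out.addAll(left.join());           // splice in serial order
                out.addAll(r);
            }
            return out;
        }

        static void serial(Node n, List<Integer> out) {
            if (n == null) return;
            out.add(n.data);
            serial(n.left, out);
            serial(n.right, out);
        }
    }

    public static void main(String[] args) {
        // Same tree as the Ada example: preorder is 1 2 4 7 5 3 6 8 9
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(ForkJoinPool.commonPool().invoke(new Preorder(root, 2)));
    }
}
```

Because the subtrees are independent, the only synchronization is the join when results are combined, matching the single end-of-traversal synchronization described above.&lt;br /&gt;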
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above, but have it wait after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
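These per-generation steps can be sketched in Java as a level-synchronous loop (sequential here, with our own small Node type; the per-level expansion is what would be farmed out to the processors, and collecting each generation into one list is the synchronization step):&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class LevelOrder {
    // A small tree node type of our own, for illustration.
    record Node(int data, Node left, Node right) {}

    // Each iteration of the while loop is one "generation": in the parallel
    // scheme, the children of each frontier node would be produced on that
    // node's own processor, and building `next` in order is the per-level
    // combine/synchronize step from the list above.
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        List<Node> level = new ArrayList<>();
        if (root != null) level.add(root);
        while (!level.isEmpty()) {
            List<Node> next = new ArrayList<>();
            for (Node n : level) {               // would run in parallel
                out.add(n.data());
                if (n.left() != null) next.add(n.left());
                if (n.right() != null) next.add(n.right());
            }
            level = next;                        // synchronize, then repeat
        }
        return out;
    }

    public static void main(String[] args) {
        // Same tree as the Ada example: level order is 1 2 3 4 5 6 7 8 9
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(levelOrder(root));
    }
}
```

The loop synchronizes once per generation, matching the n synchronizations for an n-generation tree noted above.&lt;br /&gt;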
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while being edited, as per the LDS chapter in the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as it involves passing a vast amount of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like accessing an array with a known index.  With a proper hashing function in place, it should be fairly rare for any 2 keys to hash to the same bucket.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is the chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared-memory multiprocessor (SMP) systems supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure.&lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
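A minimal Java sketch of this two-level locking scheme is shown below. It is our own simplified rendering, not the patented implementation: it omits the signature-pointer optimization, uses plain monitor locks for both levels, and fixes the bucket count.&lt;br /&gt;

```java
import java.util.AbstractMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TwoLevelLockTable<K, V> {
    private static final int BUCKETS = 16;
    private final Object tableLock = new Object();      // the single table lock
    private final Object[] bucketLocks = new Object[BUCKETS];
    private final List<Map.Entry<K, V>>[] buckets;      // chained buckets

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable() {
        buckets = new List[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks[i] = new Object();
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    // The table lock is held just long enough to pick up the bucket lock,
    // so threads working on different buckets proceed concurrently.
    private Object lockFor(K key) {
        synchronized (tableLock) {
            return bucketLocks[indexFor(key)];
        }
    }

    public void put(K key, V value) {
        synchronized (lockFor(key)) {                   // bucket-level exclusion
            List<Map.Entry<K, V>> chain = buckets[indexFor(key)];
            for (Map.Entry<K, V> e : chain)
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            chain.add(new AbstractMap.SimpleEntry<>(key, value));
        }
    }

    public V get(K key) {
        synchronized (lockFor(key)) {
            for (Map.Entry<K, V> e : buckets[indexFor(key)])
                if (e.getKey().equals(key)) return e.getValue();
            return null;
        }
    }
}
```

The point of the design is visible in lockFor(): the global lock serializes only the hand-off to a bucket lock, so two threads operating on different buckets are never blocked against each other.&lt;br /&gt;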
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Having to wait for locks would destroy the application's processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  For concurrent use, this class has a fundamental weakness: the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&amp;lt;br&amp;gt;&lt;br /&gt;
private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
:Integer cnt = queryCounts.get(q);&lt;br /&gt;
:if (cnt == null) {&lt;br /&gt;
::queryCounts.put(q, 1);&lt;br /&gt;
:} else {&lt;br /&gt;
::queryCounts.put(q, cnt + 1);&lt;br /&gt;
:}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Map m = Collections.synchronizedMap(new HashMap());&amp;lt;br&amp;gt;&lt;br /&gt;
Set s = m.keySet(); // set of keys in hashmap&amp;lt;br&amp;gt;&lt;br /&gt;
synchronized(m) { // synchronizing on map&lt;br /&gt;
:Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
:while (i.hasNext())&lt;br /&gt;
::foo(i.next());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock and will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() rejects a null expected value, so the first increment&lt;br /&gt;
      // of a key must go through putIfAbsent() instead&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the key is still mapped to the value we previously read; otherwise the loop retries.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.  The iterator is weakly consistent: it will never throw a ConcurrentModificationException, but it may or may not reflect updates made while the iteration is in progress.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the traversal begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of processing Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
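&lt;br /&gt;
As a sketch of this data-parallel idea (our own illustration, not code from the cited references), the following Java method expands each BFS level with a parallel stream, so every vertex on the current frontier can be handled by a different worker; the city names and adjacency list are illustrative stand-ins for the Germany example.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class LevelSyncBFS {
    // Returns the set of vertices discovered at each distance from the source.
    static List<Set<String>> levels(Map<String, List<String>> adj, String source) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        List<Set<String>> levels = new ArrayList<>();
        Set<String> frontier = Set.of(source);
        while (!frontier.isEmpty()) {
            levels.add(frontier);
            // Every vertex of the current level is expanded in parallel; the
            // concurrent set makes "first discovery wins" safe without a
            // per-vertex lock, mirroring the split-across-processors idea.
            frontier = frontier.parallelStream()
                    .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)   // add() is true only on first visit
                    .collect(Collectors.toSet());
        }
        return levels;
    }
}
```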
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
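&lt;br /&gt;
In Java, the fetch_and_add primitive used in the LOCK lines above corresponds to AtomicInteger.getAndAdd, a hardware-backed atomic read-modify-write; the helper name below is our own, added only to mirror the pseudocode.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.atomic.AtomicInteger;

public class FetchAndAdd {
    // Mirrors "vis = fetch_and_add(&Visited[v], 1)": atomically adds delta
    // and returns the previous value. Only the caller that observes 0 treats
    // the vertex as newly discovered, exactly as in the pseudocode's check.
    static int fetchAndAdd(AtomicInteger counter, int delta) {
        return counter.getAndAdd(delta);
    }
}
```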
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and the need to access them quickly.  Because of this need for speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS that is manipulated to create singly linked lists.  In this way we are able to show how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83854</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83854"/>
		<updated>2014-02-27T03:26:26Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is distinct, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction to generate a sum of all the data, all while working in parallel.  Consider the following example:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
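&lt;br /&gt;
The steps above can be simulated on a single machine as follows (an illustration of the schedule, not processor-level code): each pass forwards every already-filled row by the current offset, so coverage doubles and the broadcast finishes in ceil(log2(n)) passes.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.Arrays;

public class CopyScan {
    // grid[r] is the array held by "processor row" r; only row 0 starts with
    // the data. Each pass, every row that already holds a copy forwards it to
    // the row `have` positions further down, doubling coverage per pass.
    static int broadcast(int[][] grid) {
        int n = grid.length;
        int have = 1;   // rows [0, have) currently hold the data
        int steps = 0;
        while (have < n) {
            for (int r = 0; r < have && r + have < n; r++) {
                grid[r + have] = Arrays.copyOf(grid[r], grid[r].length);
            }
            have = Math.min(n, 2 * have);
            steps++;
        }
        return steps;   // number of passes, i.e. ceil(log2(n))
    }
}
```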
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and all preceding nodes.  The figures below show an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
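&lt;br /&gt;
The operation shown above can be sketched in Java as follows; this simulation assumes each node stores the index of its predecessor (so the sums accumulate over preceding nodes, as described), with an array snapshot standing in for the synchronous data-parallel step.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.Arrays;

public class PointerDoubling {
    // Computes, for each node, the sum of its value and all preceding nodes
    // by repeated pointer jumping. prev[i] is the index of node i's
    // predecessor, or -1 for the head of the list.
    static int[] partialSums(int[] val, int[] prev) {
        int n = val.length;
        int[] sum = Arrays.copyOf(val, n);
        int[] ptr = Arrays.copyOf(prev, n);
        boolean active = true;
        while (active) {
            active = false;
            // Snapshot so every simulated "processor" reads the same old
            // state, as in one synchronous data-parallel step.
            int[] oldSum = Arrays.copyOf(sum, n);
            int[] oldPtr = Arrays.copyOf(ptr, n);
            for (int i = 0; i < n; i++) {
                if (oldPtr[i] != -1) {
                    sum[i] = oldSum[i] + oldSum[oldPtr[i]];  // add pointed-to value
                    ptr[i] = oldPtr[oldPtr[i]];              // double the pointer
                    active = true;
                }
            }
        }
        return sum;   // all pointers reach the head in ceil(log2(n)) steps
    }
}
```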
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms&lt;br /&gt;
whose behavior is illustrated in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we will only need 3 nodes in common.  This is because one node is capable of having two branches.  If the size of the tree was increased and the number of processors was also increased, the number of shared nodes would also increase to support the increased number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
# The processor will return its result, which can be used exactly as if it had come from a serial traversal.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
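&lt;br /&gt;
These steps can be sketched in Java for a pre-order traversal, with the root visited on the shared part of the tree and each subtree handed to its own thread; this is a minimal two-thread illustration of the idea, not the referenced implementation.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPreorder {
    static final class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Plain serial pre-order visit, used unchanged on each subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    static List<Integer> collect(Node n) {
        List<Integer> out = new ArrayList<>();
        preorder(n, out);
        return out;
    }

    // Step 1: visit the shared parent. Step 2: each independent subtree is
    // traversed on its own thread. Step 3: one synchronization at the end
    // combines the results exactly as a serial traversal would order them.
    static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Integer> result = new ArrayList<>();
            result.add(root.data);
            Future<List<Integer>> left = pool.submit(() -> collect(root.left));
            Future<List<Integer>> right = pool.submit(() -> collect(root.right));
            result.addAll(left.get());
            result.addAll(right.get());
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```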
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
# Perform the traversal on the parent part of the tree.&lt;br /&gt;
# Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First algorithm detailed above, but pause after it finishes one generation.&lt;br /&gt;
# Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
# Allow each processor to execute the next generation of the Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
# Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
::parallel begin&lt;br /&gt;
:::Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent. The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
:::Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
::parallel end&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
:Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
:for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
::parbegin&lt;br /&gt;
:::Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
:::Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
:::/* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
:::hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
:::/* Phase 2. Build partial lists */&lt;br /&gt;
:::Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
::parend;&lt;br /&gt;
:Step 2. Link up partial lists.&lt;br /&gt;
::/* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
::P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
::/* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
::for each of the p blocks do&lt;br /&gt;
:::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for i := 1 to (n/p^2) do&lt;br /&gt;
::::begin&lt;br /&gt;
:::::Pi is given at most a node mi in each iteration.&lt;br /&gt;
:::::if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
:::::then&lt;br /&gt;
::::::if the global list for the level of node mi is empty&lt;br /&gt;
::::::then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
::::::else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
::::::end;&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
:::::if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
:::parend;&lt;br /&gt;
:Step 4. Obtain ranking&lt;br /&gt;
::LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
:Step 5. Output the result&lt;br /&gt;
::for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
:::parbegin&lt;br /&gt;
::::for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
:::parend;&lt;br /&gt;
&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
* pre-order traversal: select the first copy of each node;&lt;br /&gt;
* post-order traversal: select the last copy of each node;&lt;br /&gt;
* in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter in the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as it involves pushing vast amounts of data through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must take into account not only the proper bucket for the data being searched for, but also the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
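The two-level scheme above can be sketched in Java; everything here (the class name, bucket count, and the use of an AtomicBoolean flag per bucket) is our own illustration of the idea, not code from the Larson et al. patent.&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a two-level-lock hash table: one global table lock plus one
// lightweight flag per bucket. Names and sizes are illustrative only.
public class TwoLevelHashTable {
    private static final int BUCKETS = 16;
    private final Object tableLock = new Object();          // global table lock
    private final AtomicBoolean[] bucketBusy = new AtomicBoolean[BUCKETS];
    private final LinkedList<long[]>[] buckets;             // chained buckets of {key, value}

    @SuppressWarnings("unchecked")
    public TwoLevelHashTable() {
        buckets = new LinkedList[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            buckets[i] = new LinkedList<>();
            bucketBusy[i] = new AtomicBoolean(false);
        }
    }

    private int bucketOf(long key) {
        return Long.hashCode(key) & (BUCKETS - 1);
    }

    // Claim the bucket flag under the table lock, then release the table
    // lock immediately so operations on other buckets can proceed.
    private void lockBucket(int b) {
        while (true) {
            synchronized (tableLock) {
                if (bucketBusy[b].compareAndSet(false, true)) return;
            }
            Thread.yield(); // bucket busy: retry
        }
    }

    private void unlockBucket(int b) {
        bucketBusy[b].set(false);
    }

    public void put(long key, long value) {
        int b = bucketOf(key);
        lockBucket(b);
        try {
            for (long[] e : buckets[b]) {
                if (e[0] == key) { e[1] = value; return; }
            }
            buckets[b].add(new long[]{key, value});
        } finally {
            unlockBucket(b);
        }
    }

    public Long get(long key) {
        int b = bucketOf(key);
        lockBucket(b);
        try {
            for (long[] e : buckets[b]) {
                if (e[0] == key) return e[1];
            }
            return null;
        } finally {
            unlockBucket(b);
        }
    }
}
```

The table lock is held only for the instant needed to claim a bucket flag, so threads operating on different buckets proceed concurrently.&lt;br /&gt;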
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would have to wait whenever a lock was held on the bucket it needed in the cache hashmap.  Unfortunately, traditional locking is a poor fit here: processors need very fast access, and time spent waiting for locks would dominate processing time.  A non-locking solution is therefore critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized externally prior to each concurrent access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked for updates, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need the synchronizedMap utility method from the Collections class.  Notice also that once the iteration begins we must synchronize on the entire map in order to walk through the results.  But what if several processors wish to iterate at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is ensuring that any update/insert/delete sequence has completed properly before subsequent operations observe the data.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of low-lock and lock-free hash table designs have been developed to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  This structure offers full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals require no locking and typically run in parallel with updates and deletes; a retrieval reflects the results of the most recently completed update operations, even though it may not see updates that are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    boolean done;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() rejects a null expected value, so the first insertion uses putIfAbsent()&lt;br /&gt;
      done = (oldVal == null)&lt;br /&gt;
           ? (queryCounts.putIfAbsent(q, newVal) == null)&lt;br /&gt;
           : queryCounts.replace(q, oldVal, newVal);&lt;br /&gt;
    } while (!done);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial version presented in the previous section, and it avoids most of the locking imposed by synchronized methods or blocks.  With ConcurrentHashMap, however, we must write the code to tolerate the fact that several inserts/updates may be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is updated only if the key is still mapped to the expected old value, and otherwise the loop retries.  (Note that replace() does not accept a null expected value, so the very first insertion of a key has to go through putIfAbsent() instead.)  This is much more efficient than locking the entire function, since conflicting updates to the same key are rare.&lt;br /&gt;
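As a quick sanity check of this idiom, the retry loop can be exercised from several threads at once; the driver class below is our own illustration (not from the referenced tutorial) and uses putIfAbsent() for the first insertion, since replace() does not accept a null expected value.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative driver: hammer one key from several threads using the
// lock-free putIfAbsent/replace retry loop and check no increments are lost.
public class ConcurrentCounterDemo {
    static final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    static void increment(String key) {
        boolean done;
        do {
            Integer oldVal = counts.get(key);
            Integer newVal = (oldVal == null) ? 1 : (oldVal + 1);
            done = (oldVal == null)
                 ? (counts.putIfAbsent(key, newVal) == null)
                 : counts.replace(key, oldVal, newVal);
        } while (!done); // retry if another thread won the race
    }

    // Runs `threads` threads, each incrementing the same key `perThread` times,
    // and returns the final count.
    public static int run(int threads, int perThread) {
        counts.clear();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) increment("q");
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts.get("q");
    }
}
```

If any increment were lost to a race, the final count would fall short of threads × perThread.&lt;br /&gt;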
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another pointer-based structure; it focuses on the relationships between data items and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes, and those nodes in turn link to still other nodes in the network.  Using these connections, it is possible to find a path from one specific node to another in the chain.  This can be implemented by having each node maintain a linked list of pointers to all directly reachable nodes, i.e., an adjacency list.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
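A minimal adjacency-list sketch in Java makes this concrete; the class and method names here are ours, purely for illustration.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Minimal adjacency-list graph: each node keeps a linked list of the
// neighbors it can reach directly. Names are illustrative only.
public class AdjacencyListGraph {
    private final List<LinkedList<Integer>> adj = new ArrayList<>();

    // Adds a node and returns its index.
    public int addNode() {
        adj.add(new LinkedList<>());
        return adj.size() - 1;
    }

    // Records a directed edge from -> to in from's linked list.
    public void addEdge(int from, int to) {
        adj.get(from).add(to);
    }

    public List<Integer> neighbors(int node) {
        return adj.get(node);
    }
}
```

Traversal algorithms such as BFS then work purely by walking these per-node linked lists.&lt;br /&gt;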
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  Given one vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, find the fastest route from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
The need for parallel processing becomes critical when you consider, for example, that social networking has become a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  The function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all of its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  Using the above example, instead of visiting Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could fan the 3 searches out onto 3 different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following figure also shows how each regional set of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
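The level-synchronous approach described above can be sketched in Java using parallel streams; all names here are our own illustration, with a per-vertex atomic flag standing in for the fetch_and_add visited check in the pseudocode.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Level-synchronous parallel BFS sketch: every vertex of the current
// frontier is expanded in parallel, and an atomic flag per vertex ensures
// each vertex joins the next frontier exactly once.
public class ParallelBFS {
    // adj[u] lists the neighbors of vertex u; returns distance from source
    // to every vertex (-1 if unreachable).
    public static int[] distances(int[][] adj, int source) {
        int n = adj.length;
        AtomicBoolean[] visited = new AtomicBoolean[n];
        for (int i = 0; i < n; i++) visited[i] = new AtomicBoolean(false);
        int[] dist = new int[n];
        Arrays.fill(dist, -1);

        visited[source].set(true);
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            int level = d + 1;
            ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
            frontier.parallelStream().forEach(u -> {
                for (int v : adj[u]) {
                    // compareAndSet plays the role of fetch_and_add(&Visited[v], 1):
                    // only the first thread to reach v claims it
                    if (visited[v].compareAndSet(false, true)) {
                        dist[v] = level;
                        next.add(v);
                    }
                }
            });
            frontier = new ArrayList<>(next); // barrier between levels
            d = level;
        }
        return dist;
    }
}
```

The only synchronization is the implicit barrier at the end of each level plus one compare-and-set per vertex, mirroring the pseudocode's fetch_and_add points.&lt;br /&gt;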
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need for fast access.  Because of this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS itself that is decomposed into singly-linked lists.  In this way we have shown how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83853</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83853"/>
		<updated>2014-02-27T03:16:22Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include several different types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs].  Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to.  Hence, traversal often introduces loop-carried dependence.  Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list.  In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures.  This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
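The steps above can be simulated in Java; the sketch below is single-threaded and models each processor as a row of a 2-D array, so it illustrates only the data movement of copy-scan, not a real parallel implementation.&lt;br /&gt;

```java
// Single-threaded illustration of copy-scan: row 0 is copied to the other
// rows in log2(n) doubling passes, after which each "processor" (row) holds
// its own copy and can take part in a reduction.
public class CopyScan {
    public static int[][] copyScan(int[] row0, int rows) {
        int[][] grid = new int[rows][row0.length];
        grid[0] = row0.clone();
        int have = 1;                        // rows that already hold the data
        while (have < rows) {
            int step = Math.min(have, rows - have);
            for (int i = 0; i < step; i++)   // every holder copies to one new row
                grid[have + i] = grid[i].clone();
            have += step;                    // holders double each pass
        }
        return grid;
    }

    // Reduction after the scan: sum one column across all rows.
    public static int sumColumn(int[][] grid, int col) {
        int s = 0;
        for (int[] row : grid) s += row[col];
        return s;
    }
}
```

Each pass doubles the number of rows holding the data, so 4 rows are filled in 2 passes, 8 rows in 3, and so on.&lt;br /&gt;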
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  The step is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
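A single-threaded Java simulation of pointer doubling is sketched below; the array-based representation is our own, and in this illustration the pointers run toward the tail of the list, so each node ends up holding the sum of itself and the nodes that follow it (mirror the pointer direction to accumulate preceding nodes instead).&lt;br /&gt;

```java
// Single-threaded simulation of pointer doubling (pointer jumping) for
// partial sums: next[i] is the node that i points to (-1 past the end).
// Each round, every live node adds its successor's value and then jumps
// its pointer two nodes ahead, so all pointers reach the end of an
// n-node list in about log2(n) rounds.
public class PointerDoubling {
    public static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            // snapshot arrays model all "processors" updating simultaneously
            int[] newSum = sum.clone();
            int[] newNxt = nxt.clone();
            for (int i = 0; i < n; i++) {
                if (nxt[i] != -1) {                  // done once the pointer hits the end
                    newSum[i] = sum[i] + sum[nxt[i]]; // add successor's value
                    newNxt[i] = nxt[nxt[i]];          // jump two nodes ahead
                    active = true;
                }
            }
            sum = newSum;
            nxt = newNxt;
        }
        return sum;
    }
}
```

On a list of four ones, two rounds suffice and the sums count down from the head to the tail.&lt;br /&gt;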
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for several serial tree traversal algorithms,&lt;br /&gt;
whose behavior the figure below illustrates:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Tree_Traversal is&lt;br /&gt;
:type Node;&lt;br /&gt;
:type Node_Access is access Node;&lt;br /&gt;
:type Node is record&lt;br /&gt;
::Left : Node_Access := null;&lt;br /&gt;
::Right : Node_Access := null;&lt;br /&gt;
::Data : Integer;&lt;br /&gt;
:end record;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
::procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Destroy_Tree(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then &lt;br /&gt;
:::Destroy_Tree(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Free(N);&lt;br /&gt;
:end Destroy_Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
::Temp : Node_Access := new Node;&lt;br /&gt;
:begin&lt;br /&gt;
::Temp.Data := Value;&lt;br /&gt;
::Temp.Left := Left;&lt;br /&gt;
::Temp.Right := Right;&lt;br /&gt;
::return Temp;&lt;br /&gt;
:end Tree;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Preorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Preorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Preorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Preorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Inorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Inorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Inorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
:end Inorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Postorder(N : Node_Access) is&lt;br /&gt;
:begin&lt;br /&gt;
::if N.Left /= null then&lt;br /&gt;
:::Postorder(N.Left);&lt;br /&gt;
::end if;&lt;br /&gt;
::if N.Right /= null then&lt;br /&gt;
:::Postorder(N.Right);&lt;br /&gt;
::end if;&lt;br /&gt;
::Put(Integer'Image(N.Data));&lt;br /&gt;
:end Postorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
Procedure Levelorder(N : Node_Access) is&lt;br /&gt;
::package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
::use Queues;&lt;br /&gt;
::Node_Queue : List;&lt;br /&gt;
::Next : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
::Node_Queue.Append(N);&lt;br /&gt;
::while not Is_Empty(Node_Queue) loop&lt;br /&gt;
:::Next := First_Element(Node_Queue);&lt;br /&gt;
:::Delete_First(Node_Queue);&lt;br /&gt;
:::Put(Integer'Image(Next.Data));&lt;br /&gt;
:::if Next.Left /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Left);&lt;br /&gt;
:::end if;&lt;br /&gt;
:::if Next.Right /= null then&lt;br /&gt;
::::Node_Queue.Append(Next.Right);&lt;br /&gt;
:::end if;&lt;br /&gt;
::end loop;&lt;br /&gt;
:end Levelorder;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
N : Node_Access;&lt;br /&gt;
:begin&lt;br /&gt;
:N := Tree(1, &lt;br /&gt;
::Tree(2,&lt;br /&gt;
:::Tree(4,&lt;br /&gt;
::::Tree(7, null, null),&lt;br /&gt;
::::null),&lt;br /&gt;
:::Tree(5, null, null)),&lt;br /&gt;
::Tree(3,&lt;br /&gt;
:::Tree(6,&lt;br /&gt;
::::Tree(8, null, null),&lt;br /&gt;
::::Tree(9, null, null)),&lt;br /&gt;
:::null));&lt;br /&gt;
 &lt;br /&gt;
:Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
:Preorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
:Inorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
:Postorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
:Levelorder(N);&lt;br /&gt;
:New_Line;&lt;br /&gt;
:Destroy_Tree(N);&lt;br /&gt;
:end Tree_traversal;&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold a subtree on each processor.  The only duplicated data that must then be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors need to hold only n - 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below: since we are using 4 processors, we need only 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the additional subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors need to synchronize only once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it had been produced serially.&lt;br /&gt;
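The three steps above can be sketched in Java for an in-order traversal split at the root; the class and method names are our own illustration, with one worker per subtree and a single synchronization at the end.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the scheme above: the root is the shared "tip", each subtree is
// handed to its own worker, and the per-subtree in-order results are
// combined exactly as if produced serially.
public class ParallelSubtreeTraversal {
    public static class Node {
        public int data;
        public Node left, right;
        public Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.data);
        inorder(n.right, out);
    }

    // In-order traversal with the root's two subtrees processed in parallel.
    public static List<Integer> parallelInorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> leftF = pool.submit(() -> {
                List<Integer> l = new ArrayList<>();
                inorder(root.left, l);
                return l;
            });
            Future<List<Integer>> rightF = pool.submit(() -> {
                List<Integer> r = new ArrayList<>();
                inorder(root.right, r);
                return r;
            });
            List<Integer> result = new ArrayList<>(leftF.get()); // sync once, at the end
            result.add(root.data);
            result.addAll(rightF.get());
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the subtrees are independent, the workers need no locks; the only coordination is waiting for both futures before stitching the results together.&lt;br /&gt;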
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-&lt;br /&gt;
processor data transmission introduces a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First C algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the Breadth-First C algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
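A sketch of this generation-by-generation scheme in Java (a simplified simulation with invented names; a real system would pin each frontier to its own processor): each worker holds the frontier of its own subtree, the coordinator combines the one-generation results in order, and every worker must finish a generation before any worker starts the next.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLevelOrder {
    static class Node {
        int value; Node left, right;
        Node(int v, Node l, Node r) { value = v; left = l; right = r; }
    }

    // One generation of Breadth-First work on a single worker's subtree.
    static List<Node> nextGeneration(List<Node> frontier) {
        List<Node> next = new ArrayList<>();
        for (Node n : frontier) {
            if (n.left != null) next.add(n.left);
            if (n.right != null) next.add(n.right);
        }
        return next;
    }

    static List<Integer> parallelLevelOrder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Integer> result = new ArrayList<>();
            result.add(root.value); // step 1: the parent part of the tree
            List<List<Node>> frontiers = new ArrayList<>(); // one frontier per worker
            if (root.left != null) frontiers.add(List.of(root.left));
            if (root.right != null) frontiers.add(List.of(root.right));
            boolean nodesRemain = true;
            while (nodesRemain) {
                nodesRemain = false;
                // Step 3: combine the one-generation results in the correct order.
                for (List<Node> f : frontiers)
                    for (Node n : f) result.add(n.value);
                // Step 4: let every worker expand its next generation, then wait.
                List<Future<List<Node>>> pending = new ArrayList<>();
                for (List<Node> f : frontiers)
                    pending.add(pool.submit(() -> nextGeneration(f)));
                List<List<Node>> next = new ArrayList<>();
                for (Future<List<Node>> fu : pending) {
                    List<Node> g = fu.get(); // implicit barrier between generations
                    if (!g.isEmpty()) nodesRemain = true;
                    next.add(g);
                }
                frontiers = next; // step 5: repeat until no nodes remain
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```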
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
         Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
         Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
         Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
               Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book.  Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves moving vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like accessing an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to hash to the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be resolved in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list is created holding all values that have been placed in that particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
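As a concrete illustration, here is a minimal chained hash table in Java (a sketch; the class and method names are invented for this example).  Each bucket is a linked list, so a lookup first selects the bucket in constant time and then walks that bucket's chain:&lt;br /&gt;

```java
import java.util.LinkedList;

public class ChainedHashTable {
    // One key/value pair in a bucket's chain.
    static class Entry {
        String key; int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int nBuckets) {
        buckets = new LinkedList[nBuckets];
        for (int i = 0; i < nBuckets; i++) buckets[i] = new LinkedList<>();
    }

    // The special hashing function that picks a bucket for each key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(String key, int value) {
        LinkedList<Entry> chain = buckets[bucketFor(key)];
        for (Entry e : chain)
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        chain.add(new Entry(key, value)); // collision or new key: extend the chain
    }

    // Constant-time bucket selection, then a walk along the chained linked list.
    public Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)])
            if (e.key.equals(key)) return e.value;
        return null; // key not present
    }
}
```

With very few buckets, many keys collide and the chains grow long, which is exactly why lookup degrades toward a plain linked-list search when the hashing function is poor.&lt;br /&gt;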
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
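The two-level scheme can be sketched in Java as follows (a simplified sketch with invented names, not the patented SMP implementation; it omits the signature-pointer optimization).  The table lock is held only long enough to acquire a bucket lock, after which other threads are free to work on other buckets:&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock(); // single table-level lock
    private final ReentrantLock[] bucketLocks; // one lightweight lock per bucket
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new Map[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(String key, int value) {
        int b = bucketFor(key);
        tableLock.lock();        // held just long enough to grab the bucket lock
        bucketLocks[b].lock();
        tableLock.unlock();      // other threads may now enter other buckets
        try {
            buckets[b].put(key, value);
        } finally {
            bucketLocks[b].unlock();
        }
    }

    public Integer get(String key) {
        int b = bucketFor(key);
        tableLock.lock();
        bucketLocks[b].lock();
        tableLock.unlock();
        try {
            return buckets[b].get(key);
        } finally {
            bucketLocks[b].unlock();
        }
    }
}
```

Two threads hashing to different buckets serialize only on the brief table-lock handoff, not on each other's bucket operations.&lt;br /&gt;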
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness in a parallel setting, though: the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and typically run in parallel with updates and deletes; updates lock only a small portion of the map at a time.  A retrieval reflects the most recently completed update operations, even if it cannot see values whose updates have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal;&lt;br /&gt;
    boolean done;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      if (oldVal == null) {&lt;br /&gt;
        // no mapping yet: replace() cannot succeed, so insert atomically&lt;br /&gt;
        done = (queryCounts.putIfAbsent(q, 1) == null);&lt;br /&gt;
      } else {&lt;br /&gt;
        done = queryCounts.replace(q, oldVal, oldVal + 1);&lt;br /&gt;
      }&lt;br /&gt;
    } while (!done);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that several inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it is still equal to the previously read value, and otherwise the loop retries.  This is much more efficient than locking the entire function, as we rarely expect a concurrent modification to intervene.&lt;br /&gt;
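On Java 8 or later, the same read-modify-write retry can be expressed in a single atomic call with ConcurrentHashMap's merge() method (the surrounding class and field names here are our own):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    // merge() performs the whole read-modify-write atomically for the key:
    // it inserts 1 if the key is absent, otherwise applies Integer::sum.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```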
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMaps require no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart in Germany.  To do so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Possibly some cleanup code would then be left at the end to visit any remaining untouched nodes.  In network routing applications, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
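The frontier-per-level idea can be sketched in Java (the city names follow the figure above; the edges chosen, and the class and method names, are illustrative assumptions).  Every vertex in the current frontier can be expanded on a separate processor, and a lock-free putIfAbsent() decides which thread visits a vertex first, so no explicit locks are needed:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelGraphBFS {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    // Undirected edge: each endpoint keeps the other in its adjacency list.
    public void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Returns the hop distance from the source to every reachable vertex.
    public Map<String, Integer> distances(String source) {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        Set<String> frontier = ConcurrentHashMap.newKeySet();
        frontier.add(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            final int next = level + 1;
            Set<String> nextFrontier = ConcurrentHashMap.newKeySet();
            // Every frontier vertex can be expanded by a different processor.
            frontier.parallelStream().forEach(u -> {
                for (String v : adjacency.getOrDefault(u, List.of()))
                    if (dist.putIfAbsent(v, next) == null) // first visitor wins
                        nextFrontier.add(v);
            });
            frontier = nextFrontier; // implicit barrier: one level at a time
            level = next;
        }
        return dist;
    }
}
```

Each while-loop iteration corresponds to one regional set of vertices: all vertices at the same distance from the source are expanded concurrently before the next region begins.&lt;br /&gt;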
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily from the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS which is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83852</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83852"/>
		<updated>2014-02-27T02:46:10Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables, and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
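A serial simulation of these steps in Java (method and variable names are invented for illustration).  In round k, every processor that already holds a copy of the row forwards it to the processor 2^k positions away, so all n processors are covered in about log2(n) rounds, after which each processor can take part in the reduction:&lt;br /&gt;

```java
public class CopyScan {
    // rows[i] is the row held by processor i; only processor 0 starts with data.
    // Each outer iteration is one parallel round: the set of processors holding
    // the row doubles, so nProcs processors need only ~log2(nProcs) rounds.
    static int[][] copyScan(int[] row, int nProcs) {
        int[][] rows = new int[nProcs][];
        rows[0] = row.clone();
        for (int stride = 1; stride < nProcs; stride *= 2)
            for (int i = 0; i < stride && i + stride < nProcs; i++)
                rows[i + stride] = rows[i].clone(); // processor i sends to i + stride
        return rows;
    }

    // Step 4: once every processor has its copy, each can run the desired
    // parallel operation; here, a simple sum over its own copy of the row.
    static int reduceSum(int[] row) {
        int sum = 0;
        for (int v : row) sum += v;
        return sum;
    }
}
```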
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding to each node's value the value stored in the node it points to, then doubling the pointer.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and the nodes it reaches through the list.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
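The partial-sums operation shown above can be sketched as follows.  This sketch (class and method names are ours) simulates the synchronous pointer-doubling rounds sequentially: every node adds its successor's value and then doubles its pointer, so after log(n) rounds node i holds the sum of itself and every node after it in the list (pointing the links the other way would give sums over preceding nodes instead).&lt;br /&gt;
&lt;br /&gt;
```java
public class PointerJumping {
    // value[i] holds node i's data; next[i] is the index of node i's
    // successor, or -1 for the end of the list.
    public static int[] partialSums(int[] value, int[] next) {
        int[] val = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            // Copy first: in the real machine every processor reads its
            // neighbor's old value within the same synchronous step.
            int[] newVal = val.clone();
            int[] newNxt = nxt.clone();
            for (int i = 0; i < val.length; i++) {  // "for each processor i"
                if (nxt[i] != -1) {
                    active = true;
                    newVal[i] = val[i] + val[nxt[i]]; // add successor's value
                    newNxt[i] = nxt[nxt[i]];          // pointer doubling
                }
            }
            val = newVal;
            nxt = newNxt;
        }
        return val;
    }
}
```
&lt;br /&gt;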
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is Ada code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms,&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
with Ada.Unchecked_Deallocation;&lt;br /&gt;
with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n - 1 nodes &lt;br /&gt;
in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common: each node can have at most two branches, so three shared nodes are enough to fan out to four subtrees.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2^n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. The processor will return a result that can be used exactly as if it came from a serial traversal.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
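The three steps above can be sketched with one thread standing in for each processor.  This is our own illustrative rendering, not the article's code: the tree shape matches the serial Ada example, and names such as parallelPreorder are assumptions.  The two subtree traversals run concurrently and synchronize only once, when they are joined and combined with the shared root.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;

public class ParallelPreorder {
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // The tree from the serial Ada example above.
    public static Node sampleTree() {
        return new Node(1,
            new Node(2,
                new Node(4, new Node(7, null, null), null),
                new Node(5, null, null)),
            new Node(3,
                new Node(6, new Node(8, null, null), new Node(9, null, null)),
                null));
    }

    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    public static List<Integer> parallelPreorder(Node root) {
        List<Integer> leftOut = new ArrayList<>();
        List<Integer> rightOut = new ArrayList<>();
        // Each subtree is traversed on its own "processor".
        Thread tl = new Thread(() -> preorder(root.left, leftOut));
        Thread tr = new Thread(() -> preorder(root.right, rightOut));
        tl.start(); tr.start();
        try { tl.join(); tr.join(); }      // the single synchronization point
        catch (InterruptedException e) { throw new RuntimeException(e); }
        List<Integer> out = new ArrayList<>();
        out.add(root.data);                // the shared "tip" of the tree
        out.addAll(leftOut);
        out.addAll(rightOut);
        return out;
    }
}
```
&lt;br /&gt;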
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2^n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
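The generation-at-a-time scheme in Steps 1-5 above can be sketched as follows, using a thread pool in place of per-subtree processors (a simplification; names such as levelOrder are ours).  Each generation's children are expanded concurrently, and invokeAll provides the once-per-generation synchronization that combines the results in the correct order.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLevelOrder {
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // The tree from the serial Ada example above.
    public static Node sampleTree() {
        return new Node(1,
            new Node(2,
                new Node(4, new Node(7, null, null), null),
                new Node(5, null, null)),
            new Node(3,
                new Node(6, new Node(8, null, null), new Node(9, null, null)),
                null));
    }

    public static List<Integer> levelOrder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> out = new ArrayList<>();
        List<Node> level = new ArrayList<>();
        level.add(root);
        try {
            while (!level.isEmpty()) {
                List<Callable<List<Node>>> tasks = new ArrayList<>();
                for (Node n : level) {
                    out.add(n.data);      // emit this generation in order
                    tasks.add(() -> {     // expand children on a pool thread
                        List<Node> kids = new ArrayList<>();
                        if (n.left != null) kids.add(n.left);
                        if (n.right != null) kids.add(n.right);
                        return kids;
                    });
                }
                // Synchronize once per generation; invokeAll returns futures
                // in submission order, preserving the level ordering.
                List<Node> next = new ArrayList<>();
                for (Future<List<Node>> f : pool.invokeAll(tasks)) {
                    next.addAll(f.get());
                }
                level = next;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return out;
    }
}
```
&lt;br /&gt;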
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
         Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
         Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
               Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT algorithm, it can be locked while editing as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they allow vast amounts of data to be pushed through the hash table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created holding all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock in the implementation. &lt;br /&gt;
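A sketch of this two-level scheme follows.  This is our own simplified rendering of the idea, not Larson et al.'s code: the single table lock is held only long enough to set or clear a per-bucket busy flag, after which operations on different buckets proceed concurrently.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.AbstractMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TwoLevelLockTable {
    private static final int NUM_BUCKETS = 16;
    private final Object tableLock = new Object();           // the one real lock
    private final boolean[] busy = new boolean[NUM_BUCKETS]; // per-bucket flags
    @SuppressWarnings("unchecked")
    private final List<Map.Entry<String, Integer>>[] buckets =
        new List[NUM_BUCKETS];

    public TwoLevelLockTable() {
        for (int i = 0; i < NUM_BUCKETS; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), NUM_BUCKETS);
    }

    // The table lock is held only long enough to set the bucket flag.
    private void lockBucket(int b) {
        synchronized (tableLock) {
            boolean interrupted = false;
            while (busy[b]) {
                try { tableLock.wait(); }
                catch (InterruptedException e) { interrupted = true; }
            }
            busy[b] = true;
            if (interrupted) Thread.currentThread().interrupt();
        }
    }

    private void unlockBucket(int b) {
        synchronized (tableLock) {
            busy[b] = false;
            tableLock.notifyAll();
        }
    }

    public void put(String key, int value) {
        int b = bucketOf(key);
        lockBucket(b);
        try { // walk the bucket's chained linked list
            for (Map.Entry<String, Integer> e : buckets[b]) {
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            }
            buckets[b].add(new AbstractMap.SimpleEntry<>(key, value));
        } finally { unlockBucket(b); }
    }

    public Integer get(String key) {
        int b = bucketOf(key);
        lockBucket(b);
        try {
            for (Map.Entry<String, Integer> e : buckets[b]) {
                if (e.getKey().equals(key)) return e.getValue();
            }
            return null;
        } finally { unlockBucket(b); }
    }
}
```
&lt;br /&gt;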
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a poor solution to this problem, as processors need to run very quickly, and having to wait for locks would destroy the application's processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map must be synchronized prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  There is no locking on retrievals in this data structure, and retrievals will typically run in parallel with updates/deletes.  A retrieval reflects the most recently completed updates, even though it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal;&lt;br /&gt;
    boolean updated;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      if (oldVal == null) {&lt;br /&gt;
        // putIfAbsent returns null only if no mapping existed&lt;br /&gt;
        updated = (queryCounts.putIfAbsent(q, 1) == null);&lt;br /&gt;
      } else {&lt;br /&gt;
        // replace succeeds only if the value is still oldVal&lt;br /&gt;
        updated = queryCounts.replace(q, oldVal, oldVal + 1);&lt;br /&gt;
      }&lt;br /&gt;
    } while (!updated);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if it still equals the previously observed value, and if another thread got there first, we retry.  This is much more efficient than locking the entire function, as we rarely expect such collisions.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block and iterate in a normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using these connections, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to its adjacent nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
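As a minimal sketch of this representation (class name ours), each vertex keeps a linked list of its neighbors:&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class AdjacencyGraph {
    // Each vertex maps to a linked list of the vertices it connects to.
    private final Map<Integer, List<Integer>> adjacency = new HashMap<>();

    public void addEdge(int u, int v) { // undirected: record both directions
        adjacency.computeIfAbsent(u, k -> new LinkedList<>()).add(v);
        adjacency.computeIfAbsent(v, k -> new LinkedList<>()).add(u);
    }

    public List<Integer> neighbors(int u) {
        return adjacency.getOrDefault(u, new LinkedList<>());
    }
}
```
&lt;br /&gt;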
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a finite set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  To do so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
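The level-synchronous idea can also be sketched in plain Java without explicit locks.  This is a minimal illustration only (the class and method names are our own, not from the referenced papers): the whole frontier is expanded with a parallel stream, and an atomic putIfAbsent on a ConcurrentHashMap plays the role of the fetch_and_add visited test in the pseudocode above.&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: each frontier is expanded in parallel, and
    // threads only synchronize between levels.  The atomic putIfAbsent is
    // the "visited" test, standing in for the fetch_and_add above.
    public static Map<String, Integer> distances(Map<String, List<String>> adj, String source) {
        ConcurrentMap<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        List<String> frontier = List.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            final int d = level + 1;
            frontier = frontier.parallelStream()
                // visit every successor of every frontier vertex ...
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                // ... keeping only vertices reached for the first time
                .filter(v -> dist.putIfAbsent(v, d) == null)
                .collect(Collectors.toList());
            level = d;
        }
        return dist;
    }
}
```

Run on the German-cities example, Mannheim, Wurzburg, and Kassel end up at level 1, their successors at level 2, and so on.&lt;br /&gt;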
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each regional set of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those used for trees.  We have also highlighted the importance of graphs and the need to access them quickly.  Because of this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to create singly-linked lists.  In this way we have shown how the basic principles taught by the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan,and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number: 6578131, 2003 http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;br /&gt;
#Calvin C.-Y.Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83851</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83851"/>
		<updated>2014-02-27T02:44:59Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is distinct, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you can perform a reduction to generate a sum of all the data, all while working in parallel.  Take the following grid as an example:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
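The four copy-scan steps above can be simulated sequentially in Java.  This is an illustrative sketch only (the class name and grid representation are our own): each pass doubles the number of rows holding a copy of row 0, giving the log(n) behavior described above.&lt;br /&gt;

```java
public class CopyScan {
    // Simulates the log(n) broadcast phase of copy-scan: after pass k, the
    // first 2^k rows hold a copy of row 0.  On a real machine, each copy in
    // a pass would be performed by a different processor simultaneously.
    public static int[][] broadcastRow0(int[][] grid) {
        int have = 1;  // number of rows that already hold the copy
        while (have < grid.length) {
            for (int i = 0; i < have && have + i < grid.length; i++) {
                grid[have + i] = grid[i].clone();  // row i -> row (have + i)
            }
            have *= 2;
        }
        return grid;
    }
}
```

Once every row holds the data, the parallel reduction of Step 4 can proceed.&lt;br /&gt;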
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One use of pointer doubling is to compute the partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
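The partial-sum operation above can be sketched in Java by simulating the parallel pointer-doubling rounds over arrays (the encoding is our own for illustration: next[i] is the index of the node that node i points to, toward the head of the list, or -1 at the head).  Each round reads the old arrays and writes fresh copies, mimicking all processors updating simultaneously.&lt;br /&gt;

```java
public class PointerJumping {
    // Computes prefix sums by pointer doubling: each round, every node adds
    // the value of the node it points to and then jumps its pointer two
    // steps back, so all pointers reach the head in log(n) rounds.
    public static int[] prefixSums(int[] value, int[] next) {
        int n = value.length;
        int[] v = value.clone();
        int[] p = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nv = v.clone();  // double-buffer so a round reads only old state
            int[] np = p.clone();
            for (int i = 0; i < n; i++) {  // "for all i in parallel"
                if (p[i] != -1) {
                    nv[i] = v[i] + v[p[i]];  // add the value of the pointed-to node
                    np[i] = p[p[i]];         // pointer doubling
                    active = true;
                }
            }
            v = nv;
            p = np;
        }
        return v;
    }
}
```

For the list 1, 2, 3, 4 the result is 1, 3, 6, 10: each node ends up holding the sum of itself and all preceding nodes, as in the figures above.&lt;br /&gt;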
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during traversal.  Even though search/update/insert operations occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node in sequence to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold a subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because one node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the additional sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2^n for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it came from a serial program.&lt;br /&gt;
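The steps above can be sketched with a thread pool in Java.  This is a simplified, hypothetical sketch (class and method names are ours, and for brevity the per-subtree work is a subtree sum rather than an ordered traversal), but the division of labor is the same: the caller handles the shared tip, each independent subtree goes to its own task, and the results are combined with a single synchronization at the end.&lt;br /&gt;

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTreeSum {
    public static class Node {
        final int data;
        final Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Plain serial traversal, run independently on each subtree.
    static long sum(Node n) {
        return n == null ? 0 : n.data + sum(n.left) + sum(n.right);
    }

    // The caller visits the shared "tip" (here just the root); each
    // independent subtree becomes a task, and we synchronize once at the end.
    public static long parallelSum(Node root) {
        if (root == null) return 0;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Long> left = pool.submit(() -> sum(root.left));
            Future<Long> right = pool.submit(() -> sum(root.right));
            return root.data + left.get() + right.get();  // single sync point
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```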
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but have it wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
         Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
          Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
                Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as per the LDS chapter of the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms across multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
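As a small sketch of the read-write-lock option (the class name is ours, and a simple list stands in for the transformed LDS), many readers may traverse concurrently while a writer gets exclusive access:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWList {
    private final List<Integer> list = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Traversals take the shared read lock: any number may run at once.
    public boolean contains(int x) {
        lock.readLock().lock();
        try {
            return list.contains(x);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Edits take the exclusive write lock, blocking readers and other writers.
    public void add(int x) {
        lock.writeLock().lock();
        try {
            list.add(x);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```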
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves moving vast amounts of data through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single-level hash table lookup can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup time is essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a collision that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is the chained hash table, in which a linked list is created holding all values placed in a particular bucket.  The developer must not only determine the proper bucket for the data being searched for, but also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used just for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items. Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
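The two-level locking scheme just described can be sketched as follows.  This is a simplified illustration of the idea, not the patented implementation (all names, and the int-pair chaining, are our own); note that the table lock is held only long enough to acquire the per-bucket lock:&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable {
    private final List<int[]>[] buckets;   // each bucket chains (key, value) pairs
    private final Lock tableLock = new ReentrantLock();
    private final Lock[] bucketLocks;

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int n) {
        buckets = new List[n];
        bucketLocks = new Lock[n];
        for (int i = 0; i < n; i++) {
            buckets[i] = new LinkedList<>();
            bucketLocks[i] = new ReentrantLock();
        }
    }

    public void put(int key, int value) {
        int b = Math.floorMod(key, buckets.length);
        tableLock.lock();                  // held just long enough ...
        Lock bucket = bucketLocks[b];
        bucket.lock();                     // ... to acquire the bucket lock
        tableLock.unlock();
        try {
            buckets[b].add(new int[]{key, value});
        } finally {
            bucket.unlock();
        }
    }

    public Integer get(int key) {
        int b = Math.floorMod(key, buckets.length);
        tableLock.lock();
        Lock bucket = bucketLocks[b];
        bucket.lock();
        tableLock.unlock();
        try {
            for (int[] kv : buckets[b]) if (kv[0] == key) return kv[1];
            return null;
        } finally {
            bucket.unlock();
        }
    }
}
```

While one thread works inside a bucket, other threads remain free to operate on any of the other buckets.&lt;br /&gt;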
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem because processors need to run very quickly, and having to wait for locks would severely degrade application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness for concurrent use, however: the entire map must be synchronized prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case we need the synchronizedMap function available in the Collections utility class.  Notice also that once the iteration begins, we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is making sure that any update/insert/delete sequence completes properly before subsequent operations observe the data.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of low-lock and lock-free hash designs have been developed to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block: they typically run in parallel with updates and deletes, while updates lock only a small portion of the table.  A retrieval reflects the results of the most recently completed updates, although it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    for (;;) {&lt;br /&gt;
      Integer oldVal = queryCounts.get(q);&lt;br /&gt;
      if (oldVal == null) {&lt;br /&gt;
        // putIfAbsent returns null when the key was newly added&lt;br /&gt;
        if (queryCounts.putIfAbsent(q, 1) == null) return;&lt;br /&gt;
      } else if (queryCounts.replace(q, oldVal, oldVal + 1)) {&lt;br /&gt;
        return;&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial option presented in the previous section, and it avoids much of the locking incurred by synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that several inserts and updates could be running at the same time.  The replace() call acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it still equals the previously read value, and the operation retries otherwise.  This is much more efficient than locking the entire function, since such collisions are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes, which in turn link to still other nodes in the network.  Using these connections, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or to lock a shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined; this list keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt toward Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  The function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches across three different processors, with possibly some cleanup code left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
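This level-synchronous idea can be sketched in Java (our own illustrative sketch; the city adjacency list used in the test is an assumption loosely following the figure).  Each frontier is expanded with a parallel stream, and a concurrent visited set makes claiming a vertex atomic; raw collection types are used to match the earlier examples.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Level-synchronous BFS: all vertices in the current frontier are
// expanded in parallel; the concurrent visited set makes add() an
// atomic "claim", so each vertex joins the next frontier exactly once.
public class ParallelBFS {
    public static List bfs(Map adj, String root) {
        Set visited = ConcurrentHashMap.newKeySet();
        visited.add(root);
        List order = new ArrayList();
        List frontier = Arrays.asList(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = (List) frontier.parallelStream()
                .flatMap(u -> ((List) adj.getOrDefault(u, new ArrayList())).stream())
                .filter(visited::add)   // true only for the first thread to claim v
                .collect(Collectors.toList());
        }
        return order;
    }
}
```

Each while iteration corresponds to one generation of the search, so the processors synchronize once per level rather than once per vertex.&lt;br /&gt;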
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
      dv = D[v];&lt;br /&gt;
      if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
        vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
        if(vis == 0) // v is added to a stack only once&lt;br /&gt;
          D[v] = d+1;&lt;br /&gt;
          pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
      if(dv == d + 1)&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each vertex is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph shows how each regional set of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly; due to this need for access speed, graphs benefit greatly from parallelization.&lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be applied to trees, hash tables, and graphs.  While these structures are more complex than the singly linked lists outlined in the Solihin textbook, their parallelization methods draw heavily on the fundamental locking techniques taught there.  In several cases the exact same locking techniques are used, and it is the LDS that is manipulated to resemble singly linked lists.  In this way we have shown how the basic principles taught in the textbook can be extended and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#Calvin C.-Y. Chen, Sajal K. Das, &amp;quot;Parallel Breadth-first and Breadth-depth traversals of general trees&amp;quot;, Advances in Computing and Information - ICCP '90, ISBN: 978-3-540-46677-2&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83850</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83850"/>
		<updated>2014-02-27T02:42:54Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures differ, LDS traversal shares a common characteristic: reading a node and then discovering the other nodes it points to. This often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki we cover other LDS, such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array-processing algorithm, a common technique used at the processor level is copy-scan.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of the row.  From there, you can perform a reduction to generate a sum of all the data, all while working in parallel.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
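The four steps above can be simulated sequentially in Java (our own sketch; the inner loop is the part that would run in parallel, one copy per processor per round):&lt;br /&gt;

```java
// Sequential simulation of copy-scan: in each round, every row that
// already holds the data copies it one "stride" further down, so the
// number of rows holding a copy doubles until all n rows are covered
// in ceil(log2(n)) rounds.  A parallel reduction could then follow.
public class CopyScan {
    public static int[][] broadcastRow(int[] row, int rows) {
        int[][] grid = new int[rows][];
        grid[0] = row.clone();
        int copied = 1;                          // rows holding the data so far
        while (rows > copied) {
            int stride = copied;
            for (int i = 0; stride > i; i++) {   // conceptually one processor per i
                if (rows > i + stride) {
                    grid[i + stride] = grid[i].clone();
                }
            }
            copied = Math.min(rows, copied * 2);
        }
        return grid;
    }
}
```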
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to, repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
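The pointer-doubling rounds pictured above can be simulated sequentially in Java (an illustrative sketch; each inner-loop iteration models one processor).  Note that with pointers running toward the tail, each node accumulates itself plus all following nodes; pointing the list in the other direction yields sums over preceding nodes as in the figures.&lt;br /&gt;

```java
// Pointer doubling on an array-encoded list: each "processor" i holds
// vals[i] and a pointer next[i].  Every round it adds in the value at
// next[i] and then jumps its pointer two steps ahead, so after log2(n)
// rounds every pointer has reached the end of the list.
public class PointerDoubling {
    public static int[] partialSums(int[] input) {
        int n = input.length;
        int[] vals = input.clone();
        int[] next = new int[n];
        for (int i = 0; n > i; i++) next[i] = (n > i + 1) ? i + 1 : -1;
        boolean active = true;
        while (active) {
            active = false;
            int[] newVals = vals.clone();
            int[] newNext = next.clone();
            for (int i = 0; n > i; i++) {        // conceptually parallel
                if (next[i] != -1) {
                    newVals[i] = vals[i] + vals[next[i]];
                    newNext[i] = next[next[i]];
                    active = true;
                }
            }
            vals = newVals;
            next = newNext;
        }
        return vals;
    }
}
```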
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
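As a small illustration of this ordering property (our own minimal sketch, not code from the references), a binary search tree lookup follows a single root-to-leaf path instead of scanning the whole list:&lt;br /&gt;

```java
// Minimal binary search tree: smaller keys go left, larger keys go
// right, so insert and contains each follow one root-to-leaf path of
// roughly log(n) nodes when the tree is balanced.
public class BST {
    static class Node {
        int key; Node left; Node right;
        Node(int k) { key = k; }
    }
    Node root;

    public void insert(int k) {
        root = insert(root, k);
    }
    private Node insert(Node n, int k) {
        if (n == null) return new Node(k);
        if (n.key > k) n.left = insert(n.left, k);
        else if (k > n.key) n.right = insert(n.right, k);
        return n;
    }
    public boolean contains(int k) {
        Node n = root;
        while (n != null) {
            if (n.key == k) return true;
            n = (n.key > k) ? n.left : n.right;  // descend one side only
        }
        return false;
    }
}
```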
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]], in Ada, for serial tree traversal algorithms whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it came from a serial program.&lt;br /&gt;
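The steps above can be sketched with Java's fork/join framework (our illustration, not code from the cited sources).  Each subtree becomes an independent task, and summing node values stands in for whatever per-node work the traversal performs; raw types are used for brevity.&lt;br /&gt;

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Each subtree is an independent fork/join task: the left child is
// forked to another worker while the current worker descends the right
// child, and the results are joined on the way back up.
public class ParallelTreeSum {
    public static class Node {
        final int data; final Node left; final Node right;
        public Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }
    public static Node tree(int d, Node l, Node r) { return new Node(d, l, r); }

    static class SubtreeTask extends RecursiveTask {
        final Node node;
        SubtreeTask(Node n) { node = n; }
        protected Object compute() {
            if (node == null) return 0;
            SubtreeTask leftTask = new SubtreeTask(node.left);
            leftTask.fork();                 // left subtree runs on another worker
            int rightSum = (Integer) new SubtreeTask(node.right).compute();
            return node.data + rightSum + (Integer) leftTask.join();
        }
    }
    public static int sum(Node root) {
        return (Integer) new ForkJoinPool().invoke(new SubtreeTask(root));
    }
}
```

On the nine-node tree built in the serial example, this returns 45, the sum of all node values, with no locks beyond the joins themselves.&lt;br /&gt;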
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a breadth-first traversal can achieve the same 100% code parallelization as pre-, in-, and post-order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial breadth-first algorithm detailed above,&lt;br /&gt;
      but have it wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial breadth-first algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
         Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
         Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
               Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT function, it can be locked while editing as described in the LDS chapter of the Solihin book. A global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.  Since the tree can be transformed into a simple linked list, we are able to use the same locking mechanism for multiple linked data structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and applied a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they can move a vast amount of data through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
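&lt;br /&gt;
A minimal sketch of such a chained hash table is shown below. The class and method names are purely illustrative, not a real library API; each bucket holds a linked list of entries whose keys hash to that index, and collisions are resolved by walking the chain.&lt;br /&gt;

```java
import java.util.LinkedList;

// Minimal chained hash table sketch (names are illustrative, not a real API):
// each bucket holds a linked list of entries whose keys hash to that index.
public class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int indexFor(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        }
        chain.add(new Entry<>(key, value)); // collision: append to the chain
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present in this bucket's chain
    }
}
```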
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operations on the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
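&lt;br /&gt;
The two-level scheme can be sketched in Java as follows. This is our own simplified illustration, not the patented implementation: the table lock is held just long enough to acquire the bucket lock, then released so other threads can operate on different buckets concurrently.&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the two-level locking scheme described above (illustrative only):
// the table lock is held just long enough to pick up a bucket lock, then
// released so other threads can work on different buckets concurrently.
public class TwoLevelLockedTable {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        buckets = new HashMap[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(String key, int value) {
        int i = indexFor(key);
        tableLock.lock();          // high-level lock: only to reach the bucket
        ReentrantLock bucket = bucketLocks[i];
        bucket.lock();             // fine-grained lock on this bucket only
        tableLock.unlock();        // released immediately; other buckets are free
        try {
            buckets[i].put(key, value);
        } finally {
            bucket.unlock();
        }
    }

    public Integer get(String key) {
        int i = indexFor(key);
        tableLock.lock();
        ReentrantLock bucket = bucketLocks[i];
        bucket.lock();
        tableLock.unlock();
        try {
            return buckets[i].get(key);
        } finally {
            bucket.unlock();
        }
    }
}
```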
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any point in time.  With a really large number of threads, however, waiting to enter the synchronized method could become a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections class.  Also, as you may notice, once the iterator code begins, we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through it at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue with a hash table in a parallel environment is making sure any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays synchronized appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block, and will typically run in parallel with updates/deletes.  A retrieval reflects the most recently completed update operations, even if it cannot see values that are still being updated.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() fails when the key is absent, so the first insert uses putIfAbsent()&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if it still equals the value we previously read.  This is much more efficient than locking the entire function, as we rarely expect a conflicting update.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those vertices called edges or arcs.  From one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]] and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the algorithm begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-first step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
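&lt;br /&gt;
The fetch-and-add idea in the pseudocode above can be approximated in Java, with AtomicIntegerArray.getAndIncrement() standing in for fetch_and_add. This is our own sketch of one level-synchronous BFS computing distances over an adjacency list, not the full algorithm from the pseudocode: the atomic counter guarantees each vertex is claimed by exactly one thread, so no explicit locks are needed.&lt;br /&gt;

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.Collectors;

// Illustrative sketch: level-synchronous parallel BFS where the current
// frontier is expanded concurrently, and an atomic fetch-and-add
// (getAndIncrement) replaces the LOCK steps so each vertex is claimed once.
public class ParallelBfs {
    public static int[] distances(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        java.util.Arrays.fill(dist, -1);          // -1 means "not yet visited"
        AtomicIntegerArray visited = new AtomicIntegerArray(n);
        dist[source] = 0;
        visited.set(source, 1);
        List<Integer> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int level = d + 1;
            // every frontier vertex is expanded concurrently
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.get(u).stream())
                .filter(v -> visited.getAndIncrement(v) == 0) // first claim wins
                .peek(v -> dist[v] = level)
                .collect(Collectors.toList());
            d++;
        }
        return dist;
    }
}
```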
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
As shown, graphs can be parallelized using traversal methods very similar to those seen in trees.  We have also highlighted the importance of graphs and their need to be accessed quickly.  Due to this need for access speed, graphs benefit greatly from parallelization. &lt;br /&gt;
&lt;br /&gt;
= Conclusion =&lt;br /&gt;
Through this wiki page we have shown how parallelization can be done for trees, hash tables, and graphs.  While these structures are more complex than the singly-linked lists outlined in the Solihin textbook, their parallelization methods draw heavily from the fundamental locking techniques taught there.  In several cases, the exact same locking techniques are used, and it is the LDS which is manipulated to create singly-linked lists.  In this way we are able to show how the basic principles taught by the textbook can be expanded and carried into more complex structures and problems.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83849</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83849"/>
		<updated>2014-02-27T01:23:43Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) comprise several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, this often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
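&lt;br /&gt;
The copy-scan steps above can be simulated sequentially as follows. This is a sketch with our own naming, using a 2-D array to stand in for per-processor rows: row 1 is propagated to all rows in doubling steps, so only log(n) rounds are needed, and within each round every copy is independent and could run on a different processor.&lt;br /&gt;

```java
// Sketch of the copy-scan steps above, simulated with a 2-D array standing in
// for per-processor rows: row 0 is propagated to all rows in log(n) doubling
// steps, after which each "processor" can take part in a parallel reduction.
public class CopyScan {
    public static int[][] copyScan(int[] row, int nRows) {
        int[][] grid = new int[nRows][];
        grid[0] = row.clone();
        int have = 1;                        // rows that already hold a copy
        while (have < nRows) {               // doubling: O(log n) steps
            int copies = Math.min(have, nRows - have);
            for (int i = 0; i < copies; i++) {
                // row i sends its copy to row (have + i); each transfer is
                // independent, so all could run on different processors
                grid[have + i] = grid[i].clone();
            }
            have += copies;
        }
        return grid;
    }
}
```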
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding to each node's value the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
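&lt;br /&gt;
The partial-sum example above can be simulated in Java as follows. This is a sketch with our own naming: pred[i] points to node i's predecessor (-1 for the head), and the snapshot arrays model the synchronous parallel step in which every node reads old values before any node writes, as separate processors would. Each round, every node adds its predecessor's value and doubles its pointer, so log(n) rounds suffice.&lt;br /&gt;

```java
// Sketch of pointer doubling for partial (prefix) sums, simulated with arrays:
// pred[i] is node i's predecessor (-1 at the head). Each round, every node adds
// its predecessor's value and then jumps its pointer two steps; after O(log n)
// rounds each node holds the sum of itself and all preceding nodes.
public class PointerDoubling {
    public static int[] partialSums(int[] value, int[] pred) {
        int n = value.length;
        int[] val = value.clone();
        int[] p = pred.clone();
        boolean done = false;
        while (!done) {
            done = true;
            // snapshots model the synchronous parallel step: all nodes read
            // the old values before any node writes, as processors would
            int[] oldVal = val.clone();
            int[] oldP = p.clone();
            for (int i = 0; i < n; i++) {   // each i is independent: one
                if (oldP[i] != -1) {        // processor per node in parallel
                    val[i] = oldVal[i] + oldVal[oldP[i]]; // add predecessor's value
                    p[i] = oldP[oldP[i]];                 // pointer doubling
                    done = false;
                }
            }
        }
        return val;
    }
}
```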
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore three linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search, update, and insert operations run in logarithmic time, traversals such as in-order, pre-order, and post-order still require visiting every node to generate their full output.  This provides an opportunity for parallel code: different portions of the traversal can occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (in Ada) for serial tree traversal algorithms&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
with Ada.Unchecked_Deallocation;&lt;br /&gt;
with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is a perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below: since we are using 4 processors, we only need 3 nodes in common, because each shared node can branch to two children.  If both the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows at 2^n for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you reach a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
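The steps above can be sketched with Java's fork/join framework, used here as a shared-memory stand-in for the multiprocessor setup the steps describe; class and method names are illustrative.  The tree built in main is the one from the serial example above.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelPreorder {
    static class Node {
        int data; Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Each subtree becomes an independent task; its ordered result is
    // spliced back in exactly where a serial preorder would place it.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        final Node n;
        PreorderTask(Node n) { this.n = n; }
        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            PreorderTask left = new PreorderTask(n.left);
            PreorderTask right = new PreorderTask(n.right);
            left.fork();                       // left subtree on another worker
            List<Integer> r = right.compute(); // right subtree on this worker
            out.add(n.data);                   // visit the root first (preorder)
            out.addAll(left.join());           // then the left subtree's result
            out.addAll(r);                     // then the right subtree's result
            return out;
        }
    }

    public static List<Integer> preorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new PreorderTask(root));
    }

    public static void main(String[] args) {
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(preorder(root)); // prints: [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

The single synchronization per subtree happens at the join(), matching the "synchronize once, at the end" property described above.&lt;br /&gt;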
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds greater potential for delays, slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at the rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you reach a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but have it wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
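Steps 1–5 above can be sketched in Java by giving each subtree its own task and using invokeAll as the per-generation barrier.  This is an illustrative shared-memory approximation (names are ours), not the message-passing implementation the steps describe.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelLevelOrder {
    static class Node {
        int data; Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    public static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        out.add(root.data);                 // the shared "tip", held by all processors
        List<List<Node>> frontiers = new ArrayList<>();  // one generation per subtree
        if (root.left != null) frontiers.add(new ArrayList<>(Arrays.asList(root.left)));
        if (root.right != null) frontiers.add(new ArrayList<>(Arrays.asList(root.right)));
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, frontiers.size()));
        try {
            while (frontiers.stream().anyMatch(f -> !f.isEmpty())) {
                for (List<Node> f : frontiers)           // step 3: combine this
                    for (Node n : f) out.add(n.data);    // generation, in subtree order
                List<Callable<List<Node>>> round = new ArrayList<>();
                for (List<Node> f : frontiers)           // step 4: each subtree
                    round.add(() -> {                    // advances one generation
                        List<Node> next = new ArrayList<>();
                        for (Node n : f) {
                            if (n.left != null) next.add(n.left);
                            if (n.right != null) next.add(n.right);
                        }
                        return next;
                    });
                List<Future<List<Node>>> done = pool.invokeAll(round); // barrier
                for (int i = 0; i < frontiers.size(); i++)
                    frontiers.set(i, done.get(i).get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(levelOrder(root)); // prints: [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```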
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list built from the input tree, which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
          Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
          Pi initializes list[k] to zero /* the successor of the k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
             for j := 1 to (n/p^2) do&lt;br /&gt;
             begin&lt;br /&gt;
                Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
Since the tree is transformed into a linked list by the GEN-COMP-NEXT algorithm, it can be locked while being edited, as described in the LDS chapter of the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.  Because the tree can be transformed into a simple linked list, we are able to use the same locking mechanisms for multiple linked-data-structure types.&lt;br /&gt;
In this fashion we have broken the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since vast amounts of data can be passed through a hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data is placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like accessing an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only determine the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
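A chained hash table as just described might be sketched as follows; this is an illustrative toy (fixed bucket count, no resizing), not a production design.&lt;br /&gt;

```java
public class ChainedHashTable {
    // Each bucket holds a linked chain of the entries whose keys hash to it,
    // so colliding keys coexist and lookup walks the chain.
    static class Entry {
        final String key; int value; Entry next;
        Entry(String k, int v, Entry n) { key = k; value = v; next = n; }
    }

    private final Entry[] buckets = new Entry[8];

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length); // hashing function
    }

    public void put(String key, int value) {
        int b = bucketFor(key);
        for (Entry e = buckets[b]; e != null; e = e.next)
            if (e.key.equals(key)) { e.value = value; return; } // update in chain
        buckets[b] = new Entry(key, value, buckets[b]);         // prepend new node
    }

    public Integer get(String key) {
        for (Entry e = buckets[bucketFor(key)]; e != null; e = e.next)
            if (e.key.equals(key)) return e.value; // walk the chained list
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable();
        t.put("alpha", 1);
        t.put("beta", 2);
        t.put("alpha", 3); // overwrite an existing key
        System.out.println(t.get("alpha") + " " + t.get("beta") + " " + t.get("gamma"));
        // prints: 3 2 null
    }
}
```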
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
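The two-level locking scheme described above might be sketched as follows.  This is a simplified illustration of the idea (a single chain per bucket, no signature-pointer pairs, hypothetical class names), not the actual SMP implementation.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockedTable {
    static class Node {
        final String key; int value; Node next;
        Node(String k, int v, Node n) { key = k; value = v; next = n; }
    }

    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final Node[] buckets;

    public TwoLevelLockedTable(int nBuckets) {
        buckets = new Node[nBuckets];
        bucketLocks = new ReentrantLock[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLocks[i] = new ReentrantLock();
    }

    // The table lock is held just long enough to acquire the bucket lock;
    // once it is released, other threads can enter any of the other buckets.
    private int lockBucket(String key) {
        tableLock.lock();
        try {
            int b = Math.floorMod(key.hashCode(), buckets.length);
            bucketLocks[b].lock();       // fine-grained, per-bucket lock
            return b;
        } finally {
            tableLock.unlock();          // released before the chain is touched
        }
    }

    public void put(String key, int value) {
        int b = lockBucket(key);
        try {
            for (Node n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key)) { n.value = value; return; }
            buckets[b] = new Node(key, value, buckets[b]);
        } finally { bucketLocks[b].unlock(); }
    }

    public Integer get(String key) {
        int b = lockBucket(key);
        try {
            for (Node n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key)) return n.value;
            return null;
        } finally { bucketLocks[b].unlock(); }
    }
}
```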
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem, because processors need to run very quickly and waiting for locks would destroy processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: the entire map must be synchronized prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block, so they typically run in parallel with updates and deletes; a retrieval reflects the most recently completed update operations, even though it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() throws NullPointerException for a null expected value,&lt;br /&gt;
      // so the first insertion for a key must use putIfAbsent() instead&lt;br /&gt;
    } while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
                            : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() call acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the key is still mapped to the value we previously read; otherwise the loop retries.  This is much more efficient than locking the entire function, as we rarely expect to lose such a race.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion; the iterator simply reflects the state of the map at some point at or since its creation.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  Those nodes in turn link to a variety of other nodes in the network.  Using these connections, it is possible to find a path from one specific node to another.  This can be accomplished by having each node contain a linked list of pointers to every node directly connected to it.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of these vertices called edges or arcs.  From a given vertex, one typically wants to enumerate the different paths to another vertex using its list of edges or, more likely, to find the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
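A level-synchronous, data-parallel version of this search might be sketched as follows.  The edge list is an assumption loosely based on the figure above, and a parallel stream stands in for the explicit processors; a concurrent set lets frontier nodes claim unvisited neighbors without a global lock.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelBfs {
    // Returns the BFS distance (number of hops) from source to every
    // reachable vertex, expanding one whole frontier level in parallel.
    public static Map<String, Integer> distances(Map<String, List<String>> adj,
                                                 String source) {
        Map<String, Integer> dist = new ConcurrentHashMap<>();
        Set<String> visited = ConcurrentHashMap.newKeySet(); // lock-free claim set
        visited.add(source);
        dist.put(source, 0);
        List<String> frontier = Arrays.asList(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int nextD = ++d;
            // expand every frontier vertex on its own worker, one level at a time
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, Collections.<String>emptyList()).stream())
                .filter(visited::add)           // add() succeeds exactly once per node
                .peek(v -> dist.put(v, nextD))  // safe: dist is a ConcurrentHashMap
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Frankfurt", Arrays.asList("Mannheim", "Wurzburg", "Kassel"));
        adj.put("Mannheim", Arrays.asList("Karlsruhe"));
        adj.put("Karlsruhe", Arrays.asList("Augsburg"));
        adj.put("Wurzburg", Arrays.asList("Erfurt", "Nurnberg"));
        adj.put("Nurnberg", Arrays.asList("Stuttgart", "Munchen"));
        adj.put("Kassel", Arrays.asList("Munchen"));
        adj.put("Augsburg", Arrays.asList("Munchen"));
        Map<String, Integer> d = distances(adj, "Frankfurt");
        System.out.println(d.get("Augsburg") + " " + d.get("Stuttgart")); // prints: 3 3
    }
}
```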
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83847</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83847"/>
		<updated>2014-02-27T01:12:13Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, it often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
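The copy-scan steps above can be simulated sequentially; the sketch below (an illustrative Java rendering, with array rows standing in for processors) propagates row 0 to every row in log(n) doubling rounds and then reduces a row to a sum.&lt;br /&gt;

```java
// Sequential simulation of copy-scan: row 0's data is propagated to all
// other rows in ceil(log2(n)) doubling steps, after which every "processor"
// (row) holds its own copy and can take part in a reduction independently.
public class CopyScan {
    public static int[][] copyScan(int[][] grid) {
        int n = grid.length;
        for (int stride = 1; stride < n; stride *= 2)            // log(n) rounds
            for (int src = 0; src < stride && src + stride < n; src++)
                grid[src + stride] = grid[src].clone();          // row src -> row src+stride
        return grid;
    }

    // Once every row holds the data, a reduction (here, summing one row)
    // can proceed on each processor without further communication.
    public static int reduceRow(int[] row) {
        int sum = 0;
        for (int x : row) sum += x;
        return sum;
    }
}
```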
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list. This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of that node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
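The partial-sum operation shown above can be sketched as follows (an illustrative rendering; here each node's pointer is assumed to target its ''predecessor'', so that every node finishes holding the sum of itself and all preceding nodes, matching the result described).&lt;br /&gt;

```java
// Pointer doubling for partial (prefix) sums: in each round every node adds
// the value of the node its pointer targets, then doubles its pointer,
// repeating until every pointer has run off the front of the list.
public class PointerDoubling {
    // prev[i] = index of node i's predecessor, or -1 for the list head
    public static int[] partialSums(int[] val, int[] prev) {
        int n = val.length;
        int[] v = val.clone(), p = prev.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nv = v.clone(), np = p.clone();   // next round's state
            for (int i = 0; i < n; i++) {           // conceptually, all nodes at once
                if (p[i] != -1) {
                    nv[i] = v[i] + v[p[i]];         // add the pointed-to node's value
                    np[i] = p[p[i]];                // pointer jumping: skip twice as far
                    active = true;
                }
            }
            v = nv; p = np;
        }
        return v;
    }
}
```

After log(n) rounds every pointer is -1 and v[i] holds the prefix sum ending at node i.&lt;br /&gt;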
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list-based data structures, the parallelization opportunities they offer, and the concurrency issues they present: trees, hash tables, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes, with one parent node followed by zero or more child nodes.  Typically this structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree (a relatively equal number of nodes under each branch) and a proper ordering structure, searches, inserts, and deletes should occur far more quickly than traversing an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during traversal.  Even though search/update/insert operations run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This provides an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is Ada code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms&lt;br /&gt;
with behavior as the figure below shows:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
 with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
 with Ada.Unchecked_Deallocation;&lt;br /&gt;
 with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
 procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have at most two branches.  If the size of the tree and the number of processors were both increased, the number of shared nodes would also increase to support the larger number of sub-trees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
at 2^n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. The processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
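A minimal sketch of these steps (illustrative Java, with worker threads standing in for separate processors): the root is the shared tip of the tree, each independent subtree is traversed by its own worker, and the results are combined exactly as in the serial in-order case, with a single synchronization at the end.&lt;br /&gt;

```java
import java.util.*;
import java.util.concurrent.*;

// Parallel subtree traversal sketch: the root is held in common, the left and
// right subtrees are traversed concurrently, and the results are stitched
// together in in-order order at the single synchronization point.
public class ParallelTreeTraversal {
    static class Node {
        final int data; final Node left, right;
        Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
    }
    static Node t(int d, Node l, Node r) { return new Node(d, l, r); } // small builder helper

    // Ordinary serial in-order traversal, run on each subtree's worker.
    static List<Integer> inorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n == null) return out;
        out.addAll(inorder(n.left));
        out.add(n.data);
        out.addAll(inorder(n.right));
        return out;
    }

    static List<Integer> parallelInorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> left  = pool.submit(() -> inorder(root.left));
            Future<List<Integer>> right = pool.submit(() -> inorder(root.right));
            List<Integer> out = new ArrayList<>(left.get());   // single sync point, at the end
            out.add(root.data);
            out.addAll(right.get());
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the two subtrees share no nodes, the workers need no locks; the only coordination is waiting for both futures before concatenating.&lt;br /&gt;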
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-&lt;br /&gt;
processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at a rate of 2^n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]] &lt;br /&gt;
&lt;br /&gt;
The data structure can be constructed from the input tree using the GEN-COMP-NEXT algorithm. The result is a linked list from the input tree which is represented as a &amp;quot;parent-of&amp;quot; relation with explicit ordering of children.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
         Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel where bfrank is the output parameter, array[1..n] of integer; level is the input parameter, array[1..n] of integer; and preorder list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in its block. The header and tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
          Pi works on nodes preorder-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
          Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
               Pi is given at most a node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing, as per the LDS chapter in the Solihin book. Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves moving vast amounts of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any 2 keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is a chained hash table, in which a linked list is created with all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
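The two-level locking scheme described above might be sketched as follows (a simplified, hypothetical Java rendering, not the patented implementation; class and helper names are illustrative): the table lock is held only long enough to acquire a per-bucket lock, after which threads working on different buckets proceed concurrently.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantLock;

// Two-level lock sketch: the table lock guards only the acquisition of a
// bucket lock; the bucket work itself runs under the bucket lock alone, so
// operations on different buckets can overlap. Buckets use chaining.
public class TwoLevelLockTable<K, V> {
    static final int BUCKETS = 16;

    static class Entry<K, V> {
        final K key; V value; Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = new Entry[BUCKETS];

    public TwoLevelLockTable() {
        for (int i = 0; i < BUCKETS; i++) bucketLocks[i] = new ReentrantLock();
    }

    private ReentrantLock lockBucket(int b) {
        tableLock.lock();              // table lock: held only while grabbing...
        try {
            ReentrantLock l = bucketLocks[b];
            l.lock();                  // ...the bucket lock
            return l;
        } finally {
            tableLock.unlock();        // released before the bucket work starts
        }
    }

    public void put(K key, V value) {
        int b = (key.hashCode() & 0x7fffffff) % BUCKETS;
        ReentrantLock l = lockBucket(b);
        try {
            for (Entry<K, V> e = buckets[b]; e != null; e = e.next)
                if (e.key.equals(key)) { e.value = value; return; }
            buckets[b] = new Entry<>(key, value, buckets[b]);  // prepend to chain
        } finally { l.unlock(); }
    }

    public V get(K key) {
        int b = (key.hashCode() & 0x7fffffff) % BUCKETS;
        ReentrantLock l = lockBucket(b);
        try {
            for (Entry<K, V> e = buckets[b]; e != null; e = e.next)
                if (e.key.equals(key)) return e.value;
            return null;
        } finally { l.unlock(); }
    }
}
```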
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself present an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem, as processors need to run very quickly and waiting for locks would destroy application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a synchronized version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never block in this data structure, and updates lock only a small portion of the map, so retrievals will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() cannot take a null expected value, so an absent&lt;br /&gt;
      // key must be inserted with putIfAbsent() instead&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the expected old value.  This is much more efficient than locking the entire function, as we rarely expect the comparison to fail.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can drop the synchronized block that a Hashtable or synchronizedMap would require and iterate in the normal fashion; the iterator is weakly consistent rather than fail-fast, so it never throws ConcurrentModificationException.&lt;br /&gt;
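&lt;br /&gt;
A minimal sketch of such an unsynchronized traversal follows; it is our own example (sumValues and the sample map are not from the cited source), again with raw types for brevity:&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.ConcurrentHashMap;

public class SafeTraversal {
    // ConcurrentHashMap iterators are weakly consistent: they never throw
    // ConcurrentModificationException, so no synchronized block is needed
    // even if other threads are updating the map during the loop.
    static int sumValues(ConcurrentHashMap m) {
        int sum = 0;
        for (Object v : m.values()) {
            sum += (Integer) v;
        }
        return sum;
    }

    public static void main(String[] args) {
        ConcurrentHashMap m = new ConcurrentHashMap();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        System.out.println(sumValues(m)); // prints 6
    }
}
```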
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
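&lt;br /&gt;
As a sketch of this idea, each node can hold a list of the nodes directly reachable from it.  The small adjacency-list class below is our own illustrative example (raw list types are used only for brevity):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;

public class AdjacencyGraph {
    // adj[i] is the list of neighbors directly reachable from node i,
    // playing the role of the per-node linked list described above.
    final List[] adj;

    AdjacencyGraph(int n) {
        adj = new List[n];
        for (int i = 0; i != n; i++) adj[i] = new ArrayList();
    }

    // Record an undirected edge between u and v.
    void addEdge(int u, int v) {
        adj[u].add(v);
        adj[v].add(u);
    }

    int degree(int u) { return adj[u].size(); }

    public static void main(String[] args) {
        AdjacencyGraph g = new AdjacencyGraph(4);
        g.addEdge(0, 1);
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(3, 0);
        System.out.println(g.degree(1)); // prints 2
    }
}
```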
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of pairs of vertices (ordered pairs in a directed graph) called edges or arcs.  Given a vertex, one would typically want to enumerate the paths to other vertices using its list of edges or, more likely, find the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly though, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a list of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
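&lt;br /&gt;
One way to sketch this level-synchronous, data-parallel expansion in Java is with a parallel stream over the current frontier, using a concurrent set so that each vertex is claimed exactly once.  This is our own illustrative example (the int[][] adjacency array and all names are assumptions), not the algorithm from the cited papers:&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: all vertices of the current frontier are
    // expanded in parallel; visited.add() is atomic, so each vertex is
    // claimed by exactly one expansion even under concurrency.
    static int[] distances(int[][] adj, int source) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        java.util.Set visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        List frontier = new ArrayList();
        frontier.add(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            d++;
            final int dd = d;
            // Expand every frontier vertex on a worker thread.
            List next = (List) frontier.parallelStream()
                .flatMap(u -> Arrays.stream(adj[(Integer) u]).boxed())
                .filter(v -> visited.add(v)) // true only for the first claim
                .collect(Collectors.toList());
            for (Object v : next) dist[(Integer) v] = dd;
            frontier = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        // A small 4-vertex cycle: 0-1, 0-2, 1-3, 2-3.
        int[][] adj = { {1, 2}, {0, 3}, {0, 3}, {1, 2} };
        System.out.println(Arrays.toString(distances(adj, 0))); // [0, 1, 1, 2]
    }
}
```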
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. Such a set is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003.&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:ExampleTree.png&amp;diff=83736</id>
		<title>File:ExampleTree.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:ExampleTree.png&amp;diff=83736"/>
		<updated>2014-02-26T15:55:31Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83735</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83735"/>
		<updated>2014-02-26T15:55:07Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, this often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
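&lt;br /&gt;
The doubling steps above can be sketched as follows.  This sequential simulation is our own illustration: the inner loop of each step represents copies that would happen simultaneously on different processors, so the whole broadcast finishes in ceil(log2 n) steps:&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.Arrays;

public class CopyScan {
    // Broadcast row 0 to all rows in doubling steps: after step s,
    // the first 2^s rows hold the data, and each of them copies it
    // one block further on the next step.
    static int copyScan(int[][] grid) {
        int n = grid.length;
        int have = 1;        // rows 0..have-1 already hold the data
        int steps = 0;
        while (have != n) {
            int block = Math.min(have, n - have);
            for (int r = 0; r != block; r++) {
                // In a data-parallel machine these copies run concurrently.
                grid[have + r] = grid[r].clone();
            }
            have += block;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[][] grid = new int[8][];
        grid[0] = new int[] {3, 1, 4};
        for (int r = 1; r != 8; r++) grid[r] = new int[3];
        int steps = copyScan(grid);
        System.out.println(steps);                    // prints 3 (log2 of 8)
        System.out.println(Arrays.toString(grid[7])); // prints [3, 1, 4]
    }
}
```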
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding to each node's value the value stored in the node it points to, repeating until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and the values of all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
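&lt;br /&gt;
The partial-sums pictures above can be reproduced with the following sketch.  It is a sequential, PRAM-style simulation of pointer doubling (our own code, with names of our own choosing): prev[i] points at node i's predecessor, and in each synchronous round every node adds its predecessor's value and then points two steps back, so the list settles in ceil(log2 n) rounds:&lt;br /&gt;
&lt;br /&gt;
```java
public class PointerDoubling {
    // prev[i] is node i's predecessor (-1 at the head). Each round works
    // on a snapshot (nv/np) so all "processors" see the same old values,
    // as on a synchronous PRAM. Afterwards val[i] holds the sum of node i
    // and all preceding nodes.
    static int[] prefixSums(int[] val, int[] prev) {
        int n = val.length;
        int[] v = val.clone();
        int[] p = prev.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] nv = v.clone();
            int[] np = p.clone();
            for (int i = 0; i != n; i++) {  // conceptually: one processor per node
                if (p[i] != -1) {
                    nv[i] = v[i] + v[p[i]]; // add the predecessor's value
                    np[i] = p[p[i]];        // then point two steps back
                    active = true;
                }
            }
            v = nv;
            p = np;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] val  = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] prev = {-1, 0, 1, 2, 3, 4, 5, 6}; // node i follows node i-1
        System.out.println(java.util.Arrays.toString(prefixSums(val, prev)));
        // prints [1, 3, 6, 10, 15, 21, 28, 36]
    }
}
```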
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because each node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the additional subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent sub-trees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2^n for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it came from a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
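&lt;br /&gt;
The steps above can be sketched with Java's fork/join framework, which hands each independent subtree to another worker thread.  This is our own illustrative code, not taken from the cited module; it sums the nine-node tree used by the serial example, and raw task types are used only for brevity:&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelTreeSum {
    static final class Node {
        final int data;
        final Node left;
        final Node right;
        Node(int data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Each independent subtree is handed to another worker, mirroring
    // the "one subtree per processor" scheme described above; the only
    // synchronization is the join at the end of each level.
    static final class SumTask extends RecursiveTask {
        final Node n;
        SumTask(Node n) { this.n = n; }
        protected Object compute() {
            if (n == null) return 0;
            SumTask leftTask = new SumTask(n.left);
            leftTask.fork();  // left subtree runs on another worker
            int rightSum = (Integer) new SumTask(n.right).compute();
            return n.data + rightSum + (Integer) leftTask.join();
        }
    }

    static int sumDemoTree() {
        // The same nine-node tree built by the serial example above.
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)),
                        null));
        return (Integer) ForkJoinPool.commonPool().invoke(new SumTask(root));
    }

    public static void main(String[] args) {
        System.out.println(sumDemoTree()); // prints 45
    }
}
```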
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals&lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
An example of a parallel Breadth-First tree traversal is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:ExampleTree.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
Below is the algorithm to perform a Breadth-First tree traversal in parallel, where bfrank is the output parameter, array[1..n] of integer; level is an input parameter, array[1..n] of integer; and preorder-list is an input parameter, array[1..n] of integer.&lt;br /&gt;
&lt;br /&gt;
Algorithm BF-TRAVERSAL (bfrank, level, preorder-list)&lt;br /&gt;
   Step 1. Divide the preorder-list into p blocks. Each processor builds partial lists from the nodes in the block. The header and the tailer arrays for the lists built by processor i are denoted by hd[*,i] and tl[*,i], respectively.&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=p do&lt;br /&gt;
      parbegin&lt;br /&gt;
         Pi works on nodes pre-order-list[k], where (i-1)*(n/p) &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
          Pi initializes list[k] to zero /* the successor of k-th input in a partial list */&lt;br /&gt;
         /* Phase 1. Pi initializes entries in hd[*,i] and tl[*,i] that are used and entries in hdflag */&lt;br /&gt;
         hd[level[preorder-list[k]], i] := 0; tl[level[preorder-list[k]], i] := 0; hdflag[k] := 0;&lt;br /&gt;
         /* Phase 2. Build partial lists */&lt;br /&gt;
         Pi adds each of the n/p nodes to the partial list for the level of that node and updates hd[*,i], tl[*, i], and list [*] accordingly.&lt;br /&gt;
      parend;&lt;br /&gt;
   Step 2. Link up partial lists.&lt;br /&gt;
      /* Phase 1. Initialize header and tailer for the global lists */&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         initialize head[k] and tail[k] to zero, for (i-1)*n/p &amp;lt;= k &amp;lt; (i*n/p)&lt;br /&gt;
      P1 sets head[n+1] and tail[n+1] to zero;&lt;br /&gt;
      /* Phase 2. Link partial lists to form a list for each level */&lt;br /&gt;
      for each of the p blocks do&lt;br /&gt;
         for all Pi, 1 &amp;lt;= i &amp;lt;= p, do /* all processors work on the same block */&lt;br /&gt;
         parbegin&lt;br /&gt;
            for i := 1 to (n/p^2) do&lt;br /&gt;
            begin&lt;br /&gt;
                Pi is given at most one node mi in each iteration.&lt;br /&gt;
               if hdflag[mi] = 1 /* The first element in its partial list */&lt;br /&gt;
               then&lt;br /&gt;
                  if the global list for the level of node mi is empty&lt;br /&gt;
                  then let head[level[mi]] and tail[level[mi]] point to mi&lt;br /&gt;
                  else list[tail[level[mi]]] := mi and update tail[level[mi]] to be the tail of the partial list for mi.&lt;br /&gt;
                  end;&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 3. Create a linked list to implement the NEXT function&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do&lt;br /&gt;
               if(head[k] != 0 and head[k+1] != 0) then list[tail[k]] := head[k+1];&lt;br /&gt;
         parend;&lt;br /&gt;
   Step 4. Obtain ranking&lt;br /&gt;
      LINKED-LIST-RANKING(list, tmp-rank, n);&lt;br /&gt;
   Step 5. Output the result&lt;br /&gt;
      for all Pi, 1 &amp;lt;= i &amp;lt;= p, do&lt;br /&gt;
         parbegin&lt;br /&gt;
            for k := (i-1)*(n/p)+1 to (i*n/p) do bfrank[preorder-list[k]] := tmp-rank[k];&lt;br /&gt;
         parend;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, where vast amounts of data must move through the hash table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare for any two keys to generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is the chained hash table, in which a linked list is created holding all values that have been placed in a particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
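As a minimal, hypothetical illustration of chaining (the class and method names here are invented for this sketch, not taken from any of the cited sources), the following Java code stores colliding entries in a per-bucket linked list:&lt;br /&gt;

```java
import java.util.LinkedList;

/** Minimal chained hash table: colliding keys share a per-bucket linked list. */
public class ChainedHashTable {
    private static class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    /** The hashing function: maps a key to a bucket index. */
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    /** Insert or update; walks the chain to resolve collisions. */
    public void put(String key, int value) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        buckets[bucketFor(key)].add(new Entry(key, value));
    }

    /** Returns the value for key, or null if absent. */
    public Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

With a deliberately small bucket count, several keys land in the same bucket and the chain is walked on both insert and lookup.&lt;br /&gt;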
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only to set the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock in the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure.&lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem: processors need to run very quickly, and having to wait for locks would destroy application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and an adjustable expected concurrency for updates.  Retrievals never block: they typically run in parallel with updates and deletes, and each retrieval reflects the results of the most recently completed update at the time it starts, even if it cannot see updates that have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() requires a non-null expected value, so the first&lt;br /&gt;
      // insertion of a key must go through putIfAbsent() instead&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the current mapping still equals the previously observed value; otherwise the loop retries.  This is much more efficient than locking the entire function, since conflicting updates are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed with ConcurrentHashMap&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMaps require no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to other vertices using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, though, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm that traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-level to a separate processor.  So, using the above example, instead of processing Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all three searches out onto three different processors in parallel.  Then, possibly, some cleanup code would be left for the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
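The same level-synchronous idea can be sketched without explicit locks in Java: each frontier is expanded in parallel, and an atomic compare-and-set per vertex plays the role of the fetch_and_add on Visited[v] above.  This is a simplified sketch (it computes only BFS distances, not the sigma/predecessor bookkeeping of the pseudocode, and the adjacency-list representation is an assumption made for illustration):&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Level-synchronous parallel BFS: each frontier is expanded in parallel. */
public class ParallelBFS {
    /** Returns the BFS distance from source to every vertex (-1 if unreachable). */
    public static int[] distances(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        java.util.Arrays.fill(dist, -1);
        AtomicBoolean[] visited = new AtomicBoolean[n];
        for (int i = 0; i < n; i++) visited[i] = new AtomicBoolean(false);

        visited[source].set(true);
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            int level = d + 1;
            ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
            // All vertices of the current level expand in parallel; the
            // compareAndSet stands in for fetch_and_add(&Visited[v], 1).
            frontier.parallelStream().forEach(u -> {
                for (int v : adj.get(u)) {
                    if (visited[v].compareAndSet(false, true)) {
                        dist[v] = level;   // v is claimed by exactly one thread
                        next.add(v);
                    }
                }
            });
            frontier = new ArrayList<>(next); // barrier: next level starts together
            d = level;
        }
        return dist;
    }
}
```

The single synchronization point per level is the end of the parallel stream, matching the level-by-level barrier implied by the pseudocode.&lt;br /&gt;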
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually synchronize the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6,578,131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83723</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83723"/>
		<updated>2014-02-26T05:28:31Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although each structure is distinct, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS, such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
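The copy-scan steps above can be simulated sequentially; in this minimal Java sketch (the inner loop on each step stands in for copies that separate processors would perform simultaneously), row 1 is broadcast to all rows in a logarithmic number of steps:&lt;br /&gt;

```java
import java.util.Arrays;

/** Sequential simulation of copy-scan: broadcast row 0 in log(n) steps. */
public class CopyScan {
    /** Copies grid[0] to every row, doubling the number of filled rows each step. */
    public static int[][] broadcastFirstRow(int[][] grid) {
        int rows = grid.length;
        int filled = 1;                        // rows that already hold a copy of row 0
        while (filled < rows) {
            int step = Math.min(filled, rows - filled);
            // each of these copies would be one processor's work in a real machine
            for (int r = 0; r < step; r++) {
                grid[filled + r] = Arrays.copyOf(grid[r], grid[r].length);
            }
            filled += step;                    // the filled region doubles per step
        }
        return grid;
    }
}
```

Once every row holds a copy, a parallel reduction (for a sum or similar) can run on all rows at once.&lt;br /&gt;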
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to perform partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it is pointing to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of the node and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
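The partial-sum operation shown above can be simulated sequentially.  In this sketch (an illustration, not a real parallel implementation), each round every node adds in the sum held by the node it points to and then doubles its pointer; after log(n) rounds each node holds the sum of itself and every node reachable along its original pointer chain.  Double-buffered arrays stand in for the lockstep updates that one-processor-per-node hardware would perform:&lt;br /&gt;

```java
import java.util.Arrays;

/** Sequential simulation of pointer doubling for partial sums of a linked list. */
public class PointerDoubling {
    /** next[i] is the index node i points to, or -1 at the end of the list. */
    public static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = Arrays.copyOf(value, n);
        int[] ptr = Arrays.copyOf(next, n);
        boolean active = true;
        while (active) {                        // runs log(n) rounds
            active = false;
            int[] newSum = Arrays.copyOf(sum, n);
            int[] newPtr = Arrays.copyOf(ptr, n);
            for (int i = 0; i < n; i++) {       // one processor per node, in lockstep
                if (ptr[i] != -1) {
                    newSum[i] = sum[i] + sum[ptr[i]]; // add successor's partial sum
                    newPtr[i] = ptr[ptr[i]];          // double the pointer
                    active = true;
                }
            }
            sum = newSum;
            ptr = newPtr;
        }
        return sum;
    }
}
```

With forward pointers, each node ends holding the sum from itself to the end of the list; pointing the list the other way yields the prefix (preceding-node) sums described above.&lt;br /&gt;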
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search, update, and insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversal still require visiting the full sequence of nodes to generate all output.  This offers an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for the serial tree traversal algorithms (written in Ada), with behavior as shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
with Ada.Unchecked_Deallocation;&lt;br /&gt;
with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is.  This is shown in the figure below.  Since we are using 4 processors, we only need 3 nodes in common, because one node can have two branches.  If the size of the tree and the number of processors were increased, the number of shared nodes would also increase to support the larger number of subtrees.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate C algorithm detailed above.&lt;br /&gt;
   3. The processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
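The steps above can be sketched in Java, with threads standing in for separate processors.  This is a minimal illustration rather than code from the referenced source; the Node class and all names are hypothetical.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class Node {
    int data;
    Node left, right;
    Node(int d, Node l, Node r) { data = d; left = l; right = r; }
}

public class ParallelPreorder {
    // Plain serial preorder, run by each worker on its private subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Step 1: visit the shared parent part (here just the root) serially.
    // Step 2: hand each independent subtree to its own worker.
    // Step 3: use each worker's result as if it had been computed serially.
    public static List<Integer> parallelPreorder(Node root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Integer> result = new ArrayList<>();
        result.add(root.data); // the shared "tip" of the tree
        Future<List<Integer>> left = pool.submit(() -> {
            List<Integer> l = new ArrayList<>();
            preorder(root.left, l);
            return l;
        });
        Future<List<Integer>> right = pool.submit(() -> {
            List<Integer> r = new ArrayList<>();
            preorder(root.right, r);
            return r;
        });
        result.addAll(left.get());  // the single synchronization point,
        result.addAll(right.get()); // at the very end
        pool.shutdown();
        return result;
    }
}
```

Note that the only synchronization is the pair of Future.get() calls at the end, matching the claim that the subtree traversals proceed independently.&lt;br /&gt;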
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-&lt;br /&gt;
processor data transmission adds a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at a rate of 2^n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First C algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the Breadth-First C algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
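Steps 1–5 above can be sketched in Java, with each worker owning one subtree and the coordinator merging the per-generation results in order.  This is a simplified illustration (thread pool in place of processors; all names hypothetical), not code from the referenced source.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TNode {
    int data;
    TNode left, right;
    TNode(int d, TNode l, TNode r) { data = d; left = l; right = r; }
}

public class ParallelLevelOrder {
    // Advance one worker's frontier by a single generation.
    static List<TNode> nextGeneration(List<TNode> frontier) {
        List<TNode> next = new ArrayList<>();
        for (TNode n : frontier) {
            if (n.left != null) next.add(n.left);
            if (n.right != null) next.add(n.right);
        }
        return next;
    }

    // The root (the shared parent part) is handled by the coordinator;
    // each subtree belongs to one worker.  After every generation the
    // per-worker frontiers are merged in order (steps 3-5).
    public static List<Integer> traverse(TNode root) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Integer> out = new ArrayList<>();
        out.add(root.data);
        List<List<TNode>> frontiers = new ArrayList<>();
        if (root.left != null) frontiers.add(List.of(root.left));
        if (root.right != null) frontiers.add(List.of(root.right));
        while (frontiers.stream().anyMatch(f -> !f.isEmpty())) {
            for (List<TNode> f : frontiers)     // merge this generation in order
                for (TNode n : f) out.add(n.data);
            List<Future<List<TNode>>> futures = new ArrayList<>();
            for (List<TNode> f : frontiers)     // advance each worker in parallel
                futures.add(pool.submit(() -> nextGeneration(f)));
            List<List<TNode>> next = new ArrayList<>();
            for (Future<List<TNode>> fu : futures) next.add(fu.get());
            frontiers = next;
        }
        pool.shutdown();
        return out;
    }
}
```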
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing, as described in the LDS chapter of the Solihin book. Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, where vast amounts of data must pass through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list is created with all values that have been placed in that particular bucket.  The developer must not only locate the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
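As a concrete illustration of chaining, the following minimal sketch (hypothetical names, not from the cited source) stores colliding entries in a per-bucket linked list:&lt;br /&gt;

```java
import java.util.LinkedList;

public class ChainedHashTable {
    static class Entry {
        String key;
        int value;
        Entry(String k, int v) { key = k; value = v; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // The special hashing function that picks a bucket for each key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        for (Entry e : buckets[bucketFor(key)])
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        buckets[bucketFor(key)].add(new Entry(key, value));      // chain on collision
    }

    // Lookup must locate the bucket AND walk its chain.
    Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)])
            if (e.key.equals(key)) return e.value;
        return null;
    }
}
```

With only 2 buckets, several of the keys are forced into the same chain, yet each lookup still finds its own entry.&lt;br /&gt;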
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
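The two-level scheme described above might be sketched in Java as follows.  This is an assumed simplification of the patented design, using intrinsic locks and hypothetical names; the real design uses signature-pointer pairs and lightweight flags rather than monitor locks.&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;

public class TwoLevelLockedTable {
    private final Object tableLock = new Object(); // the single table-level lock
    private final Object[] bucketLocks;            // one lock per bucket
    private final Map<String, Integer>[] buckets;

    @SuppressWarnings("unchecked")
    TwoLevelLockedTable(int nBuckets) {
        bucketLocks = new Object[nBuckets];
        buckets = new HashMap[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            bucketLocks[i] = new Object();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int b = bucketFor(key);
        Object lock;
        synchronized (tableLock) {  // held just long enough to pick up
            lock = bucketLocks[b];  // the bucket lock
        }
        synchronized (lock) {       // other threads may enter other buckets now
            buckets[b].put(key, value);
        }
    }

    Integer get(String key) {
        int b = bucketFor(key);
        Object lock;
        synchronized (tableLock) { lock = bucketLocks[b]; }
        synchronized (lock) { return buckets[b].get(key); }
    }
}
```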
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block, and will typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, although it may not see updates that are still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() rejects a null expected value, so first inserts use putIfAbsent()&lt;br /&gt;
    } while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
                            : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code in order to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used with concurrent code: the value is changed only if the key is still mapped to the expected previous value; otherwise the loop retries.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block required&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of nodes (or vertices) together with a set of ordered pairs of those nodes, called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to other vertices using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the traversal begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes coloring schemes and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unvisited elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-search step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches onto 3 different processors in parallel.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
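A level-synchronous parallel BFS along these lines can be sketched in Java.  This sketch substitutes parallel streams and a ConcurrentHashMap for explicit processors and send/receive operations; the graph, city names, and all identifiers are hypothetical, not taken from the referenced paper.&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Computes the BFS distance of every reachable vertex from the source.
    // Each frontier (a "regional set" of vertices at the same distance) is
    // expanded in parallel; putIfAbsent claims each vertex exactly once
    // without any explicit lock.
    public static Map<String, Integer> distances(Map<String, List<String>> adj,
                                                 String source) {
        ConcurrentMap<String, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        List<String> frontier = List.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int nd = d + 1;
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(v -> dist.putIfAbsent(v, nd) == null) // first claim wins
                .collect(Collectors.toList());
            d = nd; // implicit per-level synchronization, as in the steps above
        }
        return dist;
    }
}
```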
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83722</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83722"/>
		<updated>2014-02-26T05:13:23Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed there can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
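The copy-scan steps above can be simulated sequentially in Java; on a real machine the copies at each stride would run in parallel.  The class and method names are hypothetical.&lt;br /&gt;

```java
public class CopyScan {
    // Broadcast row 0 to every row in log2(n) doubling steps: after step s,
    // rows [0, 2^s) hold the data, and each copies it to the row 2^s below.
    public static int[][] broadcastRow0(int[][] grid) {
        int n = grid.length;
        for (int stride = 1; stride < n; stride *= 2) {
            for (int r = 0; r < stride && r + stride < n; r++) {
                grid[r + stride] = grid[r].clone(); // conceptually parallel copies
            }
        }
        return grid;
    }
}
```

After the broadcast, every row holds a copy of row 0, so a reduction (for a sum, etc.) can proceed with one element per processor.&lt;br /&gt;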
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to perform partial sums of a linked list. This is accomplished by adding each node's value to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and those of all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
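The partial-sum example above can be simulated in Java with arrays standing in for per-processor nodes.  This is a sketch under the assumption that each node points to its predecessor (index -1 marking the head), so each node accumulates the values of all preceding nodes; all names are hypothetical.&lt;br /&gt;

```java
public class PointerDoubling {
    // value[i] holds node i's value; next[i] is the index node i points to
    // (-1 when the chain has run out).  Each round, every node adds the
    // value at its pointer and then doubles the pointer (jumps two steps),
    // so the list collapses in log(n) rounds.
    public static int[] partialSums(int[] value, int[] next) {
        int[] val = value.clone();
        int[] nxt = next.clone();
        boolean active = true;
        while (active) {
            active = false;
            int[] newVal = val.clone(); // all updates in a round are
            int[] newNxt = nxt.clone(); // conceptually simultaneous
            for (int i = 0; i < val.length; i++) {
                if (nxt[i] != -1) {
                    newVal[i] = val[i] + val[nxt[i]]; // add the pointed-to value
                    newNxt[i] = nxt[nxt[i]];          // pointer doubling
                    active = true;
                }
            }
            val = newVal;
            nxt = newNxt;
        }
        return val;
    }
}
```

For the list 1 → 2 → 3 → 4 this yields the running sums 1, 3, 6, 10 after two rounds.&lt;br /&gt;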
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure can occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, and post-order traversal still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (in Ada) for the serial tree traversal algorithms,&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is a natural candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  The only duplicated data that must be kept on all processors is the small tip of the tree that is the common ancestor of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors need to hold only n – 1 nodes in common, no matter how big the tree itself is.&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows at 2^n for an n-generation tree, while the processors need to synchronize only once, at the end, so the parallelizable portion approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you reach a node that is present on only one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
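&lt;br /&gt;
The steps above can be sketched with Java's Fork/Join framework, which maps the independent subtrees onto worker threads rather than physical processors.  This is an illustrative sketch, not the reference's implementation; the class and method names (ParallelPreorder, PreorderTask, sample) are our own.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {

    static class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Each subtree is independent, so its traversal runs as a separate task;
    // joining the subtasks in order reassembles the sequential preorder result.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        final Node n;
        PreorderTask(Node n) { this.n = n; }

        @Override
        protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            out.add(n.data);                           // visit the root first
            PreorderTask left = new PreorderTask(n.left);
            PreorderTask right = new PreorderTask(n.right);
            left.fork();                               // left subtree on another worker
            List<Integer> rightPart = right.compute(); // right subtree on this worker
            out.addAll(left.join());                   // merge results in preorder
            out.addAll(rightPart);
            return out;
        }
    }

    static List<Integer> preorder(Node root) {
        return new ForkJoinPool().invoke(new PreorderTask(root));
    }

    // The same tree the serial example builds: 1(2(4(7,_),5), 3(6(8,9),_))
    static Node sample() {
        return new Node(1,
            new Node(2,
                new Node(4, new Node(7, null, null), null),
                new Node(5, null, null)),
            new Node(3,
                new Node(6, new Node(8, null, null), new Node(9, null, null)),
                null));
    }

    public static void main(String[] args) {
        System.out.println("preorder: " + preorder(sample()));
    }
}
```
Because each subtask produces its own ordered list and the joins happen in traversal order, no locks are needed; the only synchronization is the join at the end of each task.&lt;br /&gt;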
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same near-100% code parallelization as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2^n while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of these traversals also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you reach a node that is present on only one processor, ask that processor to execute the breadth-first serial algorithm detailed above,&lt;br /&gt;
      but have it wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the breadth-first serial algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while it is edited, as described in the LDS chapter of the Solihin book.  A global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since vast amounts of data can be looked up through the hash table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be resolved in some fashion to obtain the correct value.  One approach that is relevant to linked-list structures is a chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
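&lt;br /&gt;
A chained hash table of this kind can be sketched in a few lines of Java.  This is a minimal illustration; the class name ChainedHashTable and the String-to-int mapping are our own choices, not from the reference.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.LinkedList;

public class ChainedHashTable {

    // One entry in a bucket's chain.
    static class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    // Each bucket holds a linked list ("chain") of all entries whose keys hash to it.
    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // The hashing function determines the bucket for a key.
    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(String key, int value) {
        LinkedList<Entry> chain = buckets[bucketFor(key)];
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // key already chained: update
        }
        chain.add(new Entry(key, value)); // new key (or collision): append to the chain
    }

    public Integer get(String key) {
        for (Entry e : buckets[bucketFor(key)]) { // walk the chain for this bucket
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }
}
```
Lookup cost stays near-constant as long as the hashing function keeps the chains short.&lt;br /&gt;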
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock for the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) system supports very high rates of concurrent operations (e.g., insert, delete,&lt;br /&gt;
and lookup) while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected&lt;br /&gt;
via a bus structure.&lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
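&lt;br /&gt;
The two-level locking scheme described above can be sketched as follows.  This is a simplified illustration using java.util.concurrent locks; the real patented implementation stores signature-pointer pairs in bucket nodes rather than the plain counters used here, and the class name TwoLevelLockTable is ours.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable {

    // The table lock is held only long enough to acquire a bucket lock;
    // once released, other threads may work on any of the other buckets.
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks;
    private final int[] bucketData; // stand-in for each bucket's chain of entries

    public TwoLevelLockTable(int buckets) {
        bucketLocks = new ReentrantLock[buckets];
        for (int i = 0; i < buckets; i++) bucketLocks[i] = new ReentrantLock();
        bucketData = new int[buckets];
    }

    private ReentrantLock acquireBucket(int b) {
        tableLock.lock();              // table lock: serializes entry to the table...
        ReentrantLock bucket = bucketLocks[b];
        bucket.lock();                 // ...just long enough to grab the bucket lock
        tableLock.unlock();            // released immediately; other buckets now accessible
        return bucket;
    }

    public void increment(int key) {
        int b = Math.floorMod(key, bucketLocks.length);
        ReentrantLock bucket = acquireBucket(b);
        try {
            bucketData[b]++;           // bucket-level critical section
        } finally {
            bucket.unlock();
        }
    }

    public int get(int key) {
        int b = Math.floorMod(key, bucketLocks.length);
        ReentrantLock bucket = acquireBucket(b);
        try {
            return bucketData[b];
        } finally {
            bucket.unlock();
        }
    }
}
```
Threads targeting different buckets contend only briefly on the table lock, so bucket-level operations proceed concurrently.&lt;br /&gt;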
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good deal of synchronization, as each processor would need to wait whenever a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a poor solution to this problem, because processors need to run very quickly and having to wait for locks would destroy application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  Its fundamental weakness in a multithreaded setting is that the entire map must be synchronized prior to each access, which causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code uses an ordinary HashMap data structure.  Notice the synchronized keyword, which signifies that only one thread can enter this function at any point in time.  With a very large number of threads, however, waiting to enter the synchronized operation can become a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need the synchronizedMap function available in the Collections class.  Notice also that once the iteration begins, we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is to make sure any update/insert/delete sequence has completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never lock and typically run in parallel with updates/deletes; a retrieval reflects the most recently completed update, even if it cannot see updates that have not yet finished.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() rejects a null expected value, so the first insert must use putIfAbsent()&lt;br /&gt;
    } while (oldVal == null ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
                            : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial option presented in the previous section, avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we need extra code to handle the fact that several inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is replaced only if the current mapping still equals the previously read value; otherwise the loop retries.  This is much more efficient than locking the entire function, since conflicting updates are usually rare.&lt;br /&gt;
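&lt;br /&gt;
On Java 8 and later, the retry loop can be avoided entirely: ConcurrentMap.merge() performs the whole read-modify-write atomically for a single key.  A minimal sketch (the class name QueryCounter and the count() accessor are our own):&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCounter {

    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    // merge() performs the read-modify-write atomically per key,
    // replacing the manual compare-and-set retry loop.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```
Internally the map still retries on contention, but the loop is hidden inside the library and tuned for the structure.&lt;br /&gt;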
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a finite set of ordered pairs of those vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, find the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a list of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the graph begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  It uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of using locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  Using the above example, instead of processing Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split all three searches out onto three different processors in parallel.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following figure also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor.  Such a set is called a regional set of vertices.  The goal is to find the shortest path connecting each region.&lt;br /&gt;
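&lt;br /&gt;
A level-synchronous version of this idea can be sketched in Java using parallel streams: every vertex of the current frontier is expanded concurrently, and putIfAbsent() on a ConcurrentHashMap marks each vertex visited exactly once without a global lock.  This is an illustrative sketch assuming Java 9+; the class name LevelSyncBFS and the integer-keyed adjacency map are our own.&lt;br /&gt;
&lt;br /&gt;
```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class LevelSyncBFS {

    // Level-synchronous BFS: every vertex of the current frontier is expanded
    // in parallel, and the threads synchronize once per level when the next
    // frontier is collected.
    public static Map<Integer, Integer> distances(Map<Integer, List<Integer>> adj, int source) {
        Map<Integer, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        Set<Integer> frontier = Set.of(source);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            // putIfAbsent is atomic, so even when two threads reach the same
            // vertex at the same level, only one of them claims it.
            frontier = frontier.parallelStream()
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(v -> dist.putIfAbsent(v, next) == null)
                .collect(Collectors.toSet());
            d = next;
        }
        return dist;
    }
}
```
The per-level synchronization point corresponds to the "wait after one generation" step in the tree traversal scheme above.&lt;br /&gt;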
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent 6,578,131, 2003.&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83721</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83721"/>
		<updated>2014-02-26T03:26:35Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. This often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
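The reduction step mentioned above can be sketched sequentially; in this minimal simulation (class name and array values are illustrative, not from the source), every pass of the inner loop performs additions on disjoint elements, which is exactly the work that real processors would execute concurrently:&lt;br /&gt;

```java
public class CopyScanReduction {
    // Log-step tree reduction: after log2(n) passes, data[0] holds the total.
    // Each pass's additions touch disjoint elements, so on a real parallel
    // machine all additions within one pass would execute simultaneously.
    public static int parallelSum(int[] a) {
        int[] data = a.clone();
        int n = data.length;                  // assumed to be a power of two
        for (int stride = 1; stride < n; stride *= 2) {
            for (int i = 0; i + stride < n; i += 2 * stride) {
                data[i] += data[i + stride];  // independent pairwise additions
            }
        }
        return data[0];
    }
}
```

For an 8-element array this takes 3 passes rather than 7 sequential additions, mirroring the log(n) behavior of copy-scan.&lt;br /&gt;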
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to, while the pointers double on each step.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of itself and all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums1.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums2.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums3.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:PartialSums4.png]]&lt;br /&gt;
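The partial-sums operation shown in the figures can be sketched with the list encoded as parallel arrays (an illustrative encoding, not from the source).  Each round is double-buffered so that every node reads the previous round's state, mimicking the lock-step behavior of one processor per node.  Because the pointers in this sketch run toward the tail, each node ends up holding the sum of itself and every node after it; running the same scheme over backward pointers yields the preceding-node sums described above:&lt;br /&gt;

```java
public class PointerDoubling {
    // List encoded as parallel arrays: next[i] is the index node i points to,
    // or -1 once the pointer has run off the end of the list.
    public static int[] partialSums(int[] value, int[] next) {
        int n = value.length;
        int[] sum = value.clone();
        int[] nxt = next.clone();
        boolean anyLive = true;
        while (anyLive) {
            anyLive = false;
            int[] newSum = sum.clone();   // double-buffer: every node reads the
            int[] newNext = nxt.clone();  // previous round's state, in lock step
            for (int i = 0; i < n; i++) { // conceptually, one processor per node
                if (nxt[i] != -1) {
                    newSum[i] = sum[i] + sum[nxt[i]]; // add the pointed-to node's value
                    newNext[i] = nxt[nxt[i]];         // then double the pointer
                    anyLive = true;
                }
            }
            sum = newSum;
            nxt = newNext;
        }
        return sum;
    }
}
```

With values {1, 2, 3, 4} linked in order, the result is {10, 9, 7, 4} after only two rounds, since the number of rounds grows logarithmically in the list length.&lt;br /&gt;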
&lt;br /&gt;
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (written in Ada) for serial tree traversal algorithms&lt;br /&gt;
whose behavior the figure below illustrates:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
with Ada.Unchecked_Deallocation;&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n – 1 nodes in common – no matter how big the tree itself is.&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the parallelizable portion of these traversals grows at 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallel fraction approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. That processor will return a result that can be used exactly as if it came from a serial program.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a Breadth-First traversal can achieve the same 100% speedup as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows at a rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a rate of n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but have it wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
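On a shared-memory machine, this lock-free divide-and-conquer traversal can be sketched with Java's fork/join framework standing in for separate processors (class and method names here are illustrative, not from the source): each subtree is traversed as an independent task, and the results are simply concatenated in pre-order after joining:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    public static class Node {
        public final int data;
        public final Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // One task per subtree: subtrees share no data, so no locking is needed.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node node;
        PreorderTask(Node node) { this.node = node; }

        @Override
        protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (node == null) return out;
            PreorderTask left = new PreorderTask(node.left);
            left.fork();                                // left subtree on another worker
            List<Integer> rightResult = new PreorderTask(node.right).compute();
            out.add(node.data);                         // visit the root first (pre-order)
            out.addAll(left.join());                    // the only synchronization point
            out.addAll(rightResult);
            return out;
        }
    }

    public static List<Integer> preorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new PreorderTask(root));
    }
}
```

Running this on the nine-node tree from the serial example yields the same 1 2 4 7 5 3 6 8 9 ordering, with the joins supplying the end-of-traversal synchronization.&lt;br /&gt;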
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing, as described in the LDS chapter of the Solihin book.  Either a global lock approach, a fine-grained lock approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, since they can move a vast amount of data through the table with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created holding all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
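The two-level locking scheme can be sketched as follows.  This is a simplified illustration rather than Larson et al.'s actual implementation; the class name, fixed bucket count, and use of ReentrantLock objects in place of their lightweight flags are assumptions of the sketch.  The table lock is held only long enough to reach the per-bucket lock, so threads working on different buckets proceed concurrently:&lt;br /&gt;

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLevelLockTable<K, V> {
    private static final int BUCKETS = 16;
    private final Object tableLock = new Object();              // high-level table lock
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    private final LinkedList<Map.Entry<K, V>>[] buckets;        // chained buckets

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable() {
        buckets = new LinkedList[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            buckets[i] = new LinkedList<>();
            bucketLocks[i] = new ReentrantLock();               // per-bucket lock
        }
    }

    private int bucketOf(Object key) { return Math.floorMod(key.hashCode(), BUCKETS); }

    // The table lock is held only while locating the bucket lock, then released,
    // so other threads may immediately operate on any of the other buckets.
    private ReentrantLock acquireBucketLock(Object key) {
        ReentrantLock lock;
        synchronized (tableLock) {
            lock = bucketLocks[bucketOf(key)];
        }
        lock.lock();
        return lock;
    }

    public void put(K key, V value) {
        ReentrantLock lock = acquireBucketLock(key);
        try {
            for (Map.Entry<K, V> e : buckets[bucketOf(key)]) {
                if (e.getKey().equals(key)) { e.setValue(value); return; }
            }
            buckets[bucketOf(key)].add(new SimpleEntry<>(key, value));
        } finally {
            lock.unlock();
        }
    }

    public V get(K key) {
        ReentrantLock lock = acquireBucketLock(key);
        try {
            for (Map.Entry<K, V> e : buckets[bucketOf(key)]) {
                if (e.getKey().equals(key)) return e.getValue();
            }
            return null;
        } finally {
            lock.unlock();
        }
    }
}
```

In the design described above, the bucket-level locks are single flag bits rather than full lock objects, which keeps the memory overhead at one real lock regardless of table size.&lt;br /&gt;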
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never lock, so they typically run in parallel alongside updates and deletes, while updates lock only a small portion of the map at a time.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see updates that are still in flight.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    for (;;) {&lt;br /&gt;
      Integer oldVal = queryCounts.get(q);&lt;br /&gt;
      if (oldVal == null) {&lt;br /&gt;
        if (queryCounts.putIfAbsent(q, 1) == null) return; // first count for q&lt;br /&gt;
      } else if (queryCounts.replace(q, oldVal, oldVal + 1)) {&lt;br /&gt;
        return; // compare-and-set style replace succeeded&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the previously observed value, and the loop retries otherwise.  This is much more efficient than locking the entire function, since such collisions are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
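Such a vertex can be sketched directly, with a plain linked list standing in for the internal pointer chain (class and field names are illustrative):&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;

public class GraphNode {
    public final String name;
    // linked list of pointers to the directly reachable neighboring nodes
    public final List<GraphNode> edges = new LinkedList<>();

    public GraphNode(String name) { this.name = name; }

    // record a directed edge from this node to other
    public void connect(GraphNode other) { edges.add(other); }
}
```

A path search then amounts to walking these per-node edge lists, which is exactly what the traversal algorithms below do.&lt;br /&gt;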
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of vertices called edges or arcs.  Given a vertex, one would typically want to enumerate the different paths to other vertices using its list of edges or, more likely, find the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major application of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart in Germany.  In doing so, the traversal begins at a root node (Frankfurt) and expands outward to all connected nodes at each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-first step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Basel on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if(vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if(dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how now each of the regional sets of vertices being search can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
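This regional, level-by-level expansion can be sketched on a shared-memory machine with Java parallel streams (an illustrative adaptation, not the paper's message-passing formulation; the graph encoding and class name are assumptions).  All vertices in the current frontier are expanded at once, putIfAbsent on a ConcurrentHashMap atomically claims each unvisited vertex without explicit locks, and the end of each stream pass acts as the per-level synchronization point:&lt;br /&gt;

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class LevelSyncBFS {
    /** Returns the distance from source to every reachable vertex. */
    public static Map<Integer, Integer> bfs(Map<Integer, List<Integer>> graph, int source) {
        Map<Integer, Integer> dist = new ConcurrentHashMap<>();
        dist.put(source, 0);
        Set<Integer> frontier = Set.of(source);   // the current "regional set"
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            frontier = frontier.parallelStream()                      // one vertex per worker
                    .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                    .filter(v -> dist.putIfAbsent(v, next) == null)   // atomic first-visit claim
                    .collect(Collectors.toSet());                     // implicit level barrier
            d = next;
        }
        return dist;
    }
}
```

Duplicate discoveries of the same vertex within a level are harmless: only one putIfAbsent call succeeds, and both would assign the same distance anyway.&lt;br /&gt;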
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent number 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums4.png&amp;diff=83720</id>
		<title>File:PartialSums4.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums4.png&amp;diff=83720"/>
		<updated>2014-02-26T03:24:53Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums3.png&amp;diff=83719</id>
		<title>File:PartialSums3.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums3.png&amp;diff=83719"/>
		<updated>2014-02-26T03:24:46Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums2.png&amp;diff=83718</id>
		<title>File:PartialSums2.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums2.png&amp;diff=83718"/>
		<updated>2014-02-26T03:24:39Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums1.png&amp;diff=83717</id>
		<title>File:PartialSums1.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:PartialSums1.png&amp;diff=83717"/>
		<updated>2014-02-26T03:24:21Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83716</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83716"/>
		<updated>2014-02-26T03:21:35Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
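The four steps above can be simulated serially: each round doubles the number of processors holding a copy of row 1, so P processors are covered in log2(P) rounds. A minimal sketch (the array contents and processor count are illustrative):&lt;br /&gt;

```java
public class CopyScan {
    // rows[p] models the private memory of processor p; only rows[0] starts filled
    static int[][] broadcast(int[] row0, int p) {
        int[][] rows = new int[p][];
        rows[0] = row0.clone();
        // each round, every processor that already holds the row sends it one
        // "stride" further, so the number of copies doubles per round
        for (int stride = 1; stride < p; stride *= 2)
            for (int src = 0; src < stride && src + stride < p; src++)
                rows[src + stride] = rows[src].clone();  // parallel sends on a real machine
        return rows;
    }
}
```

After ceil(log2(p)) rounds every processor has its own copy and the reduction (e.g., a sum) can proceed in parallel.&lt;br /&gt;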
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
One of the uses for pointer doubling is to compute partial sums of a linked list.  This is accomplished by adding the value held by each node to the value stored in the node it points to.  This is repeated until all pointers have reached the end of the list.  The result is a linked list in which each node contains the sum of its own value and the values of all preceding nodes.  Below is an example of this operation in action.&lt;br /&gt;
&lt;br /&gt;
[[File:linkedlist.gif]]&lt;br /&gt;
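The partial-sum operation can be sketched with arrays standing in for the list: next[i] points at node i's predecessor, and each synchronous round both adds in the pointed-to value and doubles the pointer, so log2(n) rounds suffice. A serial simulation of the synchronous steps (node values are illustrative):&lt;br /&gt;

```java
public class PointerDoubling {
    // val[i] is node i's value; next[i] is the node i points to (-1 = end of list).
    // With next[i] = i-1, the result is val[i] = sum of node i and all preceding nodes.
    static int[] partialSums(int[] values) {
        int n = values.length;
        int[] val = values.clone();
        int[] next = new int[n];
        for (int i = 0; i < n; i++) next[i] = i - 1;
        boolean active = true;
        while (active) {                       // O(log n) rounds
            active = false;
            int[] v = val.clone();             // snapshots mimic a synchronous
            int[] nx = next.clone();           // parallel step (read before write)
            for (int i = 0; i < n; i++) {      // each i on its own processor
                if (nx[i] != -1) {
                    val[i] = v[i] + v[nx[i]];  // add the pointed-to node's value
                    next[i] = nx[nx[i]];       // pointer doubling
                    active = true;
                }
            }
        }
        return val;
    }
}
```

The snapshot arrays are what make the serial simulation faithful: on a real parallel machine every processor reads the previous round's values before anyone writes.&lt;br /&gt;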
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure could occur during the traversal process.  Even though search/update/insert can occur in a logarithmic fashion, traversal operations such as in-order, pre-order, post-order traversals can still require a full sequence of the list to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms,&lt;br /&gt;
whose behavior the figure below illustrates:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. &lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
as 2^n for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so it approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it had been produced serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
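The steps above can be sketched by keeping the root (the shared &amp;quot;tip&amp;quot; of the tree) on one processor and handing each independent subtree to its own thread; the per-thread results are then spliced together exactly as in a serial preorder. A minimal sketch (the Node class and thread layout are illustrative assumptions, not the cited implementation):&lt;br /&gt;

```java
public class ParallelPreorder {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // serial preorder of one subtree, run entirely on one processor
    static String preorder(Node n) {
        if (n == null) return "";
        return n.data + " " + preorder(n.left) + preorder(n.right);
    }

    // visit the shared tip (the root), then traverse each subtree concurrently
    static String parallelPreorder(Node root) {
        final String[] parts = new String[2];
        Thread l = new Thread(() -> parts[0] = preorder(root.left));
        Thread r = new Thread(() -> parts[1] = preorder(root.right));
        l.start(); r.start();
        try { l.join(); r.join(); }            // single synchronization at the end
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return (root.data + " " + parts[0] + parts[1]).trim();
    }
}
```

Because the two subtrees share no nodes, no locks are needed; the only coordination is the final join, which matches the claim that these traversals synchronize once.&lt;br /&gt;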
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
speedup of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases the size of the &lt;br /&gt;
generations grows at the rate of 2n while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial breadth-first algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial breadth-first algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing, as per the LDS chapter of the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as it involves passing a vast amount of data through the hash table using as few indirections in the storage structure as possible.   &lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created from all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
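To make the collision handling concrete, here is a minimal chained bucket lookup (the class, its int-keyed layout, and the modulo hash are illustrative assumptions): hashing picks the bucket, and the chain in that bucket is then walked linearly.&lt;br /&gt;

```java
public class ChainedHash {
    static class Entry { final int key, value; final Entry next;
        Entry(int k, int v, Entry n) { key = k; value = v; next = n; } }

    final Entry[] buckets;
    ChainedHash(int size) { buckets = new Entry[size]; }

    int bucketFor(int key) { return Math.floorMod(key, buckets.length); }

    void put(int key, int value) {           // prepend to the bucket's chain
        int b = bucketFor(key);
        buckets[b] = new Entry(key, value, buckets[b]);
    }

    Integer get(int key) {                   // on a collision, walk the chain
        for (Entry e = buckets[bucketFor(key)]; e != null; e = e.next)
            if (e.key == key) return e.value;
        return null;
    }
}
```

With 4 buckets, keys 1 and 5 hash to the same bucket, so get() must walk past one entry to find the other; this chain walk is exactly the extra consideration the paragraph describes.&lt;br /&gt;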
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used just for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
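The two-level locking scheme can be sketched as follows: the table lock is held only long enough to pick up a bucket's lock, after which threads touching different buckets proceed concurrently. This is an illustrative reduction of the patented design, not its actual code; the class and member names are assumptions.&lt;br /&gt;

```java
public class TwoLevelLockTable {
    static class Node { final int key; int value; Node next;
        Node(int k, int v, Node n) { key = k; value = v; next = n; } }

    private final Object tableLock = new Object();   // global, held very briefly
    private final Object[] bucketLocks;              // one lightweight lock per bucket
    private final Node[] buckets;

    TwoLevelLockTable(int size) {
        buckets = new Node[size];
        bucketLocks = new Object[size];
        for (int i = 0; i < size; i++) bucketLocks[i] = new Object();
    }

    void insert(int key, int value) {
        int b = Math.floorMod(key, buckets.length);
        Object lock;
        synchronized (tableLock) {       // table lock: just long enough to pick
            lock = bucketLocks[b];       // up the bucket lock, then released
        }
        synchronized (lock) {            // bucket lock: serializes only this chain
            buckets[b] = new Node(key, value, buckets[b]);
        }
    }

    Integer lookup(int key) {
        int b = Math.floorMod(key, buckets.length);
        Object lock;
        synchronized (tableLock) { lock = bucketLocks[b]; }
        synchronized (lock) {
            for (Node n = buckets[b]; n != null; n = n.next)
                if (n.key == key) return n.value;
            return null;
        }
    }
}
```

Once the table lock is released, another thread is free to acquire any other bucket's lock, which is the concurrency property the text describes.&lt;br /&gt;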
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait in case a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a bad solution to this problem as processors need to run very quickly.  Having to wait for locks would destroy the application processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  This structure provides full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not lock, and will typically run in parallel with updates and deletes.  A retrieval reflects the most recently completed update, even if it cannot see values whose updates have not yet finished.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
      // replace() throws NullPointerException for a null expected value,&lt;br /&gt;
      // so the first insertion must go through putIfAbsent() instead&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place when using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must add some code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is replaced only if it is still mapped to the expected previous value, and the loop retries whenever another thread won the race.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of ordered pairs of those nodes, called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-search step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In network routing applications, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
      dv = D[v];&lt;br /&gt;
      if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
        vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
        if(vis == 0) // v is added to a stack only once&lt;br /&gt;
          D[v] = d+1;&lt;br /&gt;
          pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
      if(dv == d + 1)&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
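The fetch_and_add idiom above can be sketched in ordinary Java using the java.util.concurrent.atomic package; the fragment below is illustrative (the class name LevelExpander and its API are ours, not from the cited paper), showing how getAndIncrement lets exactly one thread claim each vertex without a conventional lock:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

// getAndIncrement is Java's fetch_and_add: only the first thread to touch a
// vertex sees the old value 0, so each vertex is claimed exactly once.
public class LevelExpander {
    private final AtomicIntegerArray visited;

    public LevelExpander(int numVertices) {
        visited = new AtomicIntegerArray(numVertices);
    }

    // Returns true only for the single caller that claims vertex v first.
    public boolean claim(int v) {
        return visited.getAndIncrement(v) == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        LevelExpander ex = new LevelExpander(1);
        List<Boolean> results = new ArrayList<>();
        Runnable r = () -> {
            boolean won = ex.claim(0);
            synchronized (results) { results.add(won); }
        };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Exactly one of the two racing threads claimed vertex 0.
        System.out.println(results.stream().filter(b -> b).count()); // prints 1
    }
}
```

Note that each atomic claim still serializes the two racing threads for that one vertex, which is exactly the contention the pseudocode marks with '''LOCK'''.&lt;br /&gt;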
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
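As a concrete illustration (a shared-memory sketch, not the message-passing implementation from the cited paper), a level-synchronous BFS can be written with Java parallel streams; an AtomicIntegerArray plays the role of the fetch_and_add visited flags, and the only synchronization is the implicit barrier between levels:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Level-synchronous BFS sketch: all vertices of the current frontier are
// expanded in parallel; threads only synchronize between levels.
public class ParallelBFS {
    public static int[] distances(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        java.util.Arrays.fill(dist, -1);
        AtomicIntegerArray visited = new AtomicIntegerArray(n);
        visited.set(source, 1);
        dist[source] = 0;
        List<Integer> frontier = List.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            int d = ++level;
            ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
            frontier.parallelStream().forEach(u -> {
                for (int v : adj.get(u)) {
                    // Atomic test-and-set: only the first thread to reach v
                    // records its distance and adds it to the next frontier.
                    if (visited.getAndSet(v, 1) == 0) {
                        dist[v] = d;
                        next.add(v);
                    }
                }
            });
            frontier = new ArrayList<>(next);
        }
        return dist;
    }
}
```

Each level corresponds to one "regional set" of vertices at the same distance from the source.&lt;br /&gt;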
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83649</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83649"/>
		<updated>2014-02-25T03:46:32Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) consist of several types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic of reading a node and discovering the other nodes it points to. Hence, it often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
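The copy-scan steps above can be simulated in plain Java; each array row stands in for a processor, and the inner loop marks the work that would run in parallel on a real data-parallel machine (the class name CopyScan is ours):&lt;br /&gt;

```java
// Simulation of copy-scan: in each step, every row that already holds a copy
// sends it to the row "step" positions away, so all numRows rows are filled
// in ceil(log2(numRows)) steps rather than numRows - 1.
public class CopyScan {
    public static int[][] broadcastRow(int[] row, int numRows) {
        int[][] grid = new int[numRows][];
        grid[0] = row.clone();
        int have = 1;                 // rows that currently hold a copy
        while (have < numRows) {
            int step = have;
            for (int i = 0; i < have && i + step < numRows; i++) {
                grid[i + step] = grid[i].clone();  // parallel on a real machine
            }
            have = Math.min(2 * have, numRows);
        }
        return grid;
    }
}
```

Once every row holds a copy, a reduction (e.g., a parallel sum) can proceed on all rows at once.&lt;br /&gt;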
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
[[File:linkedlist.gif]]&lt;br /&gt;
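Pointer doubling can likewise be sketched in Java; here next[i] holds the index each node points to (a node at the end of the chain points to itself), and each round replaces every pointer with the pointer two hops away, so every node reaches the end of the chain in a logarithmic number of rounds (again a simulation of the PRAM technique, not hardware code):&lt;br /&gt;

```java
// Pointer-jumping sketch: each round squares the pointers, halving the
// distance from every node to the end of the chain.
public class PointerDoubling {
    public static int[] jumpToEnd(int[] next) {
        int[] p = next.clone();
        for (int round = 0; round < 32 && !stable(p); round++) {
            int[] q = new int[p.length];
            for (int i = 0; i < p.length; i++) {
                q[i] = p[p[i]];       // done for every i in parallel on a PRAM
            }
            p = q;
        }
        return p;
    }

    // The process is finished once no pointer moves when followed twice.
    private static boolean stable(int[] p) {
        for (int i = 0; i < p.length; i++)
            if (p[p[i]] != p[i]) return false;
        return true;
    }
}
```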
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order and post-order traversals still require visiting every node to generate the full output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] for serial tree traversal algorithms&lt;br /&gt;
with behavior as the figure below shows:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
with Ada.Unchecked_Deallocation;&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. &lt;br /&gt;
The fact that trees are comprised of independent sub-trees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the portion of these traversals that is parallelizable grows &lt;br /&gt;
as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so it approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law], &lt;br /&gt;
footnote 1).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result that can be used exactly as if it was a serial processor.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
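As a shared-memory sketch of these steps (using Java's fork/join framework in place of separate processors; the class names here are ours), each subtree is traversed as an independent task and the per-subtree results are concatenated in left-to-right order, so the output matches the serial pre-order traversal:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

// Fork/join sketch of a parallel pre-order traversal: each subtree is an
// independent task, and results are combined root-left-right so the final
// list is identical to the serial pre-order output.
public class ParallelPreorder {
    public static final class Node {
        final int data; final Node left, right;
        public Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    static final class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node node;
        PreorderTask(Node node) { this.node = node; }

        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (node == null) return out;
            PreorderTask l = new PreorderTask(node.left);
            PreorderTask r = new PreorderTask(node.right);
            l.fork();                        // left subtree on another worker
            List<Integer> rightPart = r.compute();  // right subtree here
            out.add(node.data);              // visit root first (pre-order)
            out.addAll(l.join());            // the single end synchronization
            out.addAll(rightPart);
            return out;
        }
    }

    public static List<Integer> preorder(Node root) {
        return new PreorderTask(root).invoke();
    }
}
```

In-order and post-order variants differ only in where out.add(node.data) is placed relative to the two subtree results.&lt;br /&gt;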
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
speedup of Pre-, In-, and Post-Order traversals.  However, the amount of processor to&lt;br /&gt;
processor data transmission adds in a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial breadth-first algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial breadth-first algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing as per the LDS chapter in the Solihin book.  Either a global lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock read-write locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations.  They are used extensively in data processing, since they allow vast amounts of data to be located through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods were developed to overcome this difficulty.  Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is a chained hash table, in which a linked list is created with all values that have been placed in that particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization.  Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket.  The high-level lock is used just for setting the bucket-level flags and is released right afterwards.  This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
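A minimal Java sketch of this two-level scheme might look as follows (the class and method names are illustrative, not taken from the patent); note that the table lock is held only long enough to select a bucket lock, after which threads working on different buckets proceed concurrently:&lt;br /&gt;

```java
import java.util.LinkedList;

// Two-level locking sketch: a short-lived table lock guards acquisition of a
// per-bucket lock; the bucket lock alone then guards the chained linked list.
public class TwoLevelLockTable<K, V> {
    private static final class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;
    private final Object[] bucketLocks;
    private final Object tableLock = new Object();

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int numBuckets) {
        buckets = new LinkedList[numBuckets];
        bucketLocks = new Object[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            buckets[i] = new LinkedList<>();
            bucketLocks[i] = new Object();
        }
    }

    private int index(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    private Object lockFor(K key) {
        synchronized (tableLock) {       // held only to pick the bucket lock
            return bucketLocks[index(key)];
        }
    }

    public void put(K key, V value) {
        synchronized (lockFor(key)) {    // concurrent with other buckets
            LinkedList<Entry<K, V>> chain = buckets[index(key)];
            for (Entry<K, V> e : chain)
                if (e.key.equals(key)) { e.value = value; return; }
            chain.add(new Entry<>(key, value));
        }
    }

    public V get(K key) {
        synchronized (lockFor(key)) {
            for (Entry<K, V> e : buckets[index(key)])
                if (e.key.equals(key)) return e.value;
            return null;
        }
    }
}
```

The patent's signature-pointer pairs are omitted here; this shows only the locking discipline.&lt;br /&gt;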
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was being held on a specific bucket in the cache hashmap.  Unfortunately, traditional locking would be a poor solution to this problem, as processors need to run very quickly and waiting for locks would destroy the application's processing time.  The need for a non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness though in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately.  However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a synchronized version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals are not locked and typically run in parallel with updates and deletes; a retrieval reflects the most recently completed updates, though it may not see updates that are still in progress.  This allows both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    while (true) {&lt;br /&gt;
      Integer oldVal = queryCounts.get(q);&lt;br /&gt;
      if (oldVal == null) {&lt;br /&gt;
        if (queryCounts.putIfAbsent(q, 1) == null) return; // first count&lt;br /&gt;
      } else if (queryCounts.replace(q, oldVal, oldVal + 1)) {&lt;br /&gt;
        return;&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while also avoiding much of the locking that takes place using synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must implement some new code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the value we just read, and putIfAbsent() handles the first insertion of a key atomically.  This is much more efficient than locking the entire function, since contention on any single key is expected to be rare.&lt;br /&gt;
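Since Java 8, this read-modify-write retry loop can be collapsed into a single atomic merge() call on ConcurrentHashMap, which is the idiomatic form today (a sketch assuming the same queryCounts map as above):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// merge() performs the read-modify-write atomically per key, replacing the
// hand-written compare-and-set retry loop.
public class QueryCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);  // insert 1 or add 1, atomically
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```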
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block required&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that a ConcurrentHashMap requires no locking on read operations.  Thus we can actually remove the synchronized block here and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another pointer-linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it is possible to find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
A graph[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consists of a finite set of entities called nodes or vertices, together with a set of pairs of vertices (ordered pairs, in a directed graph) called edges or arcs.  Starting from one given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically will keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it's being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt to Augsburg and Stuttgart in Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search(Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while(!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for(int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if(Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-search step to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split out all 3 searches in a parallel fashion onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
      dv = D[v];&lt;br /&gt;
      if(dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
        vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
        if(vis == 0) // v is added to a stack only once&lt;br /&gt;
          D[v] = d+1;&lt;br /&gt;
          pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
      if(dv == d + 1)&lt;br /&gt;
        fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
        fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
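As a concrete sketch of the level-synchronous idea (our own construction, not code from the cited paper), the Java below expands each frontier in parallel; an atomic compare-and-set plays the role of the fetch_and_add visited test in the pseudocode above, so each vertex is claimed exactly once:&lt;br /&gt;

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS over an adjacency-list graph.
    // Returns dist[v] = number of edges from src to v (-1 if unreachable).
    static int[] bfs(int[][] adj, int src) {
        int n = adj.length;
        AtomicIntegerArray dist = new AtomicIntegerArray(n);
        for (int i = 0; i < n; i++) dist.set(i, -1);
        dist.set(src, 0);
        List<Integer> frontier = List.of(src);
        int d = 0;
        while (!frontier.isEmpty()) {
            final int next = d + 1;
            frontier = frontier.parallelStream()              // expand the level in parallel
                .flatMap(u -> Arrays.stream(adj[u]).boxed())
                .filter(v -> dist.compareAndSet(v, -1, next)) // claim v exactly once
                .collect(Collectors.toList());
            d = next;
        }
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = dist.get(i);
        return out;
    }

    public static void main(String[] args) {
        // undirected edges: 0-1, 0-2, 1-3, 2-3, 3-4
        int[][] adj = { {1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3} };
        System.out.println(Arrays.toString(bfs(adj, 0)));  // prints [0, 1, 1, 2, 3]
    }
}
```

The only synchronization here is the per-vertex compare-and-set plus the implicit barrier between levels, mirroring the send/receive synchronization of the pseudocode.&lt;br /&gt;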
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83648</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83648"/>
		<updated>2014-02-25T03:41:18Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) include different types of data structures such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, it often introduces loop-carried dependences. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
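The copy-scan steps above can be sketched as follows (a sequential simulation we wrote for illustration; rows of a Java 2-D array stand in for per-processor memories):&lt;br /&gt;

```java
import java.util.Arrays;

public class CopyScan {
    // On step k every row that already holds the data copies it to the row
    // 2^k further down, so all n rows are filled in ceil(log2(n)) steps.
    static int[][] broadcastRow0(int[][] grid) {
        int n = grid.length;
        int have = 1;                           // rows [0, have) already hold row 0's data
        while (have < n) {
            int step = have;
            for (int r = 0; r < step && r + step < n; r++)  // these copies run "in parallel"
                grid[r + step] = grid[r].clone();
            have = Math.min(n, have * 2);
        }
        return grid;
    }

    public static void main(String[] args) {
        int[][] g = new int[5][];
        g[0] = new int[]{1, 2, 3};
        for (int r = 1; r < 5; r++) g[r] = new int[]{0, 0, 0};
        broadcastRow0(g);
        System.out.println(Arrays.deepToString(g));  // every row now equals row 0
    }
}
```

After the broadcast, a reduction (e.g., a sum across rows) can proceed with every processor working on its own copy.&lt;br /&gt;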
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
[[File:linkedlist.gif]]&lt;br /&gt;
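The pointer-doubling steps above can likewise be sketched in Java (again a sequential simulation of our own; the double-buffered arrays mimic one synchronized parallel step per round):&lt;br /&gt;

```java
import java.util.Arrays;

public class PointerJumping {
    // next[i] is i's successor; the tail points to itself.  Each round reads
    // the previous arrays and writes fresh ones, mimicking one synchronized
    // parallel step across all "processors" i.
    static int[] rank(int[] next) {
        int n = next.length;
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) rank[i] = (next[i] == i) ? 0 : 1;
        next = next.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] nr = new int[n], nn = new int[n];
            for (int i = 0; i < n; i++) {          // "for all i in parallel"
                nr[i] = rank[i] + rank[next[i]];   // jump over the successor
                nn[i] = next[next[i]];             // now point two steps away
                if (nn[i] != next[i]) changed = true;
            }
            rank = nr; next = nn;
        }
        return rank;  // rank[i] = distance from i to the end of the list
    }

    public static void main(String[] args) {
        // chain 0 -> 1 -> 2 -> 3 -> 4 (4 is the tail)
        System.out.println(Arrays.toString(rank(new int[]{1, 2, 3, 4, 4})));
        // prints [4, 3, 2, 1, 0]
    }
}
```

Every node reaches the end of the chain in O(log n) rounds, just as in the copy-scan case.&lt;br /&gt;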
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during the traversal process.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (in Ada) for serial tree traversal algorithms,&lt;br /&gt;
whose behavior the figure below shows:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
 with Ada.Text_IO; use Ada.Text_IO;&lt;br /&gt;
 with Ada.Containers.Doubly_Linked_Lists;&lt;br /&gt;
 with Ada.Unchecked_Deallocation;&lt;br /&gt;
 &lt;br /&gt;
 procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_Traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is the perfect candidate for parallelism.  In a tree, each node/subtree is independent.  As a result, we can split up a large tree into 2, 4, 8, or more subtrees and hold one subtree on each processor.  Then, the only duplicated data that must be kept on all processors is the tiny tip of the tree that is the parent of all of the individual subtrees.  Mathematically speaking, for a tree divided among n processors (where n is a power of two), the processors only need to hold n - 1 nodes in common, no matter how big the tree itself is.&lt;br /&gt;
&lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them very easy.  Properly done, the portion of these traversals that is parallelizable grows as 2^n for an n-generation tree, while the processors only need to synchronize once, at the end, so the parallel fraction approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl's Law]).  The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
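As an illustrative sketch of these steps (our construction using Java's fork/join framework, not code from the cited source), each subtree below is traversed as an independent task and the partial results are concatenated in pre-order at the single synchronization point:&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    // Each subtree is an independent fork/join task: fork the left child to
    // another worker, recurse into the right child locally, then join.
    static class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node n;
        PreorderTask(Node n) { this.n = n; }
        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            PreorderTask left = new PreorderTask(n.left);
            PreorderTask right = new PreorderTask(n.right);
            left.fork();                               // left subtree on another worker
            List<Integer> rightPart = right.compute(); // right subtree in this thread
            out.add(n.data);                           // visit the root first
            out.addAll(left.join());                   // single synchronization point
            out.addAll(rightPart);
            return out;
        }
    }

    public static void main(String[] args) {
        // the nine-node tree used in the serial Ada example above
        Node root = new Node(1,
            new Node(2, new Node(4, new Node(7, null, null), null),
                        new Node(5, null, null)),
            new Node(3, new Node(6, new Node(8, null, null),
                                    new Node(9, null, null)), null));
        System.out.println(new PreorderTask(root).invoke());
        // prints [1, 2, 4, 7, 5, 3, 6, 8, 9]
    }
}
```

Because the subtrees share no data, the tasks need no locks; the only coordination is the join at the end.&lt;br /&gt;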
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a parallel system because at each level, it must access nodes from all of the parallel processors.  Theoretically, a breadth-first traversal can achieve the same 100% speedup as pre-, in-, and post-order traversals.  However, the amount of processor-to-processor data transmission adds a greater potential for delays, thus slowing down the algorithm.  Nevertheless, as the size of the tree increases, the size of the generations grows as 2^n while the number of synchronizations grows as n for an n-generation tree, so the parallelizable portion of this traversal also approaches 100%.  The basic steps for parallelizing this traversal are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial breadth-first algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial breadth-first algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1).&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree traversals, the following rules are applied to the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
=== Hash Table Intro ===&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, which involves moving a vast amount of data through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index.  With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One way that is relevant to linked list structures is to have a chained hash table, in which a linked list is created with all values that have been placed in that particular bucket.  The developer not only has to take into account the proper bucket for the data being searched for, but must also consider the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used just for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for a shared-memory multiprocessor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, and lookup), while simultaneously reducing cache misses.  The SMP system has a memory subsystem and a processor subsystem interconnected via a bus structure.&lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items.  The hash table is segmented into multiple buckets, with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value.  Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.  Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item.  The first bucket node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithreaded access while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and multiple bucket locks.  The table lock allows access by a single processing thread to the table while blocking access for other processing threads.  The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node.  Once the table lock is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
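A minimal sketch of this two-level scheme might look like the following (all class and method names below are our invention for illustration, not taken from the patent): the table lock is held only long enough to set a per-bucket flag, after which the other buckets remain fully accessible:&lt;br /&gt;

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TwoLevelLockTable<K, V> {
    private static class Node<K, V> {
        final K key; V val; Node<K, V> next;
        Node(K k, V v, Node<K, V> n) { key = k; val = v; next = n; }
    }
    private final Node<K, V>[] buckets;         // chained buckets
    private final AtomicBoolean[] bucketLock;   // lightweight per-bucket flags
    private final Object tableLock = new Object();

    @SuppressWarnings("unchecked")
    public TwoLevelLockTable(int nBuckets) {
        buckets = (Node<K, V>[]) new Node[nBuckets];
        bucketLock = new AtomicBoolean[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLock[i] = new AtomicBoolean();
    }

    // The table lock is held just long enough to claim the bucket flag,
    // then released so threads can work on other buckets concurrently.
    private int lockBucket(K key) {
        int b = (key.hashCode() & 0x7fffffff) % buckets.length;
        synchronized (tableLock) {
            while (!bucketLock[b].compareAndSet(false, true)) Thread.yield();
        }
        return b;
    }

    public void put(K key, V val) {
        int b = lockBucket(key);
        try {
            for (Node<K, V> n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key)) { n.val = val; return; }
            buckets[b] = new Node<>(key, val, buckets[b]);
        } finally { bucketLock[b].set(false); }   // release needs no table lock
    }

    public V get(K key) {
        int b = lockBucket(key);
        try {
            for (Node<K, V> n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key)) return n.val;
            return null;
        } finally { bucketLock[b].set(false); }
    }
}
```

Operations on different buckets overlap almost entirely; the single table lock is contended only for the brief flag acquisition.&lt;br /&gt;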
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was being placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem, because having to wait for locks would destroy the application's processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though: it is not thread-safe, so the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, we present a Java-based solution to this problem using the ConcurrentHashMap class.  This class only requires a portion of the map to be locked for updates, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this method at any point in time.  With a really large number of threads, however, waiting to enter the synchronized method can become a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need the synchronizedMap function available in the Collections utility class.  Also, as you may notice, once iteration begins we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is ensuring that any update/insert/delete sequence has completed properly before subsequent operations observe the data.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of low-lock and lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals never block: they typically run in parallel alongside updates and deletes, and each retrieval reflects the results of the most recently completed updates, even if it cannot see updates still in progress.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      // first increment: use putIfAbsent, since replace() rejects a null oldVal&lt;br /&gt;
      if (oldVal == null &amp;amp;&amp;amp; queryCounts.putIfAbsent(q, 1) == null)&lt;br /&gt;
        return;&lt;br /&gt;
    } while (oldVal == null || !queryCounts.replace(q, oldVal, oldVal + 1));&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet is an alternative to the serial option presented in the previous section, and it avoids the locking imposed by synchronized methods or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some extra code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the expected old value.  Since ConcurrentHashMap does not permit null values, the very first increment must go through putIfAbsent() instead.  This is much more efficient than locking the entire method, as we rarely expect the comparison to fail.&lt;br /&gt;
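To make the compare-and-set behavior concrete, here is a small self-contained sketch (class and method names are our own, not from the article) that hammers the putIfAbsent/replace retry loop from four threads at once; a correct lock-free counter loses no increments.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative demo: incrementing a shared counter from many threads with no
// synchronized blocks, using putIfAbsent/replace as the compare-and-set step.
public class CasCounterDemo {
    static final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    static void incrementCount(String q) {
        while (true) {
            Integer oldVal = counts.get(q);
            if (oldVal == null) {
                // First increment: insert 1 unless another thread got there first.
                if (counts.putIfAbsent(q, 1) == null) return;
            } else if (counts.replace(q, oldVal, oldVal + 1)) {
                // replace() succeeded because the mapping still held oldVal.
                return;
            }
        }
    }

    // Runs 4 threads x 1000 increments each; no increment may be lost.
    public static int runDemo() {
        counts.clear();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) incrementCount("query");
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counts.get("query");
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 4 threads x 1000 increments = 4000
    }
}
```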
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another linked structure, one that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes, which in turn link to still more nodes in the network.  Using these connections, it is possible to find a path from one specific node to another in the chain.  This can be accomplished by having each node contain a linked list of pointers to its directly reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of entities called nodes or vertices, together with a set of (possibly ordered) pairs of vertices called edges or arcs.  From a given vertex, one would typically want to enumerate the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from that vertex to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their edges in a linked list.  Also, when computing a shortest path on the fly, an algorithm will typically use a linked list to represent the path as it is being built, along with a queue that drives each step of the process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like hash tables, graphs cannot afford to be slow and must often generate results very efficiently.  Having to lock each list of edges, or lock a shared shortest-path list, would be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], each with a series of friend links that must be analyzed and examined, and this list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample breadth-first traversal from the city of Frankfurt toward Augsburg and Stuttgart in Germany.  The search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes expands the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search (Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while (!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for (int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if (Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by sending each node of a given breadth-first step to a separate processor.  So, in the above example, instead of expanding Frankfurt-&amp;gt;Mannheim, then Frankfurt-&amp;gt;Wurzburg, then Frankfurt-&amp;gt;Kassel on the same processor, Frankfurt could split all 3 searches across 3 different processors.  Possibly some cleanup code would be left at the end to visit any remaining untouched nodes.  In network routing applications, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to be a bottleneck.&lt;br /&gt;
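A minimal sketch of this level-synchronous idea, in our own Java (class and method names are assumptions, and the adjacency map just mirrors part of the Frankfurt figure): every vertex in the current frontier is expanded in parallel, and a concurrent visited set ensures each vertex is claimed exactly once.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Illustrative level-synchronous parallel BFS: the whole frontier of one level
// is expanded on worker threads via a parallel stream.
public class ParallelBfs {
    public static List<String> bfsOrder(Map<String, List<String>> graph, String root) {
        Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe "seen" set
        visited.add(root);
        List<String> order = new ArrayList<>();
        List<String> frontier = List.of(root);
        while (!frontier.isEmpty()) {
            order.addAll(frontier); // levels are combined serially, in order
            // Expand the level in parallel; Set.add() is atomic, so only the
            // first thread to reach a vertex keeps it for the next frontier.
            frontier = frontier.parallelStream()
                    .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                    .filter(visited::add)
                    .collect(Collectors.toList());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
                "Frankfurt", List.of("Mannheim", "Wurzburg", "Kassel"),
                "Mannheim", List.of("Karlsruhe"),
                "Wurzburg", List.of("Erfurt", "Nurnberg"),
                "Kassel", List.of("Munchen"));
        System.out.println(bfsOrder(g, "Frankfurt"));
    }
}
```

Within a level the ordering is nondeterministic, which is exactly the trade the data-parallel approach makes: correctness of the visited set, not a fixed visiting order.&lt;br /&gt;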
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if (dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if (vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if (dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices at the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, &amp;quot;Scaleable hash table for shared-memory multiprocessor system,&amp;quot; US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83618</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83618"/>
		<updated>2014-02-24T04:47:37Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2014 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) encompass several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although these structures are diverse, LDS traversal shares a common characteristic: reading a node and discovering the other nodes it points to. Hence, traversal often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores the concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Concurrency Problem in LDS =&lt;br /&gt;
&lt;br /&gt;
Non-serializable outcomes can often occur while attempting to parallelize LDS in scenarios like these:&lt;br /&gt;
&lt;br /&gt;
Parallel execution of two operations that access a common node, where at least one operation writes to the node,&lt;br /&gt;
can produce conflicts that lead to a non-serializable outcome. Conflicts can also occur between LDS operations and memory-management&lt;br /&gt;
functions such as memory de-allocation and allocation.&lt;br /&gt;
&lt;br /&gt;
Solihin discusses three approaches using locks to resolve these issues, namely:&lt;br /&gt;
&lt;br /&gt;
===Parallelization among Readers=== &lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[19]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
This is achieved by ensuring [http://en.wikipedia.org/wiki/Mutual_exclusion mutual exclusion] between a read-write and a read-only operation, and not between two read-only operations. This is achieved by defining a read lock and a write lock.&lt;br /&gt;
&lt;br /&gt;
'''Lock compatibility : '''	&lt;br /&gt;
                               Read Lock requested	Write Lock requested&lt;br /&gt;
           Read Locked	          Yes	                       No&lt;br /&gt;
           Write Locked	           No	                       No&lt;br /&gt;
&lt;br /&gt;
===Global Lock Approach===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[20]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
A simple approach is to maintain a single global lock for each LDS. This, however, allows only one thread to modify a given list at any time.&lt;br /&gt;
&lt;br /&gt;
===Fine-Grain Lock Approach===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[20]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
The fine-grain lock approach removes the previous restriction, that a given list can only be modified sequentially, by maintaining locks for each node. Hence, this is a much more involved approach. The principle here is that nodes being modified must be write-locked, and nodes that are being read, and hence must remain valid, are read-locked.&lt;br /&gt;
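A minimal sketch of the per-node locking idea (our own class, not from Solihin), using Java's built-in read-write lock: readers of a node share the lock, while a writer excludes everyone, but only on the node it is changing.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative fine-grain locking: each node carries its own read-write lock,
// so operations on different nodes never contend with each other.
public class FineGrainNode {
    private int value;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public FineGrainNode(int value) { this.value = value; }

    public int read() {
        lock.readLock().lock();   // shared: many readers may hold this at once
        try { return value; } finally { lock.readLock().unlock(); }
    }

    public void write(int v) {
        lock.writeLock().lock();  // exclusive: blocks readers and writers of this node only
        try { value = v; } finally { lock.writeLock().unlock(); }
    }
}
```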
&lt;br /&gt;
In the following sections we describe the problems related to various LDS and some of their possible solutions.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array-processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you can perform a reduction to generate a sum of all the data, all while working in a parallel fashion.  The basic process is as follows:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
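The steps above can be simulated in a few lines of Java (a sketch of our own; the array of "rows" stands in for processors, and each loop round models one parallel copy step):&lt;br /&gt;

```java
import java.util.Arrays;

// Illustrative copy-scan simulation: the value held by row 0 is propagated to
// all n rows in ceil(log2(n)) doubling rounds, after which a reduction can run.
public class CopyScanDemo {
    public static int[] broadcast(int value, int n) {
        int[] rows = new int[n];
        boolean[] has = new boolean[n];
        rows[0] = value;
        has[0] = true;
        for (int stride = 1; stride < n; stride *= 2) {
            boolean[] snapshot = has.clone(); // all copies in a round happen "at once"
            for (int i = 0; i < n; i++) {
                if (snapshot[i] && i + stride < n) {
                    rows[i + stride] = rows[i]; // each holder copies `stride` rows away
                    has[i + stride] = true;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] rows = broadcast(7, 8);    // 3 rounds: strides 1, 2, 4
        System.out.println(Arrays.toString(rows));
        // With the value everywhere, a reduction (here a sum) can run in parallel.
        System.out.println(Arrays.stream(rows).sum());
    }
}
```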
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
[[File:linkedlist.gif]]&lt;br /&gt;
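Pointer doubling can likewise be simulated on an array-encoded list (again a sketch of our own; next[i] is node i's successor, the tail points to itself, and each while-round models one simultaneous jump by all "processors"). A classic use is list ranking: computing every node's distance to the end of the chain.&lt;br /&gt;

```java
import java.util.Arrays;

// Illustrative pointer doubling (pointer jumping): after ceil(log2(n)) rounds
// every node points at the tail and dist[i] holds its distance to the end.
public class PointerDoublingDemo {
    public static int[] distancesToEnd(int[] next) {
        int n = next.length;
        int[] nxt = next.clone();
        int[] dist = new int[n];
        for (int i = 0; i < n; i++) dist[i] = (nxt[i] == i) ? 0 : 1;
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] newDist = dist.clone(); // snapshot: all jumps happen "at once"
            int[] newNxt = nxt.clone();
            for (int i = 0; i < n; i++) {
                if (nxt[i] != nxt[nxt[i]]) changed = true;
                newDist[i] = dist[i] + dist[nxt[i]]; // accumulate skipped distance
                newNxt[i] = nxt[nxt[i]];             // jump over the successor
            }
            dist = newDist;
            nxt = newNxt;
        }
        return dist;
    }

    public static void main(String[] args) {
        // List 0 -> 1 -> 2 -> 3 (node 3 is the tail).
        System.out.println(Arrays.toString(distancesToEnd(new int[]{1, 2, 3, 3})));
    }
}
```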
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert run in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still have to visit every node to generate their output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (in Ada) for serial tree-traversal algorithms&lt;br /&gt;
whose behavior the figure below shows:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is a perfect candidate for parallelism.  In a tree, each&lt;br /&gt;
node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or&lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data&lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all&lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n&lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n - 1 nodes&lt;br /&gt;
in common, no matter how big the tree itself is.&lt;br /&gt;
The fact that trees are comprised of independent subtrees makes parallelizing them&lt;br /&gt;
very easy.  Properly done, the parallelizable portion of these traversals grows&lt;br /&gt;
as 2^n for an n-generation tree, while the processors only need to synchronize once, at&lt;br /&gt;
the end, so the parallel fraction approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl's Law]).&lt;br /&gt;
The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the appropriate serial traversal algorithm detailed above.&lt;br /&gt;
   3. The processor will return its result, which can be used exactly as if it were produced by a serial processor.&lt;br /&gt;
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
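The three steps above can be sketched in Java (our own code; the article's serial version is in Ada, and the class and method names here are assumptions). The shared tip is visited serially, each independent subtree runs on its own worker, and the results combine in the same order a serial pre-order traversal would produce.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative parallel pre-order traversal: root handled serially, the two
// subtrees traversed concurrently, one synchronization at the end.
public class ParallelTreeTraversal {
    static class Node {
        final int data; final Node left, right;
        Node(int d, Node l, Node r) { data = d; left = l; right = r; }
    }

    static List<Integer> preorder(Node n) {
        List<Integer> out = new ArrayList<>();
        if (n != null) {
            out.add(n.data);
            out.addAll(preorder(n.left));
            out.addAll(preorder(n.right));
        }
        return out;
    }

    public static List<Integer> parallelPreorder(Node root) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Integer> out = new ArrayList<>();
            out.add(root.data);                                       // shared tip, serial
            Future<List<Integer>> left = pool.submit(() -> preorder(root.left));
            Future<List<Integer>> right = pool.submit(() -> preorder(root.right));
            out.addAll(left.get());                                   // single sync point
            out.addAll(right.get());
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each subtree is independent, no locks are needed here; the only coordination is the final join on the two futures.&lt;br /&gt;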
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a&lt;br /&gt;
parallel system because at each level it must access nodes from all of the parallel&lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same near-100%&lt;br /&gt;
parallel fraction as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-&lt;br /&gt;
processor data transmission adds a greater potential for delays, thus slowing&lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the&lt;br /&gt;
generations grows at the rate of 2^n while the number of synchronizations grows at a&lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals&lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as&lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the serial Breadth-First algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the serial Breadth-First algorithm, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing as per the LDS chapter in Solihin book. Either a Global lock approach, Fine Grained approach or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
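As a concrete illustration of the fine-grained approach, the classic &amp;quot;hand-over-hand&amp;quot; (lock coupling) technique holds at most two node locks at a time while walking a list, so threads operating on disjoint regions of the list can proceed concurrently. (A minimal Java sketch; the class and method names are invented, not from the Solihin text.)&lt;br /&gt;

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class FineGrainList {
    static class Node {
        final int value;
        Node next;
        final Lock lock = new ReentrantLock();
        Node(int v, Node n) { value = v; next = n; }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null); // sentinel node

    void addFirst(int v) {
        head.lock.lock();
        try { head.next = new Node(v, head.next); }
        finally { head.lock.unlock(); }
    }

    // Hand-over-hand: lock the next node before releasing the previous one,
    // so the portion of the list being inspected never changes underneath us.
    boolean contains(int v) {
        Node prev = head;
        prev.lock.lock();
        Node curr = prev.next;
        while (curr != null) {
            curr.lock.lock();
            if (curr.value == v) {
                curr.lock.unlock();
                prev.lock.unlock();
                return true;
            }
            prev.lock.unlock();   // release the trailing lock and advance
            prev = curr;
            curr = curr.next;
        }
        prev.lock.unlock();
        return false;
    }
}
```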
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures, often used in searching algorithms for fast lookup operations. They are used extensively in data processing because they allow a vast amount of data to be reached with as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.  A lookup that funnels through a single level of the table can easily become a bottleneck, so several methods have been developed to overcome this difficulty.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a collision that must be dealt with in some fashion to obtain the correct value.  One approach that is relevant to linked list structures is the chained hash table, in which a linked list is created from all values that have been placed in a particular bucket.  The developer must not only find the proper bucket for the data being searched for, but must also walk the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
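A minimal, single-threaded sketch of this chaining scheme might look like the following in Java; the class is illustrative, with a fixed number of buckets:&lt;br /&gt;

```java
import java.util.*;

class ChainedHashTable {
    // Each bucket holds a linked list of entries whose keys hash to it;
    // a lookup first finds the bucket, then walks the chain.
    private final List<LinkedList<Map.Entry<String, Integer>>> buckets;

    ChainedHashTable(int size) {
        buckets = new ArrayList<>();
        for (int i = 0; i < size; i++) buckets.add(new LinkedList<>());
    }

    private int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    void put(String key, int value) {
        LinkedList<Map.Entry<String, Integer>> chain = buckets.get(bucketFor(key));
        chain.removeIf(e -> e.getKey().equals(key));          // overwrite an existing key
        chain.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    Integer get(String key) {
        for (Map.Entry<String, Integer> e : buckets.get(bucketFor(key)))
            if (e.getKey().equals(key)) return e.getValue();
        return null;                                          // key absent
    }
}
```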
&lt;br /&gt;
There are several parallel implementations of hash tables that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level) but needs only one real lock in the implementation.&lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
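The two-level scheme can be sketched as follows. This is an illustrative reconstruction of the idea (a table lock held only long enough to acquire a bucket lock), not the patented implementation; all names are invented, and a simple counter stands in for the per-bucket data.&lt;br /&gt;

```java
import java.util.concurrent.locks.ReentrantLock;

class TwoLevelLockTable {
    private final Object tableLock = new Object();
    private final ReentrantLock[] bucketLocks;
    private final int[] counts;           // stand-in for per-bucket data

    TwoLevelLockTable(int nBuckets) {
        bucketLocks = new ReentrantLock[nBuckets];
        for (int i = 0; i < nBuckets; i++) bucketLocks[i] = new ReentrantLock();
        counts = new int[nBuckets];
    }

    void increment(int bucket) {
        ReentrantLock lock;
        synchronized (tableLock) {        // table lock is held just long enough...
            lock = bucketLocks[bucket];
            lock.lock();                  // ...to acquire this bucket's lock
        }
        try {
            counts[bucket]++;             // other buckets remain accessible meanwhile
        } finally {
            lock.unlock();
        }
    }

    int get(int bucket) {
        ReentrantLock lock;
        synchronized (tableLock) {
            lock = bucketLocks[bucket];
            lock.lock();
        }
        try { return counts[bucket]; }
        finally { lock.unlock(); }
    }
}
```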
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a bad solution to this problem: processors need to run very quickly, and waiting for locks would destroy processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class utilized for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that the entire map requires synchronization prior to each access.  This causes a lot of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem by using a ConcurrentHashMap class.  This class only requires a portion of the map to be locked and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any one point in time.  With a really large number of threads, however, waiting to enter the synchronized operation could be a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator could be used to traverse over a map.  In this case, we would need to utilize the synchronizedMap function available in the Collections interface.  Also, as you may notice, once the iterator code begins we must actually synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue for a hash table in a parallel environment is ensuring that any update/insert/delete sequences complete properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of hash table design, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a concurrent version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  There is no locking on retrievals, so they typically run in parallel with updates and deletes.  A retrieval reflects the results of the most recently completed update operations, even if it cannot see values that have not finished being updated.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
               ? queryCounts.putIfAbsent(q, newVal) != null  // key was absent: insert unless another thread won&lt;br /&gt;
               : !queryCounts.replace(q, oldVal, newVal));   // otherwise compare-and-set the new count&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, notice that we must write some extra code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function acts much like the compare-and-set operation typically used in concurrent code: the value is changed only if the key is still mapped to the expected previous value.  This is much more efficient than locking the entire function, as conflicting updates are expected to be rare.&lt;br /&gt;
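For what it is worth, on Java 8 and later the same retry loop can be expressed with ConcurrentHashMap's atomic merge() method, which performs the insert-or-combine step in a single call (the class name here is invented for illustration):&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    // merge() atomically maps q to 1 if it is absent, or combines the
    // existing count with 1 using Integer::sum -- no explicit retry loop.
    void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```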
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // No synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block and iterate in the normal fashion.&lt;br /&gt;
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
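The idea of each node carrying a linked list of its reachable neighbors can be sketched as a minimal adjacency-list representation (class and method names are illustrative):&lt;br /&gt;

```java
import java.util.*;

class AdjacencyGraph {
    // Each node keeps a linked list of the nodes reachable from it.
    private final Map<Integer, LinkedList<Integer>> adjacency = new HashMap<>();

    void addEdge(int from, int to) {
        adjacency.computeIfAbsent(from, k -> new LinkedList<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new LinkedList<>());  // ensure node exists
    }

    List<Integer> neighbors(int node) {
        return adjacency.getOrDefault(node, new LinkedList<>());
    }
}
```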
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices.  From one given vertex, one would typically want to order the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their list of edges in a linked list.  Also, when attempting to create a shortest path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it is being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major consumer of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function utilizes a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no further elements remain unmarked.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search (Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while (!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for (int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if (Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this breadth-first search?  The most logical and effective way, instead of utilizing locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-search step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel run on the same processor, Frankfurt could split all 3 searches out onto 3 different processors in parallel.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
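One way to sketch this data-parallel, level-by-level expansion in Java is with a frontier list and a concurrent visited set, so that each vertex of the current level can be expanded by a separate worker. (An illustrative sketch under invented names, separate from the locking pseudocode that follows.)&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class ParallelBFS {
    // Frontier-based BFS: every vertex of the current level may be expanded
    // by a different worker.  Set.add() on the concurrent set is atomic, so
    // only the first worker to reach a vertex enqueues it -- no locks needed.
    static List<Integer> bfsOrder(Map<Integer, List<Integer>> graph, int source) {
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        List<Integer> order = new ArrayList<>();
        List<Integer> frontier = List.of(source);
        while (!frontier.isEmpty()) {
            order.addAll(frontier);
            frontier = frontier.parallelStream()
                .flatMap(u -> graph.getOrDefault(u, List.of()).stream())
                .filter(visited::add)    // true only for a vertex's first visitor
                .collect(Collectors.toList());
        }
        return order;
    }
}
```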
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if (dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if (vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if (dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor and send/receive operations will eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can now be added to the path in a parallel fashion.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83599</id>
		<title>CSC/ECE 506 Spring 2014/5a rm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2014/5a_rm&amp;diff=83599"/>
		<updated>2014-02-24T03:28:03Z</updated>

		<summary type="html">&lt;p&gt;Remcelfr: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Chapter 5a CSC/ECE 506 Spring 2013 / Other linked data structures =&lt;br /&gt;
&lt;br /&gt;
Original wiki : http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/5a_ks&lt;br /&gt;
&lt;br /&gt;
= Overview =&lt;br /&gt;
&lt;br /&gt;
Linked Data Structures (LDS) comprise several types of data structures, such as [http://en.wikipedia.org/wiki/Linked_list linked lists], [http://en.wikibooks.org/wiki/Data_Structures/Trees trees], [http://en.wikipedia.org/wiki/Hash_table hash tables] and [http://en.wikibooks.org/wiki/Data_Structures/Graphs graphs]. Although the structures are diverse, LDS traversal shares a common characteristic: reading a node and then discovering the other nodes it points to. Hence, it often introduces loop-carried dependence. Chapter 5 of Solihin discusses various algorithms for parallelizing LDS using a simple linked list. In this wiki, we attempt to cover other LDS such as trees, hash tables and graphs, and show how the parallelization algorithms discussed can be applied to these structures. This wiki explores concurrency problems related to each type and possible solutions for parallelizing them.&lt;br /&gt;
&lt;br /&gt;
= Concurrency Problem in LDS =&lt;br /&gt;
&lt;br /&gt;
Non-serializable outcomes can often occur while attempting to parallelize LDS:&lt;br /&gt;
&lt;br /&gt;
Parallel execution of two operations that access a common node, where at least one operation writes to the node, can produce conflicts that lead to a non-serializable outcome. Conflicts can also occur between LDS operations and memory management functions such as memory de-allocation and allocation.&lt;br /&gt;
&lt;br /&gt;
Solihin discusses three approaches using locks to resolve these issues, namely:&lt;br /&gt;
&lt;br /&gt;
===Parallelization among Readers=== &lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[19]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
This is achieved by ensuring [http://en.wikipedia.org/wiki/Mutual_exclusion mutual exclusion] between a read-write and a read-only operation, and not between two read-only operations. This is achieved by defining a read lock and a write lock.&lt;br /&gt;
&lt;br /&gt;
'''Lock compatibility : '''	&lt;br /&gt;
                               Read Lock requested	Write Lock requested&lt;br /&gt;
           Read Locked	          Yes	                       No&lt;br /&gt;
           Write Locked	           No	                       No&lt;br /&gt;
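The compatibility rules above map directly onto Java's ReentrantReadWriteLock; the following sketch (illustrative names) guards a linked list with one, so any number of readers may proceed together while a writer excludes everyone:&lt;br /&gt;

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteList {
    private final List<Integer> list = new LinkedList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at once (the "Yes" cell above);
    // a writer excludes both readers and other writers (the "No" cells).
    boolean contains(int v) {
        lock.readLock().lock();
        try { return list.contains(v); }
        finally { lock.readLock().unlock(); }
    }

    void add(int v) {
        lock.writeLock().lock();
        try { list.add(v); }
        finally { lock.writeLock().unlock(); }
    }
}
```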
&lt;br /&gt;
===Global Lock Approach===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[20]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
A higher degree of parallelism can be obtained by maintaining a single global lock for each LDS. This, however, allows only one thread to modify a given list at any time.&lt;br /&gt;
&lt;br /&gt;
===Fine-Grain Lock Approach===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[20]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
The fine-grain lock approach removes the previous restriction of modifying a given list only sequentially by maintaining locks for each node. Hence, it is a much more tedious approach. The principle here is that nodes being modified must be write-locked, and nodes that are being read, and hence must remain valid, must be read-locked.&lt;br /&gt;
&lt;br /&gt;
In the following sections we describe the problems related to various LDS and some of their possible solutions.&lt;br /&gt;
&lt;br /&gt;
= Introduction to Linked-List Parallel Programming =&lt;br /&gt;
&lt;br /&gt;
One component that tends to link together various data structures is their reliance at some level on an internal pointer-based linked list.  For example, hash tables have linked lists to support chained links to a given bucket in order to resolve collisions, trees have linked lists with left and right tree node paths, and graphs have linked lists to determine shortest path algorithms.&lt;br /&gt;
&lt;br /&gt;
But what mechanism allows us to generate parallel algorithms for these structures?  &lt;br /&gt;
&lt;br /&gt;
For an array processing algorithm, a common technique used at the processor level is the copy-scan technique.  This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row.  From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion.  Take the following grid:[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The basic process for copy-scan would be to:&lt;br /&gt;
  Step 1) Copy the row 1 array to row 2.&lt;br /&gt;
  Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.&lt;br /&gt;
  Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).&lt;br /&gt;
&lt;br /&gt;
[[File:CopyScan.gif]]&lt;br /&gt;
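The log(n) doubling pattern behind copy-scan can be illustrated with a Hillis-Steele style inclusive prefix sum: on step d, element i adds in the value from 2^d positions earlier, doubling the covered span each step. (An illustrative sketch; Arrays.parallelSetAll stands in for the per-processor work, and the names are invented.)&lt;br /&gt;

```java
import java.util.Arrays;

class CopyScanSketch {
    // Hillis-Steele inclusive scan: after step d, element i holds the sum
    // of the values in the window ending at i, so log2(n) steps suffice.
    static int[] inclusiveScan(int[] data) {
        int[] cur = data.clone();
        for (int stride = 1; stride < cur.length; stride *= 2) {
            final int s = stride;
            final int[] src = cur;
            int[] next = new int[cur.length];
            // each index could be computed by its own processor
            Arrays.parallelSetAll(next, i -> i >= s ? src[i] + src[i - s] : src[i]);
            cur = next;
        }
        return cur;
    }
}
```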
&lt;br /&gt;
But how does this same process work in the linked list world?&lt;br /&gt;
&lt;br /&gt;
With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.[[#References|&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  Step 1) Each processor will make a copy of the pointer it holds to its neighbor.&lt;br /&gt;
  Step 2) Next, each processor will make a pointer to the processor 2 steps away.&lt;br /&gt;
  Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.&lt;br /&gt;
  Step 4) Perform the parallel operations to generate the desired result.&lt;br /&gt;
&lt;br /&gt;
[[File:linkedlist.gif]]&lt;br /&gt;
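The doubling steps can be sketched with an array of successor indices, where next[i] names node i's successor and the end of the chain points to itself. (Illustrative; a real implementation would update every index on a separate processor, which is modeled here by a plain loop.)&lt;br /&gt;

```java
class PointerDoubling {
    // next[i] is the successor of node i, or i itself at the end of the
    // chain.  After log2(n) doubling steps every node points at the end.
    static int[] pointToEnd(int[] next) {
        int[] cur = next.clone();
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] doubled = new int[cur.length];
            for (int i = 0; i < cur.length; i++) {  // each i in parallel, conceptually
                doubled[i] = cur[cur[i]];           // jump two steps at once
                if (doubled[i] != cur[i]) changed = true;
            }
            cur = doubled;
        }
        return cur;
    }
}
```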
&lt;br /&gt;
However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism or other parallel technique for [http://en.wikipedia.org/wiki/Critical_section critical sections] in order to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions].  To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.&lt;br /&gt;
&lt;br /&gt;
In this chapter, we will explore 3 linked-list based data structures and the parallelization opportunities as well as the concurrency issues they present: hash tables, trees, and graphs.&lt;br /&gt;
&lt;br /&gt;
== Trees ==&lt;br /&gt;
&lt;br /&gt;
=== Tree Intro ===&lt;br /&gt;
&lt;br /&gt;
A tree data structure [[#References|&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;]] contains a set of ordered nodes with one parent node followed by zero or more child nodes.  Typically this tree structure is used with searching or sorting algorithms to achieve log(n) efficiencies.  Assuming you have a balanced tree, or a relatively equal set of nodes under each branching structure of the tree, and assuming a proper ordering structure, searches/inserts/deletes should occur far more quickly than having to traverse an entire list.&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
One potential slowdown in a tree data structure occurs during traversal.  Even though search/update/insert can occur in logarithmic time, traversal operations such as in-order, pre-order, and post-order traversals still require visiting every node to generate all output.  This gives an opportunity to generate parallel code by having various portions of the traversal occur on different processors.&lt;br /&gt;
&lt;br /&gt;
=== Serial Code Example ===&lt;br /&gt;
&lt;br /&gt;
Below is code[[#References|&amp;lt;sup&amp;gt;[8]&amp;lt;/sup&amp;gt;]] (in Ada) for serial tree traversal algorithms,&lt;br /&gt;
whose behavior is shown in the figure below:&lt;br /&gt;
&lt;br /&gt;
[[File: tree.PNG]]&lt;br /&gt;
&lt;br /&gt;
procedure Tree_Traversal is&lt;br /&gt;
   type Node;&lt;br /&gt;
   type Node_Access is access Node;&lt;br /&gt;
   type Node is record&lt;br /&gt;
      Left : Node_Access := null;&lt;br /&gt;
      Right : Node_Access := null;&lt;br /&gt;
      Data : Integer;&lt;br /&gt;
   end record;&lt;br /&gt;
&lt;br /&gt;
   procedure Destroy_Tree(N : in out Node_Access) is&lt;br /&gt;
      procedure free is new Ada.Unchecked_Deallocation(Node, Node_Access);&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Destroy_Tree(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then &lt;br /&gt;
         Destroy_Tree(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Free(N);&lt;br /&gt;
   end Destroy_Tree;&lt;br /&gt;
&lt;br /&gt;
   function Tree(Value : Integer; Left : Node_Access; Right : Node_Access) return Node_Access is&lt;br /&gt;
      Temp : Node_Access := new Node;&lt;br /&gt;
   begin&lt;br /&gt;
      Temp.Data := Value;&lt;br /&gt;
      Temp.Left := Left;&lt;br /&gt;
      Temp.Right := Right;&lt;br /&gt;
      return Temp;&lt;br /&gt;
   end Tree;&lt;br /&gt;
&lt;br /&gt;
   procedure Preorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Preorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Preorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Preorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Inorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Inorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Inorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
   end Inorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Postorder(N : Node_Access) is&lt;br /&gt;
   begin&lt;br /&gt;
      if N.Left /= null then&lt;br /&gt;
         Postorder(N.Left);&lt;br /&gt;
      end if;&lt;br /&gt;
      if N.Right /= null then&lt;br /&gt;
         Postorder(N.Right);&lt;br /&gt;
      end if;&lt;br /&gt;
      Put(Integer'Image(N.Data));&lt;br /&gt;
   end Postorder;&lt;br /&gt;
&lt;br /&gt;
   procedure Levelorder(N : Node_Access) is&lt;br /&gt;
      package Queues is new Ada.Containers.Doubly_Linked_Lists(Node_Access);&lt;br /&gt;
      use Queues;&lt;br /&gt;
      Node_Queue : List;&lt;br /&gt;
      Next : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
      Node_Queue.Append(N);&lt;br /&gt;
      while not Is_Empty(Node_Queue) loop&lt;br /&gt;
         Next := First_Element(Node_Queue);&lt;br /&gt;
         Delete_First(Node_Queue);&lt;br /&gt;
         Put(Integer'Image(Next.Data));&lt;br /&gt;
         if Next.Left /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Left);&lt;br /&gt;
         end if;&lt;br /&gt;
         if Next.Right /= null then&lt;br /&gt;
            Node_Queue.Append(Next.Right);&lt;br /&gt;
         end if;&lt;br /&gt;
      end loop;&lt;br /&gt;
   end Levelorder;&lt;br /&gt;
&lt;br /&gt;
   N : Node_Access;&lt;br /&gt;
   begin&lt;br /&gt;
   N := Tree(1, &lt;br /&gt;
      Tree(2,&lt;br /&gt;
         Tree(4,&lt;br /&gt;
            Tree(7, null, null),&lt;br /&gt;
            null),&lt;br /&gt;
         Tree(5, null, null)),&lt;br /&gt;
      Tree(3,&lt;br /&gt;
         Tree(6,&lt;br /&gt;
            Tree(8, null, null),&lt;br /&gt;
            Tree(9, null, null)),&lt;br /&gt;
         null));&lt;br /&gt;
 &lt;br /&gt;
   Put(&amp;quot;preorder:    &amp;quot;);&lt;br /&gt;
   Preorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;inorder:     &amp;quot;);&lt;br /&gt;
   Inorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;postorder:   &amp;quot;);&lt;br /&gt;
   Postorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Put(&amp;quot;level order: &amp;quot;);&lt;br /&gt;
   Levelorder(N);&lt;br /&gt;
   New_Line;&lt;br /&gt;
   Destroy_Tree(N);&lt;br /&gt;
   end Tree_traversal;&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[21]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
In many ways, a tree is a perfect candidate for parallelism.  In a tree, each &lt;br /&gt;
node/subtree is independent.  As a result, we can split a large tree into 2, 4, 8, or &lt;br /&gt;
more subtrees and hold one subtree on each processor.  Then, the only duplicated data &lt;br /&gt;
that must be kept on all processors is the tiny tip of the tree that is the parent of all &lt;br /&gt;
of the individual subtrees.  Mathematically speaking, for a tree divided among n &lt;br /&gt;
processors (where n is a power of two), the processors only need to hold n – 1 nodes &lt;br /&gt;
in common – no matter how big the tree itself is. &lt;br /&gt;
The fact that trees are composed of independent subtrees makes parallelizing them &lt;br /&gt;
very easy.  Properly done, the parallelizable portion of these traversals grows &lt;br /&gt;
as 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; for an n-generation tree, while the processors only need to synchronize once, at &lt;br /&gt;
the end, so the parallel fraction approaches 100% for large trees (but keep in mind [http://en.wikipedia.org/wiki/Amdahl's_law Amdahl’s Law]). &lt;br /&gt;
The basic steps for parallelizing these traversals are as follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you reach a node that is present on only one processor, ask that processor to execute the appropriate C algorithm detailed above.&lt;br /&gt;
   3. That processor returns its result, which can be used exactly as if it had been computed serially.&lt;br /&gt;
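The subtree-per-processor scheme in the steps above can be sketched with Java's fork/join framework.  This is an illustrative sketch only: the Node class, the preorder method, and the task names are assumptions of the sketch, though the tree built here is the same one as in the Ada example above.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelPreorder {
    // Hypothetical node type mirroring the Node record in the Ada example.
    static final class Node {
        final int data;
        final Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Each task owns an independent subtree, so no locks are needed; the
    // partial results are combined in preorder (root, left, right) at the end.
    static final class PreorderTask extends RecursiveTask<List<Integer>> {
        private final Node n;
        PreorderTask(Node n) { this.n = n; }
        @Override protected List<Integer> compute() {
            List<Integer> out = new ArrayList<>();
            if (n == null) return out;
            PreorderTask leftTask = new PreorderTask(n.left);
            leftTask.fork();                         // left subtree on another worker
            List<Integer> rightPart = new PreorderTask(n.right).compute();
            out.add(n.data);                         // root first (preorder)
            out.addAll(leftTask.join());             // then the left subtree
            out.addAll(rightPart);                   // then the right subtree
            return out;
        }
    }

    public static List<Integer> preorder(Node root) {
        return ForkJoinPool.commonPool().invoke(new PreorderTask(root));
    }

    // Same tree as the Ada example: 1(2(4(7,-),5), 3(6(8,9),-)).
    public static Node sample() {
        return new Node(1,
            new Node(2,
                new Node(4, new Node(7, null, null), null),
                new Node(5, null, null)),
            new Node(3,
                new Node(6, new Node(8, null, null), new Node(9, null, null)),
                null));
    }

    public static void main(String[] args) {
        System.out.println(preorder(sample()));
    }
}
```

Because each subtree is independent, the joins at the end are the only synchronization points, which is what makes the parallel fraction so large for big trees.&lt;br /&gt;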
&lt;br /&gt;
[[File:parallel_tree.png]][[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
A [http://www.cs.bu.edu/teaching/c/tree/breadth-first/ Breadth-First traversal] is somewhat more complicated to implement as a &lt;br /&gt;
parallel system because at each level, it must access nodes from all of the parallel &lt;br /&gt;
processors.  Theoretically, a Breadth-First traversal can achieve the same 100% &lt;br /&gt;
speedup as Pre-, In-, and Post-Order traversals.  However, the amount of processor-to-processor &lt;br /&gt;
data transmission introduces a greater potential for delays, thus slowing &lt;br /&gt;
down the algorithm.  Nevertheless, as the size of the tree increases, the size of the &lt;br /&gt;
generations grows at the rate of 2&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; while the number of synchronizations grows at a &lt;br /&gt;
rate of n for an n-generation tree, so the parallelizable portion of these traversals &lt;br /&gt;
also approaches 100%.  The basic steps for parallelizing this traversal are as &lt;br /&gt;
follows:&lt;br /&gt;
&lt;br /&gt;
   1. Perform the traversal on the parent part of the tree.&lt;br /&gt;
   2. Whenever you get to a node that is only present on one processor, ask that processor to execute the Breadth-First C algorithm detailed above,&lt;br /&gt;
      but wait after it finishes one generation.&lt;br /&gt;
   3. Combine all the one-generation results from the different processors in the correct order.&lt;br /&gt;
   4. Allow each processor to execute the next generation of the Breadth-First C algorithm detailed above, and then wait again.&lt;br /&gt;
   5. Repeat Steps 3 and 4 until there are no nodes remaining.&lt;br /&gt;
[[#References|&amp;lt;sup&amp;gt;[18]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Here, since each processor is assigned an independent sub-tree, elaborate locks are not required.&lt;br /&gt;
&lt;br /&gt;
Algorithm GEN-COMP-NEXT for Parallel Tree Traversal:&lt;br /&gt;
   for all Pi, 1&amp;lt;=i&amp;lt;=n, do&lt;br /&gt;
     parallel begin&lt;br /&gt;
        Step 1: Processor Pi builds the jth field of i's parent node if i is the jth child of its parent.&lt;br /&gt;
                The jth field (if it is not the last field) is stored in the ith index of array SUPERNODE.&lt;br /&gt;
        Step 2: Processor Pi builds node i's last field, whose array index is (n+1)&lt;br /&gt;
     parallel end&lt;br /&gt;
&lt;br /&gt;
To obtain the required tree-traversals, the following rules are operated on the linked list produced by algorithm GEN-COMP-NEXT:&lt;br /&gt;
   pre-order traversal: select the first copy of each node;&lt;br /&gt;
   post-order traversal: select the last copy of each node;&lt;br /&gt;
   in-order traversal: delete the first copy of each node if it is not a leaf and delete the last copy of each node if it has more than one child.&lt;br /&gt;
This linked list can be locked while editing, as described in the LDS chapter of the Solihin book. Either a global-lock approach, a fine-grained approach, or [http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock Read-Write Locks] can be used.&lt;br /&gt;
In this fashion we have broken up the linked list of the tree into successive parts and imposed a divide-and-conquer technique to complete the traversal.&lt;br /&gt;
&lt;br /&gt;
== Hash Tables ==&lt;br /&gt;
'''Hash Table Intro '''&lt;br /&gt;
&lt;br /&gt;
Hash tables[[#References|&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;]] are very efficient data structures often used in searching algorithms for fast lookup operations. They are used extensively in data processing, as they allow vast amounts of data to be passed through the hash table using as few indirections in the storage structure as possible.&lt;br /&gt;
&lt;br /&gt;
A single table-level lock can easily become a bottleneck, so several methods have been developed to overcome this difficulty. Hash tables contain a series of &amp;quot;buckets&amp;quot; that function like indexes into an array, each of which can be accessed directly using its key value.  The bucket in which a piece of data will be placed is determined by a special hashing function.&lt;br /&gt;
&lt;br /&gt;
The major advantage of a hash table is that lookup times are essentially constant, much like an array access with a known index.  With a proper hashing function in place, it should be fairly rare that any two keys generate the same value.&lt;br /&gt;
&lt;br /&gt;
In the case that two keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value.  One approach relevant to linked list structures is the chained hash table, in which a linked list is created of all the values that have been placed in a particular bucket.  The developer must not only take into account the proper bucket for the data being searched for, but must also traverse the chained linked list. [[#References|&amp;lt;sup&amp;gt;[7]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
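The chained scheme just described can be sketched as follows.  The class and method names are hypothetical, written only to illustrate the bucket array, the hashing function, and the chain walk on collision.&lt;br /&gt;

```java
import java.util.LinkedList;

// Minimal chained hash table sketch: each bucket holds a linked list of
// key/value entries whose keys hash to the same bucket index.
public class ChainedHashTable<K, V> {
    private static final int BUCKETS = 16;
    @SuppressWarnings("unchecked")
    private final LinkedList<Entry<K, V>>[] table = new LinkedList[BUCKETS];

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // The "special hashing function": map the key's hash code to a bucket.
    private int bucketOf(K key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    public void put(K key, V value) {
        int b = bucketOf(key);
        if (table[b] == null) table[b] = new LinkedList<>();
        for (Entry<K, V> e : table[b]) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite
        }
        table[b].add(new Entry<>(key, value));   // chain on collision
    }

    public V get(K key) {
        int b = bucketOf(key);
        if (table[b] == null) return null;
        for (Entry<K, V> e : table[b]) {         // walk the chain
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

Lookup is constant time plus the length of the chain, which stays short when the hashing function spreads keys evenly.&lt;br /&gt;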
&lt;br /&gt;
There are several parallel implementations of hash tables available that use lock-based synchronization. Larson et al. use two lock levels: one global table-level lock, and one separate lightweight lock (a flag) for each bucket. The high-level lock is used only for setting the bucket-level flags and is released right afterwards. This ensures fine-grained mutual exclusion (concurrent operations at the bucket level), but needs only one real lock for the implementation. &lt;br /&gt;
&lt;br /&gt;
A scalable hash table for shared memory multi-processor (SMP) supports very high rates of concurrent operations (e.g., insert, delete, &lt;br /&gt;
and lookup), while simultaneously reducing cache misses. The SMP system has a memory subsystem and a processor subsystem interconnected &lt;br /&gt;
via a bus structure. &lt;br /&gt;
&lt;br /&gt;
The hash table is stored in the memory subsystem to facilitate access to data items. The hash table is segmented into multiple buckets, &lt;br /&gt;
with each bucket containing a reference to a linked list of bucket nodes that hold references to data items with keys that hash to a common value. Individual bucket nodes contain multiple signature-pointer pairs that reference corresponding data items.&lt;br /&gt;
Each signature-pointer pair has a hash signature computed from a key of the data item and a pointer to the data item. The first bucket &lt;br /&gt;
node in the linked list for each of the buckets is stored in the hash table.&lt;br /&gt;
&lt;br /&gt;
To enable multithread access, while serializing operation of the table, the SMP system utilizes two levels of locks: a table lock and&lt;br /&gt;
multiple bucket locks. The table lock allows access by a single processing thread to the table while blocking access for other processing&lt;br /&gt;
threads. The table lock is held just long enough for the thread to acquire the bucket lock of a particular bucket node. Once the table lock&lt;br /&gt;
is released, another thread can access the hash table and any one of the other buckets.&lt;br /&gt;
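The two-level scheme described above might be sketched like this, assuming a simple array of buckets and Java's ReentrantLock (all names here are hypothetical): the table lock is held just long enough to acquire a per-bucket lock, after which other threads are free to work on other buckets.&lt;br /&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the two-level lock scheme: the table lock guards only the
// acquisition of a per-bucket lock, so operations on different buckets
// proceed concurrently once the table lock is released.
public class TwoLevelLockTable {
    private static final int BUCKETS = 8;
    private final ReentrantLock tableLock = new ReentrantLock();
    private final ReentrantLock[] bucketLocks = new ReentrantLock[BUCKETS];
    @SuppressWarnings("unchecked")
    private final Map<String, Integer>[] buckets = new HashMap[BUCKETS];

    public TwoLevelLockTable() {
        for (int i = 0; i < BUCKETS; i++) {
            bucketLocks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    private ReentrantLock acquireBucketLock(int b) {
        tableLock.lock();              // table lock held only briefly...
        try {
            bucketLocks[b].lock();     // ...until the bucket lock is taken
        } finally {
            tableLock.unlock();        // other buckets are now accessible
        }
        return bucketLocks[b];
    }

    public void put(String key, int value) {
        int b = bucketOf(key);
        ReentrantLock lock = acquireBucketLock(b);
        try {
            buckets[b].put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public Integer get(String key) {
        int b = bucketOf(key);
        ReentrantLock lock = acquireBucketLock(b);
        try {
            return buckets[b].get(key);
        } finally {
            lock.unlock();
        }
    }
}
```

The key property is that the table lock never protects a bucket operation itself, only the hand-off to a bucket lock, so contention on the global lock stays very short.&lt;br /&gt;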
&lt;br /&gt;
&lt;br /&gt;
[[File:315px-Hash table 3 1 1 0 1 0 0 SP.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Hash tables can be very well suited to parallel applications.  For example, system code responsible for caching between multiple processors could itself be an ideal opportunity for a shared hashmap.  Each processor sharing one common cache would be able to access the relevant information all in one location.&lt;br /&gt;
&lt;br /&gt;
This would, however, involve a good bit of synchronization, as each processor would need to wait whenever a lock was placed on a specific bucket in the cache hashmap.  Unfortunately, traditional locking is a poor solution to this problem because processors need to run very quickly; having to wait for locks would cripple application processing time.  A non-locking solution is critical to performance.&lt;br /&gt;
&lt;br /&gt;
In Java, the standard class used for hashing is the HashMap[[#References|&amp;lt;sup&amp;gt;[17]&amp;lt;/sup&amp;gt;]] class.  This class has a fundamental weakness, though, in that the entire map requires synchronization prior to each access.  This causes a great deal of contention and many bottlenecks on a parallel machine.&lt;br /&gt;
&lt;br /&gt;
Below, I will present a Java-based solution to this problem using the ConcurrentHashMap class.  This class requires only a portion of the map to be locked, and reads can generally occur with no locking whatsoever.&lt;br /&gt;
&lt;br /&gt;
=== HashMap Code with Locking ===&lt;br /&gt;
&lt;br /&gt;
'''  Simple synchronized example to increment a counter.'''[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
  private Map&amp;lt;String,Integer&amp;gt; queryCounts = new HashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private '''synchronized''' void incrementCount(String q) {&lt;br /&gt;
    Integer cnt = queryCounts.get(q);&lt;br /&gt;
    if (cnt == null) {&lt;br /&gt;
      queryCounts.put(q, 1);&lt;br /&gt;
    } else {&lt;br /&gt;
      queryCounts.put(q, cnt + 1);&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code was written using an ordinary HashMap data structure.  Notice that we use the synchronized keyword here to signify that only one thread can enter this function at any point in time.  With a really large number of threads, however, waiting to enter the synchronized operation can become a major bottleneck.&lt;br /&gt;
&lt;br /&gt;
'''  Iterator example for synchronized HashMap.'''&lt;br /&gt;
&lt;br /&gt;
  Map m = Collections.synchronizedMap(new HashMap());&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  synchronized(m) { // synchronizing on map&lt;br /&gt;
    Iterator i = s.iterator(); // Must be in synchronized block&lt;br /&gt;
    while (i.hasNext())&lt;br /&gt;
      foo(i.next());&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
In the above example, we show how an iterator can be used to traverse a map.  In this case, we need to use the synchronizedMap function available in the Collections class.  Also, as you may notice, once the iterator code begins, we must synchronize on the entire map in order to iterate through the results.  But what if several processors wish to iterate through it at the same time?&lt;br /&gt;
&lt;br /&gt;
=== Parallel Code Solution ===&lt;br /&gt;
&lt;br /&gt;
The key issue with a hash table in a parallel environment is ensuring that any update/insert/delete sequences have completed properly before subsequent operations are attempted, so that the data stays synchronized.  However, since access speed is such a critical component of the design of a hash table, it is essential to avoid using too many locks for synchronization.  Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.&lt;br /&gt;
&lt;br /&gt;
One such example in Java is the ConcurrentHashMap[[#References|&amp;lt;sup&amp;gt;[9]&amp;lt;/sup&amp;gt;]], which acts as a thread-safe version of the [http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html HashMap].  With this structure, there is full concurrency of retrievals and adjustable expected concurrency for updates.  Retrievals do not block and will typically run in parallel with updates and deletes; a retrieval reflects the most recently completed update operations, even if it cannot see values that are still being updated.  This allows for both efficiency and greater concurrency.&lt;br /&gt;
&lt;br /&gt;
'''Parallel Counter Increment Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  private ConcurrentMap&amp;lt;String,Integer&amp;gt; queryCounts =&lt;br /&gt;
    new ConcurrentHashMap&amp;lt;String,Integer&amp;gt;(1000);&lt;br /&gt;
  private void incrementCount(String q) {&lt;br /&gt;
    Integer oldVal, newVal;&lt;br /&gt;
    do {&lt;br /&gt;
      oldVal = queryCounts.get(q);&lt;br /&gt;
      newVal = (oldVal == null) ? 1 : (oldVal + 1);&lt;br /&gt;
    } while (oldVal == null&lt;br /&gt;
             ? queryCounts.putIfAbsent(q, newVal) != null // first insert of q&lt;br /&gt;
             : !queryCounts.replace(q, oldVal, newVal));  // retry if value changed&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
The above code snippet represents an alternative to the serial option presented in the previous section, while avoiding much of the locking that takes place with synchronized functions or synchronized blocks.  With ConcurrentHashMap, however, notice that we must write some extra code to handle the fact that a variety of inserts/updates could be running at the same time.  The replace() function here acts much like a compare-and-set operation typically used in concurrent code: the value is changed only if the current mapping still equals the previously read value, and otherwise the loop retries.  This is much more efficient than locking the entire function, since contention on any single key is usually rare.&lt;br /&gt;
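Since Java 8, the same per-key atomic increment can also be written without an explicit retry loop, using ConcurrentHashMap's merge() method.  This sketch is offered as a modern alternative, not as part of the original example; the class name is hypothetical.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MergeCounter {
    private final ConcurrentMap<String, Integer> queryCounts =
        new ConcurrentHashMap<>(1000);

    // merge() performs the read-compute-store cycle atomically per key:
    // if q is absent it stores 1, otherwise it applies Integer::sum to
    // the existing count, so no explicit retry loop is needed.
    public void incrementCount(String q) {
        queryCounts.merge(q, 1, Integer::sum);
    }

    public int count(String q) {
        return queryCounts.getOrDefault(q, 0);
    }
}
```

The compare-and-set retry still happens, but inside the library, where it is harder to get wrong.&lt;br /&gt;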
&lt;br /&gt;
'''Parallel Traversal Alternative:'''&lt;br /&gt;
&lt;br /&gt;
  Map m = new ConcurrentHashMap();&lt;br /&gt;
  Set s = m.keySet(); // set of keys in hashmap&lt;br /&gt;
  Iterator i = s.iterator(); // no synchronized block needed&lt;br /&gt;
  while (i.hasNext())&lt;br /&gt;
    foo(i.next());&lt;br /&gt;
&lt;br /&gt;
In the case of a traversal, recall that ConcurrentHashMap requires no locking on read operations.  Thus we can remove the synchronized block entirely and iterate in the normal fashion.  The iterator is weakly consistent: it will never throw a ConcurrentModificationException, though it may or may not reflect updates made during the traversal.&lt;br /&gt;
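Java 8's ConcurrentHashMap additionally provides bulk operations that parallelize the traversal itself.  The sketch below (class and method names hypothetical) uses forEach(parallelismThreshold, action), which splits the traversal into subtasks on the common fork/join pool once the map holds at least that many elements.&lt;br /&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ParallelTraversal {
    // Sums all values using ConcurrentHashMap's built-in parallel traversal.
    public static long sumValues(ConcurrentHashMap<String, Integer> map) {
        LongAdder total = new LongAdder();   // thread-safe accumulator
        // With a threshold of 1, the traversal is split into parallel
        // subtasks on the common pool; Long.MAX_VALUE would keep it serial.
        map.forEach(1, (k, v) -> total.add(v));
        return total.sum();
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1); m.put("b", 2); m.put("c", 3);
        System.out.println(sumValues(m));
    }
}
```

Note the LongAdder: because the action may run on several threads at once, the accumulator itself must be thread-safe.&lt;br /&gt;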
&lt;br /&gt;
== Graphs ==&lt;br /&gt;
=== Graph Intro ===&lt;br /&gt;
&lt;br /&gt;
A graph data structure[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] is another type of linked-list structure that focuses on data relationships and the most efficient ways to traverse from one node to another.  For example, in a networking application, one network node may have connections to a variety of other network nodes.  These nodes then also link to a variety of other nodes in the network.  Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain.  This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:250px-6n-graf.svg.png]]&lt;br /&gt;
&lt;br /&gt;
=== Opportunities for Parallelization ===&lt;br /&gt;
&lt;br /&gt;
Graphs[[#References|&amp;lt;sup&amp;gt;[10]&amp;lt;/sup&amp;gt;]] consist of a finite set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices.  From a given vertex, one would typically want to order the different paths to another vertex using its list of edges or, more likely, would be interested in the fastest means of getting from one of these vertices to some destination vertex.&lt;br /&gt;
&lt;br /&gt;
Graph nodes typically keep their list of edges in a linked list.  Also, when attempting to create a shortest-path algorithm on the fly, the graph will typically use a combination of a linked list to represent the path as it is being built, along with a queue that is used for each step of that process.  Synchronizing all of these can be a major challenge.&lt;br /&gt;
&lt;br /&gt;
Much like the hash table, graphs cannot afford to be slow and must often generate results in a very efficient manner.  Having to lock on each list of edges or locking on a shortest path list would really be a major obstacle.&lt;br /&gt;
&lt;br /&gt;
Certainly, though, the need for parallel processing becomes critical when you consider, for example, that social networking has become such a major driver of graph algorithms.  Facebook now has roughly a billion users[[#References|&amp;lt;sup&amp;gt;[15]&amp;lt;/sup&amp;gt;]], and each user has a series of friend links that must be analyzed and examined.  This list just keeps growing and growing.&lt;br /&gt;
&lt;br /&gt;
One of the most significant opportunities for a parallel algorithm with a graph data structure is with the traversal algorithms.  We can use Breadth-First search[[#References|&amp;lt;sup&amp;gt;[16]&amp;lt;/sup&amp;gt;]] as an example of this, starting from an initial node and expanding outwards until reaching the destination node.&lt;br /&gt;
&lt;br /&gt;
=== Breadth First Search - Serial Version ===&lt;br /&gt;
&lt;br /&gt;
The following shows a sample of a breadth-first algorithm which traverses from the city of Frankfurt to Augsburg and Stuttgart, Germany.  In doing so, the search begins at a root node (Frankfurt) and expands outward to all connected nodes on each step.  From there, each of those nodes proceeds to expand the search outward until all nodes have been covered.&lt;br /&gt;
&lt;br /&gt;
[[File:GermanyBFS.png]][[#References|&amp;lt;sup&amp;gt;[13]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following code snippet implements a BFS search function.  This function uses a coloring scheme and marks to denote that a node has been visited.  It begins with an initial vertex in a queue and expands outward to all its successors until no unmarked elements remain.[[#References|&amp;lt;sup&amp;gt;[12]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
  public void search (Graph g)&lt;br /&gt;
  {&lt;br /&gt;
    g.paint(Color.white);   // paint all the graph vertices with white&lt;br /&gt;
    g.mark(false);          // unmark the whole graph&lt;br /&gt;
    refresh(null);          // and redraw it&lt;br /&gt;
    Vertex r = g.root();	// the root is painted grey&lt;br /&gt;
    g.paint(r, Color.gray);       refresh(g.box(r));&lt;br /&gt;
    java.util.Vector queue = new java.util.Vector();	&lt;br /&gt;
    queue.addElement(r);	// and put in a queue&lt;br /&gt;
    while (!queue.isEmpty())&lt;br /&gt;
    {&lt;br /&gt;
      Vertex u = (Vertex) queue.firstElement();&lt;br /&gt;
      queue.removeElement(u); // extract a vertex from the queue&lt;br /&gt;
      g.mark(u, true);          refresh(g.box(u));&lt;br /&gt;
      int dp = g.degreePlus(u);&lt;br /&gt;
      for (int i = 0; i &amp;lt; dp; i++) // look at its successors&lt;br /&gt;
      {&lt;br /&gt;
        Vertex v = g.ithSucc(i, u);&lt;br /&gt;
        if (Color.white == g.color(v))&lt;br /&gt;
        {		    &lt;br /&gt;
          queue.addElement(v);		    &lt;br /&gt;
          g.paint(v, Color.gray);   refresh(g.box(v));&lt;br /&gt;
        }&lt;br /&gt;
     }&lt;br /&gt;
     g.paint(u, Color.black);  refresh(g.box(u));&lt;br /&gt;
     g.mark(u, false);         refresh(g.box(u));	    &lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
=== Parallel Solution ===&lt;br /&gt;
&lt;br /&gt;
But could we introduce parallel mechanisms into this Breadth-First search?  The most logical and effective way, instead of using locks and synchronized regions, is to use data-parallel techniques during the traversal.  This can be accomplished by having each node of a given breadth-first step be sent to a separate processor.  So, using the above example, instead of having Frankfurt-&amp;gt;Mannheim followed by Frankfurt-&amp;gt;Wurzburg followed by Frankfurt-&amp;gt;Kassel run on the same processor, Frankfurt could split all 3 searches out in parallel onto 3 different processors.  Then, possibly, some cleanup code would be left at the end to visit any remaining untouched nodes.  In a network routing application, being able to split up the search for each IP address would make searches significantly faster than allowing one processor to become a bottleneck.&lt;br /&gt;
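A level-synchronous version of this idea can be sketched with Java parallel streams: every vertex in the current frontier is expanded in parallel, and the results are merged once per level before the next step.  The adjacency-map representation below is a hypothetical stand-in for the city graph above.&lt;br /&gt;

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ParallelBFS {
    // Level-synchronous BFS: each frontier is expanded in parallel, and the
    // next frontier is assembled in a single merge step per level (the
    // Collectors.toSet() combine), matching the one-synchronization-per-
    // generation scheme described in the text.
    public static List<Set<String>> levels(Map<String, List<String>> adj, String root) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Set<String> frontier = Set.of(root);
        while (!frontier.isEmpty()) {
            result.add(frontier);
            visited.addAll(frontier);
            frontier = frontier.parallelStream()              // expand in parallel
                .flatMap(u -> adj.getOrDefault(u, List.of()).stream())
                .filter(v -> !visited.contains(v))            // visited is read-only here
                .collect(Collectors.toSet());                 // per-level merge
        }
        return result;
    }
}
```

No locks are needed because the visited set is only read while the parallel expansion runs; all mutation happens between levels, on a single thread.&lt;br /&gt;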
&lt;br /&gt;
Using locking pseudocode, you might have an algorithm similar to this:&lt;br /&gt;
&lt;br /&gt;
  for all vertices u at level d in parallel do&lt;br /&gt;
    for all adjacencies v of u in parallel do&lt;br /&gt;
    dv = D[v];&lt;br /&gt;
    if (dv &amp;lt; 0) // v is visited for the first time&lt;br /&gt;
      vis = fetch_and_add(&amp;amp;Visited[v], 1);  '''LOCK'''&lt;br /&gt;
      if (vis == 0) // v is added to a stack only once&lt;br /&gt;
        D[v] = d+1;&lt;br /&gt;
        pS[count++] = v; // Add v to local thread stack&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
    if (dv == d + 1)&lt;br /&gt;
      fetch_and_add(&amp;amp;sigma[v], sigma[u]);  '''LOCK'''&lt;br /&gt;
      fetch_and_add(&amp;amp;Pcount[v], 1); // Add u to predecessor list of v  '''LOCK'''&lt;br /&gt;
&lt;br /&gt;
A much better parallel algorithm is represented in the following pseudocode.  Notice that each of the vertices is sent to a separate processor, and send/receive operations eventually sync up the path information.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-code.PNG]][[#References|&amp;lt;sup&amp;gt;[14]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
The following graph also shows how each of the regional sets of vertices being searched can be added to the path in parallel.&lt;br /&gt;
&lt;br /&gt;
[[File:Parallel-graph.PNG]][[#References|&amp;lt;sup&amp;gt;[11]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
Every set of vertices in the same distance from the source is assigned to a processor. This set of vertices is called a regional set of vertices. The goal is to find the shortest path connecting each region.&lt;br /&gt;
&lt;br /&gt;
= Quiz =&lt;br /&gt;
&lt;br /&gt;
1. Describe the copy-scan technique.&lt;br /&gt;
&lt;br /&gt;
2. Describe the pointer doubling technique.&lt;br /&gt;
&lt;br /&gt;
3. Which concurrency issues are of the most concern in a tree data structure?&lt;br /&gt;
&lt;br /&gt;
4. What is the alternative to using a copy-scan technique in pointer-based programming?&lt;br /&gt;
&lt;br /&gt;
5. Which concurrency issues are of the most concern with hash table data structures?&lt;br /&gt;
&lt;br /&gt;
6. Which concurrency issues are of the most concern with graph data structures?&lt;br /&gt;
&lt;br /&gt;
7. Why would you not want locking mechanisms in hash tables?&lt;br /&gt;
&lt;br /&gt;
8. What is the nature of the linked list in a tree structure?&lt;br /&gt;
&lt;br /&gt;
9. Describe a parallel alternative in the tree data structure.&lt;br /&gt;
&lt;br /&gt;
10. Describe a parallel alternative in a graph data structure.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&lt;br /&gt;
#http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_%28data_structure%29&lt;br /&gt;
#http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf&lt;br /&gt;
#http://www.devjavasoft.org/code/classhashtable.html&lt;br /&gt;
#http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html&lt;br /&gt;
#http://web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java&lt;br /&gt;
#http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg&lt;br /&gt;
#http://rosettacode.org/wiki/Talk:Tree_traversal&lt;br /&gt;
#http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap.shtml&lt;br /&gt;
#http://en.wikipedia.org/wiki/Graph_%28abstract_data_type%29&lt;br /&gt;
#http://www.cc.gatech.edu/~bader/papers/PPoPP12/PPoPP-2012-part2.pdf&lt;br /&gt;
#http://renaud.waldura.com/portfolio/graph-algorithms/classes/graph/BFSearch.java&lt;br /&gt;
#http://en.wikipedia.org/w/index.php?title=File%3AGermanyBFS.svg&lt;br /&gt;
#http://sc05.supercomputing.org/schedule/pdf/pap346.pdf&lt;br /&gt;
#http://www.facebook.com/press/info.php?statistics&lt;br /&gt;
#http://en.wikipedia.org/wiki/Breadth-first_search&lt;br /&gt;
#http://code.wikia.com/wiki/Hashmap&lt;br /&gt;
#http://www.shodor.org/petascale/materials/UPModules/Binary_Tree_Traversal&lt;br /&gt;
#http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&lt;br /&gt;
#http://dl.acm.org/citation.cfm?id=320078&lt;br /&gt;
#http://en.wikipedia.org/wiki/Tree_traversal&lt;br /&gt;
#P.-A. Larson, M. R. Krishnan, and G. V. Reilly, “Scaleable hash table for shared-memory multiprocessor system,” US Patent 6578131, 2003&lt;br /&gt;
#http://ww2.cs.mu.oz.au/~pjs/papers/paralleldp.pdf&lt;/div&gt;</summary>
		<author><name>Remcelfr</name></author>
	</entry>
</feed>