CSC/ECE 506 Spring 2012/ch5a ja

From Expertiza_Wiki
Revision as of 17:55, 27 February 2012 by Japlemmo (talk | contribs)
Jump to navigation Jump to search

Chapter 5a CSC/ECE 506 Spring 2012 / ch5a

An exploration and summary of concurrency issues as it relates to linked-list based data structures such as hash tables, trees, and graphs. This topic examines concurrency problems related to each type and possible solutions to allow for parallelization.

Introduction to Linked-List Parallel Programming

In determining opportunities for parallel programming during the software design process, many programmers will focus exclusively on loop or array structures, rather than also considering that linked-list pointer-based structures could also provide opportunities to run code in parallel across different processors.

For example, one common parallel technique used in an array processing algorithm is the copy-scan technique. This technique involves copying rows of data from one processor to another in a log(n) fashion until all processors have their own copy of that row. From there, you could perform a reduction technique to generate a sum of all the data, all while working in a parallel fashion. Take the following grid:

The basic process for copy-scan would be to:

 Step 1) Copy the row 1 array to row 2.
 Step 2) Copy the row 1 array to row 3, row 2 to row 4, etc on the next run.
 Step 3) Continue in this manner until all rows have been copied in a log(n) fashion.
 Step 4) Perform the parallel operations to generate the desired result (reduction for sum, etc.).

But how does this same process work in the linked list world?

With linked lists, there is a concept called pointer doubling, which works in a very similar manner to copy-scan.

 Step 1) Each processor will make a copy of the pointer it holds to it's neighbor.
 Step 2) Next, each processor will make a pointer to the processor 2 steps away.
 Step 3) This continues in logarithmic fashion until each processor has a pointer to the end of the chain.
 Step 4) Perform the parallel operations to generate the desired result.

However, with linked list programming, similar to array-based programming, it becomes imperative to have some sort of locking mechanism on critical sections to avoid race conditions. To make sure the results are correct, it is important that operations can be serialized appropriately and that data remains current and synchronized.

In this chapter, we will explore 3 linked-list based data structures and the concurrency issues they present: hash tables, trees, and graphs.

Linked Data Structure Conflicts

Insertions

Deletions

Search

Linked Data Structures

Given this background knowledge, we can begin to explore solutions to some specific data structures that also utilized linked lists in some fashion.

Trees

Tree Intro

Serial Code Example

Parallel Code Solution

Hash Tables

Hash Table Intro

Hash tables are very efficient data structures often used in searching algorithms for fast lookup operations. Hash tables contain a series of "buckets" that function like indexes into an array, each of which can be accessed directly using their key value. The bucket for which a piece of data will be placed is determine by a special hashing function.

The major advantage of a hash table is that lookup times are essentially a constant value, much like an array with a known index. With a proper hashing function in place, it should be fairly rare that any 2 keys would generate the same value.

In the case that 2 keys do map to the same position, there is a conflict that must be dealt with in some fashion to obtain the correct value. One way that is relevant to linked list structures is to have a chained hash table in which a linked list is created with all values that have been placed in that particular bucket. The developer would have to not only take into account the proper bucket for the data being searched for, but also must considered the chained linked list.

Serial Code Example

 /**
  *  Put a key/value pair into the table by hashing the key to get the array
  *  index of the chain that should hold the node containing the pair.
  *  If an existing node with the same key is present then overwrite its value,
  *  otherwise add a new node.
  *
  *@param  key  key object reference
  *@param  val  data object reference
  */
 public void put(Object key, Object val)
 {
   int index = getIndex(key);
   Node node = find(key, data[index]);
   // Use helper method to determine if node with same key is already in
   // chain. If not, add a new node to head of chain.
   if (node == null)
   {
     data[index] = new Node(data[index], key, val);
   }
   else
   {
     // Otherwise update data value of existing node.
     node.val = val;
   }
 }
 /**
  *  Given a key object, return a data object from the table or
  *  null if it is not found.
  *
  *@param  key  key object reference
  *@return      data object reference or null if no object found
  */
 public Object get(Object key)
 {
   Node temp = find(key, data[getIndex(key)]);
   if (temp != null)
   {
     return temp.val;
   }
   else
   {
     return null;
   }
 }
 /**
  * Remove a key/value pair from table if present, otherwise make no change.
  *
  *@param  key  key object reference
  */
 public void remove(Object key)
 {
   int index = getIndex(key);
   Node ref = data[index];
   Node previous = null;
   while (ref != null)
   {
     if ((ref.key).equals(key))
     {
       if (previous == null)
       {
         data[index] = ref.next;
       }
       else
       {
         previous.next = ref.next;
       }
       return;
     }
     previous = ref;
     ref = ref.next;
   }
 }
 /**
  *  Given a key, use a hash function to obtain the array index of the chain
  *  corresponding to the key.
  *  The hashCode method inherited (and possibly overridden)
  *  from class Object is called to do the hashing, with the returned
  *  value constrained to the hash table array bounds.
  *  The distribution of objects in the hash table will depend
  *  on the quality of the hash function implemented by hashCode.
  *
  *@param  key  reference to object to be hashed
  *@return      array index of chain corresponding to key
  */
 private int getIndex(Object key)
 {
   return Math.abs(key.hashCode() % data.length);
 }
 /**
  *  Find node given a key and a chain to search.
  *
  *@param  key    key object reference
  *@param  ref    node object reference where search will start
  *@return        node object holding key or null if not found
  */
 private Node find(Object key, Node ref)
 {
   while (ref != null)
   {
     if ((ref.key).equals(key))
     {
       return ref;
     }
     ref = ref.next;
   }
   return null;
 }

Parallel Code Solution

The key issue in a hash table in a parallel environment is to make sure any update/insert/delete sequences have been completed properly prior to attempting subsequent operations to make sure the data has been synched appropriately. However, since access speed is such a critical component of the design of a hash table, it is essential to try and avoid using too many locks for performing synchronization. Fortunately, a number of lock-free hash designs have been implemented to avoid this bottleneck.


Graphs

Graph Intro

A graph data structure is another type of linked-list structure that focuses on data that requires linking according to various types of relationships. For example, in a networking application, one network node may have connections to a variety of other network nodes. These nodes then also link to a variety of other nodes in the network. Using this connection of nodes, it would be possible to then find a path from one specific node to another in the chain. This could be accomplished by having each node contain a linked list of pointers to all other reachable nodes.

Serial Code Example

   /**
    * Add a new edge to the graph.
    */
   public void addEdge( String sourceName, String destName, double cost )
   {
       Vertex v = getVertex( sourceName );
       Vertex w = getVertex( destName );
       v.adj.add( new Edge( w, cost ) );
   }
   /**
    * Single-source unweighted shortest-path algorithm.
    */
   public void unweighted( String startName )
   {
       clearAll( ); 
       Vertex start = (Vertex) vertexMap.get( startName );
       if( start == null )
           throw new NoSuchElementException( "Start vertex not found" );
       LinkedList q = new LinkedList( );
       q.addLast( start ); start.dist = 0;
       while( !q.isEmpty( ) )
       {
           Vertex v = (Vertex) q.removeFirst( );
           for( Iterator itr = v.adj.iterator( ); itr.hasNext( ); )
           {
               Edge e = (Edge) itr.next( );
               Vertex w = e.dest;
               if( w.dist == INFINITY )
               {
                   w.dist = v.dist + 1;
                   w.prev = v;
                   q.addLast( w );
               }
           }
       }
   }

Parallel Code Solution

Quiz

References

http://oreilly.com/catalog/masteralgoc/chapter/ch08.pdf http://www.devjavasoft.org/code/classhashtable.html http://osr600doc.sco.com/en/SDK_c++/_Intro_graph.html web.eecs.utk.edu/~berry/cs302s02/src/code/Chap14/Graph.java http://people.engr.ncsu.edu/efg/506/s01/lectures/notes/lec8.html http://en.wikipedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg