CSC/ECE 517 Fall 2011/ch1 1b ds

From Expertiza_Wiki
Jump to navigation Jump to search

Cover the history of Collections frameworks back to the late 1970s or early 1980s, and how they were retrofitted to various languages. Discuss the advantage of a standard Collections framework (e.g., Java) vs. several competing frameworks (e.g., C++), and the advantage of collections built into the language (e.g., Ruby) vs. collections as class libraries. Give examples to illustrate the advantages you identify.

Introduction

A collection refers to a group of data items which have some commonality like a "a bunch of objects stored in a structured manner". Collections are very common in daily life, some examples would be a collection of science fiction books, the names of employees in a given department etc. Given that object oriented design concepts try to model the real world, it is natural to include support for creating and manipulating collections in languages which support object oriented development. Collections allows you to organize data in memory in various convenient ways. Example : a simple list of strings, a mapping or association of strings to strings (e.g. country names to the names of their capitals);

Almost all major programming and scripting languages that are used today such as Perl, Python], JavaScript, PHP, Ruby, C++ and Java have support for collections in some form or the other. Some languages have liner arrays (which can have different types of elements) and associative arrays built into the language whereas some other languages provide support through external libraries.

In this article we compare and contrast the two most popular collections frameworks/libraries in use today, C++ STL and Java Collections Framework (JCF). Also we compare collections in Ruby which does not have a formal collections framework against C++ and Java which define a formal framework/library for collections.

History

The history of collection frameworks can be analyzed in two aspects. Languages which have some basic collection containers built into them and languages with a formal collection framework. A collection framework may be defined as a generic reusable collection of classes which implements data structures and algorithms to create and manipulate groups of related objects. Few languages support a formal collections framework. Most of the modern programming languages have sequential arrays and associative array's built into them. A programmer may use these basic containers to create their own custom collection classes. A comparison of programming languages that implement associative arrays can be found here.

The history of a collections framework or a collections library originate from ideas related to generic programming developed by Alexander Stepanov in the early 80s. At that time Ada was the only language which had support for generics. An initial generic library for list processing in Ada was developed by Alexander Stepanov and David Musser. However Ada did not achieve widespread acceptance by the programming community. As C/C++ became more widespread attention was given to developing a generic library in C++. Finally a proposal was made to the ANSI/ISO committee in 1993 and the C++ STL was finally accepted in 1994.

In the case of Java the initial versions prior to JDK-1.2 did not have a collections framework. These versions just defined Vectors and HashTables which were used to group objects together. This is reminiscent of the way things are done in languages such as Ruby, Python etc. The formal collections framework was introduced in 1998 with JDK-1.2 which borrowed a lot form Doug Lea's Collections package. The goal of this framework was to be consistent with C++ STL. But soon Sun Microsystems decided that Java needed a compact collections framework and consistency with C++ STL was no longer a priority. This led to the development of the Java Collections Framework as we know it today.

Java Collections Framework (JCF)


Java Collections Framework is a library of classes that implement collections. The collections framework also provides utility methods to perform functions such as sorting a list of data.The collections in Java are mainly the following: List, Set, Map and Queue. Java provides these collections as interfaces from which one can write their own classes or can use the inbuilt Java classes that implement these interfaces.

List

List is a collection of objects that has a fixed order (when we add an object to the list, it will stay in the place we put it), and allows objects to be referred to by position.Duplicate elements are allowed in a list. The general-purpose List implementations are ArrayList and LinkedList.

Set

A Set is a Collection that cannot contain duplicate elements. It models the mathematical set abstraction.The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet.

Map

A Map is an object that maps keys to values. A map cannot contain duplicate keys: Each key can map to at most one value. Maps are defined by the java.util.Map interface in Java. The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. The java.util.Map interface is extended by its subinterface, java.util.SortedMap. This interface defines a map that's sorted by the keys provided.

Queue

A Queue is a collection for holding elements prior to processing. Besides basic Collection operations, queues provide additional insertion, removal, and inspection operations. LinkedList implements the Queue interface, providing first in, first out (FIFO) queue operations.

C++ Standard Template Library(STL)

C++ standard template library is a software library that provides implementations of some basic containers for storing collections of related objects as well as algorithms to perform operations on the elements in these containers. All the classes in C++ STL are generic and can be used with any user defined or primitive data types. In general the C++ STL has four major components.

Containers

These are classes which are used to store other objects. Taking an example suppose there is a “jar of jellybeans”, the jar can be thought of as the container while the jelly beans are the objects that are stored in the container. Most collections automatically resize depending on the number of elements in them.The C++ STL provides a number of such classes which can be classified into two categories

Sequence Containers

These containers impose a strict linear order on the elements. Sequence containers support insertions and deletions (either at the ends or at a random position) using an index. Some examples of sequence containers are vector, list, dequeue etc.

Associative containers

These containers store key value pairs and each object stored in an associative container is mapped to a specific key. In other words associative containers perform insertions and deletions based on a key. Some examples of associative containers are set, map, hash map, multiset etc.

Algorithms

The C++ STL contains a large collection of algorithms that can be used to manipulate the data contained in the container. Continuing from the earlier “jar of jelly beans” example, suppose you wanted to count the number of red colored jelly beans. Or you wanted to change the color of all the jelly beans to red the algorithms in STL allow you to accomplish this.

An important thing to note is that the algorithms in STL are decoupled from the container. This means that there is only one definition of the function which implements the algorithm. This works for any valid container containing any any valid set of objects. Analogous to our example the function to count the number of red jelly beans also works for a “bucket containing colored pebbles”.

The algorithms contained in STL can be classified into two categories

Non-Mutating

Those which do not change the collection in any way. Examples include count, find, for each etc.

Mutating

Those which modify the contents of the collection in some way. Examples include replace, copy, reverse etc.

Iterators

Iterators are an important part of the C++ STL. Iterators facilitate traversal through a container’s elements. The separation of algorithms from containers is achieved through the use of iterators. Iterators can be thought of as a generalization of pointers. All the algorithms make use of iterators as function arguments. There are 5 types of iterators in STL. These differ in the way they traverse through the elements in the container.

Functor

A functor or functional object is an object that can be called as a function. This is in essence similar to function pointers in C. A functor is any class that overloads the operator() function. Such a class is most commonly used to define custom comparison functions to compare two objects. This is similar to “comparator” in JAVA.

JCF versus STL

Quality of implementation

Java is standardised and have formalized methods such as Java community process(JCP) and Java Specification request (JSR) to define revisions of java platforms. C++ only gives a defining standard and one can have different implementations of that standard. Hence different implementations can be incompatible.

Portability

This is a direct consequence of the fact that standards are available in Java and not in C++. The following is a list of various STL libraries . These may have some common templates which might not necessarily be interoperable. In contrast the implementations of JCF are standardized and they will be portable as long as versions remain the same.

Better optimization of containers

In C++ STL the algorithms (i.e functions which manipulate the data) are completely decoupled from the containers it is not possible to optimize containers for specific purposes. In JCF since the algorithms are defined as interfaces it is easy to provide specific implementations based on requirements. Because the various implementations of each interface are interchangeable, programs can be easily tuned by switching implementations. The only way to optimize in STL is to create new containers which are tied together with the algorithms.

More number of methods in STL

STL has more number of methods than Java for collections. This is because STL has separate methods for containers and algorithms used for containers like Non-mutating algorithms, Mutating algorithms, Sorting

Extension capability

STL collections are difficult to extend as the base classes are not declared as virtual. In Java since the basic collections are interfaces, it is easy to implement the interfaces in a class, and extend from that.

Manipulation of container objects using Iterators

C++ STL uses iterators to iterate through the items. Iterators themselves are generalizations of pointers. This leads to problems when one has to manipulate the items in the container, especially if the items have to be removed. It is tricky to iterate through the container and delete objects within the same loop. This is not the case in java since memory cannot be directly manipulated by the programmer.

  map<char,int> mymap;
  map<char,int>::iterator it;
  int count = 0;
  // insert some values:
  mymap['a']=10;
  mymap['b']=20;
  mymap['c']=30;
  mymap['d']=40;
  mymap['e']=50;
  mymap['f']=60;

  //suppose you decide to delete the 3rd item onwards
  for ( it=mymap.begin() ; it != mymap.end(); it++ ){
    if(count >=2){
       mymap.erase(it); // This statement is dangerous. When the iterator is used to delete the entry the reference that the iterator holds                                     
                        // is lost. Now we have lost the reference to our position in the map.
                        // Intuitively it seems iterator just specifies the entry in the map that has to be deleted. But the iterator is a 
                        // pointer that holds a reference and it would jump to the beginning of the map.
    }
    count++
  }

Pass by reference versus pass by value

Containers in STL is accessed using copy semantics, thereby providing both efficiency and type safety, while in Java collections are accessed via references (or pointers). In Java this can cause problems if the programmer is not careful.

Consider the following piece of code

HashMap<String, HashMap<String,String>> outer = new HashMap<String, HashMap<String,String>>();
HashMap<String,String> inner = new HashMap<String,String>();

inner.add("A","1");
outer.add("A1",inner);

inner.clear(); // This statement also erases the entry in the outer HashMap. This is because of the fact mentioned above.
               // When the inner hash map is added as an entry into the outer HashMap only a reference gets stored and not the actual HashMap          
               // Object.This is not the case with C++ STL which has copy semantics

inner.add("B","2")

Collections in languages versus library

We are comparing collections inbuilt into the Ruby language versus Java’s collection libraries.

Operators vs Methods

In languages with collections built into them many operations may be performed using operators instead of calling functions. Consider an associative array, called hash in ruby and map in java.

 
Assoc_array[
	{“hello”, “world”}
	{“good”,”day”}
]

In a language with collections implemented as a class library one would use a function like “get(key)” to retrieve the value associated with a key. When there is language support one can use Assoc_array[key] to retrieve the value of the key.Now suppose we would like to extend the get function to do something different, in a language like java we can simply write another function or create a different implementation of the associative array which overrides the get function defined in the interface. Now in case of a language where the get is implemented using an operator we should either overload the default behavior of the operator or we have to define a custom operator to do this. In general it is much easier to overload a function than to overload an operator, Further not all languages allow operator overloading.

Import

In Ruby since collections are part of the language , therefore a developer need not include library for common functionalities while writing a program. In case of collections which are present in libraries like Java, one needs to import the libraries before executing the code. This can become a hassle for a Java developer if one does not know the name of the library to import for using a collection.

Operator Overloading

In Java we do not have operator overloading, but instead use methods to do the equivalent operations. Ruby has overloading where one can override , say a + operator. By not having operator overloading Java can have clearer code.This is because when one uses a + overloaded operator, we can get an unexpected behavior if internally it is coded to something else, say a subtraction. The number of operators that you can overload is very small and most of them is relevant to scalar operations. Additionally, the detail of the operation performed by operators may not be as meaningful in comparison to a good method name.

class Test
 def initialize x
   @x = x
 end

 def +(y)
   @x - y
 end
end
a = Test.new 5
puts(a + 3)

In above example instead of getting a result of 8 due to the + operator being overloaded in a wrong way we instead get a result of 2

Performance

Collections in Ruby is slower than in Java as Ruby is a dynamically typed language. , therefore in Ruby type checking happens at run-time which would slow down collection performance.

Verbosity

In Ruby since one uses operator for hash-tables, the code is less verbose and therefore easier to type:

 h[lastname] = firstname 

In Java one has to type a method name which increases the verbosity:

hashmap.put(lastname)= firstname

Readability

In Java when one has nested methods on collections, the code is easier to read than in Ruby when one has nested operators on collections within a single line. The Java code is easier to debug also as one can look at the values of the variables assigned on the nested methods.

Conclusion

We started by listing out the collections present in Java which uses JCF, and C++ which uses STL. Then we made a comparison of the collections in C++ , Java and Ruby. Each language's implementation of collections has its pros and cons. Ultimately it is the developer who chooses which one to use, and this is based on the needs of his application. In general C++ STL provides collections that are fast, Java JCF provides collections that are safe, while in Ruby, the collections can be written with minimal code as they are built into the language.

References

[1] http://download.oracle.com/javase/tutorial/collections/intro/index.html

[2]http://www.javamex.com/tutorials/collections/index.shtml

[3]http://en.wikipedia.org/wiki/Java_collections_framework

[4]http://www.sgi.com/tech/stl/index.html

[5]http://en.wikipedia.org/wiki/Standard_Template_Library

[6]http://download.oracle.com/javase/6/docs/technotes/guides/collections/overview.html

[7]http://java.dzone.com/articles/why-java-doesnt-need-operator

[8]http://www.ruby-doc.org/core/