CSC/ECE 517 Fall 2011/ch1 1b ds: Difference between revisions
Line 126: | Line 126: | ||
</pre> | </pre> | ||
In above example instead of getting a result of 8 due to the + operator being overloaded in a wrong way we instead get a result of 2 | |||
===Performance=== | ===Performance=== |
Revision as of 15:52, 8 September 2011
Cover the history of Collections frameworks back to the late 1970s or early 1980s, and how they were retrofitted to various languages. Discuss the advantage of a standard Collections framework (e.g., Java) vs. several competing frameworks (e.g., C++), and the advantage of collections built into the language (e.g., Ruby) vs. collections as class libraries. Give examples to illustrate the advantages you identify.
Authors
- Dilip Devaraj
- Srinath Sridhar
Introduction
A collection refers to a group of data items which have some commonality like a "a bunch of objects stored in a structured manner". Collections are very common in daily life, some examples would be a collection of science fiction books, the names of employees in a given department etc. Given that object oriented design concepts try to model the real world, it is natural to include support for creating and manipulating collections in languages which support object oriented development. Collections allows you to organize data in memory in various convenient ways. Example : a simple list of strings, a mapping or association of strings to strings (e.g. country names to the names of their capitals);
Almost all major programming and scripting languages that are used today such as Perl, Python], JavaScript, PHP, Ruby, C++ and Java have support for collections in some form or the other. Some languages have liner arrays (which can have different types of elements) and associative arrays built into the language whereas some other languages provide support through external libraries.
History
Java Collections Framework (JCF)
Java Collections Framework is a library of classes that implement collections. The collections framework also provides utility methods to perform functions such as sorting a list of data.The collections in Java are mainly the following List, Set, Map and Queue. Java provides these collections as interfaces from which one can write their own classes or can use the inbuilt Java classes that implement these interfaces.
List
List is a collection of objects that has a fixed order (when we add an object to the list, it will stay in the place we put it), and allows objects to be referred to by position.Duplicate elements are allowed in a list. The general-purpose List implementations are ArrayList and LinkedList.
Set
A Set is a Collection that cannot contain duplicate elements. It models the mathematical set abstraction.The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet.
Map
A Map is an object that maps keys to values. A map cannot contain duplicate keys: Each key can map to at most one value. Maps are defined by the java.util.Map interface in Java. The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. The java.util.Map interface is extended by its subinterface, java.util.SortedMap. This interface defines a map that's sorted by the keys provided.
Queue
A Queue is a collection for holding elements prior to processing. Besides basic Collection operations, queues provide additional insertion, removal, and inspection operations. LinkedList implements the Queue interface, providing first in, first out (FIFO) queue operations.
C++ Standard Template Library(STL)
C++ standard template library is a software library that provides implementations of some basic containers for storing collections of related objects as well as algorithms to perform operations on the elements in these containers. All the classes in C++ STL are generic and can be used with any user defined or primitive data types. In general the C++ STL has four major components.
Containers
These are classes which are used to store other objects. Taking an example suppose there is a “jar of jellybeans”, the jar can be thought of as the container while the jelly beans are the objects that are stored in the container. Most collections automatically resize depending on the number of elements in them.The C++ STL provides a number of such classes which can be classified into two categories
Sequence Containers
These containers impose a strict linear order on the elements. Sequence containers support insertions and deletions (either at the ends or at a random position) using an index. Some examples of sequence containers are vector, list, dequeue etc.
Associative containers
These containers store key value pairs and each object stored in an associative container is mapped to a specific key. In other words associative containers perform insertions and deletions based on a key. Some examples of associative containers are set, map, hash map, multiset etc.
Algorithms
The C++ STL contains a large collection of algorithms that can be used to manipulate the data contained in the container. Continuing from the earlier “jar of jelly beans” example, suppose you wanted to count the number of red colored jelly beans. Or you wanted to change the color of all the jelly beans to red the algorithms in STL allow you to accomplish this.
An important thing to note is that the algorithms in STL are decoupled from the container. This means that there is only one definition of the function which implements the algorithm. This works for any valid container containing any any valid set of objects. Analogous to our example the function to count the number of red jelly beans also works for a “bucket containing colored pebbles”.
The algorithms contained in STL can be classified into two categories
Non-Mutating
Those which do not change the collection in any way. Examples include count, find, for each etc.
Mutating
Those which modify the contents of the collection in some way. Examples include replace, copy, reverse etc.
Iterators
Iterators are an important part of the C++ STL. Iterators facilitate traversal through a container’s elements. The separation of algorithms from containers is achieved through the use of iterators. Iterators can be thought of as a generalization of pointers. All the algorithms make use of iterators as function arguments. There are 5 types of iterators in STL. These differ in the way they traverse through the elements in the container.
Functor
A functor or functional object is an object that can be called as a function. This is in essence similar to function pointers in C. A functor is any class that overloads the operator() function. Such a class is most commonly used to define custom comparison functions to compare two objects. This is similar to “comparator” in JAVA.
JCF versus STL
Quality of implementation
Java is standardised and have formalized methods such as Java community process(JCP) and Java Specification request (JSR) to define revisions of java platforms. C++ only gives a defining standard and one can have different implementations of that standard. Hence different implementations can be incompatible.
Portability
This is a direct consequence of the fact that standards are available in Java and not in C++. The following is a list of various STL libraries . These may have some common templates which might not necessarily be interoperable. In contrast the implementations of JCF are standardized and they will be portable as long as versions remain the same.
Better optimization of containers
In C++ STL the algorithms (i.e functions which manipulate the data) are completely decoupled from the containers it is not possible to optimize containers for specific purposes. In JCF since the algorithms are defined as interfaces it is easy to provide specific implementations based on requirements. Because the various implementations of each interface are interchangeable, programs can be easily tuned by switching implementations. The only way to optimize in STL is to create new containers which are tied together with the algorithms.
Fosters software reuse
By providing a standard interface for collections and algorithms to manipulate them
More number of methods in STL
STL has more number of methods than Java for collections. This is because STL has separate methods for containers and algortihms used for containers like Non-mutating algorithms, Mutating algorithms, Sorting
Collections in Java are more extensible than in STL
STL collections are difficult to extend as the base classes are not declared as virtual. In Java since the basic collections are interfaces, it is easy to implement the interfaces in a class, and extend from that.
Manipulation of container objects using Iterators
C++ STL uses iterators to iterate through the items. Iterators themselves are generalizations of pointers. This leads to problems when one has to manipulate the items in the container, especially if the items have to be removed. It is tricky to iterate through the container and delete objects within the same loop. This is not the casein java since memory cannot be directly manipulated by the programmer.
Pass by reference Vs Pass by value
Containers in STL is accessed using copy semantics, thereby providing both efficiency and type safety, while in Java collections are accessed via references (or pointers). In Java if an one tries to access an object of a container that was just deleted , since we are using a reference this would result in error.
Collections in languages versus library
We are comparing collections inbuilt into the Ruby language versus Java’s collection libraries.
Operators vs Methods
In languages with collections built into them many operations may be performed using operators instead of calling functions. Consider an associative array, called hash in ruby and map in java.
Assoc_array[ {“hello”, “world”} {“good”,”day”} ]
In a language with collections implemented as a class library one would use a function like “get(key)” to retrieve the value associated with a key. When there is language support one can use Assoc_array[key] to retrieve the value of the key.
Now suppose we would like to extend the get function to do something different, in a language like java we can simply write another function or create a different implementation of the associative array which overrides the get function defined in the interface. Now in case of a language where the get is implemented using an operator we should either overload the default behavior of the operator or we have to define a custom operator to do this. In general it is much easier to overload a function than to overload an operator, Further not all languages allow operator overloading.
Import
In Ruby since collections are part of the language , therefore a developer need not include library for common functionalities while writing a program. In case of collections which are present in libraries like Java, one needs to import the libraries before executing the code. This can become a hassle for a Java developer if one does not know the name of the library to import for using a collection.
Operator Overloading
In Java we do not have operator overloading, but instead use methods to do the equivalent operations. Ruby has overloading where one can override , say a + operator. By not having operator overloading Java can have clearer code.This is because when one uses a + overloaded operator, we can get an unexpected behavior if internally it is coded to something else, say a subtraction. The number of operators that you can overload is very small and most of them is relevant to scalar operations. Additionally, the detail of the operation performed by operators may not be as meaningful in comparison to a good method name.
class Test def initialize x @x = x end def +(y) @x - y end end a = Test.new 5 puts(a + 3)
In above example instead of getting a result of 8 due to the + operator being overloaded in a wrong way we instead get a result of 2
Performance
Collections in Ruby is slower than in Java as Ruby is a dynamically typed language. , therefore in Ruby type checking happens at run-time which would slow down collection performance.
Verbosity
In Ruby since one uses operator for hash-tables, the code is less verbose and therefore easier to type h[lastname] = firstname In Java one has to type a method name which increases the verbosity Eg) hashmap.put(lastname)= firstname
Readability
In Java when one has a nested methods on collections, the code is easier to read than in Ruby when one has to nested operators on collections within a single line. The nested operator code becomes very difficult to read.
References
[1] http://download.oracle.com/javase/tutorial/collections/intro/index.html
[2]http://www.javamex.com/tutorials/collections/index.shtml
[3]http://en.wikipedia.org/wiki/Java_collections_framework
[4]http://www.sgi.com/tech/stl/index.html
[5]http://en.wikipedia.org/wiki/Standard_Template_Library
[6]http://download.oracle.com/javase/6/docs/technotes/guides/collections/overview.html
[7]http://java.dzone.com/articles/why-java-doesnt-need-operator