CSC/ECE 517 Fall 2011/ch1 1b ds: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
Line 8: Line 8:


==Introduction==
==Introduction==
A collection refers to a group of data items which have some commonality. Collections are very common in daily life, some examples would be a collection of science fiction books, the names of employees in a given department etc. Given that object oriented design concepts try to model the real world, it is natural to include support for creating and manipulating collections in languages which support object oriented development.  
A collection refers to a group of data items which have some commonality like a "a bunch of objects stored in a structured manner". Collections are very common in daily life, some examples would be a collection of science fiction books, the names of employees in a given department etc. Given that object oriented design concepts try to model the real world, it is natural to include support for creating and manipulating collections in languages which support object oriented development. Collections allows you to organize data in memory in various convenient ways.
Example
    * a simple list of strings;
    * a mapping or association of strings to strings (e.g. country names to the names of their capitals);


Almost all major programming and scripting languages that are used today such as [http://en.wikipedia.org/wiki/Perl Perl], [http://en.wikipedia.org/wiki/Python_%28programming_language%29 Python]], [http://en.wikipedia.org/wiki/JavaScript JavaScript], [http://en.wikipedia.org/wiki/PHP PHP], [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29 Ruby], [http://en.wikipedia.org/wiki/C%2B%2B C++] and [http://en.wikipedia.org/wiki/Java_%28programming_language%29 Java] have support for collections in some form or the other. Some languages have liner arrays (which can have different types of elements) and associative arrays built into the language whereas some other languages provide support through external libraries.
Almost all major programming and scripting languages that are used today such as [http://en.wikipedia.org/wiki/Perl Perl], [http://en.wikipedia.org/wiki/Python_%28programming_language%29 Python]], [http://en.wikipedia.org/wiki/JavaScript JavaScript], [http://en.wikipedia.org/wiki/PHP PHP], [http://en.wikipedia.org/wiki/Ruby_%28programming_language%29 Ruby], [http://en.wikipedia.org/wiki/C%2B%2B C++] and [http://en.wikipedia.org/wiki/Java_%28programming_language%29 Java] have support for collections in some form or the other. Some languages have liner arrays (which can have different types of elements) and associative arrays built into the language whereas some other languages provide support through external libraries.

Revision as of 15:39, 8 September 2011

Cover the history of Collections frameworks back to the late 1970s or early 1980s, and how they were retrofitted to various languages. Discuss the advantage of a standard Collections framework (e.g., Java) vs. several competing frameworks (e.g., C++), and the advantage of collections built into the language (e.g., Ruby) vs. collections as class libraries. Give examples to illustrate the advantages you identify.

Authors

  • Dilip Devaraj
  • Srinath Sridhar

Introduction

A collection refers to a group of data items which have some commonality like a "a bunch of objects stored in a structured manner". Collections are very common in daily life, some examples would be a collection of science fiction books, the names of employees in a given department etc. Given that object oriented design concepts try to model the real world, it is natural to include support for creating and manipulating collections in languages which support object oriented development. Collections allows you to organize data in memory in various convenient ways. Example

   * a simple list of strings;
   * a mapping or association of strings to strings (e.g. country names to the names of their capitals);

Almost all major programming and scripting languages that are used today such as Perl, Python], JavaScript, PHP, Ruby, C++ and Java have support for collections in some form or the other. Some languages have liner arrays (which can have different types of elements) and associative arrays built into the language whereas some other languages provide support through external libraries.

History

Java Collections Framework (JCF)


Java Collections Framework is a library of classes that implement collections. The collections framework also provides utility methods to perform functions such as sorting a list of data. If you need to store instances of your own class in some types of collection then we need to employ hashing by writing our own hashCode and equals method.

The collections in Java are mainly the following List, Set, Map and Queue. Java provides these collections as interfaces from which one can write their own classes or can use the inbuilt Java classes that implement these interfaces.

List

List is a collection of objects that has a fixed order (when we add an object to the list, it will stay in the place we put it), and allows objects to be referred to by position.Duplicate elements are allowed in a list. The general-purpose List implementations are ArrayList and LinkedList.

Set

A Set is a Collection that cannot contain duplicate elements. It models the mathematical set abstraction.The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet.

Map

A Map is an object that maps keys to values. A map cannot contain duplicate keys: Each key can map to at most one value. Maps are defined by the java.util.Map interface in Java. The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. The java.util.Map interface is extended by its subinterface, java.util.SortedMap. This interface defines a map that's sorted by the keys provided.

Queue

A Queue is a collection for holding elements prior to processing. Besides basic Collection operations, queues provide additional insertion, removal, and inspection operations. LinkedList implements the Queue interface, providing first in, first out (FIFO) queue operations.

C++ Standard Template Library(STL)

C++ standard template library is a software library that provides implementations of some basic containers for storing collections of related objects as well as algorithms to perform operations on the elements in these containers. All the classes in C++ STL are generic and can be used with any user defined or primitive data types. In general the C++ STL has four major components.

Containers

These are classes which are used to store other objects. Taking an example suppose there is a “jar of jellybeans”, the jar can be thought of as the container while the jelly beans are the objects that are stored in the container. Most collections automatically resize depending on the number of elements in them.The C++ STL provides a number of such classes which can be classified into two categories

Sequence Containers

These containers impose a strict linear order on the elements. Sequence containers support insertions and deletions (either at the ends or at a random position) using an index. Some examples of sequence containers are vector, list, dequeue etc.

Associative containers

These containers store key value pairs and each object stored in an associative container is mapped to a specific key. In other words associative containers perform insertions and deletions based on a key. Some examples of associative containers are set, map, hash map, multiset etc.

Algorithms

The C++ STL contains a large collection of algorithms that can be used to manipulate the data contained in the container. Continuing from the earlier “jar of jelly beans” example, suppose you wanted to count the number of red colored jelly beans. Or you wanted to change the color of all the jelly beans to red the algorithms in STL allow you to accomplish this.

An important thing to note is that the algorithms in STL are decoupled from the container. This means that there is only one definition of the function which implements the algorithm. This works for any valid container containing any any valid set of objects. Analogous to our example the function to count the number of red jelly beans also works for a “bucket containing colored pebbles”.

The algorithms contained in STL can be classified into two categories

Non-Mutating

Those which do not change the collection in any way. Examples include count, find, for each etc.

Mutating

Those which modify the contents of the collection in some way. Examples include replace, copy, reverse etc.

Iterators

Iterators are an important part of the C++ STL. Iterators facilitate traversal through a container’s elements. The separation of algorithms from containers is achieved through the use of iterators. Iterators can be thought of as a generalization of pointers. All the algorithms make use of iterators as function arguments. There are 5 types of iterators in STL. These differ in the way they traverse through the elements in the container.

Functor

A functor or functional object is an object that can be called as a function. This is in essence similar to function pointers in C. A functor is any class that overloads the operator() function. Such a class is most commonly used to define custom comparison functions to compare two objects. This is similar to “comparator” in JAVA.

JCF vs STL

Quality of implementation

Java is standardised and have formalized methods such as Java community process(JCP) and Java Specification request (JSR) to define revisions of java platforms. C++ only gives a defining standard and one can have different implementations of that standard. Hence different implementations can be incompatible.

Portability

This is a direct consequence of the fact that standards are available in Java and not in C++. The following is a list of various STL libraries . These may have some common templates which might not necessarily be interoperable. In contrast the implementations of JCF are standardized and they will be portable as long as versions remain the same.

Better optimization of containers

In C++ STL the algorithms (i.e functions which manipulate the data) are completely decoupled from the containers it is not possible to optimize containers for specific purposes. In JCF since the algorithms are defined as interfaces it is easy to provide specific implementations based on requirements. Because the various implementations of each interface are interchangeable, programs can be easily tuned by switching implementations. The only way to optimize in STL is to create new containers which are tied together with the algorithms.

Fosters software reuse

By providing a standard interface for collections and algorithms to manipulate them

More number of methods in STL

STL has more number of methods than Java for collections. This is because STL has separate methods for containers and algortihms used for containers like Non-mutating algorithms, Mutating algorithms, Sorting

Collections in Java are more extensible than in STL

STL collections are difficult to extend as the base classes are not declared as virtual. In Java since the basic collections are interfaces, it is easy to implement the interfaces in a class, and extend from that.

Manipulation of container objects using Iterators

C++ STL uses iterators to iterate through the items. Iterators themselves are generalizations of pointers. This leads to problems when one has to manipulate the items in the container, especially if the items have to be removed. It is tricky to iterate through the container and delete objects within the same loop. This is not the casein java since memory cannot be directly manipulated by the programmer.

Pass by reference Vs Pass by value

Containers in STL is accessed using copy semantics, thereby providing both efficiency and type safety, while in Java collections are accessed via references (or pointers). In Java if an one tries to access an object of a container that was just deleted , since we are using a reference this would result in error.

Collections in languages vs library

We are comparing collections inbuilt into the Ruby language versus Java’s collection libraries.

Operators vs Methods

In languages with collections built into them many operations may be performed using operators instead of calling functions. Consider an associative array, called hash in ruby and map in java.

 
Assoc_array[
	{“hello”, “world”}
	{“good”,”day”}
]

In a language with collections implemented as a class library one would use a function like “get(key)” to retrieve the value associated with a key. When there is language support one can use Assoc_array[key] to retrieve the value of the key.

Now suppose we would like to extend the get function to do something different, in a language like java we can simply write another function or create a different implementation of the associative array which overrides the get function defined in the interface. Now in case of a language where the get is implemented using an operator we should either overload the default behavior of the operator or we have to define a custom operator to do this. In general it is much easier to overload a function than to overload an operator, Further not all languages allow operator overloading.

Import

In Ruby since collections are part of the language , therefore a developer need not include library for common functionalities while writing a program. In case of collections which are present in libraries like Java, one needs to import the libraries before executing the code. This can become a hassle for a Java developer if one does not know the name of the library to import for using a collection.

Operator Overloading

In Java we do not have operator overloading, but instead use methods to do the equivalent operations. Ruby has overloading where one can override , say a + operator. By not having operator overloading Java can have clearer code.This is because when one uses a + overloaded operator, we can get an unexpected behavior if internally it is coded to something else, say a subtraction. The number of operators that you can overload is very small and most of them is relevant to scalar operations. Additionally, the detail of the operation performed by operators may not be as meaningful in comparison to a good method name.

class Test
 def initialize x
   @x = x
 end

 def +(y)
   @x - y
 end
end
a = Test.new 5
puts(a + 3)

Performance

Collections in Ruby is slower than in Java as Ruby is a dynamically typed language. , therefore in Ruby type checking happens at run-time which would slow down collection performance.

Verbosity

In Ruby since one uses operator for hash-tables, the code is less verbose and therefore easier to type h[lastname] = firstname In Java one has to type a method name which increases the verbosity Eg) hashmap.put(lastname)= firstname

Readability

In Java when one has a nested methods on collections, the code is easier to read than in Ruby when one has to nested operators on collections within a single line. The nested operator code becomes very difficult to read.

References

[1] http://download.oracle.com/javase/tutorial/collections/intro/index.html

[2]http://www.javamex.com/tutorials/collections/index.shtml

[3]http://en.wikipedia.org/wiki/Java_collections_framework

[4]http://www.sgi.com/tech/stl/index.html

[5]http://en.wikipedia.org/wiki/Standard_Template_Library

[6]http://download.oracle.com/javase/6/docs/technotes/guides/collections/overview.html

[7]http://java.dzone.com/articles/why-java-doesnt-need-operator