CSC/ECE 517 Fall 2011/ch1 1b tj
Collections Framework : Cover the history of Collections frameworks back to the late 1970s or early 1980s, and how they were retrofitted to various languages. Discuss the advantage of a standard Collections framework (e.g., Java) vs. several competing frameworks (e.g., C++), and the advantage of collections built into the language (e.g., Ruby) vs. collections as class libraries. Give examples to illustrate the advantages you identify.
Introduction
In this wiki, we are going to talk about collections frameworks : their history, the advantages of standard collections frameworks (like Java), compared to that of C++, the advantages of collections built into languages like Ruby compared to those having collections as class libraries. We will illustrate the advantages and compare the differences with some examples.
What is a collections framework?
Firstly, what is a collection? A collection is a group of objects, and is essentially a container that groups multiple elements into a single unit. Since a collection is just a group of objects, some examples of a collection are : a stack of books, a dictionary, a list of names, a telephone directory. A collections framework is a system to represent and manipulate these collections.
Need for a collections framework
There are many needs for a collections framework. Some of them are :
- It solves the problem of storing and retrieving data
- It makes sure we write lesser code, and the code is more readable
- It improves the performance, and makes sure the programmer doesn't have to bother with this
- Makes sure we have reusable data structures
These reasons led to the development of many collections frameworks.
History
In the 70's and 80's, a number of languages had collections built into them, unlike it is now in C++ or Java where libraries have to be included for performing tasks on certain collections. Some of these languages are Smalltalk, Perl, Scheme, Miranda, and Haskell. Just to give a brief idea, below we have given a short note on each one of these languages.
Smalltalk
Smalltalk was object-oriented and was one of them that had collections. Collections in Smalltalk could be as indexed or non-indexed, indexed by explicit or implicit keys, ordered using an index or unordered, and fixed size or expandable. Smalltalk has Arrays, and the Array class is a subclass of ArrayedCollection. An array can hold any type of object, and each entry in the array can be a different object. The code snippet below creates an array object in Smalltalk :
a _ Array new: 5
And to place an object at a specific location in the array :
a at: 3 put: object
a at: 2 returns the object at position 2. Very similar to arrays, OrderedCollection is also used. The difference between them both is that all objects of Array type are fixed size, while those of OrderedCollection can be of varying size. Collections that hold key value pairs were done using Dictionaries.
| dict | dict := Dictionary new. dict at: 'Apple' put: 'Fruit'. dict at: 'Onion' put: 'Vegetable'. dict at: 'Pork' put: 'Meat'.
Then, in order to get the value for key 'Onion' we simply say :
dict at 'Onion'
which gives us 'Vegetable'.
Perl
Perl is a very powerful scripting language, and has collections as arrays and associative arrays. Arrays in Perl are defined with an @ sign.
@mylist = ("Tom", 2);
$mylist[0] has the value "Tom", and $mylist[1] has the value 2. In Perl, associative arrays can be used for hashing, and are defined with a % sign.
%months = ( "January" , 1, "April" , 4, "November", 11 );
In the example above, $months("April") would return the value 4.
Scheme
Miranda
Haskell
Collection Frameworks today
Alexander Stepanov was the biggest contributor to the designing of Standard Template Library. In late 1970's he started working on Generic Programming. At that time there was not much support for Generic Programming in any Computer programming languages. A language called Ada was able to provide some amount of support. By late 1980's Alexander Stepanov and David Musser had worked on the Ada library and later developed and also published it. By that time, programmers were using C++ and it was more widely used and thought to provide a good support for generic programming even though it was still considered a new language. Stephanov realized that the model of computation in C and C++ had a lot of flexibility and turned to C++. After a lot of research and experimentation at AT&T and HP laboratories, Stepanov presented his ideas at ANSI/ISO meeting. With some more experimentation and collaboration, in 1994, the STL (Standard Template Library) became a part of the ANSI/ISO standard definition of the C++ language.
As mentioned before, to address the need for reusable collection data structures, several independent frameworks were developed. Doug Lea's collections package (released in 1995) was the first widely used collection. In 1996, a company called Object Space developed and released the Java Generic Library ,JGL, with the main goal of being consistent with C++ Standard Template Library.
The Java collections framework was first introduced in 1998, with Java 2 platform, Standard edition. version 1.2. Before collections framework was introduced, grouping of objects in Java was through the use of arrays, Vector and HashTable.
The C++ Standard Template Library (STL)
The Standard Template Library, or STL, is a C++ library of container classes, algorithms, and iterators. The STL provides many of the basic algorithms and data structures of computer science. It can be called a Generic library because most of the components in the STL are templates. STL includes Container classes, that is, classes that contain other objects. The STL includes the classes vector, list, set, map, hash set. Each of these classes is a template, and can be instantiated to contain any type of object. The STL also includes a large collection of Algorithms that manipulate the data stored in containers. Iterators are the mechanism that makes it possible to decouple algorithms from containers: algorithms are templates, and are parameterized by the type of iterator, so they are not restricted to a single type of container.
The Java Collection Framework
The diagram (Source : [2]) below shows the interface hierarchy in the collections framework in Java.
The interfaces map and collection are both distinct in the hierarchy in Java. This distinction is because of the way maps and collection are used in Java libraries.
The collections framework in Java consists of:
- Collection Interfaces for representing different types of collections, like sets, lists and maps.
- General-purpose Implementations which are the most commonly used implementations, designed for day to day use.
- Legacy Implementations - The collection classes from earlier releases, Vector and Hashtable, have been retrofitted to implement the collection interfaces.
- Special-purpose Implementations - Implementations designed for use in special situations. These implementations display nonstandard performance characteristics, usage restrictions, or behavior.
- Concurrent Implementations - Implementations designed for highly concurrent use.
- Wrapper Implementations - Add functionality, such as synchronization, to other implementations.
- Convenience Implementations - High-performance "mini-implementations" of the collection interfaces.
- Abstract Implementations - Partial implementations of the collection interfaces to facilitate custom implementations.
- Algorithms - Static methods that perform useful functions on collections, such as sorting a list.
- Infrastructure - Interfaces that provide essential support for the collection interfaces.
- Array Utilities - Utility functions for arrays of primitives and reference objects. Not, strictly speaking, a part of the Collections Framework, this functionality was added to the Java platform at the same time and relies on some of the same infrastructure.
The C++ STL vs Java Collection Framework
Prior to Java 2, arrays, vectors and HashTable had different syntaxes for accessing members. For example, the arrays uses square bracket symbol [ ], the HashTable uses 'get and put' method and as for Vectors, it is the elementAt method. This caused a lot of inconsistency meaning programmers were using different methods to implement their own collections. Also, prior to Java 2, the classes lacked a central and unifying theme. There are many advantages of Java Collections framework compared to frameworks such as those in C++( standard template library).
The STL (Standard Template Library) is organized around three fundamental components namely containers - to hold objects, algorithms ( perform manipulation of elements in containers) and iterators. In addition, supporting them there are three additional components: allocators, adaptors and function objects.
- Containers manage collections of objects.
There are a variety of containers, and each offers unique advantages. Standard containers include vector, list, deque, set, map, and so on.
- Algorithms process elements in containers, or more specifically, in ranges. The STL offers a wide variety of algorithms to search, sort, and operate on elements in containers.
- Iterators are generalized pointer-like objects that are used to step through the elements of a container.
Iterators can work on an entire collection of objects or on a subset of objects.
The primary difference between STL and JCF is that JCF has a singular focus - It focuses mainly on containers rather than on combination of containers and algorithms ( as STL does).
The Quality of Implementation of the C++ compiler has a large impact on usability of STL. First, the error messages involving templates are very long and hard to interpret. Then, if used without proper care, STL templates can lead to code bloat. The Java Collections Framework make programming easy by providing many useful data structures and algorithms so that the programmer does not have to write them. The performance of a program is increased considerably because of the implementation of these data structures and algorithms because the various implementations of each interface can be interchanged. Collections have much better performance compared to Vectors and HashTable in the old version. The entire standard collections framework is designed around a set of standard interfaces and several standard implementations such a LinkedList, HashSet and TreeSet are provided in these interfaces so the programmers can use them or also create their own.
Collections in Ruby
In Ruby, we have containers which are entities that contain other entities. The two native containers in Ruby are :
- Arrays, which can hold a collection of object references
- Hashes, which are an indexed collection of object references
We can also create our own list structure in Ruby. The example below illustrates the syntax of how Arrays can hold a collection of object references. Initialization is not a issue in Ruby as its dynamically typed.
arr = ["Onion", 1.414, "Tomato"] a = ["Rob", arr, 9]
As shown above, arr is an array of different objects, and so is a. The section below gives a clearer view of this, with the outputs after each step.
arr = ["Onion", 1.414, "Tomato"] =>["Onion", 1.414, "Tomato"] arr.type => Array arr.length => 3 arr[1] => 1.414 a = ["Rob", arr, 9] => ["Rob", ["Onion", 1.414, "Tomato"], 9]
Hashing can also be done very easily in Ruby, without the need for any reference libraries.
food = {'Apple' => 'Fruit', 'Onion' => 'Vegetable', 'Pork' => 'Meat'}
So food['Onion'] gives us the value corresponding to key 'Onion', which is 'Vegetable'.
Ruby vs Java's Collection Framework
In Java when one has a nested methods on collections, the code is easier to read than in Ruby when one has to nested operators on collections within a single line. The nested operator code becomes very difficult to read.
When comparing the runtime of Java and Ruby, Java is way faster than the latter. But since Ruby, a dynamically typed language, has built-in lists/ arrays , hashes, there is no need for the programmer to have to build all the infrastructure. This is a win situation for Ruby when compared to Java's collections. Thus, the built-in arrays and hashes are a huge advantage over JCF. Ruby has a mark-and-sweep garbage collection which allows programmers the ability to code without having to worry about the need to maintain reference counts in extension libraries. Also, if an operating system allows for it, Ruby can dynamically load extension libraries. (Reference : Ruby-lang.org)
References
[1] Systems Architecture & Software Group, Smalltalk Collections : http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk.html
[2] JavaTM 2 Platform, Standard Edition, v 1.4.2: http://oracle.com/javase/1.4.2/docs/guide/collections/overview.html
[3] Programming Ruby - The Pragmatic Programmer's Guide : http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
[4] MSDN Magazine: http://msdn.microsoft.com/en-us/magazine/cc301955.aspx
[5] Associative Arrays in Perl : http://www.comp.leeds.ac.uk/Perl/associative.html
[6] Simple Collections in Smalltalk : http://mainline.brynmawr.edu/Courses/cs246/spring2002/SmallTalk/Collections1.html
[7] Smalltalk Collections : http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk.html#1.%20Intro
[8] Ten Reasons for a Collections Framework : http://sibirjak.com/osflash/blog/why-we-need-a-collection-framework-in-actionscript/
[9] dmh2000 : http://www.dmh2000.com/cjpr/index.shtml
[10] SGI : http://www.sgi.com/tech/stl/stl_introduction.html