CSC/ECE 517 Fall 2011/ch1 1b tj: Difference between revisions
(92 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
--------------------- | --------------------- | ||
= Introduction = | = Introduction = | ||
In this wiki, we are going to talk about collections frameworks : their history, the advantages of standard collections frameworks (like Java), compared to that of C++, the advantages of collections built into languages like Ruby compared to those having collections as class libraries. We will illustrate the advantages and compare the differences with some examples. | In this wiki, we are going to talk about collections frameworks : their history, the advantages of standard collections frameworks (like Java and .NET framework), compared to that of C++, the advantages of collections built into languages like Ruby compared to those having collections as class libraries. We will illustrate the advantages and compare the differences with some examples. | ||
== What is a collections framework? == | == What is a collections framework? == | ||
Firstly, | Firstly, let us define a '''''collection''''' , and a '''''collection's framework'''''. | ||
Since a collection is just a group of objects, some examples of a collection are : a stack of books, a dictionary, a list of names, a telephone directory. | |||
A collections framework is a system to represent and manipulate these collections. | A '''''collection''''' can be defined as a group of objects, and is essentially a container that groups multiple elements into a single unit. | ||
In computer science, a '''''collection''''' is grouping of data items that need to be operated together, and are similar in some sense. | |||
In computer science, usually the data items that comprise a collection are all of similar type, or are all inherited from a common ancestor. | |||
Since a [http://en.wikipedia.org/wiki/Collection_%28computing%29 collection] is just a group of objects, some examples of a collection are : a stack of books, a dictionary, a list of names, a telephone directory. In computer science, examples of collections are : lists, sets, bags, trees, graphs, associative arrays, etc. | |||
A '''''collections framework''''' is a system to represent and manipulate these collections. | |||
A collections framework would thus comprise of various forms of collections (essentially data structures, like mentioned in the computer science definition of a collection) present together that are used by the programmer. A collections framework would thus consist of ways to implement collections like sets, trees, associative arrays, lists, etc., easily. | |||
== Need for a collections framework == | == Need for a collections framework == | ||
There are many needs for a collections framework. Some of them are : | There are many needs [http://sibirjak.com/osflash/blog/why-we-need-a-collection-framework-in-actionscript/] for a collections framework. Some of them are : | ||
* It solves the problem of storing and retrieving data | * It solves the problem of storing and retrieving data | ||
* It makes sure we write lesser code, and the code is more readable | * It makes sure we write lesser code, and the code is more readable | ||
Line 18: | Line 28: | ||
= History = | = History = | ||
In the 70's and 80's, a number of languages had collections built into them, unlike how it is now in say C++ where libraries have to be included for performing tasks on certain collections. Some of these languages are Smalltalk, Perl, Scheme, Miranda, and Haskell. Just to give a brief idea, below we have given a short note on Smalltalk and Perl. | In the 70's and 80's, a number of languages had collections built into them, unlike how it is now in say C++ where libraries have to be included for performing tasks on certain collections. Some of these languages are [http://en.wikipedia.org/wiki/Smalltalk Smalltalk], [http://en.wikipedia.org/wiki/Perl Perl], [http://en.wikipedia.org/wiki/Scheme_%28programming_language%29 Scheme], [http://en.wikipedia.org/wiki/Miranda_%28programming_language%29 Miranda], and [http://en.wikipedia.org/wiki/Haskell_%28programming_language%29 Haskell]. Just to give a brief idea, and because Ruby is loosely based on Smalltalk and Perl, below we have given a short note on Smalltalk and Perl. | ||
== Smalltalk == | == Smalltalk == | ||
Smalltalk was object-oriented and was one of them that had collections. Collections in Smalltalk could be as indexed or non-indexed, indexed by explicit or implicit keys, ordered using an index or unordered, and fixed size or expandable. | Smalltalk was object-oriented and was one of them that had collections. Collections in Smalltalk could be as indexed or non-indexed, indexed by explicit or implicit keys, ordered using an index or unordered, and fixed size or expandable. [http://mainline.brynmawr.edu/Courses/cs246/spring2002/SmallTalk/Collections1.html] | ||
Smalltalk has Arrays, and the Array class is a subclass of ArrayedCollection. An array can hold any type of object, and each entry in the array can be a different object. The code snippet below creates an array object in Smalltalk : | Smalltalk has Arrays, and the Array class is a subclass of ArrayedCollection. An array can hold any type of object, and each entry in the array can be a different object. The code snippet below creates an array object in Smalltalk : | ||
<pre> | <pre> | ||
Line 50: | Line 60: | ||
</pre> | </pre> | ||
$mylist[0] has the value "Tom", and $mylist[1] has the value 2. | $mylist[0] has the value "Tom", and $mylist[1] has the value 2. | ||
In Perl, associative arrays can be used for hashing, and are defined with a % sign. | In Perl, associative arrays [http://www.comp.leeds.ac.uk/Perl/associative.html] can be used for hashing, and are defined with a % sign. | ||
<pre> | <pre> | ||
Line 60: | Line 70: | ||
== Collections Frameworks today == | == Collections Frameworks today == | ||
Alexander Stepanov was the biggest contributor to the designing of Standard Template Library. In late 1970's he started working on Generic Programming. At that time there was not much support for Generic Programming in any Computer programming languages. A language called Ada was able to provide some amount of support. By late 1980's Alexander Stepanov and David Musser had worked on the Ada library and later developed and also published it. By that time, programmers were using C++ and it was more widely used and thought to provide a good support for generic programming even though it was still considered a new language. Stephanov realized that the model of computation in C and C++ had a lot of flexibility and turned to C++. After a lot of research and experimentation at AT&T and HP laboratories, Stepanov presented his ideas at ANSI/ISO meeting. With some more experimentation and collaboration, in 1994, the STL (Standard Template Library) became a part of the ANSI/ISO standard definition of the C++ language. | Alexander Stepanov was the biggest contributor to the designing of Standard Template Library [http://en.wikipedia.org/wiki/Standard_Template_Library].In late 1970's he started working on Generic Programming. At that time there was not much support for Generic Programming in any Computer programming languages. A language called Ada was able to provide some amount of support. By late 1980's Alexander Stepanov and David Musser had worked on the Ada library and later developed and also published it. By that time, programmers were using C++ and it was more widely used and thought to provide a good support for generic programming even though it was still considered a new language. Stephanov realized that the model of computation in C and C++ had a lot of flexibility and turned to C++. After a lot of research and experimentation at AT&T and HP laboratories, Stepanov presented his ideas at ANSI/ISO meeting. With some more experimentation and collaboration, in 1994, the STL (Standard Template Library) became a part of the ANSI/ISO standard definition of the C++ language. | ||
As mentioned before, to address the need for reusable collection data structures, several independent frameworks were developed. Doug Lea's collections package (released in 1995) was the first widely used collection. In 1996, a company called Object Space developed and released the Java Generic Library ,JGL, with the main goal of being consistent with C++ Standard Template Library. | As mentioned before, to address the need for reusable collection data structures, several independent frameworks were developed. Doug Lea's collections package (released in 1995) was the first widely used collection. In 1996, a company called Object Space developed and released the Java Generic Library ,JGL, with the main goal of being consistent with C++ Standard Template Library. | ||
Line 66: | Line 76: | ||
The Java collections framework was first introduced in 1998, with Java 2 platform, Standard edition. version 1.2. Before collections framework was introduced, grouping of objects in Java was through the use of arrays, Vector and HashTable. | The Java collections framework was first introduced in 1998, with Java 2 platform, Standard edition. version 1.2. Before collections framework was introduced, grouping of objects in Java was through the use of arrays, Vector and HashTable. | ||
There are many other collections frameworks : the .NET framework has collection classes | There are many other collections frameworks : the .NET framework [http://en.wikipedia.org/wiki/.NET_Framework] has collection classes too. Microsoft started development on the .NET Framework in the late 90s under the name of Next Generation Windows Services (NGWS). In 2000 the first beta versions of .NET 1.0 were released. In this article, we will be comparing the C++ Standard Template Library with Java Collections Framework and Collections in .NET, and Ruby to compare and analyze the advantages and disadvantages of each. However, since frameworks are quite similar, we will be talking in detail about JCF over .NET. The differences and similarities provided, however compare JCF, STL, collections in .NET and in Ruby. | ||
= C++'s Standard Template Library vs Java Collections Framework = | = C++'s Standard Template Library vs Java Collections Framework = | ||
Line 72: | Line 82: | ||
== C++ Standard Template Library (STL) == | == C++ Standard Template Library (STL) == | ||
The C++ Standard Template Library (STL) is a powerful and versatile collection of classes and functions that provides an efficient, lightweight, and extensible framework for application development. STL also offers a sophisticated level of abstraction that promotes the use of generic data structures and algorithms.The Standard Template Library, is a C++ library of container classes, algorithms, and iterators. The STL provides many of the basic algorithms and data structures of computer science. It can be called a Generic library because most of the components in the STL are templates. | The C++ [http://en.wikipedia.org/wiki/Standard_Template_Library Standard Template Library] (STL) is a powerful and versatile collection of classes and functions that provides an efficient, lightweight, and extensible framework for application development [http://msdn.microsoft.com/en-us/magazine/cc301955.aspx]. STL also offers a sophisticated level of abstraction that promotes the use of generic data structures and algorithms.The Standard Template Library, is a C++ library of container classes, algorithms, and iterators. The STL provides many of the basic algorithms and data structures of computer science. It can be called a Generic library because most of the components in the STL are templates. | ||
=== Architecture of STL === | === Architecture of STL === | ||
The diagram | The diagram below gives an idea of the architecture of the Standard Template Library in STL : | ||
[[File: | [[File:archi2.jpg]] | ||
=== Types of Collections === | === Types of Collections === | ||
Line 112: | Line 119: | ||
In C++, a typical vector implementation consists of a pointer referring to a dynamically allocated array, and data members holding the capacity and size of the vector. The size of the vector refers to the number of elements, where as the capacity refers to the size of the internal array. As new elements are inserted, and if the new size of the vector becomes larger than its capacity, reallocation occurs. This causes the vector to allocate a new location for storage followed by moving the previously held elements to a new location of storage, and freeing the old space. The example below illustrates this. | In C++, a typical vector implementation consists of a pointer referring to a dynamically allocated array, and data members holding the capacity and size of the vector. The size of the vector refers to the number of elements, where as the capacity refers to the size of the internal array. As new elements are inserted, and if the new size of the vector becomes larger than its capacity, reallocation occurs. This causes the vector to allocate a new location for storage followed by moving the previously held elements to a new location of storage, and freeing the old space. The example below illustrates this. | ||
<pre> | <pre> | ||
#include <vector> | #include <vector> | ||
Line 118: | Line 124: | ||
{ | { | ||
std::vector<int> a(1); // Creating a Vector. | std::vector<int> a(1); // Creating a Vector. | ||
int& first = *a.begin(); // Making a reference to the first element. | int& first = *a.begin(); // Making a reference to the first element. | ||
a.insert(v.end(), a.capacity(), 0); // Adding more elements to the Vector thereby forcing re-allocation. | |||
a.insert(v.end(), a.capacity(), 0); // Adding more elements to the Vector thereby forcing re-allocation. | |||
} | } | ||
</pre> | </pre> | ||
Now, let us talk about Hashmaps. | Now, let us talk about Hashmaps. | ||
The hash_map class uses an associative array in which the key is stored according to some hashing scheme. | The hash_map [http://www.sgi.com/tech/stl/hash_map.html] class uses an associative array in which the key is stored according to some hashing scheme. | ||
Each one has a key and a corresponding value associated to it. Hashing is very useful when we want the time complexity of searching in ''O(1)'', or a constant time order. | Each one has a key and a corresponding value associated to it. Hashing is very useful when we want the time complexity of searching in ''O(1)'', or a constant time order. | ||
<pre> | <pre> | ||
#include <hash_map> | #include <hash_map> | ||
int main() | |||
{ | |||
hash_map<const char*, int, hash<const char*>, eqstr> fruits; | |||
fruits["Apple"] = 1; | fruits["Apple"] = 1; | ||
fruits["Orange"] = 2; | fruits["Orange"] = 2; | ||
fruits["Grapes"] = 3; | fruits["Grapes"] = 3; | ||
cout << "Apple -> " << fruits["Apple"] << endl; | cout << "Apple -> " << fruits["Apple"] << endl; | ||
} | } | ||
</pre> | </pre> | ||
Line 147: | Line 147: | ||
== Java Collection Framework (JCF) == | == Java Collection Framework (JCF) == | ||
=== Architecture of JCF === | === Architecture of JCF === | ||
The | The image below shows the interface hierarchy in the collections framework in Java. | ||
[[File: | [[File:archi1.jpg]] | ||
''Interface Hierarchy in JCF | ''Interface Hierarchy in JCF'' | ||
The interfaces map and collection are both distinct in the hierarchy in Java. This distinction is because of the way maps and collection are used in Java libraries. | The interfaces map and collection are both distinct in the hierarchy in Java. This distinction is because of the way maps and collection are used in Java libraries. | ||
The collections framework in Java consists of: | The collections framework in Java consists of: [http://download.oracle.com/javase/1.4.2/docs/guide/collections/overview.html] | ||
* '''''Collection Interfaces''''' : These are needed for representing different types of collections, like sets, lists and maps. | * '''''Collection Interfaces''''' : These are needed for representing different types of collections, like sets, lists and maps. | ||
Line 240: | Line 240: | ||
</pre> | </pre> | ||
== Comparison between STL and | == Comparison between STL, JCF and .NET Collections == | ||
Prior to Java 2, arrays, vectors and HashTable had different syntaxes for accessing members. For example, the arrays uses square bracket symbol [ ], the HashTable uses 'get and put' method and as for Vectors, it is the elementAt method. This caused a lot of inconsistency meaning programmers were using different methods to implement their own collections. Also, prior to Java 2, the classes lacked a central and unifying theme | Prior to Java 2, arrays, vectors and HashTable had different syntaxes for accessing members. For example, the arrays uses square bracket symbol [ ], the HashTable uses 'get and put' method and as for Vectors, it is the elementAt method. This caused a lot of inconsistency meaning programmers were using different methods to implement their own collections. Also, prior to Java 2, the classes lacked a central and unifying theme. | ||
The | There are many advantages of Java Collections framework compared to frameworks such as those in C++( standard template library). | ||
The Quality of Implementation of the C++ compiler has a large impact on usability of STL. First, the error messages involving templates are very long and hard to interpret. Then, if used without proper care, STL templates can lead to code bloat. The Java Collections Framework make programming easy by providing many useful data structures and algorithms so that the programmer does not have to write them. The performance of a program is increased considerably because of the implementation of these data structures and algorithms because the various implementations of each interface can be interchanged. Collections have much better performance compared to Vectors and HashTable in the old version. The entire standard collections framework is designed around a set of standard interfaces and several standard implementations such a LinkedList, HashSet and TreeSet are provided in these interfaces so the programmers can use them or also create their own. | |||
The | Collections in the .NET framework do have many advantages compared to those like C++( standard template library). However, the simplicity of programming using .NET framework, and in particular the collections in .NET are hard to implement compared to JCF. | ||
The table below highlights the differences between the C++ Standard Template Library | The architecture of .NET collections is very similar to that of C++(STL), and all collections are inherited from an abstract base class. Error messages in .NET collections are very difficult and hard to interpret. The garbage collector does exist in the .NET framework, however it does have significant performance issues compared to the other two. | ||
.NET doesn't have some of the collections that are present in other frameworks; for instance Multiset and Multimap are both present in JCF, but not in the .NET collections framework. .NET collections do not use iterators like in C++(STL) or JCF. | |||
The table below highlights the differences [http://www.dmh2000.com/cjpr] between the C++ Standard Template Library, the Java Collection Framework, and Collections in the .NET framework | |||
{| class="wikitable" style="font-size: 100%; text-align: left; width: auto;" | {| class="wikitable" style="font-size: 100%; text-align: left; width: auto;" | ||
|- | |- | ||
Line 252: | Line 255: | ||
! Standard Template Library | ! Standard Template Library | ||
! Java Collections Framework | ! Java Collections Framework | ||
! .NET Collections Framework | |||
|- | |- | ||
| '''Simplicity of Programming''' | | '''Simplicity of Programming''' | ||
| Harder | | Harder | ||
| Relatively Easy | | Relatively Easy | ||
| Hard | |||
|- | |- | ||
| '''Architecture of Collections''' | | '''Architecture of Collections''' | ||
| All are inherited from the base class | | All are inherited from the base class | ||
| Maps and Collections are two separate interfaces | | Maps and Collections are two separate interfaces | ||
| All collections are inherited from the same abstract base class | |||
|- | |- | ||
| '''Error Messages''' | | '''Error Messages''' | ||
| Long, Hard to interpret | | Long, Hard to interpret | ||
| Easier to interpret | | Easier to interpret | ||
| Not intuitive | |||
|- | |- | ||
| '''Memory : Resource Management''' | | '''Memory : Resource Management''' | ||
| Has to be explicitly done by programmer | | Has to be explicitly done by programmer | ||
| Taken care of by Garbage Collector | | Taken care of by Garbage Collector | ||
| There is garbage collection; some performance issues exist | |||
|- | |- | ||
| '''Level of Abstraction''' | | '''Level of Abstraction''' | ||
| Less | | Less | ||
| Higher level of abstraction | | Higher level of abstraction | ||
| High level of Abstraction | |||
|- | |- | ||
| '''Types of Collections''' | | '''Types of Collections''' | ||
Line 287: | Line 296: | ||
* Maps | * Maps | ||
Supports 14 collection interfaces | Supports 14 collection interfaces | ||
| | |||
* Array | |||
* Array List | |||
* Hashtable | |||
Supports 10 collections | |||
|} | |} | ||
Line 292: | Line 306: | ||
= Built-in Collections vs Class Library Collections = | = Built-in Collections vs Class Library Collections = | ||
In this section, we will describe collections in a built in language (like Ruby), and compare it to collections that use class libraries (like STL and | In this section, we will describe collections in a built in language (like Ruby [http://www.ruby-lang.org/en/]), and compare it to collections that use class libraries (like STL, JCF and .NET Collections). | ||
== Collections in Ruby : An overview == | == Collections in Ruby : An overview == | ||
In Ruby, we have '''''containers''''' which are entities that contain other entities. | In Ruby, we have '''''containers''''' which are entities that contain other entities. | ||
The two native containers in Ruby are : | The two native containers [http://www.rubycentral.com/pickaxe/tut_containers.html] in Ruby are : | ||
* '''''Arrays''''', which can hold a collection of object references | * '''''Arrays''''', which can hold a collection of object references | ||
* '''''Hashes''''', which are an indexed collection of object references | * '''''Hashes''''', which are an indexed collection of object references | ||
Line 325: | Line 339: | ||
== Comparison of Ruby Collections and Collections as Class Libraries == | == Comparison of Ruby Collections and Collections as Class Libraries == | ||
When comparing the runtime of Java and Ruby, Java is way faster than the latter. But since Ruby, a dynamically typed language, has built-in lists/ arrays , hashes, there is no need for the programmer to have to build all the infrastructure. This is a win situation for Ruby when compared to Java's collections. Thus, the built-in arrays and hashes are a huge advantage over JCF. Ruby has a mark-and-sweep garbage collection which allows programmers the ability to code without having to worry about the need to maintain reference counts in extension libraries. Also, if an operating system allows for it, Ruby can dynamically load extension libraries. | When comparing the runtime of Java and Ruby, Java is way faster than the latter. But since Ruby, a dynamically typed language, has built-in lists/ arrays , hashes, there is no need for the programmer to have to build all the infrastructure. This is a win situation for Ruby when compared to Java's collections. Thus, the built-in arrays and hashes are a huge advantage over JCF. Ruby has a mark-and-sweep garbage collection which allows programmers the ability to code without having to worry about the need to maintain reference counts in extension libraries. Also, if an operating system allows for it, Ruby can dynamically load extension libraries. [http://www.ruby-lang.org/en/] | ||
The table below throws light on the difference noticed between Ruby and those that use collections as class libraries like JCF and STL, from a programmer's point of view : | The table below throws light on the difference noticed between Ruby and those that use collections as class libraries like JCF and STL, from a programmer's point of view : | ||
Line 334: | Line 348: | ||
! Ruby Collections | ! Ruby Collections | ||
! Collections as Class Libraries | ! Collections as Class Libraries | ||
! .NET Collections | |||
|- | |- | ||
| '''Language Design''' | | '''Language Design''' | ||
| Uses Interpreter | | Uses Interpreter | ||
| Uses Compiler | | Uses Compiler | ||
| The compiler produces executable (.exe) files, dynamic-link libraries (.dll), or code modules (.netmodule). | |||
|- | |- | ||
| '''Dynamically Typed''' | | '''Dynamically Typed''' | ||
| Yes | | Yes | ||
| No, statically typed | | No, statically typed | ||
| Dynamic typing possible with dynamic keyword. The dynamic type enables the operations in which it occurs to bypass compile-time type checking. | |||
|- | |- | ||
| '''Lines of Code''' | | '''Lines of Code''' | ||
| Lesser - Collections included in Language | | Lesser - Collections included in Language | ||
| More - Have to include JCF | | More - Have to include collections frameworks like STL/JCF | ||
| Not significantly reduced as in collections included in language. | |||
|- | |- | ||
| '''Learning Curve''' | | '''Learning Curve''' | ||
| Relatively easy to learn and use | | Relatively easy to learn and use | ||
| Comparatively steeper learning curve | | Comparatively steeper learning curve | ||
| Curve steeper compared to collections built into language. | |||
|- | |- | ||
| '''Coding Efficiency''' | | '''Coding Efficiency''' | ||
| Higher, and is more readable and understandable | | Higher, and is more readable and understandable | ||
| Lower | | Lower | ||
| Less compared to built in collection languages. | |||
|- | |- | ||
| '''Programmer's Psychology''' | | '''Programmer's Psychology''' | ||
| Programmer feels its a part of language, and starts using it | | Programmer feels its a part of language, and starts using it | ||
| Programmer feels something external is being used; uses it only when its very much needed | | Programmer feels something external is being used; uses it only when its very much needed | ||
| Programmer does feel that something external is used. | |||
|- | |||
| '''Time Efficiency''' | |||
| Doesn't use Libraries; takes lesser time as language is built-in with these features | |||
| Libraries are loaded and compiled each time code runs; takes longer | |||
| Longer time taken since libraries have to be loaded and compiler has to compile each time code runs. | |||
|} | |} | ||
The table above shows a comparison of languages having collections built-in to them (like Ruby), compared to those that use collections as class libraries. The table does show that though each one does have its own advantages, there are a lot of advantages of having it built into the language compared to using libraries. | The table above shows a comparison of languages having collections built-in to them (like Ruby), compared to those that use collections as class libraries. The table does show that though each one does have its own advantages, there are a lot of advantages of having it built into the language (like in Ruby) compared to using libraries (STL or JCF). | ||
= References = | = References = | ||
[1] Systems Architecture & Software Group, Smalltalk Collections : http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk | [[#References|[1]]] Systems Architecture & Software Group, Smalltalk Collections : http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk.html | ||
[ | [[#References|[2]]] Programming Ruby - The Pragmatic Programmer's Guide : http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html | ||
[ | [[#References|[3]]] Associative Arrays in Perl : http://www.comp.leeds.ac.uk/Perl/associative.html | ||
[ | [[#References|[4]]] Simple Collections in Smalltalk : http://mainline.brynmawr.edu/Courses/cs246/spring2002/SmallTalk/Collections1.html | ||
[ | [[#References|[5]]] Ten Reasons for a Collections Framework : http://sibirjak.com/osflash/blog/why-we-need-a-collection-framework-in-actionscript/ | ||
[ | [[#References|[6]]] SGI : http://www.sgi.com/tech/stl/stl_introduction.html | ||
[ | [[#References|[7]]] Linux Journal: http://www.linuxjournal.com/article/2468 | ||
[ | [[#References|[8]]] Oracle : http://download.oracle.com/javase/1.4.2/docs/guide/collections/overview.html | ||
[ | [[#References|[9]]] SGI : http://www.sgi.com/tech/stl/hash_map.html | ||
[ | [[#References|[10]]] A modest STL Tutorial : http://www.cs.brown.edu/~jak/proglang/cpp/stltut/tut.html | ||
[ | [[#References|[11]]] Java Collections Framework - Wikipedia :http://en.wikipedia.org/wiki/Java_collections_framework | ||
[ | [[#References|[12]]] Standard Template Library : http://en.wikipedia.org/wiki/Standard_Template_Library | ||
[ | [[#References|[13]]] The Ruby Programming Language : http://www.ruby-lang.org/en/ | ||
[ | [[#References|[14]]] Containers, Blocks and Iterators - Programming Ruby : The Pragmatic Programmer's Guide : http://www.rubycentral.com/pickaxe/tut_containers.html |
Latest revision as of 03:36, 26 September 2011
Collections Framework : Cover the history of Collections frameworks back to the late 1970s or early 1980s, and how they were retrofitted to various languages. Discuss the advantage of a standard Collections framework (e.g., Java) vs. several competing frameworks (e.g., C++), and the advantage of collections built into the language (e.g., Ruby) vs. collections as class libraries. Give examples to illustrate the advantages you identify.
Introduction
In this wiki, we are going to talk about collections frameworks : their history, the advantages of standard collections frameworks (like Java and .NET framework), compared to that of C++, the advantages of collections built into languages like Ruby compared to those having collections as class libraries. We will illustrate the advantages and compare the differences with some examples.
What is a collections framework?
Firstly, let us define a collection , and a collection's framework.
A collection can be defined as a group of objects, and is essentially a container that groups multiple elements into a single unit.
In computer science, a collection is grouping of data items that need to be operated together, and are similar in some sense.
In computer science, usually the data items that comprise a collection are all of similar type, or are all inherited from a common ancestor.
Since a collection is just a group of objects, some examples of a collection are : a stack of books, a dictionary, a list of names, a telephone directory. In computer science, examples of collections are : lists, sets, bags, trees, graphs, associative arrays, etc.
A collections framework is a system to represent and manipulate these collections.
A collections framework would thus comprise of various forms of collections (essentially data structures, like mentioned in the computer science definition of a collection) present together that are used by the programmer. A collections framework would thus consist of ways to implement collections like sets, trees, associative arrays, lists, etc., easily.
Need for a collections framework
There are many needs [1] for a collections framework. Some of them are :
- It solves the problem of storing and retrieving data
- It makes sure we write lesser code, and the code is more readable
- It improves the performance, and makes sure the programmer doesn't have to bother with issues related to performance
- Makes sure we have reusable data structures
These reasons led to the development of many collections frameworks.
History
In the 70's and 80's, a number of languages had collections built into them, unlike how it is now in say C++ where libraries have to be included for performing tasks on certain collections. Some of these languages are Smalltalk, Perl, Scheme, Miranda, and Haskell. Just to give a brief idea, and because Ruby is loosely based on Smalltalk and Perl, below we have given a short note on Smalltalk and Perl.
Smalltalk
Smalltalk was object-oriented and was one of them that had collections. Collections in Smalltalk could be as indexed or non-indexed, indexed by explicit or implicit keys, ordered using an index or unordered, and fixed size or expandable. [2] Smalltalk has Arrays, and the Array class is a subclass of ArrayedCollection. An array can hold any type of object, and each entry in the array can be a different object. The code snippet below creates an array object in Smalltalk :
a _ Array new: 5
And to place an object at a specific location in the array :
a at: 3 put: object
a at: 2 returns the object at position 2. Very similar to arrays, OrderedCollection is also used. The difference between them both is that all objects of Array type are fixed size, while those of OrderedCollection can be of varying size. Collections that hold key value pairs were done using Dictionaries, as shown by the code snippet below :
| dict | dict := Dictionary new. dict at: 'Apple' put: 'Fruit'. dict at: 'Onion' put: 'Vegetable'. dict at: 'Pork' put: 'Meat'.
Then, in order to get the value for key 'Onion' we simply say :
dict at 'Onion'
which gives us 'Vegetable'.
Perl
Perl is a very powerful scripting language, and has collections as arrays and associative arrays. Arrays in Perl are defined with an @ sign.
@mylist = ("Tom", 2);
$mylist[0] has the value "Tom", and $mylist[1] has the value 2. In Perl, associative arrays [3] can be used for hashing, and are defined with a % sign.
%months = ( "January" , 1, "April" , 4, "November", 11 );
In the example above, $months("April") would return the value 4.
Collections Frameworks today
Alexander Stepanov was the biggest contributor to the designing of Standard Template Library [4].In late 1970's he started working on Generic Programming. At that time there was not much support for Generic Programming in any Computer programming languages. A language called Ada was able to provide some amount of support. By late 1980's Alexander Stepanov and David Musser had worked on the Ada library and later developed and also published it. By that time, programmers were using C++ and it was more widely used and thought to provide a good support for generic programming even though it was still considered a new language. Stephanov realized that the model of computation in C and C++ had a lot of flexibility and turned to C++. After a lot of research and experimentation at AT&T and HP laboratories, Stepanov presented his ideas at ANSI/ISO meeting. With some more experimentation and collaboration, in 1994, the STL (Standard Template Library) became a part of the ANSI/ISO standard definition of the C++ language.
As mentioned before, to address the need for reusable collection data structures, several independent frameworks were developed. Doug Lea's collections package (released in 1995) was the first widely used collection. In 1996, a company called Object Space developed and released the Java Generic Library ,JGL, with the main goal of being consistent with C++ Standard Template Library.
The Java collections framework was first introduced in 1998, with Java 2 platform, Standard edition. version 1.2. Before collections framework was introduced, grouping of objects in Java was through the use of arrays, Vector and HashTable.
There are many other collections frameworks : the .NET framework [5] has collection classes too. Microsoft started development on the .NET Framework in the late 90s under the name of Next Generation Windows Services (NGWS). In 2000 the first beta versions of .NET 1.0 were released. In this article, we will be comparing the C++ Standard Template Library with Java Collections Framework and Collections in .NET, and Ruby to compare and analyze the advantages and disadvantages of each. However, since frameworks are quite similar, we will be talking in detail about JCF over .NET. The differences and similarities provided, however compare JCF, STL, collections in .NET and in Ruby.
C++'s Standard Template Library vs Java Collections Framework
Before we talk about the difference between the Standard Template Library and Java Collections Framework, given below is the overview of each of them (in terms of architecture, types of collections, and examples).
C++ Standard Template Library (STL)
The C++ Standard Template Library (STL) is a powerful and versatile collection of classes and functions that provides an efficient, lightweight, and extensible framework for application development [6]. STL also offers a sophisticated level of abstraction that promotes the use of generic data structures and algorithms.The Standard Template Library, is a C++ library of container classes, algorithms, and iterators. The STL provides many of the basic algorithms and data structures of computer science. It can be called a Generic library because most of the components in the STL are templates.
Architecture of STL
The diagram below gives an idea of the architecture of the Standard Template Library in STL :
Types of Collections
The STL (Standard Template Library) is organized around three fundamental components namely containers - to hold objects, algorithms ( perform manipulation of elements in containers) and iterators. In addition, supporting them there are three additional components: allocators, adaptors and function objects.
- Containers manage collections of objects.
There are a variety of containers, and each offers unique advantages.
The STL Architecture shown above has the following components as collections :
- Vectors
- Lists
- Deque
- Set
- Map
- Multiset
- Multimap
- Algorithms process elements in containers, or more specifically, in ranges. The STL offers a wide variety of algorithms to search, sort, and operate on elements in containers.
- Iterators are generalized pointer-like objects that are used to step through the elements of a container.
Iterators can work on an entire collection of objects or on a subset of objects.
STL distinguishes general algorithms from the more specialized data and methods encapsulated by ordinary abstract data types. Through this, complex and powerful algorithms can be implemented independently of the data to which they are applied, allowing for generalization and reuse of these algorithms.
Examples
As an example, first let us consider Vectors.
In C++, a typical vector implementation consists of a pointer referring to a dynamically allocated array, and data members holding the capacity and size of the vector. The size of the vector refers to the number of elements, where as the capacity refers to the size of the internal array. As new elements are inserted, and if the new size of the vector becomes larger than its capacity, reallocation occurs. This causes the vector to allocate a new location for storage followed by moving the previously held elements to a new location of storage, and freeing the old space. The example below illustrates this.
#include <vector> int main() { std::vector<int> a(1); // Creating a Vector. int& first = *a.begin(); // Making a reference to the first element. a.insert(v.end(), a.capacity(), 0); // Adding more elements to the Vector thereby forcing re-allocation. }
Now, let us talk about Hashmaps. The hash_map [7] class uses an associative array in which the key is stored according to some hashing scheme. Each one has a key and a corresponding value associated to it. Hashing is very useful when we want the time complexity of searching in O(1), or a constant time order.
#include <hash_map> int main() { hash_map<const char*, int, hash<const char*>, eqstr> fruits; fruits["Apple"] = 1; fruits["Orange"] = 2; fruits["Grapes"] = 3; cout << "Apple -> " << fruits["Apple"] << endl; }
Java Collection Framework (JCF)
Architecture of JCF
The image below shows the interface hierarchy in the collections framework in Java.
Interface Hierarchy in JCF
The interfaces map and collection are both distinct in the hierarchy in Java. This distinction is because of the way maps and collection are used in Java libraries.
The collections framework in Java consists of: [8]
- Collection Interfaces : These are needed for representing different types of collections, like sets, lists and maps.
- General-purpose Implementations : They are are the most commonly used implementations, designed for day to day use.
- Legacy Implementations: The collection classes that were a part of Java 1.1, namely Vector and Hashtable, are retrofitted to implement the collection interfaces that are mentioned above.
- Special-purpose Implementations : These are implementations designed for special situations usage.
- Concurrent Implementations : These are implementations designed for highly concurrent use.
- Wrapper Implementations : These implementations add functionality, like synchronization, to other implementations.
- Convenience Implementations : These are High-performance implementations which are in a way smaller implementations of the collection interfaces.
- Abstract Implementations : These are partial implementations of the collection interfaces required to facilitate custom implementations.
- Algorithms : These are static methods that perform useful functions on collections.
- Infrastructure : These are interfaces that provide essential support for all of the collection interfaces.
- Array Utilities : These are utility functions for arrays of primitives and reference objects.
Types of Collections
Most collections in Java are derived from the java.util.Collection interface.
The main types of collections in Java are :
- Lists : There are two classes, java.util.LinkedList and java.util.ArrayList that implement the java.util.List interface in the JCF framework. Lists in Java are ordered, and may contain duplicates.
- Sets : This is defined by the java.util.Set interface. It is implemented by java.util.HashSet, etc. A set cannot have duplicates by definition, and it doesn't have an order.
- Maps : The java.util.Map interface is used to define maps. Maps are data structures that associate or map a key to a value. Hashmaps are used as they make the time of searching for a key in O(1), a constant time complexity order.
Examples
As an example, let us consider the List interface. It is part of the java.util package. List interface can add value elements by add(value) method.
import java.util.Iterator; import java.util.List; import java.util.ArrayList; public class MyList { public static void main(String[] args) { List<String> ml=new ArrayList<String>(); ml.add("Milk"); ml.add("Eggs"); ml.add("Cereal"); Iterator it=ml.iterator(); while(it.hasNext()) { String output=(String)it.next(); System.out.println("My Grocery List :"+output); } } }
As expected, the output of the above code snippet would simply print out the List of Groceries :
My Grocery List : Milk My Grocery List : Eggs My Grocery List : Cereal
HashMaps:
import java.util.HashMap; public class MyHashMap { public static void main(String args[]) { HashMap MyFirstHashMap = new HashMap(); hashMap.put("First", new Integer(1)); // add value into HashMap hashMap.put("Second", new Integer(2)); } }
Comparison between STL, JCF and .NET Collections
Prior to Java 2, arrays, vectors and HashTable had different syntaxes for accessing members. For example, the arrays uses square bracket symbol [ ], the HashTable uses 'get and put' method and as for Vectors, it is the elementAt method. This caused a lot of inconsistency meaning programmers were using different methods to implement their own collections. Also, prior to Java 2, the classes lacked a central and unifying theme.
There are many advantages of Java Collections framework compared to frameworks such as those in C++( standard template library). The Quality of Implementation of the C++ compiler has a large impact on usability of STL. First, the error messages involving templates are very long and hard to interpret. Then, if used without proper care, STL templates can lead to code bloat. The Java Collections Framework make programming easy by providing many useful data structures and algorithms so that the programmer does not have to write them. The performance of a program is increased considerably because of the implementation of these data structures and algorithms because the various implementations of each interface can be interchanged. Collections have much better performance compared to Vectors and HashTable in the old version. The entire standard collections framework is designed around a set of standard interfaces and several standard implementations such a LinkedList, HashSet and TreeSet are provided in these interfaces so the programmers can use them or also create their own.
Collections in the .NET framework do have many advantages compared to those like C++( standard template library). However, the simplicity of programming using .NET framework, and in particular the collections in .NET are hard to implement compared to JCF. The architecture of .NET collections is very similar to that of C++(STL), and all collections are inherited from an abstract base class. Error messages in .NET collections are very difficult and hard to interpret. The garbage collector does exist in the .NET framework, however it does have significant performance issues compared to the other two. .NET doesn't have some of the collections that are present in other frameworks; for instance Multiset and Multimap are both present in JCF, but not in the .NET collections framework. .NET collections do not use iterators like in C++(STL) or JCF. The table below highlights the differences [9] between the C++ Standard Template Library, the Java Collection Framework, and Collections in the .NET framework
Difference | Standard Template Library | Java Collections Framework | .NET Collections Framework |
---|---|---|---|
Simplicity of Programming | Harder | Relatively Easy | Hard |
Architecture of Collections | All are inherited from the base class | Maps and Collections are two separate interfaces | All collections are inherited from the same abstract base class |
Error Messages | Long, Hard to interpret | Easier to interpret | Not intuitive |
Memory : Resource Management | Has to be explicitly done by programmer | Taken care of by Garbage Collector | There is garbage collection; some performance issues exist |
Level of Abstraction | Less | Higher level of abstraction | High level of Abstraction |
Types of Collections |
|
Supports 14 collection interfaces |
Supports 10 collections |
Clearly from the table above, the advantages of Java Collections Framework compared to the Standard Template Library is seen in terms of various parameters.
Built-in Collections vs Class Library Collections
In this section, we will describe collections in a built in language (like Ruby [10]), and compare it to collections that use class libraries (like STL, JCF and .NET Collections).
Collections in Ruby : An overview
In Ruby, we have containers which are entities that contain other entities. The two native containers [11] in Ruby are :
- Arrays, which can hold a collection of object references
- Hashes, which are an indexed collection of object references
We can also create our own list structure in Ruby. The example below illustrates the syntax of how Arrays can hold a collection of object references. Initialization is not a issue in Ruby as its dynamically typed.
arr = ["Onion", 1.414, "Tomato"] a = ["Rob", arr, 9]
As shown above, arr is an array of different objects, and so is a. The section below gives a clearer view of this, with the outputs after each step.
arr = ["Onion", 1.414, "Tomato"] =>["Onion", 1.414, "Tomato"] arr.type => Array arr.length => 3 arr[1] => 1.414 a = ["Rob", arr, 9] => ["Rob", ["Onion", 1.414, "Tomato"], 9]
Hashing can also be done very easily in Ruby, without the need for any reference libraries.
food = {'Apple' => 'Fruit', 'Onion' => 'Vegetable', 'Pork' => 'Meat'}
So food['Onion'] gives us the value corresponding to key 'Onion', which is 'Vegetable'. In Java when one has a nested methods on collections, the code is easier to read than in Ruby when one has to nested operators on collections within a single line. The nested operator code becomes very difficult to read.
Comparison of Ruby Collections and Collections as Class Libraries
When comparing the runtime of Java and Ruby, Java is way faster than the latter. But since Ruby, a dynamically typed language, has built-in lists/ arrays , hashes, there is no need for the programmer to have to build all the infrastructure. This is a win situation for Ruby when compared to Java's collections. Thus, the built-in arrays and hashes are a huge advantage over JCF. Ruby has a mark-and-sweep garbage collection which allows programmers the ability to code without having to worry about the need to maintain reference counts in extension libraries. Also, if an operating system allows for it, Ruby can dynamically load extension libraries. [12]
The table below throws light on the difference noticed between Ruby and those that use collections as class libraries like JCF and STL, from a programmer's point of view :
Difference | Ruby Collections | Collections as Class Libraries | .NET Collections |
---|---|---|---|
Language Design | Uses Interpreter | Uses Compiler | The compiler produces executable (.exe) files, dynamic-link libraries (.dll), or code modules (.netmodule). |
Dynamically Typed | Yes | No, statically typed | Dynamic typing possible with dynamic keyword. The dynamic type enables the operations in which it occurs to bypass compile-time type checking. |
Lines of Code | Lesser - Collections included in Language | More - Have to include collections frameworks like STL/JCF | Not significantly reduced as in collections included in language. |
Learning Curve | Relatively easy to learn and use | Comparatively steeper learning curve | Curve steeper compared to collections built into language. |
Coding Efficiency | Higher, and is more readable and understandable | Lower | Less compared to built in collection languages. |
Programmer's Psychology | Programmer feels its a part of language, and starts using it | Programmer feels something external is being used; uses it only when its very much needed | Programmer does feel that something external is used. |
Time Efficiency | Doesn't use Libraries; takes lesser time as language is built-in with these features | Libraries are loaded and compiled each time code runs; takes longer | Longer time taken since libraries have to be loaded and compiler has to compile each time code runs. |
The table above shows a comparison of languages having collections built-in to them (like Ruby), compared to those that use collections as class libraries. The table does show that though each one does have its own advantages, there are a lot of advantages of having it built into the language (like in Ruby) compared to using libraries (STL or JCF).
References
[1] Systems Architecture & Software Group, Smalltalk Collections : http://www.ifi.uzh.ch/richter/Classes/oose2/01_Collections/03_smalltalk/03_smalltalk.html
[2] Programming Ruby - The Pragmatic Programmer's Guide : http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
[3] Associative Arrays in Perl : http://www.comp.leeds.ac.uk/Perl/associative.html
[4] Simple Collections in Smalltalk : http://mainline.brynmawr.edu/Courses/cs246/spring2002/SmallTalk/Collections1.html
[5] Ten Reasons for a Collections Framework : http://sibirjak.com/osflash/blog/why-we-need-a-collection-framework-in-actionscript/
[6] SGI : http://www.sgi.com/tech/stl/stl_introduction.html
[7] Linux Journal: http://www.linuxjournal.com/article/2468
[8] Oracle : http://download.oracle.com/javase/1.4.2/docs/guide/collections/overview.html
[9] SGI : http://www.sgi.com/tech/stl/hash_map.html
[10] A modest STL Tutorial : http://www.cs.brown.edu/~jak/proglang/cpp/stltut/tut.html
[11] Java Collections Framework - Wikipedia :http://en.wikipedia.org/wiki/Java_collections_framework
[12] Standard Template Library : http://en.wikipedia.org/wiki/Standard_Template_Library
[13] The Ruby Programming Language : http://www.ruby-lang.org/en/
[14] Containers, Blocks and Iterators - Programming Ruby : The Pragmatic Programmer's Guide : http://www.rubycentral.com/pickaxe/tut_containers.html