CSC/ECE 517 Fall 2012/ch1 1w30 rp
A collection is a data structure that represents a group of data items sharing common properties (or state) and behaviour. Generally the data items are objects of the same type, and their behaviour is governed by the set of operations that can be performed on them. An array also represents a group of related objects, but arrays are not considered collections here because an array can hold only a fixed number of items, whereas the size of a collection is variable, or dynamic. In sum, a collection can be viewed as a single object containing multiple elements. Collection types include trees, sets, ArrayList, Dictionary, etc.
To create, manage and manipulate collections we use a collection framework, which describes how collection elements are represented and manipulated independently of their implementation details. It defines application programming interfaces (APIs) consisting of classes and interfaces for manipulating the data in a collection.
Advantages of collection framework
a) Performance improvement: The framework includes highly efficient implementations of data structures and of the algorithms that manipulate them.
b) Increase in programmer productivity: The framework provides ready-to-use data structures, so the programmer does not have to code the underlying data structure.
c) Interoperability between unrelated APIs: It provides a common language through which collections can be passed back and forth.
d) Ease of designing, implementing and learning APIs, because the collection APIs are generic.
Elements of collection framework
a) Collection interfaces: These are the basis of the framework and represent different types of collections such as List, Set, Queue, and SortedSet.
b) Implementations: Classes that implement the collection interfaces. These include complete general-purpose implementations as well as partial implementations of a particular interface that facilitate custom behaviour, such as adding constraints on the operations of a particular collection or synchronizing a collection's data under concurrent use.
c) Algorithms: These are static methods that provide useful functions on collections, such as sorting, searching, checking equality and manipulating elements.
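The three elements can be seen together in a few lines of Java. This is only a minimal sketch: the class name FrameworkElementsDemo and the sample strings are made up for illustration, while List, ArrayList and Collections.sort() are standard java.util members.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the name and sample data are illustrative only.
class FrameworkElementsDemo {

    // Declares the variable by its interface (List), picks an
    // implementation (ArrayList), then applies an algorithm (sort).
    static List<String> sortedNames() {
        List<String> names = new ArrayList<>();
        names.add("Ruby");
        names.add("Java");
        names.add("C#");
        Collections.sort(names); // static algorithm from the Collections class
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames()); // prints [C#, Java, Ruby]
    }
}
```

Because the variable is typed as List, the ArrayList could be swapped for another List implementation without touching the rest of the code.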
Collection Interfaces and Implementations
As mentioned earlier, a collection framework has collection interfaces. In the Java programming language there are two categories of interfaces:
a) Interfaces extending the ‘Collection’ interface: The root interface of the collection framework hierarchy is ‘Collection’, and several interfaces extend it, such as Set, List, SortedSet, Queue and Deque.
b) Interfaces supporting collection-view operations: These interfaces do not extend the ‘Collection’ interface; instead they allow their contents to be viewed as collections. In other words, they represent a relation among elements rather than an actual collection (Map, described below, is the main example).
In the following sections three major collection interfaces and their implementations are discussed in detail. Different languages provide support for different collections: for instance, in Ruby arrays and hashes are termed collections, while Java provides a larger base that includes maps and lists.
1. Set Interface: It extends the Collection interface and models the mathematical set: a collection whose elements are all distinct, for example the pages of a book. It places additional restrictions on the inherited methods: add() does not add an element that is already present, and equals() can compare two Set instances even if they have different implementations, treating them as equal when they contain the same elements. For this to work, the classes of the elements stored in a set need to override hashCode() consistently with equals(), ideally so that the hash values are well dispersed and collisions are minimal. Note that two equal sets always return the same hashCode() value.
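The duplicate rule and the cross-implementation equality described above can be checked with a short sketch. The class and method names (SetEqualityDemo, addTwice, sameAcrossImplementations) are hypothetical; only the java.util calls are part of the real API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; names are illustrative only.
class SetEqualityDemo {

    // Adds the same value twice and reports the result of the second add().
    // add() returns false when the element is already present.
    static boolean addTwice(Set<String> set, String value) {
        set.add(value);
        return set.add(value);
    }

    // Two sets with the same elements are equal and share a hash code,
    // even though one is a HashSet and the other a TreeSet.
    static boolean sameAcrossImplementations() {
        Set<Integer> hashed = new HashSet<>(Arrays.asList(3, 1, 2));
        Set<Integer> sorted = new TreeSet<>(Arrays.asList(1, 2, 3));
        return hashed.equals(sorted) && hashed.hashCode() == sorted.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(addTwice(new HashSet<String>(), "page 1")); // prints false
        System.out.println(sameAcrossImplementations());               // prints true
    }
}
```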
Implementations of Set Interface
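a. HashSet: It implements the Set interface and is backed by a hash table. Iteration over a HashSet is not guaranteed to be in sorted order, or in any particular order. Basic operations such as add, remove and contains take constant time on average, and space usage is linear in the number of elements. The initial capacity (number of buckets) and the load factor need care: iteration time is proportional to the number of elements plus the number of buckets, so if the capacity is set too high (or the load factor too low), iterating over the HashSet walks through many empty buckets unnecessarily.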
b. TreeSet: It implements the Set interface and is implemented using a tree structure. Unlike HashSet, iterating over a TreeSet returns the elements in sorted order. Basic operations take logarithmic time because the tree is kept balanced. TreeSet has no load factor or bucket count to tune. Although TreeSet offers ordering, HashSet is more widely used because its basic operations are faster.
c. LinkedHashSet: It extends HashSet. In addition to the hash table, it maintains a doubly linked list running through its entries, which lets it iterate over the elements in the order in which they were inserted.
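The difference in iteration order between the three implementations can be illustrated as follows. This is a sketch with made-up names (SetOrderDemo, order) and sample data; the HashSet order is deliberately not stated because it is unspecified.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; names and sample values are illustrative only.
class SetOrderDemo {

    // Inserts the same three values into whichever Set is passed in
    // and returns them in that set's iteration order.
    static List<String> order(Set<String> set) {
        set.add("pear");
        set.add("apple");
        set.add("mango");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(order(new TreeSet<String>()));       // prints [apple, mango, pear]
        System.out.println(order(new LinkedHashSet<String>())); // prints [pear, apple, mango]
        System.out.println(order(new HashSet<String>()));       // order is unspecified
    }
}
```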
2. The List Interface: It extends the Collection interface and represents an ordered collection of elements in which duplicates are allowed. The idea is to allow position-oriented access to, and iteration over, the collection. It supports searching, adding and deleting elements at a given index.
Implementations of List Interface
The choice of implementation here depends on the need.
a. ArrayList: It implements the List interface and manages data in a dynamically resizing array. It offers constant-time positional access. It works well if you need to access elements randomly and add or remove elements only at the end.
b. LinkedList: It implements the List interface and manages data in a doubly linked list. It should be used if you access elements sequentially and insert or remove elements in the middle of the list.
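A rough sketch of the two usage patterns follows: positional access on an ArrayList, and insertion in the middle of a LinkedList through a ListIterator. The class name ListChoiceDemo, its method names and the sample values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class; names and sample values are illustrative only.
class ListChoiceDemo {

    // ArrayList: append at the end, then read by index in constant time.
    static int thirdElement() {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            numbers.add(i * 10); // 10, 20, 30, 40, 50
        }
        return numbers.get(2);
    }

    // LinkedList: walk the list sequentially and insert in the middle
    // without shifting the remaining elements.
    static List<String> insertInMiddle() {
        List<String> letters = new LinkedList<>(Arrays.asList("a", "b", "d"));
        ListIterator<String> it = letters.listIterator();
        while (it.hasNext()) {
            if (it.next().equals("b")) {
                it.add("c"); // inserted immediately after "b"
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println(thirdElement());   // prints 30
        System.out.println(insertInMiddle()); // prints [a, b, c, d]
    }
}
```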
3. The Map Interface: The Map interface differs from the other interfaces in that it does not extend the Collection interface; it has its own hierarchy. It maintains data as key-value pairs in which each key must be unique. Some implementations, such as HashMap, allow a null key and null values. A map is also called an associative array. Classes used as keys need to override hashCode() and equals() consistently so that keys can be located, and Map implementations define equals() so that two maps holding the same mappings are equal.
Implementations of Map Interface
a. HashMap: It is the best choice if the problem at hand needs to insert, delete or locate elements in a map. Iteration over a HashMap is in no particular order. The underlying data structure is a hash table.
b. TreeMap: It is the best choice if the problem at hand needs to traverse the map in sorted key order. The keys of a TreeMap must implement the Comparable interface and override compareTo(), which returns a negative, zero or positive int depending on how the two objects compare; alternatively, a Comparator can be supplied when the map is created.
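The behaviour of the two map implementations can be sketched as below. The class name MapDemo, the method names and the stock data are hypothetical; String keys are used because String already implements Comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; names and sample data are illustrative only.
class MapDemo {

    // HashMap: unique keys (a second put() replaces the value) and a null key.
    static Map<String, Integer> hashed() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 7);
        stock.put("apple", 9); // same key, so the old value is replaced
        stock.put(null, 0);    // HashMap permits one null key
        return stock;
    }

    // TreeMap: keys are traversed in the order given by the comparator,
    // or in natural (Comparable) order when the comparator is null.
    static List<String> keysInOrder(Comparator<String> comparator) {
        Map<String, Integer> stock = new TreeMap<>(comparator);
        stock.put("pear", 3);
        stock.put("apple", 7);
        stock.put("mango", 5);
        return new ArrayList<>(stock.keySet());
    }

    public static void main(String[] args) {
        System.out.println(hashed().get("apple"));                          // prints 9
        System.out.println(keysInOrder(null));                              // prints [apple, mango, pear]
        System.out.println(keysInOrder(Comparator.<String>reverseOrder())); // prints [pear, mango, apple]
    }
}
```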
Algorithms in Collection Framework
The collection framework supports various algorithms that carry out basic operations on collections. The Set and List implementations described above support the basic methods inherited from the Collection interface, such as add(), addAll(), contains(), containsAll() and hashCode(). Besides these, the framework supports the following algorithms:
1. Sort: The collection framework supports sorting a List with the sort() methods of the Collections class. To sort the items of a collection, its elements must be mutually comparable.
2. Searching: The collection framework supports searching a List as well as finding the minimum and maximum values in a Collection. An unsorted list can be searched with the contains() method of List, and a sorted list can be searched with the binarySearch() method of Collections, which is faster than contains().
Apart from searching for a particular element in a List, the min() and max() methods of Collections can be used to find the minimum and maximum elements of an unsorted collection. Note that if the objects in the collection do not implement Comparable, you must provide a Comparator, which serves as the comparison operator among the elements of the collection.
3. Reorganising: Methods such as reverse() and shuffle() change the positions of elements in a list.
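The algorithms listed above might be exercised as in the following sketch. The class name AlgorithmsDemo, the helper methods and the sample numbers are assumptions for illustration; sort(), binarySearch(), min(), max(), reverse() and shuffle() are static methods of java.util.Collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; names and sample values are illustrative only.
class AlgorithmsDemo {

    // Fresh mutable copy of the sample data for each call.
    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(42, 7, 19, 3, 25));
    }

    // binarySearch() requires a sorted list, so sort first.
    // It returns the index of the key, or a negative number if absent.
    static int searchSorted(int key) {
        List<Integer> values = sample();
        Collections.sort(values); // [3, 7, 19, 25, 42]
        return Collections.binarySearch(values, key);
    }

    // max() with an explicit Comparator: compares strings by length
    // instead of their natural (alphabetical) order.
    static String longest(List<String> words) {
        return Collections.max(words, Comparator.comparingInt(String::length));
    }

    // reverse() reorders the list in place.
    static List<Integer> reversed() {
        List<Integer> values = sample();
        Collections.reverse(values);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(searchSorted(19));                              // prints 2
        System.out.println(Collections.min(sample()));                     // prints 3
        System.out.println(Collections.max(sample()));                     // prints 42
        System.out.println(longest(Arrays.asList("set", "list", "queue"))); // prints queue
        System.out.println(reversed());                                    // prints [25, 3, 19, 7, 42]

        List<Integer> shuffled = sample();
        Collections.shuffle(shuffled); // random order, same elements
        System.out.println(shuffled);
    }
}
```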
Implementation of Collection in .NET
The C# programming language in Microsoft .NET offers many collection types, such as ArrayList, Hashtable, ObservableCollection and List. The following explains how lists are used and supported in C#.
A List represents a collection, or container, of objects that can be accessed by index, and it provides basic operations such as search, sort and modify. The generic List collection implements the ICollection interface:
List<T> : IList<T>, ICollection<T>, where T denotes the generic type parameter.
Using a List collection
1) To add a new object to an existing List demoList, use the Add method
demoList.Add("ListItem1"); // Adds the string "ListItem1" to demoList.
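2) To delete the element at a given index, use the RemoveAt method (the Remove method instead deletes the first occurrence of a given object)
demoList.RemoveAt(<index>); // Removes the element at the specified index.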
3) To delete all elements of a List, use the Clear method
demoList.Clear(); // Removes all elements from the List<T>.
4) To sort all elements of the collection, use the Sort method
demoList.Sort(); // Sorts all elements using the default comparer. You can pass a particular comparer
// as an argument to Sort() for comparison-specific sorting.