CSC/ECE 517 Fall 2012/ch1b 1w47 sk

From Expertiza_Wiki
Revision as of 19:22, 27 September 2012 by Skanaka (talk | contribs) (→‎References)
Jump to navigation Jump to search

Introduction

Generic Programming is a programming paradigm for developing efficient, reusable software libraries. The Generic Programming process focuses on finding commonality among similar implementations of the same algorithm, then providing suitable abstractions so that a single, generic algorithm can cover many concrete implementations. This process is repeated until the generic algorithm has reached a suitable level of abstraction, where it provides maximal re-usability while still yielding efficient, concrete implementations. The abstractions themselves are expressed as requirements on the parameters to the generic algorithm.

In the simplest definition, generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by Ada in 1983, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication. Such software entities are known as generics in Ada, Eiffel, Java, C#, F#, and Visual Basic .NET; parametric polymorphism in ML, Scala and Haskell (the Haskell community also uses the term "generic" for a related but somewhat different concept); templates in C++ and D; and parameterized types in the influential 1994 book Design Patterns.

The term generic programming was originally coined by David Musser and Alexander Stepanov in a more specific sense than the above, to describe an approach to software decomposition whereby fundamental requirements on types are abstracted from across concrete examples of algorithms and data structures and formalized as concepts, analogously to the abstraction of algebraic theories in abstract algebra. Early examples of this programming approach were implemented in Scheme and Ada, although the best known example is the Standard Template Library (STL) in which is developed a theory of iterators which is used to decouple sequence data structures and the algorithms operating on them.

Generics in Object Oriented Languages

C#

Java

Generics were introduced in Java from JDK 1.5. Generics allows the abstraction over types. The most common examples are container types, such as those in the Collection hierarchy. Here is a typical usage of that sort:

List myIntList = new LinkedList(); // 1
myIntList.add(new Integer(0)); // 2
Integer x = (Integer) myIntList.iterator().next(); // 3

Typically, the programmer knows what kind of data has been placed into a particular list. However, the cast is essential. The compiler can only guarantee that an Object will be returned by the iterator. To ensure the assignment to a variable of type Integer is type safe, the cast is required. Of course, the cast not only introduces clutter. It also introduces the possibility of a run time error, since the programmer might be mistaken. It would be better if programmers could actually express their intent, and mark a list as being restricted to contain a particular data type. This is the core idea behind generics. Here is a version of the program fragment given above using generics:

List<Integer> myIntList = new LinkedList<Integer>(); // 1’
myIntList.add(new Integer(0)); //2’
Integer x = myIntList.iterator().next(); // 3’

Notice the type declaration for the variable myIntList. It specifies that this is not just an arbitrary List, but a List of Integer, written List<Integer>. We say that List is a generic interface that takes a type parameter - in this case, Integer. We also specify a type parameter when creating the list object. The other thing to pay attention to is that the cast is gone on line 3’. Now, you might think that all we’ve accomplished is to move the clutter around. Instead of a cast to Integer on line 3, we have Integer as a type parameter on line 1’. However, there is a very big difference here. The compiler can now check the type correctness of the program at compile-time. When we say that myIntList is declared with type List<Integer>, this tells us something about the variable myIntList, which holds true wherever and whenever it is used, and the compiler will guarantee it. In contrast, the cast tells us something the programmer thinks is true at a single point in the code. The net effect, especially in large programs, is improved readability and robustness.

Here is a small excerpt from the definitions of the interfaces List and Iterator in package java.util:

public interface List<E> {
void add(E x);
Iterator<E> iterator();
}
public interface Iterator<E> {
E next();
boolean hasNext();
}

This should all be familiar, except for the stuff in angle brackets. Those are the declarations of the formal type parameters of the interfaces List and Iterator. Type parameters can be used throughout the generic declaration, pretty much where you would use ordinary types.

A generic type declaration is compiled once and for all, and turned into a single class file, just like an ordinary class or interface declaration. Type parameters are analogous to the ordinary parameters used in methods or constructors. Much like a method has formal value parameters that describe the kinds of values it operates on, a generic declaration has formal type parameters. When a method is invoked, actual arguments are substituted for the formal parameters, and the method body is evaluated. When a generic declaration is invoked, the actual type arguments are substituted for the formal type parameters.

Generics and Subtyping

Let’s test our understanding of generics. Is the following code snippet legal?

List<String> ls = new ArrayList<String>(); //1
List<Object> lo = ls; //2

Line 1 is certainly legal. The trickier part of the question is line 2. This boils down to the question: is a List of String a List of Object. Most people’s instinct is to answer: “sure!”. Well, take a look at the next few lines:

lo.add(new Object()); // 3
String s = ls.get(0); // 4: attempts to assign an Object to a String!

Here we’ve aliased ls and lo. Accessing ls, a list of String, through the alias lo, we can insert arbitrary objects into it. As a result ls does not hold just Strings anymore, and when we try and get something out of it, we get a rude surprise. The Java compiler will prevent this from happening of course. Line 2 will cause a compile time error.

In general, if Foo is a subtype (subclass or subinterface) of Bar, and G is some generic type declaration, it is not the case that G<Foo> is a subtype of G<Bar>. This is probably the hardest thing you need to learn about generics, because it goes against our deeply held intuitions. The problem with that intuition is that it assumes that collections don’t change. Our instinct takes these things to be immutable. For example, if the department of motor vehicles supplies a list of drivers to the census bureau, this seems reasonable. We think that a List<Driver> is a List<Person>, assuming that Driver is a subtype of Person. In fact, what is being passed is a copy of the registry of drivers. Otherwise, the census bureau could add new people who are not drivers into the list, corrupting the DMV’s records. In order to cope with this sort of situation, it’s useful to consider more flexible generic types. The rules we’ve seen so far are quite restrictive.

Wildcards

Consider the problem of writing a routine that prints out all the elements in a collection. Here’s how you might write it in an older version of the language:

void printCollection(Collection c) {
Iterator i = c.iterator();
for (k = 0; k < c.size(); k++) {
System.out.println(i.next());
}}
And here is a naive attempt at writing it using generics (and the new for loop syntax):
void printCollection(Collection<Object> c) {
for (Object e : c) {
System.out.println(e);
}}

The problem is that this new version is much less useful than the old one. Whereas the old code could be called with any kind of collection as a parameter, the new code only takes Collection<Object>, which, as we’ve just demonstrated, is not a supertype of all kinds of collections! So what is the supertype of all kinds of collections? It’s written Collection<?> (pronounced “collection of unknown”) , that is, a collection whose element type matches anything. It’s called a wildcard type for obvious reasons. We can write:

void printCollection(Collection<?> c) {
for (Object e : c) {
System.out.println(e);
}}

and now, we can call it with any type of collection. Notice that inside printCollection(), we can still read elements from c and give them type Object. This is always safe, since whatever the actual type of the collection, it does contain objects. It isn’t safe to add arbitrary objects to it however:

Collection<?> c = new ArrayList<String>();
c.add(new Object()); // compile time error

Since we don’t know what the element type of c stands for, we cannot add objects to it. The add() method takes arguments of type E, the element type of the collection. When the actual type parameter is ?, it stands for some unknown type. Any parameter we pass to add would have to be a subtype of this unknown type. Since we don’t know what type that is, we cannot pass anything in. The sole exception is null, which is a member of every type. On the other hand, given a List<?>, we can call get() and make use of the result. The result type is an unknown type, but we always know that it is an object. It is therefore safe to assign the result of get() to a variable of type Object or pass it as a parameter where the type Object is expected.

Advantages and Disadvantages

These are there pretty much for syntactic sugar. They are implemented through a controversial decision called type erasure. All they really do is prevent you from having to cast a whole lot, which makes them safer to use. Performance is identical to making specialized classes, except in cases where you are using what would have been a raw data type (int, float, double, char, bool, short). In these cases, the value types must be boxed to their corresponding reference types (Integer, Float, Double, Char, Bool, Short), which has some overhead. Memory usage is identical, since the JRE is just performing the casting in the background (which is essentially free). Java also has some nice type covariance and contravariance, which makes things look much cleaner than not using them.

C++

In C++ generics are called templates.Function templates are special functions that can operate with generic types. This allows us to create a function template whose functionality can be adapted to more than one type or class without repeating the entire code for each type.

In C++ this can be achieved using template parameters. A template parameter is a special kind of parameter that can be used to pass a type as argument: just like regular function parameters can be used to pass values to a function, template parameters allow to pass also types to a function. These function templates can use these parameters as if they were any other regular type.

The format for declaring function templates with type parameters is:

template <class identifier> function_declaration;
template <typename identifier> function_declaration;

The only difference between both prototypes is the use of either the keyword class or the keyword typename. Its use is indistinct, since both expressions have exactly the same meaning and behave exactly the same way. For example, to create a template function that returns the greater one of two objects we could use:

template <class myType>
myType GetMax (myType a, myType b) {
 return (a>b?a:b);
}

Here we have created a template function with myType as its template parameter. This template parameter represents a type that has not yet been specified, but that can be used in the template function as if it were a regular type. As you can see, the function template GetMax returns the greater of two parameters of this still-undefined type.

To use this function template we use the following format for the function call:

function_name <type> (parameters);

For example, to call GetMax to compare two integer values of type int we can write:

int x,y;
GetMax <int> (x,y);

When the compiler encounters this call to a template function, it uses the template to automatically generate a function replacing each appearance of myType by the type passed as the actual template parameter (int in this case) and then calls it. This process is automatically performed by the compiler and is invisible to the programmer.

Here is the entire example:

// function template
#include <iostream>
using namespace std;

template <class T>
T GetMax (T a, T b) {
  T result;
  result = (a>b)? a : b;
  return (result);
}

int main () {
  int i=5, j=6, k;
  long l=10, m=5, n;
  k=GetMax<int>(i,j);
  n=GetMax<long>(l,m);
  cout << k << endl;
  cout << n << endl;
  return 0;
}

In this case, we have used T as the template parameter name instead of myType because it is shorter and in fact is a very common template parameter name. But you can use any identifier you like. In the example above we used the function template GetMax() twice. The first time with arguments of type int and the second one with arguments of type long. The compiler has instantiated and then called each time the appropriate version of the function.

As you can see, the type T is used within the GetMax() template function even to declare new objects of that type:

 
T result;

Therefore, result will be an object of the same type as the parameters a and b when the function template is instantiated with a specific type.

In this specific case where the generic type T is used as a parameter for GetMax the compiler can find out automatically which data type has to instantiate without having to explicitly specify it within angle brackets (like we have done before specifying <int> and <long>). So we could have written instead:

int i,j;
GetMax (i,j);

Since both i and j are of type int, and the compiler can automatically find out that the template parameter can only be int. This implicit method produces exactly the same result:

// function template II
#include <iostream>
using namespace std;

template <class T>
T GetMax (T a, T b) {
  return (a>b?a:b);
}

int main () {
  int i=5, j=6, k;
  long l=10, m=5, n;
  k=GetMax(i,j);
  n=GetMax(l,m);
  cout << k << endl;
  cout << n << endl;
  return 0;
}

Notice how in this case, we called our function template GetMax() without explicitly specifying the type between angle-brackets <>. The compiler automatically determines what type is needed on each call.

Because our template function includes only one template parameter (class T) and the function template itself accepts two parameters, both of this T type, we cannot call our function template with two objects of different types as arguments:

int i;
long l;
k = GetMax (i,l); 

This would not be correct, since our GetMax function template expects two arguments of the same type, and in this call to it we use objects of two different types.

We can also define function templates that accept more than one type parameter, simply by specifying more template parameters between the angle brackets. For example:

template <class T, class U>
T GetMin (T a, U b) {
  return (a<b?a:b);
}

In this case, our function template GetMin() accepts two parameters of different types and returns an object of the same type as the first parameter (T) that is passed. For example, after that declaration we could call GetMin() with:

int i,j;
long l;
i = GetMin<int,long> (j,l);

or simply:

 
i = GetMin (j,l);

even though j and l have different types, since the compiler can determine the appropriate instantiation anyway.

Class templates We also have the possibility to write class templates, so that a class can have members that use template parameters as types. For example:

template <class T>
class mypair {
    T values [2];
  public:
    mypair (T first, T second)
    {
      values[0]=first; values[1]=second;
    }
};

The class that we have just defined serves to store two elements of any valid type. For example, if we wanted to declare an object of this class to store two integer values of type int with the values 115 and 36 we would write:

mypair<int> myobject (115, 36); 

this same class would also be used to create an object to store any other type:

 
mypair<double> myfloats (3.0, 2.18); 

The only member function in the previous class template has been defined inline within the class declaration itself. In case that we define a function member outside the declaration of the class template, we must always precede that definition with the template <...> prefix:

// class templates
#include <iostream>
using namespace std;

template <class T>
class mypair {
    T a, b;
  public:
    mypair (T first, T second)
      {a=first; b=second;}
    T getmax ();
};

template <class T>
T mypair<T>::getmax ()
{
  T retval;
  retval = a>b? a : b;
  return retval;
}

int main () {
  mypair <int> myobject (100, 75);
  cout << myobject.getmax();
  return 0;
}

Notice the syntax of the definition of member function getmax:

template <class T>
T mypair<T>::getmax () 

Confused by so many T's? There are three T's in this declaration: The first one is the template parameter. The second T refers to the type returned by the function. And the third T (the one between angle brackets) is also a requirement: It specifies that this function's template parameter is also the class template parameter.

Template specialization If we want to define a different implementation for a template when a specific type is passed as template parameter, we can declare a specialization of that template.

For example, let's suppose that we have a very simple class called mycontainer that can store one element of any type and that it has just one member function called increase, which increases its value. But we find that when it stores an element of type char it would be more convenient to have a completely different implementation with a function member uppercase, so we decide to declare a class template specialization for that type:

// template specialization
#include <iostream>
using namespace std;

// class template:
template <class T>
class mycontainer {
    T element;
  public:
    mycontainer (T arg) {element=arg;}
    T increase () {return ++element;}
};

// class template specialization:
template <>
class mycontainer <char> {
    char element;
  public:
    mycontainer (char arg) {element=arg;}
    char uppercase ()
    {
      if ((element>='a')&&(element<='z'))
      element+='A'-'a';
      return element;
    }
};

int main () {
  mycontainer<int> myint (7);
  mycontainer<char> mychar ('j');
  cout << myint.increase() << endl;
  cout << mychar.uppercase() << endl;
  return 0;
}

This is the syntax used in the class template specialization:

 
template <> class mycontainer <char> { ... };

First of all, notice that we precede the class template name with an empty template<> parameter list. This is to explicitly declare it as a template specialization.

But more important than this prefix, is the <char> specialization parameter after the class template name. This specialization parameter itself identifies the type for which we are going to declare a template class specialization (char). Notice the differences between the generic class template and the specialization:

template <class T> class mycontainer { ... };
template <> class mycontainer <char> { ... };

The first line is the generic template, and the second one is the specialization.

When we declare specializations for a template class, we must also define all its members, even those exactly equal to the generic template class, because there is no "inheritance" of members from the generic template to the specialization.

Non-type parameters for templates Besides the template arguments that are preceded by the class or typename keywords , which represent types, templates can also have regular typed parameters, similar to those found in functions. As an example, have a look at this class template that is used to contain sequences of elements:

// sequence template
#include <iostream>
using namespace std;

template <class T, int N>
class mysequence {
    T memblock [N];
  public:
    void setmember (int x, T value);
    T getmember (int x);
};

template <class T, int N>
void mysequence<T,N>::setmember (int x, T value) {
  memblock[x]=value;
}

template <class T, int N>
T mysequence<T,N>::getmember (int x) {
  return memblock[x];
}

int main () {
  mysequence <int,5> myints;
  mysequence <double,5> myfloats;
  myints.setmember (0,100);
  myfloats.setmember (3,3.1416);
  cout << myints.getmember(0) << '\n';
  cout << myfloats.getmember(3) << '\n';
  return 0;
}
100
3.1416 

It is also possible to set default values or types for class template parameters. For example, if the previous class template definition had been:

template <class T=char, int N=10> class mysequence {..};

We could create objects using the default template parameters by declaring:

 
mysequence<> myseq;

Which would be equivalent to:

 
mysequence<char,10> myseq;

Advantages and Disadvantages

These actually generate different classes based on the input type. An std::vector<int> is a completely different class than an std::vector<float>. There is no support for covariance or contravariance, but there is support for passing non-types to templates, partial template specialization. They basically allow you to do whatever you want.

However, since C++ templates create different classes for every variation of their template parameters, the size of the compiled executable is larger. Beyond that, compilation time increases greatly, since all template code must be included with each compilation unit and much more code must be generated. However, actual runtime memory footprint is typically smaller than the alternative (frees an extra void*) and performance is better, since the compiler can perform more aggressive optimizations with the known type.

While a generic Java class compiles it's entire self, when using a C++ template, you only compile what you use. So, if you create an std::vector<int> and only use push_back and size, only those functions will be compiled into the object file. This eases the size of executable problem.

Comparing C# and Java Generics

Java's generics implementation was based on a project originally called Pizza, which was done by Martin Odersky and others. Pizza was renamed GJ, then it turned into a JSR and ended up being adopted into the Java language. And this particular generics proposal had as a key design goal that it could run on an unmodified VM [Virtual Machine]. It is, of course, great that you don't have to modify your VM, but it also brings about a whole bunch of odd limitations. The limitations are not necessarily directly apparent, but you very quickly go, "Hmm, that's strange."

For example, with Java generics, you don't actually get any of the execution efficiency that I talked about, because when you compile a generic class in Java, the compiler takes away the type parameter and substitutes Object everywhere. So the compiled image for List<T> is like a List where you use the type Object everywhere. Of course, if you now try to make a List<int>, you get boxing of all the ints. So there's a bunch of overhead there. Furthermore, to keep the VM happy, the compiler actually has to insert all of the type casts you didn't write. If it's a List of Object and you're trying to treat those Objects as Customers, at some point the Objects must be cast to Customers to keep the verifier happy. And really all they're doing in their implementation is automatically inserting those type casts for you. So you get the syntactic sugar, or some of it at least, but you don't get any of the execution efficiency. So that's issue number one I have with Java's solution.

Issue number two, and I think this is probably an even bigger issue, is that because Java's generics implementation relies on erasure of the type parameter, when you get to runtime, you don't actually have a faithful representation of what you had at compile time. When you apply reflection to a generic List in Java, you can't tell what the List is a List of. It's just a List. Because you've lost the type information, any type of dynamic code-generation scenario, or reflection-based scenario, simply doesn't work. If there's one trend that's pretty clear to me, it's that there's more and more of that. And it just doesn't work, because you've lost the type information. Whereas in our implementation, all of that information is available. You can use reflection to get the System.Type for object List<T>. You cannot actually create an instance of it yet, because you don't know what T is. But then you can use reflection to get the System.Type for int. You can then ask reflection to please put these two together and create a List<int>, and you get another System.Type for List<int>. So representationally, anything you can do at compile time you can also do at runtime.

Comparing C# Generics to C++ Templates

To me the best way to understand the distinction between C# generics and C++ templates is this: C# generics are really just like classes, except they have a type parameter. C++ templates are really just like macros, except they look like classes.

The big difference between C# generics and C++ templates shows up in when the type checking occurs and how the instantiation occurs. First of all, C# does the instantiation at runtime. C++ does it at compile time, or perhaps at link time. But regardless, the instantiation happens in C++ before the program runs. That's difference number one. Difference number two is C# does strong type checking when you compile the generic type. For an unconstrained type parameter, like List<T>, the only methods available on values of type T are those that are found on type Object, because those are the only methods we can generally guarantee will exist. So in C# generics, we guarantee that any operation you do on a type parameter will succeed.

C++ is the opposite. In C++, you can do anything you damn well please on a variable of a type parameter type. But then once you instantiate it, it may not work, and you'll get some cryptic error messages. For example, if you have a type parameter T, and variables x and y of type T, and you say x + y, well you had better have an operator+ defined for + of two Ts, or you'll get some cryptic error message. So in a sense, C++ templates are actually untyped, or loosely typed. Whereas C# generics are strongly typed.

Language Support

References

1. http://www.artima.com/intv/genericsP.html
2. http://www.generic-programming.org
3. Generics in the Java Programming Language
4. http://www.cplusplus.com/doc/tutorial/templates
5. http://osl.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf