CSC/ECE 517 Fall 2011/ch6 6c sm

From Expertiza_Wiki
Revision as of 06:06, 26 November 2011 by Yxue4 (talk | contribs)
Jump to navigation Jump to search

Generics

Introduction

Generic programming is a method of programming that has been defined from many different angles to solve a fundamental problem, writing general purpose algorithms that will work with a variety of system or user defined types of variables. The ultimate goal of Generic programming is to avoid code duplication and provide compile-time type checking, thereby providing run-time type safety.

The idea was first brought by Ada in 1983 to reduce code duplication. Then later after, more and more languages joined the team. Gnenerics is used in Java, C#, F#, and Visual Basic .NET, in C++, we can achive generic programming by using templates.

For example, the concept of a “linked list” is pervasive throughout software design, with each node holding an object that may range from a simple int to a complex structure or an instantiated object. In each case, the underlying “list” needs to know how to handle the object at its node. Consider the following C++ class:

class Point {
   
  public:
  Point();
  Point(int x, int y, int z);
  void setCoord(int x, int y, int z);
  // Other functions
  
  private:
    int x;
    int y;
    int z;
};

Perhaps this “point” representation is the basis of a larger project that will use thousands of “points” to plot some meaningful data or provide the basis of more complex analysis. In order to do this a list of points can be created by first creating the concept of a list node::

class PointNode  {
    
    public:
    PointNode();
    void setPoint(int x, int y, int z);
    // Other functions
    
    private:
      Point *_next;
};

And a list wrapper class to manage them all:

class PointList  {
    
    public:
    PointList();
    void add(Point &a);
    void remove(Point &a);
    int pointTotal();
    // Other functions necessary
    private:
      PointNode *head;
      int count;
};

Now suppose that in addition to “point” data with three dimensions, there is another class of data dealing with rectangular areas that can be arranged in a list and processed. Consider the new data type:

class Rectangle  {
    
    public:
    RectangularArea(Point p1, Point p2, Point p3);
    // Other accessors, etc...
    
    private:
      Point p1, p2, p3;
};

Since Rectangle is a different type, the Point list will not accept Rectangle objects. The above list code could be copied and modified to deal with the concept of this new “Node Type” to provide the list functionality, which would allow us to accomplish our goal in the short run, but this leads to the following problem:

For each basic algorithmic function (sort, iteration, etc) A and each datatype (int, bool, user-defined types) T, there are a total of A*T implementations that need to be written.

Generic Programming is a method of programming that reuses the implementation of “List” and other concepts and abstracts away the notion of “type”.

Consider the following two lists that take advantage of the C++ STL:

list<Point> plist;
list<Rectangle> rlist;

That’s it! The above two lines each instantiated a “list” of the appropriate type. Given this flexibility, only A + T implementations need to be written, one implementation for the underlying algorithm (list in this case) and one for the underlying data type.

Generics/Templates in Java, C# and C++

C#

In C#, the following can be declared:

List<Point> plist = new List<Point>();

The compiler will prevent anything that is not a “Point” from being added to this list. The C# compiler uses JIT compilation to build the specific code needed for this list, which is similar to writing a special “behind the scenes” list class to handle Points. The benefit of this is that it increases the execution speed, there is no behind the scenes casting, and Reflectioncan be used to tell that this list contains only “Point” objects.

Java

In Java, the following can be declared:

ArrayList<Point> plist = new ArrayList<Point>();

The declaration looks the same, but the underlying implementation is different. The compiler will still prevent any non-Point objects from being added to the list, but instead of the compiler generating a special class for Points like in C#, Java uses the ArrayList to build the list and performs “Type Erasure” (The compiler will insert the proper casts in the compiled byte-code and “throw away” the type of the argument, treating them all as type Object). Basically, the Java compiler creates “bridge methods” that will perform casts similar to this -- Point p1 = (Point) plist.get(1); Since the casting is still being done, the performance hits are still there. The benefit is that existing code that knows how to manipulate ArrayList can manipulate the list of “Points”. One drawback is that (beside the performance penalty of casting) is that the type being erased prevents Reflection, the code can’t tell that the ArrayList contains “Points”, only that it is an ArrayList.

C++

In C++, the following can be declared:

list<Point> *plist = new list<Point>();

Once again, the syntax is similar to both Java and C#, but the implementation is very different. Both C# and Java produce code that is consumed by a Virtual Machine (VM), which does the translation to specific hardware instructions. C++, however, produces actual binary code (ie. actual x86 instructions). Everything in C++ is not an object and there is no underlying VM that needs to know about the “Point” class. Because of this, C++ does not restrict what can be done with templates. For example, in C# and Java, the compiler needs to know what methods are available for a class so that it can communicate this to the VM, which is accomplished through the implementation of interfaces. For example:

void averageXcoord<T>( T one, T two ) { return ( T.getX() + T.getX() ) / 2; }

This code will not compile in Java or C# because the compiler can’t tell which type T actually has the getX() method. In order to get it to compile, an interface has to be created, implemented and then declared in the function declaration. C++ takes a different path, if both objects during compile time have a getX() method defined, then the code will compile, if they don’t then it will throw a compile-time error.


Advantage of Using Generics

There are several important reasons why we should use generics. The most important reason is that the type safety is easily achived by using Generics. Programmers don't need to do multiple implementations and overiding all the mothods. All they need to do is createing a generic type or method and specify it when they call it. This also significantly reduces the duplication, and improves the quality of the code. Generics can also improve the perfoemance of the code. Since you don't need to box the value types, it is usually faster to manipulate the data types in generic programming. There are also some limitation of using generics, but there is hardly any actual disavantage.


Language Support

Many languages support some implementation of Generic Programming. The following list is not intended to be exhaustive:

References