CSC/ECE 517 Fall 2011/ch6 6c p: Difference between revisions
No edit summary |
No edit summary |
||
Line 189: | Line 189: | ||
Concepts have no direct representation in C++, so they are represented only within the documentation for a generic library. The documentation for the SGI Standard Template Library established the de facto standard for documenting concepts. | Concepts have no direct representation in C++, so they are represented only within the documentation for a generic library. The documentation for the SGI Standard Template Library established the de facto standard for documenting concepts. | ||
=====Specifications===== | |||
Valid Expressions are C++ expressions which must compile successfully for the objects involved in the expression to be considered models of the concept. | Valid Expressions are C++ expressions which must compile successfully for the objects involved in the expression to be considered models of the concept. | ||
Revision as of 20:35, 16 November 2011
Generics
Introduction
In a broad definition, generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters.
Why Generics?
This approach involves writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication.
Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable.
Problems with code duplication
Avoiding bulk code
Code duplication frequently creates long, repeated sections of code that differ in only a few lines or characters. The length of such routines can make it difficult to quickly understand them. Hence, code duplication is discouraged.
Purpose masking
The repetition of largely identical code sections makes it difficult to identify how they differ from one another, and therefore, identifying the specific purpose of each code section. Often, the only difference is in a parameter value. The best practice in such cases is a reusable subroutine. If the parameter type differs then it is called a generic case.
Update anomalies
Any modification to a redundant piece of code must be made for each duplicate separately which is very tedious. At worst, some locations may be missed, and for example bugs thought to be fixed may persist in duplicated locations for months or years. The best practice here is a code library.
The generics approach was popularized by Ada - programming language. Code set written using generic programming are known as generics. They are used in programming languages like Ada, Java, C++ and Visual Basic .NET. Generics create dynamically high parameterized software which can be difficult to follow.
The term generic programming was originally coined by David Musser and Alexander Stepanov.
Popular Example of Generics
Early examples of this programming approach were implemented in Ada, although the best example is the Standard Template Library (STL) in which iterators are used to access different values of an algorithm and provide them with varying datatype inputs.
The Standard Template Library (STL)
It is the C++'s Standard software library. It provides four components such algorithms, containers, functors, and iterators. We are mainly concerned with the algorithm and iterators. Here, as mentioned above algorithms are accessed with iterators where the values provided through iterators and make the algorithm work with different data types having same functionality.
The STL provides a ready-made set of common classes for C++, such as containers and associative arrays, that can be used with any built-in type and with any user-defined type that supports some elementary operations (such as copying and assignment). STL algorithms are independent of containers, which significantly reduces the complexity of the library.
Understanding Generics
Let us consider an example in STI. A sequence of data structures, e.g. a vector etc., and some algorithms to operate on them, e.g. find, sort etc., a direct approach would implement each algorithm specifically for each data structure which would be highly impossible. In the generic programming approach, each data structure returns a model of an iterator concept (a simple value type which can be dereferenced to retrieve the current value, or changed to point to another value in the sequence) and each algorithm is instead written generically with arguments of such iterators. Thus, reducing the data structure-algorithm combinations need be implemented and extensive reusability.
Iterators are of different types for e.g. forward iterators only provide movement to the next value in a sequence (e.g. suitable for a singly linked list), whereas a random-access iterator also provides direct constant-time access to any element of the sequence (e.g. suitable for a vector). An important point is that a data structure will return a model of the most general concept that can be implemented efficiently—computational complexity requirements are explicitly part of the concept definition.
Here, Arrays and Structs can be viewed as predefined generic types. Every usage of an array or struct type instantiates a new concrete type, or reuses a previous instantiated type. Array element types and struct element types are parameterized types, which are used instantiate the corresponding generic type. All this is usually built-in in the compiler and the syntax differs from other generic constructs.
Generics in Object Oriented Programming Languages
Generic programming first appeared in Ada and was subsequently adopted by many object-oriented languages. Implementation in languages such as Java and C++ are formally based on the notion of parametricity.
Generics
Define Common Block { Code statements }
Program { Call Common Block // For different data types and values. }
Example of generic programming[1]
Let us consider the memcpy() function to copy a set of array values and then generalize it using generics approach.
void* memcpy(void* region1, const void* region2, size_t n) { const char* first = (const char*)region2; const char* last = ((const char*)region2) + n; char* result = (char*)region1; while (first != last) *result++ = *first++; return result; }
The memcpy() function is used to copy arrays of different kinds of data. Now considering the case were the data we would like to copy is not in an array? Perhaps it is in a linked list. We can generalize a copy function to any sequence of elements? Looking at the body of memcpy(), the function's minimal requirements are that it needs to traverse through a sequence using pointer to access elements, write them to the destination, and compare pointers to know when to stop. The C++ standard library groups requirements such as these into concepts, in this case the Input Iterator concept (for region2) and the Output Iterator concept (for region1).
Generic approach code
template <typename InputIterator, typename OutputIterator> OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result) { while (first != last) *result++ = *first++; return result; }
Using the above generic copy() function, we can now copy elements from any kind of sequence, including a linked list that exports iterators such as std::list.
#include <list> #include <vector> #include <iostream>
int main() { const int N = 3; std::vector<int> region1(N); std::list<int> region2;
region2.push_back(1); region2.push_back(0); region2.push_back(3); std::copy(region2.begin(), region2.end(), region1.begin());
for (int i = 0; i < N; ++i) std::cout << region1[i] << " "; std::cout << std::endl; }
1) Generics in ADA
In a generic a parameter is a value, a variable, a constant, a type, a subprogram, or even an instance of another, designated, generic unit that is used in the generic routine. For generic formal types, the syntax distinguishes between discrete, floating-point, fixed-point, access (pointer) types, etc. Some formal parameters can have default values.
To instantiate a generic unit, the programmer passes actual parameters for each formal. The generic instance then behaves just like any other unit. It is possible to instantiate generic units at run-time, for example inside a loop.
One of the ways in which the Ada language supports this characteristic is by means of generic units.
The generic here uses a package which contains a general operation to be performed.
The specification of a generic package[2]
generic type Element_T is private; -- Generic formal type parameter procedure Swap (X, Y : in out Element_T);
procedure Swap (X, Y : in out Element_T) is Temporary : constant Element_T := X; begin X := Y; Y := Temporary; end Swap;
The Swap subprogram is said to be generic. The subprogram specification is preceded by the generic formal part consisting of the reserved word generic followed by a list of generic formal parameters which may be empty. The entities declared as generic are not directly usable, it is necessary to instantiate them.
To use the generic we instantiate by
procedure Instance_Swap is new Swap (Float); procedure Instance_Swap is new Swap (Integer); procedure Instance_Swap is new Swap (Element_T => Stack_T);
Here, the parameters passed have different data types making the generic widely usable.
A generic using a sub program
It is possible to pass a subprogram as a parameter to a generic. The generic specifies a generic formal subprogram, complete with parameter list and return type (if the subprogram is a function). The actual must match this parameter profile. It is not necessary that the names of parameters match, though.
generic type Element_T is private; with function "*" (X, Y: Element_T) return Element_T; function Square (X : Element_T) return Element_T;
Function description
function Square (X: Element_T) return Element_T is begin return X * X; -- The formal operator "*". end Square;
Usage of the generic
with Square; with Matrices; procedure Matrix_Example is function Square_Matrix is new Square (Element_T => Matrices.Matrix_T, "*" => Matrices.Product); A : Matrices.Matrix_T := Matrices.Identity; begin A := Square_Matrix (A); end Matrix_Example;
To instantiate a generic unit, use the keyword new
function Square_Matrix is new Square (Element_T => Matrices.Matrix_T, "*" => Matrices.Product);
Advantages
The language syntax allows precise specification of constraints on generic formal parameters. For example, it is possible to specify that a generic formal type will only accept a modular type as the actual. It is also possible to express constraints between generic formal parameters; for example:
generic type Index_Type is (<>); -- must be a discrete type type Element_Type is private; -- can be any nonlimited type type Array_Type is array (Index_Type range <>) of Element_Type;
In this example, Array_Type is constrained by both Index_Type and Element_Type. When instantiating the unit, the programmer must pass an actual array type that satisfies these constraints.
All instances of a generic being exactly the same, it is easier to review and understand programs written by others; there are no "special cases" to take into account.
Limitations
Ada does not allow specialized generic instances, unlike C++ and requires that all generics be instantiated explicitly but all instantiations being explicit, there are no hidden instantiations that might make it difficult to understand the program.
Ada does not permit "template metaprogramming", because it does not allow specializations.
2) Generics in C++[4]
When creating containers of objects it is possible to write specific implementations for each data type contained, even if the code is virtually identical except for different data types. In C++, this duplication of code can be circumvented by defining a template class. C++ has can support Generic Programming very well through this template system.
Concepts have no direct representation in C++, so they are represented only within the documentation for a generic library. The documentation for the SGI Standard Template Library established the de facto standard for documenting concepts.
Specifications
Valid Expressions are C++ expressions which must compile successfully for the objects involved in the expression to be considered models of the concept.
Associated Types are types that are related to the modeling type in that they participate in one or more of the valid expressions. Typically associated types can be accessed either through typedefs nested within a class definition for the modeling type, or they are accessed through a traits class.
Invariants are run-time characteristics of the objects that must always be true, that is, the functions involving the objects must preserve these characteristics. The invariants often take the form of pre-conditions and post-conditions.
Complexity Guarantees are maximum limits on how long the execution of one of the valid expressions will take, or how much of various resources its computation will use.
Tem plate<typename T> class swap (T x, T y) { T temp; temp =y; y=x; x=temp; return x }
Instantiation is done by calling as an ordinary function
Swap_x << swap(5, 10); // outputs 10
The compiler examines the arguments used to call swap and determines that this is a call to swap(int, int), it then instantiates a version of the function where the parameterizing type T is int.
This works whether the arguments x and y are integers, strings, or any other type for which the expression x interchanges y is sensible. Common inheritance is not needed for the set of types that can be used, and so it is very similar to duck typing. In the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining < allows a type to be used with the standard sort(), stable_sort(), and binary_search() algorithms or to be put inside data structures such as sets, heaps, and associative arrays. C++ templates are completely type safe at compile time.
A powerful feature of C++'s templates is template specialization. This allows alternative implementations to be provided based on certain characteristics of the parameterized type that is being instantiated. Template specialization has two purposes: to allow certain forms of optimization, and to reduce code bloat.
Advantages
Both macros and templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.
#define max(a,b) ((a) < (b) ? (b) : (a))
However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe.
Limitations
Here are three primary drawbacks to the use of templates: compiler support, poor error messages, and code bloat. Many compilers historically have poor support for templates, thus the use of templates can make code somewhat less portable. Support may also be poor when a C++ compiler is being used with a linker which is not C++-aware, or when attempting to use templates across shared library boundaries. Most modern compilers however now have fairly robust and standard template support, and the new C++ standard, C++0x, is expected to further address these issues.
Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates.[10] This can make templates difficult to develop.
Finally, the use of templates requires the compiler to generate a separate instance of the templated class or function for every permutation of type parameters used with it. (This is necessary because types in C++ are not all the same size, and the sizes of data fields are important to how classes work.) So the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables.[citation needed] However, judicious use of template specialization can dramatically reduce such code bloat in some cases. The extra instantiations generated by templates can also cause debuggers to have difficulty working gracefully with templates.
3) Generics in JAVA
Generics are a facility of generic programming that was added to the Java programming language in 2004 as part of J2SE 5.0. They allow "a type or method to operate on objects of various types while providing compile-time type safety. “A common use of this feature is when using a Java Collection that can hold objects of any type, to specify the specific type of object stored in it.
Specifications
A type variable is an unqualified identifier. Type variables are introduced by generic class declarations, generic interface declarations, generic method declarations, and by generic constructor declarations.
A class is generic if it declares one or more type variables. A generic class declaration defines a set of parameterized types, one for each possible invocation of the type parameter section. All of these parameterized types share the same class at runtime.
An interface is generic if it declares one or more type variables. A generic interface declaration defines a set of types, one for each possible invocation of the type parameter section. All parameterized types share the same interface at runtime.
A method is generic if it declares one or more type variables. These type variables are known as the formal type parameters of the method. The form of the formal type parameter list is identical to a type parameter list of a class or interface.
A constructor can be declared as generic, independently of whether the class of the constructor is declared in is itself generic. A constructor is generic if it declares one or more type variables.
Example of generic classes and methods in JAVA[3]
public class Entry<K, V> { private final K mKey; private final V mValue; public Entry(K k,V v) { mKey = k; mValue = v; } public K getKey() { return mKey; } public V getValue() { return mValue; } public String toString() { return "(" + mKey + ", " + mValue + ")"; } }
Note that the above class is for illustration of Java generics only.
This generic class can be used in the following way:
Entry<String, String> grade440 = new Entry<String, String>("mike", "A"); Entry<String, Integer> marks440 = new Entry<String, Integer>("mike", 100); System.out.println("grade: " + grade440); System.out.println("marks: " + marks440);
A generic method using the generic class above:
public static <T> Entry<T,T> twice(T value) { return new SimpleImmutableEntry<T,T>(value, value); }
In many cases the user of the method need not indicate the type parameters, as they can be inferred:
Entry<String, String> pair = twice("Hello");
The parameters can be explicitly added if needed:
Entry<String, String> pair = this.<String>twice("Hello");
Note that you cannot use native types, ex:
Entry<int, int> pair; // this fails. You have to use Integer instead.
There is also the possibility to create generic methods based on given parameters.
public <T> T[] toArray(T... elements) { return elements; }
In such cases you can't use native types either, ex:
Integer[] array = toArray(1,2,3,4,5);