CSC/ECE 517 Fall 2010/ch2 S24 rm

From Expertiza_Wiki
Revision as of 18:19, 22 September 2010 by Whiteopal

"I'd rather write programs that write programs than write programs" - Richard Sites


Introduction

Most metaprogramming is done in dynamic languages, such as Ruby. Achieving metaprogramming directly in statically typed languages is more complex, because these languages verify abstractions at compile time. However, tools and packages exist that support metaprogramming in statically typed languages such as Java, and they can be leveraged to achieve metaprogramming features.


Metaprogramming

Metaprogramming is the technique of writing computer programs that write or manipulate other programs, or themselves. In other words, it means writing programs at a higher level of abstraction, in a way that resembles generative programming.

Metaprogramming involves two kinds of languages. The meta-language is the language in which metaprograms, which construct or manipulate other programs, are written. The object-language is the language of the programs being manipulated. Metaprograms are therefore ‘meta-level programs’ whose problem domain is other ‘base-level programs’.[1]

The ability of a programming language to be its own metalanguage is called reflection or reflexivity.

A simple example of a metaprogram: consider a deliberately contrived example that illustrates the idea at a very high level. Suppose we need to write a C program that prints the following 500 lines of text, with the restriction that the program may not use any kind of loop or goto statement.

Output expected:

 1 Mississippi
 2 Mississippi
 3 Mississippi
 4 Mississippi
 ...
 499 Mississippi
 500 Mississippi

In C, this would then be coded as:

 #include <stdio.h>

 int main(void) {
     printf("1 Mississippi\n");
     printf("2 Mississippi\n");
     /* ... one printf call per line for lines 3 through 498 ... */
     printf("499 Mississippi\n");
     printf("500 Mississippi\n");
     return 0;
 }


With the power of a metaprogramming language we can write another program that writes this program automatically.

Ruby code:

 File.open('mississippi.c', 'w') do |output|
   output.puts '#include <stdio.h>'
   output.puts 'int main(void) {'
   1.upto(500) do |i|
     output.puts "    printf(\"#{i} Mississippi\\n\");"
   end
   output.puts '    return 0;'
   output.puts '}'
 end

This code creates a file called mississippi.c containing the expected 500+ lines of C source code. Here, mississippi.c is the generated code and the Ruby code is the metaprogram.


Applications of Metaprogramming

Metaprogramming is an attractive technique when one needs to alter the behavior of a program at run time. Because of its generative nature, it has numerous applications in program development. It lets developers avoid rewriting boilerplate code over and over, which improves efficiency, increases modularity, and minimizes errors caused by inconsistent implementations. Program generators and program analyzers are the two main categories of metaprograms; compilers, interpreters, and type checkers are all examples of metaprograms. Some common applications use a program that outputs source code to:

  • generate sine, cosine, or other lookup tables
  • extract a source-form representation of a binary file
  • compile bitmaps into fast display routines
  • extract documentation, initialization/finalization code, and description tables, as well as normal code, from the same source files
  • produce customized assembly code, generated from a Perl, shell, or Scheme script that does arbitrary processing
  • propagate data defined at a single point into several cross-referencing tables and code chunks.
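As a sketch of the first item in this list, here is a small Java program that writes C source code for a sine lookup table, in the same spirit as the Ruby generator above. The class name SineTableGenerator, the C array name sine_table, and the output layout are hypothetical choices made for illustration only.

```java
import java.util.Locale;

// Hypothetical generator: a Java metaprogram that writes C source code for a
// sine lookup table, in the spirit of the Ruby example earlier.
class SineTableGenerator {

    // Returns C source declaring 'size' sine values sampled over one full
    // period [0, 2*pi). The table name 'sine_table' is illustrative only.
    static String generate(int size) {
        StringBuilder out = new StringBuilder();
        out.append("/* Generated file: do not edit by hand. */\n");
        out.append("static const double sine_table[").append(size).append("] = {\n");
        for (int i = 0; i < size; i++) {
            double angle = 2.0 * Math.PI * i / size;
            // Locale.ROOT keeps '.' as the decimal separator, as C requires.
            out.append(String.format(Locale.ROOT, "    %.6f,\n", Math.sin(angle)));
        }
        out.append("};\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the generated C code; redirect it to a .c file to use it.
        System.out.print(generate(8));
    }
}
```

The generator does its floating-point work once, at generation time, so the C program only has to index the table.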

In many cases, this allows programmers to get more done in the same amount of time as they would take to write all the code manually, or it gives programs greater flexibility to efficiently handle new situations without recompilation.[1]

Typing in Programming Languages

Early programming languages (e.g. assembly) were designed so that each machine-level operation was reflected directly in the program code. As programming languages advanced, they reached a level of abstraction in which lower-level details were grouped into functional units of work and expressed in fewer lines of code; for example, primitive values came to be represented by higher-level abstractions such as classes. This abstraction created a need to check which operations are valid on the abstractions in place.

Typing in a programming language is a property of its operations and variables that ensures certain kinds of invalid values are not used in operations with each other. Errors of this kind are known as type errors. Type checking is the process of verifying and enforcing the constraints of types. Type checking at compile time is known as static type checking; type checking at run time is known as dynamic type checking. If a language specification enforces its typing rules strongly (i.e., more or less allowing only those automatic type conversions that do not lose information), the language can be described as strongly typed; if not, as weakly typed.[1]
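To make the distinction concrete, the minimal sketch below shows both kinds of checking in Java, a statically typed language that still performs some checks at run time. The class TypeCheckDemo and its method describeCast are hypothetical names used only for this illustration.

```java
// Sketch: Java checks most types at compile time (statically) but still
// performs some checks at run time (dynamically), e.g. on downcasts.
class TypeCheckDemo {

    static String describeCast(Object value) {
        // Static check: the compiler rejects the next line outright, so it is
        // left commented out.
        // int n = "hello";   // error: incompatible types

        try {
            // This compiles, because an Object might be an Integer.
            // Whether it really is one can only be checked at run time.
            Integer n = (Integer) value;
            return "Integer " + n;
        } catch (ClassCastException e) {
            return "run-time type error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeCast(42));      // Integer 42
        System.out.println(describeCast("hello")); // run-time type error
    }
}
```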


Statically Typed Programming Languages

In statically typed languages, every variable and parameter has a fixed type, typically declared by the programmer. Thus, the type of every expression can be deduced and type checked during compilation. Static languages try to catch most errors at compile time and strive to minimize failures at run time. As a result, they impose many type constraints on the programmer while coding. At run time, the program uses the classes it was compiled with, so statically typed languages make a clear distinction between what happens at compile time and what happens at run time. Examples of statically typed languages are C, C++, Java, and C#.

Dynamically Typed Programming Languages

In dynamically typed languages, variables and parameters do not have a designated type and may hold values of different types at different times. In every operation, the operands must be type checked at run time, just before the operation is performed. Dynamically typed languages do not need to distinguish between classes created at compile time and classes provided at run time: it is possible to define classes at run time, and in fact classes are always defined at run time. These languages eliminate many developer constraints by avoiding the need for bookkeeping, declarations, and so on. This flexibility makes them ideal candidates for prototyping, and they are widely used in agile development environments. However, dynamic languages are known to have performance issues: static languages can optimize code at compile time, whereas dynamic languages allow only run-time code optimizations. [2] In dynamically typed languages, the interpreter deduces types and type conversions; this makes development faster, but it can also provoke run-time failures that, in statically typed languages, would have been caught at compile time. Examples of dynamically typed languages include Perl, Python, JavaScript, PHP, Ruby, and Groovy.

Metaprogramming in statically typed languages

In statically typed ("safe") languages, which tend to be syntactically verbose, metaprogramming is not a standard feature, but it can still be achieved. Static typing in metaprograms also has a number of advantages. In addition to guaranteeing that the metaprogram encounters no type errors while manipulating object-programs, a statically typed metaprogramming language can also guarantee that any object-programs generated by the metaprogram are type-correct. A disadvantage of these type systems is that (in the case of metaprogramming languages with weaker type systems) they may sometimes be too restrictive about which object-programs the programmer is allowed to construct.


Techniques and Packages

Many language features can be leveraged to provide some of the characteristics needed for metaprogramming. For instance, languages that support reflection may also allow dynamic code generation: in the Microsoft .NET Framework, the System.Reflection.Emit namespace is used to generate types and methods at run time. [3]
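Java has a different mechanism in its standard library that also produces a class at run time: java.lang.reflect.Proxy, which generates an implementation of one or more interfaces and routes every call to a handler. The sketch below is a swapped-in Java analogue, not a port of Reflection.Emit; the Person interface, ProxyDemo class, and the getX/setX-to-map convention are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: no class implements it in the source code.
interface Person {
    String getName();
    void setName(String name);
}

class ProxyDemo {

    // Asks the JVM to generate, at run time, a class implementing 'type'.
    // Every call goes to the handler, which treats getX/setX methods as
    // reads and writes of a map entry "X".
    static <T> T mapBacked(Class<T> type) {
        Map<String, Object> values = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.startsWith("set") && args != null && args.length == 1) {
                values.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("get")) {
                return values.get(name.substring(3));
            }
            // Other methods (including toString) are not supported here.
            throw new UnsupportedOperationException(name);
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        Person p = mapBacked(Person.class);
        p.setName("Ada");
        System.out.println(p.getName()); // Ada
        // The generated class name is chosen by the JVM.
        System.out.println(p.getClass().getName());
    }
}
```

A proxy can only implement interfaces; generating arbitrary classes at run time in Java usually requires a bytecode library such as Javassist, discussed below.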

Reflection

Reflection is a valuable language feature for facilitating metaprogramming. It is defined as the ability of a programming language to be its own meta-language, so reflective programs can manipulate other programs or themselves. In Java, for example, reflection makes it possible to discover information about loaded classes, such as their:

  • Fields
  • Methods and constructors
  • Generics information
  • Metadata annotations

It also makes it possible to apply these metaobjects to instances in the run-time environment, e.g. with Method.invoke(Object o, Object... args). With the Java reflection API, you can interrogate an object to get all sorts of information about its class.
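The following minimal sketch uses the standard java.lang.reflect API to list the members a class declares and then to call a method chosen by name through Method.invoke. The Greeter and ReflectionDemo classes are hypothetical examples.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class whose structure is inspected at run time.
class Greeter {
    private String greeting = "Hello";

    public String greet(String name) {
        return greeting + ", " + name;
    }
}

class ReflectionDemo {

    // Discovers the fields and methods that a class declares.
    static List<String> describe(Class<?> cls) {
        List<String> members = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            members.add("field " + f.getName());
        }
        for (Method m : cls.getDeclaredMethods()) {
            members.add("method " + m.getName());
        }
        return members;
    }

    // Looks up a one-String-argument method by name and calls it through
    // Method.invoke; reflective failures are rethrown unchecked for brevity.
    static Object callByName(Object target, String methodName, String arg) {
        try {
            Method m = target.getClass().getMethod(methodName, String.class);
            return m.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The reflection API does not guarantee the order of members.
        System.out.println(describe(Greeter.class));
        System.out.println(callByName(new Greeter(), "greet", "World")); // Hello, World
    }
}
```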

Consider the following simple example:

 public class HelloWorld {
   public void printName() {
     System.out.println(this.getClass().getName());
   }
 }

The line

 (new HelloWorld()).printName();

sends the string HelloWorld to standard out. Now let x be an instance of HelloWorld or one of its subclasses. The line

 x.printName();

sends the string naming the class to standard out.

The printName method examines the object for its class (this.getClass()). In doing so, the decision of what to print is delegated to the object's class. The method acts on this decision by printing the returned name. Without being overridden, printName behaves differently for each subclass than it does for HelloWorld: it adapts to the class that inherits it. This is a simple example of attaining flexibility through reflection. [15]
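A runnable variant of the example above, with a subclass that inherits printName without overriding it. One change from the original is an assumption for testability: printName also returns the name it prints. The subclass name HelloWorldSubclass is hypothetical, and the printed names are the simple class names only when the classes are compiled in the default package.

```java
// Same idea as HelloWorld above, but printName also returns the name so the
// behavior is easy to check.
class HelloWorld {
    public String printName() {
        String name = this.getClass().getName();
        System.out.println(name);
        return name;
    }
}

// Hypothetical subclass: inherits printName without overriding it.
class HelloWorldSubclass extends HelloWorld {
}

class PrintNameDemo {
    public static void main(String[] args) {
        new HelloWorld().printName();         // prints HelloWorld
        new HelloWorldSubclass().printName(); // prints HelloWorldSubclass
    }
}
```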

Annotations

Annotations are a metaprogramming facility that allows code to be marked with defined tags. Many APIs require a fair amount of boilerplate code, i.e. code that can be reused in new contexts or applications without being changed much from the original. This boilerplate could be generated automatically by a tool if the program were “decorated” with annotations, for example annotations indicating which methods are remotely accessible. Metadata provided through annotations is beneficial for documentation, compiler checking, and code analysis. One can use this metadata to indicate whether methods depend on other methods, whether they are incomplete, whether a certain class must reference another class, and so on. The compiler also uses annotations to perform some basic compile-time checks. For example, the @Override annotation specifies that a method overrides a method from a superclass; the Java compiler then verifies that the method really does override one, and reports an error if it does not.

An “annotation” has an “annotation type” associated with it which is used for defining it. It is used when you want to create a custom annotation. The type is the actual construct used, and the annotation is the specific usage of that type. An annotation type definition takes an "at" (@) sign, followed by the interface keyword plus the annotation name. On the other hand, an annotation takes the form of an "at" sign (@), followed by the annotation type [4].

Example to Define an Annotation (Annotation type)

 public @interface MyAnnotation {
   String doSomething();
 }

Example to Annotate Your Code (Annotation)

 @MyAnnotation(doSomething="What to do")
 public void mymethod() {
  ....
 }
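Because annotations are metadata, tools can read them back. The sketch below reads MyAnnotation reflectively at run time. Two meta-annotations are additions not present in the definition above: @Retention(RetentionPolicy.RUNTIME), without which the annotation would not be visible to reflection (the default retention is CLASS), and @Target(ElementType.METHOD). The AnnotatedService and AnnotationReader classes are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Same annotation type as above, plus RUNTIME retention (so reflection can
// see it) and a METHOD target (so it can only be placed on methods).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MyAnnotation {
    String doSomething();
}

// Hypothetical class using the annotation.
class AnnotatedService {
    @MyAnnotation(doSomething = "What to do")
    public void mymethod() {
    }

    public void unannotatedMethod() {
    }
}

class AnnotationReader {
    // Maps each annotated method's name to its doSomething value.
    static Map<String, String> instructions(Class<?> cls) {
        Map<String, String> result = new TreeMap<>();
        for (Method m : cls.getDeclaredMethods()) {
            MyAnnotation a = m.getAnnotation(MyAnnotation.class);
            if (a != null) {
                result.put(m.getName(), a.doSomething());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(instructions(AnnotatedService.class)); // {mymethod=What to do}
    }
}
```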

Annotation Types

There are three annotation types:

  • Marker: Marker type annotations have no elements, except the annotation name itself.

Example:

 public @interface MyAnnotation {
 } 

Usage:

 @MyAnnotation
 public void mymethod() {
  ....
 }
  • Single-Element: Single-element, or single-value, annotations provide a single piece of data only. This can be written as a name=value pair or, with the shortcut syntax, as just the value within parentheses; the shortcut works only when the element is named value.

Example:

 public @interface MyAnnotation
 {
   String value();
 }

Usage:

 @MyAnnotation ("What to do")
 public void mymethod() {
  ....
 } 
  • Full-value or multi-value: Full-value type annotations have multiple data members. Therefore, you must use a full data=value parameter syntax for each member.

Example:

 public @interface MyAnnotation {
   String doSomething();
   int count();
   String date();
 }

Usage:

 @MyAnnotation(doSomething="What to do", count=1,
               date="09-09-2005")
 public void mymethod() {
  ....
 }
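The three kinds can be compared side by side in one sketch that reads them back with reflection. The annotation names Audited, Note, and Task, the default value on count, and the AnnotationKinds class are hypothetical; all three annotations are given RUNTIME retention so reflection can see them.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)
@interface Audited { }                   // marker: no elements

@Retention(RetentionPolicy.RUNTIME)
@interface Note { String value(); }      // single-element: named value

@Retention(RetentionPolicy.RUNTIME)
@interface Task {                        // full-value: several elements
    String doSomething();
    int count() default 1;               // a default makes the element optional
    String date();
}

class AnnotationKinds {
    @Audited
    @Note("What to do")                  // shortcut syntax for value="What to do"
    @Task(doSomething = "What to do", date = "09-09-2005")
    public void mymethod() {
    }

    // Summarizes the annotations found on the named method of this class.
    static String summarize(String methodName) {
        for (Method m : AnnotationKinds.class.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            boolean audited = m.isAnnotationPresent(Audited.class);
            Note note = m.getAnnotation(Note.class);
            Task task = m.getAnnotation(Task.class);
            return "audited=" + audited
                    + ", note=" + (note == null ? null : note.value())
                    + ", task=" + (task == null ? null
                            : task.doSomething() + "/" + task.count() + "/" + task.date());
        }
        throw new IllegalArgumentException("no method " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(summarize("mymethod"));
    }
}
```

A marker annotation carries information only by being present, which is why isAnnotationPresent is enough to check it.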

Generics

Generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters. [5] Java Generics are primarily a way for library authors to write something once, which users can customize to their own types. Generics allow the creation of classes and methods that work in the same way on different types of objects. The term "generic" comes from the idea that we'd like to be able to write general algorithms that can be broadly reused for many types of objects rather than having to adapt our code to fit each circumstance.

Generics add a way to specify concrete types for general-purpose classes and methods that previously operated on Object. A Java collection is a flexible data structure that can hold heterogeneous objects whose elements may have any reference type. Without generics, when you take an element out of a collection, you must cast it to the type of element stored in the collection. Generics provide a way to communicate the type of a collection to the compiler, so that it can be checked. Once the compiler knows the element type of the collection, it can check that the collection has been used consistently and can insert the correct casts on values taken out of the collection.[5] Generics are implemented by type erasure: generic type information is present only at compile time, after which the compiler erases it. The main advantage of this approach is that it provides full interoperability between generic code and legacy code that uses non-parameterized types (technically known as raw types).
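Type erasure can be observed directly. In the sketch below (the class ErasureDemo is hypothetical), two differently parameterized lists share a single run-time class, and a raw reference lets an Integer into a List&lt;String&gt;; the problem surfaces only at the cast the compiler inserted when the element is read back.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    // After erasure, List<String> and List<Integer> share one run-time class.
    static boolean sameRuntimeClass() {
        Class<?> a = new ArrayList<String>().getClass();
        Class<?> b = new ArrayList<Integer>().getClass();
        return a == b;
    }

    // Raw types interoperate with generic code, so nothing at run time stops
    // an Integer from entering a List<String> through a raw reference.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String pollute() {
        List<String> typed = new ArrayList<>();
        List raw = typed;   // generic to raw: allowed, for legacy interop
        raw.add(42);        // unchecked call: accepted at run time
        try {
            String s = typed.get(0); // compiler-inserted cast to String fails
            return s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
        System.out.println(pollute());          // ClassCastException
    }
}
```

This is the price of the interoperability described above: the compiler warns about unchecked raw-type use, but the run time cannot check generic types it no longer knows about.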

Consider a non-generic example:

 // Removes 4-letter words from c. Elements must be strings
 static void expurgate(Collection c) {
     for (Iterator i = c.iterator(); i.hasNext(); ) {
         if (((String) i.next()).length() == 4) {
             i.remove();
         }
     }
 }

Here is the same example modified to use generics:

 // Removes the 4-letter words from c
 static void expurgate(Collection<String> c) {
     for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
         if (i.next().length() == 4) {
             i.remove();
         }
     }
 }

The declaration above reads as “Collection of String c.” Collection is a generic interface that takes String as its type parameter. The code using generics is clearer and safer: the unsafe cast and a number of extra parentheses have been eliminated. More importantly, part of the specification of the method has moved from a comment to its signature, so the compiler can verify at compile time that the type constraints are not violated at run time.[6]
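For completeness, here is the generic version wrapped in a hypothetical class Expurgate with the imports it needs, plus a small usage example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class Expurgate {

    // Generic version from the text: removes the 4-letter words from c.
    static void expurgate(Collection<String> c) {
        for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
            if (i.next().length() == 4) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(
                Arrays.asList("this", "is", "a", "test", "of", "words"));
        expurgate(words);
        System.out.println(words); // [is, a, of, words]
        // expurgate(new ArrayList<Integer>()); // rejected at compile time
    }
}
```

Since Java 8, the same effect can also be obtained with c.removeIf(s -> s.length() == 4); the explicit iterator is kept here to match the text.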


Template Metaprogramming

Template Metaprogramming is a generic programming technique that uses extremely early binding. A primary requirement for a metaprogramming language is providing high-level abstractions that hide the internal representation of base programs. Each template language is specific to a base language and is generated from it; in this sense, a language of templates is a superset of the base language. Thus, templates are abstractions that encapsulate a program pattern written by example. This concept is explained in more detail in the C++ Template Metaprogramming section.

Packages

MetaJ

MetaJ is a package that supports metaprogramming in the Java language. A MetaJ program is a Java program that uses MetaJ components, and accessing code patterns by example from inside ordinary Java programs is a major feature of MetaJ programming. MetaJ combines templates and reflection for Java metaprogramming. It embodies a set of concepts that are independent of the base language: syntax trees, code references, code iterators, and code templates. Its framework supports this independence by isolating the features common to most languages, defining generic operations for them, and allowing language-dependent components to be plugged in. Templates are translated to Java classes so that they can be accessed from the metaprogram; a template consists of a mixture of plain Java code, which is copied verbatim into the output file, and special metaprogramming declarations that control how the output code is created. MetaJ also provides a generic, protocol-based, self-applicative interpreter for Java. For debugging purposes, a metaobject protocol (MOP) may need to provide access to the execution stack. However, because of security concerns, stack access must frequently be restricted: in Java, for example, the (untyped) stack may not be modified, because security properties essentially rely on type information. [1 - MetaJ] Separately, Java SE 7 added the invokedynamic bytecode instruction (JSR 292), which improves support for dynamically typed languages running on the JVM.

Javassist (Java Programming Assistant)

Javassist is a Java library providing means to manipulate the Java bytecode of an application. In this sense Javassist provides the support for structural reflection, i.e. the ability to change the implementation of a class at runtime. This is explicit metaprogramming, in which the metalanguage is Java. Javassist is a load-time reflective system for Java. It enables Java programs to define a new class at runtime and to modify a class file when the JVM loads it. Unlike other similar bytecode editors, Javassist provides two levels of API: source level and bytecode level.

If users use the source-level API, they can edit a class file without knowing the specification of Java bytecode. They do not even have to write the inserted bytecode sequence; instead, they can specify the inserted code as source text (for example, just a single statement), and Javassist compiles it on the fly. This ease of use distinguishes Javassist from other tools. On the other hand, the bytecode-level API allows users to edit a class file directly, as other editors do. Together, the two levels make Java bytecode manipulation simple.

Javassist has the following applications:

  • Aspect-oriented programming: Javassist can be a good tool for introducing new methods into a class and for inserting before/after/around advice on both the caller and callee sides.
  • Reflection: One application of Javassist is run-time reflection; Javassist enables Java programs to use a metaobject that controls method calls on base-level objects. No specialized compiler or virtual machine is needed.
  • Remote method invocation: Another application is remote method invocation. Javassist enables applets to call a method on a remote object running on the web server. Unlike with Java RMI, the programmer does not need a stub compiler such as rmic; the stub code is produced dynamically by Javassist.

Example: reading a class file with the bytecode-level API:

 BufferedInputStream fin = new BufferedInputStream(new FileInputStream("Point.class"));
 ClassFile cf = new ClassFile(new DataInputStream(fin));

A ClassFile object can be written back to a class file. write() in ClassFile writes the contents of the class file to a given DataOutputStream. ClassFile provides addField() and addMethod() for adding a field or a method (note that a constructor is regarded as a method at the bytecode level). It also provides addAttribute() for adding an attribute to the class file.

To examine every bytecode instruction in a method body, CodeIterator is useful. A CodeIterator object allows you to visit every bytecode instruction one by one, from the beginning to the end. To obtain this object, do as follows:

    ClassFile cf = ... ;
    MethodInfo minfo = cf.getMethod("move");    // we assume move is not overloaded.
    CodeAttribute ca = minfo.getCodeAttribute();
    CodeIterator i = ca.iterator();

Other reflective systems for C++ and Java include OpenC++ and OpenJava, which are extensible preprocessors based on compile-time reflection in C++ and Java respectively.

JRuby

JRuby is a complete implementation of Ruby in Java, so Java developers can use the scripting and functional features of the Ruby language. Simple metaprogramming techniques can be extended from Ruby so that Java packages are mapped onto Ruby modules, which works like a Ruby-Java bridge. Since JRuby runs on the JVM, it can be used on any platform that has one.

Example: calling JRuby from Java using the JRuby API

 import org.jruby.*;

 public class SimpleJRubyCall {
     public static void main(String[] args) {
         Ruby runtime = Ruby.getDefaultInstance();
         runtime.evalScript("puts 1+2");
     }
 }

With metaprogramming in JRuby, one can:

  • add methods to a class
  • add instance methods
  • add methods to Java classes

Since metaprogramming empowers the programmer to create domain-specific languages (DSLs), DSLs created with JRuby can always leverage Java libraries to build wrapper functionality, e.g. a simple JRuby DSL on top of HtmlUnit.

AspectJ

Another library worth mentioning here is AspectJ, which brings the aspect-oriented programming (AOP) approach to Java. AOP is a programming paradigm complementary to object-oriented programming and is used to improve the modularity of software systems. While OOP is good at modeling common behavior over a hierarchy of objects, AOP lets you define crosscutting concerns that can be applied across separate, and very different, object models. Typical crosscutting concerns handled with AOP include logging, instrumentation, and debugging. [7] Aspect-oriented programming goes a step beyond explicit metaprogramming by adding direct semantics for coding crosscutting concerns. Explicit metaprogramming is so powerful that it can transform code in any way imaginable (e.g. with Javassist); AOP mechanisms are more limited, and as a result they preclude many incorrect implementations.

AspectJ approach: AspectJ extends the statically typed Java language with aspect declarations; since AspectJ 5, aspects can also be declared using the annotations introduced in Java 5. [8] In object-oriented languages like Java, the natural unit of modularity is the class. In addition to classes, AOP provides aspects, which modularize concerns that affect more than one class, such as transaction management, which cuts across multiple types and objects. [9] Thus, AspectJ achieves metaprogramming features in a more controlled way.

Reflective and aspect-oriented languages have an important similarity: both provide programming support for dealing with crosscutting concerns. In this sense reflective systems proved that independent programming of crosscutting concerns is possible. But the control that reflection provides tends to be low-level and extremely powerful. In contrast, AspectJ provides more carefully controlled power, drawing on the rules learned from object-oriented development to encourage a clean and understandable program structure.

First, an aspect imposes behavior on a class, rather than a class requesting behavior from an aspect. An aspect can modify a class without that class being edited. This property is sometimes called reverse inheritance.

Example with the concept of weaving

The core task of AspectJ's advice weaver is to statically transform a program so that at run time it behaves according to the AspectJ language semantics. The AspectJ weaver takes class files as input and produces class files as output. The weaving process itself can take place at one of three different times: compile time, post-compile time, and load time.

Example

 package helloworld;

 /**
  * The HelloWorld class implements an application that
  * simply displays "Hello world!" on the standard output.
  */
 class HelloWorld {

     public static void main(String[] args) {
         new HelloWorld().printMessage();
     }

     void printMessage() {
         System.out.println("Hello world!");
     }
 }


Create a Java source file with an aspect definition

Using a text editor, create a file named Trace.java, in the same directory, with the following code:

 package helloworld;

 /**
  * The Trace aspect injects tracing messages before and after
  * the printMessage method of class HelloWorld.
  */
 aspect Trace {

     pointcut printouts(): execution(void HelloWorld.printMessage());

     before(): printouts() {
         System.out.println("*** Entering printMessage ***");
     }

     after(): printouts() {
         System.out.println("*** Exiting printMessage ***");
     }
 }


Given those two Java files, the AspectJ compiler (ajc) weaves the class and the aspect together: the aspect is "plugged in". If you do not want tracing, simply leave the aspect out and feed the class alone to the AspectJ compiler; the result contains no trace of tracing. The tracing code is still available in Trace.java, however, so if you ever need it again, you can plug it back in.
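AspectJ itself is not part of the JDK. As a rough analogue that uses only the standard library, the sketch below wraps an object in a java.lang.reflect.Proxy that logs before and after each call, similar in spirit to the before() and after() advice in Trace. This is a swapped-in technique (run-time proxying), not how the AspectJ weaver works; the MessagePrinter interface and TracingDemo class are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: a JDK proxy can only wrap interfaces.
interface MessagePrinter {
    void printMessage();
}

class TracingDemo {

    // Wraps 'target' so every interface call is preceded and followed by a
    // trace line, roughly like before() and after() advice.
    static <T> T traced(T target, Class<T> type, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("*** Entering " + method.getName() + " ***");
            try {
                // Exceptions thrown by the target arrive wrapped in
                // InvocationTargetException; not unwrapped in this sketch.
                return method.invoke(target, args);
            } finally {
                log.add("*** Exiting " + method.getName() + " ***");
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MessagePrinter plain = () -> log.add("Hello world!");
        MessagePrinter withTrace = traced(plain, MessagePrinter.class, log);
        withTrace.printMessage();
        log.forEach(System.out::println);
    }
}
```

Unlike AspectJ, this approach only intercepts calls made through an interface on objects that were explicitly wrapped; the AspectJ weaver needs neither.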

C++ Template Metaprogramming

Static metaprogramming (also known as template metaprogramming, TMP) is a C++ technique that allows programs to be executed at compile time. It is implemented with templates rather than with reflection; C++ offers only limited reflection, mainly through run-time type identification (RTTI) [10]. RTTI is a system that keeps information about an object's data type in memory at run time. Run-time type information can apply to simple data types, such as integers and characters, or to generic objects. Enabling RTTI in C++ allows the use of the dynamic_cast<> operator, the typeid operator, and exceptions [11].

The template mechanism in C++ allows defining parametrized classes and functions. Templates, together with other C++ features, constitute a Turing-complete, compile-time sublanguage of C++. Informally, a Turing-complete language needs at least a conditional and a looping construct; with templates, specialization plays the role of the conditional and recursive instantiation plays the role of the loop. C++ can be considered a two-level language, since a C++ program may contain both static code, which is evaluated at compile time, and dynamic code, which is executed at run time. Template metaprograms are the part of a C++ source that is executed during compilation. A metaprogram can access information about types that is not generally available to ordinary programs [12].

Given below is an example of how to use templates for writing a common recursive factorial program:

 template<int count>
 class FACTOR {
 public:
     enum { RESULT = count * FACTOR<count - 1>::RESULT };
 };

 // Explicit specialization that ends the recursion
 template<>
 class FACTOR<1> {
 public:
     enum { RESULT = 1 };
 };

If we write:

 int j = FACTOR<5>::RESULT;

The above line calculates the value of 5 factorial. Instantiating FACTOR<5> requires FACTOR<4>, which in turn requires FACTOR<3>, and so on. The compiler has to instantiate all of these classes until it reaches the specialization FACTOR<1>. The entire recursion is therefore performed by the compiler, and the result, 120, is used as a compile-time constant.

Uses of template metaprogramming:

  • compile-time dimensional analysis
  • multiple dispatch
  • design patterns
  • code optimization
  • lexing and parsing


Metaprogramming in dynamically typed language

The languages that best support metaprogramming are those that easily overcome the distinction between code and data, e.g. Lisp: no other language has devised a more radical yet natural representation of that interchangeability. Code and data are both represented in Lisp as lists, so any list can easily be treated as either code or data. It is therefore simple to manipulate code as data and then execute it, either via EVAL or by returning it as the result of a macro expansion. [13]

References and notes

[1] http://en.wikipedia.org/wiki/Metaprogramming
[2] http://www.slideshare.net/aminbandeali/dynamically-and-statically-typed-languages
[3] http://www.geeksaresexy.net/2009/07/22/gas-explains-what-is-metaprogramming/
[4] http://www.developer.com/java/other/article.php/3556176/An-Introduction-to-Java-Annotations.htm
[5] http://en.wikipedia.org/wiki/Generic_programming
[6] http://www.cs.tut.fi/~kk/webstuff/MetaProgrammingJavaKalvot.pdf