Metaprogramming in Statically Typed Languages

"I'd rather write programs that write programs than write programs" - Richard Sites

Introduction

Metaprogramming is an inherent feature of dynamically typed languages [e.g. Ruby]. Achieving it in statically typed languages [e.g. Java] is more complex, because the compiler verifies abstractions at compile time. In this topic, we explore how tools and packages are used to implement metaprogramming in statically typed languages.

Metaprogramming

Metaprogramming is the technique of writing computer programs that write or manipulate other programs, or themselves, as data. [1] In other words, programs are written at a higher level of abstraction so that they generate other programs.

Metaprogramming involves two kinds of languages-

Meta-language is the language in which meta-programs, which construct or manipulate other programs, are written.
Object-language is the language of programs being manipulated.

This makes them ‘meta level programs’ whose problem domain is other ‘base level programs’.

The ability of a programming language to be its own metalanguage is called reflection or reflexivity.

Simple example of a metaprogram:
Let us consider a totally fabricated example to build a high-level understanding. Suppose we need to write a C program that prints the following 500 lines of text, with the restriction that the program may not use any kind of loop or goto instruction.

Output expected:

 1 Mississippi
 2 Mississippi
 3 Mississippi
 4 Mississippi
 ...
 499 Mississippi
 500 Mississippi

In C this would be then coded as:

 #include <stdio.h>
 int main(void) {
   printf("1 Mississippi\n");
   printf("2 Mississippi\n");
   /* ... lines 3 through 498 elided ... */
   printf("499 Mississippi\n");
   printf("500 Mississippi\n");
   return 0;
  }

With the power of a metaprogramming language we can write another program that writes this program automatically.

Ruby code:

 File.open('mississippi.c', 'w') do |output|
   output.puts '#include <stdio.h>'
   output.puts 'int main(void) {'
   1.upto(500) do |i|
     output.puts "    printf(\"#{i} Mississippi\\n\");"
   end
   output.puts '    return 0;'
   output.puts '}'
 end

This code creates a file called mississippi.c containing the expected 500+ lines of C source code. Here, mississippi.c is the generated code and the Ruby code is the metaprogram.

Applications of Metaprogramming

Metaprogramming is an attractive technique when the behavior of a program has to be altered at run time. Due to its generative nature, it has numerous applications in program development. It avoids rewriting boiler-plate code [2] all the time, which improves efficiency, increases modularity and minimizes errors caused by inconsistent implementations. Program generators and program analyzers are the two main categories of metaprograms. Metaprograms can be compilers, interpreters, type checkers etc. Some common applications use a program that outputs source code to -

generate sine/cosine/whatever lookup tables
extract a source-form representation of a binary file
compile your bitmaps into fast display routines
extract documentation, initialization/finalization code, description tables, as well as normal code from the same source files
have customized assembly code, generated from a perl/shell/scheme script that does arbitrary processing
propagate data defined at one point only into several cross-referencing tables and code chunks.
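
The first application in the list can be sketched in Java as follows. This is only an illustration: the class name SineTableGenerator, the table name SINE_TABLE and the six-digit precision are assumptions made for this example, not part of any existing tool.

```java
import java.util.Locale;

public class SineTableGenerator {
    // Builds the text of a C source file that declares a sine lookup table.
    static String generate(int entries) {
        StringBuilder src = new StringBuilder();
        src.append("/* Generated file: do not edit by hand. */\n");
        src.append("static const double SINE_TABLE[").append(entries).append("] = {\n");
        for (int i = 0; i < entries; i++) {
            double angle = 2.0 * Math.PI * i / entries;
            // Locale.ROOT keeps '.' as the decimal separator on every system.
            src.append(String.format(Locale.ROOT, "    %.6f,\n", Math.sin(angle)));
        }
        src.append("};\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // The metaprogram prints the generated object-program (C source).
        System.out.print(generate(8));
    }
}
```

Here the Java program is the metaprogram and the emitted C declaration is the object program; redirecting the output to a file would give a source file that a C compiler can consume.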

Programmers can focus more on the main business logic and new features to be implemented rather than writing repetitive chunks of code (e.g. setups, stubs)[1].

Typing in Programming Languages

Earlier programming languages [e.g. Assembly] were written such that each machine level function was reflected in the program code. With advancement in programming languages a certain level of abstraction was reached wherein lower level details were abstracted with one functional unit of work and represented by fewer lines of code e.g. primitive variables are represented with higher level abstract classes. With this abstraction arose a need for checking the validity of operations that could be performed with these abstractions in place.

Typing is a property of the operations and variables of a language that ensures that values of incompatible kinds are not combined in an operation. Errors of this kind are known as type errors. Type checking is the process of verifying and enforcing the constraints of types. Compile-time type checking is known as static type checking; run-time type checking is known as dynamic type checking. If a language specification enforces its typing rules strongly (i.e., more or less allowing only those automatic type conversions which do not lose information), one can refer to the language as strongly typed; if not, as weakly typed.[3]

[Figure: classification of type checking as static or dynamic and as strong or weak]

Statically Typed Programming Languages

Statically typed languages ensure that a fixed type is assigned by the programmer to every variable and parameter. Thus, every expression type can be deduced and type checked during compilation. Static languages try to fix most errors during compile time and strive to minimize failures during run time. Due to this there are many type constraints on the programmer while coding. At run time, the program uses the classes that it has been given and in this way statically typed languages make distinctions between what happens at compile time and what happens at run time. Examples of statically typed languages are C, C++, Java, C#.

Dynamically Typed Programming Languages

In dynamically typed languages, variables and parameters do not have a designated type and may hold values of different types at different times. In every operation, the operands must be type checked at run time just before the operation is performed. Dynamically typed languages do not need to distinguish between classes created at compile time and classes provided later. It is possible to define classes at run time and in fact, classes are always defined at run time. This removes many constraints on the developer by avoiding book keeping, declarations etc. Due to this flexibility these languages are ideal candidates for prototyping and are widely used in agile development environments. However, dynamic languages are known to have performance issues. Static languages have code optimization features at compile time, but dynamic languages allow runtime code optimizations only. [4] In dynamically typed languages, the interpreter deduces types and type conversions. This shortens development time, but it can also lead to runtime failures that a statically typed language would have caught at compile time. Examples of dynamically typed languages include Perl, Python, JavaScript, PHP, Ruby, Groovy.
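
The run-time check described above can be imitated in Java by declaring a variable as Object, which postpones the type check until the cast executes. This is a minimal sketch; the class RuntimeTypeCheck and the method describe are names invented for the illustration.

```java
public class RuntimeTypeCheck {
    // Tries to use the value as an Integer; the check happens only when this runs.
    static String describe(Object value) {
        try {
            Integer number = (Integer) value;   // verified at run time, not by the compiler
            return "integer " + number;
        } catch (ClassCastException e) {
            return "type error at run time: " + value.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Object slot = 42;                     // the same variable holds an Integer now...
        System.out.println(describe(slot));
        slot = "forty-two";                   // ...and a String later
        System.out.println(describe(slot));
    }
}
```

Had slot been declared as Integer, the second assignment would have been rejected by the compiler, which is the static behavior described in the previous section.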

Metaprogramming in statically typed languages

In statically typed languages, which tend to be syntactically verbose, metaprogramming is not a standard feature; it can, however, be achieved. Static typing in meta-programs also has a number of advantages. In addition to guaranteeing that the meta-program encounters no type errors while manipulating object-programs, a statically typed metaprogramming language can also guarantee that any object-program generated by the meta-program is itself type-correct. A disadvantage is that such type systems (in the case of meta-programming languages with weaker type systems) may sometimes be too restrictive about the object-programs that the programmer is allowed to construct.

Techniques and Packages

Many language features can be leveraged to obtain some of the characteristics needed for metaprogramming. For instance, languages that support reflection often also allow dynamic code generation, e.g. in the Microsoft .NET Framework the "System.Reflection.Emit" namespace is used to generate types and methods at runtime.

Reflection

Reflection is a valuable language feature to facilitate metaprogramming. Reflection is defined as the ability of a programming language to be its own meta-language. Thus, reflection is writing programs that manipulate other programs or themselves. [5]
e.g. In Java, reflection makes it possible to discover information about the loaded classes:

Fields,
Methods and constructors
Generics information
Metadata annotations

It also makes it possible to apply these metaobjects to instances in the run-time environment, e.g. Method.invoke(Object o, Object... args). With the Java reflection API, you can interrogate an object to get all sorts of information about its class.
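
A minimal sketch of these calls is given below. The classes ReflectionSketch and Point and the helper call are hypothetical and exist only for this illustration; getFields, getMethod and Method.invoke are the standard reflection API.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {
    // Hypothetical class used only as a target for reflection.
    public static class Point {
        public int x;
        public int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
        public int sum() { return x + y; }
    }

    // Looks up a public no-argument method by name and invokes it on the target.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Class<?> type = Point.class;
        System.out.println("class: " + type.getName());
        System.out.println("public fields: " + type.getFields().length);
        System.out.println("sum() via reflection: " + call(new Point(3, 4), "sum"));
    }
}
```

The method name is an ordinary string here, so it could equally come from a configuration file or user input; the price is that a misspelled name is only detected at run time.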

Consider the following simple example:

 public class HelloWorld {
   public void printName() {
     System.out.println(this.getClass().getName());
   }
 }

The line

 (new HelloWorld()).printName();

sends the string HelloWorld to standard out. Now let x be an instance of HelloWorld or one of its subclasses. The line

 x.printName();

sends the string naming the class to standard out.

The printName method examines the object for its class (this.getClass()). In doing so, the decision of what to print is made by delegating to the object's class. The method acts on this decision by printing the returned name. Without being overridden, the printName method behaves differently for each subclass than it does for HelloWorld. The printName method is flexible; it adapts to the class that inherits it, causing the change in behavior.
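
The same behavior can be checked with a subclass that adds nothing of its own. In this sketch the names PrintNameDemo, Greeter and FrenchGreeter are made up for the example, and the name is returned rather than printed so that it can be inspected.

```java
public class PrintNameDemo {
    public static class Greeter {
        // Asks the object for its run-time class, as printName does in the text.
        public String name() {
            return this.getClass().getSimpleName();
        }
    }

    // Inherits name() unchanged; nothing is overridden.
    public static class FrenchGreeter extends Greeter { }

    public static void main(String[] args) {
        System.out.println(new Greeter().name());
        System.out.println(new FrenchGreeter().name());
    }
}
```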

Annotations

Annotations are a metaprogramming facility that allows code to be marked with defined tags. Many APIs require a fair amount of boilerplate code. This boilerplate could be generated automatically by a tool if the program were “decorated” with annotations indicating, for example, which methods are remotely accessible. Metadata provided through annotations is beneficial for documentation, compiler checking, and code analysis. One can use this metadata to indicate if methods are dependent on other methods, if they are incomplete, if a certain class must reference another class, and so on. The compiler also uses it to perform some basic compile-time checking. For example, the @Override annotation lets you specify that a method overrides a method from a superclass; the Java compiler then verifies that the behavior you indicate in your metadata actually happens at the code level and reports an error if it does not.

An “annotation” has an “annotation type” associated with it which is used for defining it. It is used when you want to create a custom annotation. The type is the actual construct used, and the annotation is the specific usage of that type. An annotation type definition takes an "at" (@) sign, followed by the interface keyword plus the annotation name. On the other hand, an annotation takes the form of an "at" sign (@), followed by the annotation type [6].

Example to Define an Annotation (Annotation type)

 public @interface MyAnnotation {
   String doSomething();
 }

Example to Annotate Your Code (Annotation)

 @MyAnnotation (doSomething="What to do")
 public void mymethod() {
   ....
 }

Annotation Types

There are three annotation types:

Marker: Marker type annotations have no elements, except the annotation name itself.

Example:

 public @interface MyAnnotation {
 }

Usage:

 @MyAnnotation
 public void mymethod() {
  ....
 }

Single-Element: Single-element, or single-value type, annotations provide a single piece of data only. This can be represented with a data=value pair or, within parentheses, simply with the value (a shortcut syntax). Java accepts the shortcut only when the element is named value, so the example below uses that name.

Example:

 public @interface MyAnnotation
 {
   String value();
 }

Usage:

 @MyAnnotation ("What to do")
 public void mymethod() {
  ....
 }

Full-value or multi-value: Full-value type annotations have multiple data members. Therefore, you must use a full data=value parameter syntax for each member.

Example:

 public @interface MyAnnotation {
   String doSomething();
   int count();
   String date();
 }

Usage:

 @MyAnnotation (doSomething="What to do", count=1,
             date="09-09-2005")
 public void mymethod() {
  ....
 }
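
Annotations become useful for metaprogramming once a tool or the program itself reads them back. The sketch below assumes run-time retention (the @Retention and @Target meta-annotations are not part of the examples above) and uses the hypothetical names AnnotationSketch, Worker and describe.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class AnnotationSketch {
    // RUNTIME retention keeps the annotation visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface MyAnnotation {
        String doSomething();
        int count() default 1;
    }

    public static class Worker {
        @MyAnnotation(doSomething = "What to do", count = 3)
        public void mymethod() { }
    }

    // Reads the annotation from Worker.mymethod and formats its elements.
    static String describe() {
        try {
            Method method = Worker.class.getMethod("mymethod");
            MyAnnotation note = method.getAnnotation(MyAnnotation.class);
            return note.doSomething() + " x" + note.count();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());   // prints "What to do x3"
    }
}
```

Without RetentionPolicy.RUNTIME the annotation is not available to reflection and getAnnotation would return null.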

Generics

Generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters. [7] Java Generics are primarily a way for library authors to write something once, which users can customize to their own types. They allow the creation of classes and methods that work in the same way on different types of objects. The term "generic" comes from the idea that we'd like to be able to write general algorithms that can be broadly reused for many types of objects rather than having to adapt our code to fit each circumstance.

Generics add a way to specify concrete types for general purpose classes and methods that previously operated on Object. A Java collection is a flexible data structure that can hold heterogeneous objects where the elements may have any reference type. Without generics, when you take an element out of a collection, you must cast it to the type of element that is stored in the collection. Generics provide a way for you to communicate the type of a collection to the compiler, so that it can be checked. Once the compiler knows the element type of the collection, the compiler can check that the collection has been used consistently and can insert the correct casts on values being taken out of the collection.[5] Generics are implemented by type erasure: generic type information is present only at compile time, after which it is erased by the compiler. The main advantage of this approach is that it provides total interoperability between generic code and legacy code that uses non-parametrized types (which are technically known as raw types).

Consider a non-generic example:

 //This program removes 4-letter words from c. Elements must be strings
 static void expurgate(Collection c)
 {
   for (Iterator i = c.iterator(); i.hasNext(); )
     if (((String) i.next()).length() == 4)
       i.remove();
 }

Here is the same example modified to use generics:

 //This program removes the 4-letter words from c
 static void expurgate(Collection<String> c)
 {
   for (Iterator<String> i = c.iterator(); i.hasNext(); )
     if (i.next().length() == 4)
       i.remove();
 }

The declaration above reads as “Collection of String c.” Collection is a generic interface that takes ‘String’ as its type argument. The code using generics is clearer and safer. The unsafe cast and a number of extra parentheses have been eliminated. More importantly, we have moved part of the specification of the method from a comment to its signature, so the compiler can verify at compile time that the type constraints are not violated at run time.[8]
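
Two of the points made in this section, writing an algorithm once for many element types and type erasure, can be sketched as follows. The class GenericsSketch and its methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class GenericsSketch {
    // Generic method: T is inferred by the compiler at each call site.
    static <T extends Comparable<T>> T largest(Collection<T> items) {
        T best = null;
        for (T item : items) {
            if (best == null || item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    // Type erasure: both lists are instances of the same run-time class.
    static boolean sameRuntimeClass() {
        List<String> words = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return words.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(largest(Arrays.asList(3, 9, 4)));        // prints 9
        System.out.println(largest(Arrays.asList("pear", "apple"))); // prints pear
        System.out.println(sameRuntimeClass());                      // prints true
    }
}
```

No cast appears at either call site, and the third line shows that the element type is no longer present once the program runs.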

Template Metaprogramming

Template metaprogramming (TMP), or static metaprogramming, is a technique that allows programs to be executed at compile time. It uses extremely early binding. A primary requirement for a metaprogramming language is providing high-level abstractions to hide the internal representation of base programs.[9] Each template language is specific to a base language and is generated from it. In this sense, a language of templates is a superset of the base language. Templates are thus abstractions that encapsulate a program pattern written by example. This concept has been explained in detail in section 3.2.5

Uses of template metaprogramming:

Compile-time dimensional analysis
Multiple dispatch
Design patterns
Code optimization
Lexing and parsing

Packages

Javassist (Java Programming Assistant)

Javassist is a Java library providing means to manipulate the Java bytecode of an application. It provides the support for structural reflection, i.e. the ability to change the implementation of a class at runtime. [10] Javassist is explicit metaprogramming, in which the metalanguage is Java. It is a load-time reflective system for Java which enables Java programs to define a new class at runtime and to modify a class file when the JVM loads it. Unlike other similar bytecode editors, Javassist provides two levels of API: source level and bytecode level.

If users work with the source-level API, they can edit a class file without knowledge of the Java bytecode specification. They do not even have to write the inserted bytecode sequence; Javassist can instead compile a fragment of source text on the fly (for example, just a single statement). This ease of use distinguishes Javassist from other tools. The bytecode-level API, on the other hand, allows users to edit a class file directly, as other editors do. Thus it makes Java bytecode manipulation simple.

Javassist has the following applications:

Aspect Oriented Programming: Javassist can be a good tool for introducing new methods into a class and for inserting before/after/around advice at both the caller and callee sides.
Reflection: One application of Javassist is runtime reflection; Javassist enables Java programs to use a metaobject that controls method calls on base-level objects. No specialized compiler or virtual machine is needed.
Remote method invocation: Another application is remote method invocation. Javassist enables applets to call a method on a remote object running on the web server. Unlike with Java RMI, the programmer does not need a stub compiler such as rmic; the stub code is dynamically produced by Javassist.

Example:

 BufferedInputStream fin = new BufferedInputStream(new FileInputStream("Point.class"));
 ClassFile cf = new ClassFile(new DataInputStream(fin));

A ClassFile object can be written back to a class file. write() in ClassFile writes the contents of the class file to a given DataOutputStream. ClassFile provides addField() and addMethod() for adding a field or a method (note that a constructor is regarded as a method at the bytecode level). It also provides addAttribute() for adding an attribute to the class file.
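
To give a feel for what a bytecode-level tool sees when it opens a class file, the sketch below reads the fixed header of a class file using only the standard library. It does not use Javassist; the class ClassHeader and the method readHeader are hypothetical names for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeader {
    // Returns {magic, minor version, major version} from the start of a class file.
    static int[] readHeader(String resourcePath) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (raw == null) {
                throw new IOException("not found: " + resourcePath);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();              // always 0xCAFEBABE
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/String.class");
        System.out.printf("magic=%08X major=%d minor=%d%n", header[0], header[2], header[1]);
    }
}
```

A library such as Javassist parses the rest of the file (constant pool, fields, methods, attributes) in the same manner, so that its users do not have to.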

To examine every bytecode instruction in a method body, CodeIterator is useful. A CodeIterator object allows you to visit every bytecode instruction one by one from the beginning to the end. To obtain this object, do as follows:

    ClassFile cf = ... ;
    MethodInfo minfo = cf.getMethod("move");    // we assume move is not overloaded.
    CodeAttribute ca = minfo.getCodeAttribute();
    CodeIterator i = ca.iterator();

Other extensions are being developed to include reflective systems for C++ and Java e.g. OpenC++ and OpenJava which are extensible preprocessors based on compile-time reflection in C++ and Java respectively.

JRuby

JRuby is a complete implementation of Ruby in Java. The scripting and functional features of the Ruby language can be used by Java developers.[11] Simple metaprogramming techniques can be carried over from Ruby so that Java packages are mapped onto Ruby modules. This works like a Ruby-Java bridge, since JRuby can be run on any platform with a JVM.

Example: Using JRuby API calling JRuby from Java

 import org.jruby.*;
 public class SimpleJRubyCall {
   public static void main(String[] args) {
     Ruby runtime = Ruby.getDefaultInstance();
     runtime.evalScript("puts 1+2");
   }
 }

With metaprogramming using JRuby one can

add methods to class,
add instance methods
add to have Java classes

Since metaprogramming empowers the programmer to create domain specific languages (DSLs), those created with JRuby can always leverage Java libraries to build wrapper functionality, e.g. a simple JRuby DSL on top of HtmlUnit.

AspectJ

Another library worth mentioning here is AspectJ. It brings the aspect oriented programming (AOP) approach to Java. Aspect oriented programming is a programming paradigm complementary to object oriented programming and is used to improve the modularity of software systems. Thus, while object oriented programming is great for modeling common behavior on a hierarchy of objects, aspect oriented programming allows you to define cross-cutting concerns by adding direct semantics and can be applied across heterogeneous object models. [12] Applications of aspect oriented programming include logging, instrumenting and debugging.

In object oriented programs, the natural unit of modularity is the class. In AspectJ, aspects modularize concerns that affect more than one class. In addition to classes, aspect oriented programming uses 'aspects'. Aspects enable modularization of crosscutting concerns such as transaction management that cut across multiple types and objects. Thus, AspectJ package achieves metaprogramming features with more controllability.

AspectJ introduces declaring aspects in the statically typed language Java by using 3 key concepts -

Join Points - well defined points in program execution
Advice - defines additional code to be executed around join points
Pointcut - define those points in the source code of a program where an advice will be applied

The core task of AspectJ's advice weaver is to statically transform a program so that at runtime it will behave according to the AspectJ language semantics. The AspectJ weaver takes class files as input and produces class files as output. The weaving process itself can take place at one of three different times: compile-time, post-compile time, and load-time. The class and the aspect are 'woven' together by the compiler, i.e. the class and the aspect behavior are tied together. This is also known as 'plugging-in' the aspect. [13]

Example - Simple tracing/logging

Consider the HelloWorld class below which implements an application that simply displays "Hello World!" on the standard output.

package helloworld; 
class HelloWorld {
   public static void main(String[] args) {
           new HelloWorld().printMessage();
   }   
   void printMessage() {
           System.out.println("Hello world!");
   }
}

Consider another Java source file with an aspect definition as follows -

package helloworld;
// Written in current AspectJ syntax; early AspectJ releases used "receptions" pointcuts instead
aspect Trace {
   pointcut printouts(): execution(void HelloWorld.printMessage());
   before(): printouts() {
             System.out.println("*** Entering printMessage ***");
   }
   after():  printouts() {
             System.out.println("*** Exiting printMessage ***");
   }
}

In the above example, the Trace aspect injects tracing messages before and after the method printMessage of class HelloWorld. If one does not want tracing, one can simply leave the aspect out and plug it in as and when required.
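
For comparison, similar before/after tracing can be approximated in plain Java with a dynamic proxy from java.lang.reflect. This is not AspectJ weaving: it works only through an interface and has to be applied explicitly by the caller. The names TraceProxySketch, Greeter and traced are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class TraceProxySketch {
    public interface Greeter {
        void printMessage();
    }

    // Wraps the target so that every interface call is logged before and after it runs.
    static Greeter traced(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("*** Entering " + method.getName() + " ***");
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow what the target itself threw
            } finally {
                log.add("*** Exiting " + method.getName() + " ***");
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter greeter = traced(() -> log.add("Hello world!"), log);
        greeter.printMessage();
        log.forEach(System.out::println);
    }
}
```

The difference from the aspect is visible in main: the caller must ask for the traced object, whereas an aspect is imposed on the class without the class or its callers being edited.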

In contrast to reflection[low level], AspectJ provides more carefully controlled power, drawing on the rules learned from object-oriented development to encourage a clean and understandable program structure. An aspect imposes behavior on a class, rather than a class requesting behavior from an aspect. An aspect can modify a class without needing to edit that class - also known as 'reverse inheritance'.

MetaJ

MetaJ is another package that supports metaprogramming in the Java language. A MetaJ program consists of the Java code with special metaprogramming declarations that will control how the output code is created. It uses templates and reflection. Templates are abstractions that encapsulate a program pattern written by example. Templates are translated to Java classes, so they can be accessed in the metaprogram. Accessing patterns by example inside ordinary Java programs is a major feature of MetaJ programming. [14]

Language Dependency: MetaJ comprises a set of concepts that are independent of the base language. These are syntax trees, code references, code iterators and code templates. A framework is defined in which features common to most languages are abstracted. This supports independence from the base language. Thus, generic operations can be defined and language-dependent components can be plugged into the framework.

Execution Flow Representation:

Example:

Template : SampleTempl

package myTempl; 
language = Java // base language plugin
template #CompilationUnit SampleTempl{ 
#[#PackageDeclaration:pck]# 
#[#ImportDeclarationList:imps]# 
class SampleTempl { ... } 
#TypeDeclaration:td 
}

Instance of Template : Java Code

package myTempl; 
import metaj.framework.AbstractTemplate; 
public class SampleTempl extends AbstractTemplate{ 
public final Reference imps, td, pck;
... // Implementation of superclass abstract methods
}

Java SE 7 added support for dynamically typed languages on the JVM through the invokedynamic bytecode instruction and the java.lang.invoke API (JSR 292).

C++ Template Metaprogramming

C++ has only limited support for reflection. Its main reflective facility is run time type-identification (RTTI) [15], a system that keeps information about an object's data type in memory at run time. Run-time type information can apply to simple data types, such as integers and characters, or to generic objects. Enabling RTTI in C++ allows the use of the dynamic_cast<> operation, the typeid operator or exceptions [16]. Static metaprogramming in C++, by contrast, is implemented with templates.

The template mechanism in C++ allows defining parametrized classes and functions. Templates together with other C++ features constitute a Turing-complete, compile-time sub-language of C++. Informally, a language is Turing-complete when it offers at least conditional branching and some form of looping or recursion; templates provide these through specialization and recursive instantiation. C++ can be considered to be a two-level language since a C++ program may contain both static code, which is evaluated at compile time, and dynamic code, which is executed at run time. Template meta-programs are the part of a C++ source that is executed during compilation. A meta-program can access information about types not generally available to ordinary programs [9].

Given below is an example of how to use templates for writing a common recursive factorial program:

 template<int count>
 class FACTOR{
 public:
     enum {RESULT = count * FACTOR<count-1>::RESULT};
 };
 // Explicit specialization that ends the recursion
 template<>
 class FACTOR<1>{
 public:
     enum {RESULT = 1};
 };

If we write this-

 int j = FACTOR<5>::RESULT;

The above line calculates the value of 5 factorial. When we instantiate FACTOR<5>, the definition of this class depends on FACTOR<4>, which in turn depends on FACTOR<3> and so on. The compiler needs to create all these classes until the template specialization FACTOR<1> is reached. This means the entire recursion is done by the compiler during compile time, and the result is used as if it were a constant.

Conclusion

Dynamically typed languages are best suited to provide support for metaprogramming due to their inherent nature of easily overcoming distinction between code and data e.g. Lisp provides this feature of interchangeability. Code and data are both represented in Lisp as lists, so any list can easily be treated as either code or data. It’s simple, therefore, to manipulate code as data, and then execute it – either via EVAL or by returning it as the result of a macro expansion. [3]

Statically typed languages, by contrast, rely on various techniques and tools to reach the level of flexibility that dynamic languages provide inherently.

CSC/ECE 517 Fall 2010/ch2 S24 rm
