CSC/ECE 517 Fall 2010/ch2 S24 sk

What is Metaprogramming

Metaprogramming is, in essence, the use of programs to manipulate other programs. The concept is a classic UNIX principle for writing code and is far from new. In fact, its technical beginning probably lies in Konrad Zuse's insight, almost 70 years ago, that a computer could prepare its own instructions. Metaprogramming has been garnering the attention of developers and software engineers in recent years, first with dynamic languages such as Ruby and more recently with static languages such as Java and C++.

Metaprogramming is also called generative programming. The idea is that instead of modifying simple data elements, you modify symbols (or patterns) that represent complex operations. This is achieved through three key components: a metalanguage, in which we define a generalization or pattern; a generator, which is an expression of that generalization; and an instance, which is the output from the evaluation of that generator. The goal is to generalize a set of concrete instances so that they can be evaluated and generated as a static representation of what we wish our code to accomplish. This takes polymorphic variation out of the code level and moves it up to the metalevel, where it can be expressed in Domain Specific Languages (DSLs) that describe how a system should be generated.
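The three components can be pictured with a small Ruby sketch. It is only an illustration under stated assumptions: the helper name generate_record_class, the class name Point, and the fields x and y are all hypothetical and not taken from any library. The string layout plays the role of the pattern, the method is the generator, and the class defined from the generated text is the instance.

```ruby
# Generator: expands a generalization (a class name plus a list of fields)
# into Ruby source text. All names here are hypothetical.
def generate_record_class(name, fields)
  params  = fields.join(", ")
  assigns = fields.map { |f| "    @#{f} = #{f}" }
  readers = fields.map { |f| "  def #{f}; @#{f}; end" }
  [
    "class #{name}",
    "  def initialize(#{params})",
    *assigns,
    "  end",
    *readers,
    "end"
  ].join("\n")
end

# Instance: the concrete code produced by one evaluation of the generator.
source = generate_record_class("Point", %w[x y])

# Evaluate the generated text so that the class actually exists.
Object.class_eval(source)

point = Point.new(3, 4)
puts source
puts point.x + point.y
```

Changing the field list changes the generated class without touching the generator, which is the kind of variation the paragraph above moves to the metalevel.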

Metaprogramming in Practice

Before we elaborate too much on metaprogramming, here is some fundamental terminology to become familiar with.

A Metaprogram represents a program that has the ability to modify another program or itself
Reflection is the ability of a program to manipulate the state of a program during execution. This manipulation comes in two aspects: introspection and intercession
Introspection is the ability of a program to observe and reason about its own state
Intercession is the ability of a program to alter its own execution state or alter its own interpretation or meaning
A Metaobject represents an object during reflection, as well as execution stacks, the processor, and nearly all elements of the language and its execution environment
Metalevel Architecture provides information about selected system properties that helps generated code members become self-aware
Partial Evaluation is evaluating a system while one part is varying or in the process of generation and another is constant
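To make the two aspects of reflection concrete, here is a minimal Ruby sketch. The Account class and its deposit method are hypothetical names chosen for illustration. The first half only observes the program (introspection); the second half changes the behavior of a class while the program is running (intercession).

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  def balance
    @balance
  end
end

account = Account.new(100)

# Introspection: the program observes its own structure and state.
methods_before  = Account.instance_methods(false)  # methods defined directly on Account
state_names     = account.instance_variables       # names of the object's instance variables
could_deposit   = account.respond_to?(:deposit)    # no deposit method exists yet

# Intercession: the program alters its own behavior during execution.
Account.class_eval do
  define_method(:deposit) do |amount|
    @balance += amount
  end
end

# The existing object picks up the new method immediately.
account.deposit(50)
can_deposit_now = account.respond_to?(:deposit)
```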

Even though metaprogramming in static languages has been getting a lot of attention from software creators recently, the capability has been around for quite some time. In C, for example, the preprocessor acts as the vehicle for generating code. This type of metaprogramming was mostly limited to simple macros due to the preprocessor's lack of recursion and lack of type checking. This made generating code rather dangerous and complicated (a trait that has plagued metaprogramming). Still, it showed the need was there, and as time went on the tools for building metaprograms became more robust.

In today's environment the generation of code in static programming infrastructures is handled by templates that extend the static language. To put it simply, a template has a name, a type, and a body. When the template executes, syntactically correct code is generated. There are several different types of templates, and depending on which you use, the validity of the code at runtime will vary.

Code generators come in a couple of different flavors, and depending on the task at hand one may be favored over another. Multistage templating offers control and well-defined legalities. With this type of templating we get static safety checking that ensures type-correctness. MetaML, MetaOCaml, and MetaD are examples of multistage metaprogramming template languages. The issue with multistage templates is that the well-defined legalities decrease flexibility and add complexity to the language. This means that the set of programs a template can match becomes narrower and narrower. Remember that we are trying to generalize our generation of code to make it applicable to many different situations. Single-stage generators offer increased flexibility at the cost of less stringent type checking. They also allow for the generation of arbitrary code. SafeGen is an example of a single-stage metaprogramming template language.

There are also several different categories of templates, including expression templates for developing embedded domain specific languages, metafunctions for computing types and numbers, and trait templates and nested templates for representing metainformation.

Some key applications of metaprogramming are translating a program from one language to another, transforming a program, refactoring a program, optimizing a program, verifying a program, type checking a program, and applying design patterns to a program. A visual representation of how template metaprogramming can be applied is shown below.

Figure: Applications of template metaprogramming

Drawbacks of Metaprogramming

To talk about the challenges that occur when working with metaprograms, let's start off with an example. Ruby is a dynamic language where code generation is a key focus of its design. The class definition below:

class Person
   attr_accessor :name
end

is equivalent to:

class Person
   def name
      @name
   end
   def name=(new_name)
      @name = new_name
   end
end

This is a basic metaprogramming construct that is used every day in Ruby. There is one minor drawback, however. The instance variable is not initialized until we set its value through the generated writer. This means it is possible to receive nil if we read this attribute before writing to it. While this is a rather simple example that isn't going to cause anyone too many headaches, it does show that it is very easy to introduce bugs into your code through metaprogramming if you are not careful. Such bugs can also stay hidden for quite some time before being discovered, usually by the end user of the system and not by the developer. Here is a list of some of the most common challenges encountered with metaprogramming:

Debugging: Debugging is difficult in template metaprograms because it is challenging to verify them for all possible inputs. Another issue is that type names in the generated code can easily run to thousands of characters due to nesting.
Error Reporting: Error reporting is lacking because small changes or mistakes in a template can generate vastly differing code with a long list of errors that do not point back to the template that was actually written.
Readability: Because there are no agreed-upon standards for metaprogramming structure and static languages typically were not built with metaprogramming as a key focus, template code tends not to be readable (generated code even less so).
Compilation Time: Template metacode is interpreted by the compiler rather than compiled. This may extend compilation time by orders of magnitude.
Compiler Size: Compiler sizes vary, but when generating code that performs complex computations, it is something to be careful of.
Portability: Template metaprogramming is evolving at a rapid pace. A particular compiler may not support what the metaprogrammer is trying to do.
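Returning to the attr_accessor example that opened this section, the following Ruby sketch shows how the uninitialized attribute turns into an error far from its cause, along with one possible guard. The CheckedPerson class is a hypothetical name, and initializing in the constructor is just one common option, not the only remedy.

```ruby
class Person
  attr_accessor :name
end

person = Person.new

# The generated reader works, but nothing has assigned @name yet.
unset_name = person.name
defined_yet = person.instance_variable_defined?(:@name)

# The problem only surfaces later, when the value is finally used.
error = nil
begin
  person.name.upcase
rescue NoMethodError => e
  error = e
end

# One possible guard: require the value when the object is created.
class CheckedPerson
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

checked = CheckedPerson.new("Ada")
```

The read itself never fails; only the later call on the returned nil does, which is why bugs of this kind can stay hidden for a long time.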

Touching on these challenges brings up a key point: metaprogramming is complicated. In fact, before even beginning to write your first metaprogram there are two central challenges to overcome. First, there is no consensus on a general approach to metaprogramming. Second, there is a steep learning curve involved in developing with metalanguages. This is compounded when extending a static language, as we are doing here. When extending a static language, a great deal of effort is required of us as metaprogrammers in order to query the static code. Also, once our metaprogram is complete, it will be more difficult to maintain. With metaprogramming in a static language you will also always have a lack of extensibility due to strictly defined typing. One thing to keep in mind before writing your first metaprogram on a complex piece of code is that you will have to deal with that complexity at least twice, and possibly more often when nesting your metaprogram.

Addressing the Drawbacks

Several template libraries have been developed to try to address the challenges of metaprogramming. Each has its own approach to overcoming these hurdles. While none of them is the perfect solution, they are evidence that great strides are being made. We will take a look at some of them now.

Reflex

Reflex is a template library that extends Java and utilizes what it calls a "Cut", a "Link", and an "Action". In Reflex a Cut is a specification of which programs are of interest to the metaprogram. An Action is what the metaprogram would like to do to the programs of interest. The Link handles the interconnection between a Cut and an Action. Reflex also uses Metaobjects to do the work during Actions. A Metaobject can be any standard Java object as long as it implements the needed protocol. There are two types of Links, Structural and Behavioral. Structural Links bind a set of classes to a metaobject, so that the definitions of these classes can be modified. Structural Links are meant to modify classes before they are loaded. This includes adding members, implementing a new interface, and changing method signatures.

Figure: Reflex Structural Model

In the example below we see a metaobject that adds a unique identifier to instances of a supplied class. This in turn adds a field, an interface, and a method to the class.

SMetaobject uidAdder = new SMetaobject(){
   // Invoked with the class that is to be modified
   public void handle(RClass aClass){
      aClass.addField(...);
      aClass.addInterface(...);
      aClass.addMethod(...);
   }
};
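For readers who want something they can run, here is a rough Ruby analogy of the same idea. It is not Reflex code and does not use any Reflex API: the names UidAdder, Identifiable, and Order are hypothetical, a Ruby module stands in for the added interface, and the change is applied to an already loaded class at runtime, whereas Reflex Structural Links act before the class is loaded.

```ruby
require 'securerandom'

# Stand-in for the interface that would be added (hypothetical name).
module Identifiable
end

# Plays the role of the structural metaobject; not part of any library.
class UidAdder
  # Receives the class to modify, mirroring handle(RClass) in the Java example.
  def handle(klass)
    klass.include(Identifiable)          # analogous to addInterface
    klass.class_eval do
      # Analogous to addField and addMethod: a lazily created @uid and its reader.
      define_method(:uid) do
        @uid ||= SecureRandom.uuid
      end
    end
  end
end

class Order
end

UidAdder.new.handle(Order)

first_order  = Order.new
second_order = Order.new
puts first_order.uid
puts second_order.uid
```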

Reflex is notable because it attempts standardization in metaprogramming. It does this by supporting the Open Metaobject Protocol (MOP) specification. MOP is a protocol between the base program and metaobjects. It is not a fixed protocol and can be customized down to its finest details. This gives Reflex flexibility as well as specificity. Below is another example of Reflex. In this example we see the ability to modify objects of the subclasses of a class, as well as fields of objects of the same type as that parent class that are accessed outside of those objects.

// Hookset over MsgReceive operations, using the class selector a and PublicOS
Hookset mExecs = new PrimitiveHookset(MsgReceive.class, a, new PublicOS());
// Hookset over FieldAccess operations in classes not selected by a
Hookset fAccess = new PrimitiveHookset(FieldAccess.class, new NotCS(a),
   new OperationSelector(){
      public boolean accept(Operation op, RClass c){
         return a.accept(((FieldAccess)op).getTargetType());
      }
   });

Hookset useOfA = new CompositeHookset().add(mExecs).add(fAccess);

SafeGen

SafeGen is another metaprogramming language for Java. In SafeGen the user specifies all legal inputs. These inputs are analyzed and, if they pass, SafeGen guarantees that well-formed Java code will be output from the generator. SafeGen is based on two main constructs: Cursors and Generators. A Cursor is a range variable. SafeGen views a base program as a collection of logical facts about all of its type declarations, and a Cursor ranges over these facts. A Generator is a way for SafeGen to express Java code fragments. These code fragments can take Cursors as their parameters.

An example of a Generator

#defgen trivialGen(){
   class C {public void meth() {}}
}

An example of a Generator taking a Cursor as a parameter

#defgen addFields(Class c){
   #foreach (Field f : FieldOf(f,c)) {int #[f];}
}

Inside the generator body we can have any valid Java syntax. We can also have SafeGen-specific constructs that help to direct the control and data flow of the Generator.

SafeGen also allows us to define predicates. We can define these predicates either inside or outside a Generator. We can also pass a predicate to a Generator as a parameter. The example below defines a predicate stating that every class implements the java.io.Serializable interface.

forall (Class c) : (exists (Interface i) :
   (InterfaceOf(i, c) & i.Name = "java.io.Serializable"))
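The quantifier structure of this predicate (for all classes, there exists an interface with a given property) can be mirrored in a short Ruby sketch. This is only an analogy under stated assumptions: the Serializable module and the classes Invoice, Receipt, and Scratch are hypothetical, and the check runs at runtime over an explicit list of classes, whereas SafeGen reasons about such predicates statically.

```ruby
# Marker module standing in for java.io.Serializable (hypothetical).
module Serializable
end

class Invoice
  include Serializable
end

class Receipt
  include Serializable
end

class Scratch
end

# forall (Class c) : exists (Interface i) : InterfaceOf(i, c) and i is Serializable
def all_serializable?(classes)
  classes.all? do |c|                                   # forall c
    c.included_modules.any? { |i| i == Serializable }   # exists i
  end
end

holds_for_documents = all_serializable?([Invoice, Receipt])
holds_with_scratch  = all_serializable?([Invoice, Receipt, Scratch])
```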

SafeGen is a notable metaprogramming language because it seeks to solve the problem of generators creating code with bugs in it. It uses an automated theorem prover (SPASS) in an effort to prove correctness conditions under all inputs. Where SafeGen excels over other generating templates is in cooperating with various parts of a program at runtime. Most generators are good at type validation when the entire code base is available. In metaprogramming, however, we may have a partially generated program that needs to interact with another part of a program whose structure might not be known until runtime. This is an area in which SafeGen performs well and many other code generators fall behind.

Clojure

As you may have noticed, the libraries we have looked at in detail so far allow a metaprogrammer to write generators in some form of the base language. There is a conscious movement among the development community towards homogeneous metaprogramming languages and away from heterogeneous ones. Heterogeneous metaprogramming languages are written in a language other than the base language and are more difficult to learn and to use. Heterogeneous languages like TXL, Stratego/XT, and ML do offer more flexibility than the homogeneous variety, but static language tools have different levels of support for these other languages. For these reasons heterogeneous metaprogramming templates have not generated as much support in the static language community as the homogeneous ones have.

Clojure is a newer language, a Lisp dialect hosted on the JVM, that fully embraces the homogeneous approach. Whereas many other generator libraries augment Java grammar, Clojure integrates fully with the Java ecosystem. Clojure compiles directly to JVM bytecode, and a port targets the Common Language Runtime. It supports a functional style, pragmatic facilities for concurrent programming, metaprogramming, and domain specific languages. Below is an example of Clojure generating a class.

(ns clojure.examples.hello
    (:gen-class))
 
(defn -main
  [greetee]
  (println (str "Hello " greetee "!")))

Below is an example of how a metaprogramming language such as Clojure (along with a DSL called Magic Potion) would integrate into the development cycle of a typical production project, in this case for an art gallery.

Figure: Clojure's place within a normal development cycle

Conclusions

Metaprogramming for statically typed languages has seen a great deal of progress in the last several years; however, there is still a great deal of progress left to be made. Many developers and software engineers still see metaprogramming as just too unwieldy, undisciplined, and risky to use in production applications. Some of those concerns are warranted and some are a thing of the past. For those programmers who do subscribe to the practice of metaprogramming in statically typed languages, there are still questions over the necessity of a metaprogramming language infrastructure. Tens of thousands of programmers worldwide use a simple text-based tool called XDoclet to manage their code templates and code generation to interface with J2EE application servers. This bypasses the need for a metaprogramming architecture altogether.

Even though metaprogramming in static languages is still a work in progress, I believe a clear direction and focus has been discovered. In future implementations of code generators for statically typed languages, it is my opinion that you will see much more homogeneity. This greatly reduces complexity and makes metaprograms easier to learn and easier to maintain. We will also see much more modularized and direct metaprograms. Implementing additional features that are not the direct goal of the project serves to confuse and slow down productivity, in metaprograms especially. Metaprograms will be written with the support of the base language's standard tools and not specialized constructs that would reduce portability. Metaprograms will also be transparent to the base language. Finally, modeling spaces will be used more heavily to describe what the metaprogram is for and why it was created.

Metaprogramming is a rich and powerful paradigm that we in the development community have yet to fully harness. As time moves forward and our compilers and architectures improve, we will begin to see metaprograms that redefine productivity in static language production applications.