CSC/ECE 517 Fall 2009/wiki2 6 rp: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 1: Line 1:
-Author's Note=
=Author's Note=
Apologies.  This has been here since 9 October, but I was editing under the wrong account.  I made a change with the correct account, but it was a whitespace change that apparently didn't stick in a way that Expertiza picked up.  I have just gone in and made a non-whitespace change, and things seem to be working now.  I'm very sorry for taking so long to get this to appear.
Apologies.  This has been here since 9 October, but I was editing under the wrong account.  I made a change with the correct account, but it was a whitespace change that apparently didn't stick in a way that Expertiza picked up.  I have just gone in and made a non-whitespace change, and things seem to be working now.  I'm very sorry for taking so long to get this to appear.



Revision as of 06:55, 12 October 2009

Author's Note

Apologies. This has been here since 9 October, but I was editing under the wrong account. I made a change with the correct account, but it was a whitespace change that apparently didn't stick in a way that Expertiza picked up. I have just gone in and made a non-whitespace change, and things seem to be working now. I'm very sorry for taking so long to get this to appear.

Topic

While Smalltalk and Java were instrumental in bringing object-oriented programming into the mainstream, the 1990s also saw the development of hybrid languages that supported multiple programming paradigms, often combining procedural, object-oriented and functional approaches. Discuss the development and design considerations that went into languages like OCaml, Scala, F# and Fan, and how they build upon an object-oriented foundation to offer languages that offer type safety while maintaining brevity and expressive power.

Introduction

There are many ways to classify programming languages: on the basis of static or dynamic typing, type safety, high or low levels of abstraction, and by programming paradigm.

In this article, the focus is on programming paradigms, which can loosely be thought of as defining the atomic units a language offers as building block with which to build programs. There are three primary paradigms in use today:

Declarative languages span a wide array of programming styles. For simplicity in this article, we will focus on functional languages, which are an important subset of declarative languages.

Many languages provide only one of the paradigms to their users. FORTRAN, COBOL, Pascal, C and BASIC are all procedural, Simula, Smalltalk] and Eiffel are object-oriented and Haskell and ML are functional. These languages provide a single paradigm for programming, which limits programmers to the particular set of constructs available in that model.

The three paradigms evolved somewhat independently and offer unique constructs with which to solve programming problems. Some languages are multi-paradigm, however, and allow a programmer to use the tools most fitting for the problem at hand, sometimes mixing and matching freely within a single program. The original language of this type is Lisp, which dates back to the mid-1950s. In the last 20 years, many other hybrid languages have emerged, often derivatives of existing languages. C++ is derived from C, and offers both procedural and object-oriented approaches. Java, one of the most widely used languages today, offers both procedural and object-oriented paradigms as well. OCaml is derived from ML, and offers functional and object-oriented styles. There are many others that have become popular in recent years, like Python and Ruby.

In this article, our goal is to examine what these hybrid languages offer above and beyond strictly object-oriented languages. In particular, we will examine the features of object-oriented languages and functional languages, and how they have been blended in a modern hybrid language called Scala.

Features of Object Oriented Languages

The key construct provided by object-oriented languages is the object, which bundles the data and behavior together into a cohesive unit.

Encapsulation

Encapsulation is the technique of rolling behavior and data into an object that exhibits (and is handled in terms of) a particular behavior, rather than a particular composition. This technique allows the internals of an object to remain hidden from external entities, which in turn enables the object to modify its implementation over time while providing the same external behaviors.

For example, encapsulation would allow a class to add logging functionality without affecting other classes' collaboration with it. Any feature that is solely an implementation concern (speeding up a sorting algorithm, checking inputs for invalid data, changing the way resource errors are handled, etc.) can be evolved and modified over time within the object without breaking functionality elsewhere in the program that relies on the object in question.

Dependency Inversion

One of the major benefits of focusing on the behavior of objects rather than their implementation is that the behavior aspects can be well defined and captured in an interface. This allows code that once would rely on an unstable implementation to instead shift its dependency onto a stable interface, and have the implementation then implement the interface. This process is known as dependency inversion, because of the way in which it alter the dependency structure of the code.

For example, let us say we have a web server that makes use of a logging facility to log information to disk. In this case, the web server depends on the logging facility:

Later, it becomes apparent that the web server also want to be able to log to other destinations. To facilitate this, we introduce a Logger interface, and make have the web server depend on that, so as to insulate it from new types of loggers that may be introduced. This allows us to modify both the server and the loggers in isolation, having them both depend on the interface rather than each other, essentially reversing, or inverting the dependency diagram.

In this diagram, the behavior of the logger is well-defined and stable, and is captured in the interface represented by the yellow box. The blue boxes are classes that are implementations, and are insulated from one another by the interface, allowing them to vary independently without breaking the program as a whole. This is an example of the robustness that the object-oriented approach lends to a program through use of encapsulation.

Polymorphism and Dynamic Dispatch

The object-oriented approach of thinking in terms of behavior lends itself to have multiple implementations of a single behavior. We can see this is the case of the logger above. To the web server, a logger is simply an object with which logs can be stored. There are many possible implementations; we could use the file logger or the network logger as seen above, but we could also implement a console logger or a database logger. Essentially, this means that the web server could call a method to log information without any knowledge of exactly what kind of object it was calling. The raises the question of how (and when) a programming language decides which log method to call. Is it compiled in, or is it determined at runtime?

The answer is that it is determined at runtime. This type of late binding allows a program to remain flexible even after it is deployed, permitting end-users to determine what kind of logging functionality they want be passing in various implementations of the logger interface when the web server is run, even implementations that were written after the web server was written and deployed! This kind of flexibility is a result of polymorphism, which simply means that the an interface (like the logger interface in our example) can have many forms, or implementations. The way the language determines which one to invoke is through dynamic dispatch, which is essentially a lookup table maintained as the program runs that maps calls to methods like logging to the implementations specified.

The major theme here is that the object-oriented approach of favoring encapsulation has many effects that ripple throughout the design process, almost all of which allow programmers to reduce the brittleness of their software by delaying commitment to particular implementations.

Features of Functional Languages

Much like object-oriented languages, functional languages offer a dominant paradigm that has far-reaching consequences in software design. For object-oriented languages, that element is encapsulation, and for functional languages, it is that functions are first-class entities, which simply means that functions are treated just like any other value in the language. This allows them to be passed as parameters, delayed, composed, invoked, and returned from other functions. But what is a function?

A function's most notable trait is that it is free of side-effects, or state changes to the program that occur as a result of its execution. We could say a lot about side effects, but the general rule is that a function will always return the same value given the same input. This means that functions are stateless. Therefore, entities like random number generators and loggers are not functional, because they either store state or have some effect that is not related to the value they return. Another way to think of functions is that they are completely defined in terms of their input and output.

From this simple feature flows many characteristic language features. We will discuss them in turn.

Immutability

Maybe the most noticeable feature (or restriction) to programmers of languages that are not functional, immutability is one of the hallmarks of a functional language. Most non-functional languages allow assignment, or the ability to assign new values to variables as the program runs. In functional programming, a value is declared only once and then never changes. This feature of of the language flows out of the notion that functions have no side-effects, and is often viewed as more of a restriction than a feature.

The power of immutability comes from the fact that programs that fully utilize immutability are much simpler than those that do not, making them amenable to automated analysis by a computer. Computers can then prove various properties about function programs, like concurrent behavior (presence or absence of deadlock/livelock), memory usage, and what within what bounds various values in the program will lie.

Higher Order Functions

Higher order functions are simply functions that accept or return other functions as values. This is made possible because functions are treated as first-class entities, and allows very expressive implementations of many algorithms to be written.

There are several hallmark higher order functions: map, reduce, fold left/right and filter are a few examples. Going into the intricacies of each is beyond the scope of this article, but Abelson and Sussman's seminal Structure and Interpretation of Computer Programs has an excellent treatment of the subject.

Recursion

Programmers that are new to functional programming often balk at the notion that values cannot change over time (as variables do in imperative languages). If values cannot change, how can we execute loops?

The answer lies in recursion, where values are immutable, and loops are implemented as successive calls to the current function with new arguments. The avoids the complexity of having values change over time while offering all the power of the loops we often see in imperative languages (for...each, do...while, etc.) Again, a deep treatment of these language features is beyond the scope of this article, but again, Abelson and Sussman have addressed the topic at some length in their book.

Delayed (Lazy) Evaluation

An extremely powerful feature of functional programming that is enabled by immutability is lazy evaluation. This simply means that expressions can be defined, even recursively, and are not computed until the values that would result from that computation are needed. This is possible because values in functional programs don't change over time, which means expressions can be evaluated at any point in time as needed.

Take an imperative expression like this (in Java):

 int i = 4;
 int j = i + 3;
 i++;

If you evaluate these expressions in order, j will be equal to 7, but if you were to delay the evaluation of j until after the third line, j would be equal to 8. This is a very simple example of how variables that change over time require expressions be evaluated in a particular order to maintain the correctness of the program. Functional programming explicitly disallows the third line, and therefore enables the value of j to be calculated only when it is needed. The implications of this are significant, as we'll see in the next section.

Hybrid Object-Oriented/Functional Languages

To programmers of languages like SmallTalk, Simula and Java, many of the features provided in functional languages will seem foreign or strange. But, as discussed, there are significant advantages to the functional approach for certain types of programs. While the object-oriented paradigm has proven itself to be very useful, many modern languages have sought to integrate functional features as well to increase the expressiveness and power of the language. One such language is Scala, which, like Java, runs on the Java Virtual Machine (JVM), and is thus shares Java's libraries and portability while offering many functional features that are not available in object-oriented languages like Java. In the overview of Scala available on its homepage, Scala is briefly described:

[Scala] smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application."

How does Scala allow code sizes to be reduced so much? It uses sophisticated type inferencing to reduce type signature redundancy, allows for functional programming constructs to be used, and provides syntax for oft-used programming idioms. Essentially, it takes the lessons learned from decades of functional and object-oriented programming experience and rolls them into a modern hybrid language that can make full use legacy Java code.

Scala is representative of an entire class of hybrid object/functional languages like OCaml, F# and Fan, and therefore makes a good language for use in the examples in this article. The intent is not to present highly technical examples, but rather to give readers an impression of how hybrid languages increase expressive power over their single-paradigm counterparts.

Case Classes

Idiomatic object-oriented programming involves encapsulation of data, as previously discussed. In older object-oriented languages, this required defining internal data values, and then providing necessary get and set methods to access and/or modify those values. Here is an example a class in Java that provides only get methods for some (immutable) values it holds. It includes all the methods that should be included in such a class, including a toString, equals and hashCode method. Note that is an error if such methods are not included in structural Java classes, which is why I include them here.

 public class UserData {
     private final String firstName;
     private final String lastName;
     private final Integer age;
     public UserData(String firstName, String lastName, Integer age) {
         this.firstName = firstName;
         this.lastName = lastName;
         this.age = age;
     }
     public String getFirstName() {
         return this.firstName;
     }
     public String getLastName() {
         return this.lastName;
     }
     public Integer getAge() {
         return this.age;
     }
     @Override
     public boolean equals(Object obj) {
         if (obj == null) {
             return false;
         }
         if (getClass() != obj.getClass()) {
             return false;
         }
         final UserData other = (UserData) obj;
         if ((this.firstName == null) ? (other.firstName != null) : !this.firstName.equals(other.firstName)) {
             return false;
         }
         if ((this.lastName == null) ? (other.lastName != null) : !this.lastName.equals(other.lastName)) {
             return false;
         }
         if (this.age != other.age && (this.age == null || !this.age.equals(other.age))) {
             return false;
         }
         return true;
     }
     @Override
     public int hashCode() {
         int hash = 7;
         hash = 19 * hash + (this.firstName != null ? this.firstName.hashCode() : 0);
         hash = 19 * hash + (this.lastName != null ? this.lastName.hashCode() : 0);
         hash = 19 * hash + (this.age != null ? this.age.hashCode() : 0);
         return hash;
     }
     public String toString() {
         StringBuilder sb = new StringBuilder();
         sb.append("UserData[firstName: ");
         sb.append(this.firstName);
         sb.append(" lastName: ");
         sb.append(this.lastName);
         sb.append(" age: ");
         sb.append(this.age);
         sb.append("]");
         return sb.toString();
     }
 }

In Scala, all the functionality of the above class can be expressed with:

 case class UserData(val firstName: String, val lastName: String, val age: Int)

To reiterate, the version in Scala actually does include a constructor, accessor methods, three fields that can only be set once, equals and hashCode methods (comparing the structural equality of the two instances of the class), as well a meaningful toString method which prints the fields.

There are numerous other benefits in using case classes, including automatic destructuring in pattern matching. We will discuss these features in the next section.

This feature by itself vastly reduces both development and maintenance burden for developers. Classes in Scala are defined in lines, rather than entire files. All the potential errors that could creep into that code are removed, and the intent of the developer is very clear.

Destructuring Binds and Pattern Matching

Lisp, one of the earliest hybrid languages, offered primarily functional features with object-orientation built on top (in a system called CLOS). One piece of functionality written with Common Lisp's metaprogramming facility was the ability to execute destructuring binds, which are a powerful way of checking for a particular type (pattern matching) and simultaneously extracting values from the inside of that type to use them in some computation. In the spirit of Common Lisp, Scala also offers destructuring binds and pattern matching. Let's take a look at what they can do for us.

Let us suppose we want to make use of our UserData class, defined above. We want to find all users with ages over 18 and store their names in a database. If we were to write a method that would simply do the database store for a single user, it might look like this:

 public void storeName(UserData user) {
     Integer age = user.getAge();
     if (age > 18) {
         recordToDatabase(user.lastName, user.firstName);
     }
 }

In Scala, this would be a trivial pattern match, with a destructuring bind to capture the fields inside the UserData (in this case, we bind them to f, l and a):

 def storeName(user:UserData) = user match {
   case UserData(f, l, a) if a > 18 => recordToDatabase(l, a)
 }

In this case, we don't get an enourmous savings in terms of the number of lines of code used, but we gain tremendously in expressive power. Let's say we wanted to do the same thing, but not include users named "Bob". In Java:

 public void storeName(UserData user) {
     String firstName = user.getFirstName();
     if (firstName != "Bob") {
         Integer age = user.getAge();
         if (age > 18) {
             recordToDatabase(user.lastName, user.firstName);
         }
     }
 }

And now in Scala:

 def storeName(user:UserData) = user match {
   case UserData(f, l, a) if a > 18 && f != "Bob" => recordToDatabase(l, a)
 }

Hopefully this provides a little bit of insight into how destructuring can improve the expressiveness of your code. A more in depth discussion on pattern matching and binding can be be found in Part 4 of Code Commit's series "Scala For Java Refugees".

List Comprehension

In imperative languages (including both procedural and object-oriented languages), it is common to loop very often, either with a for loop or a while loop. Hybrid languages allow for this, and also incorporate more abstract operations on structures seen in functional programming.

For example, suppose we wanted to take in a structure containing a bunch of UserData objects, and return a list of strings, containing the last name, a comma, and then the first name of each user.

In Java, this might look like this:

 public ArrayList<String> toStrings(ArrayList<UserData> users) {
     ArrayList<String> returnedList = new ArrayList<String>();
     for (UserData u : users) {
         returnedList.add(u.getLastName() + ", " + u.firstName);
     }
     return returnedList;
 }

In the seven lines of code, only the method declaration and the add to the new list really have anything to do with the business logic. In Scala, that boiler plate can be avoided:

 def toStrings(users: List[UserData]) = users.map(u => u.lastName + ", " + u.firstName)

This method definition is so short that it is arguable we shouldn't even define the method, and should simply use the map operation inline:

 users.map(u => u.lastName + ", " + u.firstName)

This is a small demonstration of how functional languages and object-oriented languages an combine their strengths. We retain all the encapsulation of the object-oriented approach (although u.lastName and u.firstName appear to be fields, they are in fact accessor methods), but we also have the power of higher abstraction levels from the functional world (in this case, the map operation).

Lazy Evaluation

The final feature I'd like to touch upon is lazy evaluation. One of the many implications of lazy evaluation is that we can now program using infinite data structures; we define a way in which deeper parts of the data structure are calculated from the existing parts, and only lazily evaluate parts of the structure as they are accessed. This allows us to express algorithms quite concisely. This can be used in a variety of cases like random number generation, methods of successive approximation (like Newton's Method), fractal calculations, and generation of infinite series, as seen in the numerical methods or, more simply, the Fibonacci sequence. Because it is so well known, I will use the Fibonacci sequence as an example here. First, in Java:

  public long fib(int k) {
      if (k <= 2) {
          return 1;
      } else {
          return fib(k-1) + fib(k-2);
      }
  }

This is a short, inefficient, recursive solution. It doesn't cache answers, either in a single call or across calls, which means it performs a huge amount of redundant computation. So it has a number of problems, but it is concise.

In Scala, we can remedy all those problems and remain concise using a Stream, a lazily evaluated form of a List:

 val fibs: Stream[Int] = Stream(0,1) append fibs.zip(fibs.drop(1)).map(x => x._1 + x._2)

This definition is concise, offers automatic memoization and is thread-safe. To add those features to the Java code would require 20 lines of additional code at a minimum. It returns a Stream that can be queried for the kth Fibonacci number in the same way as the fib method defined in Java could, but calculates the 92rd Fibonacci in a fraction of a second.

Conclusion

Unfortunately, this article barely scratches the surface of the topic. There are decades of research and experience that have gone into modern programming languages, and the way in which they blend various programming paradigms blurs what were once quite distinct borders in the landscape of programming languages. But, that blurring has allowed these modern languages to offer programmers a bigger set of tools that aren't merely an amalgamation of features from every language, but a carefully crafted blend of features that enhance the expressiveness of these languages while not substantially increasing the syntax or API surface area.

Further Reading

If you want to know more about hybrid languages, its best to choose one (or a few!) and read about what they offer. If you're familiar with Java, or C#, take a look at the tour of Scala or perhaps some of the features of Fan.

There are also a few major dynamically typed hybrid languages. Chief among these is Python, which has several implementations in C, Java and Python(!). Ruby is also a newer hybrid language, originating in Japan, but was made famous worldwide because of its Rails web programming framework that was possible in large part due to its hybrid nature. It also has a Java implementation. Finally, if you're working on the JVM an want a hybrid, dynamically-typed language with syntax that is close to Java's, Groovy is your best bet.

If you're very interested in performance, Common Lisp may have something to offer you, and there are several performant implementations. If you want to learn more about why hybrid languages are so powerful, Paul Graham's book On Lisp explains why. If you really care about using the power of the JVM and are doing concurrent programming, a very modern lisp called Clojure will be very interesting.