CSC/ECE 517 Fall 2009/wiki1b 11 al
Overview
Static languages are known for its conservative approach to language processing, making most typing decisions at compile time itself thereby allowing the compilers to optimize on the byte code generated and provide better type-error checking. Thus, performance is tweaked in these languages. Dynamic languages, on the other hand, attempt to improve the productivity of the average developer, allowing him more flexibility in terms of type checking and so, delay most typing decisions as much as feasible. Thus they sacrifice slightly on performance, and allow for more expressiveness and freedom.
Introduction
The debate between statically typed languages and dynamic languages is one that has been going on ever since programming languages came into existence and will endure on, constructively influencing the way interpreters and compilers process languages for many years to come. Apart from speed of processing of the two types of languages, stability, manageability, expressiveness, efficiency of the tools available for each, the community, and the joys of programming in one particular language over another have been debated upon. Since Object Oriented Design is the logical de-facto heir to the structured methodology of programming, this debate continues over the subset of object oriented languages existing in the market at this time.
Type Systems
Type System is defined formally as
"A tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute."[1]
The study of type systems is called Type Theory and has a significant role in Computer Language Design, apart from Computer Architecture, Compiler Construction and study of Grammars. Simplistically viewed, a Type System provides an abstraction to a collection of bits. All data to a computer is sequence of bits. The hardware is incapable to segregating or separating them into memory addresses, instruction code, characters, integers and floating-point numbers, since all it sees is the bit stream. However, any application makes references to its available memory within the program, as variables. Type system provides the necessary abstraction to programmers, offering them a higher level, modular view at the implementation, by, for example, allowing them to think of strings as a char collection, rather than a stream of bytes.
Static Typing
Static typing, which dates back to the 1950s, was the first form of typing available. It was extensively used in various compiled and interpreted languages including Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, Perl and Scala. Static Typing performs a one-time evaluation of type information at compile time. This also serves to be a program verification [2] mechanism, wherein, many type errors are caught early in the development cycle. However the number of errors that are caught depends on the strength of the typing system. Also possible among these are “type inference” algorithms[3], which are used in the newer generations of statically typed languages like ML, C# etc which help in two ways :
- Determine the legal values possible for a variable
- Preventing the occurrence of incompatible values in mathematical expressions
Thus, this type is a typical guide to the compiler on how much storage needs to be allocated to a variable.
Dynamic Typing
Dynamic Typing is a newer phenomenon, introduced with languages like Small Talk and Lisp. It is now included in a variety of languages such as Lua, Groovy, JavaScript, Lisp, Objective-C, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl. In these languages type checking is delayed to be performed at runtime rather than at compile time[3]. Types are therefore associated with values rather than variables. This allows for more flexibility than the former because of the fewer –in – number compile checks, which are inherently less restrictive than the static typing checks. For example, typing will check if the programis syntactically correct. However, due to their very nature, the checks at run time need to be more vigorous. Dynamically Typed Languages, however, are known to pay a price for this flexibility – which is a higher cost for optimization. They also need a lot of unit testing per module.
Performance Comparison
It is a well accepted fact that dynamic languages offer the additional freedom to developers by sacrificing a performance gradient. However which among them is more productive is up for debate. In the following sections, we examine why static compilers lend itself to optimization and dynamic compilers do not.
Optimized Compilers
In static typing mechanism, the compiler goes thorough the source code, assigning labels, or rather types to the syntax, then uses them to infer something about the program’s behavior. Since most of the information is already known at compile time, this architecture lends itself well to various code optimizations. Type error checking, as mentioned above is also a strong forte of these languages, weeding out problems at compile time that could derail the application at run time. Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it produces optimized machine code. Further, compilers for statically typed languages can find assembler shortcuts more easily.[2] A compiler may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. For example, In many C compilers the "float" data type, for example, is represented in 32 bits, in accordance with the IEEE specification for single-precision floating point numbers. C thus uses floating-point-specific operations on those values (floating-point addition, multiplication, etc.).[2]
Dynamic Language Nature
Dynamic Object Oriented Languages try to minimize the number of static elements as possible. The code executed is known only at runtime.This is because it leaves as many decisions as possible to be resolved at runtime. This therefore results in code that is very difficult to analyze, and therefore very difficult to optimize. For example consider the trivial addition operation. You won’t be able to know what values the ‘+” will be bounded to at runtime. You cannot infer anything about the type of operands that will need the operation. In fact, you won’t be able to know what ‘+’ will be bounded to at runtime. This is the result of mutation [4] Further, names in dynamic typing are based on strings and lookups are employed to find them [5]. For example, to find fan.bar, you would find fan in the local variable hash table then find bar in fan’s hash table. In some dynamic languages, like Python and Ruby, there may be additional lookups/method calls to implement dynamically generated attributes. All these lookups are really hard to optimize. Work has been done to improve on, as mentioned in the next section, but these improvements are much more difficult to achieve.
Optimizations to Dynamic Languages
Researchers are looking at a number of ways to speed up dynamic languages. They are looking at various options to tweak, such as
- Improvements to the native language itself by providing support such as native threads, optional type system, etc
- Virtual Machine improvements
- Smarter Compliers ways to improvise over the compilers themselves. Examples worth mentioning are Python which has the most advanced hash table implementations and Javascript also has seen a marked improvement since the days of IE5.
Case Studies
Conclusion
In conclusion, it is a well known fact that dynamic languages are more expressive more flexible languages than their statically typed counterparts. But they achieve this flexibility, paying the price in performance and the difficulty in optimizations. However, the difference in performance gradient has narrowed down gracefully over the last couple of decades. Common Lisp and Smalltalk can now compete evenly with static languages in many use cases because dynamic lookups/modifications are more controlled.
See Also
Although we have compared the two typing methodologies over a single parameter, it would be worthwhile to study how they fare against each other in other parameters detailed in the introduction such as expressivess, documentation and the like. The efficiency of one over the other is a well documented debate with viewers blogging freely with their views about each.
References
[1] Pierce, Benjamin (2002). Types and Programming Languages.MIT Press
[2] Wikipedia - http://en.wikipedia.org/wiki/Type_system
[3] Adam Turoff - http://notes-on-haskell.blogspot.com/2008/05/static-vs-dynamic-languages.html
[4] Yann Dauphin - http://npcontemplation.blogspot.com/2008/06/want-to-make-dynamic-languages-faster.html
[5] stackoverflow.com - http://stackoverflow.com/questions/761426/why-are-dynamically-typed-languages-slow
[6] Steve Yegge - http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html