CSC/ECE 517 Fall 2009/wiki1b 11 al
Static Vs. Dynamic Object Oriented Languages from the Perspective of Performance
Static languages are known for its conservative approach to language processing, making most typing decisions at compile time itself thereby allowing the compilers to optimize on the byte code generated and provide better type-error checking. Thus, performance is tweaked in these languages. Dynamic languages, on the other hand, attempt to improve the productivity of the average developer, allowing him more flexibility in terms of type checking and so, delay most typing decisions as much as feasible. Thus they sacrifice slightly on performance, and allow for more expressiveness and freedom.
Overview
The debate between statically typed languages and dynamic languages is one that has been going on ever since programming languages came into existence and will endure on, constructively influencing the way interpreters and compilers process languages for many years to come. Apart from speed of processing of the two types of languages, stability, manageability, expressiveness, efficiency of the tools available for each, the community, and the joys of programming in one particular language over another have been debated upon. Since Object Oriented Design is the logical de-facto heir to the structured methodology of programming, this debate continues over the subset of object oriented languages existing in the market at this time.
Type Systems
Type System is defined formally as
"A tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute."[1]
The study of type systems is called Type Theory and has a significant role in Computer Language Design, apart from Computer Architecture, Compiler Construction and study of Grammars. Simplistically viewed, a Type System provides an abstraction to a collection of bits. All data to a computer is sequence of bits. The hardware is incapable to segregating or separating them into memory addresses, instruction code, characters, integers and floating-point numbers, since all it sees is the bit stream. However, any application makes references to its available memory within the program, as variables. Type system provides the necessary abstraction to programmers, offering them a higher level, modular view at the implementation, by, for example, allowing them to think of strings as a char collection, rather than a stream of bytes.
Static_Typing
Static typing, which dates back to the 1950s, was the first form of typing available. It was extensively used in various compiled and interpreted languages including Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, Perl, Scala. Static Typing performs a one-time evaluation of type information at compile time. This also serves to be a program verification [2] mechanism, wherein, many type errors are caught early in the development cycle. However the number of errors that are caught depends on the strength of the typing system. Also possible among these are “type inference” algorithms[3], which are used in the newer generations of statically typed languages like ML, C# etc which help in two ways :
- Determine the legal values possible for a variable
- Preventing the occurrence of incompatible values in mathematical expressions
Thus, this type is a typical guide to the compiler on how much storage needs to be allocated to a variable.
/* C Code */ int num; num = num + 10;
The above code snippet shows how the variable 'num' would have to be explicitly declared as an integer in a statically typed language like C.
Dynamic Typing
Dynamic Typing is a newer phenomenon, introduced with languages such as Lua, Groovy, JavaScript, Lisp, Objective-C, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl. In these languages type checking is delayed to be performed at runtime rather than at compile time[3]. Types are therefore associated with values rather than variables. This allows for more flexibility than the former because of the fewer –in – number compile checks, which are inherently less restrictive than the static typing checks. For example, typing will check if the programis syntactically correct. However, due to their very nature, the checks at run time need to be more vigorous. Dynamically Typed Languages, however, are known to pay a price for this flexibility – which is a higher cost for optimization. They also need a lot of unit testing per module.
irb(main):001:0> num = 10 => 10 irb(main):002:0> num = 'string' => "string" irb(main):003:0>
In the example above, the variable 'num' is not declared as an integer or string. Instead at runtime, it's type is assigned as appropriate.
Performance Comparison
It is a well accepted fact that dynamic languages offer the additional freedom to developers by sacrificing a performance gradient. However which among them is more productive is up for debate. In the following sections, why static compilers lend itself to optimization and dynamic compilers do not is examined.
Optimized Compilers
In static typing mechanism, the compiler goes thorough the source code, assigning labels, or rather types to the syntax, then uses them to infer something about the program’s behavior. Since most of the information is already known at compile time, this architecture lends itself well to various code optimizations. Type error checking, as mentioned above is also a strong forte of these languages, weeding out problems at compile time that could derail the application at run time. Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it produces optimized machine code. Further, compilers for statically typed languages can find assembler shortcuts more easily.[2] A compiler may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. For example, In many C compilers the "float" data type, for example, is represented in 32 bits, in accordance with the IEEE specification for single-precision floating point numbers. C thus uses floating-point-specific operations on those values (floating-point addition, multiplication, etc.).[2]
Dynamic Language Nature
Dynamic Object Oriented Languages try to minimize the number of static elements as possible. The code executed is known only at runtime.This is because it leaves as many decisions as possible to be resolved at runtime. This therefore results in code that is very difficult to analyze, and therefore very difficult to optimize. For example consider the trivial addition operation. It will not be known, what values the '+' will be bounded to at runtime. Nothing can be inferred about the type of operands. This is the result of mutation[4]. Further, names in dynamic typing are based on strings and look-ups are employed to find them [5]. For example, to find fan.bar, fan would have to be found in the local variable hash table and then find bar in fan’s hash table. In some dynamic languages, like Python and Ruby, there may be additional lookups/method calls to implement dynamically generated attributes. All these lookups are really hard to optimize.
Researchers are looking at a number of ways to speed up dynamic languages. They are looking at various options to tweak, such as
- Improvements to the native language itself by providing support such as native threads, optional type system, etc.
- Virtual Machine improvements
- Smarter Compliers ways to improvise over the compilers themselves. Examples worth mentioning are Python which has the most advanced hash table implementations and Javascript also has seen a marked improvement since the days of IE5.
Conclusion
In conclusion, it is a well known fact that dynamic languages are more expressive more flexible languages than their statically typed counterparts. But this flexibility is achieved at the cost of performance and the difficulty in optimizations. However, the difference in performance gradient has narrowed down gracefully over the last couple of decades. Common Lisp and Smalltalk can now compete evenly with static languages in many use cases because dynamic lookups/modifications are more controlled.
See Also
The two typing methodologies have been compared over a single parameter, however it would be worthwhile to study how they fare against each other in other parameters detailed in the introduction such as expressivess, documentation and the like. The efficiency of one over the other is a well documented debate with technology enthusiasts blogging freely with their views about each.
References
[1] Pierce, Benjamin (2002). Types and Programming Languages.MIT Press
[2] Wikipedia - http://en.wikipedia.org/wiki/Type_system
[3] Premshree Pillai - http://articles.sitepoint.com/article/typing-versus-dynamic-typing
[4] Adam Turoff - http://notes-on-haskell.blogspot.com/2008/05/static-vs-dynamic-languages.html
[4] Yann Dauphin - http://npcontemplation.blogspot.com/2008/06/want-to-make-dynamic-languages-faster.html
[5] stackoverflow.com - http://stackoverflow.com/questions/761426/why-are-dynamically-typed-languages-slow
[6] Steve Yegge - http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html