CSC/ECE 517 Fall 2010/ch3 3i MM
Introduction
Computer programs are made up of little more than instructions and data. How those instructions and data came into being is irrelevant to the hardware on which the program is executed. Many different computer programming languages exist[1] that can be compiled or interpreted into executable instructions. These languages are largely split into two realms: dynamic languages and static languages.
Ruby is a high-level, dynamic programming language that is useful for everything from short scripting assignments to entire web applications and desktop GUI applications. However, there are inherent downsides to using Ruby. Although Ruby is generally far more flexible due to its dynamic, interpreted nature, using C or C++ code for some tasks can improve memory usage and raw execution speed[2].
However, what if you could combine the performance of C with the dynamic elements of Ruby? It seems like that should be possible, since the standard Ruby interpreter is itself written in C. Luckily, the maintainers of Ruby have thought ahead and have made that possible in several different ways. There are ways to extend Ruby's capabilities with C code, and there are ways to use Ruby functions within your own C application. This page will attempt to cover aspects of both of these mechanisms.
Reasons to Mix Static and Dynamic Languages
Ruby, Perl and Python are very easy to use and generally easier to maintain than static language projects, but sometimes a project may already have elements written in C, C++ or Java. Sometimes there are vendor supplied compiled libraries that you would like to use in your Ruby application. In those cases, you may want to write a thin wrapper layer in the C or Java API for Ruby or Python around those already existing libraries, and continue the main part of your application in the dynamic language.
Sometimes the reverse is true. If a large project exists in a dynamic language, and you would like to use parts of that in a different application written in a static language, you may write small hooks into the dynamic code using the same APIs from before.
Reasons to use a static language:
- Execution speed[3]
- Memory efficiency
- Existing 3rd party/IHV libraries
- Existing internally developed libraries
Reasons to use a dynamic language:
- Easier to get started (subjective)
- Easier to modify and maintain (subjective)
- Useful 3rd party libraries far too difficult to replicate in a static language
- Large existing internally developed project
- You really do not mind requiring a local interpreter.
- Any static pieces can be easily wrapped with the dynamic language's static API
Those are very abstract reasons to use a particular type of language, and there are still more reasons to use a specific language. The list is intended simply as an abstract guide. To aid in understanding the concept of mixing static and dynamic languages, here is a quick list of examples that mix the two.
Examples:
- JSON parsing is 10x faster in C than in Python[4]
- cPickle vs pickle in Python
- Language bindings to static, compiled language GUI toolkits (PyQt, QtRuby)
- Large amounts of remotely hosted database code is already written in, e.g. Perl, but a desktop GUI application needs to access those databases
- Embedded devices require C or C++, but an out of band test harness may be written in any language, and a dynamic language would better fit that project
The possibilities are truly endless. The key may be understanding that any dynamic language is usually written in some static language, which opens the door for mixing the two. Now, on to some actual implementations and techniques for doing this.
A comprehensive overview of every static and dynamic language combination does not seem prudent for this forum, as there are just too many. Instead, this page will attempt a brief overview of several techniques to mix Ruby and C, and will show one instance where large speedups are realized through the use of C in a Ruby program.
Using C from Ruby
Following are some overviews and examples of ways to create C or C++ extensions to Ruby.
Ruby C API
README.EXT
The README.EXT file contains the latest information and an overview of how to create Ruby extensions in C. It is an invaluable source of information, and is included in any source code distribution of Ruby. The link is to the latest HEAD version in Ruby's official subversion repository, but you may want to read the version that came with your installed version of Ruby.
In Unix-like distributions, for example, this file may be installed in '/usr/share/doc/ruby1.8-dev/README.EXT.gz'. To read it, type the following at a command prompt:
> zless /usr/share/doc/ruby1.8-dev/README.EXT.gz
VALUE
In C, every variable has a type. In Ruby, everything is an object. To bridge the gap between C's static types and Ruby's dynamic objects, Ruby's creators came up with the "VALUE" typedef in C. In the Ruby C API, when a variable is of type "VALUE," you know that it either comes from the Ruby side of the program, will be returned to Ruby, or will be used by the Ruby side in some form or fashion[5]. The typedef VALUE is defined in ruby.h, and is typically an unsigned long. The Ruby interpreter uses VALUE either as a pointer to a larger Ruby data structure or, in the case of more primitive types such as Fixnums, booleans, and nil, to hold the value itself.
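The "value itself" case can be observed from the Ruby side. The following is a minimal sketch that assumes the standard C implementation of Ruby (MRI), where a small integer n is stored in the VALUE word as 2n+1 and object_id exposes that encoding. The helper name tagged_fixnum is hypothetical and exists only for this illustration; other Ruby implementations may behave differently.

```ruby
# Sketch (assumes MRI/CRuby): small Integers live directly in the VALUE word
# rather than behind a pointer, and object_id exposes that encoding.

# Hypothetical helper: the tagged encoding assumed for a small integer n.
def tagged_fixnum(n)
  (n << 1) | 1
end

[0, 1, 10].each do |n|
  puts "#{n}: object_id=#{n.object_id} tagged=#{tagged_fixnum(n)}"
end

# An immediate value has a single identity; heap objects get one per allocation.
puts 10.object_id == 10.object_id
puts String.new("ten").object_id == String.new("ten").object_id
```

The last two lines contrast an immediate value, which is always "the same object," with two separately allocated strings, each of which is reached through its own pointer.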
Example
This example shows a simple way to return the value "10" from a C function to Ruby when the method is called[6].
mytest.c
The C code below is the functional part of the Ruby C extension for this example[7].
// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE MyTest = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_mytest();

// Prototype for our method 'test1' - methods are prefixed by 'method_' here
VALUE method_test1(VALUE self);

// The initialization method for this module
void Init_mytest() {
    MyTest = rb_define_module("MyTest");
    rb_define_method(MyTest, "test1", method_test1, 0);
}

// Our 'test1' method.. it simply returns a value of '10' for now.
VALUE method_test1(VALUE self) {
    int x = 10;
    return INT2NUM(x);
}
This is relatively complex. The function that we will actually use directly from Ruby is the "method_test1(VALUE self)" function. You will notice that it returns a VALUE, which every function exposed to Ruby must do, whether it returns nil (Qnil in C) or a real value. Also, every such function must take at least one VALUE parameter, whether it uses it or not, because Ruby always passes the receiver of the call as the first argument. In this function, we call that VALUE self.
Every Ruby extension in C must also have an "Init_X" function, where X is the name that you will load with "require" from Ruby. Inside this Init function, we register a module name with rb_define_module and store the resulting module in the C variable MyTest. We then attach a method to that module with the rb_define_method call, which registers the Ruby name of the method, "test1," and gives it the C function that implements it, "method_test1." The final argument, 0, is the number of arguments the method takes. We will eventually use this method from Ruby by including MyTest and calling test1, so it is important to see where those names and functions are being connected here.
Also note the implementation of method_test1. The last line returns the VALUE produced by applying the macro INT2NUM to the "x" integer. The macro INT2NUM, as its name implies, takes a C integer and converts it to a Ruby number, which for a small value like this one is a Fixnum. Since a Fixnum is a "small" value to Ruby, the "VALUE" returned by method_test1 does not contain a pointer to an object anywhere, but rather the integer value stored in a special way in the unsigned long "VALUE" instance.
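For comparison, here is a rough pure-Ruby equivalent of what the C extension sets up. It is only a sketch for orientation: the real extension defines the same module and method names from C, and the variable name caller_object is just for this illustration.

```ruby
# Pure-Ruby sketch of what Init_mytest registers from C.
module MyTest
  # Mirrors method_test1: takes no arguments and returns the Integer 10.
  def test1
    10
  end
end

# rb_define_method adds an instance method, so the module must be mixed
# into an object (or included at the top level) before test1 can be called.
caller_object = Object.new.extend(MyTest)
puts caller_object.test1
```

Seeing the Ruby form makes it clearer why the test script later has to include MyTest before calling test1.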
extconf.rb
Ruby uses the mkmf package to make it very easy to create a Makefile for a Ruby C extension. The following is an example[8]:
# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'mytest'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)
The important thing here is that the "extension_name" variable, or anything passed to "dir_config" and "create_makefile" must match the module name that you want to load with "require" later in Ruby.
To create the Makefile for this extension, run the following command:
$ ruby extconf.rb
creating Makefile
You should see nothing but the message "creating Makefile" printed out if everything is successful. From there, simply type "make," and the C extension in mytest.c will be compiled into the "mytest" shared library. This library will be what is included in Ruby in the next step.
test.rb
This is a very simple Ruby script that loads the shared library Ruby C extension that we just created, and executes our test method:
require 'mytest'
include MyTest
puts test1
Output
$ ruby test.rb
10
As you can see, the module that we had to include was "MyTest," as specified in the call to rb_define_module("MyTest") in mytest.c. The method that we called was "test1," as specified as the second argument in the call, "rb_define_method(MyTest, "test1", method_test1, 0)." There really is not a lot of code here. Ruby makes it very easy to create the Makefile for your extension, and to include it in your Ruby project. You also do not need to jump through a bunch of hoops to return values from C functions, or specify arguments to C functions.
RubyInline
RubyInline is a gem that is designed to allow easy C extensions through embedding C code directly in Ruby classes themselves. These embedded C functions are compiled at runtime when needed, and work just as fast as if they were standalone C extensions. See below for more explanation.
Installation
RubyInline is a separate gem, available through rubygems, and can be installed thusly:
> gem install RubyInline
Example
This is a simple example that does the same as the C API example: it prints "10," as it is returned from a C function[9].
require 'rubygems'
require 'inline'

class Example
  inline(:C) do |builder|
    builder.c "int method_test1() {
      int x = 10;
      return x;
    }"
  end
end

puts Example.new.method_test1
To do anything with RubyInline you must require the "inline" library. The next step is to call the inline method inside a given class, in this case the "Example" class. The inline method yields a "builder" for the requested language, in this case ":C." The builder for C has a method, "c", to which you pass a string containing valid C code that you wish to have compiled. There is also a "c_raw" method that can take C code that is already written using the Ruby C API (e.g., it uses VALUE and such). After the block has configured the builder, control returns to the inline method, which calls "build" and "load" on the given builder. This compiles the string in a hidden directory (by default, .ruby_inline under your home directory) and loads the resulting method into the class where the inline method was called. This seems complicated, but taking a look at the RubyInline source on the RubyForge page is worthwhile. Note that RubyInline keeps enough information to know when the C source code has changed and needs to be recompiled; otherwise, all subsequent runs will use the already compiled shared object.
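The change-detection idea can be sketched in a few lines of Ruby. This is not taken from RubyInline's source and does not claim to match its actual mechanism; it only illustrates one way a tool could decide whether a compiled artifact is stale, by deriving the artifact name from a digest of the C source. The helper cache_name_for and the "Inline_" prefix are hypothetical.

```ruby
require 'digest/md5'

# Hypothetical helper, not RubyInline's API: derive a cache file name
# from the C source text, so any edit to the source yields a new name.
def cache_name_for(c_source)
  "Inline_#{Digest::MD5.hexdigest(c_source)}.so"
end

original = "int method_test1() { int x = 10; return x; }"
edited   = "int method_test1() { int x = 11; return x; }"

puts cache_name_for(original)
puts cache_name_for(original) == cache_name_for(original) # unchanged source, same file
puts cache_name_for(original) == cache_name_for(edited)   # edited source, different file
```

With a scheme like this, a rebuild is needed only when no file with the expected name exists yet.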
As of this writing, it appears that there are only builders for C and C++, but the option is available to develop your own builders for any other language that you wish.
SWIG
SWIG is a popular wrapper generator that can produce wrappers for C and C++ code in several high level programming languages, including Perl, Python, and Ruby.
Example
The SWIG distribution has a really good set of examples using Ruby. The first simple example is a good one, though, and there are a few things that could be added to it, which are covered below.
Simple Example
This is a simple setup to create a "greatest common divisor" function in C, and use it in Ruby by way of a SWIG-generated wrapper[10]. This example also adds a global variable, "Foo."
example.c
/* File : example.c */

/* A global variable */
double Foo = 3.0;

/* Compute the greatest common divisor of positive integers */
int gcd(int x, int y) {
  int g;
  g = y;
  while (x > 0) {
    g = x;
    x = y % x;
    y = g;
  }
  return g;
}
This snippet of C creates a function called "gcd," which takes two integers and returns their greatest common divisor. It also defines a global variable, "Foo," which we would like to use from Ruby. Next, we must create a SWIG interface file so we can use it to generate the wrapper.
example.i
%module example
extern int gcd(int x, int y);
extern double Foo;
This is a very simple file, and it's all that we need to generate many different language wrappers for this C function. All we need to do is tell SWIG what the module name is, and provide what resemble forward declarations for the function and the global variable.
Generating example_wrap.c
Now we can generate the Ruby wrapper by running the following command:
swig -ruby example.i
This creates a very large C file called "example_wrap.c." It may be useful to look over this file, but there is a lot going on there. In particular, the "_wrap_gcd(int argc, VALUE *argv, VALUE self)" function that gets generated checks to make sure it is called with the right number of arguments, and does argument type checking, which is very useful. SWIG also builds setter and getter functions for the "Foo" variable, which is nice.
Building this example
Unfortunately there are no fancy make-generating scripts to use with SWIG. You generally have to create and maintain your own Makefile. Rather than a whole makefile, however, here is a simple compilation line to build this example on an Ubuntu 10.04 x86_64 setup:
gcc -shared -fPIC -L/usr/lib -lruby1.8 -I/usr/lib/ruby/1.8/x86_64-linux/ example_wrap.c example.c -o example.so
Error during compilation
Also unfortunately, when this wiki page's author tried the code on his machine, there was a compilation error related to the resolution of the "Foo" symbol in the "example_wrap.c" file generated by SWIG. This seemed like an error in SWIG Version 1.3.40, although a likely cause is that the interface file above only tells SWIG what to wrap; declarations reach the generated C file only when they are also placed in a %{ ... %} block. The error was as follows:
example_wrap.c: In function ‘_wrap_Foo_get’:
example_wrap.c:1953: error: ‘Foo’ undeclared (first use in this function)
example_wrap.c:1953: error: (Each undeclared identifier is reported only once
example_wrap.c:1953: error: for each function it appears in.)
example_wrap.c: In function ‘_wrap_Foo_set’:
example_wrap.c:1966: error: ‘Foo’ undeclared (first use in this function)
This was resolved by adding the following declaration somewhere before the first use of the Foo variable in example_wrap.c:
extern double Foo;
If you receive the same error, try adding that line. Note that the declaration needs the type, double, to match the definition in example.c; a bare "extern Foo;" would be treated by the compiler as an int.
Ruby script using SWIG module
The build should produce a shared object library called "example.so," which can be used just like the .so libraries produced by the Ruby C API mkmf process. The following is a short script that uses the gcd function, and the Foo variable:
require 'example'
puts Example.gcd(10,20)
puts Example.gcd(120, 160)
puts Example.Foo
Output:
10
40
3.0
The test script is easy enough to follow. The Example.gcd() function gets called just as if it were a module method written in Ruby or with the Ruby C API.
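As a cross-check of the expected output, the gcd algorithm from example.c can be ported line by line to plain Ruby and compared against Ruby's built-in Integer#gcd (available in Ruby 1.9 and later). The name gcd_port is only for this sketch; it is not part of the SWIG example.

```ruby
# Line-by-line port of gcd() from example.c.
def gcd_port(x, y)
  g = y
  while x > 0
    g = x
    x = y % x
    y = g
  end
  g
end

# The same argument pairs used in the SWIG test script.
[[10, 20], [120, 160]].each do |x, y|
  puts "gcd(#{x}, #{y}) = #{gcd_port(x, y)} (Integer#gcd: #{x.gcd(y)})"
end
```

A port like this is also a convenient baseline when deciding whether wrapping the C version is worth the extra build step.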
This should be some demonstration of the power of SWIG. Rather than writing an extension in the native API of Ruby, with very little additional SWIG code, this simple C function can be made usable from Ruby, or from Python, Perl, or any language that SWIG supports. Traditionally, however, SWIG wrappers are much more complex, particularly for C++ applications.
Performance Comparison
This is a short test to compare an iterative calculation of PI in both a C function, and Ruby. The test yielded quite surprising results. This is a relatively inefficient way to calculate PI, but it works, nonetheless.
C PI Calculation
#include "ruby.h"

VALUE PiCalc_C = Qnil;

static VALUE pi_calc_c(VALUE self) {
    int numPartitions = 12000;
    int circleCount = 0;
    double interval = 0, pi = 0;
    int i = 0, j = 0;
    double a, b;

    interval = 1.0/(double)numPartitions;

    for (i = 0; i < numPartitions; i++) {
        a = (i + .5)*interval;
        for (j = 0; j < numPartitions; j++) {
            b = (j + .5)*interval;
            if ((a*a + b*b) <= 1) circleCount++;
        }
    }

    pi = (double)(4*circleCount)/(numPartitions * numPartitions);
    return rb_float_new(pi);
}

// The initialization method for this module
void Init_pi_calc_c() {
    PiCalc_C = rb_define_module("PiCalc_C");
    rb_define_method(PiCalc_C, "pi_calc_c", pi_calc_c, 0);
}
As you may have noticed, the Init_pi_calc_c() function simply creates a module called "PiCalc_C," instead of a class. It loads the "pi_calc_c" method into that module. Notice that the pi_calc_c(VALUE self) function still needs to take one argument no matter what.
RubyInline PI Calculation
require 'rubygems'
require 'inline'

module PiCalcRubyInline
  inline(:C) do |builder|
    builder.c "
      double pi_calc_rubyinline() {
        int numPartitions = 12000;
        int circleCount = 0;
        double interval = 0, pi = 0;
        int i = 0, j = 0;
        double a, b;

        interval = 1.0/(double)numPartitions;

        for (i = 0; i < numPartitions; i++) {
          a = (i + .5)*interval;
          for (j = 0; j < numPartitions; j++) {
            b = (j + .5)*interval;
            if ((a*a + b*b) <= 1) circleCount++;
          }
        }

        pi = (double)(4*circleCount)/(numPartitions * numPartitions);
        return pi;
      }"
  end
end
Note that the RubyInline version is substantially similar to the C function, but it returns a double instead of a VALUE, and it does not need to specify an unused "self" parameter in its method signature.
Ruby PI Calculation
module PiCalcRuby
  def pi_calc_ruby
    numPartitions = 12000
    circleCount = interval = pi = 0.0
    interval = 1.0/numPartitions
    # The exclusive range (0...n) matches the C loops, which stop at i < numPartitions
    for i in 0...numPartitions do
      a = (i + 0.5)*interval
      for j in 0...numPartitions do
        b = (j + 0.5)*interval
        if ((a*a + b*b) <= 1) then
          circleCount += 1
        end
      end
    end
    # circleCount is a Float, so this division is not truncated
    pi = (4*circleCount)/(numPartitions * numPartitions)
  end
end
As you can see, this is a relatively inefficient nested loop function that calculates PI. There is a fair amount of floating point arithmetic.
Results
Test Harness:
#!/usr/bin/env ruby
require 'pi_calc_ruby'
require 'pi_calc_c'
require 'pi_calc_rubyinline'

include PiCalcRuby
include PiCalc_C
include PiCalcRubyInline

def c_test
  start = Time.now
  pi = pi_calc_c()
  stop = Time.now
  puts "C PI Result: #{pi}"
  puts "C Time: #{stop - start}"
end

def ruby_inline_test
  start = Time.now
  pi = pi_calc_rubyinline()
  stop = Time.now
  puts "RubyInline PI Result: #{pi}"
  puts "RubyInline Time: #{stop - start}"
end

def ruby_test
  start = Time.now
  pi = pi_calc_ruby()
  stop = Time.now
  puts "Ruby PI Result: #{pi}"
  puts "Ruby Time: #{stop - start}"
end

c_test
ruby_inline_test
ruby_test
Results for Ruby 1.8:
$ ruby1.8 perf_test.rb
C Time: 1.193169
RubyInline Time: 1.192995
Ruby Time: 274.178188
Results for Ruby 1.9.1:
$ ruby1.9.1 -rubygems perf_test.rb
C Time: 1.192857364
RubyInline Time: 1.192744329
Ruby Time: 175.468406517
As you can see from the test, Ruby 1.8 took 274 seconds to calculate PI, and Ruby 1.9.1 took 175.5 seconds, but the C function and RubyInline each did it in about 1.2 seconds. That is a speedup of roughly 230x over Ruby 1.8 and roughly 147x over Ruby 1.9.1. Note that all three PI calculation versions produced the same PI value: 3.14159491666667. That value agrees with PI to five decimal places (3.14159), so the remaining digits are superfluous.
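For readers who want to repeat the Ruby measurement without waiting several minutes, the following sketch times the same midpoint-grid estimate with the standard library's Benchmark module instead of Time.now arithmetic. The method name estimate_pi and the grid size of 400 are choices made for this sketch; the tests above used 12000 partitions, so timings from this version are not comparable to the results listed.

```ruby
require 'benchmark'

# Same midpoint-grid estimate as pi_calc_ruby, with the grid size as a parameter.
def estimate_pi(num_partitions)
  interval = 1.0 / num_partitions
  circle_count = 0
  num_partitions.times do |i|
    a = (i + 0.5) * interval
    num_partitions.times do |j|
      b = (j + 0.5) * interval
      circle_count += 1 if a * a + b * b <= 1
    end
  end
  4.0 * circle_count / (num_partitions * num_partitions)
end

# A small grid keeps the run short; accuracy drops accordingly.
pi = nil
elapsed = Benchmark.realtime { pi = estimate_pi(400) }
puts "Ruby PI Result: #{pi}"
puts "Ruby Time: #{elapsed}"
```

Benchmark.realtime returns the elapsed wall-clock time of the block in seconds as a Float, which removes the repeated start/stop bookkeeping from each test method.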
Using Ruby from C
README.EXT
Once again, this file, included in the Ruby source code, has a section on using Ruby features from C. Section 2.2 is a brief overview of some techniques for doing this.
List of useful Ruby Functions
- Evaluate Ruby code
VALUE rb_eval_string(const char* ruby_code)
- Create a Ruby object (three steps: intern the class name, look up the class, instantiate it)
ID class_id = rb_intern("class-name");
VALUE class = rb_const_get(rb_cObject, class_id);
VALUE obj = rb_class_new_instance(argc, argv, class);
- Invoke a method
VALUE rb_funcall(VALUE receiver, ID method_id, int argc, ...)
VALUE rb_funcall2(VALUE receiver, ID method_id, int argc, VALUE* argv)
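It may help to see rough Ruby-level counterparts of the C functions listed above. The mapping in this sketch is approximate and is meant only as orientation: eval for rb_eval_string, a Symbol for the ID returned by rb_intern, const_get and new for rb_const_get and rb_class_new_instance, and send for rb_funcall.

```ruby
# rb_eval_string: evaluate a string of Ruby code.
sum = eval("1 + 2")

# rb_intern + rb_const_get + rb_class_new_instance:
# look up a class by name on Object, then instantiate it with arguments.
klass = Object.const_get(:String)
obj = klass.new("Hello, world!")

# rb_funcall: invoke a method by name with an explicit argument list.
shouted = obj.send(:upcase)          # no arguments
padded  = obj.send(:center, 17, "*") # two arguments

puts sum, obj, shouted, padded
```

In C, the argument count passed to rb_funcall plays the role of the argument list given to send here.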
Examples
Simple String case change
The following example shows a string being converted to all-caps by using the Ruby string function "upcase."[11]
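#include <stdio.h>
#include "ruby.h"

int main(int argc, char**argv) {
    // Start the embedded Ruby interpreter before using any Ruby API call
    ruby_init();

    VALUE ruby_string = rb_str_new2("Hello, world!");
    // RSTRING(...)->ptr works on Ruby 1.8; on Ruby 1.9 and later use RSTRING_PTR(...)
    printf("Original string: %s\n", RSTRING(ruby_string)->ptr);

    // Look up the ID for "upcase" and call it with zero arguments
    ID method_id = rb_intern("upcase");
    VALUE ruby_up_string = rb_funcall(ruby_string, method_id, 0);
    printf("Processed string: %s\n", RSTRING(ruby_up_string)->ptr);

    return 0;
}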
Output:
Original string: Hello, world!
Processed string: HELLO, WORLD!
First, the Ruby interpreter must be initialized by calling "ruby_init()." Then, the rb_str_new2() function is used to create a Ruby string. An ID for the "upcase" method is retrieved using "rb_intern", and the method is then called on the Ruby string with rb_funcall. Since the "VALUE" here is really just a pointer to the real string object, we have to use the RSTRING() macro to cast the VALUE to a Ruby string struct, then access the ptr member of that struct and print it. (On Ruby 1.9 and later, the RSTRING_PTR() macro should be used instead.) The printed output shows that the string has been converted into an uppercase string.
Other Mixed-language Combinations
In general, dynamic language interpreter maintainers will provide very good documentation for mixing it with the interpreter's native language. This is a list of those links. There may also be very good books for your particular language combination desires. There will almost invariably be some very good tutorials that can be found via the great brain enhancer: Google.
Perl and C/C++
Perl and Java
Python and C/C++
Python and Java
Conclusion
Ruby is an excellent dynamic programming language that can tackle a wide range of problems, but it is always nice to be able to extend its functionality. When performance is a pretty big concern, the Ruby C API comes to the rescue. Ruby provides 'mkmf,' and a bevy of macros and functions that make it very easy to write C code that can be used from your Ruby application. RubyInline is a very convenient way to write a few functions in C as well. SWIG might be complicated, and might require some planning ahead in your C or C++ application to adapt it for use with SWIG, but it does a good job of producing usable Ruby C API wrappers that can be loaded with "require" just like the extensions produced by the other methods.
References
- Dave Thomas, with Chad Fowler and Andy Hunt. Programming Ruby 1.9. The Pragmatic Programmers, LLC, 2009.