CSC/ECE 517 Fall 2010/ch3 3i MM

From Expertiza_Wiki
Revision as of 19:06, 15 October 2010 by Mamatthe

Introduction

Computer programs are made up of little more than instructions and data. How those instructions and data come into being is irrelevant to the hardware on which the program is executed. Many different computer programming languages exist[1] that can be compiled or interpreted into executable instructions. These languages are largely split into the realms of dynamic and static languages.

Ruby is a high-level, dynamic programming language that is useful for everything from short scripting assignments to entire web applications and desktop GUI applications. However, there are inherent downsides to using Ruby. Although Ruby is generally far more flexible due to its dynamic, interpreted nature, using C or C++ code for some tasks can improve memory usage and raw execution speed[2].

However, what if you could combine the performance advantages of C with the dynamic elements of Ruby? It seems like that should be possible, since the primary Ruby interpreter is actually written in C. Luckily, the maintainers of Ruby have thought ahead and made that possible in several different ways. There are ways to extend Ruby's capabilities with C code, and there are ways to use Ruby functions within your own C application. Prior wiki chapters have focused primarily on mixing Ruby and Java, but this page covers mixing Ruby and C in more depth and provides resources for mixing other dynamic and static languages.

Reasons to Mix Static and Dynamic Languages

Ruby, Perl and Python are very easy to use and generally easier to maintain than projects in static languages, but sometimes a project already has components written in C, C++ or Java. Sometimes there are vendor-supplied compiled libraries that you would like to use in your Ruby application. In those cases, you may want to write a thin wrapper layer around those existing libraries, using the C or Java API for Ruby or Python, and write the main part of your application in the dynamic language.

Sometimes the reverse is true. If a large project exists in a dynamic language, and you would like to use parts of that in a different application written in a static language, you may write small hooks into the dynamic code using the same APIs from before.

Additionally, you may want to add hooks to an application written primarily in a compiled language so that it can be extended with a scripting language. Finally, dynamic language bindings to existing compiled projects are a popular way to use, for example, entire GUI toolkits from a dynamic language.

Reasons to use a static language

In a mixed language environment, these are reasons you might want to use the static language:

  • Execution speed[3]
  • Memory efficiency
  • Existing 3rd party/IHV libraries
  • Existing internally developed libraries

Reasons to use a dynamic language

In a mixed language environment, these are reasons you might want to use the dynamic language:

  • Easier to get started (subjective)
  • Easier to modify and maintain (subjective)
  • Useful 3rd party libraries far too difficult to replicate in a static language
  • Large existing internally developed project
  • You really do not mind requiring a local interpreter.
  • Any static pieces can easily be wrapped using the dynamic language's API for the static language (for example, the Ruby C API)

Those are very abstract reasons to use a particular type of language, and there are many more reasons to choose a specific language; the lists are intended simply as a general guide. To aid in understanding the concept of mixing static and dynamic languages, here is a quick list of example situations that mix the two.

Example Situations for Mixing Static and Dynamic code
  • JSON parsing is 10x faster in C than in Python[4].
  • cPickle, the C implementation of Python's pickle module, vs. the pure-Python pickle.
  • Language bindings to GUI toolkits written in static, compiled languages (wxRuby, PyQt, QtRuby).
  • Large amounts of remotely hosted database code are already written in, e.g., Perl, but a desktop GUI application needs to access those databases.
  • Embedded devices require C or C++, but an out-of-band test harness may be written in any language, and a dynamic language would better fit that project.
  • Allowing scripted extensions to your application, as in Civilization IV and Blender 3D.

The possibilities are truly endless. The key is understanding that most dynamic languages are themselves implemented in some static language, which opens the door to mixing the two. Now, on to some actual implementations and techniques for doing this.

A comprehensive overview of every static and dynamic language combination does not seem prudent for this forum, as there are just too many. Instead, this page will attempt a brief overview of several techniques to mix Ruby and C, and will show one instance where large speedups are realized through the use of C in a Ruby program.

Using C from Ruby

Following are some overviews and examples of ways to create C or C++ extensions to Ruby.

Ruby C API

README.EXT

The README.EXT file contains the latest information and an overview of how to create Ruby extensions in C. It is an invaluable source of information and is included in every source code distribution of Ruby. The link is to the latest HEAD version in Ruby's official Subversion repository, but you may want to read the version that came with your installed version of Ruby. README.EXT also explains how to use Ruby from a C program, which is covered later on this page.

On Debian-based Linux distributions such as Ubuntu, for example, this file may be installed as '/usr/share/doc/ruby1.8-dev/README.EXT.gz'. To read it, type the following at a command prompt:

> zless /usr/share/doc/ruby1.8-dev/README.EXT.gz

VALUE (a Ruby type in C)

In C, every variable has a type. In Ruby, everything is an object. To bridge the gap between C's static types and Ruby's dynamic objects, Ruby's creators came up with the VALUE typedef in C. In the Ruby C API, when a variable is of type VALUE, you know that it either comes from the Ruby side of the program, will be returned to Ruby, or will be used by the Ruby side in some form or fashion [5]. The VALUE typedef is defined in ruby.h and is typically an unsigned long. The Ruby interpreter uses a VALUE either as a pointer to a larger Ruby data structure or, in the case of more primitive types such as Fixnums, booleans and nil, to hold the value itself.
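How a single unsigned integer can hold either a pointer or a small integer can be sketched in plain C. This is a simplified, hypothetical model rather than ruby.h itself: the names model_value, model_int2fix, model_fix2int, model_from_pointer and model_fixnum_p are made up for illustration. It assumes, as MRI does, that a Fixnum n is stored as 2n + 1 (shifted left one bit with the lowest bit set), while object pointers are aligned so that their lowest bit is always 0.

```c
#include <stdint.h>

/* Hypothetical stand-in for Ruby's VALUE: an unsigned integer that is
   wide enough to hold a pointer. */
typedef uintptr_t model_value;

/* Assumed tag (as in MRI): Fixnums have their lowest bit set to 1. */
#define MODEL_FIXNUM_FLAG ((model_value)1)

/* A toy heap object; a pointer to it is stored in a VALUE unchanged. */
struct model_object {
    long payload;
};

/* Encode a small integer n directly in the VALUE as 2n + 1. */
static model_value model_int2fix(long n) {
    return ((model_value)n << 1) | MODEL_FIXNUM_FLAG;
}

/* Decode: drop the tag bit and halve. Converting back to a signed type
   relies on the two's-complement behavior of mainstream compilers. */
static long model_fix2int(model_value v) {
    intptr_t s = (intptr_t)v;
    return (long)((s - 1) / 2);
}

/* Store an object pointer; alignment keeps its lowest bit at 0. */
static model_value model_from_pointer(struct model_object *p) {
    return (model_value)p;
}

/* The tag bit alone distinguishes an immediate Fixnum from a pointer. */
static int model_fixnum_p(model_value v) {
    return (v & MODEL_FIXNUM_FLAG) != 0;
}
```

Under this model, the 10 returned by method_test1 in the example below would be stored as 21 (binary 10101) rather than as a pointer, and no object is allocated for it.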

Example

This example shows a simple way to return the value "10" from a C function to Ruby when the method is called[6].

mytest.c

The C code below is the functional part of the Ruby C extension for this example[7].

// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE MyTest = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_mytest();

// Prototype for our method 'test1' - methods are prefixed by 'method_' here
VALUE method_test1(VALUE self);

// The initialization method for this module
void Init_mytest() {
	MyTest = rb_define_module("MyTest");
	rb_define_method(MyTest, "test1", method_test1, 0);
}

// Our 'test1' method.. it simply returns a value of '10' for now.
VALUE method_test1(VALUE self) {
	int x = 10;
	return INT2NUM(x);
}

Even this small file has several moving parts. The function that we will actually use from Ruby is "method_test1(VALUE self)". Notice that it returns a VALUE, which every such function must do, whether it returns a real value or Qnil. Every such function must also take at least one VALUE parameter, whether it uses it or not, because Ruby always passes the receiver of the method call. In this function we call that parameter self, as Ruby code would.

Every Ruby extension in C must also have an "Init_X" function, where X is the name that you will load with "require" from Ruby; Ruby calls this function when the library is loaded. Inside this Init function, rb_define_module creates a module named "MyTest", and we store that module in the C variable MyTest. We then attach a method to that module with the rb_define_method call, registering the name "test1" for the C function "method_test1". The last argument, 0, is the number of arguments the method takes in addition to self. Because rb_define_method creates an instance method, we will call it from Ruby as test1 after including MyTest (calling it directly as MyTest.test1 would require rb_define_module_function instead). It is important to see where those names and functions are being connected here.
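The wiring done by the Init function and rb_define_method can be pictured as filling in a table that maps a method name to a C function pointer and an arity. The sketch below is only an analogy under stated assumptions: toy_value, toy_define_method, toy_call and toy_init_mytest are hypothetical names, long stands in for VALUE, and this is not Ruby's actual implementation. It does show how a declared arity of 0 lets the caller reject a call with the wrong number of arguments before the C function ever runs.

```c
#include <string.h>

typedef long toy_value;                      /* hypothetical stand-in for VALUE */
typedef toy_value (*toy_method_fn)(toy_value self);

/* One registered method: the Ruby-visible name, the C function, and the
   number of arguments it takes besides self. */
struct toy_method {
    const char *name;
    toy_method_fn fn;
    int arity;
};

#define TOY_MAX_METHODS 16
static struct toy_method toy_table[TOY_MAX_METHODS];
static int toy_count = 0;

/* Analogy for rb_define_method(module, name, fn, arity).
   Returns 0 on success, -1 if the table is full. */
static int toy_define_method(const char *name, toy_method_fn fn, int arity) {
    if (toy_count == TOY_MAX_METHODS) return -1;
    toy_table[toy_count].name = name;
    toy_table[toy_count].fn = fn;
    toy_table[toy_count].arity = arity;
    toy_count++;
    return 0;
}

/* Find a method by name and call it with the receiver. This sketch only
   supports C functions that take self alone, so any argc other than the
   registered arity is rejected. Returns 0 on success, -1 for an unknown
   name, -2 for a wrong argument count. */
static int toy_call(const char *name, toy_value self, int argc, toy_value *result) {
    for (int i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].name, name) == 0) {
            if (argc != toy_table[i].arity) return -2;
            *result = toy_table[i].fn(self);
            return 0;
        }
    }
    return -1;
}

/* Counterpart of method_test1: accepts self, ignores it, returns 10. */
static toy_value toy_method_test1(toy_value self) {
    (void)self;
    return 10;
}

/* Counterpart of Init_mytest: register the method under its Ruby name. */
static void toy_init_mytest(void) {
    toy_define_method("test1", toy_method_test1, 0);
}
```

The two failure codes roughly correspond to the errors Ruby raises for an unknown method and for a call with the wrong number of arguments.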

Also note the implementation of method_test1. The last line returns the VALUE produced by applying the macro INT2NUM to the integer x. INT2NUM, as its name implies, converts a C integer to a Ruby Integer, which for a number as small as 10 is a Fixnum. Since a Fixnum is a "small" value to Ruby, the VALUE returned by method_test1 does not point to an object anywhere; instead, the integer is stored in a specially encoded form directly in the unsigned long VALUE.
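Because one bit of the VALUE is spent on the Fixnum tag, a Fixnum holds one bit less than a C long, so a conversion such as INT2NUM has to check whether a value fits and fall back to allocating a Bignum when it does not. That range check can be sketched as follows. model_fixable and the MODEL_FIXNUM_* limits are hypothetical names, and the limits are assumed to mirror the way MRI derives its Fixnum range from LONG_MAX and LONG_MIN.

```c
#include <limits.h>

/* Assumed limits, modeled on MRI: one bit of the long is used for the
   Fixnum tag, so the usable range is half of a long's range. */
#define MODEL_FIXNUM_MAX (LONG_MAX / 2)
#define MODEL_FIXNUM_MIN (LONG_MIN / 2)

/* Nonzero if n fits in an immediate Fixnum; anything outside this range
   would need a heap-allocated Bignum instead. */
static int model_fixable(long n) {
    return n >= MODEL_FIXNUM_MIN && n <= MODEL_FIXNUM_MAX;
}
```

On a platform where long is 64 bits, such as the x86_64 Linux setup used later on this page, every 32-bit int passes this check; where long is 32 bits, integers above 2^30 - 1 would need a Bignum.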

extconf.rb

Ruby uses the mkmf package to make it very easy to create a Makefile for a Ruby C extension. The following is an example[8]:

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'mytest'

# Adds --with-mytest-dir, --with-mytest-include and --with-mytest-lib options for locating headers and libraries
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

The important thing here is that the name passed to "create_makefile" must match both the name that you want to load with "require" later in Ruby and the Init_ function in mytest.c (Init_mytest).

To create the Makefile for this extension, run the following command:

$ ruby extconf.rb 
creating Makefile

You should see nothing but the message "creating Makefile" printed out if everything is successful. From there, simply run "make", and the C extension in mytest.c will be compiled into the "mytest" shared library (mytest.so on Linux). This is the library that Ruby loads in the next step.

test.rb

This is a very simple Ruby script that loads the shared library Ruby C extension that we just created, and executes our test method:

require 'mytest'
include MyTest

puts test1

Output

$ ruby test.rb
10

As you can see, the module that we had to include was "MyTest," as specified in the call to rb_define_module("MyTest") in mytest.c. The method that we called was "test1," as specified as the second argument in the call, "rb_define_method(MyTest, "test1", method_test1, 0)." There really is not a lot of code here. Ruby makes it very easy to create the Makefile for your extension, and to include it in your Ruby project. You also do not need to jump through a bunch of hoops to return values from C functions, or specify arguments to C functions.

RubyInline

RubyInline is a gem that is designed to allow easy C extensions through embedding C code directly in Ruby classes themselves. These embedded C functions are compiled at runtime when needed, and work just as fast as if they were standalone C extensions. See below for more explanation.

Installation

RubyInline is a separate gem, available through rubygems, and can be installed thusly:

> gem install RubyInline

Example

This is a simple example that does the same as the C API example: it prints "10," as it is returned from a C function[9].

require 'rubygems'
require 'inline'

class Example
    inline(:C) do |builder|
        builder.c "int method_test1() {
            int x = 10;
            return x;
        }"
    end
end

puts Example.new.method_test1

To do anything with RubyInline you must require the "inline" library. The next step is to call the inline method inside a given class, in this case the "Example" class. The inline method yields a "builder" for the requested language, here ":C". The builder for C has a method, "c", to which you pass a string containing valid C code that you wish to have compiled; RubyInline generates the glue that converts plain C argument and return types (here an int) to and from Ruby objects. There is also a "c_raw" method that takes C code already written using the Ruby C API (e.g., code that uses VALUE). After the block has supplied the code, control returns to the inline method, which calls "build" and "load" on the builder. This compiles the code into a shared object in a hidden directory (by default .ruby_inline in your home directory) and loads the resulting method into the class where inline was called. This may seem complicated, but the RubyInline source on its RubyForge page is worth a look. Note that RubyInline keeps enough information to know when the C source code has changed and needs to be recompiled; otherwise, subsequent runs reuse the already compiled shared object.

As of this writing, it appears that there are only builders for C and C++, but the option is available to develop your own builders for any other language that you wish.

SWIG

SWIG is a popular wrapper generator that can produce wrappers for C and C++ code in several high level programming languages, including Perl, Python, and Ruby.

Example

There is a really good page of SWIG examples using Ruby here. The first simple example is a good one, and there are a few things that could be added to it, which are covered below.

Simple Example

This is a simple setup to create a "greatest common divisor" function in C and use it in Ruby by way of a SWIG-generated wrapper [10]. This example also adds a global variable, "Foo."

example.c
/* File : example.c */

/* A global variable */
double Foo = 3.0;

/* Compute the greatest common divisor of positive integers */
int gcd(int x, int y) {
  int g;
  g = y;
  while (x > 0) {
    g = x;
    x = y % x;
    y = g;
  }
  return g;
}

This snippet of C creates a function called "gcd," which takes two integers and returns their greatest common divisor. It also defines a global variable, "Foo," which we would like to use from Ruby. Next, we must create a SWIG interface file, which SWIG uses to generate the wrapper.

example.i
%module example

extern int gcd(int x, int y);
extern double Foo;

This is a very simple file, and it is all that SWIG needs to generate wrappers for this C function in many different languages. All we need to do is tell SWIG the module name and provide declarations, resembling C prototypes, for the function and the global variable.

Generating example_wrap.c

Now we can generate the Ruby wrapper by running the following command:

swig -ruby example.i

This creates a very large C file called "example_wrap.c." It may be useful to look over this file, but there is a lot going on there. In particular, the generated "_wrap_gcd(int argc, VALUE *argv, VALUE self)" function checks that it is called with the right number of arguments and performs argument type checking, which is very useful. SWIG also generates getter and setter functions for the "Foo" variable.

Building this example

SWIG itself does not generate a Makefile, so you generally have to write and maintain your own, or reuse an mkmf extconf.rb like the one shown earlier (mkmf compiles every .c file in the extension's directory, including example_wrap.c). Rather than a whole Makefile, however, here is a simple compilation line to build this example on an Ubuntu 10.04 x86_64 setup:

gcc -shared -fPIC -L/usr/lib -lruby1.8 -I/usr/lib/ruby/1.8/x86_64-linux/ example_wrap.c example.c -o example.so

If you get the following compilation error, there is a simple solution:

example_wrap.c: In function ‘_wrap_Foo_get’:
example_wrap.c:1953: error: ‘Foo’ undeclared (first use in this function)
example_wrap.c:1953: error: (Each undeclared identifier is reported only once
example_wrap.c:1953: error: for each function it appears in.)
example_wrap.c: In function ‘_wrap_Foo_set’:
example_wrap.c:1966: error: ‘Foo’ undeclared (first use in this function)

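The solution was to add the following declaration somewhere before the first use of the Foo variable in the generated "example_wrap.c" file:

extern double Foo;

If you receive the same error, try adding that line. Be sure to include the type: a bare "extern Foo;" declares an implicit int in C, which does not match the double defined in example.c and makes the wrapper read Foo incorrectly. This error occurred in SWIG version 1.3.40. Because example_wrap.c is regenerated every time swig runs, a more durable fix is to put the declaration inside a %{ ... %} block in example.i, which SWIG copies verbatim into the generated wrapper.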

Ruby script using SWIG module

The build should produce a shared object library called "example.so," which can be used just like the .so libraries produced by the Ruby C API mkmf process. The following is a short script that uses the gcd function, and the Foo variable:

require 'example'

puts Example.gcd(10,20)
puts Example.gcd(120, 160)
puts Example.Foo

Output:

10
40
3.0

The test script is easy enough to follow. The Example.gcd() function gets called just as if it were a module method written in Ruby or with the Ruby C API.

This demonstrates some of the power of SWIG. Rather than writing an extension in Ruby's native C API, you can make this simple C function usable from Ruby, Python, Perl, or any other language that SWIG supports, with very little additional SWIG code. In practice, however, SWIG interface files are usually much more complex, particularly for C++ applications.

Performance Comparison

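This is a short test comparing the same iterative calculation of PI implemented three ways: as a Ruby C API extension, with RubyInline, and in plain Ruby. The algorithm lays an n × n grid over the unit square, counts the cells whose midpoints fall inside the quarter circle of radius 1, and multiplies the fraction of such cells by 4. This is a relatively inefficient way to calculate PI, but it works nonetheless, and its nested floating point loops make it a useful benchmark. The test yielded quite surprising results.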

C PI Calculation

#include "ruby.h"
VALUE PiCalc_C = Qnil;

static VALUE pi_calc_c(VALUE self)
{
    int numPartitions = 12000;
    int circleCount = 0;
    double interval = 0, pi = 0;
    int i = 0, j = 0;
    double a, b;

    interval = 1.0/(double)numPartitions;

    for (i = 0; i < numPartitions; i++) {
        a = (i + .5)*interval;
        for (j = 0; j < numPartitions; j++) {
            b = (j + .5)*interval;
            if ((a*a + b*b) <= 1) circleCount++;
        }
    }

    pi = (double)(4*circleCount)/(numPartitions * numPartitions);
    return rb_float_new(pi);
}


// The initialization method for this module
void Init_pi_calc_c() {
    PiCalc_C = rb_define_module("PiCalc_C");
    rb_define_method(PiCalc_C, "pi_calc_c", pi_calc_c, 0);
}

As you may have noticed, the Init_pi_calc_c() function simply creates a module called "PiCalc_C," instead of a class. It loads the "pi_calc_c" method into that module. Notice that the pi_calc_c(VALUE self) function still needs to take one argument no matter what.

RubyInline PI Calculation

require 'rubygems'
require 'inline'

module PiCalcRubyInline
    inline(:C) do |builder|
        builder.c "
double pi_calc_rubyinline()
{
    int numPartitions = 12000;
    int circleCount = 0;
    double interval = 0, pi = 0;
    int i = 0, j = 0;
    double a, b;

    interval = 1.0/(double)numPartitions;

    for (i = 0; i < numPartitions; i++) {
        a = (i + .5)*interval;
        for (j = 0; j < numPartitions; j++) {
            b = (j + .5)*interval;
            if ((a*a + b*b) <= 1) circleCount++;
        }
    }

    pi = (double)(4*circleCount)/(numPartitions * numPartitions);
    return pi;
}"
    end
end

Note that the RubyInline version is substantially similar to the C function, but it returns a double instead of a VALUE, and it does not need to specify an unused "self" parameter in its method signature.

Ruby PI Calculation

module PiCalcRuby
    def pi_calc_ruby
        numPartitions = 12000
        circleCount = 0.0
        interval = 1.0/numPartitions

        # Exclusive ranges (0...n) visit 0..n-1, matching the C loops
        for i in 0...numPartitions do
            a = (i + 0.5)*interval
            for j in 0...numPartitions do
                b = (j + 0.5)*interval
                if ((a*a + b*b) <= 1) then
                    circleCount += 1
                end
            end
        end
        # circleCount is a Float, so this division is not truncated
        pi = (4*circleCount)/(numPartitions * numPartitions)
    end
end

As you can see, this is a relatively inefficient nested loop function that calculates PI. There is a fair amount of floating point arithmetic.

Results

Test Harness:

#!/usr/bin/env ruby

require 'pi_calc_ruby'
require 'pi_calc_c'
require 'pi_calc_rubyinline'

include PiCalcRuby
include PiCalc_C
include PiCalcRubyInline

def c_test
    start = Time.now
    
    pi = pi_calc_c()

    stop = Time.now
    puts "C PI Result: #{pi}"
    puts "C Time: #{stop - start}"
end

def ruby_inline_test
    start = Time.now

    pi = pi_calc_rubyinline()

    stop = Time.now
    puts "RubyInline PI Result: #{pi}"
    puts "RubyInline Time: #{stop - start}"
end

def ruby_test
    start = Time.now

    pi = pi_calc_ruby()

    stop = Time.now
    puts "Ruby PI Result: #{pi}"
    puts "Ruby Time: #{stop - start}"
end

c_test
ruby_inline_test
ruby_test

Ruby 1.8.3

$ ruby1.8 perf_test.rb 
C Time: 1.193169
RubyInline Time: 1.192995
Ruby Time: 274.178188

Ruby 1.9.1

$ ruby1.9.1 -rubygems perf_test.rb 
C Time: 1.192857364
RubyInline Time: 1.192744329
Ruby Time: 175.468406517

As you can see from the test, Ruby 1.8 took 274 seconds to calculate PI and Ruby 1.9.1 took 175.5 seconds, while the C extension and RubyInline each did it in about 1.2 seconds, roughly 230 times faster than Ruby 1.8 and 147 times faster than Ruby 1.9.1. Note that all three PI calculation versions produced the same value: 3.14159491666667. That value agrees with PI only to five decimal places (3.14159), so the remaining digits are not meaningful.
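To check the accuracy point above and to time the computation with Ruby taken out of the picture entirely, the same kernel can be written as plain C with the grid size as a parameter. This is a sketch under assumptions: pi_grid and time_pi_grid are hypothetical names, the code does not use ruby.h, and it measures CPU time with the standard clock() function rather than the wall-clock Time.now used by the Ruby test harness.

```c
#include <time.h>

/* Same midpoint-grid estimate as pi_calc_c, with the grid size as a
   parameter and no dependency on ruby.h. Each cell of an n x n grid over
   the unit square is tested at its midpoint; 4 times the fraction of
   midpoints inside the quarter circle approximates pi. */
double pi_grid(int num_partitions) {
    long circle_count = 0;
    double interval = 1.0 / (double)num_partitions;

    for (int i = 0; i < num_partitions; i++) {
        double a = (i + 0.5) * interval;
        for (int j = 0; j < num_partitions; j++) {
            double b = (j + 0.5) * interval;
            if (a * a + b * b <= 1.0) circle_count++;
        }
    }
    return 4.0 * (double)circle_count /
           ((double)num_partitions * (double)num_partitions);
}

/* Run pi_grid once and return the CPU time it used, in seconds.
   The estimate itself is written to *pi_out. */
double time_pi_grid(int num_partitions, double *pi_out) {
    clock_t start = clock();
    *pi_out = pi_grid(num_partitions);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_pi_grid(12000, &pi) performs the same 144 million inner-loop iterations as pi_calc_c; smaller grid sizes such as 1000 run quickly and show how the accuracy of the estimate depends on the number of partitions.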

Using Ruby from C/C++

In general, if you are writing a primarily C or C++ application, the decision to use Ruby, and therefore to require a Ruby interpreter, should be weighed carefully. Nevertheless, many applications embed a scripting language. Applications such as Civilization IV and Blender 3D, for example, embed one (Python, in those two cases) to allow scripted extensions.

This is a brief overview of a simple technique for using Ruby functionality from C.

README.EXT

As before, the README.EXT file included in the Ruby source code has a section on using Ruby features from C. Section 2.2 of that file is a brief overview of some techniques for doing this.

List of useful Ruby Functions

  • Evaluate Ruby code:
VALUE rb_eval_string(const char *ruby_code);
  • Create a Ruby object (look up the class constant by name, then instantiate it):
ID class_id = rb_intern("ClassName");
VALUE klass = rb_const_get(rb_cObject, class_id);
VALUE obj = rb_class_new_instance(argc, argv, klass);
  • Invoke a method:
VALUE rb_funcall(VALUE receiver, ID method_id, int argc, ...);
VALUE rb_funcall2(VALUE receiver, ID method_id, int argc, VALUE *argv);

Examples

Simple String case change

The following example shows a string being converted to all-caps by using the Ruby string function "upcase."[11]

#include <stdio.h>
#include "ruby.h"

int main(int argc, char **argv)
{
    ruby_init();

    /* Create a Ruby String object from a C string */
    VALUE ruby_string = rb_str_new2("Hello, world!");
    printf("Original string: %s\n", RSTRING_PTR(ruby_string));

    /* Look up the method name, then call String#upcase with no arguments */
    ID method_id = rb_intern("upcase");
    VALUE ruby_up_string = rb_funcall(ruby_string, method_id, 0);
    printf("Processed string: %s\n", RSTRING_PTR(ruby_up_string));

    return 0;
}

Output:

Original string: Hello, world!
Processed string: HELLO, WORLD!

First, the Ruby interpreter must be initialized by calling "ruby_init()." Then the rb_str_new2() function is used to create a Ruby string. An ID for the "upcase" method is retrieved using "rb_intern", and the method is then called on the Ruby string with rb_funcall. Since a VALUE holding a string is really a pointer to a Ruby string structure (struct RString), we use the RSTRING_PTR() macro to get at the string's character data, and print it. Code written for Ruby 1.8 often uses RSTRING(str)->ptr instead, which does not compile on Ruby 1.9 because the layout of struct RString changed. The printed output shows that the string has been converted into an uppercase string.

Other Mixed-language Combinations

In general, dynamic language interpreter maintainers provide very good documentation for mixing their language with the interpreter's implementation language. This is a list of those links. There may also be very good books for your particular language combination, and there will almost invariably be some very good tutorials that can be found via the great brain enhancer: Google.

Perl and C/C++

Perl and Java

Python and C/C++

Python and Java

Conclusion

Ruby is an excellent dynamic programming language that can tackle a wide range of problems, but it is always nice to be able to extend its functionality. When performance is a big concern, the Ruby C API comes to the rescue. Ruby provides mkmf and a bevy of macros and functions that make it very easy to write C code that can be used from your Ruby application. RubyInline is a very convenient way to write a few functions in C as well. SWIG might be complicated, and might require some planning ahead in your C or C++ application to adapt it for use with SWIG, but it does a good job of producing usable Ruby C API wrappers that can be loaded just like the extensions produced by the other methods.

References

  • Dave Thomas, with Chad Fowler and Andy Hunt. Programming Ruby 1.9. The Pragmatic Programmers, LLC, 2009.

External Links