Archive

Posts Tagged ‘FFI’

Ruby: FFI or not FFI?

October 26, 2013 Leave a comment

I started making Ruby bindings for my SAM/BAM library, and it’s not at all clear whether to use FFI or good old C extension for MRI.

For you to get the clear picture, I’m going support only Linux and Mac OS X, distributing binary packages, because that’s the easiest option given the current state of DMD compiler infrastructure—it’s not available by default on almost every system, like GCC.

One factor is convenience. By that word FFI proponents usually mean that they are too lazy to sit and write some C code. But hey, since I’m going to distribute binary packages only, I can just use Rice which should be much easier.

Another important factor is interoperatibility. Well, I did some simple benchmarks and discovered that in MRI, the speed sucks if I use FFI, and in JRuby, simply calling Picard library gives the same or better performance (of course, if you’re aware of flags –server and -Xji.objectProxyCache=false). But more importantly, on JRuby the overhead with either FFI or Java Integration is huge =\ Namely, counting reads in a 2.5GB file took about 2 minutes, but adding computation of average read name length added another 2 minutes, giving total of 4! For comparison, it took only 2 minutes using PySAM, and this is the baseline that should be followed.

My conclusions from this are that
1) JVM is not well suited for dynamic languages, and this opinion is supported by JRuby developers. Hopefully, Topaz will mature eventually.
2) Overhead of FFI is too substantial to ignore in my particular case, where we want to work with lots of short reads.

So, I will go with Rice. The bonus part is that I will have to write C++ classes wrapping D functionality, which could theoretically be also used for CPython extension using Boost.Python.

EDIT:  compared bindings generated with Rice and SWIG, the latter wins. So, the full chain is D -> C bindings -> C++ wrapper -> SWIG wrapper -> Ruby

EDIT2: after some playing with SWIG I realized that I want ultimate control over what’s going on, and finally decided to write in good old C.

Categories: Uncategorized Tags: , ,

Bug hunting

April 21, 2012 Leave a comment

It might seem that everything is clear about D and Ruby interaction.

Not quite so, as it turns out. Let’s consider this (almost) trivial example:

test.d:

module test;
...
import std.array : array;
import std.algorithm : splitter;

extern (C) void foo() {
 array(splitter("abc", ' ')); // any string will do
}
...

test.rb:

#!/usr/bin/env ruby
require 'ffi'
module Bar
 extend FFI::Library
 ffi_lib './test.so'
 attach_function :foo, [], :void # bind our function
end
Bar.foo # let's call it!

(Full code and Makefile is available at https://gist.github.com/2366623)

Running this Ruby code leads to a segmentation fault with the following backtrace:

...
./test.so(+0x297d0) [0xf72017d0]
./test.so(_D3std5array18__T8AppenderTAAyaZ8Appender11newCapacityFkZk+0x33) [0xf7202f6f] /usr/include/d/std/array.d:1984
./test.so(_D3std5array18__T8AppenderTAAyaZ8Appender13ensureAddableMFkZv+0x66) [0xf7202e82] /usr/include/d/std/array.d:1960
./test.so(_D3std5array18__T8AppenderTAAyaZ8Appender12__T3putTAyaZ3putMFAyaZv+0x52) [0xf7203146] /usr/include/d/std/array.d:2007
./test.so(_D3std5array63__T5arrayTS3std9algorithm19__T8splitterTAyaTaZ8splitter6ResultZ5arrayFS3std9algorithm19__T8splitterTAyaTaZ8splitterFAyaaZS3std9algorithm19__T8splitterTAyaTaZ8splitter6Result6ResultZAAya+0x5e) [0xf7202b36] /usr/include/d/std/array.d:66
...

What the hell is going on?! Let’s investigate…

By looking at std.array source code near the lines mentioned in backtrace, and some playing with it (I, for example, inserted printf call before 1984th line to test which expression causes segfault), I managed to make our foo even simpler:

import core.bitop : bsr;

extern (C) void foo() {
 auto dummy = 0L / bsr(2);
}

Now, some simple observations.

  1. If you change 0L to 0 without ‘L’, you get no segfault.
  2. If you replace bsr(2) with its value, i.e. 1, you also get no segfault.
  3. If this construction is used in standalone D application, everything is also OK.
  4. If we don’t use the sophisticated scheme with Runtime.initialize() and Runtime.terminate(), everything also works fine. (Let me remind you that it’s necessary to initialize D runtime in order to be able to use its GC and most library functions.) That is, if you call attach and detach functions manually via FFI, it’s ok.
  5. If you replace / with +, -, or *, also no segfault.

Sounds completely crazy, doesn’t it?

Good news is that our observations contain a workaround: don’t use linking with C object file but use -shared flag of dmd instead, and call D runtime initialization function manually. I believe it should work. And, as a matter of fact, D should support “static this() {…}” as a function to be called at library loading. So basically, this workaround is just replacing one kludge with another.

Categories: programming Tags: ,

D & Ruby FFI, part 2: working with arrays and strings

April 10, 2012 Leave a comment

Firstly, string in D is nothing but immutable(char)[], that is, an array of particular type.
Secondly, the memory layout of an array is simple: size_t (for storing length) + pointer (which points to the array elements).

The immediate conclusion is that D strings are not necessarily zero-terminated. It’s not very convenient when we’re working via FFI; the bright side is, together with immutability that gives the D compiler some opportunities to optimize operations with strings.

Strings

Let’s play a bit with the struct from the previous post:

public struct Foo {
    public string hello = "hello, world!";
}

Using FFI::Struct, it can be represented in Ruby code as

class Foo < FFI::Struct
    layout :hello_length, :size_t,
           :hello,        :string

    def initialize
        @ptr = MyLibrary.foo_new
        ObjectSpace.define_finalizer @ptr, Foo.finalize(@ptr)
        super(@ptr) # init FFI::Struct with the pointer
    end
    # ... finalize stuff ...
end

Then we may access our string as Foo.new[:hello]. Luckily, it’s zero-terminated — as all string literals in D.

Let’s now do something involving slicing:

string str;
extern (C) immutable(char)* foo_hello(Foo* p) {
    str = std.array.split(p.hello)[0];
    return str.ptr;
}

If you now bind it to Ruby by means of

...
    attach_function :foo_hello, [:pointer], :string
...
    def hello
        MyLibrary.foo_hello @ptr
    end
...

you’ll see that Foo.new.hello returns “hello, world!” instead of expected “hello,”.

In order to return a zero-terminated string, use std.string.toStringz.
To do the converse (that is to convert char* into a string), use std.conv.to!string (for some reasons, std.string.toString was deprecated).

 

Arrays

Now let’s invent a method to send arrays from D.

For instance, the array of strings which we get after calling split. Firstly, we have to convert all the strings into zero-terminated ones. Then we can return a struct with array as a field. (I don’t know if it’s possible to return the array without packing it into a struct)

import std.algorithm : map;
import std.array : array;

alias immutable(char)* cstring;

struct WordsArray {
    cstring[] arr;
}

static WordsArray words;

extern (C) WordsArray foo_hello(Foo* p) {
    words.arr = array(map!(std.string.toStringz)(std.array.split(p.hello)));
    return words;
}

The Ruby code is also not very sophisticated:

class WordsArray < FFI::Struct 
    layout :length, :size_t, # dynamic array layout
           :data,   :pointer # 

    def to_a
        self[:data].get_array_of_string(0, self[:length])
    end
end

...
    attach_function :foo_hello, [:pointer], WordsArray.by_value
...

    def hello
        (MyLibrary.foo_hello @ptr).to_a
    end
...
Categories: programming Tags: , ,

D & Ruby FFI, part 1: mapping D structs onto Ruby classes

April 10, 2012 Leave a comment

This post starts series about interaction between D and Ruby via FFI. Several things are to be investigated.

Firstly, both languages have their own garbage collectors. Although we can disable D garbage collector manually for some classes/structs, that’s not convenient at all. We want to use existing D code without modifying it.

Also, D ABI is a bit more sophisticated than C ABI (some documentation is at http://dlang.org/abi.html). So we need to understand what is the layout of D objects and how to work with it effectively.


This post describes how to deal with garbage collection issues.

The method suggested below currently works with structs only(Although, structs can be passed by reference in D, and they are more lightweight than classes, so it’s not a big limitation.)

Suppose we want to create D struct from Ruby. The memory occupied by the struct can’t be managed by D garbage collector, because it’s the Ruby code which is to manage the memory, not D code. D functions can only hide some details of allocating/deallocating memory.

Let’s take this struct as an example:

public struct Foo {
    public string hello = "hello, world!";
}

We have to provide two functions for alloc/free following C calling conventions:

extern (C) void* foo_new() {
    Foo* p = cast(Foo*)malloc(Foo.sizeof);
    std.conv.emplace(p);

    GC.addRange(p, Foo.sizeof);
    return p;
}

extern (C) void foo_free(Foo* p) {
    GC.removeRange(p);
    free(cast(void*)p);
}

Allocating memory is easy – one calls corresponding external function via FFI, gets a pointer, and that’s it. Deallocation is a little bit trickier, since Ruby also has its own GC. Therefore, when we create the instance, we should also register the object’s finalizer which will be called by GC after the instance destruction. Notice this word, ‘after’. It means that we must copy the pointer to somewhere, otherwise GC just won’t be able to destroy the object because the finalizer will reference the object itself. (See http://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/ where this subtle thing is described in detail.)

So the source code of the Ruby part is something like this:

require 'ffi'

module MyLibrary
    extend FFI::Library
    ffi_lib './test.so'
    attach_function :foo_new, [], :pointer
    attach_function :foo_free, [:pointer], :void
end

class Foo
    def initialize
        @ptr = MyLibrary.foo_new
        ObjectSpace.define_finalizer @ptr, Foo.finalize(@ptr)
    end

    def self.finalize ptr
        proc { MyLibrary.foo_free ptr }
    end
end

Here I visualised the process of interaction of the two languages. You may open in another tab this gist to see all the code.

Another subtle thing is, how do we initialize D garbage collector? In order for it to work, we are to call core.runtime.Runtime.initialize, and when we’re done with the library, we are to call terminate() method

Theoretically, one could use “static this() { … }” and “static ~this() { … }” in D code. Practically, it doesn’t work yet. Thus we are forced to resort to some dirty tricks with gcc function attributes, namely, with __attribute__((constructor))/__attribute__((destructor)). (Read the discussion at http://stackoverflow.com/questions/9759880/automatically-executed-functions-when-loading-shared-libraries).

UPD: dirty tricks mentioned above can cause segfaults for no apparent reason. You better manually call the function initializing runtime, through FFI.

Categories: programming Tags: , ,