Home > programming > [GSoC] Weekly report #0

[GSoC] Weekly report #0

Finally, OBF seems to be convinced about usefulness of new BAM parser. Thus now I can spend time on coding instead of writing/talking on why what I’m about to do is so important :)

I began to write code before the official start for two reasons. First, I just can’t wait to get my hands on this. The project is a great opportunity to learn more about D and Ruby, and what they’re capable of when being combined. By the way, it’s one of the goals we set for this summer — discover some best practices. Moreover, I’ll describe some techniques I already use, in this very post ;) The second reason is more prosaic. I’ll have exams in June, and therefore I have to do some stuff in advance. However, I believe I’ll deal with time management issues. Since July I’ll be able to work mostly full-time.

Now let’s move on to what was done during last week. The project repository is at https://github.com/lomereiter/sambamba. Directory structure is not well organized yet, I’ll do that a bit later.

The main struct, BamFile (bamfile.d), has just a constructor, method close() and, the most important at the moment, header property which returns SamHeader struct (defined in samheader.d).

Now I advise you to take a look at samheader.d. It is rather small file which begins with ~50 lines of rather crazy mixture of string mixins and mixin templates, and then…

mixin HeaderLineStruct!("SqLine",
          Field!("sequence_name", "SN"),
          Field!("sequence_length", "LN", uint),
          Field!("assembly", "AS"),
          Field!("md5", "M5"),
          Field!("species", "SP"),
          Field!("uri", "UR"));

That’s how I want to define structs for information from SAM header line fields. And D allows me to do that. This code generates struct definition, with static method parse() which takes a string and returns an instance of the struct. Really, if “each line is TAB-delimited and except the @CO lines, each data field follows a format ‘TAG:VALUE’ where TAG is a two-letter string <…>”, why should I write more than this tag and the type of the value? And since the value type is almost always a string, let’s make it default in the Field template ;-)

So, here’s the first useful technique: whenever you find some code to be too repetitive, see if it’s worthwhile to create a small DSL based on mixins. D makes code generation easy with compile-time function evaluation, almost as easy as writing Lisp macros.

How about writing Ruby FFI bindings to these structs? I decided to apply DRY principle again, and here’s my another idea: scaffolding. RoR guys have been doing automatic boilerplate generation for years, so why not apply this method for generating FFI bindings?

So I wrote a small Ruby code generator for D structs, which currently supports numeric types, strings, and arrays as struct fields. Now just a few lines of code will do:

import samheader;
mixin(import("utils/scaffold.d"));

import std.stdio;

void main() {
    File("bindings/scaffolds/SqLine.rb", "w").write(toRuby!SqLine);
    File("bindings/scaffolds/RgLine.rb", "w").write(toRuby!RgLine);
    File("bindings/scaffolds/PgLine.rb", "w").write(toRuby!PgLine);
    File("bindings/scaffolds/SamHeader.rb", "w").write(toRuby!SamHeader);
}
The results of generation are in bindings/scaffolds. Since Ruby has Open Classes, you can reopen any class and define/undefine its methods. For instance, I haven’t found a way to determine at compile-time whether a field is private or not, and you might want to undef corresponding methods. However, there is a thread on dlang.org with suggestion to add getProtection trait. Hopefully, it will be added to the language in the near future.
With a few lines of hand-coded bindings, we can use D functions from Ruby:

load './bindings/libbam.rb'

bam = BamFile.new "HG00125.chrom20.ILLUMINA.bwa.GBR.low_coverage.20111114.bam"
puts bam.header.pg_lines.map(&:program_name)

The code lacks tests and documentation, though, and that’s what I’ll be working on during the next week.

Advertisements
Categories: programming Tags: , ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: