GSoC weekly report #7

First version of gem

I’ve released the first version of the bioruby-sambamba gem; it’s available on rubygems.org. It works via Bio::Command, parsing the JSON output produced by the sambamba executable with the aid of the Oj gem (the name stands for ‘Optimized JSON’).
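To give a rough idea of the parsing side, here is a minimal sketch. It assumes the executable prints one JSON object per line, uses Ruby’s standard json library in place of Oj so that it runs without extra gems, and reads from a StringIO standing in for the executable’s output. The helper name each_json_record and the field names in the sample are made up for illustration and are not the gem’s actual API or sambamba’s actual schema.

```ruby
require 'json'
require 'stringio'

# Parse line-delimited JSON (assumed: one record per line) from any IO-like object.
# Returns an Enumerator when called without a block.
def each_json_record(io)
  return enum_for(:each_json_record, io) unless block_given?

  io.each_line do |line|
    line = line.strip
    next if line.empty?          # skip blank lines between records
    yield JSON.parse(line)       # each line becomes a Hash
  end
end

# Stand-in for the executable's stdout; field names are illustrative only.
SAMPLE_OUTPUT = <<~JSONL
  {"qname": "read1", "pos": 100, "tags": {"RG": "111111"}}
  {"qname": "read2", "pos": 250, "tags": {}}
JSONL

records = each_json_record(StringIO.new(SAMPLE_OUTPUT)).to_a
records.each { |r| puts "#{r['qname']} at #{r['pos']}" }
```

In the gem the IO comes from the sambamba process started through Bio::Command rather than from a string, and records can be handled one at a time instead of being collected into an array.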

At the moment, users have to compile the sambamba tool themselves, and I realize that it’s not very convenient. I’ll look into producing dependency-free binaries for all platforms so that the gem can download the executable for the user. Another reason for doing that is the slow code generated by DMD. In general, with respect to D compilers, the more time it takes to install the compiler, the better speed you get :) I’ve tested my code with three compilers (GDC, LDC2 and DMD), and in my experience GDC generates the fastest code, LDC2 produces slightly slower executables, and DMD is the slowest of the three in that respect. Moreover, on long pieces of code generated by Ragel, DMD with the optimization flag (-O) turned on spends a lot of time in the optimization phase because… its optimization phase is, uhm, not well optimized =\ (http://d.puremagic.com/issues/show_bug.cgi?id=7157)

So I’m going to compile stand-alone executables with GDC, though I’ll perhaps need the help of Iain Buclaw to install the compiler (the latest version of it is hosted on GitHub, and the instructions on bitbucket.org are a bit outdated).


I decided to spend some time on refactoring so as to eliminate some design drawbacks in my code. One of them was a very inconvenient API for working with the SAM header. Since I didn’t initially intend to provide BAM/SAM output, I didn’t pay any attention to it at all. I looked at the ways Picard and BamTools implement this kind of functionality, and borrowed a few ideas from there. Now the code is much better: the user can add or remove reference sequences, read groups, or program information (corresponding to @SQ, @RG and @PG lines in a header).
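Just to illustrate the shape of such a header API, here is a toy model in Ruby. It is not the actual D code or its interface; the class SamHeader and all of its method names are hypothetical. The only things taken from the SAM format are the line types and their key fields (SN and LN for @SQ, ID for @RG and @PG).

```ruby
# Hypothetical, minimal model of a SAM header; not sambamba's actual API.
class SamHeader
  def initialize
    # Each section maps a unique key (SN for @SQ, ID for @RG and @PG)
    # to the fields of that header line.
    @sections = { "SQ" => {}, "RG" => {}, "PG" => {} }
  end

  def add_sequence(name, length)
    @sections["SQ"][name] = { "SN" => name, "LN" => length }
  end

  def add_read_group(id, fields = {})
    @sections["RG"][id] = { "ID" => id }.merge(fields)
  end

  def add_program(id, fields = {})
    @sections["PG"][id] = { "ID" => id }.merge(fields)
  end

  # Remove a line by section type ("SQ", "RG" or "PG") and its key.
  def remove(type, key)
    @sections.fetch(type).delete(key)
  end

  # Render the header lines as tab-separated TAG:VALUE pairs.
  def to_s
    @sections.flat_map do |type, lines|
      lines.values.map do |fields|
        (["@#{type}"] + fields.map { |tag, value| "#{tag}:#{value}" }).join("\t")
      end
    end.join("\n")
  end
end

header = SamHeader.new
header.add_sequence("chr1", 1000)
header.add_sequence("chr2", 2000)
header.add_read_group("111111", "SM" => "sample1")
header.add_program("sambamba")
header.remove("SQ", "chr2")
header_text = header.to_s
puts header_text
```

Keying each section by its unique identifier makes removal and duplicate detection straightforward, which is roughly the idea borrowed from Picard and BamTools.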

Also, after the code review in which Marjan pointed out some problems with the current validation code, I began to refactor it. I have a feeling that the code can be made much more flexible (maybe some kind of Visitor pattern that walks through a tree of validators?). During the refactoring, I also improved the performance quite a bit by using a profiler and eliminating most of the bottlenecks. Validation became about 5x faster, so the speed is decent now.

Faster SAM parsing

I mentioned earlier an opportunity to cut SAM parsing time by 20-25%, and I’ve taken it. Now the parsing code performs far fewer (re-)allocations.

Speaking of allocations, it’s only this week that I realized that static variables in D are thread-local. That makes me reconsider some pieces of code and replace some heap-allocated arrays with static arrays, which I was erroneously avoiding before, thinking that they might cause multithreading-related issues. That might give another performance boost.

Nicer syntax for working with tag values

I’ve implemented better opCast() and opEquals() for the Value struct, and now the user doesn’t need to do explicit conversions in almost all cases. For example, previously, in order to compare a value holding a ubyte with an int, one had to say ‘if (to!ubyte(value) == 32) …’, and now it is just ‘if (value == 32) …’. Also, when assigning a value to a read tag, one previously had to call the Value constructor explicitly, while now it’s not needed: ‘read["RG"] = "111111";’ instead of ‘read["RG"] = Value("111111");’

I’ll spend a few more days on refactoring, but the main task for this week is BAM indexing, i.e. producing BAI files from a sorted BAM file. The next logical step is sorting/merging; I hope to finish that by 20th July at the latest, and then prepare the next version of the gem. Maybe I’ll release another version in the middle of July, once I have working binaries for all platforms.
