
Reading BAM files in D

It turned out to be really easy, using the std.stream and std.zlib modules :)

The sequence of actions is as follows:

  1. Create an instance of BufferedFile, passing the file name to its constructor.
  2. Pass it to the constructor of EndianStream, specifying little-endian byte order.
  3. Read BGZF blocks from this stream according to the BGZF specification, and decompress them with the std.zlib uncompress function (winbits must be -15, because each block holds a raw deflate stream without a zlib header).
  4. Wrap the range of decompressed blocks in a subclass of Stream that uses MemoryStream internally. To make a Stream subclass, you only need to define three methods: readBlock, writeBlock, and seek. The latter two may just throw an exception in this particular case. That took me as little as 64 lines of code.
  5. Now that we have this full-fledged stream, pass it to the EndianStream constructor again. Then we can read BAM data from it.
About 200 lines in total. That’s how powerful the D standard library is. Much like Python’s ‘batteries’ ;-)

And what I like the most, comparing D to C++, are ranges instead of STL iterators. They are much easier to write and to use, at least for iterating when the length is not known in advance. Calling empty() is way more intuitive than comparing with some idiotic default-constructed iterator that indicates end of stream, as is done with istreambuf_iterator.
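
To make step 3 and the point about ranges more concrete, here is a minimal sketch of an input range over decompressed BGZF blocks. It is not the code described above: the names BgzfBlocks, le16 and le32 are made up for illustration, and since std.stream is no longer part of current Phobos releases, the sketch works on an in-memory byte array instead of a BufferedFile. Only uncompress and crc32 from std.zlib and enforce from std.exception are real library calls.

```d
import std.exception : enforce;
import std.zlib : crc32, uncompress;

// Hypothetical helpers: decode little-endian integers by hand.
ushort le16(const(ubyte)[] b) { return cast(ushort)(b[0] | (b[1] << 8)); }
uint le32(const(ubyte)[] b)
{
    return b[0] | (b[1] << 8) | (b[2] << 16) | (cast(uint) b[3] << 24);
}

// Hypothetical input range yielding the decompressed payload of each BGZF block.
struct BgzfBlocks
{
    private const(ubyte)[] rest;  // bytes not consumed yet
    private ubyte[] current;      // payload of the current block
    private bool done;

    this(const(ubyte)[] data)
    {
        rest = data;
        popFront();               // decode the first block eagerly
    }

    @property bool empty() const { return done; }
    @property ubyte[] front() { return current; }

    void popFront()
    {
        if (rest.length == 0) { done = true; return; }

        // Fixed gzip header: ID1, ID2, CM, FLG, MTIME(4), XFL, OS, XLEN(2).
        enforce(rest.length >= 12, "truncated BGZF header");
        enforce(rest[0] == 31 && rest[1] == 139 && rest[2] == 8
                && (rest[3] & 4) != 0, "not a BGZF block");
        immutable size_t xlen = le16(rest[10 .. 12]);
        enforce(rest.length >= 12 + xlen, "truncated extra field");

        // The 'BC' extra subfield stores the total block size minus one.
        size_t blockSize = 0;
        auto extra = rest[12 .. 12 + xlen];
        while (extra.length >= 4)
        {
            immutable size_t slen = le16(extra[2 .. 4]);
            enforce(extra.length >= 4 + slen, "truncated extra subfield");
            if (extra[0] == 'B' && extra[1] == 'C' && slen == 2)
                blockSize = le16(extra[4 .. 6]) + 1;
            extra = extra[4 + slen .. $];
        }
        enforce(blockSize >= 12 + xlen + 8 && blockSize <= rest.length,
                "bad BGZF block size");

        // The block ends with CRC32 and ISIZE (uncompressed length).
        auto cdata = rest[12 + xlen .. blockSize - 8];
        immutable crc = le32(rest[blockSize - 8 .. blockSize - 4]);
        immutable isize = le32(rest[blockSize - 4 .. blockSize]);

        // winbits = -15 selects raw deflate; isize is only a size hint.
        current = cast(ubyte[]) uncompress(cdata, isize, -15);
        enforce(current.length == isize && crc32(0, current) == crc,
                "corrupt BGZF block");
        rest = rest[blockSize .. $];
    }
}
```

With the file contents loaded into a byte array (for example via std.file.read), iteration is a plain `foreach (block; BgzfBlocks(bytes))`, and the loop stops as soon as empty() returns true. A real reader would pull blocks from the file lazily rather than holding everything in memory.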

Now that I’ve learned how to read data from BAM, the next step will be SAM header parsing. The header is typically small, about a hundred lines, and only in extraordinary cases larger than 4MB, so it makes little sense to optimize for speed. Better to spend more time on the validation part.
