GSoC weekly report #4

Last week, I implemented BAM output in my library. To make it useful, I also added support for modifying and creating alignments.

BAM output

Currently, the interface consists of just one function, writeBAM, which takes the following parameters:

  1. Output stream (it can wrap stdout if you wish)
  2. Text of the SAM header
  3. Information about reference sequences (which comes after the header)
  4. Range of alignments
  5. (Optional) Compression level, from -1 to 9. The default is -1, which stands for the default compression level in zlib. Zero corresponds to so-called uncompressed BAM, which turned out to still be a zlib stream, just with zero compression.
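
The point about level 0 can be checked with zlib directly. Below is a minimal Python sketch, not code from the library: the payload is a made-up stand-in for BAM record bytes, and plain zlib framing is used here rather than the block format BAM files use. It only shows that level 0 still yields a valid zlib stream, and that -1 selects zlib's default level.

```python
import zlib

# Hypothetical payload standing in for serialized alignment records.
data = b"ACGT" * 1000

# -1 is zlib.Z_DEFAULT_COMPRESSION: let zlib pick its default level.
default = zlib.compress(data, -1)

# Level 0 emits "stored" deflate blocks: the bytes are wrapped, not compressed.
stored = zlib.compress(data, 0)

# The level-0 output is still a regular zlib stream and round-trips normally.
assert zlib.decompress(stored) == data

# It is slightly larger than the input because of the stream framing.
print(len(data), len(default), len(stored))
```

So "uncompressed" output still has to go through a zlib decoder on the reading side; it just costs almost no CPU time to produce.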

For a usage example, see the wiki: https://github.com/lomereiter/sambamba/wiki/Getting-started

Modification support

All tags, flags, and fields can now be modified. I used a lot of bit hacks to make this fast, especially for tag values, which can hold integers, floats, chars, strings, or arrays of numbers.

In my code, every tag value is a tagged union. Type tags are designed so that (typetag >> 5) is the size in bytes of the stored value (or of one element of a stored array), (typetag & 1) tells whether the tag value is an array, and (typetag & 3) tells whether it is a string. This makes it easy to copy data from a value to a stream: no additional information is needed.
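
To make the bit layout concrete, here is a small Python sketch of such a type tag. It is an illustration under assumptions, not the library's actual code: the helper names are hypothetical, and I assume here that a string sets both low bits (so it also counts as an array of chars), which is one reading of the (typetag & 3) check.

```python
# Assumed layout (hypothetical, for illustration only):
#   bits 5 and up : size in bytes of the value, or of one array element
#   bit 0         : set when the value is an array
#   bits 0 and 1  : both set when the value is a string

def make_typetag(size, is_array=False, is_string=False):
    """Pack the element size and the two flags into one small integer."""
    tag = size << 5
    if is_array or is_string:
        tag |= 0b01
    if is_string:
        tag |= 0b10
    return tag

def element_size(tag):
    return tag >> 5

def is_array(tag):
    return (tag & 1) != 0

def is_string(tag):
    return (tag & 3) == 3

def payload_length(tag, count=1):
    """Number of bytes to copy for `count` elements of this type."""
    return element_size(tag) * count

INT32 = make_typetag(4)
FLOAT_ARRAY = make_typetag(4, is_array=True)
STRING = make_typetag(1, is_string=True)

# Copying three floats needs 12 bytes; the tag alone tells us the element size.
assert payload_length(FLOAT_ARRAY, 3) == 12
assert is_array(FLOAT_ARRAY) and not is_string(FLOAT_ARRAY)
assert is_string(STRING) and element_size(STRING) == 1
```

With a layout like this, the writer needs only the tag and the element count to know how many bytes to copy, with no per-type branching.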

So it’s time to think about a new project title, because BAMread is misleading now.

Now I’ll concentrate on filling the wiki with details about what the library is capable of and on making the installation process more user-friendly. There’s a lot of functionality already, but it needs to be documented.

Meanwhile, I’ll create a branch for playing with Ragel and work on adding SAM support.

