Home > Uncategorized > GSoC weekly report #8

GSoC weekly report #8

Indexing

Last week I implemented building BAI index for BAM files. In order to understand how indexing works I had to read source code of BamTools and Picard (samtools code is unreadable to me).

Finally I managed to get it working, and it exploits parallelism so that time of building index is roughly equal to the time of reading the file. I did some experiments on Pjotr’s server, and when files are located on a single HDD, my tool works about 40% faster than ‘samtools index’: one thread does all I/O stuff + indexing, while other threads unpack BGZF blocks. It would be interesting to see what’s the situation with SSD because they have much better reading performance. (Pjotr was going to test my tool on his 4-core laptop with SSD.)

Sorting

Also I recently began to implement sorting. It already works, and the speed is good enough and also limited by I/O since external sorting involves dealing with a lot of files. I use the simplest approach: first write sorted chunks to temporary directory, then merge them in one file.

However, I’m not yet satisfied with memory consumption. For sorting, I need to tweak some numbers in runtime in order not to exceed a given memory limit, and it’s a bit tricky, partly because it’s hard to predict anything with garbage collector, partly – because memory allocations are hidden under several levels of abstraction :) I’m now investigating this issue, it’s essential for decreasing the number of sorted chunks and thus getting larger buffers for reading these chunks in merge phase.

 

Advertisements
Categories: Uncategorized Tags:
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: