
GSoC weekly report #10

Filtering

sambamba now supports custom filter expressions for selecting which alignments to process.

Example:

sambamba view mydata.bam -F "mapping_quality >= 60 and [MQ] >= 60 and mate_is_reverse_strand and [RG] == 'ERR016155' and sequence =~ /^ACTG/ and ([OP] != null or [OC] != null)" -c

As you can see, the filter query syntax supports:

  • integer/string comparison for fields/tags
  • regex match operator
  • tag existence condition
  • logical operators: and, or, not
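To make the semantics of these constructs concrete, here is a minimal Ruby sketch that evaluates a predicate equivalent to the example query above against a single alignment. The `Alignment` struct and its fields are hypothetical stand-ins, not sambamba's data model, and treating a missing tag as failing a numeric comparison is an assumption made for illustration.

```ruby
# Hypothetical model of an alignment: a few fields plus a hash of SAM tags.
# This is NOT sambamba's data structure; it only illustrates what each
# construct in the filter language tests.
Alignment = Struct.new(:mapping_quality, :sequence, :tags,
                       :mate_is_reverse_strand, keyword_init: true)

# Mirrors the example query passed to `sambamba view -F`.
def example_filter(aln)
  aln.mapping_quality >= 60 &&                    # integer comparison on a field
    aln.tags.fetch(:MQ, -1) >= 60 &&              # on a tag (assumed: missing tag => false)
    aln.mate_is_reverse_strand &&                 # flag test
    aln.tags[:RG] == 'ERR016155' &&               # string comparison on a tag
    aln.sequence.match?(/^ACTG/) &&               # regex match (=~)
    (!aln.tags[:OP].nil? || !aln.tags[:OC].nil?)  # tag existence ([OP] != null)
end

aln = Alignment.new(mapping_quality: 60, sequence: 'ACTGGA',
                    tags: { MQ: 60, RG: 'ERR016155', OC: '10M' },
                    mate_is_reverse_strand: true)
puts example_filter(aln) # => true
```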

A full description is available on the sambamba wiki.

I have also started working on a Ruby wrapper for this syntax. After reading parts of Martin Fowler’s “Domain-Specific Languages” book, I have come to appreciate the power of ‘instance_eval’ :-)

As of now, it looks like this (not pushed to the repo yet):

filter = Bio::Bam::AlignmentFilter.new {
  ref_id == 19

  flag.unmapped.is_unset
  flag.mate_is_unmapped.is_unset

  tag(:RG) =~ /ERR001*/i

  tag(:NM) > 0

  union {

    intersection {
      mapping_quality >= 40
      mapping_quality < 60
    }

    mapping_quality.is_unknown

  }
}

It looks kinda cool, but I’m still thinking of ways to reduce the number of dots and use spaces instead :-)
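One way ‘instance_eval’ can make bare statements like `mapping_quality >= 40` work inside such a block: the block runs with a builder object as `self`, and the comparison operators on a field object record a condition instead of returning a boolean. The minimal sketch below renders the block into the text query syntax shown earlier. `FilterBuilder`, `FieldRef` and their methods are hypothetical names, not the actual wrapper's API, and only the ordering operators are covered.

```ruby
# A field reference whose comparison operators record a condition in the
# builder rather than comparing anything (hypothetical, for illustration).
class FieldRef
  def initialize(name, builder)
    @name = name
    @builder = builder
  end

  %i[< <= > >=].each do |op|
    define_method(op) do |value|
      rendered = value.is_a?(String) ? "'#{value}'" : value.to_s
      @builder.add("#{@name} #{op} #{rendered}")
    end
  end
end

class FilterBuilder
  def initialize(joiner = ' and ', &block)
    @joiner = joiner
    @parts = []
    # Run the block with self = this builder, so bare calls such as
    # `mapping_quality` or `union { ... }` resolve to the methods below.
    instance_eval(&block) if block
  end

  def add(condition)
    @parts << condition
    self
  end

  def mapping_quality
    FieldRef.new('mapping_quality', self)
  end

  def tag(name)
    FieldRef.new("[#{name}]", self)
  end

  # Nested blocks get their own builder; statements inside are joined
  # with "and" (intersection) or "or" (union) and parenthesized.
  def intersection(&block)
    add("(#{FilterBuilder.new(' and ', &block)})")
  end

  def union(&block)
    add("(#{FilterBuilder.new(' or ', &block)})")
  end

  # Top-level statements are implicitly ANDed together.
  def to_s
    @parts.join(@joiner)
  end
end

query = FilterBuilder.new {
  tag(:NM) > 0
  union {
    intersection {
      mapping_quality >= 40
      mapping_quality < 60
    }
    tag(:MQ) >= 60
  }
}
puts query
# => [NM] > 0 and ((mapping_quality >= 40 and mapping_quality < 60) or [MQ] >= 60)
```

Overriding `==` in the same way is possible but riskier: Ruby derives `!=` from `==` by default, so a side-effecting `==` would also fire (and be negated) for `!=`.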

Progress bar

I find it annoying not to have any progress indicator. When the file size is known beforehand, computing the current progress is relatively easy; the only real issue is the design.

Here is what I came up with: the user can provide an optional compile-time argument, a function that takes one argument, the percentage. As noted in the code comments, floating-point computations are not cheap, so the user can make this argument lazy (http://blackwhale.github.com/lazy-evaluation.html) and evaluate it only, say, once every 1000 alignments.
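The idea of a lazily evaluated percentage can be sketched in Ruby by passing a lambda instead of a number: the library calls the callback for every record, but the division only runs when the callback chooses to call the lambda. The method name, the byte-based progress and the every-1000th-call policy below are illustrative assumptions, not sambamba's actual code.

```ruby
# Hypothetical reader loop (not sambamba's API): the progress callback is
# invoked once per record and receives the percentage as a lambda, so the
# float division happens only if the callback actually calls it.
def each_record_with_progress(records, total_bytes, progress)
  bytes_read = 0
  records.each do |rec|
    bytes_read += rec.bytesize
    progress.call(-> { bytes_read * 100.0 / total_bytes })
    yield rec if block_given?
  end
end

# User-side callback: forces the lazy percentage only on every 1000th call.
calls = 0
reported = []
reporter = lambda do |percent|
  calls += 1
  reported << percent.call.round(1) if (calls % 1000).zero?
end

records = Array.new(3000) { 'x' * 10 } # stand-in for 3000 alignments
each_record_with_progress(records, 30_000, reporter)
p reported # => [33.3, 66.7, 100.0]
```

In Ruby, allocating the lambda has its own cost, so this mirrors the shape of D's `lazy` parameters rather than their performance characteristics.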

A working example is the sambamba-index tool, which prints a wget-like progress bar to STDERR. Over the next few days I’ll add this functionality to the other utilities as well.
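A wget-like bar is usually redrawn in place by returning to the start of the line with a carriage return. The minimal Ruby sketch below shows that rendering; the exact layout ("NN% [=====>   ]") and the `progress_bar` helper are assumptions for illustration, not a copy of sambamba-index's output.

```ruby
# Render a wget-style bar for a percentage in 0..100 (hypothetical helper).
def progress_bar(percent, width = 40)
  percent = percent.clamp(0.0, 100.0)
  filled = (percent / 100.0 * width).floor
  bar = '=' * filled
  bar += '>' if filled < width # arrow head until the bar is full
  format('%3d%% [%s]', percent.floor, bar.ljust(width))
end

# Redraw the same line on STDERR using "\r", leaving STDOUT free for the
# tool's real output.
[0, 25, 50, 75, 100].each do |pct|
  $stderr.print "\r#{progress_bar(pct)}"
end
$stderr.puts
```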
