Archive for September, 2014

EU CodeFest 2014

September 22, 2014

Yesterday morning I returned from the CodeFest, and it was a wonderful experience.

The format of the event is unusual. There are very few talks (this year, about BioJS and BioNode), and they serve as flavoring rather than the meat. Participants are free to work on whatever they want. Of course, some structure is still needed, so at the very start the attendees split into groups by field of interest: BioJS, BioNode, RDF & semantic web, and pipelines & HPC. Unsurprisingly, I stayed with the last one.

This division doesn’t mean there’s no communication between the groups. On the contrary, I noticed that seeing a large group makes you curious about what’s so hot about their topic. Before the event I had no idea what RDF (and SPARQL) was, but once I saw how much interest the topic attracted, it felt impossible to remain ignorant. So I approached one of the guys at the RDF table and asked him all sorts of novice questions about the technology, its current usage, performance and so on. My overall impression is that RDF versus relational databases is like dynamic versus static typing in programming languages: RDF doesn’t require you to design the data organization upfront, which makes it much easier to administer and play with.
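To make the schema-free point concrete, here is a toy sketch in plain Python. It is not an RDF library such as rdflib and not real SPARQL. Facts are stored as (subject, predicate, object) triples, and a query is a pattern in which `None` plays the role of a variable. The `TripleStore` class and the `ex:` identifiers are made up for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples (hypothetical, not rdflib)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # No schema: any predicate can be introduced at any time.
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard, loosely like a variable in a SPARQL triple pattern.
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


store = TripleStore()
store.add("ex:BRCA1", "ex:locatedOn", "ex:chr17")
store.add("ex:TP53", "ex:locatedOn", "ex:chr17")
# A brand-new kind of fact: no table or column has to be declared first.
store.add("ex:BRCA1", "ex:associatedWith", "ex:breastCancer")

print(store.match(p="ex:locatedOn", o="ex:chr17"))
```

In a relational database, the `associatedWith` fact would typically need a new column or table to be defined before it could be stored; here it is just one more triple, which is roughly the "dynamic typing" flavor described above.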

Strikingly, about half of the people were mainly interested in BioJS. As much as I don’t fancy JS, I have to admit it is superior when it comes to visualization. So, out of curiosity, I attended the BioJS tutorial and learned a few things. First, it was clarified that things are a bit messy right now: version 2.0 has not been released yet, but it is what everyone should already be using. Second, the presenters highlighted the tutorials and interactive examples, which look quite inspiring for newcomers. Finally, it’s interesting to watch BioJS move toward decentralized development the same way BioRuby did, so that joining the growing community is as easy as it gets: you just write a plugin and publish it through npm. And you are not forced to use JS; for example, one of the most sophisticated plugins, MSA, is written entirely in CoffeeScript! Overall, the presentation left me with the impression of a very well-organized project.

It was also the first time I met a bioinformatics freelancer, Khalil from Belgium. It is no wonder that he was one of the oldest people at the event (perhaps the oldest): I believe one has to build a reputation in the scientific community first. In a way, freelancing is fascinating because of the sense of freedom, but it doesn’t suit most bioinformatics work. IMO it only works for pure development, since research usually requires collaboration to be productive. That freelancing exists in bioinformatics at all shows just how vast the field is.

The last highlight is related to the CodeFest only through the hosting location, EBI. Pjotr and I met James Bonfield, the wizard of sequencing data compression and the author of the CRAM implementation in HTSlib. We mostly discussed the CRAM format and its future. We agreed that BAM could, in principle, eventually fade out, because CRAM is flexible enough to allow writing records in uncompressed form, which makes it suitable for piping.

After the discussion I also decided that writing yet another implementation goes against every principle of good programming, and that I should reuse HTSlib in Sambamba and become a contributor to the library. James told me that his implementation in Staden is somewhat better, because the HTSlib version suffered slightly during integration, but I’m not yet sure whether it’s worth linking to a less popular library. So far I have created D bindings to HTSlib and was able to read CRAM and convert its records into BAM (cram.d). The performance is substantially better than that of my own CRAM reader (which nevertheless served its purpose of helping me understand the format). My next step is to make HTSlib more flexible, so that users can switch between James’s default thread pool implementation and any other one (in my case, the thread pool from the D standard library).
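The pluggable thread pool idea can be sketched in a few lines. The sketch is in Python only for brevity: HTSlib itself is C and its actual threading interface differs, so `decompress_blocks` and `InlineExecutor` are hypothetical names, and zlib-compressed chunks stand in for independently compressed data blocks. The decoder accepts any `concurrent.futures.Executor` from the caller and creates its own default pool only when none is given.

```python
import zlib
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Iterable, Optional


class InlineExecutor(Executor):
    """Hypothetical stand-in for a user-supplied pool: runs each task immediately in the caller's thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Propagate errors the same way a real pool would: via the future.
            future.set_exception(exc)
        return future


def decompress_blocks(blocks: Iterable[bytes],
                      executor: Optional[Executor] = None) -> list[bytes]:
    """Decompress independent blocks, using the caller's pool if one is given (hypothetical API)."""
    if executor is None:
        # No pool supplied: fall back to a library-owned default pool.
        with ThreadPoolExecutor(max_workers=4) as default_pool:
            return list(default_pool.map(zlib.decompress, blocks))
    # Otherwise reuse whatever pool the application already runs; map keeps input order.
    return list(executor.map(zlib.decompress, blocks))


blocks = [zlib.compress(f"record {i}\n".encode()) for i in range(8)]
assert decompress_blocks(blocks) == decompress_blocks(blocks, InlineExecutor())
```

The point of this design is that the library no longer has to own the threads: an application that already runs its own pool (as Sambamba does with the D standard library) can pass it in instead of running two pools side by side.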

In summary, attending such events helps a great deal with keeping up with recent developments, looking at things from a different perspective, and simply enjoying the geek atmosphere. Kudos to the organizers!

I’m also very thankful to the Open Bioinformatics Foundation, my mentoring organization in GSoC 2012, for covering my travel & accommodation costs. It’s a remarkable example of a successful umbrella non-profit organization, which is itself associated with another one, Software in the Public Interest.

This year I took part in scoring GSoC 2014 proposals to O|B|F, and now that a month has passed since the end of the program, I must note that the average proposal scores correlate amazingly well with the students’ quality of work and involvement. Every year O|B|F meets my notion of successful GSoC participation: at least one student keeps contributing to their project after the summer ends. Hopefully this trend will continue!
