On programming languages and HPC

Some people believe one should never rewrite old code. I’m not among them, and let me tell you why, taking samtools as an example.

If you want your programs to scale on modern multicores, you must use parallelism. The free lunch is over, damn it! It’s great that samtools’ authors realize that, but hey, do you really want to add 40 lines of code each time you need a trivial parallel for loop?

Modern languages make such stuff easy to use. Moreover, when it lives in a standard library, whenever someone figures out how to make some routine faster, you get the performance boost automatically with the new version. Conversely, if this ‘someone’ is you, you help not only yourself but the whole community. It’s all about code reuse and not reinventing the wheel.

I’m sorry to say this, but to me, a lot of C coders seem a bit like cavemen. I’ve seen plenty of home-made object systems, string libraries, and container/algorithm libraries. They live without generic programming, and when they reach the point where even macros are of no help, they roll their own code generators in Ruby/Python/etc. The situation with OOP is a bit better because GObject exists. As for multithreading, I would advise many of them to study OpenMP and see whether it suits their tasks. (That’s the KISS principle.) Only if it does not is there a case for pthread.h.
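To give a feel for how little code a trivial parallel loop needs with OpenMP, here is a minimal sketch. The function name `sum_of_squares` is made up for illustration and has nothing to do with samtools; the only OpenMP feature used is the standard `parallel for` directive with a `reduction` clause.

```cpp
#include <vector>

// Hypothetical helper: sums the squares of all elements.
// With an OpenMP-enabled build (e.g. -fopenmp on GCC/Clang) the loop
// iterations are split across threads and the per-thread partial sums
// are combined by the reduction clause. Without OpenMP support the
// pragma is ignored and the very same loop runs serially.
long long sum_of_squares(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += static_cast<long long>(values[i]) * values[i];
    }
    return total;
}
```

The parallel version differs from the serial one by a single line, which is roughly the point: one only needs to reach for pthread.h when the problem doesn’t fit this shape.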

Well, one could ask, how about reusing old code? The thing is, in this particular case, samtools was not designed to be used as an API. All the existing bindings to dynamic languages arose out of necessity. To appreciate what a pain in the ass it is, I advise you to read how Pjotr Prins describes his experience with creating bindings. It seems nobody ever gave a damn about refactoring.

Now, why are we so concerned about having an API, shared libraries, and dynamic languages? Scientists love DSLs, and R is the brightest example of that. In the absence of a DSL, one uses some dynamic language so as to concentrate on the task at hand instead of fighting language quirks. Many people also like interactivity, i.e. having a REPL environment with the ability to visualize the data they’re working with. But dynamic languages are usually slow, and that’s the reason we have to use compiled code. There is some progress in creating languages that are both fast and dynamic, e.g. Julia, but it’s far too experimental at the moment.

And we don’t want to duplicate effort in creating language bindings. At the moment, we have SWIG for C/C++, and GObject Introspection for Vala is being developed. There are some developments for D, namely RuDy and PyD. D is great in that it lets you generate C wrappers at compile time: you don’t need a separate parser, just use the __traits keyword and the std.traits module. And compile-time function evaluation works well enough to write even a compile-time raytracer.
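For readers who haven’t seen compile-time function evaluation, here is a rough illustration of the idea. Note that this is C++11 `constexpr`, not D: it is only a loose analogue of D’s CTFE and is considerably more restricted. The `factorial` function is a made-up example.

```cpp
// A C++11 constexpr function body is limited to a single return
// statement, hence the recursion instead of a loop.
constexpr unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// The compiler has to evaluate factorial(10) itself to check this;
// if the value were wrong, the build would fail.
static_assert(factorial(10) == 3628800ULL,
              "factorial(10) is computed at compile time");
```

The same function can still be called at run time with a non-constant argument, so nothing has to be written twice.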

Another reason why D matters, in my opinion, is generic programming. Science is all about abstractions. I feel uncomfortable writing low-level code and being unable to express those abstractions in the most appropriate way without losing any execution speed along the road. C++11 is another good language, though the template syntax is weird and the opportunities are limited compared to D. Other than that, I see no languages supporting compile-time generic programming.
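As a small sketch of what “abstraction without losing speed” means in practice, consider a dot product written once for any numeric element type and any fixed length. This is plain C++11 templates; the function name `dot` is just an illustration, not taken from any library.

```cpp
#include <cstddef>

// Both the element type T and the array length N are template
// parameters, so the compiler generates a separate, fully typed
// version for each combination that is actually used. There is no
// run-time dispatch, and N is a compile-time constant in the loop.
template <typename T, std::size_t N>
T dot(const T (&a)[N], const T (&b)[N]) {
    T acc = T();  // value-initialized: zero for built-in numeric types
    for (std::size_t i = 0; i < N; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Calling `dot` on two `int[3]` arrays and on two `double[2]` arrays instantiates two independent functions, and passing arrays of different lengths is rejected at compile time. In C the usual ways to get the same effect are macros or a code generator.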

So, as I see it, new high-performance scientific libraries should be written in Cilk Plus or D. In my opinion, that’s the best way for them to be maintainable, generic, extremely fast, and easily bindable all at once. Another option would be to have a lot of domain-specific languages compiling to native code or using a JIT compiler, like BioScala. Time will tell…
