D and string formatting

The D standard library is completely unsuitable for outputting large amounts of text. That’s the conclusion I came to after profiling my BAM-to-SAM converter.

Let’s begin with the fact that there are two implementations of formatting in the language, both living in std.format: the doFormat function and formattedWrite. The two have some subtle differences in behaviour (http://d.puremagic.com/issues/show_bug.cgi?id=4532).
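
For reference, this is roughly what calling formattedWrite looks like. It takes any output range as its first argument, so it can format straight into an in-memory buffer such as the one returned by std.array.appender. This is a minimal sketch assuming a reasonably recent Phobos; formatRecord and its fields are hypothetical, loosely modelled on the first columns of a SAM line:

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : write;

// Format one tab-separated record into an in-memory buffer.
// formatRecord and its fields are hypothetical, loosely SAM-like.
string formatRecord(string name, int flag, string refName, long pos)
{
    auto buf = appender!string();
    // formattedWrite accepts any output range; here it appends to buf.
    buf.formattedWrite("%s\t%d\t%s\t%d\n", name, flag, refName, pos);
    return buf.data;
}

void main()
{
    write(formatRecord("read1", 0, "chr1", 100));
}
```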

doFormat

It treats all strings as UTF-8 text, which causes a lot of overhead for plain ASCII strings. Profiling showed that about 1/3 of the time is spent on this Unicode handling.
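
The cost is easy to picture with a toy example. The sketch below is not doFormat’s code; it only contrasts a sink that is fed one decoded dchar at a time (the kind of interface that forces decoding) with one that receives the whole UTF-8 slice. Both produce the same bytes, but the first walks and re-encodes every code point even when the input is pure ASCII. copyDecoded and copyRaw are hypothetical names:

```d
import std.array : appender;
import std.stdio : writeln;

// Character-at-a-time: a foreach with a dchar loop variable decodes every
// UTF-8 sequence, and the appender encodes each dchar back to UTF-8,
// even when the input is plain ASCII.
string copyDecoded(string s)
{
    auto sink = appender!string();
    foreach (dchar c; s)
        sink.put(c);
    return sink.data;
}

// Slice-at-a-time: the UTF-8 bytes are appended as-is, with no decoding.
string copyRaw(string s)
{
    auto sink = appender!string();
    sink.put(s);
    return sink.data;
}

void main()
{
    string line = "read1\t0\tchr1\t100";
    // Both paths yield identical output; only the amount of work differs.
    writeln(copyDecoded(line) == copyRaw(line));
}
```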

I’m also afraid to count how many bytes it allocates on the stack:

void formatArg(char fc)
{
    bool vbit;
    ulong vnumber;
    char vchar;
    dchar vdchar;
    Object vobject;
    real vreal;
    creal vcreal;
    Mangle m2;
    int signed = 0;
    uint base = 10;
    int uc;
    char[ulong.sizeof * 8] tmpbuf; // long enough to print long in binary
    const(char)* prefix = "";
    string s;

    /* here finally comes actual code */
}

doFormat is used by all the streams in the std.stream module.

formattedWrite

This one is newer and relies more on template techniques, so there is less runtime overhead: more work is moved to compile time. It is used by the std.stdio module, and here another problem arises: about 1/3 of the time is spent locking and unlocking the output stream. Every time you call writef, it locks the stream, writes a few bytes, and then unlocks it. Absolutely awful.
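
One way around the per-call locking, assuming a Phobos where File.lockingTextWriter is available, is to take the lock once and push all the formatting through the returned writer. A sketch (writeRecords and the record layout are hypothetical):

```d
import std.format : formattedWrite;
import std.stdio : File, stdout;

// Write n records to f while holding the stream lock only once.
// lockingTextWriter() locks f when it is created and unlocks it when the
// writer goes out of scope at the end of this function.
void writeRecords(File f, long n)
{
    auto w = f.lockingTextWriter();
    foreach (i; 0 .. n)
        w.formattedWrite("read%d\t0\tchr1\t%d\n", i, i * 10);
}

void main()
{
    writeRecords(stdout, 3);
}
```

Compared with calling writef in the loop, the stream is locked and unlocked once per batch instead of once per record.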

Hey, I don’t need all this fancy stuff! As a C++ guy, I don’t want to pay for features I don’t use! I want performance, damn it! I tried std.cstream, which provides a stream interface around stdout. You think it worked? Ha-ha! A simple dout.printf("%d", 123) causes a segfault on x86_64 machines. I filed a bug, of course, but when it comes to string formatting, D streams are completely f*cked up.

Another cool thing is that std.stream is going to be deprecated at some point anyway. Why?! Ah, just because ranges are so good that stream interfaces have to be redesigned with ranges in mind. Sorry, but I wrote a BAM format reader very quickly with std.stream, and I didn’t need any bloody ranges at all. Bytes, integers and floats are not that abstract, you know. Even for text streams, I can’t imagine a situation where I would need anything more than iterating over lines, which the standard library already provides.
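
Line-by-line iteration is indeed already there as File.byLine. A small sketch that counts alignment records in SAM text by skipping the @ header lines (countRecords is a hypothetical helper):

```d
import std.stdio : File, stdin, writeln;

// Count alignment records in SAM text, skipping '@' header lines.
// byLine reuses one internal buffer, so copy `line` (e.g. with .idup)
// if it must outlive the current iteration.
size_t countRecords(File f)
{
    size_t n = 0;
    foreach (line; f.byLine())
    {
        if (line.length == 0 || line[0] == '@')
            continue;
        ++n;
    }
    return n;
}

void main()
{
    writeln(countRecords(stdin));
}
```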

The same applies to the container library. Better to have no containers at all than badly designed ones: that’s what Alexandrescu believes in. And since he doesn’t have much time to flesh out the design because of his family and such, everybody will write queues and stacks from scratch and use associative arrays as sets until the miracle finally occurs, after years of intense and deep thinking. It seems D wants to become yet another Haskell, but with C-like syntax and a bit closer to the metal. That’s not necessarily bad, but the process is too slow. It’s been several years since people started asking for a container library. What do we see now? A bunch of strange stuff that can’t even be used easily in multithreaded applications. Array, for instance, is a reference-counted object that doesn’t use garbage collection. What the hell is that all about? Does it mean they don’t even hope to make the GC better than it is now? No, there’s even a GSoC project aimed at making the GC precise. Personally, I don’t find GC imprecision an issue, because 64-bit is the future (false pointers are far less of a problem in a 64-bit address space). Though, as you can see from my experience with std.cstream, the D language developers don’t seem to think so.
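
The "associative arrays as sets" workaround looks like this in practice: a built-in AA whose values are simply ignored. A minimal sketch (countDistinct is a hypothetical helper name):

```d
import std.stdio : writeln;

// Emulate a set of strings with a built-in associative array:
// only the keys matter, the bool values are ignored.
size_t countDistinct(string[] names)
{
    bool[string] seen;
    foreach (name; names)
        seen[name] = true; // re-inserting an existing key just overwrites its value
    return seen.length;
}

void main()
{
    bool[string] set;
    set["chr1"] = true;
    // `in` yields a pointer to the value, or null if the key is absent.
    writeln(("chr1" in set) !is null);
    set.remove("chr1");
    writeln(("chr1" in set) !is null);
    writeln(countDistinct(["chr1", "chr2", "chr1"]));
}
```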

So, my dear friend, use std.c.stdio, std.c.stdlib, std.c.string, and enjoy real performance with fprintf!
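
A sketch of that approach with current druntime, where the old std.c.stdio, std.c.stdlib and std.c.string modules live on as core.stdc.stdio, core.stdc.stdlib and core.stdc.string. Two things to watch: D strings are not NUL-terminated, so they are passed with an explicit length via %.*s, and every argument must match its C format specifier (%d wants an int, %lld a 64-bit long). formatLine is a hypothetical helper:

```d
// The C bindings once under std.c.* are in druntime's core.stdc.* today.
import core.stdc.stdio : fprintf, snprintf, stdout;

// Format one SAM-like line into buf with C's snprintf and return the
// filled part. formatLine and its fields are illustrative only.
char[] formatLine(char[] buf, string name, int flag, long pos)
{
    // D strings are not NUL-terminated, so pass length + pointer via %.*s.
    // %lld matches D's long, which is always 64 bits.
    int n = snprintf(buf.ptr, buf.length, "%.*s\t%d\t%lld\n",
                     cast(int) name.length, name.ptr, flag, pos);
    assert(n >= 0 && cast(size_t) n < buf.length, "buffer too small");
    return buf[0 .. n];
}

void main()
{
    char[64] buf;
    auto line = formatLine(buf[], "read1", 0, 100);
    // Hand the bytes straight to C's stdout: no D-side locking or UTF decoding.
    fprintf(stdout, "%.*s", cast(int) line.length, line.ptr);
}
```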
