Thread overview
Unichar optimization: performance vs RAM usage. Your opinion?
Jun 08, 2004
Hauke Duden
Jun 08, 2004
Arcane Jill
Jun 08, 2004
Hauke Duden
Jun 09, 2004
Kevin Bealer
Jun 08, 2004
Arcane Jill
Jun 08, 2004
Arcane Jill
Jun 08, 2004
Hauke Duden
Jun 08, 2004
Walter
Jun 08, 2004
DemmeGod
Jun 08, 2004
I had an idea last night on how to optimize the unichar module to only use about 50 KB instead of the current 2 MB of RAM at runtime. It may also be possible to reduce the impact on the executable size (12 KB currently) but I haven't worked that out fully yet.

Unfortunately the change would make the lookup code slower. For example, translating "chr" to lower case would have to be changed from

return lowerData[chr]

to

return chr + lowerData[chr>>9][chr & 511];

In other words, instead of one memory lookup it will do two memory lookups, a shift, an AND, and an addition. It is not THAT expensive, but in an inner loop it might add up.
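
To make that concrete, here is a minimal sketch of what the paged variant could look like. The names and the slice-based page sharing are just placeholders - the real tables wouldn't necessarily be laid out exactly like this:

    // 512-codepoint pages; pages with identical contents (e.g. "no case
    // mapping anywhere") all alias one shared block, which is where the
    // memory saving comes from.
    enum PAGE_BITS = 9;
    enum PAGE_SIZE = 1 << PAGE_BITS;           // 512 codepoints per page
    enum PAGE_COUNT = 0x110000 >> PAGE_BITS;   // 2176 pages cover U+0000..U+10FFFF

    int[][PAGE_COUNT] lowerData;               // per-page delta slices, many aliasing the same block

    static this()
    {
        auto emptyPage = new int[PAGE_SIZE];   // all-zero deltas: chr maps to itself
        foreach (ref page; lowerData)
            page = emptyPage;                  // real code would fill in the distinct pages
    }

    dchar toLowerSketch(dchar chr)
    {
        // two dependent loads plus a shift, an AND and an add
        return cast(dchar)(chr + lowerData[chr >> PAGE_BITS][chr & (PAGE_SIZE - 1)]);
    }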

And people have lots of RAM nowadays - 2 MB is a small amount, and since only small parts of it will actually be used (depending on the language the program is working with), most of it will be paged out to disk by the OS.

So what do you think: is the lower RAM usage worth the performance decrease? I could also implement both versions and select one at compile time, but then the question is which one should be the default.
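
For the compile-time selection I'm thinking of something along these lines (the version identifier and the table declarations here are just placeholders, not final):

    version (UnicodeBigTables)
    {
        dchar[] lowerData;          // one flat table, fast but ~2 MB

        dchar toLowerCase(dchar chr)
        {
            return lowerData[chr];
        }
    }
    else
    {
        int[][] lowerData;          // 512-entry delta pages sharing duplicate blocks, ~50 KB

        dchar toLowerCase(dchar chr)
        {
            return cast(dchar)(chr + lowerData[chr >> 9][chr & 511]);
        }
    }

Building with -version=UnicodeBigTables would then switch back to the fast variant.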


Hauke


June 08, 2004
In article <ca3vvv$gpf$1@digitaldaemon.com>, Hauke Duden says...

>return lowerData[chr]

Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.

Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it. Maybe I'm just not enough of a speed freak.

Jill



June 08, 2004
In article <ca3vvv$gpf$1@digitaldaemon.com>, Hauke Duden says...

>return chr + lowerData[chr>>9][chr & 511];

You can get faster than that.

>       union U
>       {
>           dchar c;
>           ubyte[3] n;
>       }
>
>       U u;
>       u.c = chr;
>       return lowerData[u.n[2]][u.n[1]][u.n[0]];

No bitshifting. You just need tables for blocks of 256 characters, then tables for blocks of 256 pointers to these, then tables for blocks of 17 pointers to these. Duplicate blocks (in particular, those for unassigned codepoints) can simply be the same block.
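
A rough sketch of the declarations and the sharing (names made up; I've also stored deltas here rather than the mapped characters themselves, simply so that all the "no mapping" blocks really are byte-for-byte identical and can alias one array - that part is my own choice, not the only way to do it):

    int[][][17] lowerData;                     // 17 planes -> 256 rows -> 256 per-codepoint deltas

    static this()
    {
        auto emptyRow   = new int[256];        // all zeros: no case mapping
        auto emptyPlane = new int[][](256);
        foreach (ref row; emptyPlane)
            row = emptyRow;                    // every row of the empty plane is the same block
        foreach (ref plane; lowerData)
            plane = emptyPlane;                // every plane starts out as the shared empty one
        // real code would then overwrite just the rows/planes that carry mappings
    }

    dchar toLowerSketch(dchar chr)
    {
        union U { dchar c; ubyte[4] n; }       // only the low three bytes matter
        U u;
        u.c = chr;
        // little-endian byte order assumed here (see the follow-up below)
        return cast(dchar)(chr + lowerData[u.n[2]][u.n[1]][u.n[0]]);
    }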

Well, that's how I would have done it.
Jill


June 08, 2004
In article <ca46cu$rfj$1@digitaldaemon.com>, Arcane Jill says...
>
>>       union U
>>       {
>>           dchar c;
>>           ubyte[3] n;
>>       }
>>

That's assuming little-endian architecture, of course. You'd need two versions to cover both possibilities.
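
In D the predefined version identifiers can take care of that, something like this (sketch only, with the table populated as in the previous post):

    int[][][17] lowerData;      // the three-level delta table from the previous sketch

    dchar toLowerSketch(dchar chr)
    {
        union U { dchar c; ubyte[4] n; }
        U u;
        u.c = chr;
        version (LittleEndian)
        {
            return cast(dchar)(chr + lowerData[u.n[2]][u.n[1]][u.n[0]]);
        }
        else version (BigEndian)
        {
            // byte 0 is the (always zero) top byte on big-endian machines
            return cast(dchar)(chr + lowerData[u.n[1]][u.n[2]][u.n[3]]);
        }
        else
        {
            static assert(0, "unknown endianness");
        }
    }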

Jill


June 08, 2004
Arcane Jill wrote:
> In article <ca3vvv$gpf$1@digitaldaemon.com>, Hauke Duden says...
> 
> 
>>return chr + lowerData[chr>>9][chr & 511];
> 
> 
> You can get faster than that.
> 
> 
>>      union U
>>      {
>>          dchar c;
>>          ubyte[3] n;
>>      }
>>
>>      U u;
>>      u.c = chr;
>>      return lowerData[u.n[2]][u.n[1]][u.n[0]];
> 
> 
> No bitshifting. You just need tables for blocks of 256 characters, then tables
> for blocks of 256 pointers to these, then tables for blocks of 17 pointers to
> these. Duplicate blocks (in particular, those for unassigned codepoints) can
> simply be the same block.
> 
> Well, that's how I would have done it.

Splitting the data into blocks and using the same memory for equal blocks is indeed the main idea behind the optimization.

But doing 3 memory lookups with byte indices is probably slower than doing the bit shift and 2 lookups. Memory is usually the bottleneck, while shifts have been single-cycle operations for ages (even on the 386, IIRC) and can be pipelined alongside the loads to some degree.

Also, byte-based operations are slower than int-based ones on many CPUs.
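
If it turns out to matter, the claim is easy to check with a throwaway benchmark along these lines (dummy all-zero tables and made-up names, written against today's std.datetime.stopwatch - a measurement sketch, not the unichar code):

    import std.datetime.stopwatch : benchmark;
    import std.stdio : writefln;

    enum PAGE = 512;
    enum PAGES = 0x110000 / PAGE;          // 2176 pages cover U+0000..U+10FFFF

    void main()
    {
        auto flat = new int[0x110000];     // one big table: flat[c] is the delta for c
        auto paged = new int[][](PAGES);   // paged[c >> 9][c & 511] is the same delta
        auto sharedPage = new int[PAGE];
        foreach (ref page; paged)
            page = sharedPage;             // every page aliases one all-zero block

        uint sink;                         // keeps the loops from being optimised away
        auto times = benchmark!(
            { foreach (uint c; 0 .. 0x110000) sink += c + flat[c]; },
            { foreach (uint c; 0 .. 0x110000) sink += c + paged[c >> 9][c & 511]; }
        )(100);

        writefln("flat:  %s", times[0]);
        writefln("paged: %s  (sink %s)", times[1], sink);
    }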

Hauke
June 08, 2004
Arcane Jill wrote:

> In article <ca3vvv$gpf$1@digitaldaemon.com>, Hauke Duden says...
> 
> 
>>return lowerData[chr]
> 
> 
> Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That
> must be a ----ing HUGE lookup table! Even if the data were only one byte per
> codepoint, that's a meg already.

Not quite ;). If you look at the data you'll see that only the first ~200,000 codepoints actually have case mappings. That translates to about 2 MB of needed RAM (for all tables together).
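
The back-of-the-envelope version, treating the data as three separate 4-byte mapping tables (lower/upper/title) purely for the sake of the estimate:

    enum size_t mappedCodepoints = 200_000;
    enum size_t bytesPerEntry    = dchar.sizeof;     // 4
    enum size_t tableCount       = 3;                // lower, upper, title (assumed)
    enum size_t totalBytes       = mappedCodepoints * bytesPerEntry * tableCount;
    static assert(totalBytes == 2_400_000);          // roughly the 2 MB mentioned above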

> Of course it's reasonable to compress that. There are huge swathes of codepoints
> that are just totally unused (yet), and many, many blocks in which all
> characters have identical properties. One humungous lookup table is definitely
> not how I would have done it. Maybe I'm just not enough of a speed freak.

Dividing the Unicode range into pages and using the same memory for identical ones is exactly the optimization I think about doing. The question is if the modest (by today's standards) decrease in needed RAM is worth the decrease in performance.

Hauke

June 08, 2004
It's such a huge reduction in memory usage, I'd go for it. Reducing memory consumption has its own associated performance speedups - less swapping, less memory for the gc to scan, etc. Keep the faster version there with a version {} statement, though, in case experience shows otherwise.


June 08, 2004
I think what Walter means to say is that it'll never make it into Phobos using 2 MB of RAM.  Yes?

John

On Tue, 08 Jun 2004 09:23:08 -0700, Walter wrote:

> It's such a huge reduction in memory usage, I'd go for it. Reducing memory consumption has its own associated performance speedups - less swapping, less memory for the gc to scan, etc. Keep the faster version there with a version {} statement, though, in case experience shows otherwise.

June 09, 2004
In article <ca4aap$11jk$1@digitaldaemon.com>, Hauke Duden says...
>
>Arcane Jill wrote:
>
>> In article <ca3vvv$gpf$1@digitaldaemon.com>, Hauke Duden says...
>> 
>> 
>>>return lowerData[chr]
>> 
>> 
>> Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.
>
>Not quite ;). If you look at the data you'll see that only the first ~200,000 codepoints actually have case mappings. That translates to about 2 MB of needed RAM (for all tables together).
>
>> Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it. Maybe I'm just not enough of a speed freak.
>
>Dividing the Unicode range into pages and using the same memory for identical ones is exactly the optimization I think about doing. The question is if the modest (by today's standards) decrease in needed RAM is worth the decrease in performance.
>
>Hauke

I've seen the idea of a two-level table mentioned on one of the Unicode sites; it was one of the recommended implementations for a lookup table.  I used a version of this at IBM to handle ASCII/EBCDIC/Unicode translation.

I think doing an extra table lookup per character will not be a big issue, since the first table (the pointers to the other tables) will probably stay in cache.
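
The numbers bear that out: with 512-codepoint pages the first-level table is tiny (rough figures, assuming one plain pointer per page):

    enum size_t pageSize   = 512;
    enum size_t pageCount  = 0x110000 / pageSize;           // 2176 pages for U+0000..U+10FFFF
    enum size_t firstLevel = pageCount * (void*).sizeof;    // 8704 bytes with 4-byte pointers
    static assert(pageCount == 2176);

A few KB sits comfortably in L1/L2 cache, so in a tight loop only the second lookup is likely to miss.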

Kevin