Ceci n'est pas une char

April 07, 2006

Re: Ceci n'est pas une char

Posted by Jari-Matti Mäkelä
in reply to Thomas Kuehne

Thomas Kuehne wrote:
> Jari-Matti wrote:
>>> That's very true. A "normal" hard drive reads 60 MB/s. So,
>>> reading a 4 MB file takes at least 66 ms and a 1 MB UTF-8-file (only
>>> ASCII-characters) is read in 17 ms (well, I'm a bit optimistic here :).
>>> A modern processor executes 3 000 000 000 operations in a
>>> second. Going through the UTF-8 stream takes 1 000 000 * 10 (perhaps?)
>>> operations and thus costs 3 ms. So it's actually faster to read UTF-8.
> 
> 1) your sample: English (consider Chinese)
> 2) magic word: seek

Yes, I know. This was just an optimistic, tongue-in-cheek analysis :) A real-world example would naturally have a lot of non-ASCII characters too, but the point is that reading large amounts of uncompressed UTF-32 data will usually be slower than reading UTF-8 if we are also checking for text corruption. I wonder if it's any faster to read UTF-32 files from a transparently compressed reiser4 drive?
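
For anyone who wants to play with the numbers, here is a rough Python sketch of the estimate quoted above. Every constant is the guess from that estimate (60 MB/s disk, 3 GHz processor, about 10 operations per UTF-8 byte), not a measurement, and the function names are made up for illustration. It also assumes UTF-32 needs no processing at all after the read and ignores seeking and caching entirely, so treat it as an illustration of the arithmetic only.

```python
# Back-of-the-envelope model of the quoted estimate.
# All constants are the post's guesses, not measurements.
DISK_BYTES_PER_SEC = 60_000_000     # "normal" hard drive: 60 MB/s
CPU_OPS_PER_SEC = 3_000_000_000     # 3 000 000 000 operations per second
OPS_PER_UTF8_BYTE = 10              # the "(perhaps?)" cost of walking UTF-8


def read_ms(size_bytes):
    """Time to stream size_bytes from disk, in milliseconds (no seeks)."""
    return size_bytes / DISK_BYTES_PER_SEC * 1000.0


def decode_ms(size_bytes, ops_per_byte=OPS_PER_UTF8_BYTE):
    """Time to walk a UTF-8 stream of size_bytes, in milliseconds."""
    return size_bytes * ops_per_byte / CPU_OPS_PER_SEC * 1000.0


def estimate(code_points, utf8_bytes_per_code_point=1.0):
    """Return (utf8_ms, utf32_ms) for a text of the given length.

    utf8_bytes_per_code_point is 1.0 for pure ASCII and about 3.0 for
    text made up mostly of CJK characters from the Basic Multilingual Plane.
    UTF-32 is modelled as read time only (assumption: no decoding cost).
    """
    utf8_size = code_points * utf8_bytes_per_code_point
    utf32_size = code_points * 4
    utf8_total = read_ms(utf8_size) + decode_ms(utf8_size)
    utf32_total = read_ms(utf32_size)
    return utf8_total, utf32_total


if __name__ == "__main__":
    for label, width in (("ASCII only", 1.0), ("mostly CJK", 3.0)):
        u8, u32 = estimate(1_000_000, width)
        print(f"{label}: UTF-8 {u8:.1f} ms, UTF-32 {u32:.1f} ms")
```

With these assumptions the ASCII case comes out at roughly 20 ms for UTF-8 against roughly 67 ms for UTF-32, and the gap narrows considerably once most characters take three bytes in UTF-8, which is Thomas's first point.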

-- 
Jari-Matti
