Wide characters support in D (page 5) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Wide characters support in D (page 5)

June 08, 2010

Re: Wide characters support in D

Posted by dennis luehring
in reply to Ruslan Nikolaev

dennis luehring

Posted in reply to Ruslan Nikolaev

Am 08.06.2010 20:20, schrieb Ruslan Nikolaev:
> No. New messages are definitely not created by me. You can verify it here:
> http://blog.gmane.org/gmane.comp.lang.d.general
>
> You can easily see that in none of the top posts (except for the first one) my name appears first. In fact, you have just created another top post. I am only replying to other's comments.

but my newsreader (thunderbird and mail live) telling an different story

http://de.tinypic.com/r/mbjndc/6 (click on the image)

as you can see - no top-poster can beat you

i should give thunderbird a try - very good and nice with newsgroups

June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to dennis luehring

Nick Sabalausky

Posted in reply to dennis luehring

"dennis luehring" <dl.soluz@gmx.net> wrote in message news:hum3fc$2dp6$1@digitalmars.com...
> Am 08.06.2010 20:20, schrieb Ruslan Nikolaev:
>> No. New messages are definitely not created by me. You can verify it
>> here:
>> http://blog.gmane.org/gmane.comp.lang.d.general
>>
>> You can easily see that in none of the top posts (except for the first one) my name appears first. In fact, you have just created another top post. I am only replying to other's comments.
>
> but my newsreader (thunderbird and mail live) telling an different story
>
> http://de.tinypic.com/r/mbjndc/6 (click on the image)
>
> as you can see - no top-poster can beat you
>
> i should give thunderbird a try - very good and nice with newsgroups
>

That link didn't show the image for me, but this one does: http://i50.tinypic.com/mbjndc.jpg

I get the same results as dennis in Outlook Express.

Also, that link from Ruslan seems to display in a blog-style, which is a really bizarre way to view a newsgroup.

June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to Nick Sabalausky

Nick Sabalausky

Posted in reply to Nick Sabalausky

"Nick Sabalausky" <a@a.a> wrote in message news:hum6c8$2j0s$1@digitalmars.com...
> "dennis luehring" <dl.soluz@gmx.net> wrote in message news:hum3fc$2dp6$1@digitalmars.com...
>> Am 08.06.2010 20:20, schrieb Ruslan Nikolaev:
>>> No. New messages are definitely not created by me. You can verify it
>>> here:
>>> http://blog.gmane.org/gmane.comp.lang.d.general
>>>
>>> You can easily see that in none of the top posts (except for the first one) my name appears first. In fact, you have just created another top post. I am only replying to other's comments.
>>
>> but my newsreader (thunderbird and mail live) telling an different story
>>
>> http://de.tinypic.com/r/mbjndc/6 (click on the image)
>>
>> as you can see - no top-poster can beat you
>>
>> i should give thunderbird a try - very good and nice with newsgroups
>>
>
> That link didn't show the image for me, but this one does: http://i50.tinypic.com/mbjndc.jpg
>
> I get the same results as dennis in Outlook Express.
>

Well, more-or-less. This is what I'm getting: http://www.semitwist.com/download/wideCharNG.png

A standard newsgroup tree-view, except all of Ruslan's posts are immediate children of the original post. Everyone else's posts show up as proper replies. Yea, I am using Outlook Express, but I've never seen anyone else on this NG for which every one of their posts is always either first or second level.

> Also, that link from Ruslan seems to display in a blog-style, which is a really bizarre way to view a newsgroup.
>

June 08, 2010

Re: Wide characters support in D

Posted by bearophile
in reply to Walter Bright

bearophile

Posted in reply to Walter Bright

Walter Bright:
> The problem with dchar's is strings of them consume memory at a prodigious rate.

Warning: lazy musings ahead.

I hope we'll soon have computers with 200+ GB of RAM where using strings that use less than 32-bit chars is in most cases a premature optimization (like today is often a silly optimization to use arrays of 16-bit ints instead of 32-bit or 64-bit ints. Only special situations found with the profiler can justify the use of arrays of shorts in a low level language).

Even in PCs with 200 GB of RAM the first levels of CPU caches can be very small (like 32 KB), and cache misses are costly, so even if huge amounts of RAMs are present, to increase performance it can be useful to reduce the size of strings.

A possible solution to this problem can be some kind of real-time hardware compression/decompression between the CPU and the RAM. UTF-8 can be a good enough way to compress 32-bit strings. So we are back to writing low-level programs that have to deal with UTF-8.

To avoid this, CPUs and RAM can compress/decompress the text transparently to the programmer. Unfortunately UTF-8 is a variable-length encoding, so maybe it can't be done transparently enough. So a smarter and better compression algorithm can be used to keep all this transparent enough (not fully transparent, some low-level situations can require code that deals with the compression).

Bye,
bearophile

June 08, 2010

Re: Wide characters support in D

Posted by Walter Bright
in reply to bearophile

Walter Bright

Posted in reply to bearophile

bearophile wrote:
> Walter Bright:
>> The problem with dchar's is strings of them consume memory at a prodigious
>> rate.
> 
> Warning: lazy musings ahead.
> 
> I hope we'll soon have computers with 200+ GB of RAM where using strings that
> use less than 32-bit chars is in most cases a premature optimization (like
> today is often a silly optimization to use arrays of 16-bit ints instead of
> 32-bit or 64-bit ints. Only special situations found with the profiler can
> justify the use of arrays of shorts in a low level language).
> 
> Even in PCs with 200 GB of RAM the first levels of CPU caches can be very
> small (like 32 KB), and cache misses are costly, so even if huge amounts of
> RAMs are present, to increase performance it can be useful to reduce the size
> of strings.
> 
> A possible solution to this problem can be some kind of real-time hardware
> compression/decompression between the CPU and the RAM. UTF-8 can be a good
> enough way to compress 32-bit strings. So we are back to writing low-level
> programs that have to deal with UTF-8.
> 
> To avoid this, CPUs and RAM can compress/decompress the text transparently to
> the programmer. Unfortunately UTF-8 is a variable-length encoding, so maybe
> it can't be done transparently enough. So a smarter and better compression
> algorithm can be used to keep all this transparent enough (not fully
> transparent, some low-level situations can require code that deals with the
> compression).

I strongly suspect that the encode/decode time for UTF-8 is more than compensated for by the 4x reduction in memory usage. I did a large app 10 years ago using dchars throughout, and the effects of the memory consumption were murderous.

(As the recent article on memory consumption shows, large data structures can have huge negative speed consequences due to virtual and cache memory, and multiple cores trying to access the same memory.)

https://lwn.net/Articles/250967/

Keep in mind that the overwhelming bulk of UTF-8 text is ascii, and requires only one cycle to "decode".

June 08, 2010

Re: Wide characters support in D

Posted by Rainer Deyke
in reply to bearophile

Rainer Deyke

Posted in reply to bearophile

On 6/8/2010 13:57, bearophile wrote:
> I hope we'll soon have computers with 200+ GB of RAM where using strings that use less than 32-bit chars is in most cases a premature optimization (like today is often a silly optimization to use arrays of 16-bit ints instead of 32-bit or 64-bit ints. Only special situations found with the profiler can justify the use of arrays of shorts in a low level language).

Off-topic, but I don't need a profiler to tell me that my 1024x1024x1024 arrays should use shorts instead of ints.  And even when 200GB becomes common, I'd still rather not waste that memory by using twice as much space as I have to just because I can.


-- 
Rainer Deyke - rainerd@eldwood.com

June 08, 2010

Re: Wide characters support in D

Posted by Pelle
in reply to Ruslan Nikolaev

Pelle

Posted in reply to Ruslan Nikolaev

On 06/08/2010 08:20 PM, Ruslan Nikolaev wrote:
> No. New messages are definitely not created by me. You can verify it here:
> http://blog.gmane.org/gmane.comp.lang.d.general
>
> You can easily see that in none of the top posts (except for the first one) my name appears first. In fact, you have just created another top post. I am only replying to other's comments.
>
> Ruslan.
>

Speaking as someone with a tiny bit of knowledge about nntp, you are sending your messages without references; that would be top posting.

Please, fix. Your thread is all over the place.

June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to Rainer Deyke

Nick Sabalausky

Posted in reply to Rainer Deyke

"Rainer Deyke" <rainerd@eldwood.com> wrote in message news:humes8$s8$1@digitalmars.com...
> On 6/8/2010 13:57, bearophile wrote:
>> I hope we'll soon have computers with 200+ GB of RAM where using strings that use less than 32-bit chars is in most cases a premature optimization (like today is often a silly optimization to use arrays of 16-bit ints instead of 32-bit or 64-bit ints. Only special situations found with the profiler can justify the use of arrays of shorts in a low level language).
>
> Off-topic, but I don't need a profiler to tell me that my 1024x1024x1024 arrays should use shorts instead of ints.  And even when 200GB becomes common, I'd still rather not waste that memory by using twice as much space as I have to just because I can.
>
>

I think he was just musing that it would be nice to be able to ignore multiple encodings and multiple-code-units, and get back to something much closer to the blissful simplicity of ASCII. On that particular point, I concur ;)

June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to Nick Sabalausky

Nick Sabalausky

Posted in reply to Nick Sabalausky

"Nick Sabalausky" <a@a.a> wrote in message news:humfrk$2gk$1@digitalmars.com...
> "Rainer Deyke" <rainerd@eldwood.com> wrote in message news:humes8$s8$1@digitalmars.com...
>> On 6/8/2010 13:57, bearophile wrote:
>>> I hope we'll soon have computers with 200+ GB of RAM where using strings that use less than 32-bit chars is in most cases a premature optimization (like today is often a silly optimization to use arrays of 16-bit ints instead of 32-bit or 64-bit ints. Only special situations found with the profiler can justify the use of arrays of shorts in a low level language).
>>
>> Off-topic, but I don't need a profiler to tell me that my 1024x1024x1024 arrays should use shorts instead of ints.  And even when 200GB becomes common, I'd still rather not waste that memory by using twice as much space as I have to just because I can.
>>
>>
>
> I think he was just musing that it would be nice to be able to ignore multiple encodings and multiple-code-units, and get back to something much closer to the blissful simplicity of ASCII. On that particular point, I concur ;)
>

Keep in mind too, that for an English-language app (and there are plenty), even using ASCII still wastes space, since you usually only need the 26 letters, 10 digits, a few whitespace characters, and a handful of punctuation. You could probably fit that in 6 bits per character, less if you're ballsy enough to use huffman encoding internally. Yea, there's twice as many letters if you count uppercase/lowercase, but random-casing is rare so there's tricks you can use to just stick with 26 plus maybe a few special control characters. But, of course, nobody actually does any of that because with the amount of memory we have, and the amount of memory already used by other parts of a program, the savings wouldn't be worth the bother.

But I agree with your point too. Just saying.

June 09, 2010

Re: Wide characters support in D

Posted by Simen kjaeraas
in reply to Pelle

Simen kjaeraas

Posted in reply to Pelle

Pelle <pelle.mansson@gmail.com> wrote:

> On 06/08/2010 08:20 PM, Ruslan Nikolaev wrote:
>> No. New messages are definitely not created by me. You can verify it here:
>> http://blog.gmane.org/gmane.comp.lang.d.general
>>
>> You can easily see that in none of the top posts (except for the first one) my name appears first. In fact, you have just created another top post. I am only replying to other's comments.
>>
>> Ruslan.
>>
>
> Speaking as someone with a tiny bit of knowledge about nntp, you are sending your messages without references; that would be top posting.
>
> Please, fix. Your thread is all over the place.

Weird. I'm getting all his messages in their right places. Using
Opera's built-in newsreader.

-- 
Simen

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation