Wide characters support in D (page 4) - D Programming Language Discussion Forum


June 08, 2010

Re: Wide characters support in D

Posted by Yao G.
in reply to Ruslan Nikolaev

Every time you reply to somebody, a new message is created. It's kind of difficult to follow this discussion when you need to look through more than 15 separate messages about the same issue. Please check your news client or something.

Yao G.

On Tue, 08 Jun 2010 10:11:34 -0500, Ruslan Nikolaev <nruslan_devel@yahoo.com> wrote:

>>
>> Generally Linux systems use UTF-8 so I guess the "system
>> encoding" there will be UTF-8. But then if you start to use
>> Qt you have to use UTF-16, but you might have to intermix
>> UTF-8 to work with other libraries in the backend (libraries
>> which are not necessarily D libraries, nor system
>> libraries). So you may have a UTF-8 backend (such as the
>> MySQL library), UTF-8 "system encoding" glue code, and
>> UTF-16 GUI code (Qt). That might be a good or a bad choice,
>> depending on various factors, such as whether the glue code
>> sends more strings to the backend or the GUI.
>>
>> Now try to port the thing to Windows where you define the
>> "system encoding" as UTF-16. Now you still have the same
>> UTF-8 backend, and the same UTF-16 GUI code, but for some
>> reason you're changing the glue code in the middle to
>> UTF-16? Sure, it can be made to work, but all the string
>> conversions will start to happen elsewhere, which may change
>> the performance characteristics and add some potential for
>> bugs, and all this for no real reason.
>>
>> The problem is that what you call "system encoding" is only
>> the encoding used by the system frameworks. It is relevant
>> when working with the system frameworks, but when you're
>> working with any other API, you'll probably want to use the
>> same character type as that API does, not necessarily the
>> "system encoding". Not all programs are based on extensive
>> use of the system frameworks. In some situations you'll want
>> to use UTF-16 on Linux, or UTF-8 on Windows, because you're
>> dealing with libraries that expect that (Qt, MySQL).
>>
>
> Agreed. True, the system encoding is not always that clear. Yet UTF-8 is usually common on Linux (consider also Gtk, wxWidgets, system calls, etc.). At the same time, UTF-16 is more common on Windows (consider win32api, DFL, system calls, etc.). Some programs written in C even tend to have their own 'tchar' so that they can be compiled differently depending on the platform.
>
>> A compiler switch is a poor choice there, because you can't
>> mix libraries compiled with different compiler switches
>> when that switch changes the default character type.
>
> A compiler switch is only necessary for system programmers. For instance, gcc also has '-fshort-wchar', which changes the width of wchar_t to 16 bits. It also DOES break code, because libraries are normally compiled with a 32-bit wchar_t. Again, it's generally not for application programmers.
>
>>
>> In most cases, it's much better in my opinion if the
>> programmer just uses the same character type as one of the
>> libraries he uses, sticks to that, and is aware of what he's
>> doing. If someone really wants to deal with the complexity of
>
> A programmer generally should not need to know what encoding he works with. For both UTF-8 and UTF-16, it's easy to determine the number of bytes (words) in a multibyte (multiword) sequence by just looking at the first code unit. This could also be a builtin function (e.g. numberOfChars(tchar firstChar)). The size of each element can easily be determined by sizeof. Conversion to UTF-32 and back can be done very transparently.
>
> The only problem it might cause is with bindings to other libraries (but in this case you can just use fromUTFxx and toUTFxx; you do this conversion anyway). The same goes for transferring data over the network: again, you can just stick to a particular encoding (for network and files, UTF-8 is better since it's byte-order free).
>
>> supporting both character types depending on the environment
>> it runs on, it's easy to create a "tchar" and "tstring"
>> alias that depends on whether it's Windows or Linux, or on a
>> custom version flag from a compiler switch, but that'll be
>> his choice and his responsibility to make everything work.
>
> If it's the programmer's choice, then almost all advantages of tchar are lost. It's like a garbage collector: if it is used by everybody, you can expect the advantages of using it. However, if it's optional, everybody will write libraries assuming no GC is available, and thus almost all performance advantages are lost.
>
> And after all, one of the goals of D (if I am not wrong) is to be flexible, so that performance gains are available for particular configurations if they can be achieved (it's a fully compiled language). It does not stick to something particular and say 'you must use UTF-8' or 'you must use UTF-16'.
>
>> michel.fortin@michelf.com
>> http://michelf.com/
>>
>>
>
>
>



June 08, 2010

Re: Wide characters support in D

Posted by Walter Bright
in reply to Ruslan Nikolaev

Ruslan Nikolaev wrote:
> No. From the very beginning I said "it would also be nice to have some
> builtin function for conversion to dchar". That means it would be nice to
> have a function that converts from tchar (regardless of its width) to UTF-32.
> The reason was always clear - you normally don't need UTF-32 chars/strings
> but for some character analysis you might need them.

http://www.digitalmars.com/d/2.0/phobos/std_utf.html

Function overloading takes care of selecting the right version.
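
To illustrate the point about overloading, here is a minimal sketch, assuming a current Phobos in which std.utf.decode, std.utf.stride and std.utf.toUTF32 accept char, wchar and dchar strings alike. The helper name codePointAt is hypothetical and exists only for this example.

```d
import std.utf : decode, stride, toUTF32;

// Hypothetical helper: returns the code point that starts at index i.
// The same body compiles for string, wstring and dstring because
// decode picks the matching version for the string type it is given.
dchar codePointAt(S)(S s, size_t i)
{
    return decode(s, i); // advances the local copy of i past the sequence
}

unittest
{
    string  s8  = "d\u00E9";  // UTF-8:  'd' plus a two-unit sequence
    wstring s16 = "d\u00E9"w; // UTF-16: two code units
    dstring s32 = "d\u00E9"d; // UTF-32: two code units

    // One call site, three encodings, the same dchar result.
    assert(codePointAt(s8, 1)  == '\u00E9');
    assert(codePointAt(s16, 1) == '\u00E9');
    assert(codePointAt(s32, 1) == '\u00E9');

    // stride reports how many code units the sequence at an index occupies.
    assert(stride(s8, 1)  == 2);
    assert(stride(s16, 1) == 1);

    // Whole-string conversion to UTF-32 for character analysis.
    assert(toUTF32(s8)  == s32);
    assert(toUTF32(s16) == s32);
}
```

The checks can be run with `dmd -unittest -main -run example.d` (the file name is arbitrary).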

June 08, 2010

Re: Wide characters support in D

Posted by Ruslan Nikolaev
in reply to Yao G.

> Every time you reply to somebody, a
> new message is created. It's kind of difficult to follow this
> discussion when you need to look through more than 15 separate
> messages about the same issue. Please check your news client
> or something.
> 
> Yao G.
> 
Sorry for that, I did not know there was a problem. It looks like there is some problem with the web-based mail I am using, even though I do click "Reply". I need to check & fix it.

Just a last note regarding the topic:
Anyway, I have already explained all my points. Others have good points, too. There can be good and bad reasons for tchar. It primarily depends on whether:
1. You view D as a language with a single framework that behaves exactly the same way on all platforms.
2. You allow some deviation from the common view for the sake of better interoperability with system libraries.

In addition, tchar can be added to the 3 already existing types. I doubt that it will hurt. If library developers prefer to work with the native encoding, they can use it. Otherwise, they can provide templates that can be used for any of those 4 types. Finally, if someone wants to use something particular, s/he can use it.
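
As a rough sketch of what "templates that can be used for any of those 4 types" might look like, the function below accepts a string of any character width. The name countCodePoints is hypothetical; the sketch assumes only std.traits.isSomeChar and the fact that foreach over a string with a dchar loop variable decodes it on the fly.

```d
import std.traits : isSomeChar;

// Hypothetical library routine: counts code points in a string whose
// elements are char, wchar or dchar (or an alias of one of them).
size_t countCodePoints(C)(const(C)[] text)
    if (isSomeChar!C)
{
    size_t n = 0;
    foreach (dchar c; text) // decodes to dchar whatever C is
        ++n;
    return n;
}

unittest
{
    // Five code points in every encoding, although the UTF-8 form
    // needs six code units.
    assert(countCodePoints("h\u00E9llo")  == 5);
    assert(countCodePoints("h\u00E9llo"w) == 5);
    assert(countCodePoints("h\u00E9llo"d) == 5);
    assert("h\u00E9llo".length == 6);
}
```

A platform-dependent alias would need no extra overload here, since it would resolve to one of the character types the template already accepts.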

It would be nice to hear something from Walter. If he says "no, in no way we need this", I am fine with this. The final decision, as you know, is made by the developer of the language.

Thanks.

June 08, 2010

Re: Wide characters support in D

Posted by Ruslan Nikolaev
in reply to Walter Bright

Yes, I know function overloading takes care of it. But my whole point was totally different. 'tchar' has nothing to do with overloading, and the rationale is totally different: to provide a type that depends on the target platform's preferences.
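
For readers following the thread, the user-level variant of such a type (the alias approach Michel Fortin describes, not a builtin) could be sketched roughly as below. The names tchar, tstring and toNative are illustrative only and are not part of D or Phobos; the sketch assumes std.conv.to for transcoding between string types.

```d
import std.conv : to;

// Assumption: UTF-16 is taken as the native choice on Windows and
// UTF-8 everywhere else, as argued in this thread.
version (Windows)
    alias tchar = wchar;
else
    alias tchar = char;

alias tstring = immutable(tchar)[];

// Hypothetical helper: converts any D string to the platform type.
tstring toNative(S)(S s)
{
    return to!tstring(s);
}

unittest
{
    tstring t = toNative("d\u00E9"d);
    // Round-tripping through the native type preserves the text.
    assert(to!dstring(t) == "d\u00E9"d);
}
```

With a plain alias like this, the choice stays with each programmer or library, which is exactly the trade-off being debated here.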

Ruslan.


June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to Andrei Alexandrescu

"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:hul65q$o98$1@digitalmars.com...
> On 06/08/2010 03:12 AM, Nick Sabalausky wrote:
>> "Nick Sabalausky"<a@a.a>  wrote in message news:huktq1$8tr$1@digitalmars.com...
>>> "Ruslan Nikolaev"<nruslan_devel@yahoo.com>  wrote in message news:mailman.128.1275979841.24349.digitalmars-d@puremagic.com...
>>>> In addition, C# has been released already when UTF-16 became variable length.
>>>
>>> Right, like I said, C#/.NET use UTF-16 because that's what MS had
>>> already standardized on.
>>>
>>
>> s/UTF-16/16-bit/  It's getting late and I'm starting to mix terminology...
>
> s/16-bit/UCS-2/
>
> The story is that Windows standardized on UCS-2, which is the uniform 16-bit-per-character encoding that predates UTF-16. When UCS-2 turned out to be insufficient, it was extended to the variable-length UTF-16. As has been discussed, that has been quite unpleasant because a lot of code out there handles strings as if they were UCS-2.
>

Ok, that's what I had thought, but then I started second-guessing, so I figured "s/UTF-16/16-bit/" was a safer claim than "s/UTF-16/UCS-2/".
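
The UCS-2 versus UTF-16 distinction quoted above can be shown in a few lines. This is a minimal sketch; the function names are hypothetical, and it assumes std.utf.count, which returns the number of code points in a string.

```d
import std.utf : count;

// Hypothetical UCS-2 style length: assumes one 16-bit unit per character.
size_t naiveLength(const(wchar)[] s)
{
    return s.length;
}

// UTF-16 aware length: counts code points, so a surrogate pair is one.
size_t codePointLength(const(wchar)[] s)
{
    return count(s);
}

unittest
{
    // U+1D11E (musical symbol G clef) lies outside the Basic
    // Multilingual Plane, so UTF-16 needs a surrogate pair for it.
    wstring clef = "\U0001D11E"w;

    assert(naiveLength(clef) == 2);     // code units
    assert(codePointLength(clef) == 1); // characters

    // Taken alone, the first unit is only a high surrogate.
    assert(clef[0] >= 0xD800 && clef[0] <= 0xDBFF);
}
```

Code written against the first function keeps working for text inside the Basic Multilingual Plane, which is why such bugs tend to go unnoticed.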

June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to dennis luehring

"dennis luehring" <dl.soluz@gmx.net> wrote in message news:hulqni$1ssj$1@digitalmars.com...
> please stop top-posting - just click on the post you want to reply to and then click reply - you're flooding the newsgroup root with replies ...
>
> Am 08.06.2010 17:11, schrieb Ruslan Nikolaev:
>>>
>>>  Generally Linux systems use UTF-8 so I guess the "system
>>>  encoding" there will be UTF-8. But then if you start to use

Speaking of top-posting... ;)

June 08, 2010

Re: Wide characters support in D

Posted by Ruslan Nikolaev
in reply to Nick Sabalausky

Yeah... Exactly. I just verified our posts via the web interface. Why did he blame me for top-posting (at least, that is what can be inferred from whom the message was addressed to)? I am simply replying to already existing messages.

Ruslan.


June 08, 2010

Re: Wide characters support in D

Posted by dennis luehring
in reply to Ruslan Nikolaev

Am 08.06.2010 19:55, schrieb Ruslan Nikolaev:
> Yeah... Exactly. I just verified our posts via the web interface. Why did he blame me for top-posting (at least, that is what can be inferred from whom the message was addressed to)? I am simply replying to already existing messages.

sorry but - there are several others using the web-interface and you're the only power-top-poster around - maybe you should switch over to thunderbird or something


June 08, 2010

Re: Wide characters support in D

Posted by Nick Sabalausky
in reply to Ruslan Nikolaev

"Ruslan Nikolaev" <nruslan_devel@yahoo.com> wrote in message news:mailman.134.1276019725.24349.digitalmars-d@puremagic.com...
>Yeah... Exactly. I just verified our posts via the web interface. Why did he blame me for top-posting (at least, that is what can be inferred from whom the message was addressed to)? I am simply replying to already existing messages.<

Sorry, I think I created some confusion:

What I think dennis was talking about (or am I mistaken?) was how all of your replies are being shown in tree-view as replying directly to the original post instead of being shown as a reply to the message that it *really* replies to. That makes the discussion hard to follow.

Then I came in and made a smart-ass comment about how he wrote his message above the quoted text instead of below the quoted text (usually we follow the convention here of writing below the quoted text).

So, two totally different things.

June 08, 2010

Re: Wide characters support in D

Posted by Ruslan Nikolaev
in reply to dennis luehring

No. New messages are definitely not created by me. You can verify it here: http://blog.gmane.org/gmane.comp.lang.d.general

You can easily see that in none of the top posts (except for the first one) does my name appear first. In fact, you have just created another top post. I am only replying to others' comments.

Ruslan.

