June 07, 2006 dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings? Proposal: add wchar[] and dchar[] versions of the string functions in Phobos. (Should this be filed as a bug?) |
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johan Granberg | On Wed, 07 Jun 2006 22:40:00 +1000, Johan Granberg <lijat.meREM@OVEgmail.com> wrote:
> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>
> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>
> (Should this be filed as a bug?)
YES! I've had to recode many of them for dchar/wchar support.
--
Derek Parnell
Melbourne, Australia
|
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johan Granberg | In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>
>That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>
>Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>
>(Should this be filed as a bug?)
Ya know, I never really thought about this, but you're right: D has three character types yet only has full library support for one of them.
If you ask me, there's only so many ways to go about this:
1. Refactor std.string to use implicit templates
2. Branch std.string into three modules, one for each char type
3. Support all three char types via overloads within std.string
Personally, I like #1 since it would be seamless to implement, and would require almost exactly as much code as is in use now. The only drawback here centers on problems with distributing template code in libraries.
Also, do you personally need this kind of support in your project? Have you looked at Mango?
- EricAnderton at yahoo
|
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to pragma | pragma wrote:
> In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>>
>> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>>
>> (Should this be filed as a bug?)
>
> Ya know, I never really thought about this, but you're right: D has three
> character types yet only has full library support for one of them.
>
> If you ask me, there's only so many ways to go about this:
>
> 1. Refactor std.string to use implicit templates
> 2. Branch std.string into three modules, one for each char type
> 3. Support all three char types via overloads within std.string
>
> Personally, I like #1 since it would be seamless to implement, and would require
> almost exactly as much code as is in use now. The only drawback here centers
> on problems with distributing template code in libraries.
>
> Also, do you personally need this kind of support in your project? Have you
> looked at Mango?
>
> - EricAnderton at yahoo
Yes, I have needed support for dchar[] in std.string functions such as split, splitlines and strip.
Your ways of adding the support look OK, though I would choose 3 instead of 1. That may be because I'm not 100% sure how 1 would work. (Care to give an example?)
No, I have not looked closely at Mango yet. (Will do.)
|
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johan Granberg | In article <e66qqr$1er5$1@digitaldaemon.com>, Johan Granberg says...
>
>pragma wrote:
>> In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>>> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>>>
>>> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>>>
>>> (Should this be filed as a bug?)
>>
>> Ya know, I never really thought about this, but you're right: D has three character types yet only has full library support for one of them.
>>
>> If you ask me, there's only so many ways to go about this:
>>
>> 1. Refactor std.string to use implicit templates
>> 2. Branch std.string into three modules, one for each char type
>> 3. Support all three char types via overloads within std.string
>>
>> Personally, I like #1 since it would be seamless to implement, and would require almost exactly as much code as is in use now. The only drawback here centers on problems with distributing template code in libraries.
>>
>> Also, do you personally need this kind of support in your project? Have you looked at Mango?
>>
>> - EricAnderton at yahoo
>Yes, I have needed support for dchar[] in std.string functions such as split,
>splitlines and strip.
>Your ways of adding the support look OK, though I would choose 3
>instead of 1. That may be because I'm not 100% sure how 1 would
>work. (Care to give an example?)
Sure. D will now try to implicitly instantiate templates where it finds them. So you can do this:
/**/ template trim(TChar){
/**/     TChar[] trim(TChar[] src){ /* ... */ }
/**/ }
..and the call to trim will still be as simple as the non-templated version:
/**/ dchar[] foo,bar;
/**/ foo = trim(bar);
So we get to have our cake and eat it too. The onus is now placed on the compiler, as it will generate a distinct version of each template as needed.
The astute observer will notice that any array type can be used as a parameter in the above example. Proper use of static if() and the 'is' operator can easily ensure that only char, wchar and dchar are being used. Template overloads, while verbose, are another way to go.
- EricAnderton at yahoo
|
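As a hedged illustration of the implicit-instantiation idea in the post above, here is a minimal sketch written for present-day D (D2), not for the 2006 compiler under discussion. The name `trim` follows the post; `isAsciiSpace` is a hypothetical helper, and stripping only ASCII whitespace is an assumption made to keep the sketch short. The template constraint with `std.traits.isSomeChar` plays the role that the post assigns to `static if` and `is`.

```d
import std.traits : isSomeChar;

// Hypothetical helper (not part of Phobos): true for ASCII whitespace only.
bool isAsciiSpace(dchar c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Sketch of a trim that the compiler instantiates implicitly for char,
// wchar or dchar arrays. The constraint rejects other element types.
C[] trim(C)(C[] src)
if (isSomeChar!C)
{
    size_t lo = 0;
    size_t hi = src.length;
    while (lo < hi && isAsciiSpace(src[lo]))
        ++lo;
    while (hi > lo && isAsciiSpace(src[hi - 1]))
        --hi;
    return src[lo .. hi];
}

unittest
{
    dchar[] bar = "  hello \n"d.dup;
    dchar[] foo = trim(bar);            // element type deduced as dchar
    assert(foo == "hello"d);
    assert(trim(" hi "w) == "hi"w);     // wchar instantiation
    assert(trim("\tok") == "ok");       // char instantiation
    // An int array does not satisfy the constraint.
    static assert(!__traits(compiles, trim([1, 2, 3])));
}
```

Working on code units is safe here because ASCII code units never appear inside a multi-unit sequence in UTF-8 or UTF-16. A trim that also strips non-ASCII whitespace would need to decode code points first.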
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to pragma | pragma wrote:
> In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>>
>> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>>
>> (Should this be filed as a bug?)
>
> Ya know, I never really thought about this, but you're right: D has three
> character types yet only has full library support for one of them.
>
> If you ask me, there's only so many ways to go about this:
>
> 1. Refactor std.string to use implicit templates
> 2. Branch std.string into three modules, one for each char type
> 3. Support all three char types via overloads within std.string
>
> Personally, I like #1 since it would be seamless to implement, and would require
> almost exactly as much code as is in use now. The only drawback here centers
> on problems with distributing template code in libraries.
And the fact that template overloading and implicit templates just aren't ready for this kind of use. But I believe this is ultimately the correct solution.
Sean
|
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johan Granberg | Johan Granberg wrote:
>
> Yes, I have needed support for dchar[] in std.string functions such as split, splitlines and strip.
> Your ways of adding the support look OK, though I would choose 3 instead of 1. That may be because I'm not 100% sure how 1 would work. (Care to give an example?)
> No, I have not looked closely at Mango yet. (Will do.)
Oskar has an array template library that can do much of this, and I have the beginnings of one in Ares as well. The source is here:
http://svn.dsource.org/projects/ares/trunk/src/ares/std/array.d
As you can see, however, half the functions are commented out because template function overloading basically just doesn't work yet. Eventually, however, I plan to add split, join, etc. These will probably all assume fixed-width elements, with improved support for char and wchar strings in a std.string module, as supporting variable-width encoding will slow down the algorithms.
Sean
|
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to pragma | pragma wrote:
> So we get to have our cake and eat it too. The onus is now placed on the
> compiler, as it will generate a distinct version of each template as needed.
>
> The astute observer will notice that any array type can be used as a parameter
> in the above example. Proper use of static if() and the 'is' operator can
> easily ensure that only char, wchar and dchar are being used. Template
> overloads, while verbose, are another way to go.
>
> - EricAnderton at yahoo
OK, that was neat. (I have to look a bit more into templates.)
Are there any special cases that need to be handled?
Could UTF-32 have more possible symbols for line endings or whitespace than UTF-8, or anything like that? That could be handled with static if on a case-by-case basis, though.
|
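On the question of line endings raised above: UTF-8, UTF-16 and UTF-32 all encode the same set of code points, so the set of line-break and whitespace characters is the same in each, and only the number of code units per character differs. The following sketch, in present-day D (D2), illustrates this. `isLineBreak` and `countLineBreaks` are hypothetical names, and the list of code points treated as line breaks is an assumption, not a complete Unicode line-breaking rule.

```d
import std.uni : isWhite;

// Hypothetical helper: true for LF, CR, NEL (U+0085),
// LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).
bool isLineBreak(dchar c)
{
    return c == '\n' || c == '\r' || c == '\u0085'
        || c == '\u2028' || c == '\u2029';
}

// Counts line-break code points in any built-in string type. A foreach
// with a dchar loop variable decodes UTF-8 and UTF-16 on the fly.
size_t countLineBreaks(S)(S text)
{
    size_t n = 0;
    foreach (dchar c; text)
    {
        if (isLineBreak(c))
            ++n;
    }
    return n;
}

unittest
{
    string  s8  = "a\u2028b\nc";
    wstring s16 = "a\u2028b\nc"w;
    dstring s32 = "a\u2028b\nc"d;

    // Same code points, so the same count in every encoding.
    assert(countLineBreaks(s8) == 2);
    assert(countLineBreaks(s16) == 2);
    assert(countLineBreaks(s32) == 2);

    // Only the number of code units differs: U+2028 takes three in UTF-8.
    assert(s8.length == 7);
    assert(s16.length == 5);
    assert(s32.length == 5);

    // Phobos also classifies U+2028 as whitespace.
    assert(isWhite('\u2028'));
}
```

The special case for a char[] or wchar[] version is therefore decoding, not a different character set: a scan that compares raw UTF-8 code units against U+2028 would never match it.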
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to pragma | pragma wrote:
> In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>>
>> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>>
>> (Should this be filed as a bug?)
>
> Ya know, I never really thought about this, but you're right: D has three
> character types yet only has full library support for one of them.
>
> If you ask me, there's only so many ways to go about this:
>
> 1. Refactor std.string to use implicit templates
In http://www.digitalmars.com/d/archives/digitalmars/D/35455.html and other posts, I suggested a rough specification and a proof-of-concept implementation of implicit array templates that replace many of the functions in std.string with generic versions. If there is definite interest in taking this path, I will gladly write a full generic replacement for std.string.
Most of the functions in my earlier suggestion were aimed at a std.array module, and it is hard to draw a definite line between std.string and std.array. My current divider is roughly that anything that only makes sense for text strings goes in std.string, and the rest goes in std.array. One suggestion was to make the std.string functions aliases of the generic functions in std.array (for instance std.string.find -> std.array.find).
> 2. Branch std.string into three modules, one for each char type
> 3. Support all three char types via overloads within std.string
> Personally, I like #1 since it would be seamless to implement, and would require
> almost exactly as much code as is in use now. The only drawback here centers
> on problems with distributing template code in libraries.
The template/library issues really need to be resolved.
/Oskar
|
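The alias suggestion in the post above can be sketched in a single file, assuming present-day D (D2). `findSlice` and `findInString` are hypothetical names standing in for a generic std.array-style function and its std.string-side alias. This is not the actual Phobos layout or Oskar's implementation.

```d
// Stand-in for a generic array search (hypothetical name and signature):
// returns the index of the first occurrence of needle, or -1 if absent.
ptrdiff_t findSlice(T)(const(T)[] haystack, const(T)[] needle)
{
    if (needle.length > haystack.length)
        return -1;
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        if (haystack[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

// Stand-in for the string-module side: another name for the same template.
alias findInString = findSlice;

unittest
{
    // Generic use on a non-character array.
    assert(findSlice([1, 2, 3, 4], [3, 4]) == 2);

    // The alias works for all three string types with no extra code.
    assert(findInString("hello", "ll") == 2);
    assert(findInString("hello"w, "ll"w) == 2);
    assert(findInString("hello"d, "ll"d) == 2);
    assert(findInString("hello", "xyz") == -1);
}
```

The returned index counts code units, so for char[] and wchar[] it is an offset into the array, not a character position.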
June 07, 2006 Re: dchar unicode phobos | ||||
---|---|---|---|---|
| ||||
Posted in reply to pragma | pragma wrote:
> In article <e66qqr$1er5$1@digitaldaemon.com>, Johan Granberg says...
>> pragma wrote:
>>> In article <e66hf1$otn$1@digitaldaemon.com>, Johan Granberg says...
>>>> That D supports UTF is great, and by using dchar[] all Unicode code points can be used. But Phobos does not support dchar[]s adequately (or wchar[]s, for that matter). Wouldn't the language's standard library be expected to support all of the language's string encodings?
>>>>
>>>> Proposal: add wchar[] and dchar[] versions of the string functions in Phobos.
>>>>
>>>> (Should this be filed as a bug?)
>>> Ya know, I never really thought about this, but you're right: D has three
>>> character types yet only has full library support for one of them.
>>>
>>> If you ask me, there's only so many ways to go about this:
>>>
>>> 1. Refactor std.string to use implicit templates
>>> 2. Branch std.string into three modules, one for each char type
>>> 3. Support all three char types via overloads within std.string
>>>
>>> Personally, I like #1 since it would be seamless to implement, and would require
>>> almost exactly as much code as is in use now. The only drawback here centers
>>> on problems with distributing template code in libraries.
>>>
>>> Also, do you personally need this kind of support in your project? Have you
>>> looked at Mango?
>>>
>>> - EricAnderton at yahoo
>> Yes, I have needed support for dchar[] in std.string functions such as split, splitlines and strip.
>> Your ways of adding the support look OK, though I would choose 3 instead of 1. That may be because I'm not 100% sure how 1 would work. (Care to give an example?)
>
> Sure. D will now try to implicitly instantiate templates where it finds them.
> So you can do this:
>
> /**/ template trim(TChar){
> /**/     TChar[] trim(TChar[] src){ /* ... */ }
> /**/ }
D is unfortunately not really that smart yet. You need exactly the same function argument types, in the same order, as the template arguments.
template trim(MyString) { MyString trim(MyString src) { /* */ } }
works.
>
> ..and the call to trim will still be as simple as the non-templated version:
>
> /**/ dchar[] foo,bar;
> /**/ foo = trim(bar);
and even bar.trim() will work.
> The astute observer will notice that any array type can be used as a parameter
> in the above example. Proper use of static if() and the 'is' operator can
> easily ensure that only char, wchar and dchar are being used. Template
> overloads, while verbose, are another way to go.
I don't really see any reason to limit string functions to char, wchar and dchar. Strings in other encodings (for instance Latin-1, i.e. ISO 8859-1) are readily stored as ubyte[] or with a typedef'ed type. It would be useful to be able to work with such strings too. I have myself several times cast Latin-1 strings to char[] just to be able to use one of the std.string functions on them, before casting the result back to ubyte[].
/Oskar
|
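The point about unconstrained element types can be sketched as follows, assuming present-day D (D2), where the element-type form `T[]` is deduced implicitly. `trimSpaces` is a hypothetical name, and it strips only the space character (0x20) to keep the sketch short. Because nothing restricts `T` to char, wchar or dchar, a Latin-1 string held in a ubyte[] can use the same function without casts, and the method-style call mentioned in the post works as well.

```d
// Hypothetical generic trim: strips leading and trailing spaces (0x20)
// from an array of any element type that can be compared with ' '.
T[] trimSpaces(T)(T[] src)
{
    size_t lo = 0;
    size_t hi = src.length;
    while (lo < hi && src[lo] == ' ')
        ++lo;
    while (hi > lo && src[hi - 1] == ' ')
        --hi;
    return src[lo .. hi];
}

unittest
{
    // " blå " in Latin-1: 0xE5 is the single byte for 'å'.
    ubyte[] latin1 = [0x20, 0x62, 0x6C, 0xE5, 0x20];
    ubyte[] expected = [0x62, 0x6C, 0xE5];
    assert(latin1.trimSpaces() == expected);   // method-style call

    // The same template serves the built-in character arrays.
    dchar[] wide = "  blå "d.dup;
    assert(wide.trimSpaces() == "blå"d);
    assert(trimSpaces(" abc ") == "abc");
}
```

The trade-off is the one raised earlier in the thread: without a constraint the template also accepts arrays such as int[], which may or may not be desirable for a string module.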
Copyright © 1999-2021 by the D Language Foundation