Casting between char[]/wchar[]/dchar[] - D Programming Language Discussion Forum


August 05, 2006

Casting between char[]/wchar[]/dchar[]

Posted by Hasan Aljudy

What are the rules for implicit/explicit casting between char[] and wchar[] and dchar[] ?

When one casts (explicitly or implicitly) does the compiler automatically invoke std.utf.toUTF*()?

Here's an idea that should simplify much of string handling in D:
allow char[] and wchar[] and dchar[] to be castable implicitly to each other, provided that the compiler invokes the appropriate std.utf.toUTF* method.
I think this is perfectly safe; no data is lost, and string handling can become much more flexible.

Instead of writing three versions of the same function, one each for char[], wchar[] and dchar[], one can just write a wchar[] version (for example) and the compiler will handle the conversion from/to char[] and dchar[].

This also relieves developers from writing templated functions/classes when they deal with strings.

Thoughts?

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by kris
in reply to Hasan Aljudy

This one was beaten soundly around the head & shoulders in the past :)

In a systems language like D, one could argue that hidden conversions and/or translations (a) can mask mistakes that would otherwise surface as compile-time errors, and (b) can be terribly detrimental to performance where multiple conversions are implicitly applied. Such an environment could potentially put CoW to shame in terms of heap abuse -- recall some of the recent CoW examples, and sprinkle in a few unintended conversions for good measure :)

IIRC, the last time this came up there was a pretty strong feeling that such things should be explicit (partly because it can be an expensive operation ~ likely sucking on the heap also). Although foreach() will convert on the fly, that's perhaps not something one should do with extensive chunks of text?

One approach would be to make the Unicode converters more attractive for daily use. There are libraries other than Phobos which attempt to do just that.

On the other hand, if you're writing some kind of platform where convenience is more important than, say, performance, being able to /add/ the implicit conversion might be of real value. One might, for example, implement such a platform using a String class to abstract the encoding differences. Functions could accept said String rather than one of the three stooges^H^H^H^H^H^H^H Unicode types.

If I recall correctly, I think Regan was quite keen on implicit Unicode conversions (during function calls also), so a google on the subject along with his name might get you to the prior threads?

Either way, having the compiler tell you at compile time when you're mixing metaphors is a-good-thing (tm). Being able to 'extend' the language (via classes or whatever) to implement higher level abstractions such as String is also a-good-thing. Having both provides for differing uses of D without stepping on toes, or hitting said appendages with a hammer.

- Kris
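To make the "compiler tells you at compile time" point concrete, here is a minimal sketch. It assumes a current D2 compiler (where string is immutable(char)[]) rather than the 2006 compiler this thread was written against. As far as the language specification describes it, an explicit cast between these array types reinterprets the bytes rather than transcoding, so std.utf remains the way to actually convert.

```d
import std.utf : toUTF16;

// The string array types do not convert to one another implicitly,
// so accidental mixing is caught by the compiler.
static assert(!is(string : wstring));
static assert(!is(char[] : dchar[]));

void main()
{
    string s = "hello";
    // wstring w = s;        // would be rejected at compile time
    wstring w = toUTF16(s);  // explicit conversion; allocates a new array
    assert(w == "hello"w);
}
```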

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Walter Bright
in reply to kris

kris wrote:
> IIRC, the last time this came up there was a pretty strong feeling that such things should be explicit (partly because it can be an expensive operation ~ likely sucking on the heap also).

Yes. It's hard to judge where the line is, but too many implicit conversions lead to programs that are very hard to understand and debug.

> Although foreach() will convert on the fly, that's perhaps not something one should do with extensive chunks of text?

foreach also doesn't consume memory for the conversion.
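As an illustration of the on-the-fly decoding mentioned above, a small sketch in current D2: iterating a UTF-8 array with a dchar loop variable decodes one code point at a time without building a converted copy. The function name countCodePoints is made up for this example.

```d
// Counts code points in UTF-8 text by letting foreach decode on the fly.
// No dchar[] copy of the text is created.
size_t countCodePoints(const(char)[] text)
{
    size_t n = 0;
    foreach (dchar c; text)  // each iteration yields one decoded code point
        ++n;
    return n;
}

void main()
{
    string s = "héllo";              // 'é' takes two bytes in UTF-8
    assert(s.length == 6);           // length counts UTF-8 code units
    assert(countCodePoints(s) == 5); // the loop sees five code points
}
```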

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Hasan Aljudy
in reply to Walter Bright

Walter Bright wrote:
> Yes. It's hard to judge where the line is, but too many implicit conversions lead to programs that are very hard to understand and debug.

Can I ask you at least to simplify the conversion by adding utf* properties to char/wchar/dchar arrays?

So, if I have:
----
char[] process( char[] str ) { ... }

...

dchar[] my32str = .....;

//I can write
my32str = process( my32str.utf8 ).utf32;

//instead of
//my32str = toUTF32( process( toUTF8( my32str ) ) );
----

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by kris
in reply to Hasan Aljudy

er, you can do that yourself, Hasan?

char[] utf8 (dchar[] s)
{
  ...
}

dchar[] utf32 (char[] s)
{
  ...
}

etc, followed by:

> char[] process( char[] str ) { ... }
>
> ...
>
> dchar[] my32str = .....;
>
> //I can write
> my32str = process( my32str.utf8 ).utf32;
>
> //instead of
> //my32str = toUTF32( process( toUTF8( my32str ) ) );


However, this is sucking on the heap, since you're not providing anywhere for the conversion to occur. Hence it is expensive (heap allocation is several times slower than a 'typical' UTF conversion, and there's potential lock-contention to deal with also). This is partly why there was some pushback against such properties in the past; especially when you can add them yourself using the funky array-prop syntax (demonstrated above).
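One way to "provide somewhere for the conversion to occur" is to let the caller pass in the destination buffer. The following is only a sketch in current D2; utf8Into is a hypothetical helper name, not a Phobos function. It relies on std.utf.encode, which writes a single code point into a four-byte buffer and returns the number of bytes used.

```d
import std.utf : encode;

// Hypothetical helper: transcodes UTF-32 into a caller-supplied buffer
// and returns the slice of dst that was filled. No heap allocation occurs.
char[] utf8Into(const(dchar)[] src, char[] dst)
{
    size_t used = 0;
    foreach (dchar c; src)
    {
        char[4] tmp;
        immutable n = encode(tmp, c);  // bytes needed for this code point
        if (used + n > dst.length)
            throw new Exception("destination buffer too small");
        dst[used .. used + n] = tmp[0 .. n];
        used += n;
    }
    return dst[0 .. used];
}

void main()
{
    char[16] buf;  // stack storage owned by the caller
    char[] result = utf8Into("héllo"d, buf[]);
    assert(result == "héllo");
    assert(result.ptr is buf.ptr);  // the result lives in the caller's buffer
}
```

The caller decides where the memory comes from, which is the property the convenience-property form gives up.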

There's nothing wrong with convenience props and so on, but if the ones built into the compiler are expensive to use, D will inevitably get a reputation for being slow and/or heap-bound; just like Java did ~ deserved or otherwise. D currently offers a number of alternatives anyway.

Again, why not use a String aggregate instead? To hide/abstract the distinction between Unicode types? I suspect that would be both more efficient and more convenient? Having written just such a class, I can attest to these attributes.
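For readers wondering what such an aggregate might look like, here is a deliberately tiny sketch in current D2. It is not the class kris refers to; the name Text, the choice of UTF-8 as the internal storage, and the accessor names are all assumptions made for illustration.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

// Hypothetical aggregate that accepts any of the three encodings
// and hands back whichever one the caller asks for.
struct Text
{
    private string data;  // assumption: stored internally as UTF-8

    this(const(char)[] s)  { data = toUTF8(s); }
    this(const(wchar)[] s) { data = toUTF8(s); }
    this(const(dchar)[] s) { data = toUTF8(s); }

    string  utf8()  { return data; }           // no conversion needed
    wstring utf16() { return toUTF16(data); }  // converts on request
    dstring utf32() { return toUTF32(data); }  // converts on request
}

void main()
{
    auto t = Text("héllo"w);  // built from UTF-16 input
    assert(t.utf8 == "héllo");
    assert(t.utf32 == "héllo"d);
    assert(Text("abc").utf16 == "abc"w);
}
```

Functions that accept Text need only one overload, and conversions happen only when a particular encoding is requested. A real implementation would probably cache converted forms and offer far more than three accessors.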

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Jarrett Billingsley
in reply to Hasan Aljudy

"Hasan Aljudy" <hasan.aljudy@gmail.com> wrote in message news:eb2u9n$psv$1@digitaldaemon.com...

> Can I ask you at least to simplify the conversion by adding utf* properties to char/wchar/dchar arrays?
>
> so, if I have:
> ----
> char[] process( char[] str ) { ... }
>
> ...
>
> dchar[] my32str = .....;
>
> //I can write
> my32str = process( my32str.utf8 ).utf32;
>
> //instead of
> //my32str = toUTF32( process( toUTF8( my32str ) ) );

import utf = std.utf;

wchar[] utf16(char[] s)
{
    return utf.toUTF16(s);
}

...

char[] s = "hello";
wchar[] t = s.utf16;

 ;)

Aren't first-array-param-as-a-property functions cool?

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Jarrett Billingsley
in reply to kris

"kris" <foo@bar.com> wrote in message news:eb322c$sml$1@digitaldaemon.com...

> er, you can do that yourself, Hasan?
>
> char[] utf8 (dchar[] s)
> {
>   ...
> }
>
> dchar[] utf32 (char[] s)
> {
>   ...
> }

lol :)

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Serg Kovrov
in reply to Jarrett Billingsley

Jarrett Billingsley wrote:
> Aren't first-array-param-as-a-property functions cool? 

Cool indeed =)
Is it documented?

--
serg.

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Frits van Bommel
in reply to Jarrett Billingsley

Jarrett Billingsley wrote:
> import utf = std.utf;
> 
> wchar[] utf16(char[] s)
> {
>     return utf.toUTF16(s);
> }
> 
> ...
> 
> char[] s = "hello";
> wchar[] t = s.utf16;
> 
>  ;)
> 
> Aren't first-array-param-as-a-property functions cool? 

In fact, "raw" toUTF* functions work without the wrapper functions (though they're obviously named differently):

import std.utf;

void main()
{
    char[] s = "hello";
    wchar[] t = s.toUTF16();

    // Or, if you prefer:
    alias toUTF16 utf16;
    wchar[] u = s.utf16();
}
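For comparison, here is roughly how the same idea looks in current D2, where the array-only trick was generalised into uniform function call syntax (UFCS). This is a sketch: the short names utf8/utf16/utf32 are hypothetical wrappers declared at module scope (UFCS does not consider local symbols), and process is a stand-in that merely appends "!".

```d
import std.utf : toUTF8, toUTF16, toUTF32;

// Hypothetical short names wrapping the std.utf conversions.
string  utf8(const(dchar)[] s) { return toUTF8(s); }
wstring utf16(const(char)[] s) { return toUTF16(s); }
dstring utf32(const(char)[] s) { return toUTF32(s); }

// Stand-in for the thread's process(); it only appends a marker.
string process(string str)
{
    return str ~ "!";
}

void main()
{
    dstring my32str = "héllo"d;

    // The property-style chain asked for earlier in the thread:
    my32str = process(my32str.utf8).utf32;
    assert(my32str == "héllo!"d);

    // The raw std.utf functions can be called the same way:
    wstring t = "hello".toUTF16;
    assert(t == "hello"w);
}
```

Each call in the chain still allocates a fresh array, so the performance concerns raised elsewhere in the thread apply unchanged.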

August 05, 2006

Re: Casting between char[]/wchar[]/dchar[]

Posted by Hasan Aljudy
in reply to kris

kris wrote:
> er, you can do that yourself, Hasan?
> 
> char[] utf8 (dchar[] s)
> {
>   ...
> }
> 
> dchar[] utf32 (char[] s)
> {
>   ...
> }

I know, but:
1: The syntax is still not documented.
2: I'm talking about making these properties a part of the standard.

Actually, I think:

alias toUTF8 utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;

would do the trick.


> 
> However, this is sucking on the heap, since you're not providing anywhere for the conversion to occur. Hence it is expensive (heap allocation is several times slower than a 'typical' UTF conversion, and there's potential lock-contention to deal with also). This is partly why there was some pushback against such properties in the past; especially when you can add them yourself using the funky array-prop syntax (demonstrated above).
> 
> There's nothing wrong with convenience props and so on, but if the ones built into the compiler are expensive to use, D will inevitably get a reputation for being slow and/or heap-bound; just like Java did ~ deserved or otherwise. D currently offers a number of alternatives anyway.

Doesn't CoW suck on the heap? Object allocation? Array concatenation? Increasing the length property?

I suppose one could write custom allocators for these "temporary" conversions. For example, pre-allocate a chunk of heap for temporary utf conversions (10 K would suffice, I think) and use it like a stack to make the allocation faster?

Honestly, I don't know how that would work, but I bet someone else does, and I bet that person can write such an allocator.
Then, integrating that allocator into std.utf would make it faster to use the standard utf conversion properties. No?
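A rough sketch of the stack-like temporary buffer described above, written for current D2. Everything here is an assumption made for illustration: the Scratch type, its mark/release interface, and the 10 KB capacity taken from the figure suggested above. Nothing like it is part of std.utf.

```d
import std.utf : encode;

// Hypothetical scratch area used like a stack for temporary conversions.
struct Scratch
{
    private char[10 * 1024] store;  // pre-allocated chunk
    private size_t top;             // first free byte

    // Remember the current position so it can be restored later.
    size_t mark() const { return top; }

    // Give back everything handed out since the matching mark().
    void release(size_t m)
    {
        assert(m <= top);
        top = m;
    }

    // Transcode UTF-32 into the scratch area; the result is only
    // valid until the space is released.
    char[] toUTF8(const(dchar)[] src)
    {
        immutable start = top;
        foreach (dchar c; src)
        {
            char[4] tmp;
            immutable n = encode(tmp, c);
            if (top + n > store.length)
            {
                top = start;  // undo the partial write
                throw new Exception("scratch buffer exhausted");
            }
            store[top .. top + n] = tmp[0 .. n];
            top += n;
        }
        return store[start .. top];
    }
}

void main()
{
    Scratch scratch;
    immutable m = scratch.mark();
    char[] tmp = scratch.toUTF8("héllo"d);  // no heap allocation
    assert(tmp == "héllo");
    scratch.release(m);                     // space becomes reusable
}
```

The catch is lifetime: the returned slice dangles once the space is released, which is exactly the kind of hazard a garbage-collected result avoids.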

> 
> Again, why not use a String aggregate instead? To hide/abstract the distinction between Unicode types? I suspect that would be both more efficient and more convenient? Having written just such a class, I can attest to these attributes.

Because the standard library functions always expect a char[].
What you did with mango was write a whole library, not just a String class.

BTW, are there tutorials for using mango Strings?
