November 16, 2005
"Derek Parnell" <derek@psych.ward> wrote ...
<snip>
> The 'cast(<X>char[])' idiom never does conversions of one utf encoding to another; not for literals or for variables. The apparent exception is that an undecorated string literal with a 'cast' is syntactically equivalent to a decorated string literal.

* cough *

Well I'll be dipped in dogshit ...

This was not always the case, as past topics will regale in bountiful measure. That's what all the implicit-utf-conversion huffing and puffing used to be about. Hurrah! Hurrah!

struct Foo
{
    void print (wchar[] msg) {}
}

void main()
{
    Foo f;

    f.print ("blah");     // kosher: 4 wide chars
    f.print ("blah"w);  // kosher: 4 wide chars

    char[] m = "blah".dup;
    f.print (cast(wchar[]) m);  // Hoorah!! No UTF conversion!
}

Needless to say, the explicit cast() does have a negative effect, as should be expected. What's good to see here, IMO, is both literals obtaining an appropriate compile-time storage class. And, as Derek noted, there's no runtime transcoding. Hoorah again! Hurrah!
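
To make that "negative effect" concrete, here is a minimal sketch, assuming a current D compiler and Phobos rather than the 2005 toolchain discussed in this thread. The helper name reinterpret is made up for illustration. It contrasts the cast on a variable (bytes reinterpreted, no transcoding) with an explicit std.utf conversion and with a cast applied to a literal.

```d
import std.utf : toUTF16;

// Hypothetical helper: views the bytes of a char[] as wchar code units.
// Nothing is transcoded; only the element type and length change.
wchar[] reinterpret(char[] narrow)
{
    return cast(wchar[]) narrow;
}

void main()
{
    char[] m = "blah".dup;

    // 4 bytes viewed as 2-byte units: 2 elements, none of them an original character.
    assert(reinterpret(m).length == 2);

    // A real conversion goes through std.utf and keeps all 4 characters.
    assert(toUTF16(m) == "blah"w);

    // On a literal, the cast only selects the encoding, like the 'w' suffix.
    assert(cast(wstring) "blah" == "blah"w);
}
```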

(ahem ... cough ...)


November 16, 2005
"Sean Kelly" <sean@f4.ca> wrote ...
> Kris wrote:
> Still over-engineering the problem a bit I think.

Perhaps. It might not be ... ;-)

> If decoration really isn't sufficient then I think the most practical approach would be to simply choose one encoding as the default to be used when overload ambiguities arise.  Or perhaps a more general rule?  "When an ambiguity arises as a result of overload resolution for UTF-based types, the narrowest suitable encoding will be chosen as the default." Thus:
>
> write( char[] );
> write( wchar[] );
> write( dchar[] );
>
> would result in the char[] method being chosen, but:
>
> write( wchar[] );
> write( dchar[] );
>
> would result in the wchar[] method being chosen.

Walter responded negatively to the 'default' notion (re compiler-flag). The second option might be susceptible to fragility when subclassing?


November 16, 2005
"Kris" <fu@bar.com> wrote in message news:dldhkg$2ve9$1@digitaldaemon.com...
> However, this also means that all non-static char[] and wchar[] instances will be converted "on the fly", regardless of how large they might be, or how they might be stored at the back-end. To 'enforce' that upon a design should be considered a criminal waste of bandwidth & horsepower <g> Transcoding is just not a practical answer, I'm afraid.

Clarification: it turns out that "on the fly" transcoding does not happen. That's good, but doesn't help in this case: you'd still need to /explicitly/ convert char[] and wchar[] to get them accepted by the one-and-only write method. That's no better than the implicit transcoding noted previously.
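
A rough sketch of what that call-site burden looks like, assuming a current D compiler and Phobos. The name writeOne is hypothetical and simply stands in for a single UTF-32 write method; it returns the length instead of producing output.

```d
import std.utf : toUTF32;

// Hypothetical one-and-only write method: accepts UTF-32 only.
size_t writeOne(const(dchar)[] msg)
{
    return msg.length; // stand-in for real output
}

void main()
{
    char[]  narrow = "blah".dup;
    wchar[] wide   = "blah"w.dup;

    // writeOne(narrow);  // rejected: char[] does not convert implicitly to dchar[]
    assert(writeOne(toUTF32(narrow)) == 4); // explicit conversion at every call site
    assert(writeOne(toUTF32(wide)) == 4);

    // An undecorated literal adapts to the parameter type without help.
    assert(writeOne("blah") == 4);
}
```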


November 16, 2005
Kris wrote:
> "Sean Kelly" <sean@f4.ca> wrote . 
>> If decoration really isn't sufficient then I think the most practical approach would be to simply choose one encoding as the default to be used when overload ambiguities arise.  Or perhaps a more general rule?  "When an ambiguity arises as a result of overload resolution for UTF-based types, the narrowest suitable encoding will be chosen as the default." Thus:
>>
>> write( char[] );
>> write( wchar[] );
>> write( dchar[] );
>>
>> would result in the char[] method being chosen, but:
>>
>> write( wchar[] );
>> write( dchar[] );
>>
>> would result in the wchar[] method being chosen.
> 
> Walter responded negatively to the 'default' notion (re compiler-flag).

Only as a compiler flag, however :-)

> The second option might be susceptible to fragility when subclassing?

It's a legitimate concern, I suppose, but is it really a problem?  The only effect of this behavior would be on overload resolution.  And while it's probably not the best idea to add exceptions to the overload rules, I can't think of an example that might silently break as a result of this suggestion.
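
The proposed rule can be modelled in a few lines. This is only an illustration of "narrowest suitable encoding" written as a compile-time template for a current D compiler; the name Narrowest is invented here, and this is not how the compiler actually resolves overloads.

```d
import std.stdio : writeln;

// Hypothetical model of the proposed tie-break: among the candidate
// character types, pick the one with the smallest code unit.
template Narrowest(T...)
{
    static if (T.length == 1)
        alias Narrowest = T[0];
    else static if (T[0].sizeof <= Narrowest!(T[1 .. $]).sizeof)
        alias Narrowest = T[0];
    else
        alias Narrowest = Narrowest!(T[1 .. $]);
}

// The two cases from the proposal.
static assert(is(Narrowest!(char, wchar, dchar) == char));
static assert(is(Narrowest!(wchar, dchar) == wchar));

void main()
{
    // Show which candidate the rule would pick for each overload set.
    writeln(Narrowest!(char, wchar, dchar).stringof);
    writeln(Narrowest!(wchar, dchar).stringof);
}
```

Since the choice depends only on the set of overloads visible at the call site, adding or removing an overload in a subclass would change which one a literal binds to, which is the fragility being asked about.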


Sean
November 16, 2005
On Tue, 15 Nov 2005 15:32:36 -0800, Sean Kelly wrote:

> Kris wrote:
>> 
>> That doesn't mean the use of operators should be cursed with this behaviour though <g>. In other words, the usage of operators in D is currently limited with respect to char/wchar/dchar and their corresponding array types (when it comes to literals, sans suffix). One could argue that D provides the tools to implement a rather compelling and succinct IO model, using operators ~ a model far superior to the iostreams model <g>.
>> 
>> That aside; let's, for a moment, try to look at this from a different angle?
>> 
>> The problem is that method-overloading has a hard-time resolving between various signatures, when provided with literals. What if, just in speculation, the method signatures had an opportunity to differentiate between literal and non-literal parameters? I just noticed the specialized "class TypeInfo_StaticArray : TypeInfo" in Object.d, which prompted the thought ... Yes, I can hear you wringing your hands ... yet it could be a solution <g>
> 
> Still over-engineering the problem a bit I think.  If decoration really isn't sufficient then I think the most practical approach would be to simply choose one encoding as the default to be used when overload ambiguities arise.  Or perhaps a more general rule?  "When an ambiguity arises as a result of overload resolution for UTF-based types, the narrowest suitable encoding will be chosen as the default."  Thus:
> 
> write( char[] );
> write( wchar[] );
> write( dchar[] );
> 
> would result in the char[] method being chosen, but:
> 
> write( wchar[] );
> write( dchar[] );
> 
> would result in the wchar[] method being chosen.

Another, 'low tech', tactic would be to avoid using anonymous literals.

  ====================
import std.stdio;

void write( char[] x) { writefln(x); }
void write(wchar[] x) { writefln(x); }
void write(dchar[] x) { writefln(x); }

void main()
{

    dchar[] First_String =  "First";
    wchar[] Second_String = "Second";
    char[]  Third_String =  "Third";

    write( First_String );
    write( Second_String );
    write( Third_String );
}
  ====================

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
16/11/2005 10:56:34 AM
November 16, 2005
"Derek Parnell" <derek@psych.ward> wrote ...
<snip>
> Another, 'low tech', tactic would be to avoid using anonymous literals.
>
>  ====================
> import std.stdio;
>
> void write( char[] x) { writefln(x); }
> void write(wchar[] x) { writefln(x); }
> void write(dchar[] x) { writefln(x); }
>
> void main()
> {
>
>    dchar[] First_String =  "First";
>    wchar[] Second_String = "Second";
>    char[]  Third_String =  "Third";
>
>    write( First_String );
>    write( Second_String );
>    write( Third_String );
> }
>  ====================

Sure. But would you be happy doing that for every use of writef/printf/Stream with literal char and char[] variants?

Before you say "I don't have to", consider that those functions skirt the issue by mostly avoiding such method overloading, or by kludging the names to do so. That doesn't mean other designs are somehow less valid. Does it?


November 16, 2005
On Tue, 15 Nov 2005 16:00:08 -0800, Kris wrote:

> "Derek Parnell" <derek@psych.ward> wrote ...
> <snip>
>> The 'cast(<X>char[])' idiom never does conversions of one utf encoding to another; not for literals or for variables. The apparent exception is that an undecorated string literal with a 'cast' is syntactically equivalent to a decorated string literal.
> 
> * cough *
> 
> Well I'll be dipped in dogshit ...
> 
> This was not always the case, as past topics will regale in bountiful measure. That's what all the implicit-utf-conversion huffing and puffing used to be about. Hurrah! Hurrah!

I don't think that D ever worked that way. I think D has not changed in this respect. It now works the way it always has.

I believe that all the huffing and puffing was about function signature matching and not about conversion of strings. As far as I know the only way to do string encoding conversions was to explicitly use the functions in std.utf module. With one exception ...


   foreach( dchar c; "a sample string"w )
   {
      // c is a proper dchar!
   }
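
A small sketch of that exception, assuming a current D compiler; the function name countCodePoints is made up for illustration. Iterating with a dchar loop variable makes foreach decode the UTF-16 code units, so a surrogate pair arrives as one code point.

```d
// Hypothetical helper: counts code points by letting foreach decode
// UTF-16 code units into dchar values.
size_t countCodePoints(const(wchar)[] text)
{
    size_t n = 0;
    foreach (dchar c; text)
        ++n;
    return n;
}

void main()
{
    // U+1D11E lies outside the BMP, so UTF-16 stores it as a surrogate pair.
    wstring s = "a\U0001D11Eb"w;

    assert(s.length == 4);           // code units
    assert(countCodePoints(s) == 3); // decoded code points
}
```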


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
16/11/2005 11:28:39 AM
November 16, 2005
On Tue, 15 Nov 2005 16:33:01 -0800, Kris wrote:

> "Derek Parnell" <derek@psych.ward> wrote ...
> <snip>
>> Another, 'low tech', tactic would be to avoid using anonymous literals.
>>
>>  ====================
>> import std.stdio;
>>
>> void write( char[] x) { writefln(x); }
>> void write(wchar[] x) { writefln(x); }
>> void write(dchar[] x) { writefln(x); }
>>
>> void main()
>> {
>>
>>    dchar[] First_String =  "First";
>>    wchar[] Second_String = "Second";
>>    char[]  Third_String =  "Third";
>>
>>    write( First_String );
>>    write( Second_String );
>>    write( Third_String );
>> }
>>  ====================
> 
> Sure. But would you be happy doing that for every use of writef/printf/Stream with literal char and char[] variants?

Have a look at the Build source code. Many of the string literals have been named and are not embedded. In fact, you have just encouraged me to revisit this and improve that source code even more by naming some of the stray literals.

> Before you say "I don't have to", consider that those functions skirt the issue by mostly avoiding such method overloading, or by kludging the names to do so. That doesn't mean other designs are somehow less valid. Does it?

Just like yourself, I'm suggesting *alternatives* not *replacements*.

The judicious use of named (string) literals will also pay dividends at code maintenance time.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
16/11/2005 11:35:21 AM
November 16, 2005
Derek Parnell wrote:
> 
> The judicious use of named (string) literals will also pay dividends at code maintenance time.

Agreed.  I don't have a lot of experience with internationalization, but embedded literals are an obvious no-no for this sort of thing.


Sean
November 16, 2005
"Sean Kelly" <sean@f4.ca> wrote in message news:dldvdp$8mv$1@digitaldaemon.com...
> Derek Parnell wrote:
>>
>> The judicious use of named (string) literals will also pay dividends at code maintenance time.
>
> Agreed.  I don't have a lot of experience with internationalization, but embedded literals are an obvious no-no for this sort of thing.

There is no questioning that aspect.