November 11, 2005
Kris wrote:
> This is the long standing mishmash between character literal arguments and parameters of type char[], wchar[], and/or dchar[]. Character literals don't really have a "solid" type ~ the compiler can, and will, convert between wide and narrow representations on the fly.
> 
> Suppose you have the following methods:
> 
> void write (char[] x);
> void write (wchar[] x);
> void write (dchar[] x);
> 
> Given a literal argument:
> 
> write ("what am I?");
> 
> D doesn't know whether to invoke the char[] or wchar[] signature, since the literal is treated as though it's possibly any of the three types. This is the kind of non-determinism you get when the compiler becomes too 'smart' (unwarranted automatic conversion, in this case).

I agree, except that I think the problem in this case is that it's not converting "from" anything! There's no "exact match" which it tries first.

A parallel case is that a floating point literal can be implicitly converted to float, double, real, cfloat, cdouble, creal. For fp literals, the default is double.

It's a bit odd that with
dchar[] q = "abc";  wchar[] w = "abc";
"abc" is a dchar literal the first time, but a wchar literal the second, whereas with
real q = 2.5;  double w = 2.5;
2.5 is a double literal in both cases.

No wonder array literals are such a problem...
November 11, 2005
In article <4374598B.30604@nospam.org>, Georg Wrede says...
>
> The compiler knows (or at least _should_ know) the character width of the source code file. Now, if there's an undecorated string literal in it, then _simply_assume_ that is the _intended_ type of the string!

The *programmer* assumes so *anyway*.

Why on earth should the compiler assume anything else!

BTW, D is really cool!



November 11, 2005
"Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:dl18s7$2r6i$1@digitaldaemon.com...
> "Kris" <fu@bar.com> wrote in message news:dl0ngf$2f7s$1@digitaldaemon.com...
>> Still, it troubles me that one has to decorate string literals in this manner, when the type could be inferred by the content, or by a cast() operator where explicit conversion is required.
>
> If I'm not mistaken, these suffixes were created to get _away_ from the casts ;)

It really does nothing but 'recast' the problem.

Don't you think the type can be inferred from the content? For the sake of discussion, how about this:

1) if the literal has a double-wide char contained, it is defaulted to a dchar[] type.

2) if not 1, and if the literal has a wide-char contained, it defaults to being of wchar[] type.

3) if neither of the above, it defaults to char[] type.

Given the above, I can't think of a /common/ situation where casting would thus be required.
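One way to read those three rules is to classify a literal by its widest code point. Here is a quick sketch (in Python, since the classification logic is language-agnostic; the exact numeric thresholds are my assumption, not part of the proposal):

```python
def infer_literal_type(s: str) -> str:
    """Classify a string literal by its widest code point.

    One plausible reading of the three rules above; the thresholds
    are an assumption for illustration, not anything the compiler
    actually does.
    """
    widest = max((ord(c) for c in s), default=0)
    if widest > 0xFFFF:   # rule 1: needs 32 bits (a surrogate pair in UTF-16)
        return "dchar[]"
    if widest > 0xFF:     # rule 2: wider than a single 8-bit code unit
        return "wchar[]"
    return "char[]"       # rule 3: narrow content only

print(infer_literal_type("what am I?"))        # char[]
print(infer_literal_type("wide \u1234"))       # wchar[]
print(infer_literal_type("wider \U00101234"))  # dchar[]
```

Under this reading, an undecorated ASCII literal always resolves to char[], which is exactly the case Derek's counterexample below pokes at.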



November 11, 2005
"Georg Wrede" <georg.wrede@nospam.org> wrote ...
> Kris wrote:
>> This is the long standing mishmash between character literal arguments and parameters of type char[], wchar[], and/or dchar[]. Character literals don't really have a "solid" type ~ the compiler can, and will, convert between wide and narrow representations on the fly.
>
> Compared to the bit thing I recently "bitched" about, this, IMHO, is an issue one can accept better. :-)

That doesn't make it any less problematic :-)


> It is a problem for small example programs. Larger programs tend to (and IMHO should) have wrappers anyhow:

Not so. You'd see people complaining about this constantly if Stream.write() was not decorated to distinguish between the three relevant methods. Generally speaking, any code that deals with all three array types will bump into this. Mango.io has the same problem, since it exposes write() methods for every D type plus their array counterparts.


> Why not change Phobos
>
> void write ( char[] s) {.....};
> void write (wchar[] s) {.....};
> void write (dchar[] s) {.....};
>
> into
>
> void _write ( char[] s) {.....};
> void _write (wchar[] s) {.....};
> void _write (dchar[] s) {.....};
> void write (char[] s) { _write(s); }
>
> I think this would solve the issue with string literals as discussed in this thread.

Then, how would one write a dchar[] literal? You just moved the problem to the _write() method. I think there needs to be a general resolution instead.

One might infer the literal type from the content therein?


> Also, overloading would not be hampered.
>
> And, those who really _need_ types other than the 8 bit chars, could still have their types work as usual.

Ahh. I think non-ASCII folks would be troubled by this bias <g>


November 11, 2005
"Georg Wrede" <georg.wrede@nospam.org> wrote ...
> The compiler knows (or at least _should_ know) the character width of the source code file. Now, if there's an undecorated string literal in it, then _simply_assume_ that is the _intended_ type of the string!

That sounds like a good idea; it would set the /default/ type for literals. But the compiler should still inspect the literal content to determine if it has explicit wchar or dchar characters within. The compiler apparently does this, but doesn't use it to infer literal type?

This combination would very likely resolve all such problems, assuming the auto-casting were removed also?



November 11, 2005
"Don Clugston" <dac@nospam.com.au> wrote ..
> Kris wrote:
>> D doesn't know whether to invoke the char[] or wchar[] signature, since the literal is treated as though it's possibly any of the three types. This is the kind of non-determinism you get when the compiler becomes too 'smart' (unwarranted automatic conversion, in this case).
>
> I agree, except that I think the problem in this case is that it's not converting "from" anything! There's no "exact match" which it tries first.

There would be if the auto-casting were disabled, and the type were determined via the literal content, in conjunction with the /default/ literal type suggested by GW. Yes?


November 11, 2005
On Fri, 11 Nov 2005 10:51:29 -0800, Kris wrote:

[snip]

> Given the above, I can't think of a /common/ situation where casting would thus be required.

void Foo(wchar[] x) { . . .}
void Foo(dchar[] x) { . . .}

Foo("which encoding to use here?");


The example above would fail under those rules as the literal would be assumed to be char[].

-- 
Derek Parnell
Melbourne, Australia
12/11/2005 7:18:39 AM
November 11, 2005

Kris wrote:
> "Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:dl18s7$2r6i$1@digitaldaemon.com...
> 
>> "Kris" <fu@bar.com> wrote in message news:dl0ngf$2f7s$1@digitaldaemon.com...
>> 
>>> Still, it troubles me that one has to decorate string literals in
>>> this manner, when the type could be inferred by the content, or
>>> by a cast() operator where explicit conversion is required.
>> 
>> If I'm not mistaken, these suffixes were created to get _away_ from
>> the casts ;)
> 
> It really does nothing but 'recast' the problem.
> 
> Don't you think the type can be inferred from the content? For the
> sake of discussion, how about this:
> 
> 1) if the literal has a double-wide char contained, it is defaulted
> to a dchar[] type.
> 
> 2) if not 1, and if the literal has a wide-char contained, it
> defaults to being of wchar[] type.
> 
> 3) if neither of the above, it defaults to char[] type.
> 
> Given the above, I can't think of a /common/ situation where casting
> would thus be required.

Hmm. It might even be simpler than that.

Doesn't a file have to be in one character width in its _entirety_?

In that case, we already know the character width, and therefore that of the string literal.


For all other cases too, (that is, the decorated string literals,) the compiler has to know what the width "really" is, (i.e. character width of the source file,) before it can "cast" the literal into the desired width (i.e. that of the decoration).
November 11, 2005
"Derek Parnell" <derek@psych.ward> wrote ...
> On Fri, 11 Nov 2005 10:51:29 -0800, Kris wrote:
>
> [snip]
>
>> Given the above, I can't think of a /common/ situation where casting
>> would thus be required.
>
> void Foo(wchar[] x) { . . .}
> void Foo(dchar[] x) { . . .}
>
> Foo("which encoding to use here?");
>
>
> The example above would fail under those rules as the literal would be assumed to be char[].

Ahh. GW made a suggestion in the other thread (d.learn) that would help here. The notion is that the default type of string literals can be implied through the file encoding (D can handle multiple file encodings), or might be set explicitly via a pragma of some kind. I think the former is an interesting idea.

If the compiler were to use such a 'default' type, in conjunction with determining potential type-overrides via the literal content (the presence of 16- or 32-bit characters), then I think the system would be robust.

Your example could be handled appropriately via the default string-literal type, as described above. That is, if the file-encoding is 16-bit, the wchar[] version would be invoked. If 32-bit, then the dchar[] version would be invoked. If the file is ASCII, the compiler would generate an error ~ a developer /should/ cast ASCII literals in such exceptional cases, where there is no matching ASCII (char[]) signature. On the other hand, the compiler would know exactly how to route string literals in the vast majority of cases.
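The routing rules in the paragraph above can be sketched as follows (hypothetical Python pseudologic; `source_width`, the signature names, and the code-point thresholds are all assumptions for illustration, not how any D compiler works):

```python
def route_literal(literal: str, source_width: int, signatures: set) -> str:
    """Pick an overload for an undecorated string literal.

    Sketch of the scheme described above: literal content overrides
    first, then the source file's encoding width supplies the default.
    source_width is 8, 16, or 32; signatures is the set of parameter
    types on offer, e.g. {"wchar[]", "dchar[]"}.
    """
    widest = max((ord(c) for c in literal), default=0)
    if widest > 0xFFFF:                       # content forces dchar[]
        inferred = "dchar[]"
    elif widest > 0xFF:                       # content forces wchar[]
        inferred = "wchar[]"
    else:                                     # default from file encoding
        inferred = {8: "char[]", 16: "wchar[]", 32: "dchar[]"}[source_width]
    if inferred not in signatures:
        raise TypeError("no %s overload -- cast the literal explicitly"
                        % inferred)
    return inferred

# Derek's example, as if compiled from a UTF-16 source file:
print(route_literal("which encoding to use here?", 16,
                    {"wchar[]", "dchar[]"}))   # wchar[]
```

From an 8-bit (ASCII/UTF-8) source file the same call would raise an error, matching the "developer should cast" rule above, since no char[] signature is on offer.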

Additionally, these literals would be routed correctly and unambiguously:

Foo("I'm an explicit wchar literal \u1234");
Foo("I'm an explicit dchar literal \U00101234");

What sayeth thou?


November 11, 2005
On Fri, 11 Nov 2005 12:49:12 -0800, Kris wrote:

> "Derek Parnell" <derek@psych.ward> wrote ...
>> On Fri, 11 Nov 2005 10:51:29 -0800, Kris wrote:
>>
>> [snip]
>>
>>> Given the above, I can't think of a /common/ situation where casting
>>> would thus be required.
>>
>> void Foo(wchar[] x) { . . .}
>> void Foo(dchar[] x) { . . .}
>>
>> Foo("which encoding to use here?");
>>
>>
>> The example above would fail under those rules as the literal would be assumed to be char[].
> 
> Ahh. GW made a suggestion in the other thread (d.learn) that would help here. The notion is that the default type of string literals can be implied through the file encoding (D can handle multiple file encodings), or might be set explicitly via a pragma of some kind. I think the former is an interesting idea.
> 
> If the compiler were to use such a 'default' type, in conjunction with determining potential type-overrides via the literal content (the presence of 16- or 32-bit characters), then I think the system would be robust.
> 
> Your example could be handled appropriately via the default string-literal type, as described above. That is, if the file-encoding is 16-bit, the wchar[] version would be invoked. If 32-bit, then the dchar[] version would be invoked. If the file is ASCII, the compiler would generate an error ~ a developer /should/ cast ASCII literals in such exceptional cases, where there is no matching ASCII (char[]) signature. On the other hand, the compiler would know exactly how to route string literals in the vast majority of cases.
> 
> Additionally, these literals would be routed correctly and unambiguously:
> 
> Foo("I'm an explicit wchar literal \u1234");
> Foo("I'm an explicit dchar literal \U00101234");
> 
> What sayeth thou?

The source file encoding is a function of the editor and not the source code. To rely on the coder's editor preferences to determine the implied encoding of a string literal will end in tears for someone. In other words, one should be able to alter the source file encoding without altering the meaning of the code it contains.

-- 
Derek Parnell
Melbourne, Australia
12/11/2005 8:45:24 AM