November 16, 2005
Kris wrote:
> 
> Pages of text? In a literal? You're a braver man than I <g>

Never done it myself, but I've seen it in C apps before:

char const* p = "blah blah"
		"blah blah"
		"blah blah";
> 
> <snip>
>> I've tried varying the number of zeroes in this char and this only seems to work if it's a valid wchar.  Is this a scan error in DMD or am I missing something obvious?
> 
> That should be an uppercase \U, not a lowercase one. Darn typos ...

Ah, thanks.  That works :-)

> BTW: How does one get unicode to show up in these posts? I tried setting the encoding to UTF8, but to no avail. Any ideas? 

It seems to work when I paste chars into Thunderbird, but I recall having problems doing this with the web interface in Firefox.  Not sure why though.


Sean
November 16, 2005
Kris wrote:
> "Kris" <fu@bar.com> wrote
> 
>> Pages of text? In a literal? You're a braver man than I <g>
> 
> I'm sorry Sean; that should have read "In a literal /within a function call/ ?" 

Well no, I've never seen that.  But the spec technically allows it :-)


Sean
November 16, 2005
"Sean Kelly" <sean@f4.ca> wrote ...
> Kris wrote:
>>
>> Pages of text? In a literal? You're a braver man than I <g>
>
> Never done it myself, but I've seen it in C apps before:
>
> char const* p = "blah blah"
> "blah blah"
> "blah blah";

I wondered if that's what you meant. This kind of thing is not an issue for D, since one is already declaring the type (hence method resolving is deterministic ~ literals /as arguments/ are the concern).

However, I thought it would be interesting to see how auto deals with such things. Relevant, don't you think?

struct Foo
{
   void write (char[] x){}
   void write (wchar[] x) {}
   void write (dchar[] x) {}
   void write (char x) {}
   void write (wchar x) {}
   void write (dchar x) {}
}

void main()
{
    Foo f;

    auto c = 'c';
    auto w = '\u0001';
    auto d = '\U00000001';

    auto ascii = "I'm an ascii string";
    auto wide = "I'm a wide string \u0001";
    auto dbl  = "I'm a very wide string \U00000001";
    auto edbl  = "I'm a suffixed very wide string \U00000001"d;

    f.write (c);
    f.write (w);
    f.write (d);

    f.write (ascii);
    f.write (wide);
    f.write (dbl);
    f.write (edbl);
}

Everything compiles and links cleanly. What do you think happens?

- all the chars do the expected thing

- all the arrays *resolve to the char[] instance*

The latter is clearly a bug. Yet, the auto keyword, IMO, is more reason to support default-storage-class via content. Sure, the suffixed version should also work as expected, but auto should be able to tell that the literal "I'm an ascii string" is, indeed, ASCII. If we were /required/ to add the suffix (we're not), then auto is no longer 'auto' <g>

Again; if string literals were processed like char literals, and the suffix were retained as an override, the situation would be far less problematic and auto would operate as one might expect.

Please consider.


November 16, 2005
Kris wrote:
> 
> The latter is clearly a bug. Yet, the auto keyword, IMO, is more reason to support default-storage-class via content. Sure, the suffixed version should also work as expected, but auto should be able to tell that the literal "I'm an ascii string" is, indeed, ASCII. If we were /required/ to add the suffix (we're not), then auto is no longer 'auto' <g>

Very good point.  I hadn't considered the auto keyword in all this.


Sean
November 17, 2005
On Wed, 16 Nov 2005 12:37:27 -0800, Kris <fu@bar.com> wrote:
> (having trouble with unicode strings not displaying properly ...)
>
> It struck me that perhaps I might write this out more clearly:
>
> 1) It was suggested that default storage type for a string literal could be inferred from the content therein. The suffix would be used to override the default storage class, in the manner it does today.

I disagree. All string literals can be stored in all 3 types (char[], wchar[], dchar[]). Unlike char, wchar, and dchar, which are similar to short, int, and long, there is no string literal which cannot be stored in every one of the 3 [] types.
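The point that one literal fits all three array types can be checked with a minimal sketch. It assumes a current D2 compiler, where string, wstring and dstring are the immutable counterparts of the char[], wchar[] and dchar[] discussed in this thread; codeUnitCounts is a hypothetical helper name, not something from the posts.

```d
import std.stdio : writeln;

// Hypothetical helper (assumption, not from the thread): stores the same
// undecorated literal in all three array types and returns how many code
// units each encoding needs.
size_t[3] codeUnitCounts()
{
    string  narrow = "h\u00E9llo"; // UTF-8:  the e-acute takes two code units
    wstring wide   = "h\u00E9llo"; // UTF-16: one code unit per character here
    dstring full   = "h\u00E9llo"; // UTF-32: one code unit per character
    return [narrow.length, wide.length, full.length];
}

void main()
{
    writeln(codeUnitCounts()); // prints [6, 5, 5]
}
```

Only the number of code units changes between the three declarations; the text itself is representable in each, which is why the content alone does not force a choice of array type.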

> 4) Turns out that char/wchar/dchar instances take the exact approach as
> described above, contrary to #3. In other words, the approach outlined is
> good enough for characters but somehow not good enough for character arrays.

"somehow" is explained above and here:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/5521

Regan
November 17, 2005
On Wed, 16 Nov 2005 14:18:24 -0800, Kris <fu@bar.com> wrote:
> Everything compiles and links cleanly. What do you think happens?
>
> - all the chars do the expected thing
>
> - all the arrays *resolve to the char[] instance*

What? Adding printf to each of the write methods in your example:

struct Foo
{
   void write (char[] x)  { printf("char[]\n"); }
   void write (wchar[] x) { printf("wchar[]\n"); }
   void write (dchar[] x) { printf("dchar[]\n"); }
   void write (char x)    { printf("char\n"); }
   void write (wchar x)   { printf("wchar\n"); }
   void write (dchar x)   { printf("dchar\n"); }
}

Gives me the following output:

char
wchar
dchar
char[]
char[]
char[]
dchar[]
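The output above is consistent with the types that auto infers for each literal. As a rough sketch, assuming a current D2 compiler (where an undecorated string literal is typed string, i.e. immutable(char)[], rather than the char[] of 2005), the inferred types can be pinned down at compile time; checkInferredTypes is a hypothetical name used only for this illustration.

```d
// Hypothetical helper (assumption, not from the thread): every check is a
// static assert, so the function compiles only if the inferred types are as
// stated.
void checkInferredTypes()
{
    auto c = 'c';
    auto w = '\u0001';
    auto d = '\U00000001';

    // Character literals: the escape form selects the type.
    static assert(is(typeof(c) == char));
    static assert(is(typeof(w) == wchar));
    static assert(is(typeof(d) == dchar));

    auto ascii = "I'm an ascii string";
    auto wide  = "I'm a wide string \u0001";
    auto dbl   = "I'm a very wide string \U00000001";
    auto edbl  = "I'm a suffixed very wide string \U00000001"d;

    // String literals: content is ignored, only the suffix changes the type.
    static assert(is(typeof(ascii) == string));
    static assert(is(typeof(wide)  == string));
    static assert(is(typeof(dbl)   == string));
    static assert(is(typeof(edbl)  == dstring));
}

void main()
{
    checkInferredTypes();
}
```

Under those assumptions the escapes inside a string literal have no effect on the inferred type, while the same escapes in a character literal do, which matches the char / wchar / dchar versus char[] / char[] / char[] / dchar[] split in the output.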

> The latter is clearly a bug.

The array examples are the compiler preferring char[] over the other types for 'auto'. What makes this interesting is that it is really no different from the case where it refuses to choose (the start of this entire discussion), e.g.

write("literal string"); //error matches write(char[]), write(wchar[]), ...

The above is "no more or less likely" to call a method the programmer didn't intend than this is:

auto p = "abc";
write(p);

Leaving us with Walter's last argument to differentiate these cases:

<quote>
I did consider that for a while, but eventually came to the conclusion that
its behavior would be surprising to someone who did not very carefully read
the spec. Also, the distinction between the various character types is not
obvious when looking at the rendered text, further making it surprising.

I think it's better to now and then have to type in an extra character to
nail down an ambiguity than to have a complicated set of rules to try and
guess what the programmer's intent was.
</quote>

With 'auto' the programmer is asking the compiler to "guess" the type they want. The same cannot be said for a string literal passed to a method. So, IMO, the cases do differ.

> Yet, the auto keyword, IMO, is more reason to
> support default-storage-class via content.

Why? The content can be represented equally well in any of the 3 types. In fact, the only considerations for which type to choose would be "the application's preferred type", to prevent excess transcoding, or "the smallest amount of memory", if that was an important issue for the target application.

In other words it's application defined.

Regan
November 17, 2005
On Thu, 17 Nov 2005 13:44:41 +1300, Regan Heath <regan@netwin.co.nz> wrote:
> The array examples are the compiler preferring char[] over the other types for 'auto'. What makes this interesting is that it is really no different from the case where it refuses to choose (the start of this entire discussion), e.g.

To be clear: when I said "really no different" I meant WRT choosing the wrong overload. As I go on to say later, the two cases still differ in other ways.

Regan
November 17, 2005
"Sean Kelly" <sean@f4.ca> wrote ...
> Kris wrote:
>>
>> The latter is clearly a bug. Yet, the auto keyword, IMO, is more reason to support default-storage-class via content. Sure, the suffixed version should also work as expected, but auto should be able to tell that the literal "I'm an ascii string" is, indeed, ASCII. If we were /required/ to add the suffix (we're not), then auto is no longer 'auto' <g>
>
> Very good point.  I hadn't considered the auto keyword in all this.

Let's put aside content-inspection for a moment.

Assuming the suffix works for auto, and auto does no inspection of the string literal content  ... this would imply a couple of things:

1) auto is selecting char[] as the "default" storage class, regardless of the content. We see this from the prior example ~ it operates as though there were a 'c' suffix attached to those literals.

2) if the compiler simply treated argument literals in the same manner as auto literals, then there'd obviously be no conflict between overloads.


Plus, a summary of the conflicting points:

a) string literal storage-class is not derived in the same manner as char literal. If the rule is too complex to comprehend for string literals then it's kinda' hard to argue the opposite for the char literals. But we're ignoring this aspect for now.

b) an auto literal is treated differently than an argument literal. That is, the compiler can resolve method-overloading for auto literals, but not non-auto (argument) literals. Yet both types are fundamentally equivalent.

c) an auto literal is currently assigned a /default/ storage-class, whereas argument literals are not.

-------

Why on Bob's green planet should there be any distinction between an auto literal and the argument variety? i.e.

void write (char[] x){}
void write (wchar[] x){}

void main()
{
    auto ascii = "ascii";

    write (ascii);     // yawn ...
    write ("ascii");  // Achtung! Achtung!!
}


We can argue elsewhere about whether the storage-class should be derived from the content or not. Likewise, we can argue elsewhere about whether the /default/ class assigned by auto should be dchar[] or wchar[], rather than char[]. But these are perhaps secondary to the fundamental question of why auto and argument string literals are treated differently.

If the compiler assigned a default storage-class to both, rather than just one, then there'd be no compile-time error. Woohoo! I, for one, would be ecstatic about that! Well, you know what I mean. We'd also lose some special cases, which surely can only be a good thing. Yes?

November 17, 2005
On Wed, 16 Nov 2005 17:49:51 -0800, Kris <fu@bar.com> wrote:
> Let's put aside content-inspection for a moment.

Good idea.

> Assuming the suffix works for auto, and auto does no inspection of the
> string literal content  ... this would imply a couple of things:
>
> 1) auto is selecting char[] as the "default" storage class, regardless of
> the content. We see this from the prior example ~ it operates as though
> there were a 'c' suffix attached to those literals.

Yep.

> 2) if the compiler simply treated argument literals in the same manner as
> auto literals, then there'd obviously be no conflict between overloads.

Yep.

> Plus, a summary of the conflicting points:
>
> a) string literal storage-class is not derived in the same manner as char
> literal. If the rule is too complex to comprehend for string literals then it's kinda' hard to argue the opposite for the char literals.

This argument may be somewhat of a red herring (I know it's Walter's). The real difference, IMO, is that char literals are comparable to integer literals and string literals are not: there exists no string literal which cannot fit in a char[].

> But we're ignoring this aspect for now.

Ok, moving on...

> b) an auto literal is treated differently than an argument literal. That is, the compiler can resolve method-overloading for auto literals, but not
> non-auto (argument) literals.

Yep.

> Yet both types are fundamentally equivalent.

Are they?

IMO "auto" is a programmer's way of saying "pick/guess the type for me" to the compiler, whereas a string literal argument does not have the same request tied to it.

> c) an auto literal is currently assigned a /default/ storage-class, whereas argument literals are not.

<splitting hairs>
I don't think "auto literal"s exist as an entity; you have an untyped string literal and auto picking a type for it. I'd say "auto defaults to char[] for string literals". After all, auto is supposed to figure out the type based on the type of the RHS. The RHS in this case has no type, so auto must be defaulting to char[].
</splitting hairs>

> We can argue elsewhere about whether the storage-class should be derived
> from the content or not. Likewise, we can argue elsewhere about whether the
> /default/ class assigned by auto should be dchar[] or wchar[], rather than
> char[]. But these are perhaps secondary to the fundamental question of why auto and argument string literals are treated differently.

Because "auto" is a request by the programmer for the compiler to pick/guess the type, whereas argument string literals are not.

Regan
November 17, 2005
"Regan Heath" <regan@netwin.co.nz> wrote
>
> Because "auto" is a request by the programmer for the compiler to pick/guess the type,

Specious:  gilded: based on pretense; deceptively pleasing; "the gilded and perfumed but inwardly rotten nobility"; "meretricious praise"; "a meretricious argument"

You'd have us support your notion that D deliberately introduce aspects of guesswork into our code ~ it's not enough that we can do it ourselves.


> whereas argument string literals are not.

Disingenuous:  not straightforward or candid; giving a false appearance of frankness; "an ambitious, disingenuous, philistine, and hypocritical operator, who...exemplified...the most disagreeable traits of his time"- David Cannadine

A /decorated/ string literal is certainly explicit about its storage-class, but an undecorated one is not. The distinction is important in that decorated literals behave the same in both argument and auto scenarios, yet not vice versa. A special case. One that occurs only with string literals.

void write (char[] x){}
void write (wchar[] x){}

void main()
{
    auto ascii = "ascii";       // implicit
    auto wide = "wide"w;  // explicit

    write (ascii);         // implicit :: good

    write (wide);        // explicit :: good

    write ("ascii");       // implicit :: won't compile
    write ("wide"w);   // explicit :: good
}


================


Some vaguely related comments:

Regan ~ I generally don't see your postings. When the NG started getting literally hundreds of posts per day I instigated filtering. As a result your posts, and probably many others, get dropped. This particular response is prompted by a third-party, so here I am.

In reviewing your related posts, I'm not seeing you offer much of anything constructive. You are certainly welcome to critique and poke holes at my suggestions. That's the easy position to adopt. You're also welcome to feel D should not change or evolve at all ~ lots of people dislike change. But, either way, it appears you'd rather just post a whole lot of negativity to this thread than /possibly/ discover a means of helping D become more consistent and/or simpler.


That aside:

I believe there's an issue here worth exploring further; some related behaviors are surfacing that are certainly new to me. Whilst this little foray may come to naught, I'm trying hard anyway. If you'd like to actually help, then add some substance to your swagger. For example, you might provide a set of reasons as to why your claim is of real practical benefit :: the claim that undecorated literals /should/ be treated differently depending on usage, when the converse is clearly not true (see example). In other words, back up your claim that we should have to deal with that special-case. That might be helpful.