November 14, 2005
On Sun, 13 Nov 2005 16:45:34 -0800, Kris <fu@bar.com> wrote:
> "Georg Wrede" <georg.wrede@nospam.org> wrote
>> Regan Heath wrote:
>>> On Mon, 14 Nov 2005 00:07:30 +0200, Georg Wrede <georg.wrede@nospam.org>
>>> wrote:
>>>
>>>> 6. There is _no_ reason for not having a default encoding for
>>>> undecorated string literals in source code.
>>>
>>> What if you have:
>>>
>>> void bob(char[] a)  { printf("1\n"); }
>>> void bob(wchar[] a) { printf("2\n"); }
>>> void bob(dchar[] a) { printf("3\n"); }
>>>
>>> void main()
>>> {
>>>     bob("test");
>>> }
>>>
>>> In other words 2 or 3 functions of the same name which do _different_
>>>  things, the compiler cannot correctly choose the function to call,
>>> right?
>>
>> I'd say "shoot the programmer!"
>>
>> When I was young, there was a law against overloading with different
>> semantics!
>
> Yes, indeed. Overloading method names with different semantics is silly,
> cantankerous, and/or asking for trouble.

I agree. I'm just saying "it can happen" and "the compiler cannot choose without the possibility that it chooses wrongly". Truth be told, I'm sitting on the fence playing devil's advocate. I can't say with any degree of certainty how likely this is to occur, nor exactly how hard it would be to debug.
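
To make the point concrete, here is a minimal sketch of the three-overload case with the choice spelled out by suffix. It assumes a current D2 compiler, where the three array types are usually written string, wstring and dstring (char[], wchar[] and dchar[] in the code quoted above); the return values are only there so the selected overload can be checked.

```d
import std.stdio : writeln;

// Three overloads that differ only in the encoding they accept.
int bob(string a)  { return 1; } // UTF-8
int bob(wstring a) { return 2; } // UTF-16
int bob(dstring a) { return 3; } // UTF-32

void main()
{
    // The suffix fixes the literal's type, so the overload choice is
    // made by the coder and not by the compiler.
    assert(bob("test"c) == 1);
    assert(bob("test"w) == 2);
    assert(bob("test"d) == 3);
    writeln("each suffixed literal selected the matching overload");
}
```

With the suffixes in place there is nothing left for the compiler to guess, which is the whole argument for keeping them.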

> Using it as the basis of an
> argument is therefore, IMO, either naive or disingenuous.

<WARNING: off topic semi-rant>
When your opinions are insults, please keep them to yourself. Your statement is a form of fallacy: whether intentional or not, it is an attack on the person as opposed to a response to the argument.

Allow me to give an example: "resorting to such tactics is low and underhanded". As you can see, this statement is nothing but a veiled insult. It doesn't advance the argument toward any useful conclusion, and it will often provoke a response which is an even less veiled insult, if not a flat-out personal attack.
</semi-rant>

Let's get back to the topic at hand...

> Hopefully, we can move forward without giving this aspect further deliberation.

We can move forward once we have an answer to the problem. So far the answer seems to be "that should never happen, let's ignore it", and I'm not so sure a compiler can do that, not without causing a headache for someone at some stage in the future.

Regan
November 14, 2005
On Mon, 14 Nov 2005 12:23:23 +1100, Derek Parnell <derek@psych.ward> wrote:
> On Sun, 13 Nov 2005 16:45:34 -0800, Kris wrote:
>> Yes, indeed. Overloading method names with different semantics is silly,
>> cantankerous, and/or asking for trouble. Using it as the basis of an
>> argument is therefore, IMO, either naive or disingenuous. Hopefully, we can
>> move forward without giving this aspect further deliberation.
>
> Ummm ... but there are *some* valid uses for this.
>
>  void SendTextToFile(char[] a)
>     { SendBOM(utf8_bom);  Send(UTF8, a, 0); }
>  void SendTextToFile(wchar[] a)
>     { SendBOM(utf16_bom); Send(UTF16,a, LittleEndian); }
>  void SendTextToFile(dchar[] a)
>     { SendBOM(utf32_bom); Send(UTF32,a, LittleEndian); }
>
> The 'semantics' are identical but the implementation is necessarily
> different depending on the parameter's data type.
>
> So which variation of 'SendTextToFile' is to be assumed by the compiler
> here?
>
> void main()
> {
>      SendTextToFile("test");
> }
>
> Sure, it could choose any and be done with it, but would it hurt the coder to be made aware that a decision is required here?

If we examine the example above further, I'd expect it to have some sort of global flag to designate the output format. Something must exist to tell it what to output as, right?

In which case the call to the function would be conditional on that flag, requiring the coder to be able to specify the literal type, i.e. with the suffix 'c', 'w' or 'd', e.g.

if (flag==UTF8) SendTextToFile("test"c);
else if (flag==UTF16) SendTextToFile("test"w);
else if (flag==UTF32) SendTextToFile("test"d);

OR

SendTextToFile would contain the logic, and thus the literal would be of the type specified by the parameter to the single SendTextToFile function (there wouldn't need to be 3 of them any more), e.g.

SendTextToFile("test");

void SendTextToFile(char[] text) {
  if (flag==UTF8) ..
  else if (flag==UTF16) ..
  else if (flag==UTF32) ..
}

As to which way is better, I think that's application/coder specific. Note that removal of the 'c', 'w' and 'd' suffixes would rule out the first approach.

> I'm in favour of D's current behaviour, even though it means I must
> decorate some string literals.

As am I. The current behaviour hasn't caused me any trouble and I dislike the compiler silently making choices for me when those choices have a chance of being wrong and causing bugs.

Regan
November 14, 2005
In article <dl904q$27e9$1@digitaldaemon.com>, Kris says...
>
> ...
>
> However, it is required unless one decorates each
>and every string literal with its type. NOTE: string literals are the only D type where this is currently unresolvable.
>
> ...

Kris, aren't the character literals ('c', 'w', and 'd') also in this same boat?
Plus, they don't have suffixes like the string literals do ("c"c, "w"w, "d"d),
though Walter did agree that he would fix that at some point... so every one
of them will need to be cast() first before being passed into the correct
function too.

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-------------------------------------------------------------------

MKoD: http://spottedtiger.tripod.com/D_Language/D_Main_XP.html
November 14, 2005
Yes, you are correct; I'd completely forgotten about them. I imagine the corresponding scenario would be unwieldy at best ~ suffixing single characters with a type character <g>

The use of suffix characters is a specialized cast. Perhaps it would be more effective to avoid the need for such a cast in the first place?

This whole area could use further attention from Walter.
November 14, 2005
[names withheld without request]

>> Hopefully, we can move forward without giving this aspect further
>> deliberation.
>
> We can move forward once we have an answer to the problem. So far it
> seems to be "that should never happen, let's ignore it" and I'm not
> so sure a compiler can do that, not without causing a headache for
> someone at some stage in the future.

I agree. We do have to dig to the bottom of this. We simply have no choice since we all want D to become a rock solid language.
November 14, 2005
Derek Parnell wrote:
> On Sun, 13 Nov 2005 22:12:36 +0200, Georg Wrede wrote:
> 
> [snip]
> 
> 
>>Then I saved the file as UTF-7, UTF-8, UTF-16, UCS-2, UCS-4. 
> 
> 
> I've just fixed the Build utility to read UTF-8, UTF-16le/be, UTF-32le/be
> encodings, but is there any reason I should support UTF-7? It seems a bit
> superfluous, and under-supported elsewhere too.
> 
The docs say D source text can only be UTF-8, UTF-16, UTF-32 or ASCII, so it's not just superfluous, it's 100% useless, right?

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 14, 2005
Regan Heath wrote:
> On Sat, 12 Nov 2005 15:30:22 +0000, Bruno Medeiros  <daiphoenixNO@SPAMlycos.com> wrote:
> 
>> Regan Heath wrote:
>>
>>> On Fri, 11 Nov 2005 14:03:36 -0800, Kris <fu@bar.com> wrote:
>>>
>>> Yes, but now a change in compiler options can change how the application behaves (i.e. calling a function that has the same name but a different purpose to the intended one).
>>>
>>> I'm with Derek above and I can't think of a better solution than the current behaviour. Take this example (similar to your original one):
>>>
>>> void write (char[] x){}
>>> void write (wchar[] x){}
>>>
>>> void main()
>>> {
>>>   write ("part 1");
>>> }
>>>
>>> The compiler will error as it cannot decide which function to call. Its options that I can think of:
>>>
>>> A - pick one at random
>>> B - pick one using some sort of rule, i.e. char[] first, then wchar[], then dchar[]
>>> C - pick one based on string contents, i.e. dchar literal, wchar literal, ascii
>>> D - pick one based on file encoding
>>> E - pick one based on compiler switch
>>> F - (current behaviour) require a string suffix of 'c', 'w' or 'd' to disambiguate
>>>
>>  >...
>>
>> There is also the option of all undecorated string literals having a default type (like char[] for instance), instead of "it being inferred from the context."
>
> This is similar to B above, except that this rule would cause an error here:
>
> void write(dchar[] str) {}
> void main() { write("a"); }
>
>> This seems best to me, at first glance at least. What consequences could there be from this approach?
>
> The same as for B. Say you wrote the code above, say another 'write' function existed elsewhere, say it took a char[]: the compiler would silently call the other write function and not the one above. If the functions do the same thing, no problem; if not, silent bug.
> 

But here you would know with certainty that if the code was compiling then it was calling a char[] parameter function, and never a dchar[] one. In the current case, if you have only one function, you cannot tell the type of the string (and thus the parameter of the called function) just by looking at the function call.
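
For reference, a small sketch of what a default literal type looks like in practice. It assumes a current D2 compiler, not the compiler under discussion in this thread, and it is not exactly the rule proposed above: there the unsuffixed literal is typed as string (UTF-8) when nothing else constrains it, but the literal itself still converts when a wider type is asked for. Only the literal gets that treatment; a string variable does not.

```d
void main()
{
    auto s = "test";                         // no suffix, no target type
    static assert(is(typeof(s) == string));  // the default is UTF-8

    // The literal still converts when a wider type is requested.
    wstring w = "test";
    dstring d = "test";
    assert(w.length == 4 && d.length == 4);

    // A value already typed as string does not convert implicitly.
    static assert(!is(string : wstring));
    static assert(!is(string : dstring));
}
```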

>> Well, when passing an undecorated string literal argument to a dchar or wchar parameter, it would be an error and one would have to specify the string type; however, I don't see that as an inconvenience.
>
> It is more inconvenient than the current situation, which is that you have to decorate only in cases where a collision exists.
>
> Regan

Yes, but which case is more frequent, I wonder?

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 14, 2005
>> I'm in favour of D's current behaviour, even though it means I must
>> decorate some string literals.
>
> As am I. The current behaviour hasn't caused me any trouble and I dislike the compiler silently making choices for me when those
> choices have a chance of being wrong and causing bugs.

I think we've royally shot ourselves in the foot -- since way back!

We have these things called char, wchar and dchar. They are defined as UTF-8, UTF-16 and UTF-32. Fine.

C has this thing called char, too. Many of us work in both languages.

So? Well, what we do a lot is manipulating arrays of "char"s. While at it, in D we are doing most of this manipulating without further thinking, which results in code that essentially manipulates arrays of "ubyte". Since practically nobody here has a mother tongue that _needs_ characters of more than 8 bits, we "haven't had any trouble".

---

I think we've painted ourselves into a corner by calling the UTF-8 entity "char"!!

There's no end to the misunderstanding this proliferates and breeds.

For example, who would have thought that it is illegal to chop a char[] at an arbitrary point? I bet hardly anybody. Not 2 weeks ago anyway.
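
A minimal sketch of that chopping problem, assuming a current D2 compiler and Phobos (std.utf.validate and std.exception.assertThrown); the sample text is just an illustration. The slice boundaries are counted in UTF-8 code units, so a cut can land in the middle of a multi-byte character and leave an array that is no longer well-formed UTF-8.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    string s = "n\u00E9";         // 'n' followed by U+00E9, which takes two UTF-8 code units
    assert(s.length == 3);        // length counts code units, not characters

    string whole = s[0 .. 3];     // cut on a character boundary
    validate(whole);              // passes: well-formed UTF-8

    string chopped = s[0 .. 2];   // cut through the middle of U+00E9
    assertThrown!UTFException(validate(chopped));
}
```

The slice itself goes through without complaint; the damage only shows up once something tries to decode the result.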

Parallax thinking. That's what we are doing with our (so called) strings. (Parallax thinking is a term in Bubblefield, which essentially means "thinking of a red car that is blue".)

   If we had the types
   utf8, utf16, utf32,
   while forbidding char,
   we could get somewhere.

When somebody wants to do array twiddling while disregarding the risk of encountering a multi-byte utf8, then one should have to be explicit about it.

Equally, when somebody wants to twiddle with arrays of "virtually just byte size entities, but in the rare case potentially multibyte" entities, then _this_ should have to be explicit. This would also keep the programmer himself remembering what he is twiddling.


November 14, 2005
Bruno Medeiros wrote:
> Derek Parnell wrote:
>> On Sun, 13 Nov 2005 22:12:36 +0200, Georg Wrede wrote:
>> 
>>> Then I saved the file as UTF-7, UTF-8, UTF-16, UCS-2, UCS-4.
>> 
>> I've just fixed the Build utility to read UTF-8, UTF-16le/be,
>> UTF-32le/be encodings, but is there any reason I should support
>> UTF-7? It seems a bit superfluous, and under-supported elsewhere
>> too.
>> 
> It says on the docs D source text can only be UTF-8,UTF-16,UTF-32 and
>  ASCII so it's not just superfluous, it's 100% useless, right?

Yes, for source code.

I initially included it "just to see what happens".
November 14, 2005
Derek Parnell wrote:
> void main() { SendTextToFile("test"); }
> 
> Sure, it could choose any and be done with it, but would it hurt the
> coder to be made aware that a decision is required here?
> 
> I'm in favour of D's current behaviour, even though it means I must decorate some string literals.

Suppose D one day behaves like this:

When using literal or other strings, D internally uses the width that is customary on that particular OS/hardware combination.

Any undecorated string literal, any array and other string capable data structure defaults to this width, any library routines default to this.

Still, if the programmer specifically wants a certain width, then that is used instead. (But it's ok with me if he then has to specify the width each time he defines a reference, data structure, input parameter, or literal string.)

Upsides:

 - normal programs would get easier to write
 - not being width dependent becomes explicit

 - programs that need a certain width
    - would stand out
    - a width decision has been made

 What would the downsides be?