November 19, 2005
Kris wrote:
> 
> But, I have a feeling that cast([]) is not the right approach here? One reason is that structs/classes can have only one opCast() method. Perhaps there's another approach for such syntax? That's assuming, however, that one does not create a special case for char[] types (per above inconsistencies).

It may well not be.  A set of properties is another approach:

char[]  c = "abc";
dchar[] d = c.toDString();

but this would still only work for arrays.  Conversions between char types still only make sense if they are widening conversions.  Perhaps I'm simply becoming spoiled by having so much built into D.  This may well be simply a job for library code.

> Transcoding is easy when the source content is reasonably small and fully contained within block of memory. It quickly becomes quite complex when streaming instead. That's really worth considering.

Good point.  One of the first things I had to do for readf/unFormat was rewrite std.utf to accept delegates.  There simply isn't any other good way to ensure that too much data isn't read from the stream by mistake.
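Roughly the shape of it (a simplified sketch only, not the actual std.utf change, and with no error checking):

// Pull code units one at a time through a delegate, so the decoder
// never reads past the end of the code point it is working on.
dchar decodeUtf8(ubyte delegate() next)
{
    uint b = next();
    if (b < 0x80)
        return cast(dchar) b;                         // ASCII fast path
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
    uint c = b & (0x3F >> extra);                     // bits from the lead byte
    for (int i = 0; i < extra; i++)
        c = (c << 6) | (next() & 0x3F);               // pull continuation bytes as needed
    return cast(dchar) c;
}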

> Thus, I'd suspect it may be appropriate for D to add some transcoding sugar.
> But it would likely have to be highly constrained (per the simple case). Is it worth it?

Probably not :-)  But I suppose it's worth discussing.  I do like the idea of not having to rely on library code to do simple string transcoding, though this seems of limited use given the above concerns.


Sean
November 19, 2005
On Fri, 18 Nov 2005 15:29:23 -0800, Sean Kelly <sean@f4.ca> wrote:
> Regan Heath wrote:
>>  Making the cast explicit sounds like a good compromise to me.
>>  The way I see it casting from int to float is similar to casting from char[] to wchar[]. The data must be converted from one form to another for it to make sense; you'd never 'paint' an 'int' as a 'float', it would be meaningless, and the same is true for char[] to wchar[].
>
> This is the comparison I was thinking of as well.  Though I've never tried casting an array of ints to floats.  I suspect it doesn't work, does it?

Nope. Kris's post has something about it, here:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30158

> My only other reservation is that the behavior could not be preserved for casting char types, and unlike narrowing conversions (such as float to int), meaning can't even be preserved in narrowing char conversions (such as wchar to char).

Indeed, because the meaning (the "character") may be represented as 1 wchar, but as 2 chars.
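A concrete example (just to illustrate):

char[]  c = "é";    // 2 UTF-8 code units: 0xC3 0xA9
wchar[] w = "é"w;   // 1 UTF-16 code unit: 0x00E9
// c.length == 2, w.length == 1, so merely repainting the bits can't work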
The thread above has some more interesting stuff about this.

Regan
November 19, 2005
"Sean Kelly" <sean@f4.ca> wrote ...
> Kris wrote:
>>
>> But, I have a feeling that cast([]) is not the right approach here? One reason is that structs/classes can have only one opCast() method. Perhaps there's another approach for such syntax? That's assuming, however, that one does not create a special case for char[] types (per above inconsistencies).
>
> It may well not be.  A set of properties is another approach:
>
> char[]  c = "abc";
> dchar[] d = c.toDString();


I would agree, since it thoroughly isolates the special cases:

char[].utf16
char[].utf32

wchar[].utf8
wchar[].utf32

dchar[].utf8
dchar[].utf16

Generics might require the addition of 'identity' properties, like char[].utf8?
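Usage might then look something like this (purely illustrative, since no such properties exist today):

char[]  c = "abc";
wchar[] w = c.utf16;    // transcode char[] -> wchar[]
dchar[] d = w.utf32;    // and wchar[] -> dchar[]
char[]  u = d.utf8;     // back again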


> but this would still only work for arrays.  Conversions between char types still only make sense if they are widening conversions.


Aye. If the above set of properties were for arrays only, then one may be able to make a case that it doesn't break consistency. There might be a second, somewhat distinct, set:

char.utf16
char.utf32

wchar.utf32

I think your approach is far more amenable than cast(), Sean. And properties don't eat up keyword space <g>


>> Transcoding is easy when the source content is reasonably small and fully contained within block of memory. It quickly becomes quite complex when streaming instead. That's really worth considering.
>
> Good point.  One of the first things I had to do for readf/unFormat was rewrite std.utf to accept delegates.  There simply isn't any other good way to ensure that too much data isn't read from the stream by mistake.
>
>> Thus, I'd suspect it may be appropriate for D to add some transcoding sugar.
>> But it would likely have to be highly constrained (per the simple case). Is it worth it?
>
> Probably not :-)  But I suppose it's worth discussing.  I do like the idea of not having to rely on library code to do simple string transcoding, though this seems of limited use given the above concerns.


Yeah. It would be limited (e.g. no streaming), and would likely be implemented using the heap. Even then, as you note, it could be attractive to some.
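Under the hood it would presumably boil down to something like this (a sketch only, leaning on the existing std.utf routines):

import std.utf;

// Hypothetical free-function stand-ins for the properties;
// each call allocates a fresh array on the heap.
wchar[] utf16(char[] s)
{
    return toUTF16(s);
}

dchar[] utf32(char[] s)
{
    return toUTF32(s);
}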


November 20, 2005
Georg Wrede wrote:
> 
> If somebody wants to retain the bit pattern while storing the contents to something else, it should be done with a union. (Just as you can do with pointers, or even objects! To name a few "workarounds".)
> 
> A cast should do precisely what our toUTFxxx functions currently do.
> 
It should? Why? What is the problem with using the toUTFxx functions?

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
November 20, 2005
Bruno Medeiros wrote:
> Georg Wrede wrote:
> 
>> 
>> If somebody wants to retain the bit pattern while storing the
>> contents to something else, it should be done with a union. (Just
>> as you can do with pointers, or even objects! To name a few
>> "workarounds".)
>> 
>> A cast should do precisely what our toUTFxxx functions currently
>> do.
>> 
> It should? Why? What is the problem with using the toUTFxx functions?

Nothing wrong. But cast should not do the union thing.

Of course, we could have the toUTFxxx and no cast at all for UTF strings, no problem. But definitely _not_ have the cast do the "union thing".
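By the "union thing" I mean something like this (a contrived sketch):

union Paint
{
    char[]  c;
    wchar[] w;
}

void main()
{
    Paint p;
    p.c = "abc";
    wchar[] w = p.w;   // same bit pattern, now read as UTF-16: nonsense,
                       // and the length no longer matches the actual data
}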
November 20, 2005
Derek Parnell wrote:
> On Fri, 18 Nov 2005 15:31:48 +0200, Georg Wrede wrote:
>> Derek Parnell wrote:

>>> However, are you saying that D should change its behaviour such
>>> that it should always implicitly convert between encoding types?
>>> Should this happen only with assignments or should it also happen
>>> on function calls?
>>
>> Both. And everywhere else (in case we forgot to name some
>> situation).
> 
> We have problems with inout and out parameters.
> 
>   foo(inout wchar[] x) {}
> 
>   dchar[] y = "abc";
>   foo(y);
> 
> In this case, if automatic conversion took place, it would have to do
> it twice. It would be like doing ...
> 
>    auto wchar[] temp;
>    temp = toUTF16(y);
>    foo(temp);
>    y = toUTF32(temp);

Would you be surprised:

Foo[] foo = new Foo[10];
for(ubyte i=0; i<10; i++) // Not short, int, or long, "save space"
{
    foo[i] = whatever;  // Gee, compiler silently casts to int!
}

He might be stupid, uneducated, or simply not have coded since 1985.
And it happens.

>>>   foo(wchar[] x) { . . .  } // #1
>>>   foo(dchar[] x) { . . .  } // #2
>>>   dchar y;
>>>   foo(y);  // Obviously should call #2
>>>   foo("Some Test Data"); // Which one now?
>>
>> Test data is undecorated, hence char[]. Technically on the last
>> line above it could pick at random, when it has no "right"
>> alternative, but I think it would be Polite Manners to make the
>> compiler complain.
> 
> Yes, and that's what happens now.
> 
>>I'm still trying to get through the notion that it _really_does_not_matter_ what it chooses!
> 
> I disagree. Without knowing what the intention of the function is, one
> has no way of knowing which function to call.
> 
> Try it. Which one is the right one to call in the example above? It
> is quite possible that there is no right one.

If the overloaded functions purport to take UTF (of any width at all), then it is assumed that they do _semantically_ the same thing. Thus, one has the right to sleep at night.
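For instance (hypothetical overloads, purely to illustrate the point):

void put(wchar[] s) { /* writes the text somewhere */ }
void put(dchar[] s) { /* writes the same text, by another route */ }

put("hello");   // no char[] overload; if both versions really do the same
                // thing, it can't matter which one the compiler picks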

The programmer shall not see any difference whichever is chosen:

 - if there's only one type, then there's no choice anyway.
 - if there's one that matches, then pick that, (not that it would be obligatory, but it's polite.)
 - if there are the two non-matching, then pick the one preferred by the compiler writer, or the OS vendor. If not, then just pick either one.
 - if there are no UTF versions, then it'd be okay to complain, at compile time.

> If we have automatic conversion and it chooses one at random, there is
> no way of knowing that it's doing the 'right' thing to the data we
> give it. In my opinion, it's a coding error and the coder needs to
> provide more information to the compiler.

I want everyone to understand that it makes just as little difference as when the compiler optimizer chooses a datatype for variable i in this:

for(ubyte i=0; i<256; i++)
{
    // do stuff
}

Can you honestly say that it makes a difference which type i is? (Except signed byte, of course. And we're not talking about performance.)

I wouldn't be surprised if DMD (haven't checked!) would sneak i to int instead of the explicitly asked-for ubyte, already in the default compile mode. And -release, and at -O probably should. (Again, haven't checked, and even if it does not do it, the issue is a matter of principle: would making it int make a difference in this example?)

>> (Of course performance is slower with a lot of unnecessary casts (
>> = conversions), but that's the programmer's fault, not ours.)
>>
>>> Given just the function signature and an undecorated string, it
>>> is not possible for the compiler to call the 'correct' function.
>>> In fact, it is not possible for a person (other than the original
>>> designer) to know which is the right one to call?
>>
>>That is (I'm sorry, no offense), based on a misconception.
>>
>> Please see my other posts today, where I try to clear (among other things) this very issue.
> 
> I challenge you, right here and now, to tell me which of those two functions above is the one that the coder intended to be called.

Suppose you're in a huge software project with D, and the customer has ordered it to do all arithmetic in long. After 1500000 lines it goes to the beta testers, and they report weird behavior.

Three weeks of searching, and the boss is raving around with an axe.
One night the following code is found:

import std.stdio;

void main()
{
    long myvar;
...
    myvar = int.max / 47;
... 300 lines
    myvar = scale(myvar);
... 500 lines
}

... 50000 lines later

long scale(int v)
{
    long tmp = 1000 * v;
    return tmp / 3;
}

Folks suspect the bug is here, but what is wrong?
Does the compiler complain? Should it?

> If the coder had written 
> 
>     foo("Some Test Data"w);
> 
> then its pretty clear which function was intended.

Except that my example above is dangerous, while with UTF it can't get dangerous.

Hey, should the compiler complain if I write:

char[] a = "\U00000041"c;

(Do you think it currently complains? Saying what? Or doesn't it? And what do you say happens if one would get this currently compiled and run?)

>>> D has currently got the better solution to this problem; get the coder to identify the storage characteristics of the string!
>> 
>> He does, at assignment to a variable. And, up till that time, it
>> makes no difference. It _really_ does not.
> 
> But it *DOES* make a difference when doing signature matching. I'm
> not talking about assignments to variables.

Would it be correct to say that an undecorated string literal can't be used anywhere without the type of the receiver being known?

Apart from passing to overloaded functions (each of which does know "what it wants"), is there any situation where UTF is accepted, but the receiver does not itself know which it "wants", or even "prefers"?

Should there be such cases? Could there?
November 20, 2005
Regan Heath wrote:
> 
> Georg/Derek, I replied to Georg here:
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/5587
> 
> saying essentially the same things as Derek has above. I reckon we combine  these threads and continue in this one, as opposed to the one I linked  above. I or you can link the other thread to here with a post if you're in  agreement.

Good suggestion!

I actually intended that, but forgot about it while reading and thinking. :-/

So, the reply is to it directly.
November 20, 2005
On Sun, 20 Nov 2005 11:30:34 +0000, Bruno Medeiros wrote:

> Georg Wrede wrote:
>> 
>> If somebody wants to retain the bit pattern while storing the contents to something else, it should be done with a union. (Just as you can do with pointers, or even objects! To name a few "workarounds".)
>> 
>> A cast should do precisely what our toUTFxxx functions currently do.
>> 
> It should? Why? What is the problem with using the toUTFxx functions?

Do we have a toReal(), toFloat(), toInt(), toDouble(), toLong(), toULong(),
... ?


-- 
Derek Parnell
Melbourne, Australia
21/11/2005 6:48:48 AM
November 20, 2005
On Sun, 20 Nov 2005 17:02:04 +0200, Georg Wrede wrote:

> Derek Parnell wrote:
>> On Fri, 18 Nov 2005 15:31:48 +0200, Georg Wrede wrote:
>>> Derek Parnell wrote:
> 
>>>> However, are you saying that D should change its behaviour such that it should always implicitly convert between encoding types? Should this happen only with assignments or should it also happen on function calls?
>>>
>>> Both. And everywhere else (in case we forgot to name some
>>> situation).
>> 
>> We have problems with inout and out parameters.
>> 
>>   foo(inout wchar[] x) {}
>> 
>>   dchar[] y = "abc";
>>   foo(y);
>> 
>> In this case, if automatic conversion took place, it would have to do it twice. It would be like doing ...
>> 
>>    auto wchar[] temp;
>>    temp = toUTF16(y);
>>    foo(temp);
>>    y = toUTF32(temp);
> 
> Would you be surprised:

Surprised about the two conversions? No, I just said that's what it would have to do, so no, I wouldn't be surprised. I just said it would be a problem, insofar as the compiler would (currently) not warn coders about the performance hit until they profiled it, and even then it might not be obvious to some people.

> Foo[] foo = new Foo[10];
> for(ubyte i=0; i<10; i++) // Not short, int, or long, "save space"
> {
>      foo[i] = whatever;  // Gee, compiler silently casts to int!
> }
> 
> He might be stupid, uneducated, or simply not have coded since 1985. And it happens.

What on earth has the above example got to do with double conversions? And converting from ubyte to int is not exactly a performance drain.


>>>>   foo(wchar[] x) { . . .  } // #1
>>>>   foo(dchar[] x) { . . .  } // #2
>>>>   dchar y;
>>>>   foo(y);  // Obviously should call #2
>>>>   foo("Some Test Data"); // Which one now?
>>>
>>> Test data is undecorated, hence char[]. Technically on the last line above it could pick at random, when it has no "right" alternative, but I think it would be Polite Manners to make the compiler complain.
>> 
>> Yes, and that's what happens now.
>> 
>>>I'm still trying to get through the notion that it _really_does_not_matter_ what it chooses!
>> 
>> I disagree. Without knowing what the intention of the function is, one has no way of knowing which function to call.
>> 
>> Try it. Which one is the right one to call in the example above? It is quite possible that there is no right one.
> 
> If the overloaded functions purport to take UTF (of any width at all), then it is assumed that they do _semantically_ the same thing. Thus, one has the right to sleep at night.

Assumptions like that have a nasty habit of generating nightmares. It is *only* an assumption and not a decision based on actual knowledge.

> The programmer shall not see any difference whichever is chosen:
> 
>   - if there's only one type, then there's no choice anyway.

But there is more than one.

>   - if there's one that matches, then pick that, (not that it would be
> obligatory, but it's polite.)

Sorry, no matches.

>   - if there are the two non-matching, then pick the one preferred by
> the compiler writer, or the OS vendor. If not, then just pick either one.

BANG! This is where we part company. My belief is that assuming functions with the same name are going to do the same thing is dangerous and can lead to mistakes. Whereas you seem to be saying that this is a safe assumption to make.

>   - if there are no UTF versions, then it'd be okay to complain, at
> compile time.
> 
>> If we have automatic conversion and it chooses one at random, there is no way of knowing that it's doing the 'right' thing to the data we give it. In my opinion, it's a coding error and the coder needs to provide more information to the compiler.
> 
> I want everyone to understand that it makes just as little difference as when the compiler optimizer chooses a datatype for variable i in this:
> 
> for(ubyte i=0; i<256; i++)
> {
>      // do stuff
> }
> 
> Can you honestly say that it makes a difference which type i is? (Except signed byte, of course. And we're not talking about performance.)

No, but what's this got to do with the argument?

> I wouldn't be surprised if DMD (haven't checked!) would sneak i to int instead of the explicitly asked-for ubyte, already in the default compile mode. And -release, and at -O probably should. (Again, haven't checked, and even if it does not do it, the issue is a matter of principle: would making it int make a difference in this example?)

Red Herring Alert!

>>> (Of course performance is slower with a lot of unnecessary casts (
>>> = conversions), but that's the programmer's fault, not ours.)
>>>
>>>> Given just the function signature and an undecorated string, it is not possible for the compiler to call the 'correct' function. In fact, it is not possible for a person (other than the original designer) to know which is the right one to call?
>>>
>>>That is (I'm sorry, no offense), based on a misconception.
>>>
>>> Please see my other posts today, where I try to clear (among other things) this very issue.
>> 
>> I challenge you, right here and now, to tell me which of those two functions above is the one that the coder intended to be called.
> 
> Suppose you're in a huge software project with D, and the customer has ordered it to do all arithmetic in long. After 1500000 lines it goes to the beta testers, and they report weird behavior.
> 
> Three weeks of searching, and the boss is raving around with an axe. One night the following code is found:
> 
> import std.stdio;
> 
> void main()
> {
>      long myvar;
> ...
>      myvar = int.max / 47;
> ... 300 lines
>      myvar = scale(myvar);
> ... 500 lines
> }
> 
> ... 50000 lines later
> 
> long scale(int v)
> {
>      long tmp = 1000 * v;
>      return tmp / 3;
> }
> 
> Folks suspect the bug is here, but what is wrong?
> Does the compiler complain? Should it?

No it doesn't and yes it should.

>> If the coder had written
>> 
>>     foo("Some Test Data"w);
>> 
>> then its pretty clear which function was intended.
> 
> Except that my example above is dangerous, while with UTF it can't get dangerous.

Assumptions can hurt too.

> Hey, should the compiler complain if I write:
> 
> char[] a = "\U00000041"c;
> 
> (Do you think it currently complains? Saying what? Or doesn't it? And what do you say happens if one would get this currently compiled and run?)

Of course not. Both 'a' and the literal are of the same data type.

>>>> D has currently got the better solution to this problem; get the coder to identify the storage characteristics of the string!
>>> 
>>> He does, at assignment to a variable. And, up till that time, it makes no difference. It _really_ does not.
>> 
>> But it *DOES* make a difference when doing signature matching. I'm not talking about assignments to variables.
> 
> Would it be correct to say that an undecorated string literal can't be used anywhere without the type of the receiver being known?
> 
> Apart from passing to overloaded functions (each of which does know "what it wants"), is there any situation where UTF is accepted, but the receiver does not itself know which it "wants", or even "prefers"?
> 
> Should there be such cases? Could there?

Again, I fail to see what this has to do with the issue.

Let's call a halt to this discussion. I suspect that you and I will not agree about this function signature matching issue anytime soon.

-- 
Derek Parnell
Melbourne, Australia
21/11/2005 6:55:57 AM
November 20, 2005
On Sun, 20 Nov 2005 17:28:33 +0200, Georg Wrede <georg.wrede@nospam.org> wrote:
> Regan Heath wrote:
>>  Georg/Derek, I replied to Georg here:
>> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/5587
>>  saying essentially the same things as Derek has above. I reckon we combine  these threads and continue in this one, as opposed to the one I linked  above. I or you can link the other thread to here with a post if you're in  agreement.
>
> Good suggestion!
>
> I actually intended that, but forgot about it while reading and thinking. :-/
>
> So, the reply is to it directly.

Ok. I have taken your reply, clicked reply, and pasted it in here :)
(I hope this post isn't confusing for anyone)

-------------------------
Copied from: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/5607
-------------------------

On Sun, 20 Nov 2005 17:17:33 +0200, Georg Wrede <georg.wrede@nospam.org> wrote:

> Regan Heath wrote:
>> On Fri, 18 Nov 2005 13:02:05 +0200, Georg Wrede <georg.wrede@nospam.org>  wrote:
>>  Let's assume there are 2 functions of the same name (unintentionally), doing different things.
>>  In that source file the programmer writes:
>>  write("test");
>>  DMD tries to choose the storage type of "test" based on the available
>> overloads. There are 2 available overloads X and Y. It currently
>> fails and gives an error.
>>  If instead it picked an overload (X) and stored "test" in the type
>> for X, calling the overload for X, I agree, there would be
>> _absolutely no problems_ with the stored data.
>>  BUT
>>  the overload for X doesn't do the same thing as the overload for Y.
>
> Isn't that a problem with having overloading at all in a language?
> Sooner or later, most of us have done it. If not each already? Isn't this a problem with overloading in general, and not with UTF?

You're right. The problem is not limited to string literals, integer literals exhibit exactly the same problem, AFAICS. So, you've convinced me. Here is why...

http://www.digitalmars.com/d/lex.html#integerliteral
(see "The type of the integer is resolved as follows")

In essence integer literals _default_ to 'int' unless another type is specified or required.

This suggested change does that, and nothing else? (can anyone see a difference?)

If so, and if I can accept the behaviour for integer literals, why can't I for string literals?

The only logical reason I can think of for not accepting it, is if there exists a difference between integer literals and string literals which affects this behaviour.

I can think of differences, but none which affect the behaviour. So, it seems that if I accept the risk for integers, I have to accept the risk for string literals too.

---

Note that string promotion should occur just like integer promotion does, eg:

void foo(long i) {}
foo(5); //calls foo(long) with no error

void foo(wchar[] s) {}
foo("test"); //should call foo(wchar[]) with no error

this behaviour is current and should not change.

Regan