Thread overview
Human stupidity or is this a regression?
Dec 26, 2013
Lionello Lunesu
Dec 26, 2013
bearophile
Dec 26, 2013
Lionello Lunesu
Dec 26, 2013
bearophile
Dec 26, 2013
Andrej Mitrovic
Dec 26, 2013
Vladimir Panteleev
Dec 26, 2013
H. S. Teoh
December 26, 2013
Perhaps I should have written "and/or" in the subject line, since the two are not mutually exclusive.

I was showing off D to friends the other day:

import std.stdio;
void main()
{
  foreach (d; "你好")
    writeln(d);
}


IIRC, this used to work fine, with the variable "d" getting deduced as "dchar" and correctly reassembling the UTF-8 bytes into Unicode codepoints.

But when I run this code on OS X with dmd v2.064, I get this:

$ dmd -run uni.d
�
�
�
�
�
�

It's clearly printing the bytes. When I print the typeof(d) I get "immutable(char)", so that confirms the type is not deduced as "dchar".
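
For anyone who wants to reproduce the observation, here is a minimal sketch. The helper name countInferred is made up for illustration; the only thing it relies on is that the loop variable takes the string's element type when no type is written.

```d
import std.stdio;

// Hypothetical helper: counts loop iterations when the element type is inferred.
size_t countInferred(string s)
{
    size_t n = 0;
    foreach (d; s)   // no explicit type: d is one UTF-8 code unit (immutable(char))
        ++n;
    return n;
}

void main()
{
    string s = "你好";
    writeln(s.length);           // 6: each of the two characters takes 3 bytes in UTF-8
    writeln(countInferred(s));   // 6: one iteration per byte, matching the six lines above
}
```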

I could have sworn this used to work. Is my memory failing me, or was this a deliberate change at some point? Perhaps a regression?

L.
December 26, 2013
Lionello Lunesu:

> I could have sworn this used to work. Is my memory failing me, or was this a deliberate change at some point? Perhaps a regression?

It's not a regression, it's a locked-in design mistake. Write it like this and try again:

foreach (dchar d; "你好")
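
As a complete program, the suggestion might look like the sketch below. The helper codePointStarts is a made-up name, added only to show that the optional index variable reports the byte offset where each decoded code point begins.

```d
import std.stdio;

// Hypothetical helper: returns the byte offset at which each code point starts.
size_t[] codePointStarts(string s)
{
    size_t[] starts;
    foreach (i, dchar d; s)   // explicit dchar: the UTF-8 is decoded during iteration
        starts ~= i;          // i is the index of the first code unit of d
    return starts;
}

void main()
{
    foreach (dchar d; "你好")
        writeln(d);                      // one line per code point
    writeln(codePointStarts("你好"));     // [0, 3]
}
```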

Bye,
bearophile
December 26, 2013
On 12/26/13, 11:58, bearophile wrote:
> Lionello Lunesu:
>
>> I could have sworn this used to work. Is my memory failing me, or was
>> this a deliberate change at some point? Perhaps a regression?
>
> It's not a regression, it's a locked-in design mistake. Write it like
> this and try again:
>
> foreach (dchar d; "你好")
>
> Bye,
> bearophile

Yeah, that's what I ended up doing. But D being D, the default should be safe and correct.

I feel we could take this breaking change since it would not silently change the code to do something else. You'll get prompted and we could special case the error message to give a meaningful hint.

L
December 26, 2013
Lionello Lunesu:

> Yeah, that's what I ended up doing. But D being D, the default should be safe and correct.
>
> I feel we could take this breaking change since it would not silently change the code to do something else.

You have to explain such things in the main D newsgroup. D.learn newsgroup is not fit for such requests.

Bye,
bearophile
December 26, 2013
On 12/26/13, bearophile <bearophileHUGS@lycos.com> wrote:
> You have to explain such things in the main D newsgroup. D.learn newsgroup is not fit for such requests.

There have already been a million of these threads. It's worth doing a search, as there are probably lots of answers there.
December 26, 2013
On Thursday, 26 December 2013 at 05:39:26 UTC, Lionello Lunesu wrote:
> On 12/26/13, 11:58, bearophile wrote:
>> Lionello Lunesu:
>>
>>> I could have sworn this used to work. Is my memory failing me, or was
>>> this a deliberate change at some point? Perhaps a regression?
>>
>> It's not a regression, it's a locked-in design mistake. Write it like
>> this and try again:
>>
>> foreach (dchar d; "你好")
>>
>> Bye,
>> bearophile
>
> Yeah, that's what I ended up doing. But D being D, the default should be safe and correct.

It is impossible for it to be "correct", unless with a very specific definition of "correct" which makes sense for some languages/locales and not others. As a challenge, try to define a "foreach" semantic that works "correctly" with the OP's code for Unicode composite characters, or Hebrew.
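
To make the challenge concrete, here is a small sketch using a combining sequence. The helper countCodePoints is a made-up name; the example only assumes that "e\u0301" (the letter e followed by U+0301 COMBINING ACUTE ACCENT) is displayed as a single accented letter.

```d
import std.stdio;

// Hypothetical helper: counts iterations when decoding to dchar.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar d; s)
        ++n;
    return n;
}

void main()
{
    string decomposed = "e\u0301";          // 'e' + combining acute accent
    writeln(decomposed.length);             // 3 UTF-8 code units
    writeln(countCodePoints(decomposed));   // 2 code points, although a reader sees one character
}
```

So even iterating by dchar splits what a reader would call one character into two pieces.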
December 26, 2013
On Thu, Dec 26, 2013 at 09:38:02PM +0000, Vladimir Panteleev wrote:
> On Thursday, 26 December 2013 at 05:39:26 UTC, Lionello Lunesu wrote:
> >On 12/26/13, 11:58, bearophile wrote:
> >>Lionello Lunesu:
> >>
> >>>I could have sworn this used to work. Is my memory failing me, or was this a deliberate change at some point? Perhaps a regression?
> >>
> >>It's not a regression, it's a locked-in design mistake. Write it like this and try again:
> >>
> >>foreach (dchar d; "你好")
> >>
> >>Bye,
> >>bearophile
> >
> >Yeah, that's what I ended up doing. But D being D, the default should be safe and correct.
> 
> It is impossible for it to be "correct", unless with a very specific definition of "correct" which makes sense for some languages/locales and not others. As a challenge, try to define a "foreach" semantic that works "correctly" with the OP's code for Unicode composite characters, or Hebrew.

To be truly "correct" in the intuitive sense, use std.uni.byGrapheme. (Yes, it's slow, but that's the price you pay for intuitive correctness.)
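
A minimal sketch of what that could look like, assuming a Phobos version that ships std.uni.byGrapheme. The helper name countGraphemes is made up for illustration.

```d
import std.range : walkLength;
import std.stdio;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    writeln(countGraphemes("e\u0301"));   // 1: base letter plus combining accent form one cluster
    writeln(countGraphemes("你好"));       // 2
}
```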


T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?