Thread overview
is(ElementType!(char[2]) == dchar - why?
Dec 11, 2018
Denis Feklushkin
Dec 11, 2018
rikki cattermole
Dec 11, 2018
H. S. Teoh
Dec 11, 2018
bauss
OT (Was: Re: is(ElementType!(char[2]) == dchar - why?)
Dec 11, 2018
H. S. Teoh
Re: OT (Was: Re: is(ElementType!(char[2]) == dchar - why?)
Dec 11, 2018
Jonathan M Davis
Jun 14, 2019
Yatheendra
Dec 11, 2018
Paul Backus
December 11, 2018
import std.stdio;
import std.range.primitives;

void main()
{
    writeln(
        typeid(ElementType!(char[2]))
    );

    static assert(is(ElementType!(char[2]) == dchar)); // why?
}

?

https://run.dlang.io/is/Q74yHm
December 12, 2018
On 12/12/2018 6:51 AM, Denis Feklushkin wrote:
> import std.stdio;
> import std.range.primitives;
> 
> void main()
> {
>      writeln(
>          typeid(ElementType!(char[2]))
>      );
> 
>      static assert(is(ElementType!(char[2]) == dchar)); // why?
> }
> 
> ?
> 
> https://run.dlang.io/is/Q74yHm

Because docs: https://dlang.org/phobos/std_range_primitives.html#ElementType

What you probably want is: https://dlang.org/phobos/std_traits.html#ForeachType
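
To make the difference concrete, here is a minimal sketch that assumes only the two Phobos templates linked above; the variable name `buf` is just for illustration. It contrasts what the range primitives report with what a plain foreach yields:

```d
import std.range.primitives : ElementType;
import std.traits : ForeachType;

// ElementType reports what the range primitive `front` returns. For narrow
// strings (arrays of char or wchar) Phobos decodes, so `front` yields dchar.
static assert(is(ElementType!(char[2]) == dchar));
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));

// ForeachType reports what a plain `foreach (e; value)` yields: the code unit.
static assert(is(ForeachType!(char[2]) == char));
static assert(is(ForeachType!(char[]) == char));

// Arrays of non-character types are not affected by auto-decoding.
static assert(is(ElementType!(int[]) == int));

void main()
{
    // The same thing observed on a value: foreach over char[2] gives char.
    char[2] buf = "hi";
    foreach (c; buf)
        static assert(is(typeof(c) == char));
}
```

Everything here is checked at compile time, so a successful build is the whole result.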
December 11, 2018
On Wed, Dec 12, 2018 at 06:56:46AM +1300, rikki cattermole via Digitalmars-d-learn wrote:
> On 12/12/2018 6:51 AM, Denis Feklushkin wrote:
> > import std.stdio;
> > import std.range.primitives;
> > 
> > void main()
> > {
> >      writeln(
> >          typeid(ElementType!(char[2]))
> >      );
> > 
> >      static assert(is(ElementType!(char[2]) == dchar)); // why?
> > }
> > 
> > ?
> > 
> > https://run.dlang.io/is/Q74yHm
> 
> Because docs: https://dlang.org/phobos/std_range_primitives.html#ElementType
> 
> What you probably want is: https://dlang.org/phobos/std_traits.html#ForeachType

Autodecoding raises its ugly head again. :-/


T

-- 
It is the quality rather than the quantity that matters. -- Lucius Annaeus Seneca
December 11, 2018
On Tuesday, 11 December 2018 at 17:51:56 UTC, Denis Feklushkin wrote:
> import std.stdio;
> import std.range.primitives;
>
> void main()
> {
>     writeln(
>         typeid(ElementType!(char[2]))
>     );
>
>     static assert(is(ElementType!(char[2]) == dchar)); // why?
> }
>
> ?
>
> https://run.dlang.io/is/Q74yHm

This is a "feature" called auto decoding. It's explained here: https://tour.dlang.org/tour/en/gems/unicode

You can get around it by using .byChar or .byCodeUnit.
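
A minimal sketch of the .byCodeUnit route, assuming `std.utf.byCodeUnit` from current Phobos; the helper `countUnits` is a hypothetical name used only for this example. Wrapping the array makes the range primitives hand back code units instead of decoded dchar values:

```d
import std.utf : byCodeUnit;

// Hypothetical helper: counts the code units in s that equal needle,
// walking the string through byCodeUnit so nothing is decoded.
size_t countUnits(const(char)[] s, char needle)
{
    size_t n;
    foreach (c; s.byCodeUnit)
    {
        if (c == needle)
            ++n;
    }
    return n;
}

void main()
{
    // Slice the static array first; the wrapper's front is then a plain char.
    char[2] buf = "hi";
    auto r = buf[].byCodeUnit;
    static assert(is(typeof(r.front) == char));

    // On an immutable string the code unit type is immutable(char).
    auto s = "abc".byCodeUnit;
    static assert(is(typeof(s.front) == immutable char));

    assert(countUnits("a b c", ' ') == 2);
}
```

Searching for an ASCII code unit this way is safe in UTF-8, because bytes below 0x80 never appear inside a multi-byte sequence.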
December 11, 2018
On Tuesday, 11 December 2018 at 18:10:48 UTC, H. S. Teoh wrote:
> On Wed, Dec 12, 2018 at 06:56:46AM +1300, rikki cattermole via Digitalmars-d-learn wrote:
>> On 12/12/2018 6:51 AM, Denis Feklushkin wrote:
>> > import std.stdio;
>> > import std.range.primitives;
>> > 
>> > void main()
>> > {
>> >      writeln(
>> >          typeid(ElementType!(char[2]))
>> >      );
>> > 
>> >      static assert(is(ElementType!(char[2]) == dchar)); // why?
>> > }
>> > 
>> > ?
>> > 
>> > https://run.dlang.io/is/Q74yHm
>> 
>> Because docs: https://dlang.org/phobos/std_range_primitives.html#ElementType
>> 
>> What you probably want is: https://dlang.org/phobos/std_traits.html#ForeachType
>
> Autodecoding raises its ugly head again. :-/
>
>
> T

Has it ever had anything else?
December 11, 2018
On Tue, Dec 11, 2018 at 09:02:41PM +0000, bauss via Digitalmars-d-learn wrote:
> On Tuesday, 11 December 2018 at 18:10:48 UTC, H. S. Teoh wrote:
[...]
> > Autodecoding raises its ugly head again. :-/
[...]
> Has it ever had anything else?

LOL... well, we (or some of us) were deceived by its pretty tail for a while, until we realized that it was just a façade, and Unicode really didn't work the way we thought it did.


T

-- 
Notwithstanding the eloquent discontent that you have just respectfully expressed at length against my verbal capabilities, I am afraid that I must unfortunately bring it to your attention that I am, in fact, NOT verbose.
December 11, 2018
On Tuesday, December 11, 2018 2:11:49 PM MST H. S. Teoh via Digitalmars-d-learn wrote:
> On Tue, Dec 11, 2018 at 09:02:41PM +0000, bauss via Digitalmars-d-learn wrote:
> > On Tuesday, 11 December 2018 at 18:10:48 UTC, H. S. Teoh wrote:
> [...]
>
> > > Autodecoding raises its ugly head again. :-/
>
> [...]
>
> > Has it ever had anything else?
>
> LOL... well, we (or some of us) were deceived by its pretty tail for a while, until we realized that it was just a façade, and Unicode really didn't work the way we thought it did.

Yeah. Auto-decoding came about because Andrei misunderstood Unicode and thought that code points were complete characters (likely because the Unicode standard weirdly likes to refer to them as characters), and he didn't know about graphemes. At the time, many of us were just as clueless as he was (in many cases, more so), and auto-decoding made sense. You supposedly got full correctness by default and could work around it for increased performance when you needed to (and the standard library did that for you where it mattered, reducing how much you had to care). Walter knew better, but he wasn't involved enough with Phobos development to catch on until it was too late. It was only later, when more of the folks involved came to a fuller understanding of Unicode, that auto-decoding started to be panned.
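
The gap between code units, code points, and graphemes can be sketched with a single user-perceived character. This is only an illustration: the sample string is an assumption chosen for the example, and it uses `walkLength`, `std.utf.byCodeUnit`, and `std.uni.byGrapheme` from Phobos:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
    string s = "e\u0301";

    // UTF-8 code units: 1 for 'e' plus 2 for U+0301.
    assert(s.length == 3);
    assert(s.byCodeUnit.walkLength == 3);

    // Auto-decoding walks code points, so the range view sees 2 elements.
    assert(s.walkLength == 2);

    // Only grapheme segmentation sees the single user-perceived character.
    assert(s.byGrapheme.walkLength == 1);
}
```

Auto-decoding pays for the middle count, which is neither the cheap one nor the one that matches what a reader sees.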

For instance, I very much doubt that you would find much from the D community talking about how horrible auto-decoding is back in 2010, whereas you probably could find plenty by 2015, and every time it comes up now, folks complain about it. Previously, folks would get annoyed about the restrictions, but the restrictions made sense with the understanding that code points were the actual characters, and you didn't want code to be chopping them up. But once it became more widely understood that code points were also potentially pieces of characters, you no longer had the same defense against the annoyances caused by how narrow strings are treated, and so it just became annoying. We went from newcomers getting annoyed, but those who understood the reasons behind auto-decoding being fine with it (because it supposedly made their code correct and prevented bugs) to almost everyone involved being annoyed about it. The newcomers who don't understand it still get annoyed by it, but instead of the ones who do understand it telling them about how it's helping keep Unicode handling correct, the folks who understand what's going on now tell everyone how terrible auto-decoding is.

So, the narrative that auto-decoding is terrible has now become the status quo, whereas before, it was actually considered to be a good thing by the D community at large, because it supposedly ensured Unicode correctness. It was still annoying, but that was because Unicode is annoying. Now, Unicode is still annoying, but auto-decoding is understood to make it even more so without actually helping.

The one bright side out of all of this that makes it so that I don't think that auto-decoding is entirely bad is that it shoves the issue in everyone's faces so that everyone is forced to learn at least the basics about Unicode, whereas if we didn't have it, many folks would likely just treat char as a complete character and merrily write code that can't handle Unicode, since that's what usually happens in most programs in most languages (some languages do use the equivalent of wchar for their char, but most code still treats their char type as if it were a complete character). The fact that we have char, wchar, and dchar _does_ help raise the issue on its own, but auto-decoding makes it very hard to ignore. Now, that doesn't mean that I think that we should have auto-decoding (ideally, we'd figure out how to remove it), but the issues that it's caused have resulted in a lot of developers becoming much more knowledgeable about Unicode and therefore more likely to write code that handles Unicode correctly.
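
As a small illustration of why char, wchar, and dchar raise the issue on their own, the following sketch (the sample code points are assumptions picked for the example) shows the same code point taking a different number of code units in each string type:

```d
// Sizes of the three code unit types, in bytes.
static assert(char.sizeof == 1);   // UTF-8 code unit
static assert(wchar.sizeof == 2);  // UTF-16 code unit
static assert(dchar.sizeof == 4);  // UTF-32 code unit

// U+00E9 (e with acute accent): two code units in UTF-8, one in UTF-16 and UTF-32.
static assert("\u00E9"c.length == 2);
static assert("\u00E9"w.length == 1);
static assert("\u00E9"d.length == 1);

// U+1F600 lies outside the Basic Multilingual Plane, so even UTF-16 needs
// two code units (a surrogate pair) for it.
static assert("\U0001F600"c.length == 4);
static assert("\U0001F600"w.length == 2);
static assert("\U0001F600"d.length == 1);

void main() {}
```

Code that treats a single char or wchar as a complete character works for the first kind of input only by luck of the data.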

- Jonathan M Davis




June 14, 2019
Could this be rendered an aside for newbies, by way of documentation, specifically the Unicode portion of the Dlang tour? Just never bring up auto-decoding at all, point out UTF-8/16/32, and point out the fast, correct primitive (byCodeUnit) that lets you iterate over a string's contents simplistically, as in other languages. Leave the "proper" handling of Unicode to content outside of the tour.

I know very little about Unicode/UTF. If I had only read about byCodeUnit, I wouldn't be bothered at all because I understand Hindi (and Mandarin, and Japanese, Russian too?) is very different from European languages & needs explicit care for proper handling. I would be very glad that D makes the situation clear, and provides the zero-barrier API to start newbies off on an equal footing with past languages.