Fix Phobos dependencies on autodecoding - D Programming Language Discussion Forum

August 13, 2019

Fix Phobos dependencies on autodecoding

Posted by Walter Bright

We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.

To that end, I created a build of Phobos that disables autodecoding:

https://github.com/dlang/phobos/pull/7130

Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).

Note that this is neither trivial nor mindless code editing. Each case has to be examined to determine why it is doing autodecoding and whether autodecoding is necessary, and then to decide whether to replace it with byChar, byDchar, or simply hardcoded decoding logic.
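As a rough illustration of the kind of per-case decision described above, here is a minimal sketch. The helper names `countSpaces` and `countCodePoint` are hypothetical and are not taken from Phobos or from the linked PR; `byCodeUnit` and `byDchar` are the existing `std.utf` range adapters.

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: an ASCII character can be matched on raw UTF-8
// code units, so no decoding is needed at all.
size_t countSpaces(string s)
{
    return s.byCodeUnit.count(' ');
}

// Hypothetical helper: matching a non-ASCII code point does need
// decoding, so it is requested explicitly with byDchar.
size_t countCodePoint(string s, dchar c)
{
    return s.byDchar.count(c);
}

void main()
{
    assert(countSpaces("a b c") == 2);
    assert(countCodePoint("Привет, Мир!", 'и') == 2);
}
```

Because both helpers state the iteration level explicitly, they should behave the same whether or not the library autodecodes strings by default.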

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by a11e99z
in reply to Walter Bright

On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>
> To that end, I created a build of Phobos that disables autodecoding:
>
> https://github.com/dlang/phobos/pull/7130
>
> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>
> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.

IMO autodecoding is one of the right things.
Maybe it would be better to leave it as is and just add
> immutable(ubyte)[] bytes( string str ) @nogc nothrow {
>     return *cast( immutable(ubyte)[]* )&str;
> }
and use it as
> foreach( b; "Привет, Мир!".bytes) // Hello world in RU
>     writefln( "%x", b );          // 21 bytes, 12 runes
?

Why did you decide to fight autodecoding?
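For reference, Phobos already ships a function that does what the proposed `bytes` helper does: `std.string.representation` reinterprets a string as its underlying code units without copying. A minimal sketch:

```d
import std.stdio : writefln;
import std.string : representation;

void main()
{
    string s = "Привет, Мир!";

    // Reinterprets the string as immutable(ubyte)[]; nothing is copied.
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == 21); // 21 UTF-8 code units

    foreach (b; raw)
        writefln("%x", b);
}
```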

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by Alexandru Ermicioi
in reply to a11e99z

On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
>> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>>
>> To that end, I created a build of Phobos that disables autodecoding:
>>
>> https://github.com/dlang/phobos/pull/7130
>>
>> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>>
>> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.
>
> imo autodecoding is one of right thing.
> maybe will be better to leave it as is and just to add
>> immutable(ubyte)[] bytes( string str ) @nogc nothrow {
>>     return *cast( immutable(ubyte)[]* )&str;
>> }
> and use it as
>> foreach( b; "Привет, Мир!".bytes) // Hello world in RU
>>     writefln( "%x", b );          // 21 bytes, 12 runes
> ?
>
> why u decide to fight with autodecoding?

One of the reasons is that it adds unnecessary complexity to templated code that works with ranges. Check the function prototypes of some algorithms in the std.algorithm package and you're bound to find special treatment for autodecoded strings. It also violates user expectations: when you apply a range function to a string, front suddenly gives you a dchar instead of a char.

Best regards,
Alexandru
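The mismatch described in this post can be observed directly with the range traits. The following is a minimal sketch that assumes a standard Phobos build with autodecoding enabled; the assertions would not hold in a build that disables it.

```d
import std.range.primitives : ElementEncodingType, ElementType, front,
    hasLength, isRandomAccessRange;

void main()
{
    string s = "hello";

    // The array stores char, yet the range primitive front yields dchar.
    static assert(is(typeof(s[0]) == immutable(char)));
    auto c = s.front;
    static assert(is(typeof(c) == dchar));
    static assert(is(ElementType!string == dchar));

    // Generic code needs a second trait to recover the stored type.
    static assert(is(ElementEncodingType!string == immutable(char)));

    // Although s.length and s[i] exist, the range traits deny both.
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);

    // The same traits on a non-string array need no special casing.
    static assert(is(ElementType!(int[]) == int));
    static assert(isRandomAccessRange!(int[]));
}
```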

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by a11e99z
in reply to Alexandru Ermicioi

On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi wrote:
> On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
>> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright
>
> One of the reasons is that it adds unnecessary complexity for templated code that is working with ranges. Check function prototypes for some algorithms found in std.algorithm package, you're bound to find special treatment for autodecoding strings. It also messes up user expectation when suddenly applying a range function on a string instead of front char you're getting dchar.
>

IMO this is a contrived problem.
A string contains chars, not "char" in the sense of the type, but runes or code points.
And the world is not perfect, so chars/runes are stored as UTF-8 code units.

In a world where "char" is an alias for "byte"/"ubyte", such a view was a problem:
  is this buffer a string (a sequence of chars) or just raw bytes? How should it be enumerated?
But we have a better world, with distinct bytes and chars.

It probably would have been better to name "char" something like "utf8cp" (not to be confused with the C/C++ type),
and when anybody looks at a string from that point of view, everything falls into place.

I don't see a problem with str.front returning a code point in 0..0x10ffff while str.length returns 21 and str.count returns 12. But somebody sees a problem here, so again this is a contrived problem. And now this problem of perspective will force tons of code to be rewritten and rechecked.
I thought that WB doesn't want to change code peremptorily. It must be a BIG problem when he does.

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by Jonathan M Davis
in reply to a11e99z

On Tuesday, August 13, 2019 2:52:58 AM MDT a11e99z via Digitalmars-d wrote:
> On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi wrote:
> > On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
> >> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright
> >
> > One of the reasons is that it adds unnecessary complexity for templated code that is working with ranges. Check function prototypes for some algorithms found in std.algorithm package, you're bound to find special treatment for autodecoding strings. It also messes up user expectation when suddenly applying a range function on a string instead of front char you're getting dchar.
>
> imo this is a contrived problem.
> string contains chars, not in meaning "char" as type but runes or
> codepoints.
> and world is not perfect so chars/runes are stored as utf8
> codepoints.
>
> in world where "char" is alias for "byte"/"ubyte" such vision was
> a problem:
>    is this buffer string(seq of chars) or just raw bytes? how it
> should be enumerated?
> but we have better world with different bytes and chars.
>
> probably better was naming for "char" as "utf8cp"/orSomething
> (don't mix with C/C++ type)
> and when u/anybody see string from that point everything falls
> into place.
>
> I don't see problem that str.front returns codepoint from
> 0..0x10ffff and when str.length returns 21 and str.count=12. but
> somebody see problem here, so again this is a contrived problem.
> and for now this vision problem will recreate/recheck tons of
> code.
> I thought that WB don't want change code peremptorily. Should be
> BIG problem when he does.

Code points are almost always the wrong level to be operating at. Many algorithms can operate at the code unit level with no problem, whereas those that require decoding usually need to operate at the grapheme level so that the actual, conceptual characters are being compared. Just like code units aren't necessarily full characters, code points aren't necessarily full characters.

Auto-decoding was introduced, because at the time, Andrei did not have a solid enough understanding of Unicode and thought that code points were always entire characters and didn't know about graphemes. Having auto-decoding has caused us tons of problems. It's inefficient, gives a false sense of code correctness, requires special-casing all over the place, and the whole "narrow string" concept causes all kinds of grief where algorithms don't work properly with strings, because they don't consider them to be random access, have a different type for their range element type than for their actual element type, etc. Pretty much all of the big D contributors have thought for years now that auto-decoding was a mistake, and we've wanted to get rid of it. Many of us actually thought that autodecoding was a good idea at first, but we've all come to understand how terrible it is. Walter is one of the few that understood from the get-go, but he wasn't paying much attention to Phobos (since he usually focuses on the compiler) and didn't catch Andrei's mistake. If he had, autodecoding would never have been a thing in Phobos.

The only reason that auto-decoding still exists in Phobos is because of how hard it is to remove without breaking code. Making Phobos not rely on autodecoding and making it so that it will work regardless of whether the character type for a range is char, wchar, dchar, or a grapheme is exactly what we need to be doing. Some work has been done in that direction already but nowhere near enough. Once that's done, then we can look at how to fully remove autodecoding, be it Phobos v2 (which Andrei has already proposed) or some other clever solution. But regardless of how we go about removing auto-decoding - or even if we ultimately end up leaving it in place - we need to make Phobos autodecoding-agnostic so that it's not forced on everything.

- Jonathan M Davis
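The three levels mentioned in this post (code units, code points, graphemes) can be compared on a single string. This is a minimal sketch using the existing `std.utf` and `std.uni` adapters; the sample string is an arbitrary choice for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units: 1 + 2
    assert(s.byDchar.walkLength == 2);    // code points: 'e' and U+0301
    assert(s.byGrapheme.walkLength == 1); // one user-perceived character
}
```

Neither of the first two counts matches what a reader would call the number of characters, which is the sense in which decoding to code points alone does not make string code correct.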

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by Daniel Kozak
in reply to a11e99z

On Tue, Aug 13, 2019 at 9:35 AM a11e99z via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> imo autodecoding is one of right thing.
> maybe will be better to leave it as is and just to add
> > immutable(ubyte)[] bytes( string str ) @nogc nothrow {
> >     return *cast( immutable(ubyte)[]* )&str;
> > }
> and use it as
> > foreach( b; "Привет, Мир!".bytes) // Hello world in RU
> >     writefln( "%x", b );          // 21 bytes, 12 runes
> ?
>
> why u decide to fight with autodecoding?
>

I hate autodecoding for many reasons; one of them is that it is not done right:

https://run.dlang.io/is/IHECPf

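```
import std.stdio;

void main()
{
    string strd = "é🜢🜢࠷❻𐝃";

    // Decode to UTF-16 code units; i is the offset into the UTF-8 source.
    foreach (i, wchar c; strd)
    {
        write(i);
    }
    writeln("");

    // Iterate the raw UTF-8 code units; i increases by one each step.
    foreach (i, char c; strd)
    {
        write(i);
    }
    writeln("");

    // Decode to code points; i is the offset where each code point starts.
    foreach (i, dchar c; strd)
    {
        write(i);
    }
}
```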

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by a11e99z
in reply to Jonathan M Davis

On Tuesday, 13 August 2019 at 09:15:30 UTC, Jonathan M Davis wrote:
> On Tuesday, August 13, 2019 2:52:58 AM MDT a11e99z via Digitalmars-d wrote:
>> On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi
>>
> we've wanted to get rid of it. Many of us actually thought that autodecoding was a good idea at first, but we've all come to
>

Thanks for the explanations.
I am probably at that stage too.
OK, I can live with .byRunes and .byBytes.

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by GreatSam4sure
in reply to Walter Bright

On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>
> To that end, I created a build of Phobos that disables autodecoding:
>
> https://github.com/dlang/phobos/pull/7130
>
> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>
> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.

Thanks for your effort in this direction. I once saw a massive discussion on autodecoding.

Recently I have witnessed a massive effort from you, Andrei and the entire community on the D language.

I must confess you have a beautiful language already. The D language promises a lot by its elegance, compilation speed, speed, generic and multiple programming techniques supported.

I don't have much of a problem with the language itself, but with the libraries, tutorials, documentation, and IDEs. Each time I download a library from fun packages, almost 90% of the time there is one error or another.

I will be happy if the tools and library just work out of the box. The tools, the library should be set up that a novice like me can use them.

I don't have much expertise in programming, so I can't contribute to D for now.

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by Andre Pany
in reply to GreatSam4sure

On Tuesday, 13 August 2019 at 11:01:30 UTC, GreatSam4sure wrote:
> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
>> [...]
>
>
>
> Thanks for your effort toward this direction I once a massive this discussion on auto decoding.
>
> Recently I have witnessed a massive effort from you, Andrei and the entire community on the D language.
>
> I must confess you have a beautiful language already. The D language promises a lot by its elegance, compilation speed, speed, generic and multiple programming techniques supported.
>
> I don't have a problem with the language that much but with the libraries, tutorial, documentation, ide. Each time I download the library from fun packages almost 90% there must be one error or another.
>
>
> I will be happy if the tools and library just work out of the box. The tools, the library should be set up that a novice like me can use them.
>
> I don't have much expertise in programming so I can contribute to D for the now

I have started to create GitHub issues every time I see errors in libraries. This already helps a lot.

What would be really useful is to see the build status of libraries on code.dlang.org.
With the new CI/CD functionality of GitHub (free for open source projects), this becomes a lot more feasible and easy to set up.

Kind regards
Andre

August 13, 2019

Re: Fix Phobos dependencies on autodecoding

Posted by H. S. Teoh
in reply to a11e99z

On Tue, Aug 13, 2019 at 07:31:28AM +0000, a11e99z via Digitalmars-d wrote: [...]
> imo autodecoding is one of right thing.
[...]
> why u decide to fight with autodecoding?

Because it *appears* to be right, but it's actually wrong. For example:

	import std.range : retro;
	import std.stdio;

	void main() {
		writeln("привет".retro);
		writeln("приве́т".retro);
	}

Expected output:
	тевирп
	те́вирп

Actual output:
	тевирп
	т́евирп

The problem is that autodecoding makes the assumption that Unicode code point == grapheme, but this is not true. It's usually true for European languages, but it fails for many other languages.  So auto-decoding gives you the illusion of correctness, but when you ship your product to Asia suddenly you get a ton of bug reports.

To guarantee correctness you need to work with graphemes (see .byGrapheme). But we can't make that the default because it's a big performance hit, and many string algorithms don't actually need grapheme segmentation.

Ultimately, the correct solution is to put the onus on the programmer to select the iteration scheme (by code units, code points, or graphemes) depending on what's actually needed at the application level. Arbitrarily choosing one of them to be the default leads to a false sense of security.

T

-- 
That's not a bug; that's a feature!
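To produce the expected output from the `retro` example above, the reversal has to happen at the grapheme level. The sketch below follows the idiom shown in the `std.uni` documentation for `byCodePoint`; the helper name `reverseGraphemes` is hypothetical.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: reverse a string grapheme by grapheme, then
// flatten the graphemes back into code points and build a string.
string reverseGraphemes(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

void main()
{
    assert(reverseGraphemes("привет") == "тевирп");

    // 'е' followed by U+0301 stays together after the reversal.
    assert(reverseGraphemes("приве\u0301т") == "те\u0301вирп");
}
```

This allocates an array of graphemes, which is the kind of cost that makes grapheme segmentation unsuitable as a silent default.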

Copyright © 1999-2021 by the D Language Foundation