Fix Phobos dependencies on autodecoding
Walter Bright
August 13, 2019
We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.

To that end, I created a build of Phobos that disables autodecoding:

https://github.com/dlang/phobos/pull/7130

Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).

Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.
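
As a rough illustration of the choice described above, here is a minimal sketch using the existing `std.utf` adaptors `byCodeUnit` (closely related to `byChar`) and `byDchar`. The helper names `countSpaces` and `countCodePoints` are hypothetical and are not taken from the PR; they only show how a function can state the iteration level it needs instead of relying on autodecoding.

```d
import std.algorithm.searching : count;
import std.range.primitives : walkLength;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: an ASCII space is a single UTF-8 code unit and never
// occurs inside a multi-byte sequence, so no decoding is needed.
size_t countSpaces(string s)
{
    return s.byCodeUnit.count(' ');
}

// Hypothetical helper: this one really needs code points, so it asks for
// decoding explicitly instead of getting it implicitly.
size_t countCodePoints(string s)
{
    return s.byDchar.walkLength;
}

unittest
{
    string s = "Привет, Мир!";
    assert(s.length == 21);           // UTF-8 code units
    assert(countSpaces(s) == 1);
    assert(countCodePoints(s) == 12); // code points
}
```

Written this way, both helpers should behave the same whether or not the Phobos build autodecodes.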
August 13, 2019
On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>
> To that end, I created a build of Phobos that disables autodecoding:
>
> https://github.com/dlang/phobos/pull/7130
>
> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>
> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.

IMO autodecoding is one of the right things.
Maybe it would be better to leave it as is and just add
> immutable(ubyte)[] bytes( string str ) @nogc nothrow {
>     return *cast( immutable(ubyte)[]* )&str;
> }
and use it as
> foreach( b; "Привет, Мир!".bytes) // Hello world in RU
>     writefln( "%x", b );          // 21 bytes, 12 runes
?

Why did you decide to fight autodecoding?
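
For comparison with the `bytes` helper proposed above: Phobos already ships `std.string.representation`, which returns the same byte view without a pointer cast. A minimal sketch, assuming the default Phobos build:

```d
import std.range.primitives : walkLength;
import std.string : representation;
import std.utf : byDchar;

unittest
{
    string s = "Привет, Мир!";

    // Same memory, viewed as raw UTF-8 bytes.
    immutable(ubyte)[] raw = s.representation;

    assert(raw.length == 21);                 // 21 bytes
    assert(raw[0] == 0xD0 && raw[1] == 0x9F); // 'П' is U+041F
    assert(s.byDchar.walkLength == 12);       // 12 code points
}
```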

August 13, 2019
On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
>> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>>
>> To that end, I created a build of Phobos that disables autodecoding:
>>
>> https://github.com/dlang/phobos/pull/7130
>>
>> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>>
>> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.
>
> imo autodecoding is one of right thing.
> maybe will be better to leave it as is and just to add
>> immutable(ubyte)[] bytes( string str ) @nogc nothrow {
>>     return *cast( immutable(ubyte)[]* )&str;
>> }
> and use it as
>> foreach( b; "Привет, Мир!".bytes) // Hello world in RU
>>     writefln( "%x", b );          // 21 bytes, 12 runes
> ?
>
> why u decide to fight with autodecoding?

One of the reasons is that it adds unnecessary complexity for templated code that is working with ranges. Check function prototypes for some algorithms found in std.algorithm package, you're bound to find special treatment for autodecoding strings. It also messes up user expectation when suddenly applying a range function on a string instead of front char you're getting dchar.
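
A minimal sketch of the mismatch described above, assuming the default (autodecoding) Phobos build. `firstElement` is a hypothetical generic helper, not a Phobos function; the traits are the existing ones from `std.range.primitives`.

```d
import std.range.primitives;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// The array stores char, but the range interface reports dchar.
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementType!string == dchar));

// Because of that, a string is not treated as random access and has no
// range-level length, even though the underlying array has both.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

// Opting out with byCodeUnit restores the array-like behavior.
static assert(isRandomAccessRange!(typeof("abc".byCodeUnit)));

// Hypothetical generic helper: returns the first element of any input range.
auto firstElement(R)(R r) if (isInputRange!R)
{
    return r.front;
}

unittest
{
    // The same template yields different element types for the same text.
    static assert(is(typeof(firstElement("héllo")) == dchar));
    static assert(is(Unqual!(typeof(firstElement("héllo".byCodeUnit))) == char));
    assert(firstElement("héllo") == 'h');
}
```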

Best regards,
Alexandru
August 13, 2019
On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi wrote:
> On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
>> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright
>
> One of the reasons is that it adds unnecessary complexity for templated code that is working with ranges. Check function prototypes for some algorithms found in std.algorithm package, you're bound to find special treatment for autodecoding strings. It also messes up user expectation when suddenly applying a range function on a string instead of front char you're getting dchar.
>

imo this is a contrived problem.
string contains chars, not in meaning "char" as type but runes or codepoints.
and world is not perfect so chars/runes are stored as utf8 codepoints.

in world where "char" is alias for "byte"/"ubyte" such vision was a problem:
  is this buffer string(seq of chars) or just raw bytes? how it should be enumerated?
but we have better world with different bytes and chars.

probably better was naming for "char" as "utf8cp"/orSomething (don't mix with C/C++ type)
and when u/anybody see string from that point everything falls into place.

I don't see problem that str.front returns codepoint from 0..0x10ffff and when str.length returns 21 and str.count=12. but somebody see problem here, so again this is a contrived problem. and for now this vision problem will recreate/recheck tons of code.
I thought that WB don't want change code peremptorily. Should be BIG problem when he does.
August 13, 2019
On Tuesday, August 13, 2019 2:52:58 AM MDT a11e99z via Digitalmars-d wrote:
> On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi
>
> wrote:
> > On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
> >> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright
> >
> > One of the reasons is that it adds unnecessary complexity for templated code that is working with ranges. Check function prototypes for some algorithms found in std.algorithm package, you're bound to find special treatment for autodecoding strings. It also messes up user expectation when suddenly applying a range function on a string instead of front char you're getting dchar.
>
> imo this is a contrived problem.
> string contains chars, not in meaning "char" as type but runes or
> codepoints.
> and world is not perfect so chars/runes are stored as utf8
> codepoints.
>
> in world where "char" is alias for "byte"/"ubyte" such vision was
> a problem:
>    is this buffer string(seq of chars) or just raw bytes? how it
> should be enumerated?
> but we have better world with different bytes and chars.
>
> probably better was naming for "char" as "utf8cp"/orSomething
> (don't mix with C/C++ type)
> and when u/anybody see string from that point everything falls
> into place.
>
> I don't see problem that str.front returns codepoint from
> 0..0x10ffff and when str.length returns 21 and str.count=12. but
> somebody see problem here, so again this is a contrived problem.
> and for now this vision problem will recreate/recheck tons of
> code.
> I thought that WB don't want change code peremptorily. Should be
> BIG problem when he does.

Code points are almost always the wrong level to be operating at. Many algorithms can operate at the code unit level with no problem, whereas those that require decoding usually need to operate at the grapheme level so that the actual, conceptual characters are being compared. Just like code units aren't necessarily full characters, code points aren't necessarily full characters.
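
To make the three levels concrete, here is a minimal sketch that counts the same string as code units, code points, and graphemes. The `Counts` struct and `countAll` function are hypothetical names used only for illustration; `byDchar` and `byGrapheme` are the existing Phobos ranges.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar;

// Hypothetical result type for this illustration.
struct Counts
{
    size_t codeUnits;
    size_t codePoints;
    size_t graphemes;
}

// Hypothetical helper: measures one string at all three levels.
Counts countAll(string s)
{
    return Counts(s.length, s.byDchar.walkLength, s.byGrapheme.walkLength);
}

unittest
{
    // "приве́т" written with 'е' followed by U+0301 (combining acute accent).
    assert(countAll("приве\u0301т") == Counts(14, 7, 6));
}
```

The accented letter is two code points but a single grapheme, which is why the last two numbers differ.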

Auto-decoding was introduced, because at the time, Andrei did not have a solid enough understanding of Unicode and thought that code points were always entire characters and didn't know about graphemes. Having auto-decoding has caused us tons of problems. It's inefficient, gives a false sense of code correctness, requires special-casing all over the place, and the whole "narrow string" concept causes all kinds of grief where algorithms don't work properly with strings, because they don't consider them to be random access, have a different type for their range element type than for their actual element type, etc. Pretty much all of the big D contributors have thought for years now that auto-decoding was a mistake, and we've wanted to get rid of it. Many of us actually thought that autodecoding was a good idea at first, but we've all come to understand how terrible it is. Walter is one of the few that understood from the get-go, but he wasn't paying much attention to Phobos (since he usually focuses on the compiler) and didn't catch Andrei's mistake. If he had, autodecoding would never have been a thing in Phobos.

The only reason that auto-decoding still exists in Phobos is because of how hard it is to remove without breaking code. Making Phobos not rely on autodecoding and making it so that it will work regardless of whether the character type for a range is char, wchar, dchar, or a grapheme is exactly what we need to be doing. Some work has been done in that direction already but nowhere near enough. Once that's done, then we can look at how to fully remove autodecoding, be it Phobos v2 (which Andrei has already proposed) or some other clever solution. But regardless of how we go about removing auto-decoding - or even if we ultimately end up leaving it in place - we need to make Phobos autodecoding-agnostic so that it's not forced on everything.

- Jonathan M Davis



August 13, 2019
On Tue, Aug 13, 2019 at 9:35 AM a11e99z via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> imo autodecoding is one of right thing.
> maybe will be better to leave it as is and just to add
> > immutable(ubyte)[] bytes( string str ) @nogc nothrow {
> >     return *cast( immutable(ubyte)[]* )&str;
> > }
> and use it as
> > foreach( b; "Привет, Мир!".bytes) // Hello world in RU
> >     writefln( "%x", b );          // 21 bytes, 12 runes
> ?
>
> why u decide to fight with autodecoding?
>

I hate autodecoding for many reasons; one of them is that it is not done right:

https://run.dlang.io/is/IHECPf

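```
import std.stdio;

void main()
{
    string strd = "é🜢🜢࠷❻𐝃";

    // Iterate as UTF-16 code units (foreach transcodes from UTF-8)
    // and print the index foreach reports for each element.
    foreach(i, wchar c; strd)
    {
        write(i);
    }
    writeln("");

    // Iterate as UTF-8 code units (no decoding).
    foreach(i, char c; strd)
    {
        write(i);
    }
    writeln("");

    // Iterate as code points (foreach decodes).
    foreach(i, dchar c; strd)
    {
        write(i);
    }
}
```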

August 13, 2019
On Tuesday, 13 August 2019 at 09:15:30 UTC, Jonathan M Davis wrote:
> On Tuesday, August 13, 2019 2:52:58 AM MDT a11e99z via Digitalmars-d wrote:
>> On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi
>>
> we've wanted to get rid of it. Many of us actually thought that autodecoding was a good idea at first, but we've all come to
>

thx for explanations.
probably I am on this stage too.
ok. I can live with .byRunes and .byBytes

August 13, 2019
On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
> We don't yet have a good plan on how to remove autodecoding and yet provide backward compatibility with autodecoding-reliant projects, but one thing we can do is make Phobos work properly with and without autodecoding.
>
> To that end, I created a build of Phobos that disables autodecoding:
>
> https://github.com/dlang/phobos/pull/7130
>
> Of course, it fails. If people want impactful things to work on, fixing each failure is worthwhile (each in separate PRs).
>
> Note that this is neither trivial nor mindless code editing. Each case has to be examined as to why it is doing autodecoding, is autodecoding necessary, and deciding to replace it with byChar, byDchar, or simply hardcoding the decoding logic.



Thanks for your effort in this direction. I once saw a massive discussion on autodecoding.

Recently I have witnessed a massive effort from you, Andrei and the entire community on the D language.

I must confess you have a beautiful language already. The D language promises a lot through its elegance, compilation speed, runtime speed, and support for generic and multiple programming techniques.

I don't have much of a problem with the language itself, but with the libraries, tutorials, documentation, and IDE support. Almost 90% of the time I download a library from the fun packages, there is one error or another.

I will be happy if the tools and libraries just work out of the box. They should be set up so that a novice like me can use them.

I don't have much expertise in programming, so I can't contribute to D for now.



August 13, 2019
On Tuesday, 13 August 2019 at 11:01:30 UTC, GreatSam4sure wrote:
> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright wrote:
>> [...]
>
>
>
> Thanks for your effort in this direction. I once saw a massive discussion on autodecoding.
>
> Recently I have witnessed a massive effort from you, Andrei and the entire community on the D language.
>
> I must confess you have a beautiful language already. The D language promises a lot through its elegance, compilation speed, runtime speed, and support for generic and multiple programming techniques.
>
> I don't have much of a problem with the language itself, but with the libraries, tutorials, documentation, and IDE support. Almost 90% of the time I download a library from the fun packages, there is one error or another.
>
> I will be happy if the tools and libraries just work out of the box. They should be set up so that a novice like me can use them.
>
> I don't have much expertise in programming, so I can't contribute to D for now.

I started to create GitHub issues every time I see errors in libraries. This already helps a lot.

What would be really useful is to see the build status of libraries on code.dlang.org.
With the new CI/CD functionality of GitHub (free for open source projects), this becomes a lot more feasible and easy to set up.

Kind regards
Andre
August 13, 2019
On Tue, Aug 13, 2019 at 07:31:28AM +0000, a11e99z via Digitalmars-d wrote: [...]
> imo autodecoding is one of right thing.
[...]
> why u decide to fight with autodecoding?

Because it *appears* to be right, but it's actually wrong. For example:

	import std.range : retro;
	import std.stdio;

	void main() {
		writeln("привет".retro);
		writeln("приве́т".retro);
	}

Expected output:
	тевирп
	те́вирп

Actual output:
	тевирп
	т́евирп

The problem is that autodecoding makes the assumption that Unicode code point == grapheme, but this is not true. It's usually true for European languages, but it fails for many other languages.  So auto-decoding gives you the illusion of correctness, but when you ship your product to Asia suddenly you get a ton of bug reports.

To guarantee correctness you need to work with graphemes (see .byGrapheme). But we can't make that the default because it's a big performance hit, and many string algorithms don't actually need grapheme segmentation.

Ultimately, the correct solution is to put the onus on the programmer to select the iteration scheme (by code units, code points, or graphemes) depending on what's actually needed at the application level. Arbitrarily choosing one of them to be the default leads to a false sense of security.
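
As a minimal sketch of picking the iteration scheme explicitly, the reversal from the example above can be done per grapheme with `std.uni.byGrapheme` and `byCodePoint`. The helper name `reverseByGrapheme` is hypothetical, and the comparison with `retro` assumes the default autodecoding build.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: reverses a string one grapheme cluster at a time,
// then flattens the clusters back into code points and a string.
string reverseByGrapheme(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

unittest
{
    // 'е' followed by U+0301 (combining acute accent).
    string s = "приве\u0301т";

    // Code point reversal: the accent ends up on 'т'.
    assert(s.retro.equal("т\u0301евирп"));

    // Grapheme reversal: the accent stays on 'е'.
    assert(reverseByGrapheme(s) == "те\u0301вирп");
}
```

The intermediate `array` is needed because `byGrapheme` is not a bidirectional range, which is part of the performance cost mentioned above.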


T

-- 
That's not a bug; that's a feature!