dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
November 03, 2021
https://issues.dlang.org/show_bug.cgi?id=22473

I've tried to fix this before, but too many people objected.

Are we fed up with this yet? I sure am.

Who wants to take up this cudgel and fix the durned thing once and for all?

(It's unclear if it would even break existing code.)
November 04, 2021
On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> I've tried to fix this before, but too many people objected.

I proposed a few days ago that Phobos autodecoding, if not completely removed, do this exact same thing too.

I agree it is a good idea. If you want an exception, it is easy enough to just check it in the loop and throw then.

Let's do it.
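A minimal sketch of "check it in the loop and throw", assuming the loop yields `replacementDchar` for invalid input. Because `foreach` did not behave that way at the time of this thread, the sketch uses `std.utf.byDchar`, which already substitutes U+FFFD. The helper name `countStrict` is hypothetical.

```d
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: counts code points, throwing as soon as a
/// decoded code point equals U+FFFD.
size_t countStrict(const(char)[] text)
{
    size_t n;
    // byDchar decodes lazily and yields replacementDchar for invalid UTF-8.
    foreach (dchar c; text.byDchar)
    {
        if (c == replacementDchar)
            throw new Exception("invalid UTF-8 (or a literal U+FFFD) in input");
        ++n;
    }
    return n;
}

void main()
{
    writeln(countStrict("hello"));

    char[] bad = ['a', cast(char) 0xFF]; // 0xFF never appears in valid UTF-8
    try
        countStrict(bad);
    catch (Exception e)
        writeln("rejected: ", e.msg);
}
```

Note that this check cannot tell a decoding failure apart from input that legitimately contains U+FFFD; it rejects both.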
November 04, 2021
On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473
>
> I've tried to fix this before, but too many people objected.
>
> Are we fed up with this yet? I sure am.
>
> Who wants to take up this cudgel and fix the durned thing once and for all?
>
> (It's unclear if it would even break existing code.)

I still think this is a mistake.

One may disagree about autodecoding; I for one think it's a sensible idea. However, a program should either process data correctly or, if that is impossible, not at all. It should not, ever, silently modify it "for you" while reading! I predict this will lead to cryptic, hair-pulling bugs in user code involving replacement characters appearing far downstream of the error site.
November 04, 2021
On Thursday, 4 November 2021 at 05:34:29 UTC, FeepingCreature wrote:
> One may disagree about autodecoding; I for one think it's a sensible idea. However, a program should either process data correctly or, if that is impossible, not at all. It should not, ever, silently modify it "for you" while reading! I predict this will lead to cryptic, hair-pulling bugs in user code involving replacement characters appearing far downstream of the error site.

(This is floating point NaN all over again!)
November 04, 2021
On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473
>
> I've tried to fix this before, but too many people objected.
>
> Are we fed up with this yet? I sure am.
>
> Who wants to take up this cudgel and fix the durned thing once and for all?
>
> (It's unclear if it would even break existing code.)

Assuming the comment by Ali on the linked bug is right, I think the current behaviour is correct.

Your complaints:

> It can't be turned off

Sure it can.  You can choose to iterate in another fashion; say, by creating your own iterator which folds invalid utf8 into replacement characters.
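One way such an iterator might look, as a sketch rather than a recommended design: a small input range built on `std.utf.decode` with `Yes.useReplacementDchar`. The struct name `LossyDchars` is made up for illustration.

```d
import std.stdio : writefln;
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Hypothetical input range: decodes UTF-8 and yields U+FFFD
/// for invalid sequences instead of throwing.
struct LossyDchars
{
    const(char)[] src;
    size_t pos;

    bool empty() { return pos >= src.length; }

    dchar front()
    {
        size_t i = pos; // decode on a copy so front does not advance
        return decode!(Yes.useReplacementDchar)(src, i);
    }

    void popFront()
    {
        // decode advances pos past the sequence it consumed
        decode!(Yes.useReplacementDchar)(src, pos);
    }
}

void main()
{
    char[] raw = ['a', cast(char) 0xFF]; // trailing invalid byte
    foreach (dchar c; LossyDchars(raw))
        writefln("U+%04X", cast(uint) c);
}
```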

> it throws

Is it better to produce an incorrect result?

A high-quality, non-throwing mechanism for error handling exists.  It consists of an _optional_ value which must be explicitly unwrapped.  It is also an out-of-band signal; how will I distinguish invalid utf8 from a correctly-encoded replacement character?
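A sketch of the optional-value idea using `std.typecons.Nullable`. The function `tryDecode` is hypothetical, and this version still catches `UTFException` internally, so it illustrates the calling convention rather than a truly non-throwing decoder. It assumes `index < s.length` on entry.

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;
import std.utf : decode, UTFException;

/// Hypothetical helper: returns the decoded code point, or a null
/// Nullable if the bytes at `index` are not valid UTF-8.
Nullable!dchar tryDecode(const(char)[] s, ref size_t index)
{
    immutable start = index;
    try
    {
        return nullable(decode(s, index));
    }
    catch (UTFException)
    {
        index = start + 1; // skip one code unit so callers make progress
        return Nullable!dchar.init;
    }
}

void main()
{
    char[] raw = ['a', cast(char) 0xFF];
    size_t i = 0;
    while (i < raw.length)
    {
        auto r = tryDecode(raw, i);
        if (r.isNull)
            writeln("invalid code unit before index ", i);
        else
            writeln(r.get);
    }
}
```

Here a genuine U+FFFD in the input comes back as a non-null value, so it stays distinguishable from a decoding failure.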

> it may allocate with the gc

So?  If that is the sort of thing you care about, then you will @nogc and find an alternate solution.  Lots of core language features allocate, like arrays and hash tables.

> it's slow

In the hot path it's the same speed.  In the slow path, performance doesn't matter.  In any case, it's useless to give an incorrect result faster.


(Notably, this is not exactly _auto_ decoding; it is explicitly requested decoding.  And your proposed modification doesn't change that fact.)


What is (potentially) questionable imo is that given foreach (c; a), c will be inferred to be dchar; you have to explicitly ask for char.  Perhaps that default should be reversed.  (This will definitely break code, though, and may not be worth it.)

If you want an iterator that generates replacement characters for invalid utf8, just create one.  But the default translation should be faithful, and that means not generating any result if none can be generated.
November 04, 2021
Part of the problem, as mentioned, is that this throws away information, because text may legitimately contain replacement characters.  (And this makes the 'check if replacement char and throw yourself' approach a non-starter).  But there are lossless encodings.  I think if we are really going to go this route, we should use something like raku's utf8-c8 (https://docs.raku.org/language/unicode#UTF8-C8).
November 04, 2021
On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473

string, as part of the language, should not be encoded at all. It is just (8-bit) bytes.
The standard library should implement the required 'encoded strings'.
That way, people who need a particular encoding import the corresponding string type from the standard library.
It would be terrible if you couldn't write a D program just because your text is not UTF-8.
November 05, 2021
On 04/11/2021 8:51 PM, Elronnd wrote:
> What is (potentially) questionable imo is that given foreach (c; a), c will be inferred to be dchar; you have to explicitly ask for char.  Perhaps that default should be reversed.  (This will definitely break code, though, and may not be worth it.)

I think this is the right answer.

Fix the default. Fewer surprises, fewer headaches, everyone is happy.
November 04, 2021
On Thursday, 4 November 2021 at 07:51:11 UTC, Elronnd wrote:
> What is (potentially) questionable imo is that given foreach (c; a), c will be inferred to be dchar; you have to explicitly ask for char.  Perhaps that default should be reversed.  (This will definitely break code, though, and may not be worth it.)

That's not true. It will always be the type of the thing:

void main() {
        foreach(a; "test")
                pragma(msg, typeof(a)); // immutable(char) NOT dchar
}
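To make the contrast concrete, a small sketch (function names are hypothetical): without a declared type the loop variable is a code unit, and decoding only happens when `dchar` is requested explicitly. The decoding loop is wrapped in a `try` because, at the time of this thread, it throws on invalid UTF-8.

```d
import std.stdio : writeln;

/// Iterates code units: the element type is inferred from the array.
size_t countUnits(string s)
{
    size_t n;
    foreach (c; s)
    {
        static assert(is(typeof(c) == immutable(char)));
        ++n;
    }
    return n;
}

/// Iterates code points: asking for dchar makes foreach decode.
/// Returns -1 if decoding throws.
long countCodePoints(const(char)[] s)
{
    long n = 0;
    try
    {
        foreach (dchar c; s)
            ++n;
    }
    catch (Exception e)
    {
        return -1;
    }
    return n;
}

void main()
{
    writeln(countUnits("héllo"));      // code units
    writeln(countCodePoints("héllo")); // code points
}
```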

November 05, 2021
On 05/11/2021 12:59 AM, rikki cattermole wrote:
> 
> On 04/11/2021 8:51 PM, Elronnd wrote:
>> What is (potentially) questionable imo is that given foreach (c; a), c will be inferred to be dchar; you have to explicitly ask for char. Perhaps that default should be reversed.  (This will definitely break code, though, and may not be worth it.)
> 
> I think this is the right answer.
> 
> Fix the default. Fewer surprises, fewer headaches, everyone is happy.

Correction: the default is correct, I checked.