dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead (page 3)


November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by deadalnix
in reply to Adam D Ruppe


On Friday, 5 November 2021 at 02:38:51 UTC, Adam D Ruppe wrote:
> On Friday, 5 November 2021 at 02:06:01 UTC, deadalnix wrote:
>> On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
>>> https://issues.dlang.org/show_bug.cgi?id=22473
>>
>> For the love of god, if you are going to make a breaking change there, just remove autodecoding altogether.
>
> This post isn't about autodecoding. With foreach, you opt into the decoding by specifically asking for it.

Very clearly it is, because if you don't decode, then you don't do replacement chars or exceptions.
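To make the distinction concrete, here is a minimal sketch of what "asking for decoding" in a foreach loop looks like. The helper names are made up for illustration, and the sample input uses a lone continuation byte (0x80) as the invalid sequence. The sketch catches plain `Exception` because the loop's decoding error comes from the runtime rather than from `std.utf`; treat the exact exception type as an assumption.

```d
import std.stdio : writeln;

// Hypothetical helper: iterate by code unit. No decoding takes place,
// so there is nothing to replace and nothing to throw.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: iterate by code point, which makes foreach decode.
// Returns true if decoding raised an exception.
bool decodingLoopThrows(string s)
{
    try
    {
        foreach (dchar d; s) {}
    }
    catch (Exception e)
    {
        return true;
    }
    return false;
}

void main()
{
    // 'a', a lone continuation byte, 'b'. The cast sidesteps the
    // compiler's validation of string literals.
    immutable(ubyte)[] raw = [0x61, 0x80, 0x62];
    string bad = cast(string) raw;

    writeln(countCodeUnits(bad));        // number of code units visited
    writeln(decodingLoopThrows(bad));    // whether the dchar loop threw
    writeln(decodingLoopThrows("abc"));  // valid input for comparison
}
```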

November 04, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Walter Bright
in reply to Mathias LANG


On 11/4/2021 7:41 PM, Mathias LANG wrote:
> If you want to fix it, just deprecate the special case and tell people to use `foreach (dchar d; someString.byUTF!(dchar, No.useReplacementDchar))` and voilà.
> And if they don't want it to throw, it's shorter:
> `foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).

People will always gravitate towards the smaller, simpler syntax. Like [] instead of std::vector<>.
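For reference, the two explicit spellings quoted above might be used roughly as follows. This is only a sketch: the helper names `decodeLossy` and `strictDecodeThrows` are invented here, and the invalid input is a lone continuation byte chosen for illustration.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.typecons : No;
import std.utf : byUTF, replacementDchar, UTFException;

// Hypothetical helper: decode with the default policy, which substitutes
// replacementDchar (U+FFFD) for invalid input instead of throwing.
dchar[] decodeLossy(string s)
{
    dchar[] result;
    foreach (dchar d; s.byUTF!dchar)
        result ~= d;
    return result;
}

// Hypothetical helper: decode with the throwing policy and report
// whether a UTFException was raised.
bool strictDecodeThrows(string s)
{
    try
    {
        foreach (dchar d; s.byUTF!(dchar, No.useReplacementDchar)) {}
    }
    catch (UTFException e)
    {
        return true;
    }
    return false;
}

void main()
{
    // 'a', a lone continuation byte, 'b'.
    immutable(ubyte)[] raw = [0x61, 0x80, 0x62];
    string bad = cast(string) raw;

    writeln(decodeLossy(bad).canFind(replacementDchar));
    writeln(strictDecodeThrows(bad));
}
```

The length difference between the two call sites is the whole argument here: the lossy form is the short one, so it is the one people would reach for by default.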

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by max haughton
in reply to Walter Bright


On Friday, 5 November 2021 at 04:02:44 UTC, Walter Bright wrote:
> On 11/4/2021 7:41 PM, Mathias LANG wrote:
>> If you want to fix it, just deprecate the special case and tell people to use `foreach (dchar d; someString.byUTF!(dchar, No.useReplacementDchar))` and voilà.
>> And if they don't want it to throw, it's shorter:
>> `foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).
>
> People will always gravitate towards the smaller, simpler syntax. Like [] instead of std::vector<>.

I have never observed this mistake in any C++ code, unless you mean as a point of language design.

This decision should be guided by how current D programmers act rather than a hyperreal ideal of someone encountering the language.

November 04, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Walter Bright
in reply to max haughton


On 11/4/2021 9:11 PM, max haughton wrote:
> On Friday, 5 November 2021 at 04:02:44 UTC, Walter Bright wrote:
>> On 11/4/2021 7:41 PM, Mathias LANG wrote:
>>> If you want to fix it, just deprecate the special case and tell people to use `foreach (dchar d; someString.byUTF!(dchar, No.useReplacementDchar))` and voilà.
>>> And if they don't want it to throw, it's shorter:
>>> `foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).
>>
>> People will always gravitate towards the smaller, simpler syntax. Like [] instead of std::vector<>.
> 
> I have never observed this mistake in any C++ code,

You've never observed people write:

   int array[3];

in C++ code?

> unless you mean as a point of language design.

D (still) has a rather verbose way of doing lambdas. People constantly complained that D didn't have lambdas. Until the => syntax was added, and suddenly lambdas in D became noticed and useful.

> This decision should be guided by how current D programmers act rather than a hyperreal ideal of someone encountering the language.

The only reason D's associative arrays continue to exist is because they are so darned syntactically convenient.

I've seen over and over and over that syntactic convenience matters a lot.
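As a rough illustration of the two conveniences mentioned here, the sketch below contrasts a long-form function literal with the `=>` form, and uses the built-in associative array syntax. The function names are hypothetical and exist only for this example.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.stdio : writeln;

// Long-form function literal passed to map.
int[] doubledVerbose(int[] xs)
{
    return xs.map!(function int(int x) { return x * 2; }).array;
}

// The same operation with the short => lambda syntax.
int[] doubledShort(int[] xs)
{
    return xs.map!(x => x * 2).array;
}

// Built-in associative array: no import, no container type to spell out.
int[string] wordCounts(string[] words)
{
    int[string] counts;
    foreach (w; words)
        ++counts[w]; // a missing key starts from int.init (0)
    return counts;
}

void main()
{
    writeln(doubledVerbose([1, 2, 3]) == doubledShort([1, 2, 3]));
    writeln(wordCounts(["a", "b", "a"]));
}
```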

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by FeepingCreature
in reply to Walter Bright


On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
> On 11/3/2021 10:41 PM, FeepingCreature wrote:
>> On Thursday, 4 November 2021 at 05:34:29 UTC, FeepingCreature wrote:
>>> One may disagree about autodecoding; I for one think it's a sensible idea. However, a program should either process data correctly or, if that is impossible, not at all. It should not, ever, silently modify it "for you" while reading! I predict this will lead to cryptic, hair-pulling bugs in user code involving replacement characters appearing far downstream of the error site.
>
> Surprisingly, the reverse seems to be true. Suppose you're writing a text editor. Then read a file with some bad UTF in it. The editor dies with an exception. You can't even edit the file to fix it.
>
> If you need to display user provided text, like in a browser, or all sorts of tools, you don't want to die with an exception. What are you going to do in an exception handler? You're just going to replace the offending bytes with ReplacementChar and go render it anyway.
>
>> (This is floating point NaN all over again!)
>
> Poor NaNs are terribly misunderstood.
>
> Suppose you have an array of sensors. One goes bad. The "bad" value is 0.0. So now your data analyzer is happily averaging 0.0 into the results, silently skewing them.
>
> Now, if a NaN is returned instead, your "average" will be NaN. You know it's no good. It won't be hidden.
>
> Uninitialized variables are sensors giving bad data. Having a NaN in your result is a *good* thing.

I think the program should crash in all these cases. The text editor should crash. The browser should crash. The analyzer should see a NaN, and crash.

These programs are *wrong.* They thought they could only get Unicode and they've gotten non-Unicode. So we know they're written on wrong assumptions; why do we want to continue running code we know is untrustworthy? Let them crash, let them be fixed to make fewer assumptions. Automagically handling errors by propagating them in an inert form robs the developers and users of a chance to avoid a mistake. It's no better than 0.0.
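The sensor example can be sketched in a few lines. The readings are invented, and `checkedAverage` is a hypothetical helper standing in for "see a NaN, and crash"; the quoted position would stop at the plain `average`.

```d
import std.exception : enforce;
import std.math : isNaN;
import std.stdio : writeln;

// Plain arithmetic mean. A NaN anywhere in the input makes the result NaN.
double average(const double[] readings)
{
    double total = 0.0;
    foreach (r; readings)
        total += r;
    return total / readings.length;
}

// Hypothetical strict variant: refuse to return a poisoned result.
double checkedAverage(const double[] readings)
{
    immutable avg = average(readings);
    enforce(!isNaN(avg), "a sensor reported NaN");
    return avg;
}

void main()
{
    // A dead sensor reporting 0.0 silently drags the mean down.
    writeln(average([20.0, 22.0, 0.0, 21.0]));

    // The same dead sensor reporting NaN makes the result visibly bad.
    writeln(isNaN(average([20.0, 22.0, double.nan, 21.0])));

    // The strict variant turns the bad reading into an exception.
    try
    {
        checkedAverage([20.0, 22.0, double.nan, 21.0]);
    }
    catch (Exception e)
    {
        writeln("rejected: ", e.msg);
    }
}
```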

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Imperatorn
in reply to Walter Bright


On Friday, 5 November 2021 at 06:15:44 UTC, Walter Bright wrote:
> On 11/4/2021 9:11 PM, max haughton wrote:
>> On Friday, 5 November 2021 at 04:02:44 UTC, Walter Bright wrote:
>>> On 11/4/2021 7:41 PM, Mathias LANG wrote:
>>>> If you want to fix it, just deprecate the special case and tell people to use `foreach (dchar d; someString.byUTF!(dchar, No.useReplacementDchar))` and voilà.
>>>> And if they don't want it to throw, it's shorter:
>>>> `foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).
>>>
>>> People will always gravitate towards the smaller, simpler syntax. Like [] instead of std::vector<>.
>>
>> I have never observed this mistake in any C++ code,
>
> You've never observed people write:
>
>    int array[3];
>
> in C++ code?
>
>> unless you mean as a point of language design.
>
> D (still) has a rather verbose way of doing lambdas. People constantly complained that D didn't have lambdas. Until the => syntax was added, and suddenly lambdas in D became noticed and useful.
>
>> This decision should be guided by how current D programmers act rather than a hyperreal ideal of someone encountering the language.
>
> The only reason D's associative arrays continue to exist is because they are so darned syntactically convenient.
>
> I've seen over and over and over that syntactic convenience matters a lot.

The value of convenience should not be underestimated

It's what enables productivity, which in my opinion should be the main metric of success. Everything else is just "fluff".

In how many seconds can you transform idea A into program B?

That is how you measure success imo.

It doesn't matter if you have a cool or super interesting way of achieving something. If person X is still trying to figure out how to do some cool thing while person Y is already done and focusing on the next thing, person X has lost.

Person Y can always optimize and refactor later (before the deadline), but person X can't, because the deadline has already passed.

The value of convenience should not be underestimated

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Guillaume Piolat
in reply to Walter Bright


On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473
>
> I've tried to fix this before, but too many people objected.
>
> Are we fed up with this yet? I sure am.
>
> Who wants to take up this cudgel and fix the durned thing once and for all?
>
> (It's unclear if it would even break existing code.)

How about just assert(false)? It is @nogc and foreach over invalid utf-8 is a logic error (as you didn't sanitize).

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Ola Fosheim Grøstad
in reply to Guillaume Piolat


On Friday, 5 November 2021 at 09:34:31 UTC, Guillaume Piolat wrote:
> How about just assert(false)? It is @nogc and foreach over invalid utf-8 is a logic error (as you didn't sanitize).

It is even worse, it is a type error. If "utf-8" is to be a meaningful type you should be allowed to assume that it follows the spec.
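One way to picture "utf-8 as a meaningful type" is a wrapper that validates once at the boundary, so code downstream can assume well-formed input. This is only a sketch: `ValidUtf8` is a hypothetical type, not something in Phobos, and it relies on `std.utf.validate` throwing a `UTFException` on malformed input.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical wrapper: holding a ValidUtf8 means the text was checked.
struct ValidUtf8
{
    private string data;

    this(string s)
    {
        validate(s); // throws UTFException if s is not well-formed
        data = s;
    }

    string get() const
    {
        return data;
    }
}

void main()
{
    auto ok = ValidUtf8("héllo");

    // Decoding already-validated text cannot hit an invalid sequence.
    size_t codePoints;
    foreach (dchar d; ok.get)
        ++codePoints;
    writeln(codePoints);

    // Malformed input is rejected at construction, not deep inside a loop.
    immutable(ubyte)[] raw = [0x61, 0x80, 0x62];
    try
    {
        auto bad = ValidUtf8(cast(string) raw);
    }
    catch (UTFException e)
    {
        writeln("rejected at the boundary");
    }
}
```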

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Dukc
in reply to deadalnix


On Friday, 5 November 2021 at 03:02:07 UTC, deadalnix wrote:
> On Friday, 5 November 2021 at 02:38:51 UTC, Adam D Ruppe wrote:
>> This post isn't about autodecoding. With foreach, you opt into the decoding by specifically asking for it.
>
> Very clearly it is, because if you don't decode, then you don't do replacement chars or exceptions.

It's about decoding, but not autodecoding. Or at least not the same autodecoding we usually refer to. Autodecoding is the way Phobos v1 treats character arrays when they are used as ranges.

This is about an implicit conversion in the language itself.
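A small sketch of the two mechanisms side by side, assuming current Phobos behaviour. The helper names are invented for illustration; `front`, `ElementType`, `walkLength` and `byCodeUnit` are the usual Phobos names.

```d
import std.range : walkLength;
import std.range.primitives : ElementType, front;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Phobos autodecoding: treated as a range, a string yields dchar.
static assert(is(ElementType!string == dchar));

// Hypothetical helper: language-level decoding, requested by the loop
// variable's type.
size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar d; s)
        ++n;
    return n;
}

// Hypothetical helper: the default foreach element type is char, so
// nothing is decoded.
size_t countUnits(string s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

void main()
{
    string s = "é"; // one code point, two UTF-8 code units

    writeln(countUnits(s));            // foreach without decoding
    writeln(countDecoded(s));          // foreach with implicit decoding
    writeln(s.walkLength);             // range primitives, autodecoded
    writeln(s.byCodeUnit.walkLength);  // range primitives, opted out
    writeln(s.front == 'é');           // front decodes a whole code point
}
```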

November 05, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by Paolo Invernizzi
in reply to FeepingCreature


On Friday, 5 November 2021 at 06:30:02 UTC, FeepingCreature wrote:
> On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
>> [...]
>
> I think the program should crash in all these cases. The text editor should crash. The browser should crash. The analyzer should see a NaN, and crash.
>
> These programs are *wrong.* They thought they could only get Unicode and they've gotten non-Unicode. So we know they're written on wrong assumptions; why do we want to continue running code we know is untrustworthy? Let them crash, let them be fixed to make fewer assumptions. Automagically handling errors by propagating them in an inert form robs the developers and users of a chance to avoid a mistake. It's no better than 0.0.

+1000
