November 05, 2021
On Friday, 5 November 2021 at 06:30:02 UTC, FeepingCreature wrote:
> I think the program should crash in all these cases. The text editor should crash. The browser should crash. The analyzer should see a NaN, and crash.

No, NaN is completely different. You have two types of NaN: one is for signalling that data is missing in a dataset (received from the outside); the other is to convey that a computation failed (often caused by roundoff errors).

To remove NaN from floating point is unworkable in the general case.
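
To make the distinction concrete, here is a minimal D sketch (mine, not from the thread) showing both uses: a NaN stored deliberately to mark a missing value, and a NaN produced by a failed computation. Both propagate quietly through arithmetic instead of aborting the program.

```d
import std.math : isNaN, sqrt;
import std.stdio : writeln;

void main()
{
    // NaN as a deliberate "no data" marker, e.g. a missing sensor reading.
    double missingReading = double.nan;

    // NaN as the result of a failed computation.
    double failed = sqrt(-1.0);

    // Both propagate through arithmetic instead of aborting the program.
    double sum = missingReading + 1.0;

    writeln(missingReading.isNaN); // true
    writeln(failed.isNaN);         // true
    writeln(sum.isNaN);            // true: NaN propagates
}
```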

November 05, 2021
On Friday, 5 November 2021 at 06:30:02 UTC, FeepingCreature wrote:
> On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
>> [...]
>
> I think the program should crash in all these cases. The text editor should crash. The browser should crash. The analyzer should see a NaN, and crash.
>
> These programs are *wrong.* They thought they could only get Unicode and they've gotten non-Unicode. So we know they're written on wrong assumptions; why do we want to continue running code we know is untrustworthy? Let them crash, let them be fixed to make fewer assumptions. Automagically handling errors by propagating them in an inert form robs the developers and users of a chance to avoid a mistake. It's no better than 0.0.

It isn't always that simple, e.g. when working on medical devices, crashing isn't an option when it comes to how we're going to deal with bad data.
November 05, 2021
On Friday, 5 November 2021 at 09:57:45 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 November 2021 at 09:34:31 UTC, Guillaume Piolat wrote:
>> How about just assert(false)? It is @nogc and foreach over invalid utf-8 is a logic error (as you didn't sanitize).
>
> It is even worse, it is a type error. If "utf-8" is to be a meaningful type you should be allowed to assume that it follows the spec.

Well, you only know that it is meant to be UTF-8 in the context of the auto-decoding foreach (which must still exist). `string`s in actual programs may contain the contents of binary files, or strings in other codepage encodings.
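
A small sketch of that point (mine, assuming the `\xFF` escape is accepted as a raw byte in a string literal): a `string` will happily hold bytes that are not UTF-8, and they are only judged as UTF-8 once something decodes or validates them.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

void main()
{
    // A `string` is just immutable(char)[]; nothing stops it from holding
    // bytes that are not valid UTF-8 (binary data, latin-1, ...).
    string s = "abc\xFF";

    // Iterating by code unit never decodes, so it never complains.
    foreach (char c; s)
        writeln(cast(ubyte) c);

    // The bytes only get interpreted as UTF-8 when something decodes
    // or validates them.
    try
    {
        validate(s);
    }
    catch (UTFException e)
    {
        writeln("not valid UTF-8: ", e.msg);
    }
}
```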

November 05, 2021
On Friday, 5 November 2021 at 10:09:40 UTC, norm wrote:
> It isn't always that simple, e.g. when working on medical devices, crashing isn't an option when it comes to how we're going to deal with bad data.

Oh no, let's not go there again. See this 44-page discussion:

[Program logic bugs vs input/environmental errors](https://forum.dlang.org/post/m07gf1$18jl$1@digitalmars.com)

November 05, 2021
On Friday, 5 November 2021 at 10:13:13 UTC, Guillaume Piolat wrote:
> Well, you only know that it is meant to be UTF-8 in the context of the auto-decoding foreach (which must still exist). `string`s in actual programs may contain the contents of binary files, or strings in other codepage encodings.

D needs to rethink strings. Newbies going for "scripty" programming really need an encapsulated strongly typed string type, accessed only through functions that do-the-right-thing.
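
As a hedged sketch of what such an encapsulated type could look like (the names `ValidString` and `fromBytes` are made up; this is not an existing Phobos type):

```d
import std.utf : validate;

/// Hypothetical wrapper: a string whose contents are guaranteed to be
/// valid UTF-8, because the only way to build one is through a factory
/// that validates the bytes first.
struct ValidString
{
    private string data;

    // No raw construction from outside the module.
    private this(string s) { data = s; }

    /// Validates on the way in; throws std.utf.UTFException on bad input.
    static ValidString fromBytes(string raw)
    {
        validate(raw);
        return ValidString(raw);
    }

    /// Read-only view of the (known-good) bytes.
    string toString() const { return data; }
}
```

The design choice here is simply that raw bytes can only enter through a function that establishes the UTF-8 invariant up front.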

I think the @safe/@system distinction would be more useful if @safe was for those who want a more "scripty" programming style and @system was for those who want a more "low level" programming style.

On a related note, I also think it would be useful to have something stronger than @safe, like a @non-trojan marker for libraries, which basically says that it is impossible for that library to do evil, and has that statically checked by the compiler. Then you could import libraries without caring about bad code. One issue I have with packages in smaller languages is that you don't have enough eyeballs on them; it is too easy for "evil" code to slip through (intentionally or not).


November 05, 2021
On Friday, 5 November 2021 at 10:09:40 UTC, norm wrote:
> It isn't always that simple, e.g. when working on medical devices, crashing isn't an option when it comes to how we're going to deal with bad data.

You can always validate the UTF beforehand if you don't want to crash.
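
For example (my sketch, not from the post), validate up front and pick a non-crashing fallback when the check fails; `process` is a hypothetical function name:

```d
import std.stdio : writeln;
import std.utf : UTFException, byDchar, validate;

/// One possible non-crashing policy: check the input up front and fall back
/// to replacement-character decoding when it isn't valid UTF-8.
void process(string input)
{
    try
    {
        validate(input);            // throws UTFException on malformed UTF-8
        foreach (dchar c; input)    // safe to decode now
            writeln(c);
    }
    catch (UTFException)
    {
        // byDchar does not throw: malformed sequences come out as the
        // replacement character (U+FFFD) instead.
        foreach (dchar c; input.byDchar)
            writeln(c);
    }
}
```
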
November 05, 2021
On Friday, 5 November 2021 at 10:27:05 UTC, Dennis wrote:
> On Friday, 5 November 2021 at 10:09:40 UTC, norm wrote:
>> It isn't always that simple, e.g. when working on medical devices, crashing isn't an option when it comes to how we're going to deal with bad data.
>
> Oh no, let's not go there again. See this 44-page discussion:
>
> [Program logic bugs vs input/environmental errors](https://forum.dlang.org/post/m07gf1$18jl$1@digitalmars.com)

Ehehe, the good old times :-P
November 05, 2021
On Friday, 5 November 2021 at 10:08:30 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 November 2021 at 06:30:02 UTC, FeepingCreature wrote:
>> I think the program should crash in all these cases. The text editor should crash. The browser should crash. The analyzer should see a NaN, and crash.
>
> No, NaN is completely different. You have two types of NaN: one is for signalling that data is missing in a dataset (received from the outside); the other is to convey that a computation failed (often caused by roundoff errors).
>
> To remove NaN from floating point is unworkable in the general case.

When I have to do numeric work and suspect NaNs in play, I like to `feenableexcept(FE_INVALID)`. Then every time a NaN arises in a computation, I get a nice SIGFPE.
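
In D a similar effect can be had through Phobos' `FloatingPointControl`; a minimal sketch (whether the fault is actually delivered as SIGFPE depends on the platform and hardware support):

```d
import std.math : FloatingPointControl, sqrt;
import std.stdio : writeln;

void main()
{
    // Roughly the feenableexcept(FE_INVALID) idea, via Phobos: unmask the
    // "invalid operation" trap so that a NaN-producing operation faults
    // instead of quietly returning NaN.
    FloatingPointControl fpctrl;
    fpctrl.enableExceptions(FloatingPointControl.invalidException);

    double x = -1.0;
    writeln(sqrt(x)); // with the trap enabled this faults instead of printing nan
}
```
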
November 05, 2021
On Friday, 5 November 2021 at 11:44:42 UTC, FeepingCreature wrote:
> When I have to do numeric work and suspect NaNs in play, I like to `feenableexcept(FE_INVALID)`. Then every time a NaN arises in a computation, I get a nice SIGFPE.

Yes, and the IEEE spec suggests that one should be able to choose whether to get exceptions or compute with NaNs, based on the nature of the application/computation. Regardless, as long as hardware follows IEEE and supports using NaN in calculations, you are better off playing along with the IEEE standard (for a modern system-level language that means you should have easy access to both approaches).

November 05, 2021
On Friday, 5 November 2021 at 11:54:21 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 5 November 2021 at 11:44:42 UTC, FeepingCreature wrote:
>> When I have to do numeric work and suspect NaNs in play, I like to `feenableexcept(FE_INVALID)`. Then every time a NaN arises in a computation, I get a nice SIGFPE.
>
> Yes, and the IEEE spec suggests that one should be able to choose whether to get exceptions or compute with NaNs, based on the nature of the application/computation. Regardless, as long as hardware follows IEEE and supports using NaN in calculations, you are better off playing along with the IEEE standard (for a modern system-level language that means you should have easy access to both approaches).

To put some meat on this: the ideal is that you can have two implementations of the same computation, one fast and one robust. So ideally you should be able to do the computation with NaNs in expressions where the NaNs can disappear, and use exceptions where they cannot disappear. If an exception occurs, you fall back to the slower robust implementation. In reality you have to weigh in the performance characteristics of the hardware, so this is very much system-level programming and not only a choice that can be made at the language level.

For instance, in raytracing I would want NaNs. Then I can decide, based on the neighbouring pixels, whether to compute the pixel again using a slower method or simply fill it in with the average of its neighbours (if they all have roughly the same colour).
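
As a hedged illustration of that fast-path/fallback pattern (all names here are made up, not from any real renderer):

```d
import std.math : isNaN;

// Sketch of the "fast path with NaN, robust fallback" pattern: fastShade
// and robustShade stand in for two implementations of the same pixel
// computation.
double shadePixel(double delegate() fastShade, double delegate() robustShade)
{
    immutable fast = fastShade();   // cheap version, may produce NaN
    if (!fast.isNaN)
        return fast;

    // A NaN escaped the fast path: redo the work with the slower,
    // numerically robust version (or, in a renderer, fall back to
    // averaging the neighbouring pixels instead).
    return robustShade();
}
```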