[Issue 17861] UTF Decode fails with exception (page 2)

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #11 from Etienne <etcimon@gmail.com> ---
If the current idea is to not fix the bugs due to possible breakage, why have a bug tracker for druntime in the first place?

Also, what's the point of having unit tests if you can't rely on them?

--

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #12 from Jonathan M Davis <issues.dlang@jmdavisProg.com> ---
(In reply to Etienne from comment #11)
> If the current idea is to not fix the bugs due to possible breakage, why have a bug tracker for druntime in the first place?

The current behavior is not a bug. The code is functioning exactly as designed. That design is arguably a bad design, and many of us would like to change it, but changing it would break existing code, so it is unlikely that it will be changed. There simply isn't a good deprecation path that would allow us to go from one behavior to the other - certainly no one has come up with one thus far.

> Also, what's the point of having unit tests if you can't rely on them?

What unit test are you referring to? Nothing about the current behavior of foreach and decoding code points should make it so that unit tests are unreliable. foreach is completely consistent in what it does. It's just that it's designed to do something that we wouldn't design it to do if we were doing things from scratch. You weren't previously aware that foreach threw when decoding invalid UTF. Now, you are, and you can write your code accordingly.

The information about foreach throwing when decoding invalid UTF should be in the spec, but I don't know if it is or not. The spec doesn't always have the information that it should, but this is how foreach was designed and has worked ever since it was made so that it could decode code points. And it's the intended behavior until such time as we can figure out how to move to using the replacement character without breaking code in the process, which unfortunately, may very well be never.

Right now, literally, our best option that would involve making the change would be to make the change and warn in the changelog that that's what we're doing, and anyone reading it would then have the opportunity to scour their code to see if they needed to change it as a result. The breakage would be silent and easy to miss even if in many cases, it wouldn't matter. And as such, thus far, that solution has been deemed unacceptable.

So, if you know of a way to make it so that foreach can be changed to use the replacement character without silently breaking code, then great. We'd love to hear it. As it stands, this is one of those design decisions that we regret in retrospect but seem to be stuck with.

--
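The behavior described in comment #12 can be reproduced with a minimal sketch like the one below. The helper names `countCodePoints` and `foreachThrows` and the sample bytes are made up for illustration and do not come from the thread. The sketch catches the base `Exception` class because the thread does not name the exact exception type that foreach throws.

```d
import std.stdio : writeln;

/// Counts code points by letting foreach decode the string as dchar.
/// Hypothetical helper, not taken from the thread.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) // decoding happens here and throws on invalid UTF-8
        ++n;
    return n;
}

/// Returns true if iterating `s` by dchar throws an exception.
bool foreachThrows(string s)
{
    try
    {
        countCodePoints(s);
        return false;
    }
    catch (Exception e)
    {
        return true;
    }
}

void main()
{
    // 'a', 'b', then a lone continuation byte (0x80), which is not valid UTF-8
    immutable ubyte[] raw = [0x61, 0x62, 0x80];
    string bad = cast(string) raw;

    writeln("valid input throws:   ", foreachThrows("abc"));
    writeln("invalid input throws: ", foreachThrows(bad));
}
```

Code that receives strings from untrusted sources (files, sockets) would need either a try/catch like this around the loop or one of the non-throwing decoding routes discussed later in the thread.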

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #13 from Etienne <etcimon@gmail.com> ---
You have to choose whether it's a bug or a feature. I think everyone is ready to live with that, but if you live up to it and consider it a feature it'll have to be documented. Just a 1 liner somewhere saying "Foreach (string) can throw unicode errors!"

That'll be a good solution to this issue, because right now everyone is forced to learn it the hard way.

This being said, I don't see Google Chrome crashing every time it sees an invalid code point. I'm not sure anyone would think about catching that on the first try if they were to do an Ajax call. I'm also pretty sure they'd be happy with the code path where it doesn't throw when the invalid code point comes up. If you know of anyone doing software specifically for Unicode validation, maybe they'd need to be warned but that's about it for me.

So yeah, just wave it as a feature or squash the bug, but don't stay in between forever.

--

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #14 from Jon Degenhardt <jrdemail2000-dlang@yahoo.com> ---
Changing the default behavior for the individual functions would cause backward compatibility issues. Any thoughts on having run-time selectable behavior that would override the defaults? The default behavior could be left unchanged.

The two issues that come to mind:
- Functions currently nothrow could lose that status if throw is an option.
- Performance: Compile-time choices are faster than run-time.

The advantage of a run-time selectable behavior is that it would support the need many programs have for an application specific behavior. There is no single default appropriate for all cases.

--

October 03, 2017

Posted by Jonathan M Davis

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #15 from Jonathan M Davis <issues.dlang@jmdavisProg.com> ---
(In reply to Jon Degenhardt from comment #14)
> Changing the default behavior for the individual functions would cause backward compatibility issues. Any thoughts on having run-time selectable behavior that would override the defaults? The default behavior could be left unchanged.
> 
> The two issues that come to mind:
> - Functions currently nothrow could lose that status if throw is an option.
> - Performance: Compile-time choices are faster than run-time.
> 
> The advantage of a run-time selectable behavior is that it would support the need many programs have for an application specific behavior. There is no single default appropriate for all cases.

In general, Walter is against having flags that determine the behavior of the language, and that's essentially what you're suggesting, even if it's set at runtime rather than at compile time. The reality of the matter is that as much as the current behavior sucks, it's trivial to work around it by calling decode yourself. So, I really don't see any reason to make it configurable. That would just make it so that you don't know what the code is designed to do when you look at it.
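The workaround of calling decode yourself might look like the sketch below. It assumes the `decode` overload in `std.utf` that takes a `useReplacementDchar` flag; the helper name `decodeLossy` and the sample bytes are hypothetical and only serve the illustration.

```d
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Decodes `s` one code point at a time, yielding U+FFFD for invalid
/// sequences instead of throwing. Hypothetical helper name.
dchar[] decodeLossy(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // decode advances `i` past the code units it consumed
        result ~= decode!(Yes.useReplacementDchar)(s, i);
    }
    return result;
}

void main()
{
    import std.stdio : writeln;

    // 'a', 'b', then a lone continuation byte (0x80), which is not valid UTF-8
    immutable ubyte[] raw = [0x61, 0x62, 0x80];
    string bad = cast(string) raw;

    auto decoded = decodeLossy(bad);
    writeln("code points decoded: ", decoded.length);
    writeln("last one is U+FFFD:  ", decoded[$ - 1] == replacementDchar);
}
```

The explicit index loop replaces `foreach (dchar c; s)`, so the choice between throwing and substituting is visible at the call site rather than hidden behind a global setting.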

I think that it's far better to just be clear on how UTF decoding works in D than to try and make anything at the language level configurable. The standard library already provides the tools necessary to allow the programmer to choose how they want to handle invalid UTF, even if the defaults aren't exactly ideal.
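As a rough illustration of the standard library tools referred to here, the sketch below shows two options from `std.utf`: rejecting invalid input up front with `validate`, or iterating with `byDchar`, which substitutes the replacement character. The helper names `isValidUtf` and `toCodePoints` are assumptions made for this example, not names used in the thread.

```d
import std.algorithm.searching : canFind;
import std.array : array;
import std.utf : byDchar, replacementDchar, validate, UTFException;

/// Option 1: check the input up front and report whether it is valid UTF.
/// Hypothetical helper name.
bool isValidUtf(string s)
{
    try
    {
        validate(s); // throws UTFException if `s` is not valid UTF
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Option 2: decode without throwing; byDchar yields U+FFFD for
/// invalid sequences. Hypothetical helper name.
dchar[] toCodePoints(string s)
{
    return s.byDchar.array;
}

void main()
{
    import std.stdio : writeln;

    // 'a', 'b', then a lone continuation byte (0x80), which is not valid UTF-8
    immutable ubyte[] raw = [0x61, 0x62, 0x80];
    string bad = cast(string) raw;

    writeln("valid: ", isValidUtf(bad));
    writeln("contains U+FFFD after lossy decoding: ",
            toCodePoints(bad).canFind(replacementDchar));
}
```

Which option fits depends on the application: validating suits code that must refuse malformed input, while lossy decoding suits code that should keep going, as in the browser example raised earlier in the thread.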

(In reply to Etienne from comment #13)
> You have to choose whether it's a bug or a feature. I think everyone is ready to live with that, but if you live up to it and consider it a feature it'll have to be documented. Just a 1 liner somewhere saying "Foreach (string) can throw unicode errors!"
> 
> That'll be a good solution to this issue, because right now everyone is forced to learn it the hard way.
> 
> This being said, I don't see Google Chrome crashing every time it sees an invalid code point. I'm not sure anyone would think about catching that on the first try if they were to do an Ajax call. I'm also pretty sure they'd be happy with the code path where it doesn't throw when the invalid code point comes up. If you know of anyone doing software specifically for Unicode validation, maybe they'd need to be warned but that's about it for me.
> 
> So yeah, just wave it as a feature or squash the bug, but don't stay in between forever.

If the spec isn't clear about the fact that decoding invalid UTF with foreach will throw an exception, then the spec needs to be updated accordingly, but the current behavior is very much as designed and not a bug. I have no idea if the spec says anything about invalid UTF or not. I'd have to comb through it to know for sure. But the spec is often missing details that it should have, and sometimes, when it does say something, it's concise enough in what it says that it's easily missed. It wouldn't surprise me at all if it were stated somewhere in there, and you just missed it, and it wouldn't surprise me if it's not there. Regardless, I completely agree that the spec should be clear on the matter.

--

https://issues.dlang.org/show_bug.cgi?id=17861

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P4

--

https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #16 from dlangBugzillaToGithub <robert.schadek@posteo.de> ---
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/dmd/issues/17351

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB

--
