July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

Martin Nowak <code@dawg.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[Enh] foreach on strings    |Get rid of unicode
                   |should return               |validation in string
                   |replacementDchar rather     |processing
                   |than throwing               |

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #31 from Martin Nowak <code@dawg.eu> ---
(In reply to Martin Nowak from comment #30)
> Well, b/c they contain delimited binary and ASCII data, you'll have to find those delimiters, then validate and cast the ASCII part to a string, and can then use std.string functions.

BTW, this is what I already wrote in comment 23. Not sure why you only partially quoted my answer to suggest a contradiction.
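The workflow described in the quoted answer — find the delimiters in mixed binary/ASCII data, validate the ASCII part, and only then treat it as a string — could be sketched roughly like this. This is a hypothetical C illustration (the thread is about D, but the idea is language-neutral); `ascii_header` and the `'\0'` delimiter are assumptions for the example, not anything from Phobos:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte in buf[0..len) is 7-bit ASCII. */
static int is_ascii(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return 0;
    return 1;
}

/* Extracts the ASCII header that precedes the first '\0' delimiter in a
 * mixed binary/ASCII buffer. On success writes the header length to
 * *out_len and returns a pointer that is now safe to use as text;
 * returns NULL if there is no delimiter or validation fails. */
static const char *ascii_header(const unsigned char *buf, size_t len,
                                size_t *out_len) {
    const unsigned char *delim = memchr(buf, '\0', len);
    if (!delim)
        return NULL;                  /* no delimiter found */
    size_t hdr_len = (size_t)(delim - buf);
    if (!is_ascii(buf, hdr_len))
        return NULL;                  /* validation failed */
    *out_len = hdr_len;
    return (const char *)buf;         /* validated: usable as a string */
}
```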

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #32 from Martin Nowak <code@dawg.eu> ---
Summary:

We should adopt a new model of Unicode validation.
The current one, where every string-processing function decodes Unicode
characters and performs validation, causes too much overhead.
A better alternative would be to perform Unicode validation once when reading
raw data (ubyte[]) and then assume that any char[]/wchar[]/dchar[] is a valid
Unicode string.
Invalid encodings introduced by string-processing algorithms are programming
bugs and thus do not warrant runtime checks in release builds.

Also see

https://github.com/D-Programming-Language/druntime/pull/1279
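The proposed model — pay for validation once at the input boundary, then let all downstream processing assume well-formed data — might look like this in outline. This is a C sketch written for illustration (not actual druntime/Phobos code); the checks follow the well-formedness rules of RFC 3629 (rejecting overlong encodings, surrogates, and values above U+10FFFF):

```c
#include <stddef.h>

/* Validate-once boundary check: returns 1 if s[0..len) is well-formed
 * UTF-8, else 0. Run this when reading raw ubyte[] input; after it
 * passes, string processing may assume validity and skip these checks. */
static int utf8_validate(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) { i += 1; continue; }   /* ASCII fast path */
        size_t n;
        if      ((c & 0xE0) == 0xC0) n = 2;
        else if ((c & 0xF0) == 0xE0) n = 3;
        else if ((c & 0xF8) == 0xF0) n = 4;
        else return 0;                        /* invalid lead byte */
        if (i + n > len) return 0;            /* truncated sequence */
        unsigned cp = c & (0x7F >> n);
        for (size_t k = 1; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* bad continuation */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (n == 2 && cp < 0x80)    return 0; /* overlong */
        if (n == 3 && cp < 0x800)   return 0; /* overlong */
        if (n == 4 && cp < 0x10000) return 0; /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate */
        if (cp > 0x10FFFF) return 0;          /* out of range */
        i += n;
    }
    return 1;
}
```

Under this model, an invalid sequence produced later by a string-processing algorithm is a program bug (caught by assertions in debug builds), not a runtime condition every function must branch on.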

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #33 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
Removing autodecoding is good, but this issue is about making autodecoding nothrow and @nogc.

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #34 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Martin Nowak from comment #31)
> BTW, this is what I already wrote in comment 23. Not sure why you only partially quoted my answer to suggest a contradiction.

Err, well, to be fair, you did not state this clearly in comment 23, which is why I asked for clarification. I was not trying to maliciously nitpick your words; I was just trying to understand your point.

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #35 from Jonathan M Davis <issues.dlang@jmdavisProg.com> ---
(In reply to Martin Nowak from comment #32)
> Summary:
> 
> We should adopt a new model of Unicode validation.
> The current one, where every string-processing function decodes Unicode
> characters and performs validation, causes too much overhead.
> A better alternative would be to perform Unicode validation once when
> reading raw data (ubyte[]) and then assume that any char[]/wchar[]/dchar[]
> is a valid Unicode string.
> Invalid encodings introduced by string-processing algorithms are programming
> bugs and thus do not warrant runtime checks in release builds.

Exactly.

--
July 17, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #36 from Vladimir Panteleev <thecybershadow@gmail.com> ---
Question: is there any overhead in actually verifying the validity of UTF-8 streams, or is all the overhead related to error handling (i.e. the inability to be nothrow)?

--
August 19, 2015
https://issues.dlang.org/show_bug.cgi?id=14519

Vladimir Panteleev <thecybershadow@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://issues.dlang.org/sh
                   |                            |ow_bug.cgi?id=14919

--
May 18, 2016
https://issues.dlang.org/show_bug.cgi?id=14519

Jack Stouffer <jack@jackstouffer.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jack@jackstouffer.com

--- Comment #37 from Jack Stouffer <jack@jackstouffer.com> ---
This entire discussion is moot unless you get Andrei on board with a breaking change to a very fundamental part of the language.

--
May 20, 2016
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #38 from Martin Nowak <code@dawg.eu> ---
(In reply to Vladimir Panteleev from comment #36)
> Question: is there any overhead in actually verifying the validity of UTF-8 streams, or is all the overhead related to error handling (i.e. the inability to be nothrow)?

I think it's fairly measurable b/c you need to add lots of additional checks
and branches (though highly predictable ones).
While my initial decode implementation
https://github.com/MartinNowak/phobos/blob/1b0edb728c/std/utf.d#L577-L651 was
transmogrified into 200 lines in the meantime
https://github.com/dlang/phobos/blob/acafd848d8/std/utf.d#L1167-L1369, you can
still use it to benchmark validation.
I ran a lot of benchmarks when introducing that function, and the code path
for decoding remains slow, even with the throwing code path moved out of
normal control flow.
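To illustrate the kind of branches being counted here: a decoder that is allowed to trust its input only needs to dispatch on sequence length, whereas a checked decoder must additionally branch on truncation, continuation bytes, overlong encodings, surrogates, and out-of-range values for every character. A hypothetical C sketch of the trusted path (not the Phobos implementation):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point assuming well-formed UTF-8. Note there are no
 * validation branches at all -- the fast path that a validate-once
 * model enables. *adv receives the number of bytes consumed. */
static uint32_t decode_trusted(const unsigned char *s, size_t *adv) {
    unsigned char c = s[0];
    if (c < 0x80) {                      /* 1-byte (ASCII) */
        *adv = 1;
        return c;
    }
    if ((c & 0xE0) == 0xC0) {            /* 2-byte sequence */
        *adv = 2;
        return ((uint32_t)(c & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((c & 0xF0) == 0xE0) {            /* 3-byte sequence */
        *adv = 3;
        return ((uint32_t)(c & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
    }
    *adv = 4;                            /* 4-byte sequence */
    return ((uint32_t)(c & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         | (s[3] & 0x3F);
}
```

Even with highly predictable branches, the checked variant does strictly more work per character, which is consistent with the benchmark observation above.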

--