July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

https://issues.dlang.org/show_bug.cgi?id=14519

Martin Nowak <code@dawg.eu> changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
           Summary |[Enh] foreach on strings   |Get rid of unicode
                   |should return              |validation in string
                   |replacementDchar rather    |processing
                   |than throwing              |
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #31 from Martin Nowak <code@dawg.eu> ---
(In reply to Martin Nowak from comment #30)
> Well, b/c they contain delimited binary and ASCII data, you'll have to find
> those delimiters, then validate and cast the ASCII part to a string, and can
> then use std.string functions.

BTW, this is what I already wrote in comment 23. Not sure why you only
partially quoted my answer to suggest a contradiction.
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #32 from Martin Nowak <code@dawg.eu> ---
Summary:

We should adopt a new model of unicode validation. The current one, where
every string processing function decodes unicode characters and performs
validation, causes too much overhead. A better alternative would be to
perform unicode validation once when reading raw data (ubyte[]) and then
assume any char[]/wchar[]/dchar[] is a valid unicode string. Invalid
encodings introduced by string processing algorithms are programming bugs
and thus do not warrant runtime checks in release builds.

Also see https://github.com/D-Programming-Language/druntime/pull/1279
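[The "validate once at the I/O boundary" model proposed in comment 32 can be sketched as follows. This is an illustrative example in Python, not D or Phobos code; Python's str type happens to already work this way: raw bytes are validated exactly once when decoded, and every later string operation assumes validity. The function names `read_text` and `process` are invented for the sketch.]

```python
def read_text(raw: bytes) -> str:
    # Single validation point: invalid UTF-8 is rejected here, at the
    # boundary where raw data (the ubyte[] of comment 32) enters the program.
    return raw.decode("utf-8")  # raises UnicodeDecodeError on bad input

def process(s: str) -> str:
    # Downstream "string processing": no decoding errors are possible here,
    # so this function needs no encoding-related error handling at all
    # (the analogue of nothrow string functions in the proposed model).
    return s.upper()

valid = read_text("héllo".encode("utf-8"))
print(process(valid))  # HÉLLO

try:
    read_text(b"\xff\xfe garbage")  # invalid UTF-8: caught once, up front
except UnicodeDecodeError:
    print("rejected at the boundary")
```

Under this model, a corrupt char[] appearing past the boundary is a program bug, which is why comment 32 argues it warrants an assertion in debug builds rather than a runtime check in release builds.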
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #33 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
Removing autodecoding is good, but this issue is about making autodecode
@nothrow @nogc.
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #34 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Martin Nowak from comment #31)
> BTW, this is what I already wrote in comment 23. Not sure why you only
> partially quoted my answer to suggest a contradiction.

Err, well, to be fair, you did not state this clearly in comment 23, which is
why I asked for a clarification. I was not trying to maliciously nitpick your
words, just trying to understand your point.
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #35 from Jonathan M Davis <issues.dlang@jmdavisProg.com> ---
(In reply to Martin Nowak from comment #32)
> Summary:
>
> We should adopt a new model of unicode validation.
> The current one where every string processing function decodes unicode
> characters and performs validation causes too much overhead.
> A better alternative would be to perform unicode validation once when
> reading raw data (ubyte[]) and then assume any char[]/wchar[]/dchar[] is a
> valid unicode string.
> Invalid encodings introduced by string processing algorithms are programming
> bugs and thus do not warrant runtime checks in release builds.

Exactly.
July 17, 2015 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #36 from Vladimir Panteleev <thecybershadow@gmail.com> ---
Question: is there any overhead in actually verifying the validity of UTF-8
streams, or is all overhead related to error handling (i.e. the inability to
be nothrow)?
August 19, 2015 [Issue 14519] Get rid of unicode validation in string processing

Vladimir Panteleev <thecybershadow@gmail.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
          See Also |        |https://issues.dlang.org/show_bug.cgi?id=14919
May 18, 2016 [Issue 14519] Get rid of unicode validation in string processing

Jack Stouffer <jack@jackstouffer.com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                CC |        |jack@jackstouffer.com

--- Comment #37 from Jack Stouffer <jack@jackstouffer.com> ---
This entire discussion is moot unless you get Andrei on board with a breaking
change to a very fundamental part of the language.
May 20, 2016 [Issue 14519] Get rid of unicode validation in string processing

--- Comment #38 from Martin Nowak <code@dawg.eu> ---
(In reply to Vladimir Panteleev from comment #36)
> Question, is there any overhead in actually verifying the validity of UTF-8
> streams, or is all overhead related to error handling (i.e. inability to be
> nothrow)?

I think it's fairly measurable, b/c you need to add lots of additional checks
and branches (though highly predictable ones). While my initial decode
implementation
https://github.com/MartinNowak/phobos/blob/1b0edb728c/std/utf.d#L577-L651
was transmogrified into 200 lines in the meantime
https://github.com/dlang/phobos/blob/acafd848d8/std/utf.d#L1167-L1369,
you can still use it to benchmark validation. I did run a lot of benchmarks
when introducing that function, and the code path for decoding just remains
slow, even with the throwing code path moved out of normal control flow.
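[To see where the extra checks and branches mentioned in comment 38 come from, here is a rough sketch, in Python rather than D, and not the Phobos implementation. The checked decoder verifies lead and continuation bytes and rejects truncated sequences (overlong and surrogate checks omitted for brevity), while the trusting decoder assumes already-validated input and carries no error paths at all. Function names are invented for the sketch.]

```python
def decode_checked(buf: bytes, i: int) -> tuple[int, int]:
    """Decode one code point at buf[i]; raise ValueError on invalid UTF-8."""
    b0 = buf[i]
    if b0 < 0x80:                          # ASCII fast path
        return b0, i + 1
    if b0 < 0xC2 or b0 > 0xF4:             # validation branch: lead byte
        raise ValueError("invalid lead byte")
    n = 1 + (b0 >= 0xE0) + (b0 >= 0xF0)    # number of continuation bytes
    if i + n >= len(buf):                  # validation branch: truncation
        raise ValueError("truncated sequence")
    cp = b0 & (0x3F >> n)                  # payload bits of the lead byte
    for b in buf[i + 1 : i + 1 + n]:
        if b & 0xC0 != 0x80:               # validation branch: per byte
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + 1 + n

def decode_trusted(buf: bytes, i: int) -> tuple[int, int]:
    """Same decode, assuming buf is already-validated UTF-8: no error paths."""
    b0 = buf[i]
    if b0 < 0x80:
        return b0, i + 1
    n = 1 + (b0 >= 0xE0) + (b0 >= 0xF0)
    cp = b0 & (0x3F >> n)
    for b in buf[i + 1 : i + 1 + n]:
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + 1 + n

buf = "héllo €".encode("utf-8")
print(hex(decode_checked(buf, 1)[0]))  # 0xe9 (é)
print(hex(decode_trusted(buf, 7)[0]))  # 0x20ac (€)
```

The branches are well predicted on valid input, as comment 38 notes, but they still appear on every multi-byte code point; moving them to a single up-front validation pass is exactly the trade comment 32 proposes.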
Copyright © 1999-2021 by the D Language Foundation