[Issue 4483] foreach over string or wstring, where element type not specified, does not support unicode
July 16, 2014
https://issues.dlang.org/show_bug.cgi?id=4483

--- Comment #14 from github-bugzilla@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/eca3a10dd232541036c0bbc00cb7a4236d0f2276
Fix Issue 4483 - foreach over string or wstring, where element type not
specified

std.process does not support Unicode and assume it's iterating over ASCII.

https://github.com/D-Programming-Language/phobos/commit/e3cdb418ea175ae8c4020973be6587cfd66779cc
Merge pull request #1873 from lionello/bug4483

Preparation for issue 4483, specifying foreach char iteration type

--
July 16, 2014
https://issues.dlang.org/show_bug.cgi?id=4483

--- Comment #15 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
(In reply to Martin Nowak from comment #12)
> The problem with UTF is that you need to handle it correctly from A to B.

The problem is that dchar makes it harder to handle UTF correctly: the result is more subtly incorrect, and hence more difficult to fix. Without automatic decoding, people are more likely to hit failures during development and fix them before release.
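
For illustration, here is a minimal sketch of why per-code-point iteration can still be subtly wrong. It is written in Python rather than D, and the helper names are made up for this example. In D terms the three counts would roughly correspond to iterating by char, by dchar, and by grapheme. The last helper only approximates user-perceived characters by skipping combining marks; it is not full grapheme segmentation.

```python
import unicodedata

def utf8_code_units(s: str) -> int:
    # Number of UTF-8 code units (bytes), i.e. what undecoded iteration sees.
    return len(s.encode("utf-8"))

def code_points(s: str) -> int:
    # Number of code points, i.e. what decoded iteration sees.
    return len(s)

def approx_user_chars(s: str) -> int:
    # Rough count of user-perceived characters: code points that are not
    # combining marks. An approximation, assumed good enough for this example.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

s = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, rendered as one character

print(utf8_code_units(s), code_points(s), approx_user_chars(s))
```

Treating UTF-8 code units as characters tends to break on the first non-ASCII input, whereas treating code points as characters only breaks on inputs such as combining sequences, which may not show up until much later.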

--
July 16, 2014
https://issues.dlang.org/show_bug.cgi?id=4483

Sobirari Muhomori <dfj1esp02@sneakemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://forum.dlang.org/thread/mailman.266.1319139465.24802.digitalmars-d@puremagic.com

--- Comment #16 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
Added link to discussion.

--
August 05, 2014
https://issues.dlang.org/show_bug.cgi?id=4483

Denis Shelomovskij <verylonglogin.reg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |verylonglogin.reg@gmail.com
         Resolution|FIXED                       |---

--- Comment #17 from Denis Shelomovskij <verylonglogin.reg@gmail.com> ---
GitHub commits closed this by mistake. Reopened.

--
April 06, 2016
https://issues.dlang.org/show_bug.cgi?id=4483

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com

--- Comment #18 from Dmitry Olshansky <dmitry.olsh@gmail.com> ---
(In reply to Walter Bright from comment #11)
> > You fail to recognize that it's broken from the beginning.
> 
> and I know that std.regex suffered serious
> slowdowns because of it.

This turned out to be factually wrong, as I found out after spending a year and a half constructing a non-decoding version of std.regex for no significant gain ;) A brief exercise with a profiler shows decoding at ~0.5% in a recent version; as Martin points out, a single 100% predictable comparison is not a problem.

The only cases where not decoding is faster are bulk-mode operations that can
take advantage of SIMD or auto-vectorization, such as:
a) Skipping comments (i.e. looping until '*' is hit, then checking for '/')
b) Comparing strings or searching for substring.
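
A minimal sketch of case (a), in Python rather than D and with a hypothetical helper name. It scans the raw UTF-8 bytes for the closing `*/` without decoding anything. This is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the ASCII bytes `*` and `/` cannot occur inside an encoded non-ASCII character. `bytes.find` merely stands in here for a vectorized bulk scan.

```python
def skip_block_comment(buf: bytes, start: int) -> int:
    # Return the index just past the closing "*/" of a block comment whose
    # body begins at `start`, or -1 if the comment is unterminated.
    # No UTF-8 decoding is performed; the scan works on raw code units.
    end = buf.find(b"*/", start)
    return -1 if end == -1 else end + 2

src = "/* naïve комментарий */ int x;".encode("utf-8")
after = skip_block_comment(src, 2)  # 2 == len(b"/*")

# The returned index falls on a character boundary, so the tail decodes cleanly.
print(src[after:].decode("utf-8"))
```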

--
April 06, 2016
https://issues.dlang.org/show_bug.cgi?id=4483

--- Comment #19 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
(In reply to Dmitry Olshansky from comment #18)
> b) Comparing strings or searching for substring.

Doesn't regex search for substring?

--
April 06, 2016
https://issues.dlang.org/show_bug.cgi?id=4483

--- Comment #20 from Dmitry Olshansky <dmitry.olsh@gmail.com> ---
(In reply to Sobirari Muhomori from comment #19)
> (In reply to Dmitry Olshansky from comment #18)
> > b) Comparing strings or searching for substring.
> 
> Doesn't regex search for substring?

No, it examines one code point at a time. There is a special prefix matcher that does indeed avoid decoding, but it is very limited in what it can do, so it's used only to find the possible start of a match.
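
As a rough sketch of that division of labour (Python, not D, and not how std.regex is actually implemented; the pattern and all names below are invented for the example): a byte-level search for a literal prefix proposes candidate start positions without decoding, and a separate matcher then walks the decoded text one code point at a time from each candidate.

```python
from typing import Iterator, Optional

def find_candidates(haystack: bytes, literal_prefix: bytes) -> Iterator[int]:
    # Non-decoding stage: yield byte offsets where the literal prefix occurs.
    pos = haystack.find(literal_prefix)
    while pos != -1:
        yield pos
        pos = haystack.find(literal_prefix, pos + 1)

def count_leading_digits(text: str) -> int:
    # Decoding stage: examine one code point at a time.
    n = 0
    for ch in text:
        if not ch.isdigit():
            break
        n += 1
    return n

def first_match(haystack: bytes, prefix: bytes = b"id=") -> Optional[str]:
    # Hypothetical pattern: the literal prefix followed by one or more digits.
    for pos in find_candidates(haystack, prefix):
        tail = haystack[pos + len(prefix):].decode("utf-8")
        n = count_leading_digits(tail)
        if n > 0:
            return prefix.decode("utf-8") + tail[:n]
    return None

data = "véhicule id=x, id=42 fin".encode("utf-8")
print(first_match(data))
```

The prefix stage can only answer "a match might start here", which is why the full matcher still has to run from every candidate.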

--
December 17, 2022
https://issues.dlang.org/show_bug.cgi?id=4483

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P4

--