September 12, 2013 [Issue 11017] New: std.string/uni.toLower is very slow

http://d.puremagic.com/issues/show_bug.cgi?id=11017

Summary: std.string/uni.toLower is very slow
Product: D
Version: D2
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody@puremagic.com
ReportedBy: peter.alexander.au@gmail.com

--- Comment #0 from Peter Alexander <peter.alexander.au@gmail.com> 2013-09-12 10:52:33 PDT ---

char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.toLower;

This takes 4.3 seconds on my machine.

char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.map!toLower.to!string;

This only takes 1.1 seconds.

Looking at the code for std.uni.toLower, it appears the string is constructed using repeated ~=. It should use an Appender of some sort.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com

--- Comment #1 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-09-12 11:59:08 PDT ---

(In reply to comment #0)
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.toLower;
>
> This takes 4.3 seconds on my machine.
>
>
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.map!toLower.to!string;
>
> This only takes 1.1 seconds.
>

There are 2 things to consider here. First, the 2nd one is not correct in general (1 code point can map to many, e.g. German sharp S).

> Looking at the code for std.uni.toLower, it appears the string is constructed using repeated ~=. It should use an Appender of some sort.

This could indeed be fixed. I suspect that putting an optimistic reserve(original.length) there would work even better.

See also issue 10864: http://d.puremagic.com/issues/show_bug.cgi?id=10864

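A minimal sketch of the Appender approach suggested in the report, combined with the optimistic reserve from comment #1. The function name `lowerWithAppender` is hypothetical and this is not the Phobos implementation. It applies the single-code-point `std.uni.toLower(dchar)`, so it shares the one-to-one limitation noted above for mappings that expand to several code points.

```d
import std.array : appender;
import std.uni : toLower;

/// Hypothetical helper for illustration only (not part of Phobos).
/// Lowercases `s` one code point at a time, collecting output in an Appender
/// instead of growing a string with repeated ~=.
string lowerWithAppender(const(char)[] s)
{
    auto result = appender!string();

    // Optimistic guess: the output needs about as many UTF-8 code units
    // as the input. The Appender still grows if the guess is too small.
    result.reserve(s.length);

    // foreach with a dchar loop variable decodes the UTF-8 input
    // into code points.
    foreach (dchar c; s)
    {
        // Simple (one-to-one) case mapping; put() re-encodes to UTF-8.
        result.put(toLower(c));
    }

    return result.data;
}
```

For example, `lowerWithAppender("Hello, WORLD")` yields `"hello, world"`. Whether this is actually faster than the two snippets from comment #0 would have to be measured; no timing is claimed here.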
September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

--- Comment #2 from Peter Alexander <peter.alexander.au@gmail.com> 2013-09-12 12:45:45 PDT ---

(In reply to comment #1)
> There are 2 things to consider here. First, the 2nd one is not correct in general (1 code point can map to many, e.g. German sharp S).

Good point, although std.uni.toUpper doesn't handle it either :-)

assert("ß".toUpper == "ß"); // passes

September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-09-12 12:50:37 PDT ---

(In reply to comment #2)
> (In reply to comment #1)
> > There are 2 things to consider here. First, the 2nd one is not correct in general (1 code point can map to many, e.g. German sharp S).
>
> Good point, although std.uni.toUpper doesn't handle it either :-)
>
> assert("ß".toUpper == "ß"); // passes

toLower will do. Sharp S is capital ;)

September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

--- Comment #4 from Peter Alexander <peter.alexander.au@gmail.com> 2013-09-12 12:52:31 PDT ---

(In reply to comment #3)
> toLower will do. Sharp S is capital ;)

assert("ß".toLower == "ß");
assert("ß".toUpper == "ß");

Both pass.

September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

--- Comment #5 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-09-12 14:01:05 PDT ---

(In reply to comment #4)
> (In reply to comment #3)
> > toLower will do. Sharp S is capital ;)
>
> assert("ß".toLower == "ß");
> assert("ß".toUpper == "ß");
>
> Both pass.

Something wicked has happened.
I see that I've messed up toUpper in the table generator while introducing toTitleCase (which isn't even exposed yet!). toLower is fine; toUpper is apparently broken in half of the cases.
How I missed that, I've no idea ... gotta expand the test coverage around toLower/toUpper.

September 12, 2013 [Issue 11017] std.string/uni.toLower is very slow

Posted in reply to Peter Alexander

http://d.puremagic.com/issues/show_bug.cgi?id=11017

--- Comment #6 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-09-12 14:07:17 PDT ---

(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > toLower will do. Sharp S is capital ;)
> >
> > assert("ß".toLower == "ß");
> > assert("ß".toUpper == "ß");
> >
> > Both pass.
>
> Something wicked has happened.
> I see that I've messed up toUpper in the table generator while introducing
> toTitleCase (which isn't even exposed yet!). toLower is fine; toUpper is
> apparently broken in half of the cases.
> How I missed that, I've no idea ... gotta expand the test coverage around
> toLower/toUpper.

P.S. And there are both kinds of sharp s ... \u1E9E and \u00df

Copyright © 1999-2021 by the D Language Foundation