Thread overview | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
September 30, 2013 [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Attachments:
| The runtime support for string foreach with decoding has some serious performance issues. It's currently implemented by creating a delegate for the foreach body and calling _aApply*. I recently submitted this pull to vibe.d https://github.com/rejectedsoftware/vibe.d/pull/327. Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto c = tmp.front; /*CODE*/ } is about 3 times faster. I'd like to avoid such hidden performance pitfalls. Any ideas how to transition to templated library code? For dchar iteration the compiler could simply prefer the range interface, but that wouldn't work for wchar (or char with w/dstrings). Is it something that should only be added to phobos, e.g. foreach (c; s.byChar!dchar())? Or should this be a combination of compiler and library support, i.e. when an explicit char type is given the compiler instantiates a template with that char type. |
October 01, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Martin Nowak Attachments:
| On 30 sep 2013, at 20:47, Martin Nowak <dawg@dawgfoto.de> wrote: > The runtime support for string foreach with decoding has some serious performance issues. > It's currently implemented by creating a delegate for the foreach body and calling _aApply*. > I recently submitted this pull to vibe.d https://github.com/rejectedsoftware/vibe.d/pull/327. > Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto c = tmp.front; /*CODE*/ } > is about 3 times faster. > I'd like to avoid such hidden performance pitfalls. Any ideas how to transition to templated library code? > For dchar iteration the compiler could simply prefer the range interface, but that wouldn't work for wchar (or char with w/dstrings). > Is it something that should only be added to phobos, e.g. foreach (c; s.byChar!dchar())? > Or should this be a combination of compiler and library support, i.e. when an explicit char type is given the compiler > instantiates a template with that char type. I don't see why "byChar" should be added. It's better to fix the current implementation. If I recall correctly, also "foreach(c ; s)" is the same as foreach over dchar. Which is an even more hidden pitfall. -- /Jacob Carlborg |
October 01, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Martin Nowak | On Monday, September 30, 2013 20:47:52 Martin Nowak wrote: > The runtime support for string foreach with decoding has some serious > performance issues. > It's currently implemented by creating a delegate for the foreach body > and calling _aApply*. > I recently submitted this pull to vibe.d > https://github.com/rejectedsoftware/vibe.d/pull/327. > Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto > c = tmp.front; /*CODE*/ } > is about 3 times faster. > I'd like to avoid such hidden performance pitfalls. Any ideas how to > transition to templated library code? > For dchar iteration the compiler could simply prefer the range > interface, but that wouldn't work for wchar (or char with w/dstrings). > Is it something that should only be added to phobos, e.g. foreach (c; > s.byChar!dchar())? > Or should this be a combination of compiler and library support, i.e. > when an explicit char type is given the compiler > instantiates a template with that char type. The range API won't even work on arrays unless std.array has been imported, and in the case of strings, it depends on std.utf. So, using the range interface underneat the hood for foreach(dchar c; str) would require Phobos. I'd argue for simply fixing how foreach for strings works so that it doesn't use opApply. Worst case, some druntime-specific functions could be written and used to do it, though that would probably require changes in the compiler. But if you want to use a range of some kind, just create one for each character type, and the fact that Phobos treats strings as ranges of dchar should be irrelevant. It does that by having traits like hasLength and isNarrowString treat them that way. There's no reason why you can't have a range of char or wchar if you want to, and if druntime is wrapping strings in ranges for foreach, then it can trivially create a range type which treats strings as ranges of whatever character type you want just so long as front and popFront do the appropriate conversion. Probably another thing to consider in all this is how to deal with foreach_reverse, but that can probably be dealt with as well. I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits. - Jonathan M Davis _______________________________________________ D-runtime mailing list D-runtime@puremagic.com http://lists.puremagic.com/mailman/listinfo/d-runtime |
October 01, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jacob Carlborg | On Tuesday, October 01, 2013 19:47:41 Jacob Carlborg wrote: > "foreach(c ; s)" is the same as > foreach over dchar. Which is an even more hidden pitfall. foreach(c; s) iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type. - Jonathan M Davis _______________________________________________ D-runtime mailing list D-runtime@puremagic.com http://lists.puremagic.com/mailman/listinfo/d-runtime |
October 02, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis Attachments:
|
On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> foreach(c; s)
>
> iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type.
I thought this was changed when it was decided that Phobos should treat strings as ranges of dchars.
--
/Jacob Carlborg
|
October 02, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis Attachments:
| On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> The range API won't even work on arrays unless std.array has been imported, and in the case of strings, it depends on std.utf. So, using the range interface underneat the hood for
>
> foreach(dchar c; str)
>
> would require Phobos. I'd argue for simply fixing how foreach for strings works so that it doesn't use opApply. Worst case, some druntime-specific functions could be written and used to do it, though that would probably require changes in the compiler. But if you want to use a range of some kind, just create one for each character type, and the fact that Phobos treats strings as ranges of dchar should be irrelevant. It does that by having traits like hasLength and isNarrowString treat them that way. There's no reason why you can't have a range of char or wchar if you want to, and if druntime is wrapping strings in ranges for foreach, then it can trivially create a range type which treats strings as ranges of whatever character type you want just so long as front and popFront do the appropriate conversion. Probably another thing to consider in all this is how to deal with foreach_reverse, but that can probably be dealt with as well.
>
> I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits.
Can't we move some of the functionality to druntime? It doesn't need to be duplicated. Move what we can to druntime and publicly import that in Phobos.
--
/Jacob Carlborg
|
October 02, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jacob Carlborg | On Wednesday, October 02, 2013 07:21:28 Jacob Carlborg wrote: > On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote: > > foreach(c; s) > > > > iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type. > > I thought this was changed when it was decided that Phobos should treat strings as ranges of dchars. No. Only Phobos treats strings as ranges of dchar. The language doesn't care one whit about ranges save for foreach, and foreach treats all strings the same as any other arrays. It doesn't use the range API for them. So, it's arguably good practice to always give the element type when using foreach on a string. - Jonathan M Davis _______________________________________________ D-runtime mailing list D-runtime@puremagic.com http://lists.puremagic.com/mailman/listinfo/d-runtime |
October 02, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jacob Carlborg | On Wednesday, October 02, 2013 07:29:07 Jacob Carlborg wrote: > Can't we move some of the functionality to druntime? It doesn't need to be duplicated. Move what we can to druntime and publicly import that in Phobos. That's not impossible, but it's not exactly straightforward either. Not only would it require duplicating a number of traits from std.traits and std.range, but the code in std.utf is actually written to operate on ranges of code units so that code that needs to operate at the code unit level can use them to iterate through and decode ranges of code units, whereas all druntime cares about is strings. IIRC, semi-recently monarch dodra ported the current std.utf stuff to druntime, but that involved altering it so that it was string-specific. It's one of those cases where it would be nice to avoid code duplication but where druntime is missing stuff that it uses. It would be much easier if portions of std.traits and std.range were in druntime, but I'm not sure that we really want to do that. We do keep having to duplicate some of that stuff in druntime though, which is definitely annoying. So, I don't know what the best approach is. It will be very easy to start pulling in too much Phobos stuff into druntime if we're not careful. - Jonathan M Davis _______________________________________________ D-runtime mailing list D-runtime@puremagic.com http://lists.puremagic.com/mailman/listinfo/d-runtime |
October 02, 2013 Re: [D-runtime] foreach (dchar c; s) is too slow | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Oct 1, 2013, at 11:13 AM, Jonathan M Davis <jmdavisProg@gmx.com> wrote: > > I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits. Unciode is such an intrinsic part of the language that I think a strong argument could be made for putting basic UTF handling into core. I've considered it in the past, but never spent the time to sort out a minimal set of functionality and appropriate API. I agree that duplicating these function inside rt.util is undesirable. _______________________________________________ D-runtime mailing list D-runtime@puremagic.com http://lists.puremagic.com/mailman/listinfo/d-runtime |
Copyright © 1999-2021 by the D Language Foundation