Thread overview
[D-runtime] foreach (dchar c; s) is too slow
Sep 30, 2013
Martin Nowak
Oct 01, 2013
Jacob Carlborg
Oct 01, 2013
Jonathan M Davis
Oct 02, 2013
Jacob Carlborg
Oct 02, 2013
Jonathan M Davis
Oct 01, 2013
Jonathan M Davis
Oct 02, 2013
Jacob Carlborg
Oct 02, 2013
Jonathan M Davis
Oct 02, 2013
Sean Kelly
September 30, 2013
The runtime support for string foreach with decoding has some serious
performance issues.
It's currently implemented by creating a delegate for the foreach body
and calling _aApply*.
I recently submitted this pull to vibe.d
https://github.com/rejectedsoftware/vibe.d/pull/327.
Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto
c = tmp.front; /*CODE*/ }
is about 3 times faster.
I'd like to avoid such hidden performance pitfalls. Any ideas how to
transition to templated library code?
For dchar iteration the compiler could simply prefer the range
interface, but that wouldn't work for wchar (or char with w/dstrings).
Is it something that should only be added to phobos, e.g. foreach (c;
s.byChar!dchar())?
Or should this be a combination of compiler and library support, i.e.
when an explicit char type is given the compiler
instantiates a template with that char type.


October 01, 2013
On 30 sep 2013, at 20:47, Martin Nowak <dawg@dawgfoto.de> wrote:

> The runtime support for string foreach with decoding has some serious performance issues.
> It's currently implemented by creating a delegate for the foreach body and calling _aApply*.
> I recently submitted this pull to vibe.d https://github.com/rejectedsoftware/vibe.d/pull/327.
> Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto c = tmp.front; /*CODE*/ }
> is about 3 times faster.
> I'd like to avoid such hidden performance pitfalls. Any ideas how to transition to templated library code?
> For dchar iteration the compiler could simply prefer the range interface, but that wouldn't work for wchar (or char with w/dstrings).
> Is it something that should only be added to phobos, e.g. foreach (c; s.byChar!dchar())?
> Or should this be a combination of compiler and library support, i.e. when an explicit char type is given the compiler
> instantiates a template with that char type.


I don't see why "byChar" should be added. It's better to fix the current implementation. If I recall correctly, also "foreach(c ; s)" is the same as foreach over dchar. Which is an even more hidden pitfall.

-- 
/Jacob Carlborg



October 01, 2013
On Monday, September 30, 2013 20:47:52 Martin Nowak wrote:
> The runtime support for string foreach with decoding has some serious
> performance issues.
> It's currently implemented by creating a delegate for the foreach body
> and calling _aApply*.
> I recently submitted this pull to vibe.d
> https://github.com/rejectedsoftware/vibe.d/pull/327.
> Using the explicit for (auto tmp = s; !tmp.empty; tmp.popFront()) { auto
> c = tmp.front; /*CODE*/ }
> is about 3 times faster.
> I'd like to avoid such hidden performance pitfalls. Any ideas how to
> transition to templated library code?
> For dchar iteration the compiler could simply prefer the range
> interface, but that wouldn't work for wchar (or char with w/dstrings).
> Is it something that should only be added to phobos, e.g. foreach (c;
> s.byChar!dchar())?
> Or should this be a combination of compiler and library support, i.e.
> when an explicit char type is given the compiler
> instantiates a template with that char type.

The range API won't even work on arrays unless std.array has been imported, and in the case of strings, it depends on std.utf. So, using the range interface underneat the hood for

foreach(dchar c; str)

would require Phobos. I'd argue for simply fixing how foreach for strings works so that it doesn't use opApply. Worst case, some druntime-specific functions could be written and used to do it, though that would probably require changes in the compiler. But if you want to use a range of some kind, just create one for each character type, and the fact that Phobos treats strings as ranges of dchar should be irrelevant. It does that by having traits like hasLength and isNarrowString treat them that way. There's no reason why you can't have a range of char or wchar if you want to, and if druntime is wrapping strings in ranges for foreach, then it can trivially create a range type which treats strings as ranges of whatever character type you want just so long as front and popFront do the appropriate conversion. Probably another thing to consider in all this is how to deal with foreach_reverse, but that can probably be dealt with as well.

I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits.

- Jonathan M Davis
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime

October 01, 2013
On Tuesday, October 01, 2013 19:47:41 Jacob Carlborg wrote:
> "foreach(c ; s)" is the same as
> foreach over dchar. Which is an even more hidden pitfall.

foreach(c; s)

iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type.

- Jonathan M Davis
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime

October 02, 2013

On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> foreach(c; s)
>
> iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type.

I thought this was changed when it was decided that Phobos should treat strings as ranges of dchars.

--
/Jacob Carlborg

October 02, 2013
On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> The range API won't even work on arrays unless std.array has been imported, and in the case of strings, it depends on std.utf. So, using the range interface underneat the hood for
>
> foreach(dchar c; str)
>
> would require Phobos. I'd argue for simply fixing how foreach for strings works so that it doesn't use opApply. Worst case, some druntime-specific functions could be written and used to do it, though that would probably require changes in the compiler. But if you want to use a range of some kind, just create one for each character type, and the fact that Phobos treats strings as ranges of dchar should be irrelevant. It does that by having traits like hasLength and isNarrowString treat them that way. There's no reason why you can't have a range of char or wchar if you want to, and if druntime is wrapping strings in ranges for foreach, then it can trivially create a range type which treats strings as ranges of whatever character type you want just so long as front and popFront do the appropriate conversion. Probably another thing to consider in all this is how to deal with foreach_reverse, but that can probably be dealt with as well.
>
> I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits.

Can't we move some of the functionality to druntime? It doesn't need to be duplicated. Move what we can to druntime and publicly import that in Phobos.

--
/Jacob Carlborg

October 02, 2013
On Wednesday, October 02, 2013 07:21:28 Jacob Carlborg wrote:
> On Oct 01, 2013, at 08:13 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > foreach(c; s)
> > 
> > iterates over the code unit type of the string, which is only dchar if it's an array of dchar. You only get decoding and conversion to other character types if you explicitly give a different character type.
> 
> I thought this was changed when it was decided that Phobos should treat strings as ranges of dchars.

No. Only Phobos treats strings as ranges of dchar. The language doesn't care one whit about ranges save for foreach, and foreach treats all strings the same as any other arrays. It doesn't use the range API for them. So, it's arguably good practice to always give the element type when using foreach on a string.

- Jonathan M Davis
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime

October 02, 2013
On Wednesday, October 02, 2013 07:29:07 Jacob Carlborg wrote:
> Can't we move some of the functionality to druntime? It doesn't need to be duplicated. Move what we can to druntime and publicly import that in Phobos.

That's not impossible, but it's not exactly straightforward either. Not only would it require duplicating a number of traits from std.traits and std.range, but the code in std.utf is actually written to operate on ranges of code units so that code that needs to operate at the code unit level can use them to iterate through and decode ranges of code units, whereas all druntime cares about is strings. IIRC, semi-recently monarch dodra ported the current std.utf stuff to druntime, but that involved altering it so that it was string-specific.

It's one of those cases where it would be nice to avoid code duplication but where druntime is missing stuff that it uses. It would be much easier if portions of std.traits and std.range were in druntime, but I'm not sure that we really want to do that. We do keep having to duplicate some of that stuff in druntime though, which is definitely annoying. So, I don't know what the best approach is. It will be very easy to start pulling in too much Phobos stuff into druntime if we're not careful.

- Jonathan M Davis
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime

October 02, 2013
On Oct 1, 2013, at 11:13 AM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> 
> I don't particularly like the fact that we're forced to duplicate std.utf stuff in druntime in order to handle foreach, but I don't know of any way around that short of moving std.utf (or some portion of it) into druntime. It's pretty much the same problem that we have with the fact that std.traits isn't available in druntime, except that the implementation for decode and stride are a bit more complicated than most traits.

Unciode is such an intrinsic part of the language that I think a strong argument could be made for putting basic UTF handling into core.  I've considered it in the past, but never spent the time to sort out a minimal set of functionality and appropriate API.  I agree that duplicating these function inside rt.util is undesirable.
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime