Thread overview
Re: AliasSeq seems to compile slightly faster with static foreach
Jan 05, 2018
Jonathan M Davis
Jan 08, 2018
Jonathan M Davis
Jan 08, 2018
Ali Çehreli
Jan 08, 2018
Jonathan M Davis
Jan 08, 2018
H. S. Teoh
Jan 08, 2018
Jonathan M Davis
January 05, 2018
On Friday, January 05, 2018 06:10:25 Jonathan M Davis via Digitalmars-d wrote:
> There was a recent PR for Phobos where Seb added static to a bunch of foreach's that used AliasSeq. It hadn't actually occurred to me that that was legal (I've basically just been using static foreach where foreach with AliasSeq doesn't work), but it is legal (which I suppose isn't surprising when you think about it; I just hadn't). However, that got me to wondering if such a change was purely aesthetic or whether it might actually have an impact on build times - particularly since running dub test for one of my recent projects keeps taking longer and longer. So, I added static to a bunch of foreach's over AliasSeqs in that project to see if it would have any effect. The result was that dub test went from about 16.5 seconds on my system to about 15.8 seconds - and that's just by adding static to the foreach's over AliasSeqs, not fundamentally changing what any of the code did. That's not a huge speed up, but it's definitely something and far more than I was expecting.
>
> Of course, you have to be careful with such a change, because static foreach doesn't introduce a new scope, and double braces are potentially required where they weren't before, but given that I'd very much like to streamline that test build, adding static to those foreach's was surprisingly worthwhile.
>
> Taking it a step further, I tried switching some of the static foreach's over to using array literals, since they held values rather than types, and that seemed to have minimal impact on the time to run dub test. However, by switching to using std.range.only, it suddenly was taking more like 11.8 seconds. So, with a few small changes, I cut the time to run dub test down by almost a third.

And actually, by tracking down every stray foreach over an AliasSeq which was used with values (and thus could be converted to using static foreach with only or only with iota and lockstep if indices were needed), and the build time is down to about

8.6 seconds.

So, it's now down to a bit over half of what it was originally by switching all of the foreach's over AliasSeq's of values to using static foreach over only's of those same values.

- Jonathan M Davis

January 07, 2018
On Friday, January 05, 2018 06:10:25 Jonathan M Davis via Digitalmars-d wrote:
> There was a recent PR for Phobos where Seb added static to a bunch of foreach's that used AliasSeq. It hadn't actually occurred to me that that was legal (I've basically just been using static foreach where foreach with AliasSeq doesn't work), but it is legal (which I suppose isn't surprising when you think about it; I just hadn't). However, that got me to wondering if such a change was purely aesthetic or whether it might actually have an impact on build times - particularly since running dub test for one of my recent projects keeps taking longer and longer. So, I added static to a bunch of foreach's over AliasSeqs in that project to see if it would have any effect. The result was that dub test went from about 16.5 seconds on my system to about 15.8 seconds - and that's just by adding static to the foreach's over AliasSeqs, not fundamentally changing what any of the code did. That's not a huge speed up, but it's definitely something and far more than I was expecting.
>
> Of course, you have to be careful with such a change, because static foreach doesn't introduce a new scope, and double braces are potentially required where they weren't before, but given that I'd very much like to streamline that test build, adding static to those foreach's was surprisingly worthwhile.
>
> Taking it a step further, I tried switching some of the static foreach's over to using array literals, since they held values rather than types, and that seemed to have minimal impact on the time to run dub test. However, by switching to using std.range.only, it suddenly was taking more like 11.8 seconds. So, with a few small changes, I cut the time to run dub test down by almost a third.

LOL. Okay, I feel like an idiot now. Switching to static foreach resulted in a slight speed-up, and then switching to only resulted in a very large speed-up. However, it turns out that the main reason that I got the speed-up that I did with only was because I screwed up.

In a number of places, I was using only by itself, but in a bunch of places, I was using only with lockstep and iota so that I could have indices, and I screwed up with the iota call. I misremembered how iota worked with a single argument; I thought that iota(0) created an infinite range, whereas it creates a zero-length range. Once I fixed it so that it was iota(size_t.max), it was actually slightly slower than using array literals. So, while I definitely got a speed-up by switching away from using normal foreach with AliasSeq, ultimately, the really large speed-up I got by switching to only was because I was accidentally not even compiling in large sections of test code. :|

So, that explains why only was behaving so much more efficiently than folks like Timon thought that it should.

So, based on what I'm seeing, I'd still recommend using static foreach with either AliasSeq or array literals over using normal foreach with an AliasSeq, but using only is pointless. I was just being dumb in how I used it, and the result was not only a useless speed-up in compilation, but once I wrote more tests using iota incorrectly (as opposed to the ones that were compiled in and working before my changes), I thought that I was properly testing code whereas I wasn't testing anything at all. :|

Oh well, live and learn, I guess.

- Jonathan M Davis

January 08, 2018
On Sun, Jan 07, 2018 at 07:20:44PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]
> LOL. Okay, I feel like an idiot now. Switching to static foreach resulted in a slight speed-up, and then switching to only resulted in a very large speed-up. However, it turns out that the main reason that I got the speed-up that I did with only was because I screwed up.
> 
> In a number of places, I was using only by itself, but in a bunch of places, I was using only with lockstep and iota so that I could have indices, and I screwed up with the iota call. I misremembered how iota worked with a single argument; I thought that iota(0) created an infinite range, whereas it creates a zero-length range. Once I fixed it so that it was iota(size_t.max), it was actually slightly slower than using array literals.  So, while I definitely got a speed-up by switching away from using normal foreach with AliasSeq, ultimately, the really large speed-up I got by switching to only was because I was accidentally not even compiling in large sections of test code. :|
[...]

This is why sometimes I deliberately modify a unittest to make it fail, just so I'm sure that the code is actually being run. :-P  There's been at least one incident where I wrote a whole bunch of unittests and thought my code was OK because there was no runtime error, only to discover that it was only because I forgot to specify -unittest on the compiler command line. :-/

Or in another incident, the unittest was inside a template, and I had protected it with a static if on a specific combination of template arguments so that it will only be instantiated once, but then it turned out that that one instantiation never actually happened.


T

-- 
Guns don't kill people. Bullets do.
January 08, 2018
On 01/07/2018 06:20 PM, Jonathan M Davis wrote:

> I misremembered how iota worked with a single
> argument; I thought that iota(0) created an infinite range, whereas it
> creates a zero-length range.

I told you so! :)

I remember pointing out that specific inconsistency of iota, I think in the learn forum but the simplicity of n.iota was too attractive for others to agree with me. (I think it was that mythical character, bearophile, who did not agree with me but I can't find that newsgroup discussion now.)

Ali

January 08, 2018
On Monday, January 08, 2018 07:37:01 H. S. Teoh via Digitalmars-d wrote:
> On Sun, Jan 07, 2018 at 07:20:44PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]
>
> > LOL. Okay, I feel like an idiot now. Switching to static foreach resulted in a slight speed-up, and then switching to only resulted in a very large speed-up. However, it turns out that the main reason that I got the speed-up that I did with only was because I screwed up.
> >
> > In a number of places, I was using only by itself, but in a bunch of places, I was using only with lockstep and iota so that I could have indices, and I screwed up with the iota call. I misremembered how iota worked with a single argument; I thought that iota(0) created an infinite range, whereas it creates a zero-length range. Once I fixed it so that it was iota(size_t.max), it was actually slightly slower than using array literals.  So, while I definitely got a speed-up by switching away from using normal foreach with AliasSeq, ultimately, the really large speed-up I got by switching to only was because I was accidentally not even compiling in large sections of test code. :|
>
> [...]
>
> This is why sometimes I deliberately modify a unittest to make it fail, just so I'm sure that the code is actually being run. :-P  There's been at least one incident where I wrote a whole bunch of unittests and thought my code was OK because there was no runtime error, only to discover that it was only because I forgot to specify -unittest on the compiler command line. :-/
>
> Or in another incident, the unittest was inside a template, and I had protected it with a static if on a specific combination of template arguments so that it will only be instantiated once, but then it turned out that that one instantiation never actually happened.

I'll do stuff like put writelns in unit tests to verify that they're running (the same for normal code too) when I actually suspect that something is wrong, but if I don't suspect anything, then I won't, because I don't have any reason to. In most cases, this sort of thing doesn't happen with a unit test, because they pretty much always have at least one failure when I'm writing them (especially when I'm being thorough with the tests), but here, I had refactored existing tests, so a failure wasn't expected, and it wasn't until I was writing new tests later that it became clear that something was off.

Either way, managing to not even compile in code when using a foreach that's intended to be static is a new one for me. That sort of thing is a lot harder to do with a foreach (static or otherwise) over an AliasSeq, though I suppose that something like std.meta.Filter would make it easy enough.

- Jonathan M Davis

January 08, 2018
On Monday, January 08, 2018 10:40:41 Ali Çehreli via Digitalmars-d wrote:
> On 01/07/2018 06:20 PM, Jonathan M Davis wrote:
>  > I misremembered how iota worked with a single
>  > argument; I thought that iota(0) created an infinite range, whereas it
>  > creates a zero-length range.
>
> I told you so! :)
>
> I remember pointing out that specific inconsistency of iota, I think in the learn forum but the simplicity of n.iota was too attractive for others to agree with me. (I think it was that mythical character, bearophile, who did not agree with me but I can't find that newsgroup discussion now.)

I can see arguments both for having the single argument be the first element and for it being the one before the last, but the ambiguity makes it easy enough to screw it up either way if you're not thinking the same way as whoever picked that behavior, and you either assume which it is or misremember.

I suppose that in practice, there really isn't much difference between
iota(0) and iota(int.max). The range of values isn't actually infinite, so
iota(0) is either going to have to be an infinte range that wraps around
when it hits int.max, or it would just be identical to what iota(int.max) is
now (with some logical trickery, iota(0) could be made to also give the
value int.max, whereas iota(int.max) couldn't, but that would complicate
iota a bit). The main advantage to iota(size_t.max) over iota(0) is that it
forces size_t without a cast, whereas iota(0) will just infer as int. Either
way, by only have one argument, the behavior is non-obvious, and if I
remember it correctly now, it will mostly be because I screwed up so badly
with it. Because the behavior is not obvious, I suspect that plenty of folks
have to look it up (especially if they don't use it often). And clearly, I
should have looked it up. We probably would have been better off not
allowing a single argument version, but what's done is done, I guess. At
least, I'm a lot less likely to misremember how it works now.

- Jonathan M Davis