Thread overview
foreach - premature optimization vs cultivating good habits

  Laeeth Isharc (Jan 30, 2015)
  BBaz (Jan 30, 2015)
  Adam D. Ruppe (Jan 30, 2015)
  Laeeth Isharc (Jan 30, 2015)
  Laeeth Isharc (Feb 01, 2015)
  John Colvin (Jan 30, 2015)
Hi.

The standard advice is not to worry about memory usage and execution speed until profiling shows you where the problem is, and I respect Knuth greatly as a thinker.

Still, one may learn from others' experience and cultivate good habits early.  To say that one should not prematurely optimize is not to say that one should not try to avoid cases that tend to be really bad, and I would rather learn from others what these are than learn only the hard way.

For the time being I am still at early development stage and have not yet tested things with the larger data sets I anticipate eventually using.  It's cheap to make slightly different design decisions early, but much more painful further down the line, particularly given my context.

As I understand it, foreach allocates when a simple C-style for using an array index would not.  I would like to learn more about when this turns particularly expensive, and perhaps I could put this up on the wiki if people think it is a good idea.

What exactly does it allocate, and how often, and how large is this in relation to the size of the underlying data (structs/classes/ranges)?  Are there any cache effects to consider?  Happy to go to the source code if you can give me some pointers.

Thanks in advance for any thoughts.



Laeeth.
January 30, 2015
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
> [...]
> As I understand it, foreach allocates when a simple C-style for using an array index would not.  I would like to learn more about when this turns particularly expensive, and perhaps I could put this up on the wiki if people think it is a good idea.
> [...]

foreach() is good for linked lists because it hands the loop body to the container as a delegate (via opApply), so you avoid re-walking the items from 0 to i on each iteration.

for() is good for arrays because indexing is just pointer arithmetic.

But the D way for foreach'es is to use ranges: empty, front, popFront etc., although I often feel too lazy to use them...
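For illustration, a hand-rolled range might look like this (the `Iota` name and the helper `sum` are invented for the example); foreach drives it entirely through calls on a stack value, with no allocation:

```d
// A minimal input range: yields the integers [lo, hi).
// foreach consumes it through empty/front/popFront -- no heap allocation.
struct Iota
{
    int lo, hi;
    @property bool empty() const { return lo >= hi; }
    @property int front() const { return lo; }
    void popFront() { ++lo; }
}

int sum(Iota r)
{
    int s;
    foreach (x; r)   // the compiler rewrites this into empty/front/popFront calls
        s += x;
    return s;
}

void main()
{
    assert(sum(Iota(0, 5)) == 10);  // 0+1+2+3+4
}
```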

You'll probably get more technical answers...

January 30, 2015
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
> As I understand it, foreach allocates when a simple C-style for using an array index would not.

foreach is just syntax sugar over a for loop. If there are any allocations, it is because your code had some; it isn't inherent to the loop. The doc definition even lists the translation of foreach to for in the case of ranges explicitly:

http://dlang.org/statement.html#ForeachStatement
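For instance, the spec's rewrite means these two functions are equivalent (sketched here with std.range.iota; the `__r` name mirrors the doc's translation):

```d
import std.range : iota;

int sumForeach()
{
    int s;
    foreach (e; iota(0, 4))  // sugar...
        s += e;
    return s;
}

int sumLowered()
{
    // ...for roughly this, per the spec's ForeachStatement rewrite:
    int s;
    for (auto __r = iota(0, 4); !__r.empty; __r.popFront())
    {
        auto e = __r.front;
        s += e;
    }
    return s;
}

void main()
{
    assert(sumForeach() == sumLowered());  // both 6; no hidden allocation in either
}
```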


The most likely allocation would be for a user-defined opApply delegate, and you can prevent that by declaring it opApply(scope your_delegate); the scope keyword prevents any closure allocation.
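For instance, a sketch of that scope trick (the `List` type and its fields are invented for illustration):

```d
struct List
{
    int[3] data = [1, 2, 3];

    // `scope` promises the delegate does not escape opApply, so the
    // compiler need not heap-allocate a closure for the loop body.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }
}

void main()
{
    int total;          // captured by the implicit loop-body delegate
    List l;
    foreach (ref x; l)  // calls l.opApply with that delegate
        total += x;
    assert(total == 6); // 1+2+3, with no GC allocation thanks to `scope`
}
```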

January 30, 2015
On Friday, 30 January 2015 at 12:55:20 UTC, Adam D. Ruppe wrote:
> [...]
> foreach is just syntax sugar over a for loop. If there are any allocations, it is because your code had some; it isn't inherent to the loop.
> [...]

Thanks, Adam.  That's what I had thought (your first paragraph), but something Ola said on a different thread confused me and made me think I didn't understand it, and I wanted to pin it down.

The second paragraph is very helpful - appreciate it.


Laeeth.
January 30, 2015
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
> [...]
> As I understand it, foreach allocates when a simple C-style for using an array index would not.
> [...]

I think you're thinking of Java's foreach syntax, which IIRC does cause allocations.
January 31, 2015
On Friday, 30 January 2015 at 14:41:11 UTC, Laeeth Isharc wrote:
> Thanks, Adam.  That's what I had thought (your first paragraph), but something Ola on a different thread confused me and made me think I didn't understand it, and I wanted to pin it down.

There are always significant optimization effects in long-running loops:
- SIMD
- cache locality / prefetching

For the former (SIMD) you need to make sure that good code is generated, whether by hand, by using vectorized libraries, or by auto-vectorization.
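As one way to give the compiler vectorizable code, D's array-wise operations express the loop at a level the backend can auto-vectorize; a sketch, not a guarantee, since how well it vectorizes depends on the compiler and target:

```d
void axpy(double[] y, double a, const double[] x)
{
    // Array-op syntax compiles to a tight element-wise loop that the
    // backend can turn into SIMD instructions when alignment and
    // length allow.
    y[] += a * x[];
}

void main()
{
    double[8] x = 2.0, y = 1.0;
    axpy(y[], 3.0, x[]);
    assert(y[0] == 7.0 && y[7] == 7.0);  // each element: 1 + 3*2
}
```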

For the latter (cache) you need to make sure that the prefetcher can predict your access pattern (or is told to prefetch explicitly), and also that the working set is small enough to stay in the faster cache levels.
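A classic illustration of the cache point, sketched in D: summing a matrix row by row versus column by column does identical arithmetic, but only the first walks memory sequentially (the function names are invented for the example):

```d
// Same arithmetic, different memory behavior: row-by-row traversal walks
// memory sequentially (easy for the prefetcher); column-by-column strides
// a whole row per access and evicts cache lines long before reuse.
double sumRowMajor(const double[][] m)
{
    double s = 0;
    foreach (row; m)
        foreach (x; row)
            s += x;
    return s;
}

double sumColMajor(const double[][] m)
{
    double s = 0;
    foreach (j; 0 .. m[0].length)
        foreach (i; 0 .. m.length)
            s += m[i][j];
    return s;
}

void main()
{
    auto m = new double[][](256, 256);
    foreach (row; m) row[] = 1.0;
    // Identical results; on matrices too big for cache, the row-major
    // version is the one that stays near theoretical throughput.
    assert(sumRowMajor(m) == 65536.0);
    assert(sumRowMajor(m) == sumColMajor(m));
}
```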

If you want good performance you cannot ignore either of these, and you have to design the data structures and algorithms for them.  Prefetching has to happen maybe 100 instructions before the actual load from memory, and AVX requires suitable byte alignment and a data layout that fits the algorithm.  On the next-gen Xeon (Skylake) I think the alignment might go up to 64 bytes, and you get 512-bit-wide registers (so you can do 8 64-bit floating-point operations in parallel per core).  The difference between issuing 1-4 ops and issuing 8-16 per time unit is noticeable...

And of course, the closer your code is to the theoretical throughput of the CPU, the more critical it becomes not to wait on memory loads.

This is also a moving target...
February 01, 2015
Thank you Adam, BBaz and Ola for the helpful thoughts.  I dumped them into a wiki page off the sandbox, but it needs editing and refining.