foreach thoughts (page 4)

January 14, 2014

Re: foreach thoughts

Posted by Dicebot
in reply to Dicebot

Permalink

Dicebot

Posted in reply to Dicebot

Permalink

Some experimental data.

---- test program ----

auto input()
{
    int[] arr;

    foreach (i; 0..6)
    {
        import std.stdio;
        int x;
        readf(" %d ", &x);
        arr ~= x;
    }

    return arr;
}

int main()
{
    import std.algorithm;

    auto arr = input();
    int sum;

    // version algo
    foreach (elem; arr.filter!( x => x % 2 ))
    {
        sum += elem;
    }

    // version raw
    foreach (elem; arr)
    {
        if (elem % 2)
            sum += elem;
    }

    return sum;
}

------

Disassemblies for latest LDC (ldmd2 -O -release -inline):

$ cat test-ldc-algo.dump
0000000000402eb0 <_Dmain>:
  402eb0:	50                   	push   %rax
  402eb1:	e8 fa fe ff ff       	callq  402db0 <_D4test5inputFZAi>
  402eb6:	48 89 c1             	mov    %rax,%rcx
  402eb9:	31 c0                	xor    %eax,%eax
  402ebb:	48 85 c9             	test   %rcx,%rcx
  402ebe:	74 4d                	je     402f0d <_Dmain+0x5d>
  402ec0:	f6 02 01             	testb  $0x1,(%rdx)
  402ec3:	75 0b                	jne    402ed0 <_Dmain+0x20>
  402ec5:	48 83 c2 04          	add    $0x4,%rdx
  402ec9:	48 ff c9             	dec    %rcx
  402ecc:	75 f2                	jne    402ec0 <_Dmain+0x10>
  402ece:	eb 3d                	jmp    402f0d <_Dmain+0x5d>
  402ed0:	31 c0                	xor    %eax,%eax
  402ed2:	eb 34                	jmp    402f08 <_Dmain+0x58>
  402ed4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
  402edb:	00 00 00 00 00
  402ee0:	03 02                	add    (%rdx),%eax
  402ee2:	66 66 66 66 66 2e 0f 	data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  402ee9:	1f 84 00 00 00 00 00
  402ef0:	48 85 c9             	test   %rcx,%rcx
  402ef3:	74 1a                	je     402f0f <_Dmain+0x5f>
  402ef5:	48 83 f9 01          	cmp    $0x1,%rcx
  402ef9:	74 12                	je     402f0d <_Dmain+0x5d>
  402efb:	48 ff c9             	dec    %rcx
  402efe:	f6 42 04 01          	testb  $0x1,0x4(%rdx)
  402f02:	48 8d 52 04          	lea    0x4(%rdx),%rdx
  402f06:	74 e8                	je     402ef0 <_Dmain+0x40>
  402f08:	48 85 c9             	test   %rcx,%rcx
  402f0b:	75 d3                	jne    402ee0 <_Dmain+0x30>
  402f0d:	5a                   	pop    %rdx
  402f0e:	c3                   	retq
  402f0f:	bf e0 0a 68 00       	mov    $0x680ae0,%edi
  402f14:	be 9e 01 00 00       	mov    $0x19e,%esi
  402f19:	e8 c2 c2 02 00       	callq  42f1e0 <_d_array_bounds>
  402f1e:	66 90                	xchg   %ax,%ax

$ cat test-ldc-raw.dump
0000000000402eb0 <_Dmain>:
  402eb0:	50                   	push   %rax
  402eb1:	e8 fa fe ff ff       	callq  402db0 <_D4test5inputFZAi>
  402eb6:	48 89 c1             	mov    %rax,%rcx
  402eb9:	31 c0                	xor    %eax,%eax
  402ebb:	48 85 c9             	test   %rcx,%rcx
  402ebe:	74 17                	je     402ed7 <_Dmain+0x27>
  402ec0:	8b 3a                	mov    (%rdx),%edi
  402ec2:	89 fe                	mov    %edi,%esi
  402ec4:	c1 e6 1f             	shl    $0x1f,%esi
  402ec7:	c1 fe 1f             	sar    $0x1f,%esi
  402eca:	21 fe                	and    %edi,%esi
  402ecc:	01 f0                	add    %esi,%eax
  402ece:	48 83 c2 04          	add    $0x4,%rdx
  402ed2:	48 ff c9             	dec    %rcx
  402ed5:	75 e9                	jne    402ec0 <_Dmain+0x10>
  402ed7:	5a                   	pop    %rdx
  402ed8:	c3                   	retq
  402ed9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)

So it does inline but end result is still less optimal assembly-wise.

On 01/14/2014 04:40 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote: > On Tuesday, 14 January 2014 at 15:34:40 UTC, Dominikus Dittes Scherkl > wrote: >> I think it is. It would be far worse if the semantics were related. > > No. Array sub script and array initialization are both array-like, so no > confusion. Same with "+" for concatenation (if you don't do implicit > casting) and addition, they are both additive-like. Nope. They are both monoid operations. Addition is commutative, and it is conventional in abstract algebra to use "+" to denote commutative operations only. eg. http://en.wikipedia.org/wiki/Abelian_group#Notation . assert("123abc" == "123" + "abc"); hurts my eyes. > The human mind categorize in a fuzzy fashion. >

On Tuesday, 14 January 2014 at 16:32:55 UTC, Timon Gehr wrote: > assert("123abc" == "123" + "abc"); hurts my eyes. Doesn't matter. The human brain does not work with algebra, it's not a logic engine. It is a fuzzy abstraction engine. Besides, in unary notation you get "111" for 3, "11111" for 5 etc, but this is rather off-topic...

On 01/14/2014 05:50 PM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com>" wrote: > On Tuesday, 14 January 2014 at 16:32:55 UTC, Timon Gehr wrote: >> assert("123abc" == "123" + "abc"); hurts my eyes. > > Doesn't matter. Thanks.

On 1/14/14 1:04 AM, Manu wrote: > /agree completely. > This is nice, I didn't think of writing statements like that :) > That's precisely the sort of suggestion I was hoping for. I'll continue > like this. I think I died and got to heaven. Andrei

On Tuesday, 14 January 2014 at 09:07:43 UTC, Manu wrote: > Can anyone comment on the codegen when using these statements? Is it > identical to my reference 'if' statement? > Liberal use of loops like this will probably obliterate unoptimised > performance... :/ I can't comment for every compiler, but when using LLVM as backend, here is what to expect. LLVM is really good at inlining. However, it inline mostly bottom-up, which mean that the front of the subrange will be inlined in the outer front, popFront of the subrange will be inlined in the popFront of the outer range, etc... You'll generally have pretty good performance, but 2 bad scenarios can happen. For most ranges, the compiler will find good optimization between front/popFront/empty and friends. But the way this stuff will be inlined, you'll get bigger and bigger front/popFront/empty until they all are inlined in the loop and everything is simplified away by the optimizer. If you wrap enough range into one another, you'll reach a point where the compiler will consider that these method became too big to be good candidate to inline. Things should improve over time as LLVM team is working on a better top down inliner. The second caveat is automatic heap promotion of the closure frame pointer. Right now, optimizers aren't really good at heap to stack promotion in D as they mostly don't understand the runtime (LDC have some progress in that direction, but this is still quite simplistic). That mean that you can end up with unwanted heap allocation. Realistically, it is possible for a compiler to see through any reasonably size range code, but for now, it is kind of lacking in some directions.

On Tuesday, 14 January 2014 at 16:31:00 UTC, Dicebot wrote: > So it does inline but end result is still less optimal assembly-wise. The problem, by the way, seems to be that the program actually contains two loops that LLVM cannot merge: One in the filter() constructor (skipping the initial run of even elements), and the actual foreach loop in main(). David

On Tuesday, 14 January 2014 at 18:38:27 UTC, David Nadlinger wrote: > On Tuesday, 14 January 2014 at 16:31:00 UTC, Dicebot wrote: >> So it does inline but end result is still less optimal assembly-wise. > > The problem, by the way, seems to be that the program actually contains two loops that LLVM cannot merge: One in the filter() constructor (skipping the initial run of even elements), and the actual foreach loop in main(). > > David I wonder if there is some way the language could help with that in the common case. For example, provide an opApply for all non-infinite ranges that can be used with foreach instead of front/empty/popFront. I imagine opApply is much easier to inline and optimise than trying to piece range operations and state back together to see the underlying loop.

On Tuesday, 14 January 2014 at 19:05:22 UTC, Peter Alexander wrote: > On Tuesday, 14 January 2014 at 18:38:27 UTC, David Nadlinger wrote: >> On Tuesday, 14 January 2014 at 16:31:00 UTC, Dicebot wrote: >>> So it does inline but end result is still less optimal assembly-wise. >> >> The problem, by the way, seems to be that the program actually contains two loops that LLVM cannot merge: One in the filter() constructor (skipping the initial run of even elements), and the actual foreach loop in main(). >> >> David > > I wonder if there is some way the language could help with that in the common case. For example, provide an opApply for all non-infinite ranges that can be used with foreach instead of front/empty/popFront. I imagine opApply is much easier to inline and optimise than trying to piece range operations and state back together to see the underlying loop. Actually, you could give an opApply for infinite ranges too. foreach loops can have breaks, and opApply knows how to handle them.

Forums