[Issue 10226] New: core.simd bad codegen - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Issues » [Issue 10226] New: core.simd bad codegen

Thread overview

[Issue 10226] New: core.simd bad codegen
Jun 01, 2013 Benjamin Thaut
Jun 01, 2013 Benjamin Thaut
Jun 03, 2013 Manu
Jun 03, 2013 Benjamin Thaut
Jun 03, 2013 Manu
Jun 03, 2013 bearophile_hugs@eml.cc
Jun 03, 2013 Benjamin Thaut
Jun 03, 2013 Manu
Jun 04, 2013 Benjamin Thaut

June 01, 2013

[Issue 10226] New: core.simd bad codegen

Posted by Benjamin Thaut

Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226

           Summary: core.simd bad codegen
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: code@benjamin-thaut.de


--- Comment #0 from Benjamin Thaut <code@benjamin-thaut.de> 2013-06-01 03:12:12 PDT ---
The following testcode:

import core.simd;
import std.stdio;

void main(string[] args)
{
    float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 result = __simd(XMM.LODAPS, *cast(float4*)value1.ptr);
    result = __simd(XMM.ADDPS, result, result);
    __simd_sto(XMM.STOAPS, *cast(float4*)value1.ptr, result);
    writefln("%s", value1);
}

Will produce the following assembly

1 mov         rax,qword ptr [rbp-68h] 2 movaps      xmm0,xmmword ptr [rax] 3 movaps      xmmword ptr [rbp-60h],xmm0 4 movdqa      xmm0,xmmword ptr [rbp-60h] 5 addps       xmm0,xmmword ptr [rbp-60h] 6 movaps      xmmword ptr [rbp-60h],xmm0 7 movdqa      xmm0,xmmword ptr [rbp-60h] 8 mov         rax,qword ptr [rbp-68h] 9 movaps      xmmword ptr [rax],xmm0

The instructions 3 and 4 are completely useless, as well as the instructions 6 and 7. Instruction 8 has no effect because RAX already contains that value. Ideally the assembly should look as follows:

1 mov         rax,qword ptr [rbp-68h] 2 movaps      xmm0,xmmword ptr [rax] 5 addps       xmm0,xmmword ptr [rbp-60h] 9 movaps      xmmword ptr [rax],xmm0

This is a huge problem because SIMD does only start to get effective if you stay within the xmm registers as long as possible.

tested with dmd 2.063

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 01, 2013

[Issue 10226] core.simd bad codegen

Posted by Benjamin Thaut
in reply to Benjamin Thaut

Benjamin Thaut

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #1 from Benjamin Thaut <code@benjamin-thaut.de> 2013-06-01 03:13:40 PDT ---
Small correction, ideal assembly should look like this:

1 mov         rax,qword ptr [rbp-68h]
2 movaps      xmm0,xmmword ptr [rax]
5 addps       xmm0,xmm0
9 movaps      xmmword ptr [rax],xmm0

Instruction 5 should use a xmm register as well and not add from memory.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by Manu
in reply to Benjamin Thaut

Manu

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226


Manu <turkeyman@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkeyman@gmail.com


--- Comment #2 from Manu <turkeyman@gmail.com> 2013-06-03 03:43:34 PDT ---
(In reply to comment #1)
> Small correction, ideal assembly should look like this:
> 
> 1 mov         rax,qword ptr [rbp-68h]
> 2 movaps      xmm0,xmmword ptr [rax]
> 5 addps       xmm0,xmm0
> 9 movaps      xmmword ptr [rax],xmm0
> 
> Instruction 5 should use a xmm register as well and not add from memory.

Interesting. I haven't scrutinised DMD's codegen as much as GDC yet.
I've been working on the std.simd API mostly against GDC, and once I'm happy
with that, I'll be logging codegen bugs in DMD accordingly.

What do you get if you do:

    float4 result = [1,2,3,4];
    result = __simd(XMM.ADDPS, result, result);
    writefln("%s", result.array);

Why do you need to issue the loads and stores explicitly?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by Benjamin Thaut
in reply to Benjamin Thaut

Benjamin Thaut

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #3 from Benjamin Thaut <code@benjamin-thaut.de> 2013-06-03 05:37:15 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > Small correction, ideal assembly should look like this:
> > 
> > 1 mov         rax,qword ptr [rbp-68h]
> > 2 movaps      xmm0,xmmword ptr [rax]
> > 5 addps       xmm0,xmm0
> > 9 movaps      xmmword ptr [rax],xmm0
> > 
> > Instruction 5 should use a xmm register as well and not add from memory.
> 
> Interesting. I haven't scrutinised DMD's codegen as much as GDC yet.
> I've been working on the std.simd API mostly against GDC, and once I'm happy
> with that, I'll be logging codegen bugs in DMD accordingly.
> 
> What do you get if you do:
> 
>     float4 result = [1,2,3,4];
>     result = __simd(XMM.ADDPS, result, result);
>     writefln("%s", result.array);
> 
> Why do you need to issue the loads and stores explicitly?

On modern processors unaligned loads come at almost no penalty, and it is a lot easier to use unaligned loads within a highly optimized functions compared to making all the data in the whole project 16 byte aligned and use aligned loads.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by Manu
in reply to Benjamin Thaut

Manu

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #4 from Manu <turkeyman@gmail.com> 2013-06-03 08:42:51 PDT ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Small correction, ideal assembly should look like this:
> > > 
> > > 1 mov         rax,qword ptr [rbp-68h]
> > > 2 movaps      xmm0,xmmword ptr [rax]
> > > 5 addps       xmm0,xmm0
> > > 9 movaps      xmmword ptr [rax],xmm0
> > > 
> > > Instruction 5 should use a xmm register as well and not add from memory.
> > 
> > Interesting. I haven't scrutinised DMD's codegen as much as GDC yet.
> > I've been working on the std.simd API mostly against GDC, and once I'm happy
> > with that, I'll be logging codegen bugs in DMD accordingly.
> > 
> > What do you get if you do:
> > 
> >     float4 result = [1,2,3,4];
> >     result = __simd(XMM.ADDPS, result, result);
> >     writefln("%s", result.array);
> > 
> > Why do you need to issue the loads and stores explicitly?
> 
> On modern processors unaligned loads come at almost no penalty, and it is a lot easier to use unaligned loads within a highly optimized functions compared to making all the data in the whole project 16 byte aligned and use aligned loads.

Umm, What?
This has nothing to do with alignment... where did that come from?
Your issue is the codegen right? You expect the codegen to be what you pasted
below, and I agree. I'm suggesting you remove the explicit load/store and let
the compiler generate them. What do you get using the code I suggest?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by bearophile_hugs@eml.cc
in reply to Benjamin Thaut

bearophile_hugs@eml.cc

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #5 from bearophile_hugs@eml.cc 2013-06-03 09:46:24 PDT ---
(In reply to comment #3)

> On modern processors unaligned loads come at almost no penalty, and it is a lot easier to use unaligned loads within a highly optimized functions compared to making all the data in the whole project 16 byte aligned and use aligned loads.

If this is true then is it worth reducing the smallest allocation size of the D garbage collector from 16 bytes to 8 bytes and reduce the alignment to 8 bytes? This saves some memory when you allocate many small objects, like small tree nodes.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by Benjamin Thaut
in reply to Benjamin Thaut

Benjamin Thaut

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #6 from Benjamin Thaut <code@benjamin-thaut.de> 2013-06-03 10:07:12 PDT ---
Reducing alignment on all allocations to 8 byte is not a good idea. When I said
recent processors I meant i7 ivy bridge. On older processors this still has a
significant performance impact. Initially the manual load and store operation
were unaligned, but as of this bug
http://d.puremagic.com/issues/show_bug.cgi?id=10225
I had to replace them with aligned loads and stores for the time beeing. But
this doesn't help that when the unaligned store gets fixed, the generated code
will be equally bad as when using aligned operations.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 03, 2013

[Issue 10226] core.simd bad codegen

Posted by Manu
in reply to Benjamin Thaut

Manu

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #7 from Manu <turkeyman@gmail.com> 2013-06-03 16:21:54 PDT ---
(In reply to comment #6)
> Reducing alignment on all allocations to 8 byte is not a good idea. When I said
> recent processors I meant i7 ivy bridge. On older processors this still has a
> significant performance impact. Initially the manual load and store operation
> were unaligned, but as of this bug
> http://d.puremagic.com/issues/show_bug.cgi?id=10225
> I had to replace them with aligned loads and stores for the time beeing. But
> this doesn't help that when the unaligned store gets fixed, the generated code
> will be equally bad as when using aligned operations.

Okay, I see. It occurred to me that maybe DMD promoted that movups in your other bug to a movaps because it was able to detect that the source was always aligned? (I have no hard reason to believe that, but just a thought. Walter?)

Yeah it looks kinda like the compiler is treating the mov*ps opcodes like a typical add or something, and the compiler is generating typical register allocation around the opcodes which performs explicit register allocation ;) My guess is the compiler doesn't actually understand what the mov*ps (and friends) opcodes do, and therefore erroneously tries to helpfully configure a register state to execute them ;)

What flags are you building with? If it's building un-optimised or in some debug mode, the extra mov's could be to try and keep stack synchronisation.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

June 04, 2013

[Issue 10226] core.simd bad codegen

Posted by Benjamin Thaut
in reply to Benjamin Thaut

Benjamin Thaut

Posted in reply to Benjamin Thaut

http://d.puremagic.com/issues/show_bug.cgi?id=10226



--- Comment #8 from Benjamin Thaut <code@benjamin-thaut.de> 2013-06-03 22:36:41 PDT ---
Building with debug symbols gives a ICE ;-)

I'm building with -release -O -inline

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation