[Issue 2278] New: Guarantee alignment of stack-allocated variables on x86
August 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278

           Summary: Guarantee alignment of stack-allocated variables on x86
           Product: D
           Version: 1.034
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: clugdbug@yahoo.com.au


Use of SSE instructions in 32-bit Windows is problematic, since Windows and the
C calling convention only align the stack to 4 bytes, not 8.
It's too late for C and C++ to fix this problem. But D still has a chance, with
a simple addition to the ABI...

Insert the following line into the spec:
D functions must be called with a stack aligned to an 8 byte boundary.

And how to implement this:
(1) whenever a D function is called, insert a 'push EBP'/'pop EBP' around it,
if it has an odd number of (pushed arguments + pushed registers so far
in this function). Note that this applies to invoking a delegate, too.
(EBP is the best register to use, since it's guaranteed to be preserved, and
it's almost certainly been used recently. On Intel CPUs this means it won't
cause a register read stall).
(2) if local variables are created, make sure that the frame allocates an even
number of DWORDs. (Create an unused local int, if necessary).
(3) extern() functions need stack alignment code at the top of them, since they
could be called from other languages, with wrong stack alignment. Here's an
example.
---
void main()
{
    asm {
        naked;
        mov EBP, ESP;
        and ESP, 0xFFFF_FFC0;    // align to a 64 byte boundary.
        call alignedmain;
        mov ESP, EBP;
        ret;
    }
}
---
(4) alloca() also needs to ensure that it allocates an even number of DWORDs.

Note that a clever compiler could play games with the frame pointer to
eliminate the (tiny -- approx 1.5 cycles) overhead of (1) in almost all cases.
(eg, by converting one of the 'push reg's into 'mov [EBP+xx], reg' ).

The important thing to note about this solution (compared to using step(3) everywhere) is that it has lower overhead, and means that the innermost functions, which are most likely to need stack alignment, don't need to manually align it. Also note that when there's an even number of parameters, the overhead is _zero_.
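
Steps (2) and (4) both amount to rounding a byte count up to a multiple of 8. A minimal sketch of that arithmetic, assuming 4-byte DWORDs; the names `stackAlign` and `roundUpToEvenDwords` are made up for illustration and are not part of DMD:

```d
enum size_t stackAlign = 8; // the alignment proposed in this report

/// Round a frame size or alloca() request up to an even number of DWORDs,
/// i.e. to the next multiple of 8 bytes (hypothetical helper).
size_t roundUpToEvenDwords(size_t bytes) pure nothrow @nogc @safe
{
    return (bytes + stackAlign - 1) & ~(stackAlign - 1);
}

void main()
{
    assert(roundUpToEvenDwords(12) == 16); // 3 DWORDs of locals get one DWORD of padding
    assert(roundUpToEvenDwords(16) == 16); // already an even number of DWORDs
    assert(roundUpToEvenDwords(1) == 8);   // a 1-byte alloca() still takes 8 bytes
}
```

The padding is at most one DWORD per frame or per alloca() call, which matches the "unused local int" described in step (2).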


-- 

August 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278





------- Comment #1 from andrei@metalanguage.com  2008-08-11 10:29 -------
This looks like a broad change for a particular case. The particular case is short numeric arrays of constant size (because those get stack-allocated). So why not have the compiler align only those at 8-byte boundaries and leave everything else alone?

Copy semantics for constant-size arrays will certainly help too.


-- 

August 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278





------- Comment #2 from bugzilla@digitalmars.com  2008-08-11 16:38 -------
Keeping the stack always aligned is not that simple. The code generator will also push/pop register pairs when it runs out of them.

Probably the most practical approach is to align static arrays by using the code to AND the ESP register, but this means that there will be two frame pointers for the function. Ug.
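
To illustrate what ANDing ESP does, here is a small sketch of the same masking on a plain integer. The address value and the helper name `alignDown` are invented for the example:

```d
/// Round an address down to a multiple of `alignment` (a power of two).
/// This is the operation 'and ESP, mask' performs; rounding down is the
/// right direction because the x86 stack grows toward lower addresses.
size_t alignDown(size_t addr, size_t alignment) pure nothrow @nogc @safe
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

void main()
{
    size_t esp = 0x0012_FF74;           // hypothetical unaligned stack pointer
    size_t aligned = alignDown(esp, 8);
    assert(aligned == 0x0012_FF70);
    assert(esp - aligned == 4);         // this gap depends on the runtime value of ESP
}
```

Because the gap between the old and new stack pointer is only known at run time, the function needs one pointer for the aligned locals and another for whatever was on the stack before the AND.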


-- 

August 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278


shro8822@vandals.uidaho.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shro8822@vandals.uidaho.edu




------- Comment #3 from shro8822@vandals.uidaho.edu  2008-08-11 17:08 -------

IIRC there is an x86 instruction pair (enter/leave?) that moves the top of the stack in a way that can be undone. If that allows non-literal arguments, a pair of these around the scope would do it.

offset = FP
offset += ENTER_META_DATA.sizeof
offset &= 0x0f
offset -= ENTER_META_DATA.sizeof

enter offset // push offset space and some metadata
..... scope
leave // pop it all off


-- 

August 12, 2008
d-bugmail@puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=2278
> 
> 
> 
> 
> 
> ------- Comment #2 from bugzilla@digitalmars.com  2008-08-11 16:38 -------
> Keeping the stack always aligned is not that simple. The code generator will
> also push/pop register pairs when it runs out of them.

Yes, that's why I said it needs an extra push if and only if
(pushed arguments + pushed registers so far
in this function) is odd.

The code generator needs a counter which is incremented for every push, and decremented for every pop. This counter should be consulted before generating a function call.
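
A rough sketch of such a counter, written as ordinary D rather than as real code generator internals. `PushTracker` and its members are hypothetical names, and the parity rule is the one stated in the original report:

```d
/// Hypothetical per-function bookkeeping for a code generator.
struct PushTracker
{
    int depth; // DWORDs currently pushed in this function

    void push() { ++depth; }
    void pop()  { assert(depth > 0); --depth; }

    /// Consulted before emitting a call that pushes `argDwords` arguments:
    /// true if one extra padding push (e.g. 'push EBP') is needed.
    bool padBeforeCall(int argDwords) const
    {
        return ((depth + argDwords) % 2) != 0;
    }
}

void main()
{
    PushTracker t;
    t.push();                     // e.g. a saved register
    assert(t.padBeforeCall(2));   // 1 + 2 = 3 DWORDs: odd, pad
    t.push();
    assert(!t.padBeforeCall(2));  // 2 + 2 = 4 DWORDs: even, no pad
    t.pop();
    t.pop();
    assert(!t.padBeforeCall(0));  // nothing pushed, nothing to fix
}
```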

> 
> Probably the most practical approach is to align static arrays by using the
> code to AND the ESP register, but this means that there will be two frame
> pointers for the function. Ug.


August 12, 2008
d-bugmail@puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=2278
> 
> 
> 
> 
> 
> ------- Comment #1 from andrei@metalanguage.com  2008-08-11 10:29 -------
> This looks like a broad change for a particular case. The particular case is
> short numeric arrays of constant size (because those get stack-allocated). So
> why not have the compiler align only those at 8-byte boundaries and leave
> everything else alone?

Yes, it would be possible to align only those functions which use arrays or large structures. But note that:
(a) it's relevant for _any_ usage of SSE instructions, not just array operations. Many C++ compilers are using SSE in place of general-purpose registers;
(b) it also makes a big difference to the speed of memcpy/memmove, even when no vector instructions are used. In some cases, it also speeds up floating point operations on 'real' operands;
(c) as Walter notes, the procedure for aligning a stack frame is quite clumsy; and
(d) if you want pass-by-value for constant-size arrays, you need to align them too, and that is only possible by doing this kind of padding of the stack.

> Copy semantics for constant-size arrays will certainly help too.

Yes.

======
A quote from Agner Fog's assembly programming manual:
---
All 64-bit operating systems, and some 32-bit operating systems (Mac OS and later
versions of Linux) keep the stack aligned by 16 at all CALL instructions. This eliminates the
need for the AND instruction and the frame pointer. It is necessary to propagate this
alignment from one CALL instruction to the next by proper adjustment of the stack pointer in
each function.
---

It's really a much nicer solution than multiple frame pointers.
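
As a worked example of "propagating" the alignment, here is the arithmetic for the 16-byte convention in the quote, assuming 32-bit x86 where CALL pushes a 4-byte return address. `paddingForNextCall` is a made-up helper, not something from the manual:

```d
/// Bytes of padding a function must add so that ESP is again a multiple of 16
/// at its next CALL, given that the caller kept the same invariant and that
/// this function has used `bytesUsed` bytes of stack (pushes, locals, arguments).
size_t paddingForNextCall(size_t bytesUsed) pure nothrow @nogc @safe
{
    enum size_t alignment = 16;
    enum size_t returnAddress = 4; // pushed by CALL on 32-bit x86 (assumption)
    return (alignment - (returnAddress + bytesUsed) % alignment) % alignment;
}

void main()
{
    assert(paddingForNextCall(4) == 8);   // only 'push EBP' so far
    assert(paddingForNextCall(12) == 0);  // EBP plus 8 bytes of locals
    assert(paddingForNextCall(0) == 12);  // nothing pushed yet
}
```

The adjustment is a compile-time constant for each call site, which is why no AND instruction and no second frame pointer are needed.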
August 12, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278





------- Comment #4 from davidl@126.com  2008-08-12 12:06 -------
enter & leave are just simple sugar for pushing and popping ebp or whatever.
If you can do it with enter & leave, you can do it simply by replacing them with
pushing & popping ebp.

align(8) void func()  // make sure the stack align to 8
{
}

void func(){} // align to 4 , this might be useful to cut the use of the stack.

Aligning to 8 for all functions might leave a lot of stack memory unused (but I'm not sure
about this).

With the instructions mentioned by W, it should be a fair enough trade-off between runtime efficiency and stack memory usage.


-- 

August 12, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2278





------- Comment #5 from bugzilla@digitalmars.com  2008-08-12 18:21 -------
The problem with entering a function and then aligning the stack is that the code in the function can no longer access the function parameters with a known offset.

Probably the best approach to this is to do the equivalent to alloca() - allocate the aligned data on the stack separately, and store a pointer to it in the regular stack frame. The compiler can sugar over all this.
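
What the compiler would sugar over can be written by hand roughly like this. It is only a sketch: `alignWithin` is a hypothetical helper, and a fixed over-sized buffer stands in for the alloca()-style allocation:

```d
/// Return a pointer inside `raw` that is aligned to `alignment` (a power of two).
/// `raw` must be at least `alignment - 1` bytes larger than the data stored in it.
void* alignWithin(ubyte[] raw, size_t alignment)
{
    size_t start = (cast(size_t) raw.ptr + alignment - 1) & ~(alignment - 1);
    return cast(void*) start;
}

void main()
{
    enum size_t alignment = 16;
    // Over-allocated storage in the ordinary (possibly unaligned) frame
    ubyte[4 * double.sizeof + alignment - 1] raw;

    // The pointer is what lives at a known offset in the regular stack frame
    double* data = cast(double*) alignWithin(raw[], alignment);
    data[0 .. 4] = 1.0;

    assert((cast(size_t) data) % alignment == 0);
    assert(cast(size_t)(cast(ubyte*) data - raw.ptr) < alignment);
}
```

Parameters and ordinary locals keep their fixed offsets from the frame pointer; only the aligned data is reached through the extra pointer, at the cost of one indirection and up to `alignment - 1` wasted bytes.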


-- 

January 15, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=2278


Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |baryluk@smp.if.uj.edu.pl


--- Comment #6 from Don <clugdbug@yahoo.com.au> 2010-01-15 04:51:12 PST ---
*** Issue 1847 has been marked as a duplicate of this issue. ***

-- 
December 18, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=2278



--- Comment #7 from Nick Voronin <elfy.nv@gmail.com> 2010-12-17 18:10:27 PST ---
In D2, on entering main() the stack may or may not be aligned to 8 bytes, depending on the length of the command line with which the program was run. This may cause as much as a 2x difference in running time for no apparent reason. (Lack of alignment is a pity, but this particular case is plainly confusing.)

Example. Run with different command lines, for example with and without extension.

import core.stdc.stdio: printf;
import std.date: getUTCtime, ticksPerSecond;
void main() {
    double d = 0.0;
    auto t0 = getUTCtime();
    for (size_t i = 0; i < 100_000_000; i++)
        d += 1;
    auto t1 = getUTCtime();
    printf("%lf\n", d);
    printf("%u\n", (cast(size_t)&d) % 8);
    printf("%lf\n", (cast(double)t1 - cast(double)t0) / ticksPerSecond);
}

Also this code shows that inside a frame variables are placed as if stack alignment was expected. (note that a & d are either both aligned on 8 or both unaligned)

import core.stdc.stdio: printf;

void main() {
    int a;
    double d;
    printf("%X:%u %X:%u\n", &a, (cast(size_t)&a) % 8, &d, (cast(size_t)&d) % 8);
}

Also +1 for some way to have locals aligned, be it explicit align(n) before
declaration of var, or before function (I like this one), or throughout whole
program.

-- 