The reason I ask is that the stack is already 16-byte aligned in 64-bit code, but forcing that alignment in 32-bit code is inefficient for everything else.

Note: you only need to align the stack when a vector is actually stored on it by value. That is probably very rare, rarer than you think.
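
To make the "stored on it by value" case concrete, here is a minimal C++ sketch. Vec4 is a hypothetical stand-in for a SIMD vector type, and is_aligned16, sum_through_pointer and local_vector_is_aligned are names made up for illustration. Only the function that holds a Vec4 local needs a 16-byte aligned frame; the one that only receives a pointer does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-float vector type standing in for a SIMD register type.
struct alignas(16) Vec4 {
    float v[4];
};

// True if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Only sees a pointer: no Vec4 lives in this function's own frame,
// so this function by itself has no reason to realign the stack.
inline float sum_through_pointer(const Vec4* p) {
    return p->v[0] + p->v[1] + p->v[2] + p->v[3];
}

// Holds a Vec4 by value and takes its address, so the object needs a
// real stack slot. alignas(16) obliges the compiler to place that slot
// on a 16-byte boundary, realigning the frame if the ABI does not
// already guarantee it.
inline bool local_vector_is_aligned() {
    Vec4 local = {{1.0f, 2.0f, 3.0f, 4.0f}};
    assert(sum_through_pointer(&local) == 10.0f);
    return is_aligned16(&local);
}
```

Whether a given compiler actually emits realignment code for the second function depends on the target and its ABI; comparing the generated assembly for 32-bit and 64-bit builds is the way to see the cost being discussed.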