Thread overview
Class instance alignment
Feb 19, 2021
tsbockman
Feb 20, 2021
kinke
Feb 20, 2021
tsbockman
Feb 20, 2021
rikki cattermole
Feb 20, 2021
kinke
Feb 22, 2021
kinke
Feb 23, 2021
tsbockman
Feb 23, 2021
tsbockman
February 19, 2021
How can I get the alignment of a class instance? I know how to get the size:
    __traits(classInstanceSize, T)
But, there doesn't appear to be any equivalent trait for the alignment.

(Knowing the alignment of a class instance is required to correctly use core.lifetime.emplace, or do any sort of manual allocation of class instances. I know that I can conservatively estimate the alignment as nextPow2(__traits(classInstanceSize, T)), but that is nearly always a wasteful over-estimate.

I've come across others' code online that simply assumes the alignment is size_t.alignof, but that's not right either because a class instance may contain SIMD vectors with higher alignments, or it may be aligned to a cache line size for efficient multi-threaded access.)
February 20, 2021
On Friday, 19 February 2021 at 23:53:55 UTC, tsbockman wrote:
> How can I get the alignment of a class instance? I know how to get the size:
>     __traits(classInstanceSize, T)
> But, there doesn't appear to be any equivalent trait for the alignment.

There's https://github.com/dlang/druntime/blob/728f1d9c3b7a37eba4d59ee2637fb924053cba6d/src/core/internal/traits.d#L261. But AFAIK, the GC only guarantees an alignment of 16 and doesn't respect any overaligned members or alignment spec for the class.
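
For reference, a minimal sketch of querying that alignment at compile time. It assumes the `classInstanceAlignment` template that Phobos exposes in `std.traits` (the link above points at a druntime-internal counterpart). `Vec8`, `Plain` and `Particle` are hypothetical names used only for illustration.

```d
import std.traits : classInstanceAlignment;

// Hypothetical 32-byte-aligned payload standing in for a 256-bit AVX vector.
align(32) struct Vec8 { float[8] lanes; }

class Plain { int x; }
class Particle { Vec8 position; }

// An over-aligned member type raises the alignment of the whole instance.
static assert(classInstanceAlignment!Particle == 32);
// Without such members, the hidden pointer-sized fields set the minimum.
static assert(classInstanceAlignment!Plain == (void*).alignof);
```

Note that this only reports what the layout requires; whether an allocator honors it is a separate question.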
February 20, 2021
On Saturday, 20 February 2021 at 05:44:33 UTC, kinke wrote:
> There's https://github.com/dlang/druntime/blob/728f1d9c3b7a37eba4d59ee2637fb924053cba6d/src/core/internal/traits.d#L261.

Thanks! That's helpful.

> But AFAIK, the GC only guarantees an alignment of 16 and doesn't respect any overaligned members or alignment spec for the class.

Well, that's just another reason not to use the GC for my current project, then: I'm using 256-bit AVX vectors extensively.

That alignment limit really *needs* to be raised to at least 32 bytes, given that even DMD has some support for AVX. 64 bytes would be better, since AVX512 is going mainstream soon. And, 128 bytes is the largest common cache line size, I think?

If raising the limit is considered unacceptable for some reason, then trying to allocate something with an unsupported alignment should be an error instead of just silently doing the wrong thing.
February 20, 2021
On 20/02/2021 8:13 PM, tsbockman wrote:
> Well, that's just another reason not to use the GC for my current project, then: I'm using 256-bit AVX vectors extensively.

You can still use the GC.

You just can't use it to allocate the classes you care about.

https://dlang.org/phobos/core_memory.html#.GC.addRange
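
A rough sketch of what that could look like, assuming `AlignedMallocator` from `std.experimental.allocator.mallocator` as the source of aligned memory and `std.traits.classInstanceAlignment` for the alignment. `Vec8`, `Particle`, `makeAligned` and `release` are hypothetical names; this is one possible arrangement, not an established recipe.

```d
import core.lifetime : emplace;
import core.memory : GC;
import std.experimental.allocator.mallocator : AlignedMallocator;
import std.traits : classInstanceAlignment;

// Hypothetical 32-byte-aligned payload standing in for an AVX vector.
align(32) struct Vec8 { float[8] lanes; }

class Particle
{
    Vec8 position;
    Object owner; // a GC-managed reference: the reason the range must be registered
}

enum particleSize = __traits(classInstanceSize, Particle);

Particle makeAligned()
{
    // Allocate outside the GC with the alignment the class layout requires.
    void[] chunk = AlignedMallocator.instance.alignedAllocate(
        particleSize, cast(uint) classInstanceAlignment!Particle);
    assert(chunk !is null, "aligned allocation failed");

    // Tell the GC to scan this block so that `owner` keeps its target alive.
    GC.addRange(chunk.ptr, chunk.length);
    return emplace!Particle(chunk);
}

void release(Particle p)
{
    void* ptr = cast(void*) p;
    destroy(p);            // run the destructor
    GC.removeRange(ptr);   // stop scanning before the memory is returned
    AlignedMallocator.instance.deallocate(ptr[0 .. particleSize]);
}
```

The range has to be removed before the memory is freed, otherwise the GC may later scan memory that no longer belongs to the instance.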
February 20, 2021
On 2/20/21 2:13 AM, tsbockman wrote:
> On Saturday, 20 February 2021 at 05:44:33 UTC, kinke wrote:
>> There's https://github.com/dlang/druntime/blob/728f1d9c3b7a37eba4d59ee2637fb924053cba6d/src/core/internal/traits.d#L261. 
>>
> 
> Thanks! That's helpful.
> 
>> But AFAIK, the GC only guarantees an alignment of 16 and doesn't respect any overaligned members or alignment spec for the class.
> 
> Well, that's just another reason not to use the GC for my current project, then: I'm using 256-bit AVX vectors extensively.
> 
> That alignment limit really *needs* to be raised to at least 32 bytes, given that even DMD has some support for AVX. 64 bytes would be better, since AVX512 is going mainstream soon. And, 128 bytes is the largest common cache line size, I think?

The GC should align anything over 16 bytes to 32 bytes (at least).

Last I checked*, the GC uses pools of 16-byte, 32-byte, 64-byte, etc blocks. And you don't have mixed allocations in those pools, e.g. a pool is ALL 16-byte blocks, or ALL 32-byte blocks.

If you specify an alignment of a field in your class, I would expect the compiler to obey the layout. Which means your class should be over 32 bytes in size, since it has to pad it up to the end. This would align it to 32 bytes (or more) naturally.

What is the offset of your aligned member in the class? i.e. pragma(msg, Class.member.offsetof)

1. If classInstanceSize is >= 32, I presume it will always be 32-byte aligned on the GC (not sure about stack alignment for scope instances)
2. If the offsetof of your member is not a multiple of 32, then you might have problems.

-Steve

*Note, this was a long time ago I had anything to do with the GC, so things may have changed.
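
Both points can be checked empirically. The sketch below prints the member offset and instance size at compile time and then inspects the address of a GC-allocated instance at runtime. `Vec8`, `Particle` and `misalignment` are hypothetical names, and the result for any particular runtime version is deliberately not assumed here.

```d
import std.stdio : writefln;

// Hypothetical 32-byte-aligned payload standing in for an AVX vector.
align(32) struct Vec8 { float[8] lanes; }
class Particle { Vec8 position; }

// Compile-time view: where the member sits and how large the instance is.
pragma(msg, "offsetof: ", Particle.position.offsetof);
pragma(msg, "instance size: ", __traits(classInstanceSize, Particle));

// Distance of the instance's address from the previous multiple of `alignment`.
size_t misalignment(Object o, size_t alignment)
{
    return (cast(size_t) cast(void*) o) % alignment;
}

void main()
{
    // Runtime view: 0 means this particular allocation happened to be aligned.
    auto p = new Particle;
    writefln("address %% 32 = %s", misalignment(p, 32));
}
```

A single aligned result proves little; allocating many instances in a loop gives a better picture of what the GC actually guarantees.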
February 20, 2021
On Saturday, 20 February 2021 at 18:43:53 UTC, Steven Schveighoffer wrote:
> Last I checked*, the GC uses pools of 16-byte, 32-byte, 64-byte, etc blocks.

That has changed [to reduce wastage]; the new bin sizes are here and include sizes like 176 (11*16): https://github.com/dlang/druntime/blob/728f1d9c3b7a37eba4d59ee2637fb924053cba6d/src/core/internal/gc/impl/conservative/gc.d#L1166

> (not sure about stack alignment for scope instances)

This works with LDC at least. E.g., this:

class C
{
    align(64) int[2] data;
}

void foo()
{
    scope c = new C();
}

allocates 72 bytes aligned at a 64-byte stack boundary. 72 bytes? :) Yes - vptr, monitor, then 48 padding bytes (for 64-bit target...), then 8 `data` bytes with .offsetof of 64. [And classes don't need tail padding, as you can't allocate arrays of class *instances* directly in the language.]

Structs are generally better suited for alignment purposes, but the same GC limitations apply when allocating them on the heap.
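
The layout described above can be pinned down with compile-time checks. This is only a sketch: the numbers are the ones reported in this post for a 64-bit target, and the assumption is that other compilers sharing the same frontend lay the class out identically.

```d
class C
{
    align(64) int[2] data;
}

version (D_LP64)
{
    // vptr (8) + monitor (8) + 48 padding bytes put `data` at offset 64.
    static assert(C.data.offsetof == 64);
    // 64 + 8 bytes of `data`, with no tail padding.
    static assert(__traits(classInstanceSize, C) == 72);
}
```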
February 21, 2021
On 2/20/21 6:39 PM, kinke wrote:
> On Saturday, 20 February 2021 at 18:43:53 UTC, Steven Schveighoffer wrote:
>> Last I checked*, the GC uses pools of 16-byte, 32-byte, 64-byte, etc blocks.
> 
> That has changed [to reduce wastage]; the new bin sizes are here and include sizes like 176 (11*16): https://github.com/dlang/druntime/blob/728f1d9c3b7a37eba4d59ee2637fb924053cba6d/src/core/internal/gc/impl/conservative/gc.d#L1166 

Hm... but does TypeInfo detail alignment? If so, we can make this work anyway, just bump up the size needed to a power-of-2 pool.

I wasn't aware of the changes to the pool sizes...

-Steve
February 22, 2021
On Monday, 22 February 2021 at 02:23:27 UTC, Steven Schveighoffer wrote:
> Hm... but does TypeInfo detail alignment?

Apparently not for TypeInfo_Class; .talign() returns the alignment of a class *ref*, i.e., pointer size. TypeInfo_Struct.talign() does return the struct alignment though and could be used to select a larger bin size.
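
A small sketch for comparing the two at runtime, assuming the `talign` property on `TypeInfo` behaves as described above. `Vec8` and `Particle` are hypothetical names.

```d
import std.stdio : writeln;

// Hypothetical 32-byte-aligned payload standing in for an AVX vector.
align(32) struct Vec8 { float[8] lanes; }
class Particle { Vec8 position; }

void main()
{
    // Struct TypeInfo: expected to report the struct's own alignment.
    writeln("struct talign: ", typeid(Vec8).talign);
    // Class TypeInfo: expected to report the alignment of the reference only.
    writeln("class talign:  ", typeid(Particle).talign);
}
```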
February 23, 2021
On Monday, 22 February 2021 at 02:23:27 UTC, Steven Schveighoffer wrote:
> Hm... but does TypeInfo detail alignment? If so, we can make this work anyway, just bump up the size needed to a power-of-2 pool.

It doesn't even need to be a power-of-2, assuming the pools themselves are properly aligned - just a multiple of the alignment:

size_t alignedSize(size_t typeSize, size_t typeAlignment) pure @safe nothrow @nogc {
    version(assert) {
        import core.bitop : bsr;
        assert(typeAlignment == (size_t(1) << bsr(typeAlignment)));
    }
    size_t ret = typeSize & ~(typeAlignment - 1);
    ret += (ret < typeSize)? typeAlignment : 0;
    return ret;
}

(This is CTFE-able and can be memoized with a template, if desired. It's also just a few branchless instructions at runtime, if it's needed there for some reason.)
February 23, 2021
On Tuesday, 23 February 2021 at 03:53:00 UTC, tsbockman wrote:
> size_t alignedSize(size_t typeSize, size_t typeAlignment) pure @safe nothrow @nogc {
>     version(assert) {
>         import core.bitop : bsr;
>         assert(typeAlignment == (size_t(1) << bsr(typeAlignment)));
>     }
>     size_t ret = typeSize & ~(typeAlignment - 1);
>     ret += (ret < typeSize)? typeAlignment : 0;
>     return ret;
> }

Better:

size_t alignedSize(size_t typeSize, size_t typeAlignment) pure @safe nothrow @nogc {
    version(assert) {
        import core.bitop : bsr;
        assert(typeAlignment == (size_t(1) << bsr(typeAlignment)));
    }
    // Adding the mask and then clearing the low bits rounds up to the next multiple of typeAlignment.
    const alignMask = typeAlignment - 1;
    return (typeSize + alignMask) & ~alignMask;
}
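
A few compile-time checks as a usage sketch. The function is restated without its power-of-two assertion so that the block stands alone, and `paddedSize` is a hypothetical name for the template-memoized form mentioned earlier.

```d
size_t alignedSize(size_t typeSize, size_t typeAlignment) pure @safe nothrow @nogc
{
    const alignMask = typeAlignment - 1;
    return (typeSize + alignMask) & ~alignMask;
}

// Evaluated once per distinct argument pair, at compile time.
enum size_t paddedSize(size_t size, size_t alignment) = alignedSize(size, alignment);

// The 72-byte class with a 64-byte-aligned member needs a 128-byte slot.
static assert(paddedSize!(72, 64) == 128);
// Sizes that are already a multiple of the alignment are left unchanged.
static assert(paddedSize!(64, 64) == 64);
// The result need not be a power of two: 176 rounds up to 192 for 32-byte alignment.
static assert(paddedSize!(176, 32) == 192);
```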