How to do aligned allocation?
Quirin Schroll
September 30, 2022

When I do new void[](n), is that buffer allocated with an alignment of 1 or what are the guarantees? How can I set an alignment? Also, is the alignment of any type guaranteed to be a power of 2?

September 30, 2022

On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll wrote:

> When I do new void[](n), is that buffer allocated with an alignment of 1 or what are the guarantees? How can I set an alignment? Also, is the alignment of any type guaranteed to be a power of 2?

https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html

It's the C func, so check C lib doc.

September 30, 2022

On Friday, 30 September 2022 at 16:23:00 UTC, mw wrote:

> On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll wrote:
>
> > When I do new void[](n), is that buffer allocated with an alignment of 1 or what are the guarantees? How can I set an alignment? Also, is the alignment of any type guaranteed to be a power of 2?
>
> https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html
>
> It's the C func, so check C lib doc.

and then use emplace on the C-alloc-ed memory.
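A minimal sketch of that suggestion, assuming a C library that actually provides aligned_alloc (not every platform's C runtime does). The Payload type and the makePayload/freePayload helpers are hypothetical names used only for illustration. Note that memory obtained this way is not scanned by the GC, so a type holding GC pointers would need extra care.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : aligned_alloc, free;

// Hypothetical type that wants more alignment than its fields need.
align(32) struct Payload
{
    int count = 7; // non-zero default, so a skipped emplace would be noticeable
}

// Allocates one Payload with the C allocator and constructs it in place.
// Returns null if the allocation fails.
Payload* makePayload() @trusted
{
    // For a struct, .sizeof is a multiple of .alignof, which is what
    // C11 expects of the size argument.
    void* raw = aligned_alloc(Payload.alignof, Payload.sizeof);
    if (raw is null)
        return null;
    return emplace(cast(Payload*) raw); // writes Payload.init into the raw memory
}

// Destroys the object and hands the memory back to the C allocator.
void freePayload(Payload* p) @trusted
{
    if (p is null)
        return;
    destroy(*p);
    free(p);
}

unittest
{
    Payload* p = makePayload();
    assert(p !is null);
    assert((cast(size_t) p) % Payload.alignof == 0);
    freePayload(p);
}
```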

October 01, 2022

On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll wrote:

> When I do new void[](n), is that buffer allocated with an alignment of 1 or what are the guarantees?

It is guaranteed an alignment of at least 1 because void.alignof == 1 (and because that is the lowest possible integer alignment). When I last checked, new T guaranteed a minimum alignment of min(T.alignof, 16), meaning that all basic scalar types (int, double, pointers, etc.), and SIMD __vectors up to 128 bits will be correctly aligned, while 256 bit (for example, AVX's __vector(double[4])) and 512 bit (AVX512) types might not be.

Arrays and aggregate types (structs and classes) by default use the maximum alignment required by any of their elements or fields (including hidden fields, like __vptr for classes). This can be overridden manually using the align attribute, which must be applied to the aggregate type as a whole. (Applying align to an individual field does something else.)
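To make the default and the override concrete, here is a small sketch. Plain and Wide are hypothetical types, and isAligned is a helper written only for this example.

```d
// Hypothetical types for illustration only.
struct Plain
{
    byte tag;
    int value;
}

align(32) struct Wide
{
    byte tag;
    int value;
}

static assert(void.alignof == 1);
static assert(Plain.alignof == int.alignof); // the most demanding field wins
static assert(Wide.alignof == 32);           // overridden for the whole aggregate
static assert(Wide.sizeof % 32 == 0);        // size is padded to a multiple of the alignment

// True if p is a multiple of alignment, which is assumed to be a power of 2.
bool isAligned(const(void)* p, size_t alignment)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

unittest
{
    // Plain.alignof is well below 16, so new is expected to honor it.
    auto p = new Plain;
    assert(isAligned(p, Plain.alignof));
}
```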

> How can I set an alignment?

If the desired alignment is <= 16, you can specify a type with that .alignof.

However, if you may need higher alignment than the maximum guaranteed to be available from the allocator, or you are not writing strongly typed code to begin with, as implied by your use of void[], you can just align the allocation yourself:

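import std.math : isPowerOf2;

// Over-allocate by alignment - 1 bytes, then slice out an aligned window.
void[] newAligned(const(size_t) alignment)(const(size_t) size) pure @trusted nothrow
    if(1 <= alignment && isPowerOf2(alignment))
{
    enum alignMask = alignment - 1;
    void[] ret = new void[size + alignMask];
    // Number of bytes by which the buffer start lies past an aligned address.
    const misalign = (cast(size_t) ret.ptr) & alignMask;
    // Distance to the next aligned address (0 if already aligned).
    const offset = (alignment - misalign) & alignMask;
    ret = ret[offset .. offset + size];
    return ret;
}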

However, aligning memory outside of the allocator itself like this does waste up to alignment - 1 bytes per allocation, so it's best to use as much of the allocator's internal alignment capability as possible:

import core.bitop : bsr;
import std.math : isPowerOf2;
import std.meta : AliasSeq;

void[] newAligned(const(size_t) alignment)(const(size_t) size) pure @trusted nothrow
    if(1 <= alignment && isPowerOf2(alignment))
{
    alias Aligned = .Aligned!alignment;
    // Allocate whole chunks so the allocator itself provides up to 16 bytes of alignment.
    void[] ret = new Aligned.Chunk[(size + Aligned.mask) >> Aligned.chunkShift];
    static if(Aligned.Chunk.alignof == alignment)
        enum size_t offset = 0; // the chunk type already guarantees the alignment
    else {
        // Alignments above 16 still need a manual adjustment.
        const misalign = (cast(size_t) ret.ptr) & Aligned.mask;
        const offset = (alignment - misalign) & Aligned.mask;
    }
    ret = ret[offset .. offset + size];
    return ret;
}
private {
    align(16) struct Chunk16 {
        void[16] data;
    }
    template Aligned(size_t alignment)
        if(1 <= alignment && isPowerOf2(alignment))
    {
        enum int shift = bsr(alignment);
        enum size_t mask = alignment - 1;

        static if(alignment <= 16) {
            enum chunkShift = shift, chunkMask = mask;
            alias Chunk = AliasSeq!(ubyte, ushort, uint, ulong, Chunk16)[shift];
        } else {
            enum chunkShift = Aligned!(16).shift, chunkMask = Aligned!(16).mask;
            alias Chunk = Aligned!(16).Chunk;
        }
    }
}
@safe unittest {
    static immutable(size_t[]) alignments =
        [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ];
    static immutable(size_t[]) sizes =
        [ 9, 31, 4, 57, 369, 3358 ];

    foreach(size; sizes) {
        static foreach(alignment; alignments) { {
            void[] memory = newAligned!alignment(size);
            assert(memory.length == size);
            assert((cast(size_t) &(memory[0])) % alignment == 0);
        } }
    }
}

> Also, is the alignment of any type guaranteed to be a power of 2?

In practice, yes.

On Friday, 30 September 2022 at 16:23:00 UTC, mw wrote:

> https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html
>
> It's the C func, so check C lib doc.

https://en.cppreference.com/w/c/memory/aligned_alloc

Note that common implementations place arbitrary restrictions on the alignments and sizes accepted by aligned_alloc, so to support the general case you would still need a wrapper function like the one I provided above.

(If this all seems overly complicated, that's because it is. I have no idea why allocators don't just build in the logic above; it's extremely simple compared to the rest of what a good general-purpose heap allocator does.)
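As a rough illustration of the kind of argument restrictions mentioned above: C11 words aligned_alloc so that the size should be an integral multiple of the alignment, and some implementations are reported to reject alignments smaller than a pointer. The sketch below normalizes the arguments before calling the C function. alignedAllocPortable is a hypothetical helper, not part of druntime, and it does not make aligned_alloc available on platforms whose C runtime lacks it.

```d
import core.stdc.stdlib : aligned_alloc, free;

// Hypothetical helper: returns null for an invalid alignment,
// on arithmetic overflow, or if the C allocator fails.
// The result must be released with free.
void* alignedAllocPortable(size_t alignment, size_t size)
{
    // Only powers of 2 are meaningful alignments.
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return null;

    // Assumption: never ask for less than pointer alignment.
    if (alignment < (void*).sizeof)
        alignment = (void*).sizeof;

    // Round the size up to a multiple of the alignment.
    const rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded < size)
        return null; // the rounding overflowed

    // Avoid a zero-size request, whose result is implementation-defined.
    return aligned_alloc(alignment, rounded == 0 ? alignment : rounded);
}

unittest
{
    void* p = alignedAllocPortable(64, 100);
    assert(p !is null);
    assert((cast(size_t) p) % 64 == 0);
    free(p);
}
```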

October 01, 2022

On Saturday, 1 October 2022 at 00:32:28 UTC, tsbockman wrote:

> alias Chunk = AliasSeq!(ubyte, ushort, uint, ulong, Chunk16)[shift];

Oops, I forgot that ulong.alignof is platform dependent. It's probably best to just go ahead and explicitly specify the alignment for all Chunk types:

private template Aligned(size_t alignment)
    if(1 <= alignment && isPowerOf2(alignment))
{
    enum int shift = bsr(alignment);
    enum size_t mask = alignment - 1;

    static if(alignment <= 16) {
        enum chunkShift = shift, chunkMask = mask;
        align(alignment) struct Chunk {
            void[alignment] data;
        }
    } else {
        enum chunkShift = Aligned!(16).shift, chunkMask = Aligned!(16).mask;
        alias Chunk = Aligned!(16).Chunk;
    }
}

(This also eliminates the std.meta : AliasSeq dependency.)

September 30, 2022

On 9/30/22 11:57 AM, Quirin Schroll wrote:

> When I do new void[](n), is that buffer allocated with an alignment of 1 or what are the guarantees? How can I set an alignment? Also, is the alignment of any type guaranteed to be a power of 2?

In practice, it's not necessarily a power of 2, but it's at least 16 bytes. In general there are very few types (maybe vectors?) that need alignment more than 16 bytes.

The list of bit sizes is currently here: https://github.com/dlang/dmd/blob/82870e890f6f0e0dca3e8f0032a7819416319124/druntime/src/core/internal/gc/impl/conservative/gc.d#L1392-L1414

-Steve

October 01, 2022

On Saturday, 1 October 2022 at 01:37:00 UTC, Steven Schveighoffer wrote:

> On 9/30/22 11:57 AM, Quirin Schroll wrote:
>
> > Also, is the alignment of any type guaranteed to be a power of 2?
>
> In practice, it's not necessarily a power of 2, but it's at least 16 bytes.

Types always require some power of 2 alignment (on any sensible platform, anyway), and it is usually less than 16 bytes - typically size_t.sizeof.

The fact that the current GC implementation apparently has a minimum block size of 16 bytes, and that minimum size blocks are always size-aligned, is not guaranteed by the public API, and should not be relied upon when requesting memory for something that the type system says only requires an alignment of void.alignof == 1.

D and C both have formal ways to communicate alignment requirements to the allocator; people should use them and not constrain all future D GC development to conform to undocumented details of the current implementation.
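The claim that type alignments are powers of 2, and usually below 16 bytes, can be checked at compile time for whatever target you build on. This is only a spot check over a hand-picked list. Mixed, Samples and hasPow2Alignment are hypothetical names introduced for the example.

```d
import std.math : isPowerOf2;
import std.meta : AliasSeq;

// Hypothetical aggregate mixing fields with different alignments.
struct Mixed
{
    byte a;
    double b;
    int[3] c;
}

alias Samples = AliasSeq!(byte, short, int, long, float, double, real, void*, Mixed);

// True if T's alignment is a power of 2 on the current target.
enum hasPow2Alignment(T) = isPowerOf2(T.alignof);

// Compilation fails, naming the type, if any sample breaks the rule.
static foreach (T; Samples)
    static assert(hasPow2Alignment!T, T.stringof);

// An aggregate takes the alignment of its most demanding field.
static assert(Mixed.alignof == double.alignof);
```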

> In general there are very few types (maybe vectors?) that need alignment more than 16 bytes.

256 bit SIMD (AVX/AVX2) and 512 bit SIMD (AVX512) __vectors should be .sizeof aligned (32 and 64 bytes, respectively). Memory used for inter-thread communication (such as mutexes) may perform significantly better if cache line aligned (typically 64 bytes, but CPU dependent).

I don't know any other examples off the top of my head.

> The list of bit sizes is currently here:

I'm pretty sure those are in bytes not bits.

> https://github.com/dlang/dmd/blob/82870e890f6f0e0dca3e8f0032a7819416319124/druntime/src/core/internal/gc/impl/conservative/gc.d#L1392-L1414

That's not a list of alignments, it is block sizes for some GC memory pools. The alignment of each block depends on the alignment of its pool, not just its size.

It's not immediately obvious from the context, but I suspect the pools are actually page aligned, which would mean that the non power of 2 sized blocks are not consistently aligned to their own sizes.

Regardless, it's not part of the public API, so it could change without warning.
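The point about page-aligned pools and non power of 2 block sizes can be worked through with plain arithmetic. The sketch assumes a 4096-byte page and a 48-byte block size. Both numbers are chosen for illustration and are not taken from the GC source, and the code models addresses only, without touching the GC.

```d
enum size_t pageSize = 4096; // assumed page size
enum size_t blockSize = 48;  // assumed non power of 2 block size

// Address of the k-th block in a pool that starts at poolBase.
size_t blockAddress(size_t poolBase, size_t k)
{
    return poolBase + k * blockSize;
}

// Largest power of 2 that divides a non-zero address.
size_t naturalAlignment(size_t address)
{
    return address & (~address + 1);
}

unittest
{
    enum size_t poolBase = pageSize; // a page-aligned pool
    foreach (k; 0 .. pageSize / blockSize)
    {
        const addr = blockAddress(poolBase, k);
        assert(addr % 16 == 0);         // every block is 16-byte aligned
        assert(addr % blockSize == 16); // but none is a multiple of its own size
    }
}
```

In this model the alignment of a block varies with its index (4096 for block 0, 16 for block 1, 32 for block 2), and only the 16-byte floor is common to all of them.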

October 01, 2022

On 10/1/22 12:57 AM, tsbockman wrote:

> On Saturday, 1 October 2022 at 01:37:00 UTC, Steven Schveighoffer wrote:
>
> > The list of bit sizes is currently here:
>
> I'm pretty sure those are in bytes not bits.

Yes, I meant bytes, sorry.

> That's not a list of alignments, it is block sizes for some GC memory pools. The alignment of each block depends on the alignment of its pool, not just its size.

Pools are all page multiples. Each pool is split equally into bin sizes, from that list.

> Regardless, it's not part of the public API, so it could change without warning.

Hence the "in practice" qualifier. It is theoretically possible for a GC implementation to use smaller bin sizes, but it will never happen.

Consider that in small bins ( < 1 page ), no two bins are concatenated together. So if you had a bin of size 1, it means you would only allocate one byte blocks, never combining them.

Again, these are all implementation details that will likely never change.

-Steve