Small Buffer Optimization for string and friends

On 08/04/2012 16:52, Andrei Alexandrescu wrote:
> On 4/8/12 4:54 AM, Manu wrote:
>> On 8 April 2012 12:46, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
>>
>>     On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
>>
>>         Walter and I discussed today about using the small string
>>         optimization in string and other arrays of immutable small objects.
>>
>>         On 64 bit machines, string occupies 16 bytes. We could use the
>>         first byte as discriminator, which means that all strings under
>>         16 chars need no memory allocation at all.
>>
>>     Don't use the first byte. Use the last byte.
>>
>>     The last byte is the highest-order byte of the length. Limiting
>>     arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
>>     nicer limitation than making assumptions about the memory layout.
>>
>> What is the plan for 32bit?
>
> We can experiment with making strings shorter than 8 chars in-situ. The
> drawback will be that length will be limited to 29 bits, i.e. 512MB.
>
> Andrei

Since it is only a flag, why not limit the string size to 2GB instead of 512MB?
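The size limits quoted in this exchange can be checked with a little arithmetic. The following is only a sketch: the names `MiB`, `GiB`, `taggedLimit` and `e16` are made up for illustration, and the 64-bit part assumes that the single top-byte value 0xFF is what marks an in-situ string, which the posts above do not spell out.

```d
// Sketch only: compile-time checks of the limits quoted in the thread.
enum ulong MiB = 1UL << 20;
enum ulong GiB = 1UL << 30;

// 32-bit: reserving three bits of the length word leaves 29 bits.
static assert((1UL << 29) == 512 * MiB);
// 32-bit: reserving only a single flag bit leaves 31 bits.
static assert((1UL << 31) == 2 * GiB);

// 64-bit (assumption): if the top-byte value 0xFF tags an in-situ string,
// heap lengths must stay below 0xFF00_0000_0000_0000.
enum ulong taggedLimit = 0xFFUL << 56;
enum ulong e16 = 10_000_000_000_000_000; // 10^16

static assert(ulong.max / e16 == 1844);   // roughly 18.45e18 bytes untagged
static assert(taggedLimit / e16 == 1837); // roughly 18.37e18 bytes with the tag

void main() {}
```

Seen this way, the 2GB suggestion corresponds to spending one bit of the 32-bit length word on the flag rather than three.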

On Tue, 10 Apr 2012 10:50:24 +0200, Artur Skawina <art.08.09@gmail.com> wrote:
> Obviously, yes, but should wait until enough attribute support is in place and not be just a @inline hack.

If you are referring to the proposed user attributes, they won't change the operation of the compiler. Only your own program code will know how to use them. @inline, @safe, @property, final, nothrow, ... on the other hand are keywords that map directly to flags and hard-wired logic in the compiler. Correct me if I'm wrong.

--
Marco

On 04/10/12 19:25, Marco Leise wrote:
> On Tue, 10 Apr 2012 10:50:24 +0200, Artur Skawina <art.08.09@gmail.com> wrote:
>
>> Obviously, yes, but should wait until enough attribute support is in place and not be just a @inline hack.
>
> If you are referring to the proposed user attributes, they won't change the operation of the compiler. Only your own program code will know how to use them. @inline, @safe, @property, final, nothrow, ... on the other hand are keywords that map directly to flags and hard-wired logic in the compiler. Correct me if I'm wrong.

I'm saying that introducing new function attributes like @inline to the language, when there's a real possibility of "generic" attributes being invented in the near future, may not be a good idea. Any generic scheme should also work for @inline and the many other attrs that I've mentioned before; there's no reason to artificially limit the support to *just* user attributes.

artur

On Tue, 10 Apr 2012 20:52:56 +0200, Artur Skawina <art.08.09@gmail.com> wrote:
> I'm saying that introducing new function attributes like @inline to the language, when there's a real possibility of "generic" attributes being invented in the near future, may not be a good idea. Any generic scheme should also work for @inline and the many other attrs that i've mentioned before - there's no reason to artificially limit the support to *just* user attributes.
>
> artur

I had to read up on your older posts again. So you are not expecting compiler hooks that allow changing the language semantics and code gen through user attributes, but a common syntax, especially for bundling multiple compiler/user attributes like "@attr(safe, nothrow, userattr(abc), inline, ...) my_attr_alias", in the event that there will be a lot of platform-specific and other pragmas/attributes/keywords like in GCC in the future? Then I tend to agree.

--
Marco

On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
> Andrei

Has anybody put together code that implements this idea in a library? That is, a small string of up to 15 bytes unioned with a `string`.

On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
> On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
>> Walter and I discussed today about using the small string optimization in string and other arrays of immutable small objects.
>>
>> On 64 bit machines, string occupies 16 bytes. We could use the first byte as discriminator, which means that all strings under 16 chars need no memory allocation at all.
>
> Don't use the first byte. Use the last byte.
>
> The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

If the length field is going to serve more than one purpose, it would be even better to reserve more than just one bit. For all practical purposes, 48 or 56 bits are more than enough to represent every possible length. That would free up 8 or even 16 bits for other uses.
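As a rough illustration of that suggestion, the sketch below keeps a 48-bit length in the low bits of a 64-bit word and leaves the upper 16 bits free. Everything here (`lengthBits`, `pack`, `lengthOf`, `spareOf`) is hypothetical naming for the example; it is not how D slices are laid out today.

```d
// Hypothetical layout: low 48 bits hold the length, high 16 bits are spare.
enum uint lengthBits = 48;
enum ulong lengthMask = (1UL << lengthBits) - 1;

// Combine a length and 16 spare bits into one 64-bit word.
ulong pack(ulong length, ushort spare) pure nothrow @nogc @safe
{
    assert(length <= lengthMask, "length does not fit in 48 bits");
    return (cast(ulong) spare << lengthBits) | length;
}

// Recover the length by masking off the spare bits.
ulong lengthOf(ulong word) pure nothrow @nogc @safe
{
    return word & lengthMask;
}

// Recover the spare bits by shifting the length away.
ushort spareOf(ulong word) pure nothrow @nogc @safe
{
    return cast(ushort)(word >> lengthBits);
}

void main()
{
    immutable word = pack(1234, 0xBEEF);
    assert(lengthOf(word) == 1234);
    assert(spareOf(word) == 0xBEEF);
}
```

The cost of such a scheme is that every read of the length needs a mask, which is the same kind of overhead the tagged-length approach already accepts.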

April 17, 2018

Re: Small Buffer Optimization for string and friends

Posted by Per Nordlöw
in reply to Andrei Alexandrescu

On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
> Walter and I discussed today about using the small string optimization in string and other arrays of immutable small objects.

I put together SSOString at

https://github.com/nordlow/phobos-next/blob/967eb1088fbfab8be5ccd811b66e7b5171b46acf/src/sso_string.d

that uses small-string-optimization on top of a normal D string (slice).

I'm satisfied with everything except that -dip1000 doesn't forbid `f` from compiling.

I also don't understand why `x[0]` cannot be returned by ref in the function `g`.

Comments are welcome.

The contents of sso_string.d follow:

module sso_string;

/** Small-size-optimized string.
 *
 * Store on the stack if constructed with <= `smallCapacity` number of
 * characters, otherwise on the GC heap.
 */
struct SSOString
{
    private alias E = immutable(char); // immutable element type
    private alias ME = char;           // mutable element type

    pure nothrow:

    /** Construct from `elements`, with potential GC-allocation (iff
     * `elements.length > smallCapacity`).
     */
    this()(scope ME[] elements) @trusted // template-lazy
    {
        if (elements.length <= smallCapacity)
        {
            small.data[0 .. elements.length] = elements;
            small.length = cast(typeof(small.length))(2*elements.length);
        }
        else
        {
            large = elements.idup; // GC-allocate
            raw.length *= 2;  // shift up
            raw.length |= 1;  // tag as large
        }
    }

    @nogc:

    // TODO add @nogc overload to construct from mutable static array <= smallCapacity

    /** Construct from `elements` without any kind of heap allocation.
     */
    this()(immutable(E)[] elements) @trusted // template-lazy
    {
        if (elements.length <= smallCapacity)
        {
            small.data[0 .. elements.length] = elements;
            small.length = cast(typeof(small.length))(2*elements.length);
        }
        else
        {
            large = elements;   // @nogc
            raw.length *= 2;    // shift up
            raw.length |= 1;    // tag as large
        }
    }

    @property size_t length() const @trusted
    {
        if (isLarge)
        {
            return large.length/2; // skip first bit
        }
        else
        {
            return small.length/2; // skip first bit
        }
    }

    scope ref inout(E) opIndex(size_t index) inout return @trusted
    {
        return opSlice()[index]; // automatic range checking
    }

    scope inout(E)[] opSlice() inout return @trusted
    {
        if (isLarge)
        {
            union RawLarge
            {
                Raw raw;
                Large large;
            }
            RawLarge copy = void;
            copy.large = cast(Large)large;
            copy.raw.length /= 2; // adjust length
            return copy.large;
        }
        else
        {
            return small.data[0 .. small.length/2]; // scoped
        }
    }

    private @property bool isLarge() const @trusted
    {
        return large.length & 1; // first bit discriminates small from large
    }

private:
    struct Raw                  // same memory layout as `E[]`
    {
        size_t length;          // can be bit-fiddled without GC allocation
        E* ptr;
    }

    alias Large = E[];

    enum smallCapacity = Large.sizeof - Small.length.sizeof;
    static assert(smallCapacity > 0, "No room for small elements for E being " ~ E.stringof);
    version(LittleEndian) // see: http://forum.dlang.org/posting/zifyahfohbwavwkwbgmw
    {
        struct Small
        {
            ubyte length;
            E[smallCapacity] data;
        }
    }
    else
    {
        static assert(0, "BigEndian support and test");
    }

    union
    {
        Raw raw;
        Large large;
        Small small;
    }
}

///
@safe pure nothrow @nogc unittest
{
    import container_traits : mustAddGCRange;
    alias S = SSOString;

    static assert(S.sizeof == 2*size_t.sizeof); // two words
    static assert(S.smallCapacity == 15);
    static assert(mustAddGCRange!S); // `Large large.ptr` must be scanned

    auto s0 = S.init;
    assert(s0.length == 0);
    assert(!s0.isLarge);
    assert(s0[] == []);

    const s7 = S("0123456");
    static assert(is(typeof(s7[]) == string));
    assert(!s7.isLarge);
    assert(s7.length == 7);
    assert(s7[] == "0123456");
    // TODO assert(s7[0 .. 4] == "0123");

    const s15 = S("012345678901234");
    static assert(is(typeof(s15[]) == string));
    assert(!s15.isLarge);
    assert(s15.length == 15);
    assert(s15[] == "012345678901234");

    const s16 = S("0123456789abcdef");
    static assert(is(typeof(s16[]) == string));
    assert(s16.isLarge);
    assert(s16.length == 16);
    assert(s16[] == "0123456789abcdef");
    assert(s16[0] == '0');
    assert(s16[10] == 'a');
    assert(s16[15] == 'f');

    // TODO static assert(!__traits(compiles, { auto _ = S((char[]).init); }));

    string f() @safe pure nothrow @nogc
    {
        S x;
        return x[];             // TODO should fail with -dip1000
    }

    // TODO activate
    // ref char g() @safe pure nothrow @nogc
    // {
    //     S x;
    //     return x[0];             // TODO should fail with -dip1000
    // }
}
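The length tagging that SSOString relies on can be looked at in isolation. This is a minimal sketch rather than part of sso_string.d: the helper names are invented for the example, and it models only the shared length word, not the union itself. Doubling the length keeps the lowest bit free for the small/large tag, at the price of one bit of length range.

```d
// Sketch of the low-bit tagging applied to the shared length word.
// Helper names are hypothetical and do not appear in sso_string.d.

// Small strings store 2*n, so the lowest bit stays 0.
size_t encodeSmall(size_t n) pure nothrow @nogc @safe { return 2 * n; }

// Large strings store 2*n + 1, so the lowest bit is 1.
size_t encodeLarge(size_t n) pure nothrow @nogc @safe { return 2 * n | 1; }

// The lowest bit discriminates small from large.
bool isLargeTag(size_t word) pure nothrow @nogc @safe { return (word & 1) != 0; }

// Halving drops the tag bit and yields the real length in both cases.
size_t decodeLength(size_t word) pure nothrow @nogc @safe { return word / 2; }

void main()
{
    immutable small = encodeSmall(15); // fits in a ubyte: 30
    assert(small == 30);
    assert(!isLargeTag(small));
    assert(decodeLength(small) == 15);

    immutable large = encodeLarge(16);
    assert(large == 33);
    assert(isLargeTag(large));
    assert(decodeLength(large) == 16);
}
```

On a little-endian target the lowest byte of the length word is also the first byte of the struct, which is why the `ubyte length` of the small layout can carry the same tag.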

On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
> On 4/8/12 1:33 AM, Daniel Murphy wrote:
>> - Would generate false pointers
> Fair point but we're also moving to precise collection :o).

I don't know of a good generic way to do precise collection with unions.

On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <digitalmars-d@puremagic.com> wrote:
> On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
> > On 4/8/12 1:33 AM, Daniel Murphy wrote:
> >> - Would generate false pointers
> > Fair point but we're also moving to precise collection :o).
>
> I don't know of a good generic way to do precise collection with unions.

I wonder if precise collectors could leverage runtime support for ambiguous cases? Perhaps an opPreciseCollect() that returns an array of the pointers contained in T, which would allow runtime logic to determine how the union should be interpreted... Or maybe the function should receive a delegate, which it calls on each embedded pointer.

I'm sure some standardised runtime support function can help out in these cases...
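To make the delegate variant of that idea concrete, here is a minimal sketch. It is purely hypothetical: `opPreciseCollect` is not an existing druntime hook, the `SmallOrLarge` type and its explicit `isLarge` flag are invented for the example, and `main` plays the role that a collector would play.

```d
// Hypothetical: `opPreciseCollect` is NOT an existing druntime hook.
struct SmallOrLarge
{
    union
    {
        const(char)[] large; // holds a pointer only when isLarge is true
        ubyte[16] small;     // raw in-situ bytes otherwise
    }
    bool isLarge;

    // The type itself decides how to read the union and reports each
    // live pointer to the visitor supplied by the (imagined) collector.
    void opPreciseCollect(scope void delegate(const(void)*) visit) const @trusted
    {
        if (isLarge)
            visit(large.ptr);
    }
}

void main()
{
    SmallOrLarge s;
    size_t visited;
    auto count = (const(void)* p) { ++visited; };

    s.opPreciseCollect(count); // small: nothing to report
    assert(visited == 0);

    s.large = "a string too long for the in-situ buffer";
    s.isLarge = true;
    s.opPreciseCollect(count); // large: exactly one pointer reported
    assert(visited == 1);
}
```

The delegate form avoids allocating an array of pointers during collection, which may matter if such a hook were ever called from inside the collector.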

On Wed., 18 Apr. 2018, 8:36 pm Manu, <turkeyman@gmail.com> wrote:
> On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <digitalmars-d@puremagic.com> wrote:
>
>> On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
>> > On 4/8/12 1:33 AM, Daniel Murphy wrote:
>> >> - Would generate false pointers
>> > Fair point but we're also moving to precise collection :o).
>>
>> I don't know of a good generic way to do precise collection with unions.
>
> I wonder if precise collectors could leverage runtime support for ambiguous cases? opPreciseCollect() which might return an array of pointers contained in T, which would allow runtime logic to determine how the union should be interpreted... Or maybe the function should receive a delegate which the function should call on each embedded pointer.
>
> I'm sure some standardised runtime support function can help out in these cases...

This would also be useful for applications that use bit-packed or encoded/implied pointers...
