April 10, 2012 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 08/04/2012 16:52, Andrei Alexandrescu wrote:
> On 4/8/12 4:54 AM, Manu wrote:
>> On 8 April 2012 12:46, Vladimir Panteleev <vladimir@thecybershadow.net
>> <mailto:vladimir@thecybershadow.net>> wrote:
>>
>> On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
>>
>> Walter and I discussed today about using the small string
>> optimization in string and other arrays of immutable small objects.
>>
>> On 64 bit machines, string occupies 16 bytes. We could use the
>> first byte as discriminator, which means that all strings under
>> 16 chars need no memory allocation at all.
>>
>>
>> Don't use the first byte. Use the last byte.
>>
>> The last byte is the highest-order byte of the length. Limiting
>> arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
>> nicer limitation than making assumptions about the memory layout.
>>
>>
>> What is the plan for 32bit?
>
> We can experiment with making strings shorter than 8 chars in-situ. The
> drawback will be that length will be limited to 29 bits, i.e. 512MB.
>
> Andrei
>
>
Since it is only a flag, why not limit the string size to 2GB instead of 512MB?
|
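The size limits quoted in this exchange can be checked with a little arithmetic. The sketch below is only an illustration: `lengthLimit` and `limitWithReservedTopByte` are names invented here, and reading the 18.37 exabyte figure as "the value 0xFF in the highest-order byte is reserved as the marker" is an assumption that happens to reproduce the quoted number, not something the posts spell out.

```d
/// Exclusive upper bound on a length that has `bits` usable bits.
/// (Hypothetical helper, named for this example only.)
ulong lengthLimit(uint bits) pure nothrow @nogc @safe
{
    assert(bits < 64);
    return 1UL << bits;
}

// 32-bit case: 29 usable bits give 512 MiB, 31 usable bits give 2 GiB.
static assert(lengthLimit(29) == 512UL * 1024 * 1024);
static assert(lengthLimit(31) == 2UL * 1024 * 1024 * 1024);

/// 64-bit case, ASSUMING the top-byte value 0xFF is the reserved marker:
/// every valid length must then stay below 0xFF00_0000_0000_0000.
enum ulong limitWithReservedTopByte = 0xFFUL << 56;

// Expressed in hundredths of an exabyte (1 EB = 10^18 bytes):
static assert(limitWithReservedTopByte / 10_000_000_000_000_000UL == 1837); // 18.37 EB
static assert(ulong.max / 10_000_000_000_000_000UL == 1844);                // 18.44..., i.e. about 18.45 EB
```

Under that reading, a single flag bit on a 32-bit target would indeed leave 31 bits of length, which is the 2GB figure asked about above.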
April 10, 2012 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Artur Skawina | On Tue, 10 Apr 2012 10:50:24 +0200, Artur Skawina <art.08.09@gmail.com> wrote:
> Obviously, yes, but should wait until enough attribute support is in place and not be just a @inline hack.
If you are referring to the proposed user attributes, they won't change the operation of the compiler. Only your own program code will know how to use them. @inline, @safe, @property, final, nothrow, ... on the other hand are keywords that map directly to flags and hard-wired logic in the compiler. Correct me if I'm wrong.
--
Marco
|
April 10, 2012 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Marco Leise | On 04/10/12 19:25, Marco Leise wrote:
> On Tue, 10 Apr 2012 10:50:24 +0200,
> Artur Skawina <art.08.09@gmail.com> wrote:
>
>> Obviously, yes, but should wait until enough attribute support is in place and not be just a @inline hack.
>
> If you refer to the proposed user attributes, they won't change the operation of the compiler. Only your own program code will know how to use them. @inline, @safe, @property, final, nothrow, ... on the other hand are keywords that directly map to flags and hard wired logic in the compiler. Correct me if I'm wrong.
I'm saying that introducing new function attributes like @inline to the language, when there's a real possibility of "generic" attributes being invented in the near future, may not be a good idea. Any generic scheme should also work for @inline and the many other attributes that I've mentioned before; there's no reason to artificially limit the support to *just* user attributes.
artur
|
April 10, 2012 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Artur Skawina | On Tue, 10 Apr 2012 20:52:56 +0200, Artur Skawina <art.08.09@gmail.com> wrote:
> I'm saying that introducing new function attributes like @inline to the language, when there's a real possibility of "generic" attributes being invented in the near future, may not be a good idea. Any generic scheme should also work for @inline and the many other attrs that I've mentioned before - there's no reason to artificially limit the support to *just* user attributes.
>
> artur
I had to read up on your older posts again. So you are not expecting compiler hooks that allow changing the language semantics and code generation through user attributes, but rather a common syntax, especially for bundling multiple compiler/user attributes like "@attr(safe, nothrow, userattr(abc), inline, ...) my_attr_alias", in case there end up being a lot of platform-specific and other pragmas/attributes/keywords in the future, as in GCC? Then I tend to agree.
--
Marco
|
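The distinction drawn in these posts, between attributes that are hard-wired into the compiler and user attributes that only program code can interpret, can be illustrated with the user-defined attributes available in present-day D. This is a minimal sketch, not the scheme proposed in the thread: `inlineHint` is a hypothetical marker type invented for the example, and the compiler attaches no inlining behaviour to it.

```d
import std.traits : hasUDA;

/// Hypothetical marker type; the compiler gives it no meaning of its own.
struct inlineHint {}

/// @safe, pure, nothrow and @nogc are checked by the compiler itself.
/// @inlineHint is merely recorded on the symbol.
@inlineHint int twice(int x) @safe pure nothrow @nogc
{
    return 2 * x;
}

/// Same built-in attributes, no user attribute.
int thrice(int x) @safe pure nothrow @nogc
{
    return 3 * x;
}

// Only code that explicitly asks can see the user attribute.
static assert(hasUDA!(twice, inlineHint));
static assert(!hasUDA!(thrice, inlineHint));
```

Both functions compile to ordinary calls; any effect of `inlineHint` would have to come from library or build-tool code that inspects it, which matches the point that user attributes alone do not change what the compiler does.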
April 16, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
> Andrei
Has anybody put together code that implements this idea in a library?
That is, a small string of up to 15 bytes unioned with a `string`.
|
April 16, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
> On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
>> Walter and I discussed today about using the small string optimization in string and other arrays of immutable small objects.
>>
>> On 64 bit machines, string occupies 16 bytes. We could use the first byte as discriminator, which means that all strings under 16 chars need no memory allocation at all.
>
> Don't use the first byte. Use the last byte.
>
> The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.
If the length field serves multiple purposes, it would be even better to reserve more than just one bit. For all practical purposes, 48 or 56 bits are more than enough to represent any possible length. This would free up 8 or even 16 bits that can be used for other purposes.
|
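As a rough illustration of the suggestion above, the following sketch keeps a 48-bit length in the low bits of a 64-bit word and uses the upper 16 bits as a free tag field. All names (`lengthBits`, `packLength`, `unpackLength`, `unpackTag`) are invented for this example; nothing here reflects how D's built-in arrays actually store their length.

```d
enum uint lengthBits = 48;                        // bits kept for the length
enum ulong lengthMask = (1UL << lengthBits) - 1;  // low 48 bits set

/// Combine a length (at most 2^48 - 1) with a 16-bit tag in one word.
ulong packLength(ulong length, ushort tag) pure nothrow @nogc @safe
{
    assert(length <= lengthMask, "length does not fit in 48 bits");
    return (cast(ulong) tag << lengthBits) | length;
}

/// Recover the length by masking off the tag bits.
ulong unpackLength(ulong word) pure nothrow @nogc @safe
{
    return word & lengthMask;
}

/// Recover the tag from the upper 16 bits.
ushort unpackTag(ulong word) pure nothrow @nogc @safe
{
    return cast(ushort)(word >> lengthBits);
}
```

The cost of such a layout is that every read of the length needs a mask, which is the kind of overhead the thread weighs against the allocation savings.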
April 17, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
> Walter and I discussed today about using the small string optimization in string and other arrays of immutable small objects.
I put together SSOString at
https://github.com/nordlow/phobos-next/blob/967eb1088fbfab8be5ccd811b66e7b5171b46acf/src/sso_string.d
that uses small-string optimization on top of a normal D string (slice).
I'm satisfied with everything except that -dip1000 doesn't forbid `f` from compiling.
I also don't understand why `x[0]` cannot be returned by ref in the function `g`.
Comments are welcome.
The contents of sso_string.d follow:

    module sso_string;

    /** Small-size-optimized string.
     *
     * Store on the stack if constructed with <= `smallCapacity` number of
     * characters, otherwise on the GC heap.
     */
    struct SSOString
    {
        private alias E = immutable(char); // immutable element type
        private alias ME = char;           // mutable element type

        pure nothrow:

        /** Construct from `elements`, with potential GC-allocation (iff
         * `elements.length > smallCapacity`).
         */
        this()(scope ME[] elements) @trusted // template-lazy
        {
            if (elements.length <= smallCapacity)
            {
                small.data[0 .. elements.length] = elements;
                small.length = cast(typeof(small.length))(2*elements.length);
            }
            else
            {
                large = elements.idup; // GC-allocate
                raw.length *= 2;       // shift up
                raw.length |= 1;       // tag as large
            }
        }

        @nogc:

        // TODO add @nogc overload to construct from mutable static array <= smallCapacity

        /** Construct from `elements` without any kind of heap allocation.
         */
        this()(immutable(E)[] elements) @trusted // template-lazy
        {
            if (elements.length <= smallCapacity)
            {
                small.data[0 .. elements.length] = elements;
                small.length = cast(typeof(small.length))(2*elements.length);
            }
            else
            {
                large = elements;      // @nogc
                raw.length *= 2;       // shift up
                raw.length |= 1;       // tag as large
            }
        }

        @property size_t length() const @trusted
        {
            if (isLarge)
            {
                return large.length/2; // skip first bit
            }
            else
            {
                return small.length/2; // skip first bit
            }
        }

        scope ref inout(E) opIndex(size_t index) inout return @trusted
        {
            return opSlice()[index]; // automatic range checking
        }

        scope inout(E)[] opSlice() inout return @trusted
        {
            if (isLarge)
            {
                union RawLarge
                {
                    Raw raw;
                    Large large;
                }
                RawLarge copy = void;
                copy.large = cast(Large)large;
                copy.raw.length /= 2; // adjust length
                return copy.large;
            }
            else
            {
                return small.data[0 .. small.length/2]; // scoped
            }
        }

        private @property bool isLarge() const @trusted
        {
            return large.length & 1; // first bit discriminates small from large
        }

    private:
        struct Raw // same memory layout as `E[]`
        {
            size_t length; // can be bit-fiddled without GC allocation
            E* ptr;
        }

        alias Large = E[];

        enum smallCapacity = Large.sizeof - Small.length.sizeof;
        static assert(smallCapacity > 0, "No room for small elements for E being " ~ E.stringof);

        version(LittleEndian) // see: http://forum.dlang.org/posting/zifyahfohbwavwkwbgmw
        {
            struct Small
            {
                ubyte length;
                E[smallCapacity] data;
            }
        }
        else
        {
            static assert(0, "BigEndian support and test");
        }

        union
        {
            Raw raw;
            Large large;
            Small small;
        }
    }

    ///
    @safe pure nothrow @nogc unittest
    {
        import container_traits : mustAddGCRange;

        alias S = SSOString;

        static assert(S.sizeof == 2*size_t.sizeof); // two words
        static assert(S.smallCapacity == 15);
        static assert(mustAddGCRange!S); // `Large large.ptr` must be scanned

        auto s0 = S.init;
        assert(s0.length == 0);
        assert(!s0.isLarge);
        assert(s0[] == []);

        const s7 = S("0123456");
        static assert(is(typeof(s7[]) == string));
        assert(!s7.isLarge);
        assert(s7.length == 7);
        assert(s7[] == "0123456");
        // TODO assert(s7[0 .. 4] == "0123");

        const s15 = S("012345678901234");
        static assert(is(typeof(s15[]) == string));
        assert(!s15.isLarge);
        assert(s15.length == 15);
        assert(s15[] == "012345678901234");

        const s16 = S("0123456789abcdef");
        static assert(is(typeof(s16[]) == string));
        assert(s16.isLarge);
        assert(s16.length == 16);
        assert(s16[] == "0123456789abcdef");
        assert(s16[0] == '0');
        assert(s16[10] == 'a');
        assert(s16[15] == 'f');

        // TODO static assert(!__traits(compiles, { auto _ = S((char[]).init); }));

        string f() @safe pure nothrow @nogc
        {
            S x;
            return x[]; // TODO should fail with -dip1000
        }

        // TODO activate
        // ref char g() @safe pure nothrow @nogc
        // {
        //     S x;
        //     return x[0]; // TODO should fail with -dip1000
        // }
    }
|
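The tagging scheme in the posted code stores twice the length and uses the lowest bit to mark the large (heap) case. The sketch below isolates just that encoding, along with the reason the layout is restricted to little-endian targets: the one-byte `Small.length` has to overlay the low-order byte of the word-sized `Raw.length`. The helper names (`encodeLength`, `decodeLength`, `isLargeTag`, `firstByte`) are invented here and are not part of sso_string.d.

```d
/// Shift the length up by one bit and put the "large" flag in bit 0.
size_t encodeLength(size_t length, bool isLarge) pure nothrow @nogc @safe
{
    return (length << 1) | (isLarge ? 1 : 0);
}

/// Undo the shift to get the real length back.
size_t decodeLength(size_t raw) pure nothrow @nogc @safe
{
    return raw >> 1;
}

/// Bit 0 discriminates small (0) from large (1).
bool isLargeTag(size_t raw) pure nothrow @nogc @safe
{
    return (raw & 1) != 0;
}

/// First byte in memory of a word. On a little-endian target this is the
/// low-order byte, i.e. the byte a leading `ubyte length` field would overlay.
ubyte firstByte(size_t word) pure nothrow @nogc @trusted
{
    union Overlay
    {
        size_t word;
        ubyte[size_t.sizeof] bytes;
    }
    Overlay o;
    o.word = word;
    return o.bytes[0];
}
```

On a little-endian machine, `firstByte(encodeLength(16, true)) & 1` is 1, so checking the low bit of either the byte-sized or the word-sized length field gives the same answer. On a big-endian machine the first byte would be the high-order byte instead, which is presumably why the posted code rejects that configuration with a `static assert`.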
April 18, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
> On 4/8/12 1:33 AM, Daniel Murphy wrote:
>> - Would generate false pointers
> Fair point but we're also moving to precise collection :o).
I don't know of a good generic way to do precise collection with unions.
|
April 19, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright
| On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <digitalmars-d@puremagic.com> wrote:
> On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
> > On 4/8/12 1:33 AM, Daniel Murphy wrote:
> >> - Would generate false pointers
> > Fair point but we're also moving to precise collection :o).
>
> I don't know of a good generic way to do precise collection with unions.
I wonder if precise collectors could leverage runtime support for ambiguous cases? opPreciseCollect() which might return an array of pointers contained in T, which would allow runtime logic to determine how the union should be interpreted... Or maybe the function should receive a delegate which the function should call on each embedded pointer.
I'm sure some standardised runtime support function can help out in these cases...
|
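The delegate-based variant of this idea might look roughly like the sketch below. It is purely hypothetical: `visitPointers` is a name made up for this example, no such hook exists in druntime, and the collector side that would call it is not shown. The struct `SmallOrHeap` and the helper `countPointers` are likewise invented to keep the example self-contained.

```d
/// A value that is either a heap pointer or inline bytes, with an
/// explicit flag saying which union member is currently live.
struct SmallOrHeap
{
    union
    {
        const(char)* heapPtr;             // meaningful only when isHeap is true
        char[(char*).sizeof] inlineData;  // raw characters otherwise
    }
    bool isHeap;

    /// HYPOTHETICAL hook, not a druntime API: report each live embedded
    /// pointer to `sink`, using the type's own knowledge of the union.
    void visitPointers(scope void delegate(const(void)*) sink) const @trusted
    {
        if (isHeap)
            sink(heapPtr);
    }
}

/// Stand-in for what a precise collector might do with the hook:
/// here it only counts the pointers that were reported.
size_t countPointers(ref const(SmallOrHeap) value)
{
    size_t n = 0;
    value.visitPointers((const(void)* p) { ++n; });
    return n;
}
```

With this shape, a value holding inline characters reports no pointers at all, so bytes that merely look like an address would not be mistaken for one, which is the false-pointer concern raised earlier in the thread.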
April 19, 2018 Re: Small Buffer Optimization for string and friends | ||||
---|---|---|---|---|
| ||||
| On Wed., 18 Apr. 2018, 8:36 pm Manu, <turkeyman@gmail.com> wrote:
> On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <digitalmars-d@puremagic.com> wrote:
>
>> On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
>> > On 4/8/12 1:33 AM, Daniel Murphy wrote:
>> >> - Would generate false pointers
>> > Fair point but we're also moving to precise collection :o).
>>
>> I don't know of a good generic way to do precise collection with unions.
>
> I wonder if precise collectors could leverage runtime support for ambiguous cases? opPreciseCollect() which might return an array of pointers contained in T, which would allow runtime logic to determine how the union should be interpreted... Or maybe the function should receive a delegate which the function should call on each embedded pointer.
>
> I'm sure some standardised runtime support function can help out in these cases...
This would be useful too for applications that use bit-packed or encoded/implied pointers...
|
Copyright © 1999-2021 by the D Language Foundation