Small Buffer Optimization for string and friends (page 3) - D Programming Language Discussion Forum

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Andrei Alexandrescu
in reply to H. S. Teoh

On 4/8/12 8:54 AM, H. S. Teoh wrote:
> - Qualified keys not fully working: the current code has a few corner
>    cases that don't work with shared/immutable/inout keys. One major
>    roadblock is how to implement this:
>
> 	alias someType T;
> 	inout(T) myFunc(inout(T) arg, ...) {
> 		int[inout(T)] aa;
> 		...
> 	}

I wonder how frequently such code is used.

Andrei

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Manu
in reply to Andrei Alexandrescu

On 8 April 2012 17:52, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 4/8/12 4:54 AM, Manu wrote:
>
>> On 8 April 2012 12:46, Vladimir Panteleev <vladimir@thecybershadow.net> wrote:
>>
>>    On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
>>
>>        Walter and I discussed today about using the small string
>>        optimization in string and other arrays of immutable small objects.
>>
>>        On 64 bit machines, string occupies 16 bytes. We could use the
>>        first byte as discriminator, which means that all strings under
>>        16 chars need no memory allocation at all.
>>
>>
>>    Don't use the first byte. Use the last byte.
>>
>>    The last byte is the highest-order byte of the length. Limiting
>>    arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
>>    nicer limitation than making assumptions about the memory layout.
>>
>>
>> What is the plan for 32bit?
>>
>
> We can experiment with making strings shorter than 8 chars in-situ. The drawback will be that length will be limited to 29 bits, i.e. 512MB.


29 bits? ...not 31?
How does this implementation actually work? On 32/64 bits, and little/big
endian?
I can only imagine it working with a carefully placed 1 bit. bit-0 of the
size on little endian, and bit-31 of the size on big endian. That should
only halve the address range (leaving 31 bits)... where did the other 2
bits go?

I also hope this only affects slices of chars? It will ignore this behaviour for anything other than char arrays right?

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Andrei Alexandrescu
in reply to Michel Fortin

On 4/8/12 9:59 AM, Michel Fortin wrote:
> But as soon as you take a pointer to that string, you break the
> immutability guaranty:
>
> immutable(char)[] s = "abcd";
> immutable(char)* p = s.ptr;
> s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken.

Andrei

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Andrei Alexandrescu
in reply to Manu

On 4/8/12 10:03 AM, Manu wrote:
> 29 bits? ...not 31?

Sorry, 31 indeed.

> How does this implementation actually work? On 32/64 bits, and
> little/big endian?
> I can only imagine it working with a carefully placed 1 bit. bit-0 of
> the size on little endian, and bit-31 of the size on big endian. That
> should only halve the address range (leaving 31 bits)... where did the
> other 2 bits go?

Essentially it will use either the first or the last bit of the representation as discriminator. That bit is most likely "taken" from the length representation. Shifting and masking can easily account for it when computing length of large strings.
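
The shifting and masking described above might look roughly like the following. This is only a sketch under stated assumptions: it picks bit 0 of the length word as the discriminator (the post leaves open whether the first or the last bit is used), and the names smallFlag, isSmall, maxLargeLength, encodeLargeLength and decodeLargeLength are hypothetical, not druntime API.

```d
// Hypothetical helpers; the names and the bit-0 layout are assumptions.
enum size_t smallFlag = 1; // bit 0 set means "small, stored in-situ"

bool isSmall(size_t word) pure nothrow @nogc @safe
{
    return (word & smallFlag) != 0;
}

// Largest length a large string can have once one bit is reserved:
// 2^31 - 1 on 32-bit targets, 2^63 - 1 on 64-bit targets.
enum size_t maxLargeLength = size_t.max >> 1;

size_t encodeLargeLength(size_t length) pure nothrow @nogc @safe
{
    assert(length <= maxLargeLength, "length does not fit in the remaining bits");
    return length << 1; // bit 0 stays clear, marking "large"
}

size_t decodeLargeLength(size_t word) pure nothrow @nogc @safe
{
    assert(!isSmall(word), "word describes a small string");
    return word >> 1;
}

unittest
{
    immutable word = encodeLargeLength(1000);
    assert(!isSmall(word));
    assert(decodeLargeLength(word) == 1000);
    assert(isSmall(smallFlag));
}
```

Compiling with -unittest -main runs the checks. Reserving a single bit halves the maximum length, which matches the corrected figure of 31 bits on 32-bit targets.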

> I also hope this only affects slices of chars? It will ignore this
> behaviour for anything other than char arrays right?

It works for any arrays of sufficiently small immutable data type (e.g. immutable(byte)[]), but the most advantage is reaped for string.

Andrei

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Andrei Alexandrescu
in reply to Manu

On 4/8/12 9:26 AM, Manu wrote:
> Is it realistic that anyone can actually use raw d-string's in an app
> that performs a lot of string manipulation?

Yes.

> I bet most people end up
> with a custom string class anyway...

That does happen, but much more rarely than one might think.

> Who's written a string-heavy app without their own string helper class?
> I ended up with a string class within about half an hour of trying to
> work with D strings (initially just to support stack strings, then it grew).

A lot of people write string-heavy apps with the built-in strings. "Heavy" actually describes a continuum. No matter how you put it, improving the performance of built-in strings is beneficial.

Andrei

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Jacob Carlborg
in reply to Andrei Alexandrescu

On 2012-04-08 16:54, Andrei Alexandrescu wrote:
> On 4/8/12 5:45 AM, Jacob Carlborg wrote:
>> On 2012-04-08 07:56, Andrei Alexandrescu wrote:
>>
>>> For this to happen, we need to start an effort of migrating built-in
>>> arrays into runtime, essentially making them templates that the compiler
>>> lowers to. So I have two questions:
>>
>> Just don't make the same mistake as with AA.
>
> The mistake with AAs was done long ago, but it was forced as AAs
> predated templates.
>
> Andrei
>

I'm referring to the new template implementation of AAs that got reverted due to everything breaking, if I recall correctly.

-- 
/Jacob Carlborg

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Michel Fortin
in reply to Andrei Alexandrescu

On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 4/8/12 9:59 AM, Michel Fortin wrote:
>> But as soon as you take a pointer to that string, you break the
>> immutability guaranty:
>> 
>> immutable(char)[] s = "abcd";
>> immutable(char)* p = s.ptr;
>> s = "defg"; // assigns to where?
> 
> Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken.

You know, many people have been wary of hidden memory allocations in the past. That's not going to make them happy. I'm not complaining, but I think .ptr should return null in those cases. Let people use toStringz when they need a C string, and let people deal with the ugly details themselves if they're using .ptr to bypass array bound checking. Because if someone used .ptr somewhere to bypass bound checking and instead he gets a memory allocation at each loop iteration… it won't be pretty.

And what about implicit conversions to const(char)[]? That too will require a copy, because otherwise it could point to the local stack frame where your immutable(char)[] resides. That said, maybe copies of small-string optimized immutable(char)[] could be small-string optimized const(char)[]. That'd not have any side effect since no one can have a mutable pointer/slice to the const copy anyway.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Andrei Alexandrescu
in reply to Michel Fortin

On 4/8/12 10:48 AM, Michel Fortin wrote:
> On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> said:
>
>> On 4/8/12 9:59 AM, Michel Fortin wrote:
>>> But as soon as you take a pointer to that string, you break the
>>> immutability guaranty:
>>>
>>> immutable(char)[] s = "abcd";
>>> immutable(char)* p = s.ptr;
>>> s = "defg"; // assigns to where?
>>
>> Taking .ptr will engender a copy. A small regression will be that
>> address of individual chars cannot be taken.
>
> You know, many people have been wary of hidden memory allocations in the
> past.

Well, the optimization makes for fewer allocations total. In fact .ptr does the allocation that was formerly mandatory.

> That's not going to make them happy. I'm not complaining, but I
> think .ptr should return null in those cases.

That would be too large a regression I think. And it's not detectable during compilation.

> Let people use toStringz
> when they need a C string, and let people deal with the ugly details
> themselves if they're using .ptr to bypass array bound checking. Because
> if someone used .ptr somewhere to bypass bound checking and instead he
> gets a memory allocation at each loop iteration… it won't be pretty.

Only one allocation. First invocation of .ptr effectively changes representation.
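
A rough illustration of "first invocation of .ptr changes representation", written as a library type rather than as the built-in string. Everything here is an assumption made for the sketch: the SmallString name, the 15-byte in-situ capacity, and the separate fields for the two representations (a real implementation would overlay them in 16 bytes).

```d
// Hypothetical type for illustration only; built-in strings do not work this way.
struct SmallString
{
    private enum maxSmall = 15;       // assumed in-situ capacity in bytes
    private char[maxSmall] small;     // in-situ storage
    private ubyte smallLength;        // number of valid bytes in `small`
    private immutable(char)[] large;  // heap representation once promoted
    private bool promoted;            // true once the heap form is in use

    this(string s)
    {
        if (s.length <= maxSmall)
        {
            small[0 .. s.length] = s[];
            smallLength = cast(ubyte) s.length;
        }
        else
        {
            large = s;
            promoted = true;
        }
    }

    size_t length() const
    {
        return promoted ? large.length : smallLength;
    }

    // The first call on a small string copies the bytes to the GC heap and
    // switches representation; later calls reuse that copy.
    immutable(char)* ptr()
    {
        if (!promoted)
        {
            large = small[0 .. smallLength].idup;
            promoted = true;
        }
        return large.ptr;
    }
}

unittest
{
    auto s = SmallString("abcd");
    auto p1 = s.ptr();
    auto p2 = s.ptr();
    assert(p1 is p2);              // no second allocation
    assert(p1[0 .. s.length] == "abcd");
}
```

In this sketch a loop that calls ptr() on the same variable pays for one copy. A copy of the struct made before promotion would still allocate on its own first ptr() call.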

> And what about implicit conversions to const(char)[]? That too will
> require a copy, because otherwise it could point to the local stack
> frame where your immutable(char)[] resides. That said, maybe copies of
> small-string optimized immutable(char)[] could be small-string optimized
> const(char)[]. That'd not have any side effect since no one can have a
> mutable pointer/slice to the const copy anyway.

I think casting to const(char)[] should work without allocation.


Andrei

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by Walter Bright
in reply to Andrei Alexandrescu

On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
> Once anyone asks for .ptr a conservative copy will be made.

That could get expensive. You cannot just point into the small string part, because that may only exist temporarily on the stack. There are some pathological cases for this.

April 08, 2012

Re: Small Buffer Optimization for string and friends

Posted by H. S. Teoh
in reply to Jacob Carlborg

On Sun, Apr 08, 2012 at 05:35:50PM +0200, Jacob Carlborg wrote:
> On 2012-04-08 16:54, Andrei Alexandrescu wrote:
> >On 4/8/12 5:45 AM, Jacob Carlborg wrote:
[...]
> >>Just don't make the same mistake as with AA.
> >
> >The mistake with AAs was done long ago, but it was forced as AAs predated templates.
> >
> >Andrei
> >
> 
> I'm referring to the new template implementation of AAs that got reverted due to everything breaking, if I recall correctly.
[...]

Huh? When was this?


T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet. " -- swr
