Use case: eliminate hidden allocations in buildPath (page 3) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Use case: eliminate hidden allocations in buildPath (page 3)

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by inout
in reply to Andrei Alexandrescu

inout

Posted in reply to Andrei Alexandrescu

On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei Alexandrescu wrote:
> Hello,
>
>
> Walter and I were talking about eliminating the surreptitious allocations in buildPath:
>
> http://dlang.org/phobos/std_path.html#.buildPath
>
> We'd need to keep the existing version working, so we're looking at adding one or more new overloads. We're looking at giving the user the option to control any needed memory allocation (or even arrange things such that there's no memory allocated at all).
>
> It's a generous design space, so although we have a couple of ideas let's hear others first.
>
>
> Thanks,
>
> Andrei

My approach in cases like that used to be to pass an optional char[] buffer = null as last argument (it doesn't work as nice in variadic functions though):

// constructs the result in a buffer if one is passed until there is no more space left, in which case it reallocates the buffer
char[] buildPath(const(char)[] path1, const(char)[] path2, char[] buffer = null) {
  Appender!(char) appender = useBuffer(buffer);
  ...
}

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Andrei Alexandrescu
in reply to monarch_dodra

Andrei Alexandrescu

Posted in reply to monarch_dodra

On 12/5/13 9:06 AM, monarch_dodra wrote:
> On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu wrote:
>> On 12/5/13 8:19 AM, monarch_dodra wrote:
>>> On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
>>>> Andrei
>>>
>>> Output range! :)
>>>
>>> Output range interface makes no linearity requirements. Just that:
>>> "out.put(this)" compiles.
>>
>> Hrm, construction of a hash table is linearizable so bad example on my
>> part. But I'm talking about general structured data such as objects
>> with allocated fields and connections to other objects etc. etc.
>>
>> Andrei
>
> Say, something like a graph? At that point, I'd say you'd have to pass
> an "allocating scheme" to the function, so an allocator, yes. Depending
> on the data being generated, the constructed item could carry the
> allocator itself. For example: auto newNode = myNode.GenerateNeighbor();
>
> In But I think that'd be a special case situation. For everything else,
> output range is an easy and intuitive, and fits well with the rest of
> phobos. We'd want (IMHO) to integrate the allocators directly inside the
> output ranges.

Trees and graphs are not special cases. They're bread and butter objects with fields that are in turn allocated etc.

Consider how difficult it would be (for both the implementer and the user) to fill via an output range an object (class or struct) that has a few fields that need allocation (such as strings and arrays). It becomes a mini-serialization project. We can't realistically propose that as a way to allow users to control memory allocation.


Andrei

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Dmitry Olshansky
in reply to Andrei Alexandrescu

Dmitry Olshansky

Posted in reply to Andrei Alexandrescu

05-Dec-2013 20:46, Andrei Alexandrescu пишет:
> On 12/5/13 8:19 AM, monarch_dodra wrote:
>> On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
>>> Andrei
>>
>> Output range! :)
>>
>> Output range interface makes no linearity requirements. Just that:
>> "out.put(this)" compiles.
>
> Hrm, construction of a hash table is linearizable so bad example on my
> part. But I'm talking about general structured data such as objects with
> allocated fields and connections to other objects etc. etc.

Pass desired container type that follows some implicit protocol such as isGraph!T. Then container type defines allocation scheme (=some container specific default).

As an extended version allow passing reference to an allocator to use with that container type.

-- 
Dmitry Olshansky

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Brad Anderson
in reply to monarch_dodra

Brad Anderson

Posted in reply to monarch_dodra

On Thursday, 5 December 2013 at 08:09:55 UTC, monarch_dodra wrote:
> On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei Alexandrescu wrote:
>> Hello,
>>
>>
>> Walter and I were talking about eliminating the surreptitious allocations in buildPath:
>>
>> http://dlang.org/phobos/std_path.html#.buildPath
>>
>> We'd need to keep the existing version working, so we're looking at adding one or more new overloads. We're looking at giving the user the option to control any needed memory allocation (or even arrange things such that there's no memory allocated at all).
>>
>> It's a generous design space, so although we have a couple of ideas let's hear others first.
>>
>>
>> Thanks,
>>
>> Andrei
>
> Use an output range. It's the generic D approach, and what we already do for the string functions such as std.string.translate:
> http://dlang.org/phobos/std_string.html#.translate
> (look down for the output range overloads).
>
> Anything "allocator" related should be carried by the output range itself. The function itself should not care nor know about any of that.

And thanks to your improvements[1] to put() output ranges should actually be much more usable than they ever were before. I've mentioned this a few times but I tried to add an output range overload of toUpper and toLower but immediately encountered problems with appending dchars to char arrays.  Your recently merged improvements to put() appear to have addressed that problem and many others and I'll soon have another go at adding these overloads now that that change is in place so thanks for doing the hard work.

Before this got merged I don't even think output ranges could be easily used for this improvement to buildPath so it got merged at a very convenient time.

(my memory is a bit poor but if I'm remembering correctly appender worked because it had done its own handling of narrow strings but you couldn't just use a static or dynamic narrow string array as an output range)

1. https://github.com/D-Programming-Language/phobos/pull/1569

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Brad Anderson
in reply to inout

Brad Anderson

Posted in reply to inout

On Thursday, 5 December 2013 at 17:30:37 UTC, inout wrote:
> My approach in cases like that used to be to pass an optional char[] buffer = null as last argument (it doesn't work as nice in variadic functions though):
>
> // constructs the result in a buffer if one is passed until there is no more space left, in which case it reallocates the buffer
> char[] buildPath(const(char)[] path1, const(char)[] path2, char[] buffer = null) {
>   Appender!(char) appender = useBuffer(buffer);
>   ...
> }

The nice thing about output ranges instead of this approach is the allocation strategy is left up to the caller rather than the function.  I think generalized SSO (a small static buffer that can fall back to a dynamic allocation) will become a very common idiom as more and more of phobos accepts output ranges.

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by monarch_dodra
in reply to Andrei Alexandrescu

monarch_dodra

Posted in reply to Andrei Alexandrescu

On Thursday, 5 December 2013 at 18:34:40 UTC, Andrei Alexandrescu wrote:
> Trees and graphs are not special cases. They're bread and butter objects with fields that are in turn allocated etc.

Yes, and they aren't really linear containers either, but that doesn't mean we throw the input range concept out the window.

The graph itself could carry the allocator, and just the same you have a function "getNeighbors" that returns an input range of a node's neighboring nodes, the graph could also give a collection of different ouput ranges that takes care of allocation.

I'm just saying it's not something I'd like to see that *functions* take allocators as arguments. I think that should be abstracted by the containers/ranges themselves, in such a way that the functions only have to use things like "pushBack" (eg: std::back_inserter).

> Consider how difficult it would be (for both the implementer and the user) to fill via an output range an object (class or struct) that has a few fields that need allocation (such as strings and arrays). It becomes a mini-serialization project. We can't realistically propose that as a way to allow users to control memory allocation.

I'm out of my league on this one.

December 05, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by monarch_dodra
in reply to Brad Anderson

monarch_dodra

Posted in reply to Brad Anderson

On Thursday, 5 December 2013 at 20:47:32 UTC, Brad Anderson wrote:
> And thanks to your improvements[1] to put() output ranges should actually be much more usable than they ever were before. I've mentioned this a few times but I tried to add an output range overload of toUpper and toLower but immediately encountered problems with appending dchars to char arrays.  Your recently merged improvements to put() appear to have addressed that problem and many others and I'll soon have another go at adding these overloads now that that change is in place so thanks for doing the hard work.
>
> Before this got merged I don't even think output ranges could be easily used for this improvement to buildPath so it got merged at a very convenient time.
>
> (my memory is a bit poor but if I'm remembering correctly appender worked because it had done its own handling of narrow strings but you couldn't just use a static or dynamic narrow string array as an output range)
>
> 1. https://github.com/D-Programming-Language/phobos/pull/1569

Yes, appender did its own trans-coding, which helped the situation, but it made things like output delegates absolutely useless.

My next step was to *remove* the on the fly trancoding in appender, to rely only "put", directly, but I fear that would cause un-necessary breakge, due to UFCS not triggering if the member function exists:

In the sense that
put(appender!string(), myDstring); //Yes
appender!string().put(myDstring); //Nope

December 07, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Michel Fortin
in reply to Andrei Alexandrescu

Michel Fortin

Posted in reply to Andrei Alexandrescu

On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 12/5/13 8:19 AM, monarch_dodra wrote:
>> Output range! :)
>> 
>> Output range interface makes no linearity requirements. Just that:
>> "out.put(this)" compiles.
> 
> Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc.

You have to admit that a "bump the pointer" allocator is pretty much the same thing as an output range that fills a preexisting array.

So here's an interesting idea: perhaps typed allocators should *be* output ranges. The difference is that if you "put" something on the allocator, it copies/moves it to the allocator's heap and returns you a pointer. While if you "put" something on a generic output range, don't expect any pointer out of it.

Some functions, like buildPath, would only require an output range. Others will need a more complete allocator, that's those functions that want to refer back to elements they've "put" on the allocator's heap.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

December 07, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Andrei Alexandrescu
in reply to Michel Fortin

Andrei Alexandrescu

Posted in reply to Michel Fortin

On 12/7/13 5:13 AM, Michel Fortin wrote:
> On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> said:
>
>> On 12/5/13 8:19 AM, monarch_dodra wrote:
>>> Output range! :)
>>>
>>> Output range interface makes no linearity requirements. Just that:
>>> "out.put(this)" compiles.
>>
>> Hrm, construction of a hash table is linearizable so bad example on my
>> part. But I'm talking about general structured data such as objects
>> with allocated fields and connections to other objects etc. etc.
>
> You have to admit that a "bump the pointer" allocator is pretty much the
> same thing as an output range that fills a preexisting array.

Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type.

> So here's an interesting idea: perhaps typed allocators should *be*
> output ranges. The difference is that if you "put" something on the
> allocator, it copies/moves it to the allocator's heap and returns you a
> pointer. While if you "put" something on a generic output range, don't
> expect any pointer out of it.
>
> Some functions, like buildPath, would only require an output range.
> Others will need a more complete allocator, that's those functions that
> want to refer back to elements they've "put" on the allocator's heap.

I don't think that would work.


Andrei

December 07, 2013

Re: Use case: eliminate hidden allocations in buildPath

Posted by Michel Fortin
in reply to Andrei Alexandrescu

Michel Fortin

Posted in reply to Andrei Alexandrescu

On 2013-12-07 16:08:18 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 12/7/13 5:13 AM, Michel Fortin wrote:
>> You have to admit that a "bump the pointer" allocator is pretty much the
>> same thing as an output range that fills a preexisting array.
> 
> Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type.

If I remember well, you said before that you may also design "typed allocators" that'd work at a layer above untyped allocators. That's what I was referring to, although I wasn't very clear (the words "typed allocator" only appeared after the paragraph quoted above).

Anyway, here's another thing to consider in the Output Range vs. Allocator debate: allocators can be faster than output ranges because you can tell the allocator in advance how much data you need.

For instance, if you're joining multiple strings together you can quickly calculate the final length, allocate that, write to allocated memory and return the result. In fact, that's exactly what buildPath currently does when allocating from the GC.

With an output range that appends to a string the string needs to constantly grow to accommodate new content, with no insight as to the final length. The way to fix that with an output range would be to have have a "reserve" function telling the range "prepare yourself to receive that much data", which may or may not do something depending on the kind of output range it is. There's no such function in the definition of an output range right now.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation