December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei Alexandrescu wrote:
> Hello,
>
>
> Walter and I were talking about eliminating the surreptitious allocations in buildPath:
>
> http://dlang.org/phobos/std_path.html#.buildPath
>
> We'd need to keep the existing version working, so we're looking at adding one or more new overloads. We're looking at giving the user the option to control any needed memory allocation (or even arrange things such that there's no memory allocated at all).
>
> It's a generous design space, so although we have a couple of ideas let's hear others first.
>
>
> Thanks,
>
> Andrei
My approach in cases like that used to be to pass an optional char[] buffer = null as last argument (it doesn't work as nice in variadic functions though):
// constructs the result in a buffer if one is passed until there is no more space left, in which case it reallocates the buffer
char[] buildPath(const(char)[] path1, const(char)[] path2, char[] buffer = null) {
Appender!(char) appender = useBuffer(buffer);
...
}
|
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 12/5/13 9:06 AM, monarch_dodra wrote:
> On Thursday, 5 December 2013 at 16:46:52 UTC, Andrei Alexandrescu wrote:
>> On 12/5/13 8:19 AM, monarch_dodra wrote:
>>> On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote:
>>>> Andrei
>>>
>>> Output range! :)
>>>
>>> Output range interface makes no linearity requirements. Just that:
>>> "out.put(this)" compiles.
>>
>> Hrm, construction of a hash table is linearizable so bad example on my
>> part. But I'm talking about general structured data such as objects
>> with allocated fields and connections to other objects etc. etc.
>>
>> Andrei
>
> Say, something like a graph? At that point, I'd say you'd have to pass
> an "allocating scheme" to the function, so an allocator, yes. Depending
> on the data being generated, the constructed item could carry the
> allocator itself. For example: auto newNode = myNode.GenerateNeighbor();
>
> In But I think that'd be a special case situation. For everything else,
> output range is an easy and intuitive, and fits well with the rest of
> phobos. We'd want (IMHO) to integrate the allocators directly inside the
> output ranges.
Trees and graphs are not special cases. They're bread and butter objects with fields that are in turn allocated etc.
Consider how difficult it would be (for both the implementer and the user) to fill via an output range an object (class or struct) that has a few fields that need allocation (such as strings and arrays). It becomes a mini-serialization project. We can't realistically propose that as a way to allow users to control memory allocation.
Andrei
|
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | 05-Dec-2013 20:46, Andrei Alexandrescu пишет: > On 12/5/13 8:19 AM, monarch_dodra wrote: >> On Thursday, 5 December 2013 at 15:00:07 UTC, Andrei Alexandrescu wrote: >>> Andrei >> >> Output range! :) >> >> Output range interface makes no linearity requirements. Just that: >> "out.put(this)" compiles. > > Hrm, construction of a hash table is linearizable so bad example on my > part. But I'm talking about general structured data such as objects with > allocated fields and connections to other objects etc. etc. Pass desired container type that follows some implicit protocol such as isGraph!T. Then container type defines allocation scheme (=some container specific default). As an extended version allow passing reference to an allocator to use with that container type. -- Dmitry Olshansky |
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On Thursday, 5 December 2013 at 08:09:55 UTC, monarch_dodra wrote: > On Wednesday, 4 December 2013 at 23:14:48 UTC, Andrei Alexandrescu wrote: >> Hello, >> >> >> Walter and I were talking about eliminating the surreptitious allocations in buildPath: >> >> http://dlang.org/phobos/std_path.html#.buildPath >> >> We'd need to keep the existing version working, so we're looking at adding one or more new overloads. We're looking at giving the user the option to control any needed memory allocation (or even arrange things such that there's no memory allocated at all). >> >> It's a generous design space, so although we have a couple of ideas let's hear others first. >> >> >> Thanks, >> >> Andrei > > Use an output range. It's the generic D approach, and what we already do for the string functions such as std.string.translate: > http://dlang.org/phobos/std_string.html#.translate > (look down for the output range overloads). > > Anything "allocator" related should be carried by the output range itself. The function itself should not care nor know about any of that. And thanks to your improvements[1] to put() output ranges should actually be much more usable than they ever were before. I've mentioned this a few times but I tried to add an output range overload of toUpper and toLower but immediately encountered problems with appending dchars to char arrays. Your recently merged improvements to put() appear to have addressed that problem and many others and I'll soon have another go at adding these overloads now that that change is in place so thanks for doing the hard work. Before this got merged I don't even think output ranges could be easily used for this improvement to buildPath so it got merged at a very convenient time. (my memory is a bit poor but if I'm remembering correctly appender worked because it had done its own handling of narrow strings but you couldn't just use a static or dynamic narrow string array as an output range) 1. https://github.com/D-Programming-Language/phobos/pull/1569 |
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to inout | On Thursday, 5 December 2013 at 17:30:37 UTC, inout wrote:
> My approach in cases like that used to be to pass an optional char[] buffer = null as last argument (it doesn't work as nice in variadic functions though):
>
> // constructs the result in a buffer if one is passed until there is no more space left, in which case it reallocates the buffer
> char[] buildPath(const(char)[] path1, const(char)[] path2, char[] buffer = null) {
> Appender!(char) appender = useBuffer(buffer);
> ...
> }
The nice thing about output ranges instead of this approach is the allocation strategy is left up to the caller rather than the function. I think generalized SSO (a small static buffer that can fall back to a dynamic allocation) will become a very common idiom as more and more of phobos accepts output ranges.
|
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Thursday, 5 December 2013 at 18:34:40 UTC, Andrei Alexandrescu wrote: > Trees and graphs are not special cases. They're bread and butter objects with fields that are in turn allocated etc. Yes, and they aren't really linear containers either, but that doesn't mean we throw the input range concept out the window. The graph itself could carry the allocator, and just the same you have a function "getNeighbors" that returns an input range of a node's neighboring nodes, the graph could also give a collection of different ouput ranges that takes care of allocation. I'm just saying it's not something I'd like to see that *functions* take allocators as arguments. I think that should be abstracted by the containers/ranges themselves, in such a way that the functions only have to use things like "pushBack" (eg: std::back_inserter). > Consider how difficult it would be (for both the implementer and the user) to fill via an output range an object (class or struct) that has a few fields that need allocation (such as strings and arrays). It becomes a mini-serialization project. We can't realistically propose that as a way to allow users to control memory allocation. I'm out of my league on this one. |
December 05, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brad Anderson | On Thursday, 5 December 2013 at 20:47:32 UTC, Brad Anderson wrote: > And thanks to your improvements[1] to put() output ranges should actually be much more usable than they ever were before. I've mentioned this a few times but I tried to add an output range overload of toUpper and toLower but immediately encountered problems with appending dchars to char arrays. Your recently merged improvements to put() appear to have addressed that problem and many others and I'll soon have another go at adding these overloads now that that change is in place so thanks for doing the hard work. > > Before this got merged I don't even think output ranges could be easily used for this improvement to buildPath so it got merged at a very convenient time. > > (my memory is a bit poor but if I'm remembering correctly appender worked because it had done its own handling of narrow strings but you couldn't just use a static or dynamic narrow string array as an output range) > > 1. https://github.com/D-Programming-Language/phobos/pull/1569 Yes, appender did its own trans-coding, which helped the situation, but it made things like output delegates absolutely useless. My next step was to *remove* the on the fly trancoding in appender, to rely only "put", directly, but I fear that would cause un-necessary breakge, due to UFCS not triggering if the member function exists: In the sense that put(appender!string(), myDstring); //Yes appender!string().put(myDstring); //Nope |
December 07, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said: > On 12/5/13 8:19 AM, monarch_dodra wrote: >> Output range! :) >> >> Output range interface makes no linearity requirements. Just that: >> "out.put(this)" compiles. > > Hrm, construction of a hash table is linearizable so bad example on my part. But I'm talking about general structured data such as objects with allocated fields and connections to other objects etc. etc. You have to admit that a "bump the pointer" allocator is pretty much the same thing as an output range that fills a preexisting array. So here's an interesting idea: perhaps typed allocators should *be* output ranges. The difference is that if you "put" something on the allocator, it copies/moves it to the allocator's heap and returns you a pointer. While if you "put" something on a generic output range, don't expect any pointer out of it. Some functions, like buildPath, would only require an output range. Others will need a more complete allocator, that's those functions that want to refer back to elements they've "put" on the allocator's heap. -- Michel Fortin michel.fortin@michelf.ca http://michelf.ca |
December 07, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Michel Fortin | On 12/7/13 5:13 AM, Michel Fortin wrote: > On 2013-12-05 16:46:52 +0000, Andrei Alexandrescu > <SeeWebsiteForEmail@erdani.org> said: > >> On 12/5/13 8:19 AM, monarch_dodra wrote: >>> Output range! :) >>> >>> Output range interface makes no linearity requirements. Just that: >>> "out.put(this)" compiles. >> >> Hrm, construction of a hash table is linearizable so bad example on my >> part. But I'm talking about general structured data such as objects >> with allocated fields and connections to other objects etc. etc. > > You have to admit that a "bump the pointer" allocator is pretty much the > same thing as an output range that fills a preexisting array. Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type. > So here's an interesting idea: perhaps typed allocators should *be* > output ranges. The difference is that if you "put" something on the > allocator, it copies/moves it to the allocator's heap and returns you a > pointer. While if you "put" something on a generic output range, don't > expect any pointer out of it. > > Some functions, like buildPath, would only require an output range. > Others will need a more complete allocator, that's those functions that > want to refer back to elements they've "put" on the allocator's heap. I don't think that would work. Andrei |
December 07, 2013 Re: Use case: eliminate hidden allocations in buildPath | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 2013-12-07 16:08:18 +0000, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said: > On 12/7/13 5:13 AM, Michel Fortin wrote: >> You have to admit that a "bump the pointer" allocator is pretty much the >> same thing as an output range that fills a preexisting array. > > Only of the output range is untyped. An allocator traffics in untyped chunks; an output range usually has an element type. If I remember well, you said before that you may also design "typed allocators" that'd work at a layer above untyped allocators. That's what I was referring to, although I wasn't very clear (the words "typed allocator" only appeared after the paragraph quoted above). Anyway, here's another thing to consider in the Output Range vs. Allocator debate: allocators can be faster than output ranges because you can tell the allocator in advance how much data you need. For instance, if you're joining multiple strings together you can quickly calculate the final length, allocate that, write to allocated memory and return the result. In fact, that's exactly what buildPath currently does when allocating from the GC. With an output range that appends to a string the string needs to constantly grow to accommodate new content, with no insight as to the final length. The way to fix that with an output range would be to have have a "reserve" function telling the range "prepare yourself to receive that much data", which may or may not do something depending on the kind of output range it is. There's no such function in the definition of an output range right now. -- Michel Fortin michel.fortin@michelf.ca http://michelf.ca |
Copyright © 1999-2021 by the D Language Foundation