October 01, 2014
On 9/30/14, 9:49 AM, Johannes Pfau wrote:
> I guess my point is that although RC is useful in some cases output
> ranges / sink delegates / pre-allocated buffers are still necessary in
> other cases and RC is not the solution for _everything_.

Agreed.

> As Manu often pointed out sometimes you do not want any dynamic
> allocation (toStringz in games is a good example) and here RC doesn't
> help.
>
> Another example is format which can already write to output ranges and
> uses sink delegates internally. That's a much better abstraction than
> simply returning a reference counted string (allocated with a thread
> local allocator). Using sink delegates internally is also more
> efficient than creating temporary RCStrings. And sometimes there's no
> allocation at all this way (directly writing to a socket/file).

Agreed.
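
A minimal sketch of the sink-delegate style mentioned above, assuming a hypothetical helper `formatInto` (not a Phobos function). It relies only on `std.format.formattedWrite`, which accepts a delegate as its output range, and it writes into storage owned by the caller instead of returning a new string. The compiler may still allocate a closure for the nested delegate, so this is not a guaranteed zero-allocation path.

```d
import std.format : formattedWrite;

// Hypothetical helper, for illustration only: formats "<name>-<n>" into
// caller-owned storage and returns the slice that was filled.
char[] formatInto(char[] buf, string name, int n)
{
    size_t len;

    // The sink receives each formatted chunk and copies it into buf.
    // A chunk that does not fit triggers a range error (bounds check).
    void sink(const(char)[] chunk)
    {
        buf[len .. len + chunk.length] = chunk[];
        len += chunk.length;
    }

    formattedWrite(&sink, "%s-%d", name, n);
    return buf[0 .. len];
}
```

With `char[32] buf;`, the call `formatInto(buf[], "id", 42)` returns a slice of `buf` holding `id-42`. The same delegate could just as well forward each chunk to a socket or file.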

>>> What if I don't want automated memory _management_? What if I want a
>>> function to use a stack buffer? Or if I want to free manually?
>>>
>>> If I want std.string.toStringz to put the result into a temporary
>>> stack buffer your solution doesn't help at all. Passing an output
>>> range, allocator or buffer would all solve this.
>>
>> Correct. The output of toStringz would be either a GC string or an RC
>> string.
>
> But why not provide 3 overloads then?
>
> toStringz(OutputRange)
> string toStringz(Policy) //char*, actually
> RCString toStringz(Policy)
>
> The notion I got from some of your posts is that you're opposed to such
> overloads, or did I misinterpret that?

I'm not opposed. Here's what I think.

As an approach to using Phobos without a GC, it's been suggested that we supplement garbage-creating functions with new functions that use output ranges everywhere, or lazy ranges everywhere.

I think a better approach is to make memory management a policy that makes convenient use of reference counting possible. So instead of garbage there'd be reference counted stuff.

Of course, to the extent using lazy computation and/or output ranges is a good thing to have for various reasons, they remain valid techniques that are and will continue being used in Phobos.

My point is that acknowledging and systematically using reference counted types is an essential part of the entire approach.


Andrei


October 01, 2014
On 9/30/14, 10:33 AM, H. S. Teoh via Digitalmars-d wrote:
> Yeah, this echoes my concern. This looks not that much different, from a
> user's POV, from C++ containers' allocator template parameters. Yes I
> know we're not talking about *allocators* per se but about *memory
> management*, but I'm talking about the need to explicitly pass mmp to
> *every* *single* *function* if you desire anything but the default. How
> many people actually *use* the allocator parameter in STL? Certainly,
> many people do... but the code is anything but readable / maintainable.

The parallel with STL allocators is interesting, but I'm not worried about it that much. I don't want to go off on a tangent but I'm fairly certain std::allocator is hard to use for entirely different reasons than the intended use patterns of MemoryManagementPolicy.

> Not only that, but every single function will have to handle this
> parameter somehow, and if static if's at the top of the function is what
> we're starting with, I fear seeing what we end up with.

Apparently Sean's idea would take care of that.

> Furthermore, in order for this to actually work, it has to be percolated
> throughout the entire codebase -- any D library that even remotely uses
> Phobos for anything will have to percolate this parameter throughout its
> API -- at least, any part of the API that might potentially use a Phobos
> function.

Yes, but that's entirely expected. We're adding genuinely new functionality to Phobos.

> Otherwise, you still have the situation where a given D
> library doesn't allow the user to select a memory management scheme, and
> internally calls Phobos functions with the default settings.

Correct.

> So this
> still doesn't solve the problem that today, people who need to use @nogc
> can't use a lot of existing libraries because the library depends on the
> GC, even if it doesn't assume anything about the MM scheme, but just
> happens to call some obscure Phobos function with the default MM
> parameter. The only way this could work was if *every* D library author
> voluntarily rewrites a lot of code in order to percolate this MM
> parameter through to the API, on the off-chance that some obscure user
> somewhere might have need to use it. I don't see much likelihood of this
> actually happening.

A simple way to put this is: libraries that use the GC will continue to use the GC. There's no way around that unless we choose to break them all.

> Then there's the matter of functions like parseJSON() that needs to
> allocate nodes and return a tree (or whatever) of these nodes. Note that
> they need to *allocate*, not just know what kind of memory management
> model is to be used. So how do you propose to address this? Via another
> parameter (compile-time or otherwise) to specify which allocator to use?
> So how does the memory management parameter solve anything then? And how
> would such a thing be implemented? Using a 3-way static-if branch in
> every single point in parseJSON where it needs to allocate nodes? We
> could just as well write it in C++, if that's the case.

parseJSON() would get a memory management policy parameter, and would use the currently installed memory allocator for allocation.

> This proposal has many glaring holes that need to be fixed before it can
> be viable.

Affirmative. That's why it's an RFC, very far from a proposal. I'm glad I got a bunch of good ideas.


Andrei

October 01, 2014
On 9/30/14, 11:06 AM, Dmitry Olshansky wrote:
> 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
>> auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
>> ext)
>> if (...)
>> {
>>      static if (mmp == gc) alias S = string;
>>      else alias S = RCString;
>>      S result;
>>      ...
>>      return result;
>> }
>
> Incredible code bloat? Boilerplate in each function for the win?
> I'm at a loss as to how it would make things better.

Sean's idea to make string an alias of the policy takes care of this concern. -- Andrei
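
As a rough sketch of that idea, a policy type can carry the string alias so the function body needs no `static if`. All names here are hypothetical: `GCPolicy`, `RCPolicy`, and `setExt` do not exist in Phobos, and `FakeRCString` is only a placeholder that does no reference counting.

```d
import std.path : stripExtension;

// Hypothetical policy: results are ordinary GC-managed strings.
struct GCPolicy
{
    alias String = string;
}

// Placeholder for a reference-counted string. It only wraps a string so
// the example compiles; a real RCString would manage a counter.
struct FakeRCString
{
    string payload;
    this(string s) { payload = s; }
}

// Hypothetical policy: results are (stand-in) reference-counted strings.
struct RCPolicy
{
    alias String = FakeRCString;
}

// The policy decides the result type; the body is the same for both.
auto setExt(Policy = GCPolicy)(string path, string ext)
{
    Policy.String result = stripExtension(path) ~ "." ~ ext;
    return result;
}
```

Calling `setExt("a.txt", "md")` uses the default policy and yields a `string`, while `setExt!RCPolicy("a.txt", "md")` yields a `FakeRCString`. A user-defined policy only has to provide its own `String` alias.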

October 01, 2014
On 9/30/14, 12:10 PM, "Marc Schütz" <schuetzm@gmx.net> wrote:
> I would argue that GC is at its core _only_ a memory management
> strategy. It just so happens that the one in D's runtime also comes with
> an allocator, with which it is tightly integrated. In theory, a GC can
> work with any (and multiple) allocators, and you could of course also
> call GC.free() manually, because, as you say, management and allocation
> are entirely distinct topics.

I'm not very sure. A GC might need to interoperate closely with the allocator. -- Andrei

October 01, 2014
On 9/30/14, 6:53 PM, Manu via Digitalmars-d wrote:
> I generally like the idea, but my immediate concern is that it implies
> that every function that may deal with allocation is a template.
> This interferes with C/C++ compatibility in a pretty big way. Or more
> generally, the idea of a lib. Does this mean that a lib will be
> required to produce code for every permutation of functions according
> to memory management strategy? Usually libs don't contain code for
> uninstantiated templates.

If a lib chooses one specific memory management policy, it can of course be non-templated with regard to that. If it wants to offer its users the choice, it would probably have to offer some templates.

> With this in place, I worry that traditional use of libs, separate
> compilation, external language linkage, etc, all become very
> problematic.
> Pervasive templates can only work well if all code is D code, and if
> all code is compiled together.
> Most non-OSS industry doesn't ship source, they ship libs. And if libs
> are to become impractical, then dependencies become a problem; instead
> of linking libphobos.so, you pretty much have to compile phobos
> together with your app (already basically true for phobos, but it's
> fairly unique).
> What if that were a much larger library? What if you have 10s of
> dependencies all distributed in this manner? Does it scale?
>
> I guess this doesn't matter if this is only a proposal for phobos...
> but I suspect the pattern will become pervasive if it works, and yeah,
> I'm not sure where that leads.

Thanks for the point. I submit that Phobos has been and will be different from other D libraries; as the standard library, it has the role of supporting widely varying needs, and as such it makes a lot of sense to make it highly generic and configurable. Libraries that are for specific domains can avail themselves of a narrower design scope.


Andrei

October 01, 2014
On 9/30/14, 10:46 PM, "Nordlöw" wrote:
> On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
>> Back when I first introduced RCString I hinted that we have a
>> larger strategy in mind. Here it is.
>
> Slightly related :)
>
> https://github.com/D-Programming-Language/phobos/pull/2573

Nice, thanks! -- Andrei
October 01, 2014
On Wednesday, 1 October 2014 at 09:52:46 UTC, Andrei Alexandrescu wrote:
> On 9/30/14, 12:10 PM, "Marc Schütz" <schuetzm@gmx.net> wrote:
>> I would argue that GC is at its core _only_ a memory management
>> strategy. It just so happens that the one in D's runtime also comes with
>> an allocator, with which it is tightly integrated. In theory, a GC can
>> work with any (and multiple) allocators, and you could of course also
>> call GC.free() manually, because, as you say, management and allocation
>> are entirely distinct topics.
>
> I'm not very sure. A GC might need to interoperate closely with the allocator. -- Andrei

It needs to know what to scan (ideally with type info), and which allocator to release memory with, but it doesn't need to be an allocator itself. It certainly helps with the implementation, but ideally there would be a well defined interface between allocators and GCs, so that both can be plugged in as desired, even with multiple GCs in parallel.
October 01, 2014
On Wednesday, 1 October 2014 at 08:55:55 UTC, Andrei Alexandrescu wrote:
> On 9/30/14, 9:10 AM, Sean Kelly wrote:
>>
>> Is this for exposition purposes or actually how you expect it to work?
>
> That's pretty much what it would take. The key here is that RCString is almost a drop-in replacement for string, so the code using it is almost identical. There will be places where code needs to be replaced, e.g.
>
> auto s = "literal";
>
> would need to become
>
> S s = "literal";
>
> So creation of strings will change a bit, but overall there's not a lot of churn.

I'm confused.  Is this a general-purpose solution or just one that switches between string and RCString?
October 01, 2014
On Wednesday, 1 October 2014 at 08:55:55 UTC, Andrei Alexandrescu wrote:
> On 9/30/14, 9:10 AM, Sean Kelly wrote:
>
>> Quite honestly, I can't imagine how I could write a template function in D that needs to work with this approach.
>
> You mean write a function that accepts a memory management policy, or a function that uses one?

Both, I suppose?  A static if block at the top of each function that must be aware of every RC type the user may expect?  What if it's a user-defined RC type and this function is in Phobos?


>> As much as I hate to say it, this is pretty much exactly what C++
>> allocators were designed for.  They handle allocation, sure, but they
>> also hold aliases for all relevant types for the data being allocated.
>> If the MemoryManagementPolicy enum were replaced with an alias to a type that I could use to at least obtain relevant aliases, that would be something.  But even that approach dramatically complicates code that uses it.
>
> I think making MemoryManagementPolicy a meaningful type is a great idea. It would e.g. define the string type, so the code becomes:
>
> auto setExtension(alias MemoryManagementPolicy = gc, R1, R2)(R1 path, R2 ext)
> if (...)
> {
>     MemoryManagementPolicy.string result;
>     ...
>     return result;
> }
>
> This is a lot more general and extensible. Thanks!
>
> Why do you think there'd be dramatic complication of code? (Granted, at some point we must acknowledge that some egg breaking is necessary for the proverbial omelette.)

From my experience with C++ containers.  Having an alias for a type is okay, but a bank of aliases where one is a pointer to the type, one is a const pointer to the type, etc, makes writing the involved code feel really unnatural.


> The thing is, again, we must make some changes if we want D to be usable without a GC. One of them is e.g. to not allocate built-in slices all over the place.

So let the user supply a scratch buffer that will hold the result?  With the RC approach we're still allocating, they just aren't built-in slices, correct?
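
A minimal sketch of the scratch-buffer variant, assuming a hypothetical `toStringzBuf` (this is not `std.string.toStringz`). The caller owns the storage, which can live on the stack, and the function itself is `@nogc`.

```d
// Hypothetical helper, for illustration only: copies s into buf, appends
// the terminating zero, and returns a C-style pointer into buf.
// The buffer must have room for s plus the terminator.
const(char)* toStringzBuf(const(char)[] s, char[] buf) @nogc nothrow
{
    assert(buf.length > s.length, "scratch buffer too small");
    buf[0 .. s.length] = s[];
    buf[s.length] = '\0';
    return buf.ptr;
}
```

With `char[32] tmp;`, the pointer returned by `toStringzBuf("hi", tmp[])` is valid only as long as `tmp` is, which is the trade-off against a GC or RC result.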


> That would be overreacting :o).

I hope it is :-)
October 01, 2014
On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz wrote:
> [...]
>
> I'm convinced this isn't necessary. Let's take `setExtension()` as an example, standing in for any of a class of similar functions. This function allocates memory, returns it, and abandons it; it gives up ownership of the memory. The fact that the memory has been freshly allocated means that it is (head) unique, and therefore the caller (= library user) can take over the ownership. This, in turn, means that the caller can decide how she wants to manage it.

Bingo. Have some way to mark the function return type as a unique pointer. This does not imply full-fledged unique pointer type support in the language - just enough to have the caller ensure continuity of memory management policy from there.

One problem with actually implementing this is that using reference counting as a memory management policy requires extra space for the reference counter in the object, just as garbage collection requires support for scanning and identification of interior object memory ranges. While allocation and memory management may be quite independent in theory, in practical high-performance implementations they tend to be intimately related.

> (I'll try to make a sketch on how this can be implemented in another post.)

Do elaborate!

> As a conclusion, I would say that APIs should strive for the following principles, in this order:
>
> 1. Avoid allocation altogether, for example by laziness (ranges), or by accepting sinks.
>
> 2. If allocations are necessary (or desirable, to make the API more easily usable), try hard to return a unique value (this of course needs to be expressed in the return type).
>
> 3. If both of the above fail, only then return a GCed pointer, or alternatively provide several variants of the function (though this shouldn't be necessary often). An interesting alternative: Instead of passing a flag directly describing the policy, pass the function a type that it should wrap its return value in.
>
> As for the _allocation_ strategy: It indeed needs to be configurable, but here, the same objections against a template parameter apply. As the allocator doesn't necessarily need to be part of the type, a (thread) global variable can be used to specify it. This lends itself well to idioms like
>
>     with(MyAllocator alloc) {
>         // ...
>     }

Assuming there is some dependency between the allocator and the memory management policy, I guess this would have to be set at thread start and could not be modified later. All code running inside the thread would need to either match the configured policy, not handle any kind of pointers, or use a limited subset of unique pointers. Another way to ensure that code can run on either RC or GC is to make certain objects (specifically, Exceptions) always allocate a reference counter, regardless of the currently configured policy.
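
For reference, the quoted `with(MyAllocator alloc)` idiom is not valid D as written. One way it might be approximated is a thread-local variable plus a scoped guard that restores the previous value. This is only a sketch under that assumption; `Allocator`, `CountingAllocator`, `currentAllocator`, and `AllocatorScope` are hypothetical names.

```d
// Hypothetical allocator interface, reduced to a single method.
abstract class Allocator
{
    abstract void[] allocate(size_t n);
}

// Module-level variables are thread-local by default in D, so each
// thread sees its own current allocator (null until one is installed).
Allocator currentAllocator;

// Toy allocator that defers to the GC and counts its calls.
final class CountingAllocator : Allocator
{
    size_t calls;

    override void[] allocate(size_t n)
    {
        ++calls;
        return new ubyte[](n);
    }
}

// Scoped guard: installs an allocator on construction and restores the
// previous one when the enclosing scope ends.
struct AllocatorScope
{
    private Allocator saved;

    @disable this();
    @disable this(this);

    this(Allocator a)
    {
        saved = currentAllocator;
        currentAllocator = a;
    }

    ~this()
    {
        currentAllocator = saved;
    }
}
```

Declaring `auto guard = AllocatorScope(myAlloc);` at the top of a block makes `currentAllocator` refer to `myAlloc` until the block exits. Whether swapping allocators mid-thread is safe depends on the coupling with the management policy raised above.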