September 30, 2014
On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
>
> The policy is a template parameter to functions in Phobos (and elsewhere), and informs the functions e.g. what types to return. Consider:
>
> auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
> if (...)
> {
>     static if (mmp == gc) alias S = string;
>     else alias S = RCString;
>     S result;
>     ...
>     return result;
> }

Is this for exposition purposes or actually how you expect it to work?  Quite honestly, I can't imagine how I could write a template function in D that needs to work with this approach.

As much as I hate to say it, this is pretty much exactly what C++ allocators were designed for.  They handle allocation, sure, but they also hold aliases for all relevant types for the data being allocated.  If the MemoryManagementPolicy enum were replaced with an alias to a type that I could use to at least obtain relevant aliases, that would be something.  But even that approach dramatically complicates code that uses it.
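
A minimal sketch of the "alias to a type" variant, assuming hypothetical names throughout: `RCString` here is only a stand-in struct, and `GCPolicy`, `RCPolicy`, `make` and `setExt` are invented for illustration. The policy type carries the aliases (plus a small factory), so the function body needs no `static if` and no knowledge of which string types exist:

```d
// Hypothetical stand-in for the proposed RCString; not a real Phobos type.
struct RCString
{
    string payload; // a real implementation would hold a ref-counted buffer
}

// Hypothetical policy types: each carries the aliases a function needs,
// plus a factory that wraps a freshly built string.
struct GCPolicy
{
    alias String = string;
    static String make(string s) { return s; }
}

struct RCPolicy
{
    alias String = RCString;
    static String make(string s) { return RCString(s); }
}

// Simplified setExtension: replaces the extension of `path` with `ext`.
// Note: this sketch still builds the result on the GC heap internally.
Policy.String setExt(Policy = GCPolicy)(string path, string ext)
{
    import std.path : stripExtension;
    return Policy.make(stripExtension(path) ~ ext);
}

unittest
{
    assert(setExt("hello.d", ".txt") == "hello.txt");
    assert(setExt!RCPolicy("hello.d", ".txt").payload == "hello.txt");
}
```

Even in this tiny form, every function in the call chain has to accept and forward `Policy`, which is the composability cost discussed below.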

Having written standards-compliant containers in C++, I honestly can't imagine the average user writing code that works this way.  Once you assert that the reference type may be a pointer or it may be some complex proxy to data stored elsewhere, a lot of composability pretty much flies right out the window.

For example, I have an implementation of C++ unordered_map/set/etc designed to be a customizable cache, so one of its template arguments is a policy type that allows eviction behavior to be chosen at declaration time.  Maybe the cache is size-limited, maybe it's age-limited, maybe it's a combination of the two or something even more complicated.  The problem is that the container defines all the aliases relating to the underlying data, but the policy, which needs to be aware of these, is passed as a template argument to this container.

To make something that's fully aware of C++ allocators then, I'd have to define a small type that takes the container template arguments (the contained type and the allocator type), generates the aliases, and is passed to the policy, which in turn passes the type through to the underlying container so it can declare its public aliases and whatever else in true standards-compliant fashion (or let the container derive this itself, but then you run into the potential for disagreement).  And while this is possible, doing so would complicate the creation of the cache policies to the point where it subverts their intent, which was to make it easy for the user to tune the behavior of the cache to their own particular needs by defining a simple type which implements a few functions.  Ultimately, I decided against this approach for the cache container and decided to restrict the allocators to those which defined a pointer to T as T* so the policies could be coded with basically no knowledge of the underlying storage.

So... while I support the goal you're aiming at, I want to see a much more comprehensive example of how this will work and how it will affect code written by D *users*.  Because it isn't enough for Phobos to be written this way.  Basically all D code will have to take this into account for the strategy to be truly viable.  Simply outlining one of the most basic functions in Phobos, which already looks like it will have a static conditional at the beginning and *need to be aware of the fact that an RCString type exists* makes me terrified of what a realistic example will look like.
September 30, 2014
On Tue, 30 Sep 2014 05:23:29 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/30/14, 1:34 AM, Johannes Pfau wrote:
> > So you propose RC + global/thread local allocators as the solution for all memory related problems as 'memory management is not allocation'. And you claim that using output ranges / providing buffers / allocators is not an option because it only works in some special cases?
> 
> Correct. I assume you meant an irony/sarcasm somewhere :o).

The sarcasm is supposed to be here: '_all_ memory related problems' ;-)

I guess my point is that although RC is useful in some cases, output ranges / sink delegates / pre-allocated buffers are still necessary in other cases and RC is not the solution for _everything_.

As Manu often pointed out, sometimes you do not want any dynamic allocation (toStringz in games is a good example), and here RC doesn't help.

Another example is format which can already write to output ranges and uses sink delegates internally. That's a much better abstraction than simply returning a reference counted string (allocated with a thread local allocator). Using sink delegates internally is also more efficient than creating temporary RCStrings. And sometimes there's no allocation at all this way (directly writing to a socket/file).
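
A minimal sketch of the output-range route, assuming a hypothetical `StackSink` type (the name and its members are invented here; only `std.format.formattedWrite` is real Phobos API). The formatted text lands in a fixed-size buffer that lives on the caller's stack, so no string is allocated for the result:

```d
import std.format : formattedWrite;

// Hypothetical fixed-capacity sink: an output range of characters
// backed by a static array inside the struct.
struct StackSink(size_t N)
{
    char[N] buf;
    size_t len;

    // Accept a single character.
    void put(char c)
    {
        assert(len < N, "StackSink overflow");
        buf[len++] = c;
    }

    // Accept a slice of characters.
    void put(const(char)[] s)
    {
        assert(len + s.length <= N, "StackSink overflow");
        buf[len .. len + s.length] = s[];
        len += s.length;
    }

    // View of the characters written so far.
    const(char)[] data() const { return buf[0 .. len]; }
}

unittest
{
    StackSink!64 sink;                      // lives on the stack
    sink.formattedWrite("%s-%d", "id", 42); // writes straight into sink.buf
    assert(sink.data == "id-42");
}
```

The same call works unchanged with a file or socket writer in place of `StackSink`, which is the flexibility a fixed return type cannot offer.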

> 
> > What if I don't want automated memory _management_? What if I want a function to use a stack buffer? Or if I want to free manually?
> >
> > If I want std.string.toStringz to put the result into a temporary stack buffer your solution doesn't help at all. Passing an output range, allocator or buffer would all solve this.
> 
> Correct. The output of toStringz would be either a GC string or an RC string.

But why not provide 3 overloads then?

toStringz(OutputRange)
string toStringz(Policy) //char*, actually
RCString toStringz(Policy)

The notion I got from some of your posts is that you're opposed to such overloads, or did I misinterpret that?
September 30, 2014
On Tue, 30 Sep 2014 05:29:55 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> 
> > Another thought: if we use a template parameter, what's the story for virtual functions (e.g. Object.toString)? They can't be templated.
> 
> Good point. We need to think about that.
> 

Passing buffers or sink delegates (like we already do for toString) is
possible for some functions. For toString it works fine. Then implement
to!RCString(object) using the toString(sink delegate) overload.
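
A minimal sketch of that layering, assuming hypothetical names (`Point` and `renderTo` are invented for illustration, and `Appender!string` stands in for whatever container, such as an RCString, the caller wants). The type only implements the sink-based `toString`; a generic helper decides where the characters go:

```d
import std.array : Appender;
import std.format : formattedWrite;

struct Point
{
    int x, y;

    // Sink-based toString: writes pieces to the caller's sink
    // instead of returning a freshly allocated string.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink.formattedWrite("(%d, %d)", x, y);
    }
}

// Hypothetical helper: render any value that has a sink-based toString
// into a caller-chosen container W; W only needs put(const(char)[]).
W renderTo(W, T)(auto ref const T value)
{
    W result;
    value.toString((const(char)[] piece) { result.put(piece); });
    return result;
}

unittest
{
    auto text = renderTo!(Appender!string)(Point(1, 2));
    assert(text.data == "(1, 2)");
}
```

Because `toString` itself is not a template, this shape also works for virtual functions; only the thin `renderTo` wrapper is templated.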

For all other functions RC is indeed difficult, probably only possible with different manually written overloads (and a dummy parameter as we can't overload on return type)?


September 30, 2014
On Tuesday, 30 September 2014 at 16:49:48 UTC, Johannes Pfau
wrote:
>
> I guess my point is that although RC is useful in some cases, output
> ranges / sink delegates / pre-allocated buffers are still necessary in
> other cases and RC is not the solution for _everything_.

Yes, I'm hoping this is an adjunct to changes in Phobos to reduce
the frequency of implicit allocation in general.  The less
garbage that's generated, the less GC vs. RC actually matters.
September 30, 2014
On Tue, Sep 30, 2014 at 04:10:43PM +0000, Sean Kelly via Digitalmars-d wrote:
> On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
> >
> >The policy is a template parameter to functions in Phobos (and elsewhere), and informs the functions e.g. what types to return. Consider:
> >
> >auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
> >ext)
> >if (...)
> >{
> >    static if (mmp == gc) alias S = string;
> >    else alias S = RCString;
> >    S result;
> >    ...
> >    return result;
> >}
> 
> Is this for exposition purposes or actually how you expect it to work? Quite honestly, I can't imagine how I could write a template function in D that needs to work with this approach.
> 
> As much as I hate to say it, this is pretty much exactly what C++ allocators were designed for.  They handle allocation, sure, but they also hold aliases for all relevant types for the data being allocated.
[...]
> So... while I support the goal you're aiming at, I want to see a much more comprehensive example of how this will work and how it will affect code written by D *users*.  Because it isn't enough for Phobos to be written this way.  Basically all D code will have to take this into account for the strategy to be truly viable.  Simply outlining one of the most basic functions in Phobos, which already looks like it will have a static conditional at the beginning and *need to be aware of the fact that an RCString type exists* makes me terrified of what a realistic example will look like.

Yeah, this echoes my concern. This looks not that much different, from a user's POV, from C++ containers' allocator template parameters. Yes I know we're not talking about *allocators* per se but about *memory management*, but I'm talking about the need to explicitly pass mmp to *every* *single* *function* if you desire anything but the default. How many people actually *use* the allocator parameter in STL? Certainly, many people do... but the code is anything but readable / maintainable.

Not only that, but every single function will have to handle this parameter somehow, and if static if's at the top of the function is what we're starting with, I fear seeing what we end up with.

Furthermore, in order for this to actually work, it has to be percolated throughout the entire codebase -- any D library that even remotely uses Phobos for anything will have to percolate this parameter throughout its API -- at least, any part of the API that might potentially use a Phobos function. Otherwise, you still have the situation where a given D library doesn't allow the user to select a memory management scheme, and internally calls Phobos functions with the default settings. So this still doesn't solve the problem that today, people who need to use @nogc can't use a lot of existing libraries because the library depends on the GC, even if it doesn't assume anything about the MM scheme, but just happens to call some obscure Phobos function with the default MM parameter. The only way this could work is if *every* D library author voluntarily rewrote a lot of code in order to percolate this MM parameter through to the API, on the off-chance that some obscure user somewhere might need to use it. I don't see much likelihood of this actually happening.

Then there's the matter of functions like parseJSON() that need to allocate nodes and return a tree (or whatever) of these nodes. Note that they need to *allocate*, not just know what kind of memory management model is to be used. So how do you propose to address this? Via another parameter (compile-time or otherwise) to specify which allocator to use? So how does the memory management parameter solve anything then? And how would such a thing be implemented? Using a 3-way static-if branch at every single point in parseJSON where it needs to allocate nodes? We could just as well write it in C++, if that's the case.

This proposal has many glaring holes that need to be fixed before it can be viable.


T

-- 
EMACS = Extremely Massive And Cumbersome System
September 30, 2014
29-Sep-2014 14:49, Andrei Alexandrescu wrote:
> auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
> if (...)
> {
>      static if (mmp == gc) alias S = string;
>      else alias S = RCString;
>      S result;
>      ...
>      return result;
> }

Incredible code bloat? Boilerplate in each function for the win?
I'm at a loss as to how it would make things better.


-- 
Dmitry Olshansky
September 30, 2014
Ok, here are my few cents:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
> Back when I first introduced RCString I hinted that we have a larger strategy in mind. Here it is.
>
> The basic tenet of the approach is to reckon and act on the fact that memory allocation (the subject of allocators) is an entirely distinct topic from memory management, and more generally resource management. This clarifies that it would be wrong to approach alternatives to GC in Phobos by means of allocators. GC is not only an approach to memory allocation, but also an approach to memory management. Reducing it to either one is a mistake. In hindsight this looks rather obvious but it has caused me and many people better than myself a lot of headache.

I would argue that GC is at its core _only_ a memory management strategy. It just so happens that the one in D's runtime also comes with an allocator, with which it is tightly integrated. In theory, a GC can work with any (and multiple) allocators, and you could of course also call GC.free() manually, because, as you say, management and allocation are entirely distinct topics.

>
> That said allocators are nice to have and use, and I will definitely follow up with std.allocator. However, std.allocator is not the key to a @nogc Phobos.

Agreed.

>
> Nor are ranges. There is an attitude that either output ranges, or input ranges in conjunction with lazy computation, would solve the issue of creating garbage. https://github.com/D-Programming-Language/phobos/pull/2423 is a good illustration of the latter approach: a range would be lazily created by chaining stuff together. A range-based approach would take us further than the allocators, but I see the following issues with it:
>
> (a) the whole approach doesn't stand scrutiny for non-linear outputs, e.g. outputting some sort of associative array or really any composite type quickly becomes tenuous either with an output range (eager) or with exposing an input range (lazy);
>
> (b) makes the style of programming without GC radically different, and much more cumbersome, than programming with GC; as a consequence, programmers who consider changing one approach to another, or implementing an algorithm neutral to it, are looking at a major rewrite;
>
> (c) would make D/@nogc a poor cousin of C++. This is quite out of character; technically, I have long gotten used to seeing most elaborate C++ code like poor emulation of simple D idioms. But C++ has spent years and decades taking to perfection an approach without a tracing garbage collector. A departure from that would need to be superior, and that doesn't seem to be the case with range-based approaches.

I agree with this, too.

>
> ===========
>
> Now that we clarified that these existing attempts are not going to work well, the question remains what does. For Phobos I'm thinking of defining and using three policies:
>
> enum MemoryManagementPolicy { gc, rc, mrc }
> immutable
>     gc = MemoryManagementPolicy.gc,
>     rc = MemoryManagementPolicy.rc,
>     mrc = MemoryManagementPolicy.mrc;
>
> The three policies are:
>
> (a) gc is the classic garbage-collected style of management;
>
> (b) rc is a reference-counted style still backed by the GC, i.e. the GC will still be able to pick up cycles and other kinds of leaks.
>
> (c) mrc is a reference-counted style backed by malloc.
>
> (It should be possible to collapse rc and mrc together and make the distinction dynamically, at runtime. I'm distinguishing them statically here for expository purposes.)
>
> The policy is a template parameter to functions in Phobos (and elsewhere), and informs the functions e.g. what types to return. Consider:
>
> auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
> if (...)
> {
>     static if (mmp == gc) alias S = string;
>     else alias S = RCString;
>     S result;
>     ...
>     return result;
> }
>
> On the caller side:
>
> auto p1 = setExtension("hello", ".txt"); // fine, use gc
> auto p2 = setExtension!gc("hello", ".txt"); // same
> auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
>
> So by default it's going to continue being business as usual, but certain functions will allow passing in a (defaulted) policy for memory management.

This, however, I disagree with strongly. For one thing - this has already been noted by others - it would make the functions' implementation extremely ugly (`static if` hell), it would make them harder to unit test, and from a user's point of view, it's very tedious and might interfere badly with UFCS.

But more importantly, IMO, it's the wrong thing to do. These functions shouldn't know anything about memory management policy at all. They allocate, which means they need to know about _allocation_ policy, but memory _management_ policy needs to be decided by the user.

Now, your suggestion in a way still leaves that decision to the user, but does so in a very intrusive way, by passing a template flag. This is clearly a violation of the separation of concerns. Contrary to the typical case, implementation details of the user's code leak into the library code, and not the other way round, but that's just as bad.

I'm convinced this isn't necessary. Let's take `setExtension()` as an example, standing in for any of a class of similar functions. This function allocates memory, returns it, and abandons it; it gives up ownership of the memory. The fact that the memory has been freshly allocated means that it is (head) unique, and therefore the caller (= library user) can take over the ownership. This, in turn, means that the caller can decide how she wants to manage it.

(I'll try to make a sketch on how this can be implemented in another post.)

As a conclusion, I would say that APIs should strive for the following principles, in this order:

1. Avoid allocation altogether, for example by laziness (ranges), or by accepting sinks.

2. If allocations are necessary (or desirable, to make the API more easily usable), try hard to return a unique value (this of course needs to be expressed in the return type).

3. If both of the above fail, only then return a GCed pointer, or alternatively provide several variants of the function (though this shouldn't be necessary often). An interesting alternative: Instead of passing a flag directly describing the policy, pass the function a type that it should wrap its return value in.

As for the _allocation_ strategy: It indeed needs to be configurable, but here, the same objections against a template parameter apply. As the allocator doesn't necessarily need to be part of the type, a (thread) global variable can be used to specify it. This lends itself well to idioms like

    with(MyAllocator alloc) {
        // ...
    }
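
A minimal sketch of how such a thread-global allocator could be scoped, assuming hypothetical names throughout (`Allocator`, `currentAllocator`, `UseAllocator`, `CountingAllocator` and `makeBuffer` are invented here and are not the std.allocator design). A small RAII guard installs an allocator for the enclosing scope and restores the previous one on exit, so library functions need no extra parameter:

```d
// Hypothetical allocator interface, kept deliberately tiny.
interface Allocator
{
    void[] allocate(size_t n);
}

// Module-level variables are thread-local by default in D.
Allocator currentAllocator;

// RAII guard: installs an allocator for the enclosing scope
// and restores the previous one when the scope ends.
struct UseAllocator
{
    private Allocator saved;

    @disable this();
    @disable this(this);

    this(Allocator a)
    {
        saved = currentAllocator;
        currentAllocator = a;
    }

    ~this() { currentAllocator = saved; }
}

// Toy allocator that forwards to the GC and counts requests.
class CountingAllocator : Allocator
{
    size_t calls;

    void[] allocate(size_t n)
    {
        ++calls;
        return new ubyte[n];
    }
}

// Library function: no policy parameter, it asks the current allocator.
void[] makeBuffer(size_t n)
{
    return currentAllocator.allocate(n);
}

unittest
{
    auto counting = new CountingAllocator;
    {
        auto guard = UseAllocator(counting);
        auto buf = makeBuffer(16);
        assert(buf.length == 16);
    }
    assert(counting.calls == 1);
    assert(currentAllocator is null); // previous (unset) allocator restored
}
```

The obvious trade-off is hidden global state: a callee cannot tell from its signature which allocator it will end up using.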

>
> Destroy!

Done :-)
September 30, 2014
On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz wrote:
> I'm convinced this isn't necessary. Let's take `setExtension()` as an example, standing in for any of a class of similar functions. This function allocates memory, returns it, and abandons it; it gives up ownership of the memory. The fact that the memory has been freshly allocated means that it is (head) unique, and therefore the caller (= library user) can take over the ownership. This, in turn, means that the caller can decide how she wants to manage it.
>
> (I'll try to make a sketch on how this can be implemented in another post.)

Ok. What we need for it:

1) @unique, or a way to expressly specify uniqueness on a function's return type, as well as restrict function params by it (and preferably overloading on uniqueness). DMD already has this concept internally, it just needs to be formalized.

2) A few modifications to RefCounted to be constructable from unique values.

3) A wrapper type similar to std.typecons.Unique, that also supports moving. Let's call it Owned(T).

4) Borrowing.

setExtension() can then look like this:

    Owned!string setExtension(in char[] path, in char[] ext);

To be used:

    void saveFileAs(in char[] name) {
        import std.path: setExtension;
        import std.file: write;
        name.                    // scope const(char[])
            setExtension("txt"). // Owned!string
            write(data);
    }

The Owned(T) value implicitly converts to `scope!this(T)` via alias this; it can therefore be conveniently passed to std.file.write() (which already takes the filename as `in`) without copying or moving. The value then is released automatically at the end of the statement, because it is only a temporary and is not assigned to a variable.

For transferring ownership:

    RefCounted!string[] filenames;
    // ...
    filenames ~= name.setExtension("txt").release;

`Owned!T.release()` returns the payload as a unique value, and resets the payload to its init value (in this case `null`). RefCounted's constructor then accepts this unique value and takes ownership of it. When the Owned value's destructor is called, it finds the payload to be null and doesn't free the memory. Inlining and subsequent optimization can turn the destructor into a no-op in this case.
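
A minimal library-only sketch of the release mechanics, under loud assumptions: there is no `@unique` or `scope!this` in the language, so this hypothetical `Owned` simply owns a malloc'ed value of a plain type without a destructor, and `makeOwned`, `get` and `isNull` are names invented for illustration. It shows only the part described above, namely that `release` hands out the payload and leaves the destructor with nothing to free:

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical Owned!T: sole owner of a malloc'ed T; movable, not copyable.
struct Owned(T)
{
    private T* payload;

    @disable this(this); // no copies: exactly one owner at a time

    ~this()
    {
        if (payload !is null)
            free(payload); // skipped after release()
    }

    // Give up ownership: return the payload and reset it to null.
    T* release()
    {
        auto p = payload;
        payload = null;
        return p;
    }

    bool isNull() const { return payload is null; }

    ref inout(T) get() inout { return *payload; }
}

// Hypothetical factory: copies `value` into freshly malloc'ed storage.
Owned!T makeOwned(T)(T value)
{
    auto p = cast(T*) malloc(T.sizeof);
    assert(p !is null, "out of memory");
    *p = value;
    return Owned!T(p);
}

unittest
{
    auto o = makeOwned(42);
    assert(o.get == 42);

    int* raw = o.release(); // ownership moves to the caller
    assert(o.isNull);       // o's destructor will now do nothing
    assert(*raw == 42);
    free(raw);              // the new owner is responsible for cleanup
}
```

In the proposal, the role of `raw` would be played by a RefCounted constructor that accepts the unique value.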

Optionally, Owned!T can provide an `alias this` to its release method; in this case, the method doesn't need to be called explicitly. It is however debatable whether being explicit with moving isn't the better choice.
September 30, 2014
On 30.09.2014 14:55, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Tuesday, 30 September 2014 at 12:51:25 UTC, Paulo  Pinto wrote:
>>
>> It works when two big ifs come together.
>>
>> - inside the same scope (e.g. function level)
>>
>> - when the reference is not shared between threads.
>>
>> While it is of limited applicability, Objective-C (and eventually
>> Swift) codebases prove it helps in most real life use cases.
>
> But Objective-C has thread safe ref-counting?!
>
> If it isn't thread safe it is of very limited utility, you can usually
> get away with unique_ptr in single threaded scenarios.

Did you read my second bullet?
September 30, 2014
On Tuesday, 30 September 2014 at 20:13:38 UTC, Paulo Pinto wrote:
> On 30.09.2014 14:55, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang@gmail.com> wrote:
>> On Tuesday, 30 September 2014 at 12:51:25 UTC, Paulo  Pinto wrote:
>>>
>>> It works when two big ifs come together.
>>>
>>> - inside the same scope (e.g. function level)
>>>
>>> - when the reference is not shared between threads.
>>>
>>> While it is of limited applicability, Objective-C (and eventually
>>> Swift) codebases prove it helps in most real life use cases.
>>
>> But Objective-C has thread safe ref-counting?!
>>
>> If it isn't thread safe it is of very limited utility, you can usually
>> get away with unique_ptr in single threaded scenarios.
>
> Did you read my second bullet?

Yes? I don't want built-in RC as the default for single-threaded use cases. I do want it when references are shared between threads, e.g. for cache objects.