February 06, 2014
On 2/5/14, 4:57 PM, Adam D. Ruppe wrote:
> On Thursday, 6 February 2014 at 00:42:20 UTC, Andrei Alexandrescu wrote:
>> One other school of thought (to which I subscribe) is that one should
>> take advantage of reference counting where appropriate within a GC
>> milieu, regardless of more radical RC approaches that may be available.
>
> I agree with that stance, but I don't think there's a blanket rule
> there. I think RC freeing small slices will waste more time than it
> saves. Large allocations, on the other hand, might be worth it. So
> std.file.read for example returns a large block - that's a good
> candidate for refcounting since it might be accidentally subject to
> false pointers, or sit around too long creating needless memory
> pressure, etc.

That sounds reasonable. One possibility would be to define FreshSlice!T to mean "this is a freshly allocated slice"; it could then be converted to a refcounted slice or simply to a GC one.
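A minimal sketch of what that could look like (FreshSlice, toGC, and readText are invented names for illustration; a real design would also carry a toRC conversion and allocator details):

```d
// Hypothetical sketch: FreshSlice is not an existing Phobos type.
struct FreshSlice(T)
{
    private T[] payload; // freshly allocated, uniquely owned so far

    // Convenience path: detach and let the GC own the buffer.
    T[] toGC()
    {
        auto p = payload;
        payload = null; // this FreshSlice no longer owns anything
        return p;
    }

    // A toRC conversion handing the unique buffer to a refcounted
    // slice would live here in the full design.
}

// Stand-in for a library call that returns a freshly allocated block.
FreshSlice!char readText()
{
    return FreshSlice!char("hello".dup);
}

void main()
{
    char[] s = readText().toGC();
    assert(s == "hello");
}
```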

> Anywho, I'd just go through on a case-by-case basis and tackle the big
> fish. Of course, a user could just do scope(exit) GC.free(ret); too.

That won't work because user code can't always know whether something received from the library had been freshly allocated or not.

>> Binding ref is also a related topic. All of these are complex matters,
>> and I think a few simple sketches don't do them justice.
>
> I'd rather discuss these details than adding RCSlice and toGC everywhere
> for more cost than benefit.

Have at it, of course. This is not a constant-sum game; just don't hijack this discussion. I should warn you, however, that we've been discussing this literally for years; your examples are just scratching the surface.


Andrei

February 06, 2014
On Wednesday, 5 February 2014 at 21:03:25 UTC, Brad Anderson wrote:
> On Wednesday, 5 February 2014 at 20:18:33 UTC, Adam D. Ruppe wrote:
>> [...]
>> A major source of little allocations in my code is std.conv and std.string. But these aren't difficult to change to external allocation, in theory at least:
>>
>> string s = to!string(50); // GC allocates (I'd keep this for convenience and compatibility)
>>
>> char[16] buffer;
>> char[] s = toBuffer(buffer[], 50); // same thing, using a buffer
>>
>> char[] s = toLowerBuffer(buffer[], "FOO");
>> assert(buffer.ptr is s);
>> assert(s == "foo");
>>
>>
>> That's not hard to use (though remembering that s is a borrowed reference to a stack buffer might be - escape analysis is something we should really have).
>>
>> And it gives full control over both allocation and deallocation. It'd take some changes in Phobos, but so would RCSlice, and this approach actually decouples it from the GC.
>
> Yeah, because RCSlice would require changes to Phobos too, I'd much rather have this approach: it is just so much more flexible and hardly adds any inconvenience.
>
> Combined with the upcoming allocators it would be incredibly powerful. You could have an output range that uses an allocator which stores on the stack unless it grows too big (and the stack size could be completely customizable by the user who knows best). Or you could pass in an output range that reference counts its memory. Or an output range that must remain unique and frees its contents when it goes out of scope.
>
> I think three things would work together really well for addressing users that want to avoid the GC while making use of Phobos: 1) increasing the support for output ranges, 2) Andrei's slick allocator design, and 3) @nogc. With those three I really think managing memory and avoiding the GC will be rather pleasant. @nogc would let people trying to avoid all the tough-to-spot implicit GC allocations identify them easily. Once uncovered, they just switch to the output-range version of a function in Phobos and then use std.allocator with the output range they feed in to create an ideal allocation strategy for their use case (whether stack, GC, scope-freed heap, reference counted, a memory pool, or some hybrid of those).
> [...]

My thinking as well. That combination of functionality looks very advantageous to me. It is more flexible than just providing two choices to the programmer: GC and RC. To me both GC and RC are useful, depending on the type of program being written. However, why limit to just the two? There are other styles of memory allocation/management I might need to make use of, perhaps even in the same program.

I really like the new allocator module. I had been thinking that a goal for its use was to allow replacing the compiler-supported allocation style with a custom one, either at the module level or on a function-by-function basis, as shown in the code above. In my opinion, this would give the necessary flexibility over memory allocation by giving final control to the programmer (i.e. control over external and internal allocation style). Doing so seems good to me as the programmer knows a priori the type of allocation pattern to support based on the type of program being produced (e.g. real-time, long-running process, batch system). Of course minimizing memory allocation in Phobos is an excellent goal and that work will proceed orthogonal to this effort. However, in the end, some memory will have to be allocated. Letting the programmer choose how that memory is to be allocated by giving full access to std.allocator seems the way to go.
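As a rough illustration of the buffer-taking style quoted above (toBuffer here is a hypothetical helper, sketched on top of std.format.sformat; it is not an actual Phobos function):

```d
import std.format : sformat;

// Hypothetical toBuffer: formats into caller-supplied storage and
// returns a borrowed slice of it; no GC allocation happens here.
char[] toBuffer(T)(char[] buffer, T value)
{
    return sformat(buffer, "%s", value);
}

void main()
{
    char[16] buffer;
    char[] s = toBuffer(buffer[], 50); // same result as to!string(50)
    assert(s == "50");
    assert(s.ptr is buffer.ptr); // s borrows the stack buffer
}
```

The caller decides where the memory comes from (stack, malloc, an allocator), which is exactly the decoupling from the GC being discussed.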

Joseph
February 06, 2014
On Thursday, 6 February 2014 at 02:31:19 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 6 February 2014 at 00:42:20 UTC, Andrei Alexandrescu
> wrote:
>> One school of thought seems to be that D should be everything it is today, just with reference counting throughout instead of garbage collection. One build flag to rule them all would choose one or the other.
>>
>> One other school of thought (to which I subscribe) is that one should take advantage of reference counting where appropriate within a GC milieu, regardless of more radical RC approaches that may be available.
>
> The third school of thought is that one should be able to have
> different types of allocation schemes without changing the object
> type, but somehow tie it to the allocator and if needed to the
> pointer type/storage class.
>
> If you allocate as fully owned, it stays owned. If you allocate as shared with immediate release (RC) it stays shared. If you allocate as shared with delayed collection (GC) it stays that way.
>
> The RC/GC metadata is a hidden feature and
> allocator/runtime/compiler dependent component. Possibly you
> won't have GC or RC, but one pure GC runtime, one integrated
> RC/GC runtime, one pure ARC runtime, one integrated ARC/GC
> runtime etc. That's probably most realistic since the allocation
> metadata might be in conflict.
>
> You should be able to switch to the runtime you care about if
> needed as a compile time switch:
>
> 1. Pure Owned/borrowed: hard core performance, OS level
> development
>
> 2. Manual RC (+GC): high throughput, low latency
>
> 3. ARC (+GC): ease of use, low throughput, low latency
>
> 4. GC: ease of use, high throughput, higher latency, long lived
>
> 5. Realtime GC
>
> 6. ??
>
> I see no reason for having objects treated differently if they are "owned", just because they have a different type of ownership. If the function dealing with it does not own it, but borrows it, then it should not matter. The object should have the same layout, the ownership/allocation metadata should be encapsulated and hidden.
>
> It is only when you transfer ownership that you need to know if the object is under RC or not. You might not even want to use counters in a particular implementation; maybe it is better to use a linked list in some scenarios. "Reference counting" is a misnomer; it should be called "ownership tracker".
>
> The current default is that all pointers are shared. What D needs is defined semantics for ownership. Then you can start switching one runtime for another one and have the compiler/runtime act as an efficient unit.

That won't play ball with third party libraries distributed in binary form.

This is one of the reasons why Apple's Objective-C GC failed.

--
Paulo
February 06, 2014
On 5.2.2014. 0:51, Andrei Alexandrescu wrote:
> Consider we add a library slice type called RCSlice!T. It would have the same primitives as T[] but would use reference counting through and through. When the last reference count is gone, the buffer underlying the slice is freed. The underlying allocator will be the GC allocator.
> 
> Now, what if someone doesn't care about the whole RC thing and aims at convenience? There would be a method .toGC that just detaches the slice and disables the reference counter (e.g. by setting it to uint.max/2 or whatever).
> 
> Then people who want reference counting say
> 
> auto x = fun();
> 
> and those who don't care say:
> 
> auto x = fun().toGC();
> 
> 
> Destroy.
> 
> Andrei
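
The quoted proposal could be sketched minimally like this (an illustration only; the real RCSlice would mirror all of T[]'s primitives and deal with slicing, appending, and thread safety):

```d
struct RCSlice(T)
{
    import core.memory : GC;

    T[] payload;
    uint* count; // uint.max / 2 marks a detached, GC-owned slice

    this(size_t n)
    {
        payload = new T[n]; // the GC allocator underneath, per the proposal
        count = new uint;
        *count = 1;
    }

    this(this) { if (count) ++*count; } // copying bumps the count

    ~this()
    {
        // Last reference gone and not detached: free eagerly.
        if (count && *count != uint.max / 2 && --*count == 0)
            GC.free(payload.ptr);
    }

    // Detach: disable the counter and let the GC take over.
    T[] toGC()
    {
        if (count) *count = uint.max / 2;
        return payload;
    }
}

void main()
{
    auto a = RCSlice!int(3);
    a.payload[] = 42;
    {
        auto b = a; // second reference
        assert(*a.count == 2);
    }
    assert(*a.count == 1);
    auto g = a.toGC(); // opt out of refcounting
    assert(g.length == 3 && g[0] == 42);
}
```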

Here is a thought:

Let's say we have class A and class B, and class A accepts references to B as children:

class A {
  B child1;
  B child2;
  B child3;
}

I think that the ultimate goal is to allow the user to choose between kinds of memory management, especially between automatic and manual. The problem here is that class A needs to be aware of whether memory management is manual or automatic. And it seems to me that a new type qualifier is the way to go:

class A {
  garbageCollected(B) child1;
  referenceCounted(B) child2;
  manualMemory(B) child3;
}

Now suppose we want to have only one child but we want to support compatibility with other kinds of memory management:

class A {
  manualMemory(B) child;

  this (B newChild) {
    child = newChild.toManualMemory();
  }

  this (referenceCounted(B) newChild) {
    child = newChild.toManualMemory();
  }

  this (manualMemory(B) newChild) {
    child = newChild;
  }

  ~this () {
    delete child;
  }

}

This way we could write code that supports multiple models and let the user choose which one to use. The thing that I would like to point out is that this suggestion would work with existing code, as the garbageCollected memory management model would be the default:

auto b = new B();
auto a = new A(b);

Another thing to note is that in this way the garbage collector would know that we now have two references to one object (an instance of class B): one is variable b and another is child in object a. Because of the notation, the garbage collector is aware that it could free this object when variable b goes out of scope, but it should not do so because there is still a manually managed reference to it.

I am sure that there are many more possible holes in this, but maybe it will give someone a better idea :)
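
For what it's worth, today the manualMemory(B) qualifier could be approximated as a library wrapper template rather than new syntax (a sketch with invented names; emplace lives in core.lifetime in current druntime, std.conv in older releases):

```d
import core.lifetime : emplace;
import core.stdc.stdlib : free, malloc;

// Hypothetical emulation of manualMemory(B): the class instance lives
// outside the GC heap and is destroyed deterministically.
struct ManualMemory(B)
    if (is(B == class))
{
    B instance;

    static ManualMemory create(Args...)(Args args)
    {
        enum size = __traits(classInstanceSize, B);
        auto mem = malloc(size)[0 .. size];
        ManualMemory m;
        m.instance = emplace!B(mem, args);
        return m;
    }

    void release()
    {
        if (instance !is null)
        {
            destroy(instance); // run the destructor
            free(cast(void*) instance);
            instance = null;
        }
    }
}

class Widget { int x; this(int x) { this.x = x; } }

void main()
{
    auto m = ManualMemory!Widget.create(7);
    assert(m.instance.x == 7);
    m.release();
    assert(m.instance is null);
}
```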

February 06, 2014
On Thursday, 6 February 2014 at 08:06:54 UTC, Paulo Pinto wrote:
> That won't play ball with third party libraries distributed in binary form.

That is not obvious: you specify the runtime. Anyway, whole-program analysis also does not play well with binary libraries without detailed semantic metadata.

Does shared_ptr in C++11 work with binary libraries that use it, if it is compiled with a compiler from another vendor?
February 06, 2014
On 2/6/14, 12:28 AM, luka8088 wrote:
> On 5.2.2014. 0:51, Andrei Alexandrescu wrote:
>> Consider we add a library slice type called RCSlice!T. It would have the
>> same primitives as T[] but would use reference counting through and
>> through. When the last reference count is gone, the buffer underlying
>> the slice is freed. The underlying allocator will be the GC allocator.
>>
>> Now, what if someone doesn't care about the whole RC thing and aims at
>> convenience? There would be a method .toGC that just detaches the slice
>> and disables the reference counter (e.g. by setting it to uint.max/2 or
>> whatever).
>>
>> Then people who want reference counting say
>>
>> auto x = fun();
>>
>> and those who don't care say:
>>
>> auto x = fun().toGC();
>>
>>
>> Destroy.
>>
>> Andrei
>
> Here is a thought:
>
> Let's say we have class A and class B, and class A accepts references to
> B as children:
>
> class A {
>    B child1;
>    B child2;
>    B child3;
> }
>
> I think that the ultimate goal is to allow the user to choose between
> kinds of memory management, especially between automatic and manual. The
> problem here is that class A needs to be aware whether memory management
> is manual or automatic. And it seems to me that a new type qualifier is
> a way to go:
>
> class A {
>    garbageCollected(B) child1;
>    referenceCounted(B) child2;
>    manualMemory(B) child3;
> }

The common theme here is that the original post introduces two distinct types of slices, depending on how they are to be freed (by reference counting or by tracing).

Andrei


February 06, 2014
On Thursday, 6 February 2014 at 08:28:34 UTC, luka8088 wrote:
> is manual or automatic. And it seems to me that a new type qualifier is
> a way to go:
>
> class A {
>   garbageCollected(B) child1;
>   referenceCounted(B) child2;
>   manualMemory(B) child3;
> }


class A {
@shared @delayedrelease @nodestructor @cycles B child1;
@shared @immediaterelease @nocycles B child2;
@owned @nocycles B child3;
}

Based on the required qualities, static analysis, and profiling, the compiler chooses the most efficient storage that meets the constraints and matches it up to the available runtime.


February 06, 2014
On Thursday, 6 February 2014 at 08:38:57 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 6 February 2014 at 08:06:54 UTC, Paulo Pinto wrote:
>> That won't play ball with third party libraries distributed in binary form.
>
> That is not obvious, you specify the runtime. Anyway, whole program analysis also does not play well with binary libraries without detailed semantic metadata.

So what do you do when different libraries require different runtimes?

To be more specific about my previous comment: Objective-C GC required special compilation flags, and care needed to be taken in GC-enabled code, as with C GCs.

This did not play well when mixing code that used the GC-enabled runtime with code that did not.

Hence the endless core dumps in Objective-C code that made use of the GC.

Apple's decision to create ARC and dump the GC wasn't because it is better, as they later sold it, but because the compiler inserts for the developer the usual [... retain]/[... release] calls that they had already been writing since the NeXT days.

So no distinct runtimes were required as the generated code is no different than an Objective-C developer would have written by hand.

This was the best way to achieve some form of automatic memory management, while preserving compatibility across libraries delivered in binary form.

>
> Does shared_ptr in C++11 work with binary libraries that use it, if the it is compiled with a compiler from another vendor?

As far as I am aware, no.

In any case there isn't a standard C++ ABI defined. Well, there are a few, but vendors don't use them.
February 06, 2014
On Thursday, 6 February 2014 at 09:27:19 UTC, Paulo Pinto wrote:
> So what do you do when different libraries require different runtimes?

I guess a given compiler could have a "cross compiler option" that generates libraries for all the available runtimes the compiler supports?

> To be more specific to my previous comment. Objective-C GC required special compilation flags and care needed to be taken in GC enabled code, like in C GCs.

I understand. I am not sure if having multiple flags that create a combinatorial explosion would be a good idea. I think you should have a set of individual runtimes targeting typical scenarios, supporting different sets of functionality (embedded, kernel, multimedia, server, batch, hpc…).

However, I think it would only work for the same compiler, because you really don't want to prevent innovation…

> So no distinct runtimes were required as the generated code is no different than an Objective-C developer would have written by hand.

You might be able to design the runtime/object code in such a way that you get link errors.

> In any case there isn't a standard C++ ABI defined. Well, there are a few, but vendors don't use them.

Yeah, well I am not personally bothered by it. The only time I consider using binary-only libraries is for graphics and OS-level stuff that is heavily used by others, so that it is both well tested and workarounds are available on the net (OpenGL, DirectX, Cocoa, etc.).

Not having the source code to a library is rather risky in terms of having to work around bugs by trial and error, without even knowing what the bug actually is.

Thankfully, most useful libraries are open source.
February 06, 2014
On 06.02.2014 11:21, "Ola Fosheim Grøstad" wrote:
> Not having the source code to a library is rather risky in terms
> of having to work around bugs by trail and error, without even
> knowing what the bug actually is.

so you don't work in the normal software development business, where closed-source third-party dependencies are entirely normal :)