April 17, 2014
I'm just going to put my two cents into this discussion. It's my personal opinion that while _allocations_ should be removed from Phobos wherever possible, replacing GC usage with manual calls to malloc/free has no place in the standard library; it's a mess that simply isn't needed. One should be figuring out how not to allocate at all rather than trying to do manual management.

It is possible to implement a much better GC than what D currently has, and I intend to do exactly that when I have the time needed (in roughly a month). Firstly, by making it heap precise, and maybe even adding a stack-precise mode (unlikely). Secondly, by optionally using an allocation strategy similar to tcmalloc's, which avoids taking a global lock for most allocations; that would serve as an interim measure until DMD gets full escape analysis, which, due to the nature of D, would be required before I could implement an effective compacting GC. Depending on whether I can grasp the underlying theory behind it, I *may* also create an async collection mode, though it will be interesting to see how I can tie the extensible scanning system (Andrei's allocators) into it. Lastly, I'll add support for stack-allocated classes, although that will likely have to be disabled until DMD gets full escape analysis. As a final note, this will be the third GC I've written, though by far the most complex. The first was just heap precise; the second, a generational compacting version of it.
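For illustration, the lock-free fast path that a tcmalloc-style strategy buys looks roughly like this (a minimal sketch, not the actual design; the names and the 16-byte size-class scheme are invented, and the free path that pushes blocks back onto the list is omitted):

import core.stdc.stdlib : malloc;

// Module-level variables are thread-local in D, so each thread gets its
// own set of free lists and the common path never takes a global lock.
private void*[64] freeLists;

void* tcAlloc(size_t size)
{
    immutable cls = (size + 15) / 16; // hypothetical 16-byte size classes
    if (cls < freeLists.length && freeLists[cls] !is null)
    {
        // Fast path: pop the head of this thread's free list, lock-free.
        void* block = freeLists[cls];
        freeLists[cls] = *cast(void**) block;
        return block;
    }
    // Slow path: refill from the shared heap (any locking lives here).
    return malloc(cls * 16);
}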

On 4/17/14, Manu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 17 April 2014 21:57, John Colvin via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
>> On Thursday, 17 April 2014 at 11:31:52 UTC, Manu via Digitalmars-d wrote:
>>
>>> ARC offers a solution that is usable by all parties.
>>>
>>
>> Is this a proven statement?
>>
>> If that paper is right then ARC with cycle management is in fact
>> equivalent to Garbage Collection.
>> Do we have evidence to the contrary?
>>
>
> People who care would go to the effort of manually marking weak references.
> If you make a commitment to that in your software, you can eliminate the
> backing GC. Turn it off, or don't even link it.
> The backing GC is so that 'everyone else' would be unaffected by the shift.
> They'd likely see an advantage too, in that the GC would have a lot less
> work to do, since the ARC would clean up most of the memory (falling
> generally into the realm you refer to below).
>
>
>> My very vague reasoning on the topic:
>>
>> Sophisticated GCs use various methods to avoid scanning the whole heap,
>> and by doing so they in fact implement something equivalent to ARC, even
>> if it doesn't appear that way on the surface. In the other direction,
>> ARC ends up implementing a GC to deal with cycles. I.e.
>>
>> Easy work (normal data): A clever GC effectively implements ARC. ARC does
>> what it says on the tin.
>>
>> Hard Work (i.e. cycles): Even a clever GC must be somewhat conservative*. ARC effectively implements a GC.
>>
>> *in the normal sense, not GC-jargon.
>>
>> Ergo they aren't really any different?
>
>
> Nobody has proposed a 'sophisticated' GC for D. As far as I can tell, it's
> considered impossible by the experts.
> It also doesn't address the fundamental issue with the nature of a GC,
> which is that it expects plenty of free memory. You can't use a GC in a
> low-memory environment, no matter how it's designed. It allocates until it
> can't, then spends a large amount of time re-capturing unreferenced memory.
> As free memory decreases, this becomes more and more frequent.
>
April 17, 2014
On 17 April 2014 22:28, Michel Fortin via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On 2014-04-17 03:13:48 +0000, Manu via Digitalmars-d <digitalmars-d@puremagic.com> said:
>
>> Obviously, a critical part of ARC is the compiler's ability to reduce
>> redundant inc/dec sequences. At which point your 'every time' assertion is
>> false. C++ can't do ARC, so it's not comparable.
>> With proper elimination, transferring ownership results in no cost, only
>> duplication/destruction, and those are moments where I've deliberately
>> committed to creation/destruction of an instance of something, at which
>> point I'm happy to pay for an inc/dec; creation/destruction are rarely
>> high-frequency operations.
>>
>
> You're right that transferring ownership does not cost anything with ARC. What costs you are return values and temporary local variables.
>

Why would they cost? If a function receives a reference, it will equally
release it on return. I don't see why a ref should be bumped to pass it to
a function?
Return values I can see, because return values are effectively copying
assignments. But if the assignment is to a local, then the close of scope
implies a dec, which would again cancel out.


> While it's nice to have a compiler that'll elide redundant retain/release
> pairs, function boundaries can often make this difficult. Take this first example:
>
>         Object globalObject;
>
>         Object getObject()
>         {
>                 return globalObject; // implicit: retain(globalObject)
>         }
>
>         void main()
>         {
>                 auto object = getObject();
>                 writeln(object);
>                 // implicit: release(object)
>         }
>
> It might not be obvious, but here the getObject function *has to* increment the reference count by one before returning. There's no other convention that'll work because another implementation of getObject might return a temporary object. Then, at the end of main, globalObject's reference counter is decremented. Only if getObject gets inlined can the compiler detect the increment/decrement cycle is unnecessary.
>

Well in most cases of accessors like this, it would inline properly. It's a fairly reliable rule that, if a function is not an inline candidate, it is probably also highly unlikely to appear in a hot loop.

I don't follow why it needs to retain before returning though. It would seem that it should retain upon assignment after returning (making it similar to the situation below). Nothing can interfere with the refcount before and after the function returns.


> But wait! If writeln isn't pure (and surely it isn't), then it might change
> the value of globalObject (you never know what's in Object.toString, right?), which will in turn release object. So main *has to* increment the reference counter if it wants to make sure its local variable object is valid until the end of the writeln call. Can't elide here.
>
> Let's take this other example:
>
>         Object globalObject;
>         Object otherGlobalObject;
>
>         void main()
>         {
>                 auto object = globalObject; // implicit: retain(globalObject)
>                 foo(object);
>                 // implicit: release(object)
>         }
>
> Here you can elide the increment/decrement cycle *only if* foo is pure. If foo is not pure, then it might set another value to globalObject (you never know, right?), which will decrement the reference count and leave the "object" variable in main the sole owner of the object. Alternatively, if foo is not pure but instead gets inlined it might be provable that it does not touch globalObject, and elision might become a possibility.
>

Sure, there is potential that certain bits of code between the retain/release can break the ability to eliminate the pair, but that's why I think D has an advantage here over other languages, like Obj-C for instance. D has so much more richness in the type system, which can assist here. I'm pretty confident that D would offer much better results than existing implementations.

> I think ARC needs to be practical without elision of redundant calls. It's
> a good optimization, but a difficult one unless everything is inlined. Many such elisions that would appear to be safe at first glance aren't provably safe for the compiler because of function calls.


I'm very familiar with this class of problem. I have spent much of my
career dealing with precisely this class of problem.
__restrict addresses the exact same problem with raw pointers in C;
programmers understand the issue and know how to work around it when it
appears in hot loops.

D has some significant advantages that other ARC languages don't have though. D's module system makes inlining much more reliable than in C/C++, for instance; pure is an important part of D, and people do use it liberally.


April 17, 2014
On Thursday, 17 April 2014 at 12:20:06 UTC, Manu via Digitalmars-d wrote:
> See, I just don't find managed memory incompatible with 'low level' realtime or embedded code, even on tiny microcontrollers in principle.

RC isn't incompatible with realtime, since the overhead is O(1).

But it is slower than the alternatives where you want maximum performance. E.g. raytracing.

And it is slower and less "safe" than GC for long-running servers that have uneven loads. E.g. web services.

I think it would be useful to discuss real scenarios when discussing performance:

1. Web server request that can be handled instantly (no database lookup): small memory requirements and everything is released immediately.

Best strategy might be to use a release pool (allocate incrementally and free all upon return in one go); a sketch follows this list.

2. Web server, cached content-objects: lots of cycles, shared across threads.

Best strategy is global GC.

3. Non-maskable interrupt: can cut into any running code at any time. No deallocation may happen, and it can only touch state that is consistent after single-instruction atomic CPU operations.

Best strategy is preallocation and single instruction atomic communication.
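
The release pool from scenario 1 can be as simple as a bump allocator over one preallocated block (a hedged sketch; the type and method names are invented, and a production version would chain extra blocks rather than fail):

import core.stdc.stdlib : malloc, free;

// A trivial release pool: allocations bump a pointer; the request's
// entire memory is returned in one go when the pool is released.
struct ReleasePool
{
    private ubyte* base, cur, end;

    this(size_t capacity)
    {
        base = cur = cast(ubyte*) malloc(capacity);
        end = base + capacity;
    }

    void* alloc(size_t size)
    {
        size = (size + 15) & ~cast(size_t) 15; // keep 16-byte alignment
        if (cur + size > end)
            return null; // a real pool would chain a new block here
        void* p = cur;
        cur += size;
        return p;
    }

    void release() { free(base); base = cur = end = null; }
}

At the end of the request handler, a single release() frees everything the handler allocated, with no per-object bookkeeping.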

> ARC would be fine in low level code, assuming the language supported it to
> the fullest of its abilities.

Yes, but that requires whole program optimization, since function calls cross compilation unit boundaries frequently.

> No. It misses basically everything that compels the change. Strings, '~',
> closures. D largely depends on its memory management.

And that is the problem. Strings can usually be owned objects.

What benefits most from GC are the big complex objects that have lots of links to other objects, so you get many circular references.

You usually have fewer of those.

If you somehow can limit GC to precise collection of those big objects, and forbid foreign references to those, then the collection cycle could complete quickly and you could use GC for soft real time. Which most application code is.

I don't know how to do it, but global-GC-everything only works for batch programming or servers with downtime.

> Take this seriously. I want to see ARC absolutely killed dead rather than dismissed.

Why is that? I can see ARC in D3 with whole program optimization. I cannot see how D2 could be extended with ARC given all the other challenges.

Ola.

April 17, 2014
On Thu, 17 Apr 2014 14:08:29 +0100, Orvid King via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> I'm just going to put my two cents into this discussion. It's my
> personal opinion that while _allocations_ should be removed from
> Phobos wherever possible, replacing GC usage with manual calls to
> malloc/free has no place in the standard library; it's a mess that
> simply isn't needed. One should be figuring out how not to allocate
> at all rather than trying to do manual management.

The standard library is a better place to put manual memory management than user space because there it can be done by experts, peer reviewed, and would then benefit everyone at no extra cost.

There are likely a number of smaller GC allocations which could be replaced by calls to alloca, simultaneously improving performance and avoiding GC interaction.

These calls could then be marked @nogc and used in the realtime sections of applications without fear of collections stopping the world.
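
As a sketch of the kind of call site meant here (a hypothetical helper, not an actual Phobos function): a temporary buffer that would otherwise come from the GC lives on the stack instead, so the function qualifies as @nogc:

import core.stdc.stdlib : alloca;

// Hypothetical: format an unsigned integer through a sink without any
// heap allocation; the scratch buffer comes from the stack via alloca.
@nogc void formatDecimal(uint value,
                         scope void delegate(scope const(char)[]) @nogc sink)
{
    enum maxDigits = 10; // enough for uint.max
    char* buf = cast(char*) alloca(maxDigits);
    size_t i = maxDigits;
    do
    {
        buf[--i] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value);
    sink(buf[i .. maxDigits]);
}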

Neither ARC nor a super amazing GC would be able to improve upon the efficiency of this sort of change.

Seems like win-win-win to me.

> It is possible to implement a much better GC than what D currently
> has, and I intend to do exactly that when I have the time needed (in
> roughly a month).

Excellent :)

R
April 17, 2014
On Wed, 16 Apr 2014 18:38:23 +0100, Walter Bright <newshound2@digitalmars.com> wrote:

> On 4/16/2014 8:01 AM, qznc wrote:
>> However, what is still an open issue is that @nogc can be stopped by allocations
>> in another thread. We need threads which are not affected by stop-the-world. As
>> far as I know, creating threads via pthreads C API directly achieves that, but
>> integration with @nogc could provide more type safety. Stuff for another DIP?
>
> That's a completely separate issue.

Yep.  I was thinking an attribute like @rt (realtime) would be super cool (but perhaps impossible).  It would be a superset of things like @nogc, and would imply those things.  Adding @nogc does not prevent such a thing being done in the future.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
April 17, 2014
On 17 April 2014 23:17, via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On Thursday, 17 April 2014 at 12:20:06 UTC, Manu via Digitalmars-d wrote:
>
>> See, I just don't find managed memory incompatible with 'low level' realtime or embedded code, even on tiny microcontrollers in principle.
>>
>
> RC isn't incompatible with realtime, since the overhead is O(1).
>
> But it is slower than the alternatives where you want maximum performance. E.g. raytracing.
>

You would never allocate in a ray tracing loop. If you need a temp, you
would use some pre-allocation strategy. This is a tiny, self-contained, and
highly specialised loop that will always have a highly specialised
allocation strategy.
You also don't make library calls inside a raytrace loop.
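
One such pre-allocation strategy, sketched here under invented names (the real thing would be tuned to the tracer): secondary rays go onto a fixed-capacity stack allocated once before tracing begins, so the hot loop itself never allocates.

struct Ray { /* origin, direction, payload... */ }

// A preallocated ray stack: push/pop in the trace loop, zero allocation.
struct RayStack
{
    Ray[] storage; // sized once, before the loop
    size_t top;

    bool push(Ray r)
    {
        if (top == storage.length)
            return false; // ray budget exhausted; caller decides what to do
        storage[top++] = r;
        return true;
    }

    bool pop(out Ray r)
    {
        if (top == 0)
            return false;
        r = storage[--top];
        return true;
    }
}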


> And it is slower and less "safe" than GC for long-running servers that
> have uneven loads. E.g. web services.
>

Hey? I don't know what you mean.


> I think it would be useful to discuss real scenarios when discussing
> performance:
>
> 1. Web server request that can be handled instantly (no database lookup): small memory requirements and everything is released immediately.
>
> Best strategy might be to use a release pool (allocate incrementally and free all upon return in one go).
>

Strings are the likely source of allocation. I don't think this suggests a preference for GC or ARC either way. A high-frequency webserver would use something more specialised in this case, I imagine.

> 2. Web server, cached content-objects: lots of cycles, shared across
> threads.
>
> Best strategy is global GC.
>

You can't have web servers locking up for 10s-100s of ms at random
intervals... that's completely unacceptable.
Or if there is no realtime allocation, then the management strategy is
irrelevant.

> 3. Non-maskable interrupt: can cut into any running code at any time. No
> deallocation may happen, and it can only touch state that is consistent after single-instruction atomic CPU operations.
>
> Best strategy is preallocation and single instruction atomic communication.


Right, interrupts wouldn't go allocating from the master heap.


I don't think these scenarios are particularly relevant.

>> ARC would be fine in low level code, assuming the language supported it to
>> the fullest of its abilities.
>>
>
> Yes, but that requires whole program optimization, since function calls cross compilation unit boundaries frequently.


D doesn't usually have compilation unit boundaries. And even if you do, assuming the source is available, it can still inline if it wants to, since the source of imported modules is available while compiling a single unit. I don't think WPO is as critical as you say.

>> No. It misses basically everything that compels the change. Strings, '~',
>> closures. D largely depends on its memory management.
>>
>
> And that is the problem. Strings can usually be owned objects.
>

I find strings are often highly shared objects.

> What benefits most from GC are the big complex objects that have lots of
> links to other objects, so you get many circular references.
>
> You usually have fewer of those.
>

These tend not to change much at runtime.
Transient/temporary allocations on the other hand are very unlikely to
contain circular references.

Also, I would mark weak references explicitly.


>> Take this seriously. I want to see ARC absolutely killed dead rather than
>> dismissed.
>>
>
> Why is that? I can see ARC in D3 with whole program optimization. I cannot see how D2 could be extended with ARC given all the other challenges.


Well it's still not clear to me what all the challenges are... that's my point. If it's not possible, I want to know WHY.


April 17, 2014
I should probably have said heap allocation rather than just allocation, because the alloca calls are the ones that would bring the real benefit. Those realtime applications are the reason I hope to be able to implement an async collection mode. If I were able to implement even a moderately compacting GC, I could use a bump-the-pointer allocation strategy, which would be significantly faster than manual calls to malloc/free.
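
To show why bump-the-pointer wins (a sketch, assuming a compacting collector keeps the free region contiguous; the type is invented): an allocation becomes one pointer increment plus a bounds check, with no free-list search at all.

struct BumpHeap
{
    ubyte* next;  // start of the contiguous free region
    ubyte* limit; // end of the heap block

    void* alloc(size_t size)
    {
        size = (size + 15) & ~cast(size_t) 15; // keep 16-byte alignment
        if (next + size > limit)
            return null; // a real GC would collect and compact here
        void* p = next;
        next += size; // the entire allocator is one add
        return p;
    }
}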

On 4/17/14, Regan Heath via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Thu, 17 Apr 2014 14:08:29 +0100, Orvid King via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
>> I'm just going to put my two cents into this discussion. It's my personal opinion that while _allocations_ should be removed from Phobos wherever possible, replacing GC usage with manual calls to malloc/free has no place in the standard library; it's a mess that simply isn't needed. One should be figuring out how not to allocate at all rather than trying to do manual management.
>
> The standard library is a better place to put manual memory management than user space because there it can be done by experts, peer reviewed, and would then benefit everyone at no extra cost.
>
> There are likely a number of smaller GC allocations which could be replaced by calls to alloca, simultaneously improving performance and avoiding GC interaction.
>
> These calls could then be marked @nogc and used in the realtime sections of applications without fear of collections stopping the world.
>
> Neither ARC nor a super amazing GC would be able to improve upon the efficiency of this sort of change.
>
> Seems like win-win-win to me.
>
>> It is possible to implement a much better GC than what D currently has, and I intend to do exactly that when I have the time needed (in roughly a month).
>
> Excellent :)
>
> R
>
April 17, 2014
On 04/17/2014 02:34 PM, Manu via Digitalmars-d wrote:
> On 17 April 2014 21:57, John Colvin via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>
>     On Thursday, 17 April 2014 at 11:31:52 UTC, Manu via Digitalmars-d
>     wrote:
>
>         ARC offers a solution that is usable by all parties.
> ...
> You can't use a GC in a
> low-memory environment, no matter how it's designed. It allocates until
> it can't, then spends a large amount of time re-capturing unreferenced
> memory. As free memory decreases, this becomes more and more frequent.

What John was trying to get at is that the two quoted statements above are in contradiction with each other. A GC is a subsystem that automatically frees dead memory. (Dead as in it will not be accessed again, which is a weaker notion than being unreferenced.)

Maybe the distinction you want to make is between ARC and tracing garbage collectors.

April 17, 2014
On Thursday, 17 April 2014 at 13:43:17 UTC, Manu via Digitalmars-d wrote:
> You would never allocate in a ray tracing loop. If you need a temp, you
> would use some pre-allocation strategy.

Path-tracing is predictable, but regular ray tracing may spawn many rays per hit. So you pre-allocate a buffer, but might need to extend it.

The point was: RC-per-object is unacceptable.

>> And it is slower and less "safe" than GC for long-running servers that have uneven loads. E.g. web services.
>
> Hey? I don't know what you mean.

1. You can get memory leaks by not collecting cycles with RC.

2. You spend time on RC accounting when you need speed, and run idle when you could run GC collection. GC is faster than ARC. (See the sketch below.)
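
For concreteness, the accounting cost in point 2 amounts to roughly this per reference assignment (a sketch of what a naive ARC implementation would insert; RcRef, Payload, and freeObject are hypothetical):

struct Payload { size_t refCount; /* the actual data... */ }
struct RcRef { Payload* obj; }

void assign(ref RcRef dst, RcRef src)
{
    // Every assignment pays an increment and a decrement, even when the
    // two would provably cancel out under smarter elision.
    if (src.obj)
        ++src.obj.refCount;
    if (dst.obj && --dst.obj.refCount == 0)
        freeObject(dst.obj); // hypothetical destructor-and-free path
    dst.obj = src.obj;
}

void freeObject(Payload* p) { /* run destructor, return memory */ }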

>> Best strategy is global GC.
>
> You can't have web servers locking up for 10s-100s of ms at random intervals... that's completely unacceptable.

The kind of servers I write can live with occasional 100-200ms lockups. That is no worse than the time it takes to get a response from a database node with a transaction running on it.

For a game server that is too much, so you would need to get down to under ~50ms, but then you also tend to run with an in-memory database that you cannot run a full GC on frequently because of all the pointers in it.

> D doesn't usually have compilation unit boundaries.

It does if you need multiple entry/return paths. E.g. you need to compile two different versions of a function depending on the context. You don't want a copy in each object file.

> I find strings are often highly shared objects.

Depends on how you use them. You can usually tie them to the object "they describe".

>> What benefits most from GC are the big complex objects that have lots of
>> links to other objects, so you get many circular references.
>>
>> You usually have fewer of those.
>>
>
> These tend not to change much at runtime.

On the contrary, content objects are the ones that do change, both in terms of evolutionary programming (which makes it easy to miss a cycle) and at runtime.

This is especially true for caching web-servers, I think.

> Also, I would mark weak references explicitly.

Which can be difficult to figure out, and then you also have to deal with "unexpected null references" and the exceptions they might cause.

Weak references can be useful with GC too if the semantics are right for the situation, but it is a crutch if it is only used for killing cycles.

> Well it's still not clear to me what all the challenges are... that's my
> point. If it's not possible, I want to know WHY.

I think it is possible, but I also think shipping D2 as a maintained stable product should have first priority. ARC would probably set D back one year? I think that would be a bad thing.

Ola.
April 17, 2014
On Tuesday, 15 April 2014 at 17:01:38 UTC, Walter Bright wrote:
> http://wiki.dlang.org/DIP60
>
> Start on implementation:
>
> https://github.com/D-Programming-Language/dmd/pull/3455

OK, a bit late to the thread, seeing how it has already gone off-topic into ARC territory :( An attempt to get back to the original point.

I was asking for @nogc earlier, and I find the proposed implementation too naive to be practically useful, to the point where I will likely be forced to ignore it in general.

=== Problem #1 ===

First problem is that, by analogy with `pure`, there is no such thing as "weakly @nogc". A common pattern for performance-intensive code is to use output buffers of some sort:

import std.range : isOutputRange, put;

void foo(Range)(ref Range buffer)
    if (isOutputRange!(Range, int))
{
    buffer.put(42); // may allocate if the range grows its backing store
}

`foo` can't be @nogc here if the output range uses the GC as its backing allocator. However, I'd really like to use it to verify that no hidden allocations happen other than those explicitly coming from user-supplied arguments. In fact, if such a "weakly @nogc" thing were available, it could be used to clean up Phobos reliably.

With the current limitations, @nogc is only useful to verify that embedded code which has no GC at all does not use any GC-triggering language features before it runs into weird linker errors / rt-asserts. But that does not work well either, because of the next problem:

=== Problem #2 ===

The point where an "I told ya" statement is extremely tempting :) bearophile has already pointed this out: for some language features, like array literals, you can't be sure about possible GC usage at compile time, as it depends on optimizations in the backend. And making @nogc conservative in that regard, marking all literals as @nogc-prohibited, would cripple the language beyond reason.
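
To make the dilemma concrete (a hedged example of the literal case bearophile raised):

@nogc void fill()
{
    int[3] a = [1, 2, 3]; // in-place stack initialization, or a copy from
                          // a GC-allocated temporary array? It depends on
                          // the backend, so @nogc can't decide reliably.
}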

I can see only one fix for that: defining a clear set of array literal use cases where optimizing the GC away is guaranteed by the spec, and relying on it.