May 11, 2014
On Sunday, 11 May 2014 at 18:18:41 UTC, Rainer Schuetze wrote:
>
>
> On 11.05.2014 10:22, Benjamin Thaut wrote:
>> On 10.05.2014 19:54, Andrei Alexandrescu wrote:
>>>
>>>> The next sentence goes on to list the advantages of RC (issues we have
>>>> wrestled with, like destructors), and then goes on to say the recent
>>>> awesome RC is within 10% of "the fastest tracing collectors".
>>>> Are you suggesting that D's GC is among 'the fastest tracing
>>>> collectors'? Is such a GC possible in D?
>>>
>>> I believe it is.
>>>
>>
>> While it might be possible to implement a good GC in D, it would require
>> major changes in the language and its libraries. In my opinion it would
>> be way more work to implement a proper GC than to implement ARC.
>>
>> Every state-of-the-art GC requires precise knowledge of _all_ pointers.
>> And that's exactly what we currently don't have in D.
>
> I think most garbage collectors can work with a number of false pointers. The referenced memory area has to be treated as pinned and cannot be moved. Limiting the false pointers to stack and registers seems like a compromise, though most of the stack could even be annotated. Code for this already exists in the debug info generation, though I suspect stack tracing could be unreliable.
>
> Here's my current stance on the GC discussions:
>
> I agree that the current GC is pretty lame, even if it were precise. "Stop-the-World" with complete tracing does not work for any interactive application that uses more than a few hundred MB of garbage collected memory (with or without soft-realtime requirements). Other applications with larger allocation requirements are easily dominated by collection time. Proposing to use manual memory management instead is admitting failure to me.
>
> For a reasonable GC I currently see 2 possible directions:
>
> 1. Use a scheme that takes a snapshot of the heap, stack and registers at the moment of collection and does the actual collection in another thread/process while the application can continue to run. This is the way Leandro Lucarella's concurrent GC works (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork", which doesn't exist on every OS/architecture. A manual copy of the memory won't scale to very large heaps, though it might be compressed down to just the possible pointers. Worst case it will need twice as much memory as the current heap.

a) Can't we do our own specialised alternative, instead of doing something as heavyweight as fork?

b) Surely that worst case is a very unlikely case, given a precise, concurrent garbage collector.

c) Does this actually help in the 99% cpu, 99% memory, 16ms per frame setting that Manu talks about?

> It would be very interesting how far we can push this model on the supported platforms.
>
> 2. Change the compiler to emit (library-defined) write barriers for modifications of (possible) pointers. This will allow experimenting with more sophisticated GC algorithms (if the write barrier is written in D, we might also need pointers without barriers to implement it). I know Walter is against this, and I am also not sure if this adds acceptable overhead, but we don't have proof of the opposite either.


Adding optional write-barriers would be great. It would allow us to do real experiments and stop the discussion being so hypothetical. It doesn't do any harm to add the capability, surely?
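To make "library defined" concrete, here is a rough sketch of the kind of hook I imagine: the compiler rewrites every pointer store *slot = val into a call like the one below. The name and the card-marking scheme are invented for illustration, not an existing druntime API.

// Hypothetical compiler-inserted hook for pointer stores (invented
// name). A simple card-marking barrier: remember which 4 KB "card"
// was mutated so the collector can rescan only dirty cards instead
// of the whole heap.
enum cardShift = 12;                    // 4 KB cards
__gshared bool[1 << 20] gcCards;        // covers 4 GB of address space

void _d_writeBarrier(void** slot, void* val)
{
    gcCards[(cast(size_t) slot >> cardShift) % gcCards.length] = true;
    *slot = val;                        // the store itself is unchanged
}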


> As we all know, the usual eager reference counting with atomic operations is not memory-safe, so my current favorite is "concurrent buffered reference counting" (see chapters 18.2/18.3 of "The Garbage Collection Handbook" by Richard Jones et al.): reference count modifications are not performed directly by the write barrier; instead it just logs the operation into a thread-local buffer. This is then processed by a collector thread which also detects cycles (only on candidates which had their reference count decreased during the last cycle). Except for very large reference chains this scales with the number of executed pointer modifications and not with the heap size.
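A minimal sketch of how that buffering could look, just to make the scheme concrete (all names invented):

// Sketch of concurrent buffered reference counting (invented names).
// The write barrier never touches the counts; it only logs into a
// thread-local buffer that a collector thread drains later.
enum Op { inc, dec }
struct RCEntry { Op op; void* target; }

RCEntry[] rcLog;   // module-level variables are thread-local in D

void rcWriteBarrier(void** slot, void* newVal)
{
    if (newVal !is null) rcLog ~= RCEntry(Op.inc, newVal);
    if (*slot !is null)  rcLog ~= RCEntry(Op.dec, *slot);
    *slot = newVal;
}

// A collector thread periodically swaps out each thread's rcLog,
// applies the incs and decs, frees blocks that reach zero, and runs
// cycle detection only on blocks whose count was decremented.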
May 11, 2014
On 5/11/2014 5:52 AM, Michel Fortin wrote:
> On 2014-05-11 08:29:13 +0000, Walter Bright <newshound2@digitalmars.com> said:
>
>> Again, O-C and C++/CX ARC are not memory safe because in order to make it
>> perform they provide unsafe escapes from it.
>
> But D could provide memory-safe escapes. If we keep the current GC to collect
> cycles, we could also allow raw pointers managed by the GC alongside ARC.
>
> Let's say we have two kinds of pointers: rc+gc pointers (the default) and
> gc_only pointers (on demand). When assigning from a rc+gc pointer to a gc_only
> pointer, the compiler emits code that disables destruction via the reference
> counting. This makes the GC solely responsible for destructing and deallocating
> that memory block. You can still assign the pointer to a rc+gc pointer later on,
> but the reference count is no longer reliable which is why RC-based destruction
> has been disabled.

Yes, you can make it memory safe by introducing another pointer type, as Rust does. But see my comments about this scheme in the message you replied to.

(Yes, I understand your proposal is a variation on that scheme.)

May 11, 2014
On 5/11/2014 7:55 AM, w0rp wrote:
> Python people don't tend to talk about speed that much.

:-)

May 11, 2014
On 5/11/2014 8:54 AM, Timon Gehr wrote:
> It is not an escape from ARC per se. It is a way to write type safe code which
> is not dependent on the allocation strategy of the processed data. (One can e.g.
> safely borrow mutable data as immutable and the type system ensures that during
> the time of the borrow, the data does not mutate.)

That's clearly an additional benefit of the borrowed pointer notion. But have you examined generated Rust code for the cost of inc/dec? I haven't, but I don't see any way they could avoid this (very expensive) cost without borrowed pointers.


> The type system tracks lifetimes across function and data structure boundaries.
> The main restriction is that such a pointer cannot be escaped. (But borrowed
> pointers may still be stored in data structures.)

The idea must be that the borrowed pointer lifetime cannot exceed the lifetime of the lender.

The thing is, if the compiler is capable of figuring out these lifetimes by examining the code, then there would be no need for the programmer to specify a pointer as borrowed - the compiler could infer it. Hence, the restrictions have got to be significant enough that the compiler can figure out the rest.


>> 2. Introducing an annotation to distinguish a borrowed pointer from an
>> ARC pointer. If you don't use the annotation, you get pervasive ARC with
>> all the poor performance that entails.
>> ...
> No, this is completely inaccurate: Both choices are explicit,

Yes, one is & and the other is @. This does not change my point. Back in the bad old DOS days, there were near* and *. The choices were explicit, but the results were there anyway - people overwhelmingly chose the more general *, performance be damned.


> and reference
> counting is just one of the possible memory management schemes. (Hence borrowed
> pointers are used unless there is a reason not to.)

Yes, I can see it supports other schemes, but I predict that RC will dominate, and other schemes will be about as pervasive as using Boehm's GC with C++.


>> Implicit in borrowed pointers is Rust did not solve the problem of
>> having the compiler eliminate unnecessary inc/dec.
> Avoiding inc/dec is not what justifies borrowed pointers.

I find that very hard to believe. It implies that there is little cost to inc/dec, or the compiler is clever enough to eliminate the bulk of the inc/dec pairs.


>> My experience with pointer annotations to improve performance is pretty
>> compelling - almost nobody adds those annotations. They get their code
>> to work with the default, and never get around to annotating it.
> The 'default' way to pass by reference is by borrowed pointer.

Time will tell how well having the most restrictive pointer type be the default works out.


>> They do not mix. A function taking one type of
>> pointer cannot be called with the other type.
> A function taking a borrowed pointer can be called with a reference counted
> pointer. Abstracting over allocation strategy is the point of borrowing.

Right, and a function in DOS taking a * would accept a near*. But the other way did not work, and so people wrote their functions using *, and performance was thrown under the bus.


>> Are these valid concerns with Rust?
> I don't think they are.

Perhaps you're right. But my experience with DOS programming is programmers preferred convenience and reusable functions, and hence used plain * everywhere. And like I said, this provided a huge competitive advantage for me.

The great boon with 32 bit code was the elimination of special pointer types; now it was easy to write fast, reusable functions, and the effort involved in creating efficient data structures dropped by probably half.

It seems clear that the decisions of borrow/managed are going to pervade Rust code. How well is that going to work with complex programs and data structures? How well with average programmers? How well is it going to work when some leaf has to change from borrowed to managed, and what will the ripple effects of that change be?

These are all unknowns, not proven successful technology. My experience with a similar system (16 bit C) does not make it look promising, as far as performance goes.
May 11, 2014

On 11.05.2014 21:16, John Colvin wrote:
> On Sunday, 11 May 2014 at 18:18:41 UTC, Rainer Schuetze wrote:
>>
>> For a reasonable GC I currently see 2 possible directions:
>>
>> 1. Use a scheme that takes a snapshot of the heap, stack and registers
>> at the moment of collection and does the actual collection in another
>> thread/process while the application can continue to run. This is the
>> way Leandro Lucarella's concurrent GC works
>> (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork",
>> which doesn't exist on every OS/architecture. A manual copy of the
>> memory won't scale to very large heaps, though it might be compressed
>> down to just the possible pointers. Worst case it will need twice as
>> much memory as the current heap.
>
> a) Can't we do our own specialised alternative, instead of doing
> something as heavyweight as fork?

If we don't want to make a copy with memcpy, we have to rely on OS paging. I don't know Linux too well, but on Windows the idea would be to use VirtualProtect with PAGE_WRITECOPY. This could allow control at page-size granularity over what would be mapped to another process.
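Very roughly, the snapshot step might look like the sketch below. Pool and gcPools are invented, the sketch uses plain read-only protection plus a manual copy in the fault handler, and whether we can get true write-copy semantics on private pages this way is exactly what would need investigating.

// Rough sketch (Windows): write-protect the GC pools to snapshot them.
// A page-fault handler (not shown) would copy each page for the
// collector before restoring write access.
import core.sys.windows.windows;

struct Pool { void* base; size_t size; }

void snapshotPools(Pool[] gcPools)
{
    foreach (pool; gcPools)
    {
        DWORD oldProtect;
        VirtualProtect(pool.base, pool.size, PAGE_READONLY, &oldProtect);
    }
}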

>
> b) Surely that worst case is a very unlikely case, given a precise,
> concurrent garbage collector.
>

Precision has little impact because copy-on-write protection works at page level. You will also pay for the page fault even if you only write non-pointer data into the page.

It would help to sort memory chunks without pointers (they have attribute NO_SCAN) into separate pages that don't need page protection.

> c) Does this actually help in the 99% cpu, 99% memory, 16ms per frame
> setting that Manu talks about?

You'll need more memory, but given enough idle-time for a lower priority collector thread, a higher priority thread could run almost undisturbed. IIRC this meant about 40ms, as all threads need to be stopped for the time the fork is running.

>
>> It would be very interesting how far we can push this model on the
>> supported platforms.
>>
>> 2. Change the compiler to emit (library-defined) write barriers for
>> modifications of (possible) pointers. This will allow experimenting
>> with more sophisticated GC algorithms (if the write barrier is written
>> in D, we might also need pointers without barriers to implement it). I
>> know Walter is against this, and I am also not sure if this adds
>> acceptable overhead, but we don't have proof of the opposite either.
>
>
> Adding optional write-barriers would be great. It would allow us to do
> real experiments and stop the discussion being so hypothetical. It
> doesn't do any harm to add the capability, surely?

Sure, but it is also not easy ;-) I guess we would need some inspiration from existing implementations in other languages on how to do it efficiently.
May 11, 2014
On 5/11/2014 2:48 AM, Benjamin Thaut wrote:
> Mostly precise doesn't help. It's either fully precise or being stuck with an
> imprecise mark & sweep.

This is not correct. It helps because most of the false pointers will be in the heap, and the heap will be accurately scanned, nearly eliminating false references to garbage.


> Also D's GC doesn't use RTInfo currently.

Yes, but that is not an issue with the language design.


> Additionally Rainer Schuetze's implementation showed that using RTInfo with the current GC
> makes the collection actually slower at the price of being 'mostly' precise.

As I recall, his implementation showed dramatic improvements in memory consumption.

(All GC schemes are tradeoffs.)


>> The Boehm collector cannot move objects around, the D one can.
> Oh it can? Really?

Yes. D, for example, requires that objects not be self-referential for this reason.


> I would love to see how well interfacing with any C library
> works, if the D garbage collector actually does that.

The current D GC does not do that, but it can. It will work fine with C libraries as long as a pointer to the data is retained on the stack, as it naturally will be by passing it as a stack parameter to a C function. Roots found on the stack would automatically pin the object.

I know this works because I implemented it for a Java VM back in the 90's. Someone wrote a paper about a similar scheme calling it a "mostly copying collector".
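The core of the scheme is simple. A sketch with invented types: conservative stack roots pin their blocks, and everything else may be moved, because precise heap info lets the collector fix up references.

// Sketch of the pinning rule (types invented). Any stack word that
// might be a pointer pins its block; unpinned blocks can be moved.
struct Block { bool pinned; }

void markPins(void*[] stackWords, Block* delegate(void*) findBlock)
{
    foreach (word; stackWords)        // conservative: treat every word
        if (auto b = findBlock(word)) // as a potential heap pointer
            b.pinned = true;          // false pointers just over-pin
}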


> Also I'm talking about what the D garbage collector currently actually does, and
> not what it could do.

I think it would be more useful to talk about what it could do without language changes.


> If we actually implemented ARC, it would most likely
> take less time than a full-blown proper GC, and we would end up with better
> performance than we currently have. Beating the 300% slowdown which the current
> GC imposes is not that hard.
>
> If we however keep arguing about what a GC could do, we will be stuck with the
> imprecise mark & sweep forever (or at least for the next 5 years).

I believe the performance claims of pervasive ARC are unjustifiably optimistic and based on unproven technology that has never been implemented anywhere.

May 11, 2014
On 5/11/2014 11:18 AM, Rainer Schuetze wrote:
> 1. Use a scheme that takes a snapshot of the heap, stack and registers at the
> moment of collection and do the actual collection in another thread/process
> while the application can continue to run. This is the way Leandro Lucarellas
> concurrent GC works (http://dconf.org/2013/talks/lucarella.html), but it relies
> on "fork" that doesn't exist on every OS/architecture. A manual copy of the
> memory won't scale to very large memory, though it might be compressed to
> possible pointers. Worst case it will need twice as much memory as the current
> heap.
>
> It would be very interesting how far we can push this model on the supported
> platforms.

This is worth doing even if it is only possible on a subset of platforms. D should not be constrained to the worst of all the platforms.

My recollection is that Leandro ran into some serious reentrancy problems in the Linux API. Was this ever resolved?

May 11, 2014
On 2014-05-11 19:37:30 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 5/11/2014 5:52 AM, Michel Fortin wrote:
>> On 2014-05-11 08:29:13 +0000, Walter Bright <newshound2@digitalmars.com> said:
>> 
>>> Again, O-C and C++/CX ARC are not memory safe because in order to make it
>>> perform they provide unsafe escapes from it.
>> 
>> But D could provide memory-safe escapes. If we keep the current GC to collect
>> cycles, we could also allow raw pointers managed by the GC alongside ARC.
>> 
>> Let's say we have two kinds of pointers: rc+gc pointers (the default) and
>> gc_only pointers (on demand). When assigning from a rc+gc pointer to a gc_only
>> pointer, the compiler emits code that disables destruction via the reference
>> counting. This makes the GC solely responsible for destructing and deallocating
>> that memory block. You can still assign the pointer to a rc+gc pointer later on,
>> but the reference count is no longer reliable which is why RC-based destruction
>> has been disabled.
> 
> Yes, you can make it memory safe by introducing another pointer type, as Rust does. But see my comments about this scheme in the message you replied to.
> 
> (Yes, I understand your proposal is a variation on that scheme.)

It is a variation on that scheme, but with one significant difference: those two pointer types can mix and there's no restriction on assignments from one type to the other. There's therefore no transitive effect and no complicated ownership rule to understand.

This obviously does not address all your concerns with ARC (most of which I'll admit are valid), but this "ARC isn't memory-safe" argument has to stop. It does not make sense. One doesn't need to sacrifice memory safety to use ARC, nor is that sacrifice necessary for having islands of non-ARC code. That's what I was trying to point out in my previous post.
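To illustrate the mixing, here is a sketch with invented types and an invented runtime hook (not proposed syntax): assigning an rc+gc pointer into a gc_only one hands the block over to the GC for good.

// Sketch of the two-pointer-kind idea (invented names).
struct RcGc(T)    // default: counted, with the GC collecting cycles
{
    T* ptr;
    // inc/dec in copy constructor and destructor, elided here
}

struct GcOnly(T)  // on demand: never counted
{
    T* ptr;

    void opAssign(RcGc!T rhs)
    {
        // Invented runtime hook: marks the block so RC-based
        // destruction is disabled and the GC alone frees it.
        disableRcDestruction(rhs.ptr);
        ptr = rhs.ptr;
    }
}

void disableRcDestruction(void* block) { /* flip a bit in the header */ }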

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

May 11, 2014

On 11.05.2014 22:33, Walter Bright wrote:
>
>>> The Boehm collector cannot move objects around, the D one can.
>> Oh it can? Really?
>
> Yes. D, for example, requires that objects not be self-referential for
> this reason.

I don't think the GC would have problems fixing up internal pointers to the object itself. Self-references are prohibited to allow moving structures by memcpy, e.g. as a return value.
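A tiny example of what goes wrong without that rule:

// Why self-references are banned: after a bitwise move, the internal
// pointer still targets the old location.
struct Self
{
    int value;
    int* p;                  // meant to point at this instance's value
    void wire() { p = &value; }
}

void demo()
{
    Self a;
    a.wire();
    Self b = a;              // bitwise copy, like a move by memcpy
    assert(b.p == &a.value); // b.p still points into a, not into b
}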
May 11, 2014
On 05/11/2014 10:05 PM, Walter Bright wrote:
> On 5/11/2014 8:54 AM, Timon Gehr wrote:
>> It is not an escape from ARC per se. It is a way to write type safe
>> code which
>> is not dependent on the allocation strategy of the processed data.
>> (One can e.g.
>> safely borrow mutable data as immutable and the type system ensures
>> that during
>> the time of the borrow, the data does not mutate.)
>
> That's clearly an additional benefit of the borrowed pointer notion. But
> have you examined generated Rust code for the cost of inc/dec? I
> haven't, but I don't see any way they could avoid this (very expensive)
> cost without borrowed pointers.
> ...

Sure, but performance is the additional benefit.

>
>> The type system tracks lifetimes across function and data structure
>> boundaries.
>> The main restriction is that such a pointer cannot be escaped. (But
>> borrowed
>> pointers may still be stored in data structures.)
>
> The idea must be that the borrowed pointer lifetime cannot exceed the
> lifetime of the lender.
>
> The thing is, if the compiler is capable of figuring out these lifetimes
> by examining the code,

There are explicit lifetime annotations in function signatures.

> then there would be no need for the programmer to
> specify a pointer as borrowed - the compiler could infer it.

Not modularly. (Yes, global lifetime analysis exists, but this is not what they are doing.)

> Hence, the restrictions have got to be significant enough that the
> compiler can figure out the rest.
> ...

It is simply not true that type systems are inherently restricted to checking trivial properties. They can be made as strong as mathematical logic without much fuss.

>
>>> 2. Introducing an annotation to distinguish a borrowed pointer from an
>>> ARC pointer. If you don't use the annotation, you get pervasive ARC with
>>> all the poor performance that entails.
>>> ...
>> No, this is completely inaccurate: Both choices are explicit,
>
> Yes, one is & and the other is @.

No, actually currently one is & and the other is RC<T> AFAIK.

> This does not change my point.

In my opinion it actually does, to the point of rendering it invalid.

> Back in
> the bad old DOS days, there was near* and *. The choices were explicit,

Explicit among near* and *. I.e. between adding 'near' and leaving it out. Also, if you add 'near' it will not be compatible with pointers that do not have it.

> but the results were there anyway - people overwhelmingly chose the more
> general *, performance be damned.
> ...

RC<T> is not more general. It cannot refer to stack-allocated data, for instance.

>
>> and reference
>> counting is just one of the possible memory management schemes. (Hence
>> borrowed
>> pointers are used unless there is a reason not to.)
>
> Yes, I can see it supports other schemes, but I predict that RC will
> dominate, and other schemes will be about as pervasive as using Boehm's
> GC with C++.
> ...

The point is, borrowing does not depend on the allocation scheme, as long as it is safe.

>
>>> Implicit in borrowed pointers is Rust did not solve the problem of
>>> having the compiler eliminate unnecessary inc/dec.
>> Avoiding inc/dec is not what justifies borrowed pointers.
>
> I find that very hard to believe. It implies that there is little cost
> to inc/dec, or the compiler is clever enough to eliminate the bulk of
> the inc/dec pairs.
> ...

Sure, borrowing is very lightweight, but ultimately what is most important is that it solves the problem of multiple incompatible pointer types and makes the type system more expressive as well.

>
>>> My experience with pointer annotations to improve performance is pretty
>>> compelling - almost nobody adds those annotations. They get their code
>>> to work with the default, and never get around to annotating it.
>> The 'default' way to pass by reference is by borrowed pointer.
>
> Time will tell how well having the most restrictive pointer type be the
> default works out.
> ...

A function that uses none of the specific pointer capabilities is more general, so what other choice of 'default' makes sense?

>
>>> They do not mix. A function taking one type of
>>> pointer cannot be called with the other type.
>> A function taking a borrowed pointer can be called with a reference
>> counted
>> pointer. Abstracting over allocation strategy is the point of borrowing.
>
> Right, and a function in DOS taking a * would accept a near*. But the
> other way did not work, and so people wrote their functions using *, and
> performance was thrown under the bus.
>
>
>>> Are these valid concerns with Rust?
>> I don't think they are.
>
> Perhaps you're right. But my experience with DOS programming is
> programmers preferred convenience and reusable functions, and hence used
> plain * everywhere. And like I said, this provided a huge competitive
> advantage for me.
> ...

Convenience and reusable functions means using borrowed pointers whenever possible.

> The great boon with 32 bit code was the elimination of special pointer
> types; now it was easy to write fast, reusable functions, and the effort
> involved in creating efficient data structures dropped by probably half.
>
> It seems clear that the decisions of borrow/managed are going to pervade
> Rust code.

But they are often obvious.

> How well is that going to work with complex programs and data
> structures? How well with average programmers? How well is it going to
> work when some leaf has to change from borrowed to managed, and what will
> the ripple effects of that change be?
>
> These are all unknowns, not proven successful technology.

They are using Rust to write a safe and performant web browser while developing the language.

> My experience with a similar system (16 bit C) does not make it look
> promising, as far as performance goes.

Borrowed pointers are not even superficially similar to near*. They are compatible with everything else, because they can refer to data borrowed from anywhere else.