January 07, 2019
On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
> Given that sort of situation, I don't see how we can have the
> GC accurately track whether objects are thread-local or shared. Casting
> is just too blunt an instrument and allows too much.

This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post.

It requires a number of careful steps. It's not automatic. It's *mostly* automatic, and you can wrap the rest in a library. But it's expensive and it makes it easy to write incorrect code.

To review:

When you cast to shared, the thread-local GC pins the thing as shared. It also finds every bit of its memory that you can reach from that object, the same way as if it were doing a mark/sweep collection, and *that* is also pinned as shared.

When you are modifying an object that's had shared cast away, you need to tell the GC not to run, or you need to pin objects-that-will-become-shared temporarily (for instance, by keeping local references to them).

When you are done modifying an object that's had shared cast away, you need to cast it back to shared.
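
For illustration, the three steps above could be wrapped in a library helper roughly like the sketch below. This is not an existing druntime API: `withUnshared` and `Account` are made-up names, and since no thread-local GC exists, `GC.disable()`/`GC.enable()` from `core.memory` only stand in for "tell the GC not to run". The cast back to shared is a no-op today; under the scheme described it would be the point where the GC re-pins everything reachable.

```d
import core.memory : GC;
import core.sync.mutex : Mutex;

// Hypothetical example type.
class Account
{
    int balance;
}

/// Hypothetical helper: runs `modify` on `obj` with shared cast away,
/// following the steps described above.
void withUnshared(T)(shared(T) obj, shared(Mutex) mtx,
                     scope void delegate(T) modify)
    if (is(T == class))
{
    mtx.lock();
    scope (exit) mtx.unlock();

    // Stand-in for "tell the GC not to run" while shared is cast away.
    GC.disable();
    scope (exit) GC.enable();

    T local = cast(T) obj;               // cast shared away
    modify(local);                       // mutate as if thread-local

    // Cast back to shared. A no-op in current D; in the scheme above this
    // is where the thread-local GC would pin newly reachable memory.
    auto again = cast(shared(T)) local;
    assert(again is obj);
}

unittest
{
    auto acct = new shared Account;
    auto mtx = new shared Mutex;
    withUnshared!Account(acct, mtx, (Account a) { a.balance += 10; });
}
```

Even wrapped like this, nothing stops someone from casting by hand and skipping the helper, which is the weak point.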

This is not a good solution, so D isn't getting a thread-local GC that eliminates stop-the-world unless someone else comes up with a cleverer solution. On the other hand, it doesn't have a cost for casting away shared as such; casting shared(T) to const(T) is totally free.
January 07, 2019
On Mon, Jan 07, 2019 at 10:02:02PM +0000, Guillaume Piolat via Digitalmars-d wrote:
> On Monday, 7 January 2019 at 18:18:05 UTC, H. S. Teoh wrote:
> > Yep. GC phobia is a common malady among C/C++ folks (I used to be
> > one of them).
> 
> Guilty as charged here!

I wonder if some of us ex-GC-phobists(?) should throw together a D blog entry with brief summaries / excerpts of how we got over our GC phobia and came to embrace the GC.  Could be a useful way to collect some common GC myths / arguments for GC in a place where we can point people to.


> What I like most about GC is the efficiency! With scanning and a global owner, you don't need to keep ownership information anywhere.

That's true!

Having a dedicated collection thread also means you get better cache coherency and economy of scale, as opposed to individually freeing small objects here and there and incurring many RAM roundtrips.


> Which leads to less copying (std::string leads to a malloc for every string copy, GC avoids that), slices being two machine words instead of 3, etc.

Yeah, std::string copy-on-assign and copy-on-substring do add a lot of overhead that people are often unaware of.  Well, some people *are* aware of it, but the "solution" many adopt is to avoid std::* and go back to the bad ole world of char* and buffer overrun heaven.  Both are pessimal.  D's slices r0x0rs.
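
A small sketch of what that looks like in practice (`firstWord` is just a name made up for this example): taking a substring is a slice into the same memory, with no allocation and no copy, and the slice itself is two machine words.

```d
// A slice is a (length, pointer) pair: two machine words.
static assert(string.sizeof == 2 * size_t.sizeof);
static assert((char[]).sizeof == 2 * size_t.sizeof);

/// Returns the first space-delimited word of `s` as a slice of `s`.
/// @nogc documents that no allocation happens here.
string firstWord(string s) @safe pure nothrow @nogc
{
    foreach (i, c; s)
        if (c == ' ')
            return s[0 .. i];   // a view into the same memory
    return s;
}

unittest
{
    string line = "hello world";
    string word = firstWord(line);
    assert(word == "hello");
    assert(word.ptr is line.ptr);   // same memory, nothing was copied
}
```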


T

-- 
Too many people have open minds but closed eyes.
January 07, 2019
On Mon, Jan 07, 2019 at 10:27:15PM +0000, Neia Neutuladh via Digitalmars-d wrote: [...]
> To review:
> 
> When you cast to shared, the thread-local GC pins the thing as shared. It also finds every bit of its memory that you can reach from that object, the same way as if it were doing a mark/sweep collection, and *that* is also pinned as shared.
> 
> When you are modifying an object that's had shared cast away, you need to tell the GC not to run, or you need to pin objects-that-will-become-shared temporarily (for instance, by keeping local references to them).

This step is too easy to get wrong.  Once you cast shared away, the type system can no longer help you identify it as shared (or "once was shared").  As far as the compiler is concerned, assigning a pointer to a thread-local object to a shared pointer that has been cast into a thread-local pointer, is identical to assigning one thread-local pointer to another.  So you'll run into the problem Steven described, without any warning whatsoever.
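
A minimal sketch of the failure mode, with made-up names (`Node`, `head`, `appendLocal`). It compiles without any diagnostic today and is harmless with the current global GC; under a thread-local GC the `fresh` node would live on one thread's heap while being reachable from a shared object.

```d
// Hypothetical linked-list node.
class Node
{
    Node next;
    int value;
}

// Hypothetical list head, visible to all threads.
shared(Node) head;

/// Links a freshly allocated node behind `head` after casting shared away.
void appendLocal()
{
    Node unshared = cast(Node) head;   // the type system forgets this was shared
    Node fresh = new Node;             // would be a thread-local allocation
    fresh.value = 42;
    unshared.next = fresh;             // plain Node = Node assignment, no warning
}
```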


> When you are done modifying an object that's had shared cast away, you need to cast it back to shared.

This is also too easy to miss, because under current D, such a cast is redundant and not done in practice. It also looks weird:

	class C { ... }
	shared(C) myObj;

	mutex.acquire();
	C tmp = cast(C) myObj;
	tmp.doStuff();
	myObj = cast(shared(C)) tmp; // people will not think to do this
	mutex.release();


> This is not a good solution, so D isn't getting a thread-local GC that eliminates stop-the-world unless someone else comes up with a cleverer solution. On the other hand, it doesn't have a cost for casting away shared as such; casting shared(T) to const(T) is totally free.

Yeah, shared is the fly in the ointment that prevents us from having a thread-local GC.

Perhaps there can be some way of detecting whether the code casts away shared?  I mean if your code never actually casts away shared, e.g. by using messaging or whatever for thread communication, then you won't run into this problem, and you could safely use a thread-local GC without any ill-effects.  Though all it takes is for *one* cast to exist *somewhere* and your code becomes memory-unsafe.  So it cannot be enabled by default.


T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
January 08, 2019
On 06.01.19 04:02, Meta wrote:
> On Saturday, 5 January 2019 at 23:51:53 UTC, Nicholas Wilson wrote:
>> On Saturday, 5 January 2019 at 22:42:11 UTC, Meta wrote:
>>> On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
>>>> Is progress possible, or is the hard reality that the language is just designed such to be resistant to a quality GC, while the ecosystem sadly tends to rely on it?
>>>>
>>>> Where's the ARC stuff? What happened to opAddRef/opRelease?
>>>
>>> As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
>>
>> I'd rather have opHeadMutable than __mutable; it does the same thing but doesn't subvert the type system.
> 
> I'm fairly dubious of adding __mutable as well, but I'm assuming the previous solution of using an AffixAllocator didn't pan out.

That approach can't work without `__mutable` because it breaks transitivity of immutability. The entire point of `__mutable` is to add a transitivity escape hatch for @system-level data structure and runtime implementations, one that doesn't block high-level optimizations based on immutability and purity.


> I don't know enough about opHeadMutable to consider whether it would address the same problems.

It's orthogonal. If you have an immutable object, it can't have a reference to a reference count without `__mutable`. opHeadMutable can't change that.
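
To make the problem concrete, here is a small sketch in current D (no `__mutable`; `RefCounted` and `bump` are made-up names for this example). Because immutability is transitive, the count reached through an immutable object is itself immutable and cannot be incremented.

```d
// Hypothetical ref-counted handle with an out-of-line count.
struct RefCounted
{
    int* count;
    int payload;
}

/// Bumping the count works on a mutable handle.
void bump(ref RefCounted rc)
{
    ++*rc.count;
}

alias ImmutableRC = immutable RefCounted;

// Through an immutable handle the pointer is immutable(int*) ...
static assert(is(typeof(ImmutableRC.init.count) == immutable(int*)));

// ... so incrementing the count does not compile,
static assert(!__traits(compiles, (immutable RefCounted rc) { ++*rc.count; }));

// while the same code with a mutable handle does.
static assert(__traits(compiles, (RefCounted rc) { ++*rc.count; }));
```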
January 07, 2019
On Monday, January 7, 2019 3:27:15 PM MST Neia Neutuladh via Digitalmars-d wrote:
> On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
> > Given that sort of situation, I don't see how we can have the
> > GC accurately track whether objects are thread-local or shared. Casting
> > is just too blunt an instrument and allows too much.
>
> This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post.

Well, then I clearly read over it way too quickly.

> It requires a number of careful steps. It's not automatic. It's *mostly* automatic, and you can wrap the rest in a library. But it's expensive and it makes it easy to write incorrect code.
>
> To review:
>
> When you cast to shared, the thread-local GC pins the thing as shared. It also finds every bit of its memory that you can reach from that object, the same way as if it were doing a mark/sweep collection, and *that* is also pinned as shared.
>
> When you are modifying an object that's had shared cast away, you need to tell the GC not to run, or you need to pin objects-that-will-become-shared temporarily (for instance, by keeping local references to them).
>
> When you are done modifying an object that's had shared cast away, you need to cast it back to shared.
>
> This is not a good solution, so D isn't getting a thread-local GC that eliminates stop-the-world unless someone else comes up with a cleverer solution. On the other hand, it doesn't have a cost for casting away shared as such; casting shared(T) to const(T) is totally free.

Yeah, such a solution wouldn't fly. Basically, you're talking about having to have a way to tell the GC that you're moving stuff between threads, and that would be so error prone that it's not even funny. It's already problematic enough to get code that deals with sharing data across threads right as it is.

If we could solve the forking problem on Windows so that we could actually have a cross-platform concurrent GC like the Linux one that Sociomantic has used, then that would likely give similar benefits (if not better) without having to muck with the type system. I forget exactly what the stop-the-world pause times were, but they were pretty low. If a thread really couldn't afford to be stopped at all, it would still need to be separate from the GC, just like now, but that would be true of a solution that involved thread-local heaps as well.

In any case, it's issues like these which definitely make it much harder to drastically improve D's GC. We have much worse constraints to work under than languages like Java, and we don't have the same kind of manpower trying to improve the situation. But at least D is set up in a way that works quite well with minimizing heap allocations - especially with the idioms that are typical in idiomatic D.
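
As a small example of the kind of idiom meant here (the function name is made up for illustration): a lazy range pipeline does its work without touching the GC heap, and `@nogc` lets the compiler verify that.

```d
import std.algorithm.iteration : filter, map;
import std.range : iota;

/// Sums the squares of the even numbers in [0, n) using lazy ranges.
/// @nogc makes the compiler reject any hidden GC allocation.
int sumOfEvenSquares(int n) @safe pure nothrow @nogc
{
    int total = 0;
    foreach (sq; iota(n).filter!(x => x % 2 == 0).map!(x => x * x))
        total += sq;
    return total;
}
```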

- Jonathan M Davis



January 08, 2019
On Mon, 2019-01-07 at 14:29 -0800, H. S. Teoh via Digitalmars-d wrote:
> […]
> 
> I wonder if some of us ex-GC-phobists(?) should throw together a D blog entry with brief summaries / excerpts of how we got over our GC phobia and came to embrace the GC.  Could be a useful way to collect some common GC myths / arguments for GC in a place where we can point people to.

Or create an article for Overload or CVu which can then be a blog post on the D website somewhere.



-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



January 08, 2019
On Tuesday, 8 January 2019 at 04:34:11 UTC, Jonathan M Davis wrote:
> On Monday, January 7, 2019 3:27:15 PM MST Neia Neutuladh via Digitalmars-d wrote:
>> On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
>> > Given that sort of situation, I don't see how we can have the
>> > GC accurately track whether objects are thread-local or shared. Casting
>> > is just too blunt an instrument and allows too much.
>>
>> This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post.
>
> Well, then I clearly read over it way too quickly.
>
>> It requires a number of careful steps. It's not automatic. It's *mostly* automatic, and you can wrap the rest in a library. But it's expensive and it makes it easy to write incorrect code.
>>
>> To review:
>>
>> When you cast to shared, the thread-local GC pins the thing as shared. It also finds every bit of its memory that you can reach from that object, the same way as if it were doing a mark/sweep collection, and *that* is also pinned as shared.
>>
>> When you are modifying an object that's had shared cast away, you need to tell the GC not to run, or you need to pin objects-that-will-become-shared temporarily (for instance, by keeping local references to them).
>>
>> When you are done modifying an object that's had shared cast away, you need to cast it back to shared.
>>
>> This is not a good solution, so D isn't getting a thread-local GC that eliminates stop-the-world unless someone else comes up with a cleverer solution. On the other hand, it doesn't have a cost for casting away shared as such; casting shared(T) to const(T) is totally free.
>
> Yeah, such a solution wouldn't fly. Basically, you're talking about having to have a way to tell the GC that you're moving stuff between threads, and that would be so error prone that it's not even funny. It's already problematic enough to get code that deals with sharing data across threads right as it is.
>
> If we could solve the forking problem on Windows so that we could actually have a cross-platform concurrent GC like the Linux one that Sociomantic has used, then that would likely give similar benefits (if not better) without having to muck with the type system. I forget exactly what the stop-the-world pause times were, but they were pretty low. If a thread really couldn't afford to be stopped at all, it would still need to be separate from the GC, just like now, but that would be true of a solution that involved thread-local heaps as well.
>
> In any case, it's issues like these which definitely make it much harder to drastically improve D's GC. We have much worse constraints to work under than languages like Java, and we don't have the same kind of manpower trying to improve the situation. But at least D is set up in a way that works quite well with minimizing heap allocations - especially with the idioms that are typical in idiomatic D.
>
> - Jonathan M Davis

Yet all the GC-enabled languages happened to beat Swift, with its reference counting, in the implementation of a high-performance userspace network driver.

"Safe and Secure Drivers in High-Level Languages
How to write PCIe drivers in Rust, go, C#, Swift, Haskell, and OCaml"

https://media.ccc.de/v/35c3-9670-safe_and_secure_drivers_in_high-level_languages

The professor who organized the research thesis didn't even consider D for the project, and he had plenty to choose from.

And MIT is doing research with writing POSIX kernels in Go, https://github.com/mit-pdos/biscuit

Meanwhile C# now has all the nice features from project Midori for low level programming, and is getting all the love from game developers fed up with C++, including some very well known ones.

The theme that D's GC cannot be improved won't take the language very far.
January 08, 2019
On Tuesday, 8 January 2019 at 08:44:35 UTC, Paulo Pinto wrote:
> The theme that D's GC cannot be improved won't take the language very far.

I share the same sentiment.
Every GC conversation ends up in the performance of write barriers, shared, comparisons with other languages, or some other argument that doesn't move the bar at all.

Is the current simple mark-and-sweep implementation really not improvable in any way?
January 08, 2019
On Tuesday, 8 January 2019 at 04:34:11 UTC, Jonathan M Davis wrote:
> If we could solve the forking problem on Windows so that we could actually have a cross-platform concurrent GC like the Linux one that Sociomantic has used, then that would likely give similar benefits (if not better) without having to muck with the type system. I forget exactly what the stop-the-world pause times were, but they were pretty low. If a thread really couldn't afford to be stopped at all, it would still need to be separate from the GC, just like now, but that would be true of a solution that involved thread-local heaps as well.
>

I don't want to talk about my work (mentored by Leandro of Sociomantic) until it is done, but the concurrent GC has been mentioned so many times in this thread that I can't refrain from showing the current results.

These are the pause times for Dustmite:
Not forking GC (Linux):
   Collection time  Stop-the-world time
0         0.000139             0.000137
1         0.000172             0.000155
2         0.001917             0.001033
3         0.013672             0.007031
4         0.028831             0.017149
5         0.073201             0.042553
6         0.175916             0.103788
7         0.271328             0.179002
8         0.607061             0.375101

Forking GC (Linux):
    Collection time  Stop-the-world time
0          0.000128             0.000125
1          0.000025             0.000125
2          0.000163             0.000155
3          0.003665             0.000155
4          0.000569             0.000564
5          0.023067             0.000564
6          0.001888             0.001882
7          0.062640             0.001882
8          0.004437             0.004429
9          0.118210             0.004429
10         0.009619             0.009613
11         0.251923             0.009613


As you can see the pause times are way lower.
This is true for many other small programs used for benchmarking.
Back in the old TangoRT times the concurrent GC even improved the program execution time but this is no longer true mainly because Martin Nowak implemented a similar strategy for the eager allocation of pools.

I was already contacted by Mike in order to report the status of the project for the SAOC so I hope we can disclose more in the upcoming weeks.

This post highlights a possible way to implement the forking behaviour on Windows as well, but I haven't touched a Windows box in years.
https://rainers.github.io/visuald/druntime/concurrentgc.html

I would like to see a thread local GC in the future as well but I still have to understand the weak points of `shared` and how it can be reshaped.

January 08, 2019
On Monday, 7 January 2019 at 02:00:05 UTC, H. S. Teoh wrote:
> Just out of curiosity, any concrete examples of difficulties that prevent easy elision of counter bumps?

Let's say you obtain ownership of every other object in an array; then you have to prove that you also release every other object in that array before you return from the function.

If you always hold onto the ref-counted object in a named reference that is never changed, then it is reasonably easy to do. Then you only have to prove that the acquired object is released once while the reference is live.

So basically, whenever the ref-counted object is accessed through a graph-like structure, you need an advanced prover.
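
A rough sketch of the two cases, using manual retain/release on a made-up `Counted` type (not any real ARC implementation). In `peek`, the retain and release sit on one named reference that is never reassigned, so the pair is easy to match up and elide. In the every-other case, the optimizer would have to prove that two separate loops visit exactly the same elements.

```d
// Hypothetical manually ref-counted object.
struct Counted
{
    int refs = 1;
    void retain()  { ++refs; }
    void release() { --refs; }
}

/// Easy case: one named reference, one retain, one release while it is live.
int peek(Counted* obj)
{
    obj.retain();
    scope (exit) obj.release();
    return obj.refs;
}

/// Hard case: ownership is taken of every other element ...
void retainEveryOther(Counted*[] objs)
{
    for (size_t i = 0; i < objs.length; i += 2)
        objs[i].retain();
}

/// ... and the release must mirror that pattern exactly.
void releaseEveryOther(Counted*[] objs)
{
    for (size_t i = 0; i < objs.length; i += 2)
        objs[i].release();
}
```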