January 07, 2019
On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
> On 1/5/19 6:12 PM, Neia Neutuladh wrote:
>> On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
>>> Is progress possible, or is the hard reality that the language is just designed in such a way as to be resistant to a quality GC, while the ecosystem sadly tends to rely on it?
>> 
>> Three things about D make it harder to make a good GC for it:
>> * unaligned pointers
> 
> Unaligned pointers are not scanned, and so this is not a problem. As far as the GC is concerned, they aren't pointers.

In that case, it's already solved! And in pretty much the same way as malloc'd memory.

(I was going to check but was overcome with a bout of laziness.)

>> * unions
> 
> Unions and void * (or essentially untyped blocks) are the biggest problems. But depending on what we want to implement, we may have solutions for that.
> 
> It's difficult to paint with broad brushes here. There are certain problems that prevent certain solutions. But, for instance, a precise GC is getting traction as we speak: https://github.com/dlang/druntime/pull/2418

That's a more precise GC instead of a fully precise GC. Unions and void[] mean you can't have a fully precise GC.
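
A minimal illustration (nothing exotic, just the two features side by side):

    union Handle
    {
        void*  ptr;   // might be a GC pointer...
        size_t id;    // ...or just an integer that happens to look like one
    }

    void example()
    {
        Handle h;
        h.id = 0xDEADBEEF;              // is h.ptr live? the collector can't know
        void[] blob = new void[](64);   // untyped block: every word has to be
                                        // scanned conservatively
    }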

> But precise GC is not necessarily better performing. And it's still "stop-the-world".

Right. Precise means you collect more data more often, and it means you use up cache lines for pointer maps. It might also fragment your heap more (each allocation needs a pointer map, or maybe you have different pools for sufficiently common things like all pointers or no pointers).

> I think the biggest improvement for D's GC could be the thread-local pools (and avoid stop-the-world for those), but since casting shared and local data around is valid, I don't know how it can be implemented without language changes.

Like I said before, casting to shared would have to invoke a runtime function. That's a runtime change and compiler change, but I don't think the spec says anything about whether any cast might invoke a runtime function.
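
Something along these lines, say (the hook name and lowering are entirely made up, just to show the shape of it):

    // Hypothetical druntime hook the compiler would insert for cast(shared):
    extern (C) void* _d_castToShared(void* p);

    // so that
    //     shared(T)* s = cast(shared) p;
    // would lower to roughly
    //     shared(T)* s = cast(shared(T)*) _d_castToShared(cast(void*) p);
    // giving the runtime a chance to move the allocation out of a
    // thread-local pool.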
January 07, 2019
On Monday, 7 January 2019 at 18:18:05 UTC, H. S. Teoh wrote:
> but there's the tradeoff of spending easily 3x to 4x more effort in writing correct pointer code under manual MM

Most code is slow because the programmer didn't have time to optimize it. For normal code the GC won't add more than 15% overhead, but with some optimization you can easily get more than a 2x improvement in your program.
This could be an excellent topic for the D blog.
January 07, 2019
On Mon, Jan 07, 2019 at 06:59:37PM +0000, Neia Neutuladh via Digitalmars-d wrote:
> On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
[...]
> > I think the biggest improvement for D's GC could be the thread-local pools (and avoid stop-the-world for those), but since casting shared and local data around is valid, I don't know how it can be implemented without language changes.
> 
> Like I said before, casting to shared would have to invoke a runtime function. That's a runtime change and compiler change, but I don't think the spec says anything about whether any cast might invoke a runtime function.

Casting between arrays of different types does already call a druntime function in some cases (e.g., casting int[] to S[] where S is some struct).  So I'd say it's fair game.
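
For instance (the exact symbol it lowers to is a druntime implementation detail, so don't quote me on the name in the comment):

    struct S { int a; int b; }

    void demo()
    {
        int[] ints = [1, 2, 3, 4];
        // Re-slices the memory: druntime verifies that the byte length divides
        // evenly by S.sizeof (and throws if it doesn't) before fixing up .length.
        S[] structs = cast(S[]) ints;   // becomes a druntime call under the hood
        assert(structs.length == 2);
    }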


T

-- 
Only boring people get bored. -- JM
January 07, 2019
On Sun, 2019-01-06 at 22:52 -0800, H. S. Teoh via Digitalmars-d wrote:
> On Sun, Jan 06, 2019 at 07:34:41PM -0800, Walter Bright via Digitalmars-d wrote:
> > On 1/6/2019 12:28 PM, Reimer Behrends wrote:
> > > In general, a modern precise stop-the-world collector should be able to handle 1GB worth of pointers in something like .2 seconds on modern hardware,
> > 
> > I use Thunderbird mail, latest version, and it has random pauses that are up to 10 seconds. It'll happen when I'm in the middle of typing, which is frustrating.
> > 
> > I'm not positive it is a GC pause, maybe it's something else, but the randomness of it suggests it is GC.
> 
> I've experienced a similar effect in Firefox, and though I cannot say for sure it isn't a GC problem, I notice that it causes a long spike of intensive I/O and appears to be correlated with occasional segfaults and signs of memory corruption / memory leaks. It generally happens only after significant usage over a prolonged timeframe, ending up in a state of extreme memory usage (several GBs in resident set size for just a small number of persistent tabs) that resets to more reasonable levels upon restarting and restoring exactly the same tabs.
> 

Some plugins to JetBrains CLion, IntelliJ IDEA, GoLand, and PyCharm (the ones I use) can be prone to quite lengthy stoppages; sadly, the D plugin 1.18 suffers from this. Many people blame the Java GC, usually from prejudice rather than actual data. The Rust plugin had such a problem some weeks back, but it turned out to be an exception and logging problem rather than a GC one.

Whilst GCs can, and indeed do, have problems, some people are too quick to blame the GC when, in most situations, the problems actually lie elsewhere.


-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



January 07, 2019
On Sun, 2019-01-06 at 01:22 -0800, Walter Bright via Digitalmars-d wrote:
> On 1/5/2019 11:48 PM, Manu wrote:
> > Hmnm, that's pretty disappointing... as a C++ user who has been doing
> > C++ ref counting for years, I would say that, while maybe it is
> > *kinda* successful, it's not at all satisfying, and it's one major
> > reason I've been pushing it in D for the last 8-9 years.
> > Ref counting in C++ is a sloppy hack. I really just want the compiler
> > to emit opInc and opDec appropriately on either side of a reference
> > assignment, and let me take it from there.
> > Also, with escape analysis, we can elide such calls very effectively,
> > and we can't do that using the C++ strategy.
> > 
> > I think it's disappointing to embrace C++'s hacky 'solution', we could do better.
> 
> So far, we've failed at every attempt at it.

This is not an excuse to stop trying!

In this, Java and Go are excellent examples: someone comes up with the best that can be done, and then someone finds a new and usually better way of doing things.

I don't have any specific ideas, so I can't put my activity where my mouth is. Sorry.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk



January 07, 2019
On Mon, Jan 07, 2019 at 06:59:10PM +0000, Russel Winder via Digitalmars-d wrote:
> On Sun, 2019-01-06 at 13:28 -0800, Walter Bright via Digitalmars-d wrote:
> > 
> […]
> > 1. Java has a very constrained interface to C, with a lot of rules to follow, mainly so the GC is not corrupted. D, being a systems programming language, simply does not have that luxury.
> 
> I understand that the Java language and its heap are very different from D, so the context and constraints are different. I am assuming Go is the same, though a priori I would think it less different from D than Java is. My concern (if it can be called that) is that the JVM folk and the Go folk spend a lot of time finding new and/or improved GC algorithms, whereas in D there appears to be no movement away from an old and simple GC.

IOW it's more of a matter of perception than an actual technical
issue then?  And just FYI, there *have* been gradual improvements to D's
GC over the years, even if they are not ground-breaking ones.

In any case, as Walter said, Java / Go / similar languages impose quite a lot of restrictions on what the programmer can do, which gives the language implementors many more assumptions they can work with.

D pretty much gives the programmer free rein, except perhaps under @safe, but even then, the restrictions are pretty minimal and nowhere near those of the managed languages.  This leaves the language implementors very little room to maneuver, meaning that any prospective GC algorithm for D is constrained by being unable to make the same assumptions that a Java or Go GC can take for granted -- in fact, unable to make many assumptions at all.  Naturally, this also rules out a lot of algorithms that could otherwise be applicable, and makes finding a significantly improved algorithm a very challenging task.  (And that's assuming said significantly improved algorithm exists at all, under D's operating conditions.)
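
To make that concrete, here's the sort of perfectly legal (if ugly) D code that any prospective GC has to survive, and that a Java or Go collector never has to even consider:

    import core.memory : GC;

    struct Sneaky
    {
        size_t disguised;   // actually holds a pointer, typed as an integer
    }

    void hide()
    {
        int* p = new int(42);            // GC-allocated
        auto s = new Sneaky;
        s.disguised = cast(size_t) p;    // legal; a precise scan of Sneaky sees
                                         // no pointer here at all
        GC.addRoot(p);                   // so the program has to pin it manually,
                                         // or accept that the int may vanish
    }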


> > 2. Let me know when Java lets go of write barriers!
> 
> G1 GC certainly avoids write barriers whenever it can.

The sticking point is "whenever it can".  Currently, Walter is operating under the premise of "no write barriers *at all*".  Whether or not this is a tenable stance as far as GC improvement is concerned remains to be seen, but being able to fall back to write barriers occasionally is a completely different ballpark from having *zero* write barriers, *anywhere*.  It may sound like a subtle difference, but it makes a world of difference as far as GC algorithms are concerned.
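
For anyone unfamiliar with the jargon: a write barrier is a small stub the compiler inserts around pointer stores so the collector can observe mutations. Very roughly (purely illustrative, not anything dmd actually emits):

    // Hypothetical runtime hook: note that a pointer slot was mutated,
    // e.g. by marking the card/page the slot lives on.
    void gcNotePointerStore(void* slot) { /* mark the card containing slot */ }

    // What the programmer writes as `slot = value;` would become:
    void storeWithBarrier(T)(ref T* slot, T* value)
    {
        gcNotePointerStore(&slot);   // cheap, but paid on *every* pointer store
        slot = value;
    }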


> I do not know the details, but there are conference papers out there looking at avoiding write barriers at all costs in the newer GC algorithms.
[...]

If there are any GC algorithms that require no write barriers at all and still offer the same benefits as a modern generational GC, I'm sure we'll be jumping all over them in no time at all!  Assuming they don't make other assumptions incompatible with D, that is.


T

-- 
The best compiler is between your ears. -- Michael Abrash
January 07, 2019
On 2019-01-06 04:42, Walter Bright wrote:

>  > Where's the ARC stuff? What happened to opAddRef/opRelease?
> 
> Andrei, Razvan, and I have decided that the most pragmatic way forward to support reference counting is to support copy constructors ala C++. C++ is clearly successful with this approach, and Andrei/Razvan concluded that D's postblit just doesn't cut it. Razvan has a proposal for copy constructors, and an implementation PR that has fallen a bit behind his latest proposal.

Razvan has confirmed that copy constructors are only for structs [1]. The ARC (opAddRef/opRelease) proposal was for classes as well. So copy constructors are not a substitute as defined now.

[1] https://forum.dlang.org/post/hqqszotxfwauczupjmez@forum.dlang.org
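
For structs, at least, here is a rough sketch of how reference counting could sit on top of copy constructors (the exact signature is whatever the proposal ends up with; this uses the `this(ref ...)` form). Classes would still need something like opAddRef/opRelease:

    struct RCSlice
    {
        private int[] payload;
        private size_t* count;

        this(size_t n)
        {
            payload = new int[](n);
            count = new size_t(1);
        }

        // Copy constructor: runs on every copy, so the count can be bumped.
        this(ref RCSlice rhs)
        {
            payload = rhs.payload;
            count = rhs.count;
            if (count !is null)
                ++*count;
        }

        ~this()
        {
            if (count !is null && --*count == 0)
            {
                // Last owner: release the payload (handed back to an allocator
                // in a real implementation; just dropped here).
                payload = null;
                count = null;
            }
        }
    }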

-- 
/Jacob Carlborg
January 07, 2019
On 2019-01-07 04:34, Walter Bright wrote:

> I use Thunderbird mail, latest version, and it has random pauses that are up to 10 seconds. It'll happen when I'm in the middle of typing, which is frustrating.
> 
> I'm not positive it is a GC pause, maybe it's something else, but the randomness of it suggests it is GC.

For me it only happens when downloading new emails, which suggests they're not using a separate thread for that. If so, that would be a very poor implementation.

-- 
/Jacob Carlborg
January 07, 2019
On 1/7/19 1:59 PM, Neia Neutuladh wrote:
> On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
>> On 1/5/19 6:12 PM, Neia Neutuladh wrote:
>>> On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
>>>> Is progress possible, or is the hard reality that the language is just
>>>> designed in such a way as to be resistant to a quality GC, while the
>>>> ecosystem sadly tends to rely on it?
>>>
>>> Three things about D make it harder to make a good GC for it:
>>> * unaligned pointers
>>
>> Unaligned pointers are not scanned, and so this is not a problem. As far
>> as the GC is concerned, they aren't pointers.
> 
> In that case, it's already solved! And in pretty much the same way as
> malloc'd memory.
> 
> (I was going to check but was overcome with a bout of laziness.)
> 
>>> * unions
>>
>> Unions and void * (or essentially untyped blocks) are the biggest
>> problems. But depending on what we want to implement, we may have
>> solutions for that.
>>
>> It's difficult to paint with broad brushes here. There are certain
>> problems that prevent certain solutions. But, for instance, a precise GC
>> is getting traction as we speak:
>> https://github.com/dlang/druntime/pull/2418
> 
> That's a more precise GC instead of a fully precise GC. Unions and void[]
> mean you can't have a fully precise GC.

Yes, well said. It's more precise, and a fully precise GC is not possible due to how D works.

> 
>> But precise GC is not necessarily better performing. And it's still
>> "stop-the-world".
> 
> Right. Precise means you collect more data more often, and it means you
> use up cache lines for pointer maps. It might also fragment your heap more
> (each allocation needs a pointer map, or maybe you have different pools
> for sufficiently common things like all pointers or no pointers).

The pointer maps are static, and so they are stored with the typeinfo, I believe. We already store a typeinfo pointer for everything but low-level allocations and basic types (for destruction).
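
As a simplified illustration of how small and how static those maps are:

    struct Node
    {
        int    value;   // first word: no pointer (int plus padding on 64-bit)
        Node*  next;    // second word: pointer
        size_t flags;   // third word: no pointer
    }
    // A precise scanner needs roughly one bit per word for this type
    // ("no, yes, no"), and the map is per-type rather than per-allocation,
    // so it can sit next to the TypeInfo.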

There may be tweakable cases where it's OK to be imprecise to achieve better speed, or have a "small pointer map optimization".

> 
>> I think the biggest improvement for D's GC could be the thread-local
>> pools (and avoid stop-the-world for those), but since casting shared and
>> local data around is valid, I don't know how it can be implemented
>> without language changes.
> 
> Like I said before, casting to shared would have to invoke a runtime
> function. That's a runtime change and compiler change, but I don't think
> the spec says anything about whether any cast might invoke a runtime
> function.
> 

This is doable. Casting to/from shared should be a rare thing, and probably fine to hook.

However, the bigger problem (for me) is that the memory would have to be copied, since it currently lives in one pool and would have to be moved to another.

It's probably better to just allocate it as shared to begin with, or copy it if you need to "cast" to/from shared.
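
Something like this, just to illustrate the copying approach (no new runtime support needed):

    void publish(int[] local)
    {
        // Instead of cast(shared) on a thread-local allocation (which would
        // leave the memory in the wrong pool under a thread-local-pool GC)...
        shared(int)[] published = new shared(int)[](local.length);

        // ...copy into an allocation that was shared from birth. The cast is
        // only there to do a plain array copy before anyone else can see it.
        (cast(int[]) published)[] = local[];
    }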

Some notion of tail-modified type constructors would be really useful here.
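
For arrays, the head/tail distinction is already expressible (which is what makes the copy above work); for classes it isn't:

    shared(int)[] a;   // thread-local slice of shared ints: expressible today
    shared(int[]) b;   // fully shared, head and tail: also expressible

    class C { int x; }
    shared C c;        // the reference itself is shared too; there is no way
                       // to say "local reference to shared C data" in current D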

-Steve
January 07, 2019
On Mon, Jan 07, 2019 at 07:18:02PM +0000, Russel Winder via Digitalmars-d wrote:
> On Sun, 2019-01-06 at 01:22 -0800, Walter Bright via Digitalmars-d wrote:
> > On 1/5/2019 11:48 PM, Manu wrote:
> > > Hmnm, that's pretty disappointing... as a C++ user who has been doing C++ ref counting for years, I would say that, while maybe it is *kinda* successful, it's not at all satisfying, and it's one major reason I've been pushing it in D for the last 8-9 years. Ref counting in C++ is a sloppy hack. I really just want the compiler to emit opInc and opDec appropriately on either side of a reference assignment, and let me take it from there.  Also, with escape analysis, we can elide such calls very effectively, and we can't do that using the C++ strategy.
> > > 
> > > I think it's disappointing to embrace C++'s hacky 'solution', we could do better.
> > 
> > So far, we've failed at every attempt at it.
> 
> This is not an excuse to stop trying!
[...]

I think you may have misunderstood.  Read the "study" mailing lists that Walter referred to.  You'll see some concrete examples of tricky situations that may not be immediately obvious from a naive conception of ARC.  It's not that we've stopped trying, but that we've run out of good ideas and nobody has stepped up offering better ones.

After reading through said mailing lists, I think I better understand now why Walter is pushing for dip25 and dip1000.  They tackle essential issues related to proper escape analysis, which is a requirement for RC counter bump elisions that won't cause UB or other nasty problems.  In themselves they won't give you RC just yet, but they pave the way and lay some of the essential foundations that an RC implementation will require.
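
A small sketch of the kind of guarantee dip1000 gives you, which is roughly the information a counter-elision pass would need (compile with -dip1000 for the checks to kick in):

    @safe:

    // `scope`: the callee promises not to let p escape at all, so the caller
    // knows no new long-lived reference was created behind its back.
    void inspect(scope int* p)
    {
        // *p can be read here, but p may not be stored anywhere that
        // outlives this function.
    }

    // `return scope`: the result may alias the parameter, so any lifetime
    // extension is visible at the call site rather than hidden in the callee.
    int* identity(return scope int* p)
    {
        return p;
    }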


T

-- 
Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.