Thread overview
Re: What's the go with the GC these days?
Jan 06, 2019
Neia Neutuladh
Jan 06, 2019
H. S. Teoh
Jan 06, 2019
Neia Neutuladh
Jan 07, 2019
Walter Bright
Jan 06, 2019
Walter Bright
January 06, 2019
On Sat, 05 Jan 2019 20:34:30 -0800, H. S. Teoh wrote:
> On Sat, Jan 05, 2019 at 11:12:52PM +0000, Neia Neutuladh via Digitalmars-d wrote:
>> On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
>> > I'm somewhere between a light GC user and a @nogc user, and I don't really know much about where we're at, or much about state-of-the-art GC in general.
>> 
>> I use the GC unabashedly and only try to make sure I reuse memory when it's reasonably convenient. I've also looked into GC a bit.
> 
> I also use the GC freely, and only bother with GC optimization when my profiler shows that there's an actual problem.
> 
> As I've said numerous times before, unless you're working on extremely time-sensitive code like real-time applications or 3D game engines, D's GC usually does not cause much noticeable difference. Unless you do something extreme like allocate tens of millions of small strings (or other small objects) per second, or allocate huge objects rapidly and expect unreferenced memory to be quickly reused.

String allocation is one area where D makes it absurdly easy to reuse existing memory while other languages force you to jump through hoops.

One of the reasons I know as much as I do about Unicode is that I ported some code from D to C# and was annoyed by the C# code taking 50 times as long and allocating 17000 times as much memory. The main culprits were UTF-16 and copy-on-substring. To resolve that, I made my own string type. (And used Mono's ahead-of-time compilation and did a couple other things.)
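
To put rough numbers on those two culprits, here is a small model in Python (not the original D or C# code; the sample text and the `split_offsets` helper are made up for illustration). A slice is modeled as a pair of offsets into the existing text, while a copy-on-substring UTF-16 string type has to allocate fresh character data for every field:

```python
def split_offsets(text, sep):
    """Return (start, end) offset pairs for each field; copies no characters."""
    spans, start = [], 0
    for i, ch in enumerate(text):
        if ch == sep:
            spans.append((start, i))
            start = i + 1
    spans.append((start, len(text)))
    return spans

text = "alpha,beta,gamma"              # ASCII-only sample input (assumption)
spans = split_offsets(text, ",")       # slice-style: offsets into `text`

# Character bytes a copy-on-substring UTF-16 string type would allocate:
# every field is copied, and every ASCII character takes 2 bytes.
copied_utf16_bytes = sum(len(text[s:e].encode("utf-16-le")) for s, e in spans)

# Character bytes allocated by slicing: none beyond the original text.
original_utf8_bytes = len(text.encode("utf-8"))

print(spans, copied_utf16_bytes, original_utf8_bytes)
```

The ratio in a real program depends on how often substrings are taken and how long they live, so this only shows the mechanism, not the 50x or 17000x figures.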

>> (You'd have a stop-the-world phase that happened infrequently. Casting something to shared would pin it in the thread it was allocated from.)
> 
> How does this solve the problem of shared, though?  The last time I checked, casting to/from shared is the main showstopper for a thread-local GC.

The issue with a thread-local GC and casting to shared is that the GC for thread A doesn't know about references held by thread B. That means thread A might collect and reuse that shared object.

If we had a runtime call in the cast to shared, thread A's GC could record that the object is possibly referenced from another thread. It wouldn't collect it until a stop-the-world phase says it's not referenced.

This would give teeth to some of the undefined behavior we currently have.
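
A toy model of that scheme, in Python rather than D. This is not druntime code: `ThreadLocalHeap`, `on_cast_to_shared`, and `stop_the_world` are hypothetical names, and roots are treated as a flat set of object ids instead of tracing an object graph:

```python
class ThreadLocalHeap:
    def __init__(self, name):
        self.name = name
        self.objects = set()   # everything allocated by this thread
        self.pinned = set()    # objects that were cast to shared

    def allocate(self, obj_id):
        self.objects.add(obj_id)
        return obj_id

    def on_cast_to_shared(self, obj_id):
        # Hypothetical runtime call inserted at the cast: other threads may
        # now hold references this heap cannot see, so keep the object.
        self.pinned.add(obj_id)

    def collect_local(self, local_roots):
        # Local collection: only this thread's roots are scanned.
        live = set(local_roots) | self.pinned
        freed = self.objects - live
        self.objects -= freed
        return freed


def stop_the_world(heaps, roots_by_thread):
    # Global phase: all roots are known, so pins can be dropped for objects
    # that no thread references any more.
    all_roots = set().union(*roots_by_thread.values())
    for heap in heaps:
        heap.pinned &= all_roots


a = ThreadLocalHeap("A")
x = a.allocate("x")
y = a.allocate("y")
a.on_cast_to_shared(x)                     # x is handed to thread B
freed_first = a.collect_local([])          # A itself holds no references

stop_the_world([a], {"A": set(), "B": set()})  # B has dropped x by now
freed_second = a.collect_local([])
```

In this model the first local collection frees only `y`, because `x` is pinned; `x` becomes collectable only after the global phase confirms that no thread still references it.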

> Now that I think of it, we could deal with pointers in unions the same way -- if the compiler detects it, then trigger conservative mode in the GC.

Right. In the past when people have looked at this problem, the idea was to treat anything that could possibly be a pointer as a pointer.

> With these two out of the way, a generational GC for D seems closer to the realm of possibility.

I'm not sure how much sense it makes to have a generational GC without write barriers.
January 05, 2019
On Sun, Jan 06, 2019 at 05:11:19AM +0000, Neia Neutuladh via Digitalmars-d wrote:
> On Sat, 05 Jan 2019 20:34:30 -0800, H. S. Teoh wrote:
[...]
> One of the reasons I know as much as I do about Unicode is that I ported some code from D to C# and was annoyed by the C# code taking 50 times as long and allocating 17000 times as much memory. The main culprits were UTF-16 and copy-on-substring. To resolve that, I made my own string type.  (And used Mono's ahead-of-time compilation and did a couple other things.)

Yeah, Walter's right on the money about copy-on-substring being a big performance hit in C/C++, and apparently also C# (caveat: I don't know anything about C#).  D's slices really do eliminate a lot of the background cost that most people don't normally think about.


> >> (You'd have a stop-the-world phase that happened infrequently. Casting something to shared would pin it in the thread it was allocated from.)
> > 
> > How does this solve the problem of shared, though?  The last time I checked, casting to/from shared is the main showstopper for a thread-local GC.
> 
> The issue with a thread-local GC and casting to shared is that the GC for thread A doesn't know about references held by thread B. That means thread A might collect and reuse that shared object.
> 
> If we had a runtime call in the cast to shared, thread A's GC could record that the object is possibly referenced from another thread. It wouldn't collect it until a stop-the-world phase says it's not referenced.
> 
> This would give teeth to some of the undefined behavior we currently have.

Ahh, I see what you mean.  Hmm, neat idea.  Might actually work!  Worth exploring, I think.


> > Now that I think of it, we could deal with pointers in unions the same way -- if the compiler detects it, then trigger conservative mode in the GC.
> 
> Right. In the past when people have looked at this problem, the idea was to treat anything that could possibly be a pointer as a pointer.
> 
> > With these two out of the way, a generational GC for D seems closer to the realm of possibility.
> 
> I'm not sure how much sense it makes to have a generational GC without write barriers.

True.  But it would allow at least a precise GC.  And perhaps a few other GC improvements that are currently not possible.


T

-- 
Why waste time reinventing the wheel, when you could be reinventing the engine? -- Damian Conway
January 06, 2019
On Sat, 05 Jan 2019 22:17:27 -0800, H. S. Teoh wrote:
> Yeah, Walter's right on the money about copy-on-substring being a big performance hit in C/C++, and apparently also C# (caveat: I don't know anything about C#).  D's slices really do eliminate a lot of the background cost that most people don't normally think about.

Interestingly, Java used to use slicing for substrings. This was changed (circa Java 7, IIRC). The logic was that you might read in a large string and preserve a short substring of it for a long time, and that costs you a lot of extra memory.

I mean, you could just call `new String(substring)`, but apparently people weren't doing that and were confused at their memory usage.
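
The retention problem can be modeled with Python's `memoryview` (a stand-in for a sharing substring, not Java's actual implementation; the helper names and buffer contents are made up). The view keeps the whole buffer reachable, while an explicit copy, analogous to `new String(substring)`, keeps only the bytes it needs:

```python
def first_line_view(buf: bytes) -> memoryview:
    """Slice-style substring: a view that shares the original buffer."""
    end = buf.index(b"\n")
    return memoryview(buf)[:end]


def first_line_copy(buf: bytes) -> bytes:
    """Copying substring: allocates only the bytes of the first line."""
    return bytes(first_line_view(buf))


big = b"header\n" + b"x" * 1_000_000    # large input, short interesting part

kept_view = first_line_view(big)
kept_copy = first_line_copy(big)

# Bytes that stay reachable if only the short result is kept long-term.
retained_by_view = len(kept_view.obj)   # the entire original buffer
retained_by_copy = len(kept_copy)       # just the first line

print(retained_by_view, retained_by_copy)
```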
January 06, 2019
On 1/5/2019 9:11 PM, Neia Neutuladh wrote:
> I'm not sure how much sense it makes to have a generational GC without
> write barriers.

Eh, I invented a technique in the '90s to do that. Then I read a paper about it; it's called a "mostly copying" generational GC. All it does is pin the objects it isn't sure about (i.e. those with ambiguous pointers to them).
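
A minimal sketch of the pinning idea only, in Python, with no generations and no object graph. The function name, the flat `heap` dictionary, and the made-up addresses are all assumptions for illustration, and the sketch pins individual objects, which may differ from how a real collector groups them:

```python
def mostly_copying_collect(heap, precise_roots, ambiguous_words):
    """heap maps address -> payload. Returns (new_heap, new_precise_roots)."""
    # A word that happens to equal a heap address might be a pointer or just
    # an integer, so it cannot be rewritten: pin the object it may refer to.
    pinned = {w for w in ambiguous_words if w in heap}

    new_heap = {addr: heap[addr] for addr in pinned}
    forwarding = {}
    next_free = max(heap, default=0) + 1     # fresh addresses for moved objects
    for addr in precise_roots:
        if addr in pinned:
            forwarding[addr] = addr          # stays where it is
        elif addr in heap and addr not in forwarding:
            forwarding[addr] = next_free     # safe to move: root is updatable
            new_heap[next_free] = heap[addr]
            next_free += 1
    new_roots = [forwarding[a] for a in precise_roots if a in forwarding]
    return new_heap, new_roots


heap = {100: "a", 200: "b", 300: "c"}
new_heap, new_roots = mostly_copying_collect(
    heap,
    precise_roots=[100, 200],        # known pointers, can be updated
    ambiguous_words=[200, 12345],    # might be pointers, might be integers
)
```

Here the object at 100 is moved to a fresh address, the object at 200 stays put because an ambiguous word may point at it, and the object at 300 is unreachable and dropped.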

January 06, 2019
On 1/5/2019 11:03 PM, Neia Neutuladh wrote:
> Interestingly, Java used to use slicing for substrings. This was changed
> (circa Java 7, IIRC). The logic was that you might read in a large string
> and preserve a short substring of it for a long time, and that costs you a
> lot of extra memory.

That's why DMD doesn't use slicing of the source file buffer for strings and identifiers. There's also a caching issue with it, as the slices will be spread out in memory rather than concentrated.