January 21, 2022
On Friday, 21 January 2022 at 13:10:46 UTC, Adam D Ruppe wrote:
> On Friday, 21 January 2022 at 12:51:11 UTC, user1234 wrote:
>> I agree. The GC run-time options are poorly documented. It's written nowhere that druntime can take specific arguments that change the GC behavior.
>
> the page
> dlang.org/garbage
>
> has a header called "nowhere":
>
> https://dlang.org/spec/garbage.html#gc_config

Okay, it's better nowadays; I've revised my point of view.
January 21, 2022

On Friday, 21 January 2022 at 12:45:42 UTC, Guillaume Piolat wrote:

> The GC is not as slow and annoying as it used to be. When I went to DConf 2019, I was surprised to learn that it had been sped up during the year (with both the fork-GC and Rainer's optimization work, IIRC), but this wasn't advertised anywhere. It's easy to be under the impression that the GC is the same as it was 10 years ago when the changes aren't written down anywhere; you'd have to peruse all the release notes. Who knows the real timeline of GC improvement in D? What happened?

IIRC, Rainer's work on the GC was discussed in the forums at the time. The fork-based GC was a SAOC 2018 project, but it wasn't actually merged until last fall. I announced it on the blog and in a YouTube video.

January 21, 2022

On Thursday, 20 January 2022 at 23:56:42 UTC, Chris Katko wrote:

> I just found out that Unity has incremental garbage collection. I didn't know that was a possibility for GCs.
>
> https://docs.unity3d.com/Manual/performance-incremental-garbage-collection.html
>
> I'm just curious what the D language community's thoughts are on it. The tradeoff is: for a longer total time / lower throughput, you reduce stress on individual frames, which prevents hiccups. That's pretty darn important for games and soft/hard real-time systems.
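The tradeoff described in the quote can be made concrete: an incremental collector bounds how much marking work happens in any one slice, so no single frame pays for a full trace. A rough sketch (hypothetical object graph, not Unity's or druntime's actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical traced object: child pointers plus a mark bit.
struct Obj {
    std::vector<Obj*> children;
    bool marked = false;
};

// One slice of incremental marking: process at most `budget` grey
// objects, then return control to the caller (e.g. the game loop).
// Returns true once the grey set is empty, i.e. marking is complete.
bool markIncremental(std::vector<Obj*>& grey, std::size_t budget) {
    while (!grey.empty() && budget-- > 0) {
        Obj* o = grey.back();
        grey.pop_back();
        for (Obj* c : o->children) {
            if (!c->marked) {
                c->marked = true;
                grey.push_back(c);
            }
        }
    }
    return grey.empty();
}
```

The catch, and the reason write barriers keep coming up in this thread, is that the mutator can rewire pointers between slices, so a barrier has to re-grey modified objects or marking misses them.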

I would say it really depends on a handful of things, like how much allocating/deallocating you're actually doing, whether fragmentation would be the issue, and whether collecting more often would help. I try to write code using the stack rather than allocating for small temporary items, since that memory frees itself when it leaves the function.

Though having the GC pick up its work while the program is waiting on the OS to reply would be ideal; that would be the best time to run.

As for handling fragmentation, I'd almost want to make a different type of allocator which gives you an ID, and you use the ID+offset to get the actual address (though to make it work, it would need a new type of slice that handles those details transparently). Then periodically (say, every 50ms or so) it would do a quick check/scan for holes, then move data and adjust the values of said IDs to remove empty holes and make the heap as tight as possible.

I'm also not sure how well it would work in multi-threaded applications, though having the GC lock the IDs while it moves them and then unlock them should handle those details.

This of course would only be needed if you have limited/preallocated memory and fragmentation (due to some parts being too small) could kill the process. Otherwise it would probably be a lot of busywork.

To note: while I love contemplating this stuff, I'm not quite sure how to implement it all myself. I don't have the same mental fortitude I had when I was 14.
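The ID+offset idea above is essentially a handle-based allocator with compaction: user code holds stable IDs, a table maps each ID to the block's current offset, and defragmentation only patches the table. A toy single-threaded sketch (all names made up; no bounds checks or slot reuse):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy handle-based heap: blocks live in one arena and callers hold
// stable IDs. compact() slides live blocks together and only has to
// rewrite the ID table, never the callers' handles. A sketch, not
// a production allocator.
class HandleHeap {
    std::vector<std::uint8_t> arena;
    std::size_t top = 0;                       // bump pointer
    struct Slot { std::size_t offset, size; bool live; };
    std::vector<Slot> slots;                   // handle -> current location
public:
    explicit HandleHeap(std::size_t bytes) : arena(bytes) {}

    std::size_t alloc(std::size_t size) {      // returns a stable handle
        slots.push_back({top, size, true});
        top += size;
        return slots.size() - 1;
    }
    void free(std::size_t id) { slots[id].live = false; }

    // id+offset -> current address; only valid until the next compact().
    std::uint8_t* at(std::size_t id, std::size_t offset = 0) {
        return arena.data() + slots[id].offset + offset;
    }

    // Remove holes: slide every live block down and patch its slot.
    void compact() {
        std::size_t dst = 0;
        for (Slot& s : slots) {
            if (!s.live) continue;
            std::memmove(arena.data() + dst, arena.data() + s.offset, s.size);
            s.offset = dst;
            dst += s.size;
        }
        top = dst;
    }
};
```

As the post says, a multi-threaded version would need the compactor to lock IDs while moving them, and a slice-like wrapper type could hide the `at()` indirection from user code.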

January 21, 2022
On Fri, Jan 21, 2022 at 11:12:01AM +0000, Elronnd via Digitalmars-d wrote: [...]
> void f(int** x) {
> 	*x = new int;
> }
> void g() {
> 	int** x = new int*;
> 	f(x);
> 	int* y;
> 	f(&y);
> }
> 
> and we want to add a generational GC, which requires barriers. So naively we rewrite f as follows:
> 
> void f(int** x) {
> 	store_barrier(x, new int);
> }
> 
> This will involve overhead because we don't know if x is a gc pointer or not.  So instead, generate multiple definitions and rewrite callers:
> 
> void f_gc(int** x) {
> 	store_barrier(x, new int);
> }
> void f_nogc(int** x) {
> 	*x = new int; //no barrier!
> }
> void g() {
> 	int** x = new int*;
> 	f_gc(x);
> 	int* y;
> 	f_nogc(&y); //no overhead from passing stack object!
> }

So basically *every* function that receives pointers in some way (could be implicitly via built-in slices, for example) will essentially be implicitly templatized?  That will lead to a LOT of template bloat. I'm not sure if this is a good idea.
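For context, the store_barrier in the quoted code would typically be very small, e.g. card marking: perform the store, then set a byte recording which region of the heap was written so the next minor collection rescans it. A minimal sketch (hypothetical heap layout, not druntime's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical generational heap layout: one dirty byte per 512-byte
// "card" of the heap. Real collectors differ; this only shows the shape.
constexpr std::size_t kHeapSize  = 1 << 20;
constexpr std::size_t kCardShift = 9;              // 512-byte cards
alignas(alignof(void*)) static std::uint8_t heap[kHeapSize];
static std::uint8_t cards[kHeapSize >> kCardShift];

// The write barrier: perform the store, then dirty the card holding
// `slot` so a minor collection knows to rescan it for old->young
// pointers. Only valid for slots that actually live inside `heap`.
inline void store_barrier(void** slot, void* value) {
    *slot = value;
    std::size_t addr = reinterpret_cast<std::uint8_t*>(slot) - heap;
    cards[addr >> kCardShift] = 1;
}
```

The cost is a couple of extra instructions per pointer store, which is exactly the overhead the quoted f_gc/f_nogc split tries to avoid for provably non-GC destinations.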


> Of course this transformation is conservative: you will sometimes call f_gc with a non-gc pointer.  But I bet it works 99% of the time such that the runtime overhead is negligible in practice.

What we need is a working proof-of-concept implementation that we can actually benchmark and compare with the current situation, so that we can make decisions based on actual data. Otherwise, it's just your word against mine and my word against yours, which gets us nowhere.


T

-- 
If blunt statements had a point, they wouldn't be blunt...
January 21, 2022

On Friday, 21 January 2022 at 13:00:31 UTC, Guillaume Piolat wrote:

> On Friday, 21 January 2022 at 11:27:51 UTC, Ola Fosheim Grøstad wrote:
>
>> Makes sense to me for that particular niche. Might also work for embedded with the caveat that you need extra memory.
>
> While it must be interesting, another solution you have in D is to deregister audio threads so that a GC pause would let them run free. The audio threads rarely own GC roots anyway.

Yes, for a plugin that is reasonable. I think an audio-friendly garbage collector would be more for a language-like application like Max, or maybe even a complex sound editor.

But even if you do make your realtime tasks independent of the GC, it is still better to have a collector that throttles collection in the background so it doesn't wipe out the caches or saturate the data bus. Then you get more headroom for your realtime tasks (you get a more even load over time).

> Amortized malloc is surprisingly cheap, you can merge allocation buffers, and you would do it only once at initialization. A lot of the audio buffers are also scoped in nature, with a lifetime that is that of the processing stage. So you can also leverage specialized allocators to mass-release those buffers rapidly.

Hm, how would you use amortized algorithms in a real time thread where you want even load on every callback? I guess you could use "coroutines" and spread the load over many "interrupts"/"callbacks".

One strategy for dynamic allocation is to have "chunks" for same sized allocations (avoiding fragmentation and merging), and then do a worst case estimate for how many allocations can happen in one realtime-callback. Then a non-realtime thread can make sure that there is enough free space in the chunks to support N callbacks.
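The chunk strategy above can be as small as a fixed-size block pool: the realtime callback only pops and pushes pointers, while a non-realtime thread refills the pool to cover the worst-case estimate. A single-threaded sketch (locking omitted, names made up):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size block pool: the realtime callback only pops/pushes
// pointers (no system allocator on that path); a non-realtime thread
// calls refill() to keep enough blocks for the worst-case estimate.
// Single-threaded toy: real code needs a lock or a lock-free list.
class BlockPool {
    std::size_t blockSize;
    std::vector<void*> freeList;
public:
    BlockPool(std::size_t blockBytes, std::size_t initial)
        : blockSize(blockBytes) {
        freeList.reserve(initial * 2);   // so release() never reallocates
        refill(initial);
    }
    ~BlockPool() {                       // assumes all blocks were released
        for (void* p : freeList) ::operator delete(p);
    }
    void* acquire() {                    // realtime side: O(1), no malloc
        if (freeList.empty()) return nullptr;  // worst-case estimate missed
        void* p = freeList.back();
        freeList.pop_back();
        return p;
    }
    void release(void* p) { freeList.push_back(p); }
    void refill(std::size_t lowWater) {  // non-realtime housekeeping thread
        while (freeList.size() < lowWater)
            freeList.push_back(::operator new(blockSize));
    }
    std::size_t available() const { return freeList.size(); }
};
```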

Another strategy I have used that may be used if you do not write a plugin, but create your own host, is to pass in the buffers needed in the events you send to the realtime thread. Then have a mailbox for returning the buffers and let a non-realtime thread do the cleanup.
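The mailbox in this second strategy is commonly a single-producer/single-consumer ring buffer: the realtime thread pushes buffers it is done with, and the cleanup thread pops and frees them, so the realtime side never touches the allocator. A minimal sketch of the pattern (power-of-two capacity assumed):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// SPSC mailbox: the realtime thread push()es used buffers; a cleanup
// thread pop()s and frees them. Capacity must be a power of two.
template <typename T, std::size_t N>
class Mailbox {
    static_assert((N & (N - 1)) == 0, "capacity must be a power of two");
    T slots[N];
    std::atomic<std::size_t> head{0}, tail{0};
public:
    bool push(T v) {                     // producer (realtime) side only
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N)
            return false;                // full; caller must handle this
        slots[t & (N - 1)] = v;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                   // consumer (cleanup) side only
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                // empty
        out = slots[h & (N - 1)];
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};
```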

> In Dplug plugins we got a GC back :) by integrating Wren as a UI scripting language. It is expected to cause very few problems, as it has a max heap size and doesn't touch audio.

Is that a dedicated GC for wren or is it a generic GC?

January 21, 2022

On Friday, 21 January 2022 at 12:52:02 UTC, Elronnd wrote:

> You do not. You do need to do something about unions, but you do not need to disallow pointers in them. The spec says to pin objects which are in unions, which is a perfectly acceptable solution. There are also cleverer solutions which violate that clause of the spec but do not break any actual code.

No, pinning is not a solution; that is a band-aid for a broken design. You'll end up with memory leaks all over the place.

Regarding your suggestion, this is not as trivial as you imply. There is no way today for the type system to know where a pointer points, so there is also no way to know whether you need to apply a barrier. Since non-D libraries can mutate pointers, there is also no way to be sure that they only modify pointers that do not require barriers. As a result, you risk spurious bugs that are impossible to track down.

You need some kind of language change to make the type system work properly with barriers.

January 21, 2022

On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad wrote:

> Is that a dedicated GC for wren or is it a generic GC?

Wren uses its own non-moving, non-incremental, precise, stop-the-world mark-and-sweep GC.

January 21, 2022

On Friday, 21 January 2022 at 18:53:03 UTC, Ola Fosheim Grøstad wrote:

> Hm, how would you use amortized algorithms in a real time thread where you want even load on every callback?

Well, I was using the term incorrectly: "amortized" in the sense that the large majority of callbacks will not allocate, so in other words, the opposite :)

pushack_back leading to a realloc can cause a lot more trouble because it moves large amounts of memory. Delay lines may need to be allocated at their max size / pessimized for that reason.

In game-mixer I had to use chunked allocation to avoid that while decoding an audio file during playback, in the case where you want to keep it in memory.

January 21, 2022

On Friday, 21 January 2022 at 20:22:15 UTC, Guillaume Piolat wrote:

> Well, I was using the term incorrectly: "amortized" in the sense that the large majority of callbacks will not allocate, so in other words, the opposite :)

Yes, that is different, but using a stackless coroutine can actually be a solution to spread out the load when you need to. In D I guess you could do it as a state machine using a switch statement. I've never done this in practice, though…
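The switch-based state machine mentioned here is the classic protothreads trick: the resume point is stored in an ordinary member, and each call runs one slice of work and returns. A tiny sketch (the work items are made up purely for illustration):

```cpp
#include <cassert>

// Stackless "coroutine" as a switch state machine: `state` records
// where to resume, so each step() call does one slice of work.
struct SpreadWork {
    int state = 0;
    int result = 0;
    bool step() {                 // returns true while slices remain
        switch (state) {
        case 0:
            result += 1;          // slice 1 (e.g. part of a table rebuild)
            state = 1;
            return true;
        case 1:
            result += 10;         // slice 2
            state = 2;
            return true;
        default:
            return false;         // finished
        }
    }
};
```

A realtime callback would invoke `step()` once per invocation until it returns false, spreading the load over several callbacks.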

> pushack_back leading to a realloc can cause a lot more trouble because it moves large amounts of memory. Delay lines may need to be allocated at their max size / pessimized for that reason.

What is pushack_back? You can also build a delay with a chain of smaller fixed-size buffers, although that's perhaps a bit unusual. If you need a filter, you can do that in a smaller buffer. So many implementation options!

January 22, 2022

On Friday, 21 January 2022 at 21:29:06 UTC, Ola Fosheim Grøstad wrote:

> What is pushack_back?

push_back*, i.e. appending to a vector-like structure.