September 25, 2014
On Wednesday, 24 September 2014 at 14:36:13 UTC, Sean Kelly wrote:
> Large allocations are the easy case, as the allocation lives in its own pool and you can just move the entire pool.

A dataset is not a contiguous object. It's like an in-memory database: tables can be added to or removed from it, rows can be added to or removed from tables, and fields in rows can be set to various values. In the end a dataset is a collection of a large number of relatively small objects; big datasets are simply collections of more of them.
September 25, 2014
On Wednesday, 24 September 2014 at 20:24:01 UTC, Oscar Martin wrote:
> Yes, that's the problem I see with the shared GC. But I think cases like this should be solved "easily" with a mechanism for transfer of responsibility between thread GCs. The truly problematic cases are shared objects with roots in various threads.

You might want to look at Nimrod. AFAIK, it uses a thread-local GC, and thread groups are planned to be introduced; as I understand it, the shared GC will stop only the threads in a group, not threads in other groups.
September 25, 2014
On Wednesday, 24 September 2014 at 20:15:52 UTC, Oscar Martin wrote:
> On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz wrote:
>>
>> There can also be a shared _and_ a local GC at the same time, and a thread could opt in to the shared GC (or choose not to opt in by not allocating from the shared heap).
>
> Yes, a shared GC should be a possibility, but how do you avoid the "stop-the-world" phase for that GC?
>
> Obviously this pause can be minimized by performing most of the work outside that phase, but after seeing other people's tests of advanced GCs (Java, .NET) on the internet, I don't think that's enough for some programs.
>
> But hey, I guess it's enough to cover the greatest number of cases. My goal is to start by implementing the per-thread GC. Then I will test performance and pauses (my program has to handle audio every 10 ms), and then I might dare to implement the shared GC, which is obviously more complex if the pauses are to be minimized. We'll see what the outcome is.

This thread reminds me again of a paper I read a few months ago with a clever way of dealing with the sharing problem while maintaining performance: https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf

The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall those having been vetoed several times for complexity.

-Wyatt
September 25, 2014
On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
> On Wednesday, 24 September 2014 at 20:15:52 UTC, Oscar Martin wrote:
>> On Wednesday, 24 September 2014 at 08:13:15 UTC, Marc Schütz wrote:
>>>
>>> There can also be a shared _and_ a local GC at the same time, and a thread could opt in to the shared GC (or choose not to opt in by not allocating from the shared heap).
>>
>> Yes, a shared GC should be a possibility, but how do you avoid the "stop-the-world" phase for that GC?
>>
>> Obviously this pause can be minimized by performing most of the work outside that phase, but after seeing other people's tests of advanced GCs (Java, .NET) on the internet, I don't think that's enough for some programs.
>>
>> But hey, I guess it's enough to cover the greatest number of cases. My goal is to start by implementing the per-thread GC. Then I will test performance and pauses (my program has to handle audio every 10 ms), and then I might dare to implement the shared GC, which is obviously more complex if the pauses are to be minimized. We'll see what the outcome is.
>
> This thread reminds me again of a paper I read a few months ago with a clever way of dealing with the sharing problem while maintaining performance: https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf
>
> The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall those having been vetoed several times for complexity.
>
> -Wyatt

An interesting paper. Thank you very much.
September 25, 2014
On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
>
> The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall those having been vetoed several times for complexity.

Pretty much for reasons of being able to call C functions and inline asm code.  Memory barriers may still be possible in these scenarios, but they would be extremely expensive.
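To illustrate the problem (a conceptual sketch only, not how any D compiler or runtime actually works; gcWriteBarrier and markCardDirty are hypothetical names): a write barrier means every pointer store in GC-managed code gets routed through a small hook, and C code or inline asm simply never calls that hook.

// Hypothetical remembered-set hook (not an existing druntime API).
void markCardDirty(void** slot)
{
	// a real collector would set a bit for the card/page containing `slot`
}

// Conceptual write barrier: with barriers enabled, the compiler would rewrite
// every pointer store `*slot = p;` into a call like this, so the collector
// learns which locations were mutated since the last mark phase.
void gcWriteBarrier(void** slot, void* newValue)
{
	markCardDirty(slot);
	*slot = newValue;
}

// C code (or inline asm) stores pointers directly and bypasses the hook, so
// the collector can miss updates unless such calls are made very expensive,
// e.g. by conservatively re-scanning memory handed to them.
extern (C) void c_store(void** slot, void* value);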
September 28, 2014

On 23.09.2014 02:15, Oscar Martin wrote:
> With some/a lot of work and a little help from the compiler (currently it
> indicates with a flag whether a class/struct contains pointers/references to
> other classes/structs; this support could be extended to indicate exactly
> which fields are pointers/references)

https://github.com/rainers/druntime/gcx_precise2

> we could implement a
> semi-incremental-generational-copying conservative GC like:
>
>     http://www.hboehm.info/gc/
> or
>     http://www.ravenbrook.com/project/mps/
>
> Being incremental, they try to minimize the "stop-the-world" phase. But
> even with an advanced GC, as programs become more complex and use more
> memory, pause time also increases. See for example (I know it's not a
> normal case, but in a few years ...)

As others have already mentioned, incremental GCs need read/write barriers. There is currently resistance to implementing these in the compiler; the alternative in the library is using page protection, but that is very coarse.
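As a rough illustration of the page-protection approach (a POSIX-only sketch, not druntime's actual implementation): the heap is made read-only, the first write to each page faults, and the fault handler records the dirty page. Page granularity is exactly why it is coarse.

import core.stdc.signal : SIGSEGV;
import core.sys.posix.signal;
import core.sys.posix.sys.mman : mprotect, PROT_READ, PROT_WRITE;

enum pageSize = 4096;

extern (C) void onWriteFault(int sig, siginfo_t* info, void* ctx) nothrow @nogc
{
	auto page = cast(void*) (cast(size_t) info.si_addr & ~cast(size_t) (pageSize - 1));
	// a real collector would mark `page` in a pre-allocated dirty bitmap here
	// (nothing may allocate inside a signal handler)
	mprotect(page, pageSize, PROT_READ | PROT_WRITE);  // let the write proceed
}

void trackWrites(void* heapStart, size_t heapLen)
{
	sigaction_t sa;
	sa.sa_sigaction = &onWriteFault;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, null);
	mprotect(heapStart, heapLen, PROT_READ);  // the first write to any page now faults
}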

"semi-incremental-generational-copying" is probably asking to much in one step ;-)

>
> http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector
>
> (*) What if:
> - It is forbidden for "__gshared" data to have references/pointers to objects
> allocated by the GC (if the compiler can help with this prohibition,
> perfect; if not, the developer has to know what he is doing)
> - "shared" types are not allocated by the GC (they could be reference
> counted or manually released or ...)

shared objects will eventually contain references to other objects that you don't want to handle manually (e.g. string). That means you will have to add the memory range of the shared object to some GC for scanning. Back to square one...
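For illustration, a minimal sketch of what that means with the existing druntime API (the SharedThing type is just an example): a manually managed object that embeds a GC-managed string has to have its memory range registered with the GC so the string stays alive.

import core.memory : GC;
import core.stdc.stdlib : free, malloc;

struct SharedThing
{
	string name;   // GC-allocated payload inside a manually managed object
	int    id;
}

SharedThing* makeSharedThing()
{
	auto p = cast(SharedThing*) malloc(SharedThing.sizeof);
	*p = SharedThing.init;
	// without this, the GC never scans the malloc'd block and may collect
	// whatever `p.name` ends up pointing to - back to square one
	GC.addRange(p, SharedThing.sizeof);
	return p;
}

void freeSharedThing(SharedThing* p)
{
	GC.removeRange(p);
	free(p);
}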

> - "immutable" types are no longer implicitly "shared"
>
> In short, the memory accessible from multiple threads is not managed by
> the GC.

Is the compiler meant to help via the type system? I don't think this works, as AFAIK the recommended way to work with shared objects is to cast away shared after synchronizing on some mutex:

class C
{
	void doWork() { /*...*/ }

	void doSharedWork() shared
	{
		synchronized(someMutex)   // someMutex: whatever mutex guards this object
		{
			// with the lock held, it is safe to cast away shared
			C self = cast(C)this;
			self.doWork();
		}
	}
}

Maybe I missed other patterns for using shared (apart from atomics on primitive types). Are there any?
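For reference, the atomics-on-primitives pattern mentioned above is roughly this (core.atomic does exist; the counter is just an illustration):

import core.atomic : atomicLoad, atomicOp;

shared long requestCount;

void onRequest()
{
	atomicOp!"+="(requestCount, 1);   // lock-free update of a shared primitive
}

long snapshot()
{
	return atomicLoad(requestCount);  // consistent read without a mutex
}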

>
> With these restrictions each thread would have its own "I_Allocator", whose
> default implementation would be an
> incremental-generational-semi-conservative-copying GC, with no
> interference with any of the other program threads (it would be
> responsible only for the memory reserved for that thread). Other
> implementations of "I_Allocator" could be based on Andrei's allocators.
> With "setThreadAllocator" (similar to the current gc_setProxy) you could
> switch between the different implementations if you need to. Threads with
> critical timing requirements could work with an implementation of
> "I_Allocator" not based on the GC. It would be possible to simulate scoped
> classes:
>
> {
>      setThreadAllocator(I_Allocator_pseudo_stack);
>      scope(exit) {
>          I_Allocator_pseudo_stack.deleteAll();
>          setThreadAllocator(I_Allocator_gc);
>      }
>      auto obj = new MyClass();
>      ...
>      // Destructors are called and memory released
> }
>

There is a DIP by Walter with similar functionality: http://wiki.dlang.org/DIP46
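For comparison, a minimal sketch of the setThreadAllocator pattern from the quoted proposal (all names are hypothetical; this is not an existing druntime or Phobos API):

// Hypothetical per-thread allocator switch, illustrating the quoted proposal.
interface I_Allocator
{
	void[] allocate(size_t size);
	void   deallocateAll();
}

I_Allocator threadAllocator;   // module-level variables are thread-local in D

void setThreadAllocator(I_Allocator a) { threadAllocator = a; }

void processAudioFrame(I_Allocator regionAlloc, I_Allocator gcAlloc)
{
	setThreadAllocator(regionAlloc);     // per-frame allocations go to the region
	scope (exit)
	{
		regionAlloc.deallocateAll();     // bulk-free everything from this frame
		setThreadAllocator(gcAlloc);     // restore the default GC-backed allocator
	}
	// ... allocate and use per-frame objects here ...
}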
November 21, 2014
On Thursday, 25 September 2014 at 21:59:15 UTC, Sean Kelly wrote:
> On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
>>
>> The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall those having been vetoed several times for complexity.
>
> Pretty much for reasons of being able to call C functions and inline asm code.  Memory barriers may still be possible in these scenarios, but they would be extremely expensive.

BTW, C code usually accepts data only for reading, and writes mostly to strings and buffers - plain data without pointers. In both cases it doesn't need to notify the GC (as far as I understand write barriers).
November 21, 2014
On Friday, 21 November 2014 at 10:24:09 UTC, Kagamin wrote:
> On Thursday, 25 September 2014 at 21:59:15 UTC, Sean Kelly wrote:
>> On Thursday, 25 September 2014 at 13:55:42 UTC, Wyatt wrote:
>>>
>>> The caveat for D is that this design requires read and write barriers, and I'm pretty sure I recall those having been vetoed several times for complexity.
>>
>> Pretty much for reasons of being able to call C functions and inline asm code.  Memory barriers may still be possible in these scenarios, but they would be extremely expensive.
>
> BTW, C code usually accepts data only for reading, and writes mostly to strings and buffers - plain data without pointers. In both cases it doesn't need to notify the GC (as far as I understand write barriers).

"usually" isn't sufficient if you're trying to make a GC that doesn't collect live data.  It's possible that we could do something around calls to extern (C) functions that accept a type containing pointers, but I'd have to give this some thought.
November 21, 2014
I believe I have never seen such a C function. What could it do to screw up managed memory? It would usually require allocating new memory from the GC and assigning it to an old object, but a (true) C function is very unlikely to allocate memory from the GC.