May 21, 2013
On Tue, 21 May 2013 08:27:23 +0200
Jacob Carlborg <doob@me.com> wrote:

> On 2013-05-20 14:50, Andrei Alexandrescu wrote:
> > On reddit:
> >
> > http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/
> 
> Great talk. What's up with the extra minute added in the beginning?
> 

Suspense!

May 21, 2013
Can't wait to see a prototype for D2 :) I have a feeling that this may solve at least some of vibe.d latency issues at high concurrency levels.
May 21, 2013
Diggory, el 21 de May a las 00:52 me escribiste:
> On Monday, 20 May 2013 at 13:55:05 UTC, Regan Heath wrote:
> >On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> >
> >>On reddit: http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/
> >
> >This may be the Windows Copy On Write feature mentioned in the Q&A
> >at the end:
> >http://support.microsoft.com/kb/103858
> >
> >.. but it's not clear to me how useful this is for fork emulation or similar.
> >
> >R
> 
> Fork isn't needed at all really in the technique described, this is
> all that's needed:
> - Map a copy of the memory using copy-on-write
> - Run some code concurrently
> 
> It just happens that fork does both of these things, but you can equally well do the two things using separate calls.
> 
> In fact you should be able to avoid the deadlock issue by not using
> fork but just remapping some shared memory using copy on write. The
> GC can exist in a separate thread which pauses itself after every
> run. To run the GC it's then just a case of:
> - stop the world
> - copy registers to stack
> - remap shared memory using COW
> - resume the world
> - resume the GC thread
> 
> And that would work on all modern OSes, plus you don't have the overhead of creating a new process or even a new thread. Also immutable memory doesn't need to be mapped, the GC thread can access it directly.

I'm interested in what you're describing, but I don't know how can you
achieve this without fork()ing (or clone()ing in Linux). What does
"remap shared memory using COW" in a context where fork() doesn't
happen? Why do you even need shared memory if fork() doesn't happen? If
"remap shared memory using COW" means get a different address for the
same block of memory until a write happens in that block, then you can't
scan the roots anymore.

I'm *very* interested in your suggestion.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
CONDUCTOR BORRACHO CASI PROVOCA UNA TRAGEDIA: BATMAN UNICO TESTIGO
	-- Crónica TV
May 22, 2013
On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella wrote:
>
> I'm interested in what you're describing, but I don't know how can you
> achieve this without fork()ing (or clone()ing in Linux). What does
> "remap shared memory using COW" in a context where fork() doesn't
> happen? Why do you even need shared memory if fork() doesn't happen? If
> "remap shared memory using COW" means get a different address for the
> same block of memory until a write happens in that block, then you can't
> scan the roots anymore.
>
> I'm *very* interested in your suggestion.

I do mean mapping the same physical block of memory into two virtual memory blocks, such that a write will cause the OS to make a copy and sever the link.

Given that we're dealing with significantly large blocks of memory all in multiples of the page size, it should be a simple matter to map from one copy of the memory to the other and back again using basic pointer arithmetic, so scanning the roots should not be a problem.

You need shared memory because a file handle is needed when calling "mmap" to enable the COW functionality, and shared memory lets you get a file handle to a block of memory.
May 22, 2013
Dicebot, el 21 de May a las 09:55 me escribiste:
> Can't wait to see a prototype for D2 :) I have a feeling that this may solve at least some of vibe.d latency issues at high concurrency levels.

I hope I can start porting it to D2 at some (not so far in the future)
point...

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Did you know the originally a Danish guy invented the burglar-alarm unfortunately, it got stolen
May 23, 2013
On Tuesday, 21 May 2013 at 04:52:25 UTC, Diggory wrote:
> Either way, at least on windows the separate process would have to be persistent as creating a new process has a lot more overhead attached to it than on posix systems. Implicitly creating a child process is also something a D programmer might not want, again this is more windows specific where processes do not have the same strict hierarchy as on posix.

What would be some of the potential downsides of a long-running satellite process?

A concurrent GC would probably be opt-in or opt-out, either way.
May 23, 2013
On Thursday, 23 May 2013 at 12:39:04 UTC, Vladimir Panteleev wrote:
> On Tuesday, 21 May 2013 at 04:52:25 UTC, Diggory wrote:
>> Either way, at least on windows the separate process would have to be persistent as creating a new process has a lot more overhead attached to it than on posix systems. Implicitly creating a child process is also something a D programmer might not want, again this is more windows specific where processes do not have the same strict hierarchy as on posix.
>
> What would be some of the potential downsides of a long-running satellite process?
>
> A concurrent GC would probably be opt-in or opt-out, either way.

You'd either have to distribute a separate .exe for the GC process or embed the second .exe inside the first and extract it at runtime to a temporary location, or call the same .exe with some flag which tells it to turn into a GC. None of those options are particularly nice, all of them require significant interference from the GC, it's not just a case of calling some function to start the GC. This is especially a problem in cases where the runtime can't insert the init handler such as when using WinMain, which you currently have to do unless you want a console window to pop up.

Next there's the fact that you have a separate top level process for every D process. This is going to cause problems with security software since you now have to give both processes permissions in order to run. In addition you have to consider the case where one or other of the two processes is paused or killed unexpectedly - you don't want orphaned GC processes hanging around wasting resources.
May 24, 2013
On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella wrote:
> I'm interested in what you're describing, but I don't know how can you
> achieve this without fork()ing (or clone()ing in Linux). What does
> "remap shared memory using COW" in a context where fork() doesn't
> happen? Why do you even need shared memory if fork() doesn't happen? If
> "remap shared memory using COW" means get a different address for the
> same block of memory until a write happens in that block, then you can't
> scan the roots anymore.
>
> I'm *very* interested in your suggestion.

After doing some further investigation I think I've found a fairly awesome way of doing garbage collection on windows, hopefully linux has a similar mechanism. It doesn't require memory mapped files or anything special, it can be done retroactively, and it allows a kind of reverse copy on write which is what's actually needed.

When the GC is run:
- Use VirtualProtect to mark all mutable memory pages as read-only
- Add a vectored exception handler to handle the access violation exception
- Resume the GC thread

In the exception handler:
- Copy the page being modified to a new page
- Mark the original page as writable again
- Tell the GC to use the new page instead
- Continue

The main problem I see is the need to synchronise between the exception handler and the GC, when it would be nice if the exception handler was as light-weight as possible. However this can be largely solved by doing finer grained synchronisation/some clever stuff. For example, the handler can add the new mapping first, then do the page copy, and only then wait for the GC to acknowledge the new mapping, so the GC has the whole time that the copying is taking place to see the change. The GC would also of course have to wait for the copying to complete before continuing but pausing the GC for a short time isn't really an issue at all.
May 24, 2013
On Thursday, 23 May 2013 at 23:49:20 UTC, Diggory wrote:
> You'd either have to distribute a separate .exe for the GC process or embed the second .exe inside the first and extract it at runtime to a temporary location, or call the same .exe with some flag which tells it to turn into a GC. None of those options are particularly nice, all of them require significant interference from the GC, it's not just a case of calling some function to start the GC.

I don't know why you say that. The satellite process would be the same executable file as the main process, but started with a special switch (or environment variable), which will be handled by the D runtime. Since the two processes use the same image, the executable code will share the same pages in memory, resulting in a small overhead.

> This is especially a problem in cases where the runtime can't insert the init handler such as when using WinMain,

When implementing WinMain, you have to call the runtime initialization function, which will initialize the GC. GC initialization for the satellite process need not exit.

> which you currently have to do unless you want a console window to pop up.

I don't think this is true. Although the image subsystem is auto-detected by the entry point you're using (CONSOLE for main, WINDOWS for WinMain), you can specify it explicitly using the /SUBSYSTEM linker switch (/SUBSYSTEM:WINDOWS).

> Next there's the fact that you have a separate top level process for every D process. This is going to cause problems with security software since you now have to give both processes permissions in order to run.

Sorry? Could you provide an example (with a specific security software package)?

> In addition you have to consider the case where one or other of the two processes is paused or killed unexpectedly - you don't want orphaned GC processes hanging around wasting resources.

Why is this an issue? Windows provides simple ways to wait or check for a process's termination.
May 24, 2013
On Friday, 24 May 2013 at 06:52:19 UTC, Diggory wrote:
> On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella wrote:
>> I'm interested in what you're describing, but I don't know how can you
>> achieve this without fork()ing (or clone()ing in Linux). What does
>> "remap shared memory using COW" in a context where fork() doesn't
>> happen? Why do you even need shared memory if fork() doesn't happen? If
>> "remap shared memory using COW" means get a different address for the
>> same block of memory until a write happens in that block, then you can't
>> scan the roots anymore.
>>
>> I'm *very* interested in your suggestion.
>
> After doing some further investigation I think I've found a fairly awesome way of doing garbage collection on windows, hopefully linux has a similar mechanism. It doesn't require memory mapped files or anything special, it can be done retroactively, and it allows a kind of reverse copy on write which is what's actually needed.
>
> When the GC is run:
> - Use VirtualProtect to mark all mutable memory pages as read-only
> - Add a vectored exception handler to handle the access violation exception
> - Resume the GC thread

I've tried writing a generational GC for D that used page protection for write barriers a while ago. IIRC, I ran into performance issues (the page faults were rather expensive).

This approach does have the benefit that it will not cause pages that have been moved to swap to be pulled out in order to be scanned every time, though.