July 07, 2016 Go's march to low-latency GC
https://news.ycombinator.com/item?id=12042198

^ reposting a link in the right place.
July 08, 2016 Re: Go's march to low-latency GC
Posted in reply to Enamex

On Thursday, 7 July 2016 at 22:36:29 UTC, Enamex wrote:
> https://news.ycombinator.com/item?id=12042198
>
> ^ reposting a link in the right place.

> While a program using 10,000 OS threads might perform poorly, that number of goroutines is nothing unusual. One difference is that a goroutine starts with a very small stack — only 2kB — which grows as needed, contrasted with the large fixed-size stacks that are common elsewhere. Go’s function call preamble makes sure there’s enough stack space for the next call, and if not will move the goroutine’s stack to a larger memory area — rewriting pointers as needed — before allowing the call to continue.

Correct me if I'm wrong, but in D fibers allocate stack statically, so we have to preallocate large stacks. If yes, can we allocate stack frames on demand from some non-GC area?
July 08, 2016 Re: Go's march to low-latency GC
Posted in reply to ikod

On 07/08/2016 07:45 AM, ikod wrote:
> Correct me if I'm wrong, but in D fibers allocate stack statically, so we have to preallocate large stacks.
>
> If yes - can we allocate stack frames on demand from some non-GC area?
Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
July 08, 2016 Re: Go's march to low-latency GC
Posted in reply to Martin Nowak

On Friday, 8 July 2016 at 20:35:05 UTC, Martin Nowak wrote:
> On 07/08/2016 07:45 AM, ikod wrote:
>> Correct me if I'm wrong, but in D fibers allocate stack statically, so we have to preallocate large stacks.
>>
>> If yes - can we allocate stack frames on demand from some non-GC area?
>
> Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
But the size of the fiber stack is fixed? When we call the Fiber constructor, the second parameter is the stack size. If I make a wrong guess and ask for too small a stack, then the program may crash. If I ask for too large a stack, then I probably waste resources. So it would be nice if the programmer were not forced to make any such decisions about a fiber's stack size.
Or maybe I'm wrong and I shouldn't care about stack size when I create a new fiber?
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to ikod

On 07/09/2016 02:48 AM, ikod wrote:
> If I made a wrong guess and
> ask for too small stack then program may crash. If I ask for too large
> stack then I probably waste resources.
Nope, this is exactly the point. You can demand a crazy 10 MB of stack for each fiber and only the actually used part will be allocated by the kernel.
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to Dicebot

On Saturday, 9 July 2016 at 13:48:41 UTC, Dicebot wrote:
> On 07/09/2016 02:48 AM, ikod wrote:
>> If I made a wrong guess and
>> ask for too small stack then program may crash. If I ask for too large
>> stack then I probably waste resources.
>
> Nope, this is exactly the point. You can demand crazy 10 MB of stack for each fiber and only the actually used part will be allocated by kernel.
Thanks, nice to know.
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to Enamex

On 7/7/16 6:36 PM, Enamex wrote:
> https://news.ycombinator.com/item?id=12042198
>
> ^ reposting a link in the right place.
A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that.
I wish we could amass the experts able to make similar things happen for us.
Andrei
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to Martin Nowak

On Fri, 08 Jul 2016 22:35:05 +0200, Martin Nowak wrote:
> On 07/08/2016 07:45 AM, ikod wrote:
>> Correct me if I'm wrong, but in D fibers allocate stack statically, so we have to preallocate large stacks.
>>
>> If yes - can we allocate stack frames on demand from some non-GC area?
>
> Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
The downside is that it's difficult to release that memory. On the other hand, Go had a lot of problems with its implementation in part because it released memory. At some point you start telling users: if you want a fiber that does a huge recursion, dispose of it when you're done. It's cheap enough to create another fiber later.
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to Andrei Alexandrescu

On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu wrote:
> On 7/7/16 6:36 PM, Enamex wrote:
>> https://news.ycombinator.com/item?id=12042198
>>
>> ^ reposting a link in the right place.
>
> A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that.
>
> I wish we could amass the experts able to make similar things happen for us.
>
>
> Andrei
A kickstarter to improve the GC :)
July 09, 2016 Re: Go's march to low-latency GC
Posted in reply to Andrei Alexandrescu

On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu wrote:
> On 7/7/16 6:36 PM, Enamex wrote:
>> https://news.ycombinator.com/item?id=12042198
>>
>> ^ reposting a link in the right place.
>
> A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that.

Exactly; how someone can run a big site with 2-second pauses in the GC code is beyond me.

> I wish we could amass the experts able to make similar things happen for us.

We sort of have an agreement that we don't want to pay 5% for write barriers, so the common algorithmic GC improvements aren't available for us. There is still connectivity-based GC [¹], which is an interesting idea, but AFAIK it hasn't been widely tried. Maybe someone has an idea for optional write barriers, i.e. zero cost if you don't use them. Or we agree that it's worth having different incompatible binaries.

[¹]: https://www.cs.purdue.edu/homes/hosking/690M/cbgc.pdf

In any case, now that we have made the GC pluggable, we should port the forking GC. It has almost no latency, at the price of higher peak memory usage and lower throughput, the same trade-offs you have with any concurrent mark phase. Moving the sweeping to background GC threads is something we should be doing anyhow.

Overall I think we should focus more on good deterministic MM alternatives, rather than investing years of engineering into our GC, or hoping for silver bullets.
Copyright © 1999-2021 by the D Language Foundation