March 31, 2020
On Monday, 30 March 2020 at 17:24:44 UTC, Timon Gehr wrote:
> On 30.03.20 18:50, Atila Neves wrote:
>> 
>> 
>> On Monday, 30 March 2020 at 15:49:55 UTC, Arine wrote:
>>> Anyways, here's a good example of where GC failed to meet their requirements,
>> 
>> Cases like this happen, but more often than not it's just, like, their opinion man.
>
> They were moving from _Go_ to Rust. The GC-related issue they were having seems as good an excuse as any to justify the move. :)

Of course! That's why I said "cases like this happen", by which I meant "sometimes, it's true that the project can't afford a GC".
March 31, 2020
On Monday, 30 March 2020 at 19:32:54 UTC, Meta wrote:
> On Monday, 30 March 2020 at 15:49:55 UTC, Arine wrote:
>> I've also seen it used to replace GC languages like Go on servers as well:
>>
>> Anyways, here's a good example of where GC failed to meet their requirements, and Rust solved their problem as it doesn't use a GC.
>>
>> https://blog.discordapp.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f
>
> Let's be honest, *anything* would be better than Go, for a reasonable value of "anything". ;-)

Umm, no. Go is a solid language, with a very good toolchain, tooling, documentation, and ecosystem. It might not have the fanciest language features, and it doesn't invent a new paradigm. I know that comment was sarcastic in nature, but I wouldn't underestimate Go.
March 31, 2020
On Tuesday, 31 March 2020 at 12:15:28 UTC, JN wrote:

>
> Umm no. Go is a solid language, with a very good toolchain, tooling, documentation, ecosystem. It might not have the fanciest language features and it doesn't invent a new paradigm. I know this comment is sarcastic in nature, but I wouldn't underestimate Go.

Everyone, this thread has gone way off topic. Let's please stick to discussion of DIP 1028. Thanks!
March 31, 2020
On 3/27/2020 2:30 AM, rikki cattermole wrote:
> Its a bit late to take their approach.

Not at all too late. Plenty of room there.
March 31, 2020
On Tuesday, 31 March 2020 at 20:16:45 UTC, Walter Bright wrote:
> On 3/27/2020 2:30 AM, rikki cattermole wrote:
>> Its a bit late to take their approach.
>
> Not at all too late. Plenty of room there.

With the current implementation and proposal of @live, comparing it to Rust is effectively like comparing a pair of scissors to a lawn mower. To be comparable to something like Rust would require rewriting the language from the ground up. Even though there are already significant breaking changes, they aren't sufficient, and I don't imagine breaking everything completely is on the table.
March 31, 2020
On Monday, 30 March 2020 at 18:12:03 UTC, Steven Schveighoffer wrote:
> Sociomantic avoids unpredictable GC cycles, but doesn't disable it (they still allow collections periodically IIRC). And they are built to be as fast as possible.

There were apps where a single GC cycle was not really an option, because even a small GC pause would delay our responding to requests where we couldn't afford the delay.  But the solution was not really _that_ hard: be strict about using recyclable buffers and object pools, preallocate so that there would be minimal resizing ... and then be really strict about keeping to that policy.
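
As a rough illustration (the names and structure here are purely hypothetical, not Sociomantic's actual code), the preallocate-and-reuse pattern in D can look something like this: reserve a buffer once at startup, then reset and refill it per request, using `assumeSafeAppend` so that appends stay inside the original block:

```d
import std.stdio : writeln;

/// Simplified request handler that reuses one preallocated buffer
/// instead of allocating per request (illustrative names only).
struct ResponseBuilder
{
    private char[] buf;

    this(size_t capacityHint)
    {
        buf.reserve(capacityHint);   // preallocate once, at startup
    }

    const(char)[] build(const(char)[] payload)
    {
        buf.length = 0;              // logically empty the buffer ...
        buf.assumeSafeAppend();      // ... and mark the old block as reusable
        buf ~= "HTTP/1.1 200 OK\r\n\r\n";
        buf ~= payload;              // stays inside the reserved block
        return buf;
    }
}

void main()
{
    auto rb = ResponseBuilder(4096);
    foreach (i; 0 .. 3)
        writeln(rb.build("hello")); // steady state: no per-request GC allocation
}
```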

There's no reason Discord couldn't have done that with their Go app, but if I understood their blog post right, Go's GC force-activates a cycle every 2 minutes regardless of whether any new allocation has actually happened.  (TBH I do wonder if this is really _really_ true, or whether they were just generating sufficient garbage to ensure this happened, despite their claims of efficiency.)

But in any case Sociomantic could rely on the fact that in a D app no new allocations means no chance to trigger a GC cycle (which of course is why we preallocated as well as recycling buffers and objects: ideally, we wanted no heap allocation after app startup).
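
If you want to sanity-check that assumption in a running app, a minimal (hypothetical) approach is to snapshot `core.memory.GC.stats` around the hot path and confirm the GC heap isn't growing:

```d
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    // ... startup and preallocation would happen here ...

    const before = GC.stats();   // snapshot of GC heap usage after startup

    // ... hot path: serve requests using only preallocated buffers ...

    const after = GC.stats();
    writefln("GC used bytes: %s -> %s", before.usedSize, after.usedSize);
    // If the hot path really is allocation-free, usedSize stays flat and the
    // GC never has a reason to start a collection cycle.
}
```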

However, it was only a few apps where this was really necessary.  In fact I think a lot of the time we were much more strict about preallocation and reusable buffers than we needed to be, and the strictness was more of a hangover from working around historical bugs that occurred when using 32-bit DMD.

Basically, the _other_ problem that arose in Sociomantic's use case was that if you want to keep a given app running indefinitely on the same server (and there were some apps that we never wanted to restart if we didn't absolutely have to for new deployments), then you really, really want to be sure that its long term memory usage is stable.  A small daily growth can add up to a lot over months, and wind up bringing down the app or the server.  And in the early days, what they found was that if they generated garbage, then slowly, over time, the memory usage would creep up and up ... so they instigated this strong "preallocate and reuse" policy to work around it.

When I was fairly new in the company I got the chance to implement a new app, and quite early on my team lead sat down with me to show me how to implement and validate the preallocate-and-recycle way of doing things.  The use-case meant that it was unlikely there would be a problem if we had a GC pause, and we wanted to iterate fast on this app, so I suggested we make the code simpler and just rely on the GC.  He explained the long-term memory leak issue, but we agreed to let me try it and observe what happened.  And it turned out that no garbage-based memory leak emerged.  Which was a nice surprise for my lead and all the other old lags in the R&D team.

I don't think anybody ever did work out exactly what the problem had been in the early days, but it's likely relevant that by the time I broke the rules, the company had been using 64-bit DMD for a long time.  IIRC what was suspected (N.B. this is from memory, and from someone who is not an expert on the internals of the GC :-) was that with the 32-bit GC, something about the size of GC pools or memory chunks made it very likely that you could wind up with a chunk of GC memory that was in principle entirely recyclable except for a couple of bytes.  Hence you would allocate new chunks, the same thing would happen with them, and so on, until you were using far more chunks than should really have been needed.

So, either in 64-bit DMD that didn't happen, or whatever GC bug it was had long been fixed anyway.  And once that discovery was clearly established, I think we started relaxing the strictness a bit in apps that didn't need to care about GC pauses.

The team that grew out of the app I was working on never did have to really care about GC issues, but ironically I did wind up rewriting that same app to make a lot more use of recyclable buffers, though not preallocation.  I don't recall that it was ever really _necessary_, though: it was more of a precaution to try to ensure the same memory consumption for D1 and D2 builds of the same app, given that D2's GC seemed happy to allocate a lot more memory for the same "real" levels of use.  Most likely D2 just allowed the size of the GC heap to grow a lot more before triggering a collection, but we were hyper-cautious about getting identical resource usage just on the off-chance it might have been something nastier.

> That doesn't mean D would beat Rust in a competition on who makes the best discord software. It really depends on a lot of factors, and I don't think generalizing Go and D to be the same because they both have a GC is fair or accurate.

For those apps that really couldn't afford a single GC cycle, we did have some discussions about how, if we were writing from scratch, Rust's memory model might have been a nice fit (it was no fun having to monitor those apps for signs of GC cycles and then work out what was causing them).  It would certainly have been _interesting_ to try to write those apps in Rust.  But I think we would have missed a lot of other things that were also important: the friendliness of the code, the ease of iteration, and especially the compile-time introspection and metaprogramming that even in D1 were a major, major help.

I've had a little bit of a go at metaprogramming in Rust, and ... I can't say I like it :-)

It's difficult not to feel that maybe what really made the difference for Discord was not the language, but that this time they got the design right.  But maybe, for them, Rust's strictness was a way of settling design questions that they could have sorted out for themselves only by having debates, reaching consensus, and making sure that everyone was consistent in doing the right thing.  And Rust probably took all that off the table.
March 31, 2020
On Tuesday, 31 March 2020 at 21:26:41 UTC, Joseph Rushton Wakeling wrote:
> On Monday, 30 March 2020 at 18:12:03 UTC, Steven Schveighoffer wrote:
>> Sociomantic avoids unpredictable GC cycles, but doesn't disable it (they still allow collections periodically IIRC). And they are built to be as fast as possible.
>
> There were apps where a single GC cycle was not really an option, because even a small GC pause would delay our responding to requests where we couldn't afford the delay.  But the solution was not really _that_ hard: be strict about using recyclable buffers and object pools, preallocate in advance so that there would be minimal resizing ... and then be really strict about keeping that policy.

This is a very useful summary of Sociomantic's experience, thanks for taking the time to write it up and post it. The Discord blog post was a good read too.
March 31, 2020
Nice and interesting writeup Joe!

I might shine some light here:

On 3/31/20 5:26 PM, Joseph Rushton Wakeling wrote:
> 
> I don't think anybody ever did work out exactly what the problem had been in the early days, but it's likely relevant that by the time I broke the rules, the company had been using 64-bit DMD for a long time.  IIRC what was suspected (N.B. this is from memory and from someone who is not an expert on the internals of the GC:-) was that with the 32-bit GC there was something about the size of GC pools or memory chunks that meant that it was very likely that you could wind up with a chunk of GC memory where all of it was in principle recyclable except for a couple of bytes, and hence you would allocate new chunks and then the same thing would happen with them, and so on until you were using far more chunks than should really have been needed.

The biggest problem in 32-bit land is that the address space is so small. A conservative GC treats things that aren't pointers as pointers. This means that, depending on where the system lays out your memory, integers are more likely to "pin" memory. In other words, some int on a stack somewhere is actually treated as a pointer, holding some piece of memory back from being collected. If that memory has pointers in it, maybe it also has ints. Those ints are treated as pointers too, so now even more memory could be "caught". As your available address space shrinks, the chance of such false pinnings gets higher, so it's a degenerative cycle.
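
To make the mechanism concrete, here's a small contrived sketch (the false positive is constructed deliberately; in real programs it happens by accident) of how a plain integer on the stack can keep a GC block alive under a conservative collector:

```d
import core.memory : GC;
import std.stdio : writeln;

void main()
{
    // Allocate a block, then keep only its address disguised as an integer.
    void* p = GC.malloc(1024);
    size_t looksLikeAnInt = cast(size_t) p;
    p = null;

    GC.collect();

    // A conservative GC scanning the stack can't tell that `looksLikeAnInt`
    // is "just a number": its bit pattern points into the GC heap, so the
    // 1024-byte block is pinned and survives the collection. In a 32-bit
    // address space, ordinary ints land on valid heap addresses far more often.
    writeln(GC.addrOf(cast(void*) looksLikeAnInt) !is null); // typically true
}
```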

With a 64-bit address space, everything is typically allocated far away from the values that ordinary integers take, so the pinning is much rarer.

I'm not sure if this matches your exact problem, but I am definitely sure that 64-bit D is much less likely to leak GC memory than 32-bit D.

> The team that grew out of the app I was working on never did have to really care about GC issues, but ironically I did wind up rewriting that same app to make a lot more use of recyclable buffers, though not preallocation.  I don't recall that it was ever really _necessary_, though: it was more of a precaution to try and ensure the same memory consumption for D1 and D2 builds of the same app, given that D2's GC seemed happy to allocate a lot more memory for the same "real" levels of use.  Most likely D2 just allowed the size of the GC heap to grow a lot more before triggering a collection, but we were hyper-cautious about getting identical resource usage just on the offchance it might have been something nastier.

This I'm sure I can answer :) It is actually something I added to the runtime -- the non-stomping array feature. In D1, an array was only appendable if it was pointing at the beginning of the block. There was no assumeSafeAppend. So if you for instance allocated a block of 16 bytes, you got a 16-byte block from the GC.

But the drawback was that you could overwrite memory that was still referenced without meaning to.

With the non-stomping feature, the "used" space of the array is stored in the block as well (at the end of the block). This allows the array runtime to know when it's safe to append in-place, or when a new block has to be allocated. This is actually quite necessary especially for immutable data such as strings (overwriting still-accessible immutable data is undefined behavior in D2).

The drawback though, is that allocating an array of 16 bytes really needs 17 bytes (one byte for the array length stored in the block). Which actually ends up allocating a 32-byte block (GC blocks come in powers of 2).
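
You can see this rounding directly (a small sketch; the exact numbers depend on the runtime version): the hidden length metadata pushes a 16-byte array into the next power-of-2 block, and `capacity` reports how far it can grow in place:

```d
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    // 16 bytes of payload plus the hidden "used length" metadata won't fit
    // in a 16-byte block, so the GC rounds up to the next power of two.
    auto buf = new ubyte[](16);
    writefln("block size: %s bytes", GC.sizeOf(buf.ptr)); // typically 32, not 16
    writefln("in-place capacity: %s elements", buf.capacity);

    // The stored length is what lets the runtime decide whether ~= can grow
    // the array in place or must move it to a fresh block.
    buf ~= 1;   // still fits in the same block, no reallocation needed
}
```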

Since then, we have also started storing the typeinfo in the block if the data has a destructor, meaning even less space for actual data.

So this probably explains why a D2 app is going to consume a bit more memory than a D1 app that is written the same.

-Steve
April 01, 2020
On Wednesday, 1 April 2020 at 02:43:09 UTC, Steven Schveighoffer wrote:
> Nice and interesting writeup Joe!
>
> I might shine some light here:

Thanks! :-)

> The biggest problem in 32-bit land is that the address space is so small. With a conservative GC, it treats things that aren't pointers as pointers. This means that depending on where the system lays out your memory, likely integers have a better chance of "pinning" memory. In other words, some int on a stack somewhere is actually treated as a pointer holding some piece of memory from being collected. If that memory has pointers in it, maybe it also has ints too. Those ints are treated as pointers, so now more memory could be "caught". As your address space available shrinks, the chances of having false pinnings get higher, so it's a degenerative cycle.

Ah right, this was it!  I remember several different folks discussing that with me at some point (probably my lead and Luca, on different occasions).

> With 64-bit address space, typically everything is allocated far away from typical long values, so the pinning is much rarer.

Right.  In fact, I think that may have been part of why my lead was happy to let me try to relax the rules with my app.

I don't think we ever got _certainty_ that this was what had been impacting the older code and builds, but it was such a good contender that, with the problem no longer showing up (and 32-bit DMD long abandoned), it didn't seem worth anyone's time to dig deeper and prove it 100%.  But I should check in with some folks to confirm.  It may be that they explicitly identified the problem with 32-bit all those years ago, and that's why the strict rules were introduced in the first place.

> I'm not sure if this matches your exact problem, but I definitely am sure that 64-bit D is much less likely to leak GC memory than 32-bit D.

Yup.  But once practical experience showed that, we never dived too deep on whether the problem was really not there with 64-bit, or if it was just happening so slowly that it didn't matter even for the long-lifetime server apps.

> With the non-stomping feature, the "used" space of the array is stored in the block as well (at the end of the block). This allows the array runtime to know when it's safe to append in-place, or when a new block has to be allocated. This is actually quite necessary especially for immutable data such as strings (overwriting still-accessible immutable data is undefined behavior in D2).
>
> The drawback though, is that allocating an array of 16 bytes really needs 17 bytes (one byte for the array length stored in the block). Which actually ends up allocating a 32-byte block (GC blocks come in powers of 2).

Ah, interesting!  I don't think we ever explicitly considered this (Dicebot might recall, as he thought about all the transition issues in much more depth than anyone else).  It certainly could have been a factor.

As I recall, apps that had really strict preallocate-and-reuse policies (and which added all the required `assumeSafeAppend` to avoid stomping prevention on reusable buffers) in general wound up with very similar memory usage (getting all the `assumeSafeAppend` in place was the tricky thing).  But likely for those apps the preallocated buffers were large enough that a 1-byte addition wouldn't push them up into a larger block size.

The places where we saw a big difference tended to be apps with a relatively small overall memory usage, and a quite profligate attitude towards generating garbage.  But there were several of these (doing arguably quite similar things from a general design point of view) and fairly significant discrepancies in behaviour.

So I suspect that more than one factor was in play.  But the impact of stomping prevention and typeinfo on block size would certainly have been worth investigating if any of us had thought of it at the time (assuming my memory is right and we didn't :-)

Before we derail the discussion thread any more, maybe I ought to have a chat with a few former colleagues just to refresh memories, and write this up as a blog post ...
April 01, 2020
On 3/31/2020 2:12 PM, Arine wrote:
> With the current implementation and proposal of @live, it is effectively the equivalent of comparing a pair of scissors to a lawn mower. To be comparable to something like Rust would require an entire language rewrite from the ground up. Even though there are already significant breaking changes, they aren't sufficient and I don't imagine breaking everything completely is on the table.

I've encountered such opinions my entire career. Fortunately, I never pay attention to them.