D Language Foundation October 2023 Quarterly Meeting Summary
December 06

The D Language Foundation's quarterly meeting for October 2023 took place on Friday the 6th at 15:00 UTC. This was quite a short one as far as quarterlies go, clocking in at around 35 minutes.

The Attendees

The following people attended the meeting:

  • Mathis Beer (Funkwerk)
  • Walter Bright (DLF)
  • Dennis Korpel (DLF)
  • Mario Kröplin (Funkwerk)
  • Mathias Lang (DLF/Symmetry)
  • Átila Neves (DLF/Symmetry)
  • Mike Parker (DLF)
  • Igor Pikovets (Ahrefs)
  • Carsten Rasmussen (Decard)
  • Robert Schadek (DLF/Symmetry)
  • Bastiaan Veelo (SARC)

The Summary

Bastiaan

Bastiaan reported that SARC had been testing their D codebase (transpiled from Pascal---see Bastiaan's DConf 2019 talk). They'd found the multithreaded performance worse than the Pascal version. He said that execution time increased with more threads and that it didn't matter how many threads they threw at it. It was the latter problem he was focused on at the moment.

At first, they'd suspected the GC, but it turned out to be contention resulting from heap allocation. In Pascal, they'd heavily used variable-length arrays. For those, the length is determined at run time, but it's fixed. Since they can't grow, they're put on the stack. This makes them quite fast and avoids the global lock of the heap.

One way to do that in D is to use alloca, but that's an issue because the memory it allocates has to be used in the same function that calls alloca. So you can't, e.g., use alloca to allocate memory in a constructor, and that prevents using it in a custom array implementation. He couldn't think of a way to translate the VLAs directly. He was able to work around it by using allocators in the array implementation with a thread-local free list. He found that promising. His current problem was that it took a lot of time to understand the experimental allocators package. Once he got that sorted, he would have to see if it helped solve the problem of more threads resulting in worse performance.
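For readers who want a picture of what that workaround looks like, here's a minimal sketch --- not SARC's actual code; the wrapper type, the size range, and the use of Mallocator are all assumptions --- of a thread-local free list built from std.experimental.allocator:

```d
// A minimal sketch, not SARC's implementation: a thread-local free list in
// front of Mallocator so that array storage never touches the GC or a
// process-wide heap lock. The size range (0 .. 4096 bytes) is a placeholder.
import std.experimental.allocator.building_blocks.free_list : FreeList;
import std.experimental.allocator.mallocator : Mallocator;

// Module-level variables are thread-local by default in D, so every thread
// gets its own pool and there is no cross-thread contention.
FreeList!(Mallocator, 0, 4096) tlsPool;

struct VarArray(T)
{
    private T[] data;

    @disable this(this); // keep the sketch simple: no copies, no double free

    this(size_t length)
    {
        // Served from this thread's free list; no global lock involved.
        data = cast(T[]) tlsPool.allocate(length * T.sizeof);
    }

    ~this()
    {
        if (data !is null)
            tlsPool.deallocate(cast(void[]) data);
    }

    ref T opIndex(size_t i) { return data[i]; }
    size_t length() const { return data.length; }
}

void example()
{
    auto a = VarArray!double(256); // length known only at run time
    a[0] = 1.5;
}
```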

There was also a problem with DMD underperforming Pascal. DMD's output was about five times slower than Pascal's. His tests with LDC showed it was two times faster than Pascal. Unfortunately, they are currently limited to 32-bit Windows, and it will be a few years before they can migrate to 64-bit. LDC unfortunately had an issue that caused stack corruption on 32-bit Windows. They'd hit it in one case and were able to work around it, but he couldn't be sure they wouldn't hit it somewhere else. He wasn't willing to risk unreliable computations.

He said that LDC could do the right thing, but his understanding from talking to Martin was that implementing it would have a large time cost. Since Win32 is going to eventually go away, he wasn't very keen on paying that cost. They'd spoken at DConf about the possibility of LDC raising compilation errors when stack corruption could occur so that they could then work around those cases, but he hadn't followed up with Martin about it.

They'd spent seven years getting the transcompilation complete, so this was a critical issue they needed to resolve. He was hopeful that the experimental allocator package would help solve it.

Robert asked if he'd looked into doing something like the small string optimization, where you set a default size that you use for static arrays and then only resort to heap allocation when you need something larger. Had they analyzed their code to determine the array sizes they were using? Bastiaan said yes; one consequence of this was that they were linking with a rather large stack size.

Walter suggested he just use alloca. Just have the transcompiler emit calls to alloca in the first lines of the function body for any VLAs and they should be okay. Bastiaan said they'd thought of allocating large chunks of memory up front and just picking off pieces of them for a custom allocator. That works much like a free list, and then he discovered that the std allocator package has a free list implementation. His experiments with that worked, but it had been challenging to implement it more generally. He said he would have to take another look at alloca.
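As a rough illustration of Walter's suggestion (the function, the Pascal declaration, and the names here are invented), the transcompiler could emit something along these lines at the top of a function body for each VLA:

```d
// Hedged sketch of possible transcompiler output for a Pascal variable-length
// array. alloca must be called in the function that owns the storage, so the
// call goes in the first lines of the body and the memory dies with the frame.
import core.stdc.stdlib : alloca;

void compute(size_t n)  // Pascal: var a : array [1 .. n] of double;
{
    double[] a = (cast(double*) alloca(n * double.sizeof))[0 .. n];
    a[] = 0.0;   // initialize; alloca memory is uninitialized
    // ... use a exactly like a slice; no heap allocation, no GC lock ...
}
```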

Walter said alloca wasn't used very much in D, but it's there. If he were to implement C VLAs, that's what he'd use to do it. Robert stressed that they should analyze their code to find a "magic" maximum number of elements and just use that for static arrays, allocating on the heap only when they need more. Static arrays and alloca were comparable to some degree. Maybe they could get away with that, and it should result in cleaner code.
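And a minimal sketch of the static-array fallback Robert described --- the threshold of 64 and all names here are placeholders; the real number would come from analyzing the codebase:

```d
// Small-buffer optimization for arrays: a fixed inline buffer covers the
// common case, and only unusually large lengths fall back to the heap.
struct SmallArray(T, size_t smallCap = 64)
{
    private T[smallCap] small;  // lives inside the struct, i.e. on the stack
    private T[] heap;           // only used when length > smallCap
    private size_t len;

    this(size_t length)
    {
        len = length;
        if (length > smallCap)
            heap = new T[length];  // rare case: heap/GC allocation
    }

    ref T opIndex(size_t i)
    {
        if (len <= smallCap)
            return small[i];
        return heap[i];
    }

    size_t length() const { return len; }
}

unittest
{
    auto a = SmallArray!int(10);    // fits in the inline buffer, no allocation
    a[3] = 42;
    auto b = SmallArray!int(1000);  // exceeds the threshold, goes to the heap
    b[999] = 7;
}
```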

Robert also suggested that since this project has been going on for so long and was a good showcase for D in general, Bastiaan should come back and ask for help even on more than a quarterly basis. We then had a bit of discussion about what it would take to fix the LDC issue. Bastiaan said that having the compiler throw errors as he and Martin had discussed would be fine as long as the number of errors was manageable.

Igor

Igor said Ahrefs had updated to the latest LDC and were trying it out, but had nothing to share with us this time.

I noted that at DConf, the Ahrefs team had given the DLF access to their platform. We hadn't started using it, but we plan to do so when we overhaul the website. I thanked him for that.

Mathis Beer

Mathis said there was nothing much going on. They had decided to hold off on updating to the latest DMD because LDC had been lagging behind a bit, but that was somewhat normal. Everything was working fine.

However, he'd been playing around with 2.104 on his own and had encountered some weird crashes, but hadn't yet reduced anything for a bug report. He asked if anyone else had seen the same and the response was negative. He said he'd put some time into reducing it with DustMite and file an issue.

He brought up the COVID outbreak at DConf. Everyone who attended from Funkwerk had gotten it. We discussed what could be done to reduce that risk next year. I reported that this had come up in post-DConf meetings I'd had with Symmetry and our event planners. There's no way we're going to be able to force people to take tests or wear masks. But we definitely need a policy in place. In 2022, we asked people to stay in their hotel rooms and watch via the live stream if they had symptoms during the conference. We didn't do that this year, but it's one step we've already decided to take next year. We'll work out other ideas before then. Robert suggested we include masks and COVID tests in the swag bag.

Mathias Lang

Mathias said he'd been mentoring a SAOC student working on C++ interop: namely, making C++ STL containers more accessible. They'd started by copying the existing code from core.stdcpp into its own repository. This needs to be taken out of DRuntime because DRuntime is distributed pre-compiled, and that ties it to a specific compiler API, which isn't good. Instead, we should distribute it as a package. It's something he'd brought up before.

Now, they were looking into adding tests and fixing the bugs they'd found. They'd also extracted CI configuration from DRuntime. The project was ongoing and making progress.

(NOTE: You can search the General Forum for "SAOC 2023 C++ STL INTEROP" to see Emmanuel Nyarko's weekly updates during the SAOC event, which continues until January 15.)

Robert

Robert had nothing for us this time.

Carsten

Carsten reported that Decard were trying to get their release out in three months. They were happy with the system they were working on.

Dennis

I told everyone that normally, I wouldn't go to any of the DLF-only people in a quarterly these days since we've split out our monthlies. However, since we had so few attendees this time and things were running quickly, I said I'd give Dennis and Walter each a turn.

Dennis said he had nothing for us this time, but he had some things to bring up at the monthly the following week.

Walter

Walter said he'd been taking steps aimed at facilitating work on DMD-as-a-library. He'd been trying to disentangle different parts of the compiler from each other, in particular making the ASTs more tractable for users without becoming completely "englommed" by the compiler. He was awaiting feedback on whether Razvan was happy with his approach. Either way, the end goal was to get rid of the two parallel "same-only-different" ASTs we currently have. He'd made some progress on it.

He had also been working on some ImportC fixes.

He apologized to me for not looking into three DIPs I'd asked him to look at. He'd emailed me about one of them before the meeting. He asked if I could hang around after to discuss the other two.

I let everyone know that Walter was talking about three grammar-related DIPs that Graham D'Amour had submitted last year. I'd been wondering if we needed DIPs for those or not. I'd asked Walter to look at them a while ago to decide, but I'd never followed up. Graham had pinged me about them a week before the meeting.

(UPDATE: I did hang around after the meeting and we did discuss the DIPs. The outcome is posted in the comment thread of PR #234. You can see the other DIPs in PR #233 and in PR #235. The TL;DR is that these correct issues in the grammar and we absolutely should implement them, but because of potential breakage they should be done in an edition.)

Me

I told everyone I'd had a preliminary discussion about DConf '24 with Symmetry's CTO. I'd answered his questions about how things had gone over the past three editions Symmetry had sponsored and about what we needed. I was expecting that we could start planning in earnest before the end of this year. I was looking at the possibility of doing it either in May or in September so that we could get out of peak travel season. That depends entirely on the availability of the venue and what they charge us. They've given us a significant discount for the past two editions because peak travel season is also off-peak conference season. We wouldn't be able to get the same deal for May or September.

I then mentioned DConf Online. I'd scheduled it in December last year just so we could do it in 2022. I should have delayed it until February or March. Holding it four months after DConf was a real PITA. So I decided I'd push the next edition into 2024. Whether it happens early or late in the year depends on the final DConf dates.

(UPDATE: DConf '24 planning has since begun. The event planner is on the case and the gears are moving inside Symmetry. Stay tuned.)

I closed by mistakenly telling everyone that our next quarterly would be in December (it's January). I invited everyone to reach out if they had any issues before then.

The Next Meetings

We had our October monthly meeting one week after this meeting. The next quarterly should happen on January 5, 2024. We had no regular planning sessions in October, but two workgroup meetings took place regarding DMD-as-a-library. The monthly meeting summary is coming next, then I'll publish an update about the workgroup meetings.

December 06
> This needs to be taken out of DRuntime because DRuntime is distributed pre-compiled, and that ties it to a specific compiler API, which isn't good. Instead, we should distribute it as a package. It's something he'd brought up before.

Why not directly distribute DRuntime as source? Or rather, simplify how it can be used as source?

dmd -i does the magic already; it can pick up whatever module it needs on the fly.

That's how I use my custom runtime, as source. It makes things much smoother to use. In the case of DRuntime, though, it might highlight some compilation speed issues.
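Something like this, just as a rough illustration (the module name is made up):

```d
// app.d --- imports a runtime module that exists only as source
module app;
import myruntime.lifetime;   // hypothetical source-only runtime module

void main() {}

// Build with:
//   dmd -i app.d
// -i tells dmd to also compile the modules it reaches through imports,
// so nothing has to be shipped as a prebuilt library.
```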

What was the rationale behind distributing the runtime as a compiled library?

December 06

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:

> Bastiaan
>
> They'd found the multithreaded performance worse than the Pascal version. He said that execution time increased with more threads and that it didn't matter how many threads they threw at it. It was the latter problem he was focused on at the moment.
>
> At first, they'd suspected the GC, but it turned out to be contention resulting from heap allocation. In Pascal, they'd heavily used variable-length arrays. For those, the length is determined at run time, but it's fixed. Since they can't grow, they're put on the stack. This makes them quite fast and avoids the global lock of the heap.

I kindly invite Bastiaan and his team to participate in this competition :) https://github.com/jinyus/related_post_gen

Fixed-size arrays will suit the task perfectly, and it also has a multithreading comparison! Pascal should be good over there.

December 10

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:

> Bastiaan reported that SARC had been testing their D codebase (transpiled from Pascal---see Bastiaan's DConf 2019 talk). They'd found the multithreaded performance worse than the Pascal version. He said that execution time increased with more threads and that it didn't matter how many threads they threw at it. It was the latter problem he was focused on at the moment.

I have an update on this issue. But first let me clarify how grave this situation is (was!) for us. There are certain tasks that we, and our customers, need to perform that involve a computer with 20 logical cores crunching numbers for a week. This is painful, but it also means that a doubling of that time is completely unacceptable, let alone a 20-fold increase. It is the difference between being in business and being out of business.

Aside from the allocation issue, there are several other properties that our array implementation needs to replicate from Extended Pascal: non-zero starting indices, value semantics, array limits that can be either compile-time or run-time, and function arguments that work on arrays of any limits, also for multi-dimensional arrays. So while trying to solve one aspect, care had to be taken not to break any of the others.

It turned out that thread contention had more than one cause, which made this an extra frustrating problem: just as we thought we had found the culprit, fixing it did not have the effect that we expected.

These were the three major reasons we were seeing large thread contention, in no particular order:

  1. Missing scope storage class specifiers on delegate function arguments. This can be chalked up to a beginner error, but it is also one that is easy to miss. If you didn't know: without scope the compiler cannot be sure that the delegate is not stored in some variable that has a longer lifetime than the stack frame of the (nested) function pointed to by the delegate. Therefore, a dynamic closure is created, which means that the stack is copied to new GC-allocated memory. In the majority of our cases, delegate arguments are simple callbacks that are only stored on the stack, but a select number of delegates in the GUI are stored for longer. The compiler can check whether scope delegates escape a function, but it only does this in @safe code --- and our code is far from being @safe. So it was a bit of a puzzle to find out which arguments needed to be scope and which arguments couldn't be scope. (A minimal sketch of the difference follows this list.)
  2. Allocating heap memory in the array implementation, as discussed in the meeting. We followed Walter's advice and now use alloca --- not directly, but through string mixins and static member functions that generate the appropriate code.
  3. Stale calls to GC.addRange and GC.removeRange. These were left over from an experiment in which we tried to circumvent the garbage collector. Without knowing these were still in there, we were puzzled because we saw contention even in code that was marked @nogc. It makes sense now, because even though addRange doesn't allocate, it does need the global GC lock to register the range safely. Because the stack is already scanned by default, these calls were superfluous and could be removed.
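To make item 1 concrete, here is a minimal sketch of the difference --- invented names, not our actual code:

```d
// Without `scope`, the compiler must assume the delegate can outlive the call,
// so the caller's captured variables are moved into a GC-allocated closure.
void forEachRow(void delegate(int) dg)
{
    foreach (i; 0 .. 10) dg(i);
}

// With `scope`, the delegate cannot be kept beyond the call, so no closure
// (and no hidden GC allocation or GC lock) is needed.
void forEachRowScoped(scope void delegate(int) dg)
{
    foreach (i; 0 .. 10) dg(i);
}

void unscopedCaller()
{
    int sum;
    forEachRow((i) { sum += i; });        // sum's frame is copied to the GC heap
}

void scopedCaller()
{
    int sum;
    forEachRowScoped((i) { sum += i; });  // sum stays on the stack
}
```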

So now all cores are finally under full load, which is a magnificent sight! Speed of DMD release-nobounds is on par with our Pascal version, if not slightly faster. We are looking forward to being able to safely use LDC, because tests show that it has the potential to at least double the performance.

A big sigh of relief from us as we have solved the biggest hurdle (hopefully!) on our way to full adoption of D.

-- Bastiaan.

December 11
That is awesome to hear!

If the move to LDC has the potential to halve your run time, that is quite a significant improvement for your customers.

It will be interesting to hear how dcompute will fare in your situation, due to it being D code it should be an incremental improvement once you're ready to move to D fully.

Based upon the estimates here already, it seems like acquiring an LDC developer in house might be well worth it.
December 10
On Sunday, 10 December 2023 at 15:31:55 UTC, Richard (Rikki) Andrew Cattermole wrote:

> It will be interesting to hear how dcompute will fare in your situation, due to it being D code it should be an incremental improvement once you're ready to move to D fully.

Yes, dcompute could mean another leap forward. There are so many great things to look forward to.

-- Bastiaan.
December 10

On Sunday, 10 December 2023 at 15:08:05 UTC, Bastiaan Veelo wrote:

> 1. Missing scope storage class specifiers on delegate function arguments. This can be chalked up to a beginner error, but it is also one that is easy to miss. If you didn't know: without scope the compiler cannot be sure that the delegate is not stored in some variable that has a longer lifetime than the stack frame of the (nested) function pointed to by the delegate. Therefore, a dynamic closure is created, which means that the stack is copied to new GC-allocated memory. In the majority of our cases, delegate arguments are simple callbacks that are only stored on the stack, but a select number of delegates in the GUI are stored for longer. The compiler can check whether scope delegates escape a function, but it only does this in @safe code --- and our code is far from being @safe. So it was a bit of a puzzle to find out which arguments needed to be scope and which arguments couldn't be scope.

This reminded me of https://forum.dlang.org/thread/myiqlzkghnnyykbyksga@forum.dlang.org
LDC has a special GC2Stack IR optimization pass, which is a lifesaver in many cases like this.

> So now all cores are finally under full load, which is a magnificent sight! Speed of DMD release-nobounds is on par with our Pascal version, if not slightly faster. We are looking forward to being able to safely use LDC, because tests show that it has the potential to at least double the performance.

Are there any known blocker bugs that prevent safe usage of LDC in production?

December 10

On Sunday, 10 December 2023 at 17:11:04 UTC, Siarhei Siamashka wrote:

> On Sunday, 10 December 2023 at 15:08:05 UTC, Bastiaan Veelo wrote:
>
>> The compiler can check whether scope delegates escape a function, but it only does this in @safe code --- and our code is far from being @safe. So it was a bit of a puzzle to find out which arguments needed to be scope and which arguments couldn't be scope.
>
> This reminded me of https://forum.dlang.org/thread/myiqlzkghnnyykbyksga@forum.dlang.org
> LDC has a special GC2Stack IR optimization pass, which is a lifesaver in many cases like this.

Interesting.

> Are there any known blocker bugs that prevent safe usage of LDC in production?

This one: https://github.com/ldc-developers/ldc/issues/4265

Mike has summarized it:

> LDC unfortunately had an issue that caused stack corruption on 32-bit Windows. They'd hit it in one case and were able to work around it, but he couldn't be sure they wouldn't hit it somewhere else. He wasn't willing to risk unreliable computations.
>
> He said that LDC could do the right thing, but his understanding from talking to Martin was that implementing it would have a large time cost. Since Win32 is going to eventually go away, he wasn't very keen on paying that cost. They'd spoken at DConf about the possibility of LDC raising compilation errors when stack corruption could occur so that they could then work around those cases, but he hadn't followed up with Martin about it.

-- Bastiaan.

December 10

On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:

> One way to do that in D is to use alloca, but that's an issue because the memory it allocates has to be used in the same function that calls alloca. So you can't, e.g., use alloca to allocate memory in a constructor, and that prevents using it in a custom array implementation.

You can call alloca as a default argument to a function. The memory will be allocated on the caller's stack before calling the function:
https://github.com/ntrel/stuff/blob/master/util.d#L113C1-L131C2

I've just tested and it seems it works as a constructor default argument too.

December 10
On Sunday, 10 December 2023 at 16:08:45 UTC, Bastiaan Veelo wrote:
> On Sunday, 10 December 2023 at 15:31:55 UTC, Richard (Rikki) Andrew Cattermole wrote:
>
>> It will be interesting to hear how dcompute will fare in your situation, due to it being D code it should be an incremental improvement once you're ready to move to D fully.
>
> Yes, dcompute could mean another leap forward. There are so many great things to look forward to.
>
> -- Bastiaan.

Always happy to help if you're interested in looking into using dcompute. I can't remember if we've talked about it before, but to use it you'd need OpenCL 2.x (explicitly the 2.x series, or a 3.x implementation that supports SPIR-V) running on that 20-logical-core box --- or, if it has GPUs attached, CUDA (any version should do) for NVidia GPUs, or OpenCL 2.x (as above) for any other GPUs.

With regards to the stack corruption there is https://github.com/ldc-developers/ldc/blob/master/gen/abi/x86.cpp#L260 which has been there for some time. It would be fairly simple to issue a diagnostic there (although getting source location from there might be a bit tricky) for when there is both a `byval` and an alignment specified.

Or you could grep the IR emitted with `--output-ll`, as noted by Johan in https://github.com/ldc-developers/ldc/issues/4265#issuecomment-1376424944, although that output will already have the `workaroundIssue1356` fix applied.
