btdu - a sampling disk usage profiler for btrfs (written in D) (page 2)

November 27, 2020

Re: btdu - a sampling disk usage profiler for btrfs (written in D)

Posted by Vladimir Panteleev
in reply to Ola Fosheim Grøstad

On Tuesday, 10 November 2020 at 13:55:52 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 10 November 2020 at 10:42:09 UTC, Vladimir Panteleev wrote:
>> But, by itself the GC doesn't add much latency to introduce stutter in the UI - a GC scan is generally quick enough that the UI doesn't feel laggy or stuttery. The problem is that the GC is waiting for all threads to finish their ioctls, while the program otherwise is completely suspended. This affects not just UI, but throughput.
>
> Would a thread local GC with reference counted shared objects work for your use case?

I don't think there is a simple answer here.

Removing the global GC lock for allocations, and allowing each thread to allocate from its own private pool, would greatly improve the performance of multi-threaded applications. For example, the global GC lock was what prevented moving more processing in DustMite to worker threads - currently, it's often better to keep GC-dependent code in one thread instead of using worker threads, specifically because of the overhead of the global GC lock. I think such a modification would be possible without radical changes to the language or GC design, but it's possible I'm missing something.
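To make the contrast concrete, here is a minimal sketch of the two allocation strategies in C++ (chosen because the thread later compares things to shared_ptr). The `Pool` bump allocator, `allocate_shared`, `allocate_local`, and `run_workers` are hypothetical names invented for illustration; this is not how druntime's GC is implemented, and it says nothing about collection, only about the allocation path.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bump allocator, for illustration only. It ignores alignment
// and never frees; it just hands out consecutive byte ranges.
class Pool {
public:
    explicit Pool(std::size_t bytes) : storage_(bytes), used_(0) {}

    // Returns nullptr when the pool is exhausted.
    void* allocate(std::size_t n) {
        if (n > storage_.size() - used_) return nullptr;
        void* p = storage_.data() + used_;
        used_ += n;
        return p;
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};

// Variant A: one pool shared by all threads; every allocation takes the lock.
Pool g_shared_pool(1 << 20);
std::mutex g_shared_lock;

void* allocate_shared(std::size_t n) {
    std::lock_guard<std::mutex> guard(g_shared_lock);
    return g_shared_pool.allocate(n);
}

// Variant B: each thread owns a private pool, so no lock is taken.
void* allocate_local(std::size_t n) {
    thread_local Pool pool(1 << 16);
    return pool.allocate(n);
}

// Runs `threads` workers that each attempt `count` allocations of `size`
// bytes through `alloc`, and returns how many allocations succeeded.
std::size_t run_workers(void* (*alloc)(std::size_t), int threads, int count,
                        std::size_t size) {
    assert(threads > 0);
    std::vector<std::size_t> ok(static_cast<std::size_t>(threads), 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&, t] {
            for (int i = 0; i < count; ++i)
                if (alloc(size)) ++ok[t];
        });
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (auto n : ok) total += n;
    return total;
}
```

Both variants hand out the same memory; the difference is that workers using `allocate_shared` serialize on `g_shared_lock`, while workers using `allocate_local` never contend with each other.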

However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC.

A theoretical non-stop-the-world GC would indeed help in this situation, but such a GC is only possible if you restrict the language to a subset, such that all copies of managed objects are always visible to the compiler. It would require all @system / extern(C) code to be carefully re-scrutinized. In short, this would essentially be a different language (based on D). I don't think we can get there from where we are now.

On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev wrote:
> However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC.
>
> A theoretical non-stop-the-world GC would indeed help in this situation, but such a GC is only possible if you restrict the language to a subset, such that all copies of managed objects are always visible to the compiler. It would require all @system / extern(C) code to be carefully re-scrutinized. In short, this would essentially be a different language (based on D). I don't think we can get there from where we are now.

Hm, but it would only stop a single thread. You would not be allowed to share nonpinned objects with other threads.

On Friday, 27 November 2020 at 10:26:18 UTC, Ola Fosheim Grostad wrote:
> On Friday, 27 November 2020 at 10:20:41 UTC, Vladimir Panteleev wrote:
>> However, that wouldn't help in this case, because the problem here doesn't come from allocations, but from the stop-the-world aspect of the GC.
>>
>> A theoretical non-stop-the-world GC would indeed help in this situation, but such a GC is only possible if you restrict the language to a subset, such that all copies of managed objects are always visible to the compiler. It would require all @system / extern(C) code to be carefully re-scrutinized. In short, this would essentially be a different language (based on D). I don't think we can get there from where we are now.
>
> Hm, but it would only stop a single thread. You would not be allowed to share nonpinned objects with other threads.

Right, so that's another imposed limitation of such a GC. You'd still also lose the ability to memcpy or memset a struct that had managed pointers, as that would break the reference count that the GC relies on to work. It would definitely solve the performance problem, but it would be such a radical change that it would essentially be a different language (and debatably no longer a systems programming one).
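The memcpy point can be shown with a small sketch in C++. `Managed`, `Holder`, `copy_tracked`, `copy_raw`, and `release` are hypothetical names for illustration, and the count is a plain embedded integer rather than anything a real runtime would use. The idea is only that a raw byte copy creates a second reference the counting scheme never hears about.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical managed object with an embedded reference count.
struct Managed {
    int refs;
    int payload;
};

// A plain struct that holds a counted reference to a managed object.
struct Holder {
    Managed* ptr;
};

// The tracked copy: the count is incremented for the new reference.
Holder copy_tracked(const Holder& src) {
    if (src.ptr) ++src.ptr->refs;
    return Holder{src.ptr};
}

// The raw copy: the bytes are duplicated, but the count is not touched,
// so the new Holder is invisible to the counting scheme.
Holder copy_raw(const Holder& src) {
    Holder dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Drops one reference. Returns true when the count reaches zero, which is
// the point where a real runtime would free the object.
bool release(Holder& h) {
    if (!h.ptr) return false;
    assert(h.ptr->refs > 0);
    bool last = (--h.ptr->refs == 0);
    h.ptr = nullptr;
    return last;
}
```

After one `copy_tracked` and one `copy_raw` of the same holder, the count reads 2 although three holders exist. Releasing the two tracked ones drives the count to zero while the raw copy still points at the object.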

On Friday, 27 November 2020 at 10:31:21 UTC, Vladimir Panteleev wrote:
> Right, so that's another imposed limitation of such a GC. You'd still also lose the ability to memcpy or memset a struct that had managed pointers, as that would break the reference count that the GC relies on to work. It would definitely solve the performance problem, but it would be such a radical change that it would essentially be a different language (and debatably no longer a systems programming one).

I think it is no different than shared_ptr. I also think one can add some safety through global pointer analysis for existing code. Let the pinning be done by a counter: when you pin the object you get a smart pointer, borrowed_ptr, and when the count goes to zero, the object is local again.
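A minimal sketch of that pin counter, written in C++ since the comparison is to shared_ptr. The names `Pinnable` and `BorrowedPtr` are hypothetical (the post only names a `borrowed_ptr`), and the sketch covers just the counting: it does not enforce that unpinned objects stay on one thread, which is the part that would need compiler or runtime support.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <typename T>
class BorrowedPtr;

// Hypothetical wrapper: the object counts as thread-local while unpinned.
template <typename T>
class Pinnable {
public:
    explicit Pinnable(T value) : value_(std::move(value)), pins_(0) {}

    // True when no BorrowedPtr currently pins the object.
    bool is_local() const { return pin_count() == 0; }
    int pin_count() const { return pins_.load(std::memory_order_acquire); }

private:
    friend class BorrowedPtr<T>;
    T value_;
    std::atomic<int> pins_;
};

// RAII pin: constructing or copying one increments the pin count,
// destroying one decrements it.
template <typename T>
class BorrowedPtr {
public:
    explicit BorrowedPtr(Pinnable<T>& target) : target_(&target) {
        target_->pins_.fetch_add(1, std::memory_order_acq_rel);
    }
    BorrowedPtr(const BorrowedPtr& other) : target_(other.target_) {
        target_->pins_.fetch_add(1, std::memory_order_acq_rel);
    }
    BorrowedPtr& operator=(const BorrowedPtr&) = delete;
    ~BorrowedPtr() {
        int previous = target_->pins_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        (void)previous;
    }

    T& operator*() const { return target_->value_; }
    T* operator->() const { return &target_->value_; }

private:
    Pinnable<T>* target_;
};
```

The counter is atomic because pins may be dropped from other threads; everything else about the object could stay non-atomic for as long as `is_local()` holds.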

On Friday, 27 November 2020 at 10:41:54 UTC, Ola Fosheim Grostad wrote:
> I think it is no different than shared_ptr. I also think one can add some safety through global pointer analysis for existing code. Let the pinning be done by a counter, when you pin the object you get a smartpointer borrowed_ptr... when the count goes to zero, the object is local again.

Reference counting, which also means multiple ownership, doesn't play well with any borrowing mechanism. The reason is that the compiler cannot resolve the borrow checks at compile time and must insert runtime checks for whether you are allowed to borrow or not. This reduces performance, probably not by a lot, but still. Let's leave the borrow checker out of D and just have good old reference counting; that's what we need.

Speaking of parallel GC, even if we have atomic reference counting or another parallel method, the underlying malloc/free must also be non-blocking, or at least reduce locking as much as possible. Many libc implementations already do this, though.

On Friday, 27 November 2020 at 11:31:48 UTC, IGotD- wrote:
> Reference counting which also means multiple ownership doesn't play well with any borrowing mechanism. Reason is that the compiler cannot determine the borrow checker at compile time and must insert runtime checks if you are allowed to borrow or not. This reduces the performance, probably not a lot but still. Let's leave borrow checker outside D and just have good old reference counting, that's what we need.

You can view ARC as a borrow checker. If the ARC optimizer succeeds globally, then all acquire/release can be omitted for that type (or that call graph path).

The problem is interior pointers, which would require fat pointers or a borrow checker...

But again, you could rewrite those fat pointers if ARC optimization is highly successful (if the code validates like it would for a borrow checker).
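As a rough illustration of what omitting an acquire/release pair means, here is the same idea done by hand with std::shared_ptr. This is not an ARC optimizer, only the manual equivalent: passing by value performs the increment and decrement, while passing by const reference borrows the caller's reference and performs neither. The function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Takes its own counted reference: the copy increments the count on entry
// and the destructor decrements it on exit.
long observed_count_by_value(std::shared_ptr<int> p) {
    assert(p);
    return p.use_count();
}

// Borrows the caller's reference: no increment or decrement happens,
// which is what an ARC optimizer would aim for when it can prove the
// caller keeps the object alive for the duration of the call.
long observed_count_by_ref(const std::shared_ptr<int>& p) {
    assert(p);
    return p.use_count();
}
```

With a single owner in the caller, the by-value function observes a count of 2 and the by-reference function observes 1.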

On Friday, 27 November 2020 at 12:00:40 UTC, Ola Fosheim Grostad wrote:
> You can view ARC as a borrowchecker. If the ARC optimizer succeeds globally then all acquire/release can be omitted for that type (or that call graph path).
>
> The problem is interior pointers which would require fat pointers or borrowchecker...
>
> But again you could rewrite those fat pointers if ARC optimization is highly successful (if the code validates like it would for a borrow checker)

Sadly, templated types that depend on struct size and field offsets could be a problem for such rewrites... So the compiler would have to annotate structs with dependencies...
