low-latency GC (page 3) - D Programming Language Discussion Forum

December 06, 2020

Re: low-latency GC

Posted by Ola Fosheim Grostad
in reply to Paulo Pinto

On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
> And while on the subject of low level programming in JVM or .NET.
>
> https://www.infoq.com/news/2020/12/net-5-runtime-improvements/

It didn't say anything about low level, only SIMD intrinsics, which aren't really low level?

It also stated "When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code."

So it is more of a Go contender, and Go is not a systems level language... Apples and oranges.

> As I already mentioned in another thread, rebooting the language to pull in imaginary crowds will only do more damage than good, while the ones deemed unusable by the same imaginary crowd just keep winning market share, slowly and steadily, even if it takes yet another couple of years.

A fair number of people here are in that imaginary crowd.
So, I guess it isn't imaginary...

December 06, 2020

Re: low-latency GC

Posted by Bruce Carneal
in reply to Ola Fosheim Grostad

On Sunday, 6 December 2020 at 16:42:00 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
>> And while on the subject of low level programming in JVM or .NET.
>>
>> https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
>
> It didn't say anything about low level, only SIMD intrinsics, which aren't really low level?
>
> It also stated "When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code."

So you must make the familiar "ease-of-programming" vs "x% of performance" choice, where 'x' is presumably much smaller than earlier.

>
> So it is more of a Go contender, and Go is not a systems level language... Apples and oranges.
>

D is good for systems level work but that's not all.  I use it for projects where, in the past, I'd have split the work between two languages (Python and C/C++).  I much prefer working with a single language that spans the problem space.

If there is a way to extend D's reach with zero or a near-zero complexity increase as seen by the programmer, I believe we should (as/when resources allow of course).

December 06, 2020

Re: low-latency GC

Posted by IGotD-
in reply to Ola Fosheim Grøstad

On Sunday, 6 December 2020 at 15:44:32 UTC, Ola Fosheim Grøstad wrote:
>
> It was more a hypothetical, as read barriers are too expensive. But write barriers should be ok, so a single-threaded incremental collector could work well if D takes a principled stance that objects not marked 'shared' are not handed over to other threads without pinning them in the GC.
>
> Maybe a better option for D than ARC, as it is closer to what people are used to.

In kernel programming there are plenty of atomic reference counted objects. The reason is that if you have a kernel that supports SMP you must have them, because you don't really know which CPU is working with a structure at any given time. These are often manually reference counted objects, which can lead to memory leak bugs, but those are not that hard to find.

Is automatic atomic reference counting a contender for kernels? In kernels you want to reduce the increase/decrease of the counts. Therefore the Rust approach using 'clone' is better unless there is some optimizer that can figure it out. Performance is important in kernels, you don't want the kernel to steal useful CPU time that otherwise should go to programs.
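As a rough illustration of why explicit copies matter, here is a minimal C++ sketch (not Rust and not D) using `std::shared_ptr`. The `Device` type and the three function names are hypothetical, made up for this example. Passing the handle by value copies it and updates the count, while passing a plain reference leaves the count alone, which is roughly the distinction the explicit 'clone' style makes visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type, used only for illustration.
struct Device { int id; };

// Borrowing: takes a plain reference, so the reference count is untouched.
int read_id(const Device& d) { return d.id; }

// Sharing: taking the shared_ptr by value copies it, which increments the
// count on entry and decrements it again when the parameter is destroyed.
long count_while_shared(std::shared_ptr<Device> d) { return d.use_count(); }

// Borrowing through the owner: no copy of the shared_ptr is made.
long count_while_borrowed(const std::shared_ptr<Device>& owner) {
    assert(owner != nullptr);
    read_id(*owner);
    return owner.use_count();
}
```

Calling `count_while_borrowed(d)` on a freshly created `std::shared_ptr<Device>` reports one owner, while `count_while_shared(d)` reports two for the duration of the call.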

In general I think that reference counting should be supported in D, not only implicitly but also under the hood with fat pointers. This will make D more attractive to performance applications. Another advantage is the reference counting can use malloc/free directly if needed without any complicated GC layer with associated meta data.
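A minimal sketch of what reference counting directly on top of malloc/free could look like, written in C++ for illustration. `RcBox` and `Rc` are hypothetical names, not an existing D or druntime API, and the sketch leaves out assignment, weak references and cycle handling. The count sits next to the payload in a single malloc block, and the last owner hands the block back to `free`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical counted box: count and payload share one malloc block.
template <typename T>
struct RcBox {
    std::atomic<unsigned> count;
    T value;
    explicit RcBox(T v) : count(1), value(std::move(v)) {}
};

// Hypothetical owning handle (illustration only).
template <typename T>
class Rc {
public:
    explicit Rc(T v) {
        void* raw = std::malloc(sizeof(RcBox<T>));
        if (raw == nullptr) throw std::bad_alloc();
        box_ = new (raw) RcBox<T>(std::move(v));  // placement new into the block
    }
    // Copying a handle adds one owner.
    Rc(const Rc& other) : box_(other.box_) {
        box_->count.fetch_add(1, std::memory_order_relaxed);
    }
    Rc& operator=(const Rc&) = delete;  // kept minimal for the sketch
    // The last owner destroys the payload and returns the block to free().
    ~Rc() {
        if (box_->count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            box_->~RcBox<T>();
            std::free(box_);
        }
    }
    T& operator*() const {
        assert(box_ != nullptr);
        return box_->value;
    }
    unsigned owners() const { return box_->count.load(std::memory_order_acquire); }
private:
    RcBox<T>* box_;
};
```

Creating `Rc<int> a(42)` gives one owner. Copying it into a second handle raises `a.owners()` to two, and the count drops back to one when the copy goes out of scope.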

Also, tracing GC in a kernel is in my opinion not desirable. For the reasons I previously mentioned, you want to reduce meta data, you want to reduce CPU time, you want to reduce fragmentation. Special allocators for structures are often used.

December 06, 2020

Re: low-latency GC

Posted by Ola Fosheim Grostad
in reply to IGotD-

On Sunday, 6 December 2020 at 17:35:19 UTC, IGotD- wrote:
> Is automatic atomic reference counting a contender for kernels? In kernels you want to reduce the increase/decrease of the counts. Therefore the Rust approach using 'clone' is better unless there is some optimizer that can figure it out. Performance is important in kernels, you don't want the kernel to steal useful CPU time that otherwise should go to programs.

I am not sure kernel authors want automatic memory management; they tend to want full control and transparency. Maybe it is something people who write device drivers would consider.

> In general I think that reference counting should be supported in D, not only implicitly but also under the hood with fat pointers. This will make D more attractive to performance applications. Another advantage is the reference counting can use malloc/free directly if needed without any complicated GC layer with associated meta data.

Yes, I would like to see it, just expect that there will be protests when people realize that they have to make ownership explicit.

> Also, tracing GC in a kernel is in my opinion not desirable. For the reasons I previously mentioned, you want to reduce meta data, you want to reduce CPU time, you want to reduce fragmentation. Special allocators for structures are often used.

Yes, an ARC solution should support fixed size allocators for types that are frequently allocated to get better speed.
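For readers unfamiliar with fixed-size allocators, here is a minimal C++ sketch of the idea. `FixedPool` is a hypothetical name, and the design (a free list threaded through the unused slots) is one common approach, not anything proposed in this thread. Every slot has room for one `T`, so allocation and deallocation are a single pointer swap and the slots carry no per-object metadata.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size pool: N slots, each big enough for one T.
// Free slots are chained into a singly linked list stored in the slots themselves.
template <typename T, std::size_t N>
class FixedPool {
    union Slot {
        Slot* next;                                   // valid while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // valid while the slot is in use
    };
public:
    FixedPool() {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    // O(1): pop a slot from the free list; returns nullptr when the pool is exhausted.
    void* allocate() {
        if (free_ == nullptr) return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return s->storage;
    }
    // O(1): push the slot back onto the free list.
    void deallocate(void* p) {
        assert(p != nullptr);
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }
private:
    Slot slots_[N];
    Slot* free_ = nullptr;
};
```

A caller would construct objects into the returned storage with placement new and run the destructor before calling `deallocate`. A slot that was just released is the next one handed out, which keeps recently used memory hot in the cache.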

December 06, 2020

Re: low-latency GC

Posted by Ola Fosheim Grostad
in reply to Bruce Carneal

On Sunday, 6 December 2020 at 17:28:52 UTC, Bruce Carneal wrote:
> D is good for systems level work but that's not all.  I use it for projects where, in the past, I'd have split the work between two languages (Python and C/C++).  I much prefer working with a single language that spans the problem space.

My impression from reading the forums is that people use D as a replacement for either C/C++ or Python/numpy, so I think your experience covers the essential use case that dominates current D usage? Any improvements have to improve both dimensions, I agree.

> If there is a way to extend D's reach with zero or a near-zero complexity increase as seen by the programmer, I believe we should (as/when resources allow of course).

ARC involves a complexity increase, to some extent. Library authors have to think in a somewhat more principled way about when objects should be phased out and destructed, which I think tends to lead to better programs. It would also allow for faster precise collection. So it could be beneficial for all.

December 08, 2020

Re: low-latency GC

Posted by oddp
in reply to Bruce Carneal

On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
> How difficult would it be to add a selectable, low-latency GC to dlang?
> 
> Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"?
> 
> I've heard Walter mention performance issues (write barriers IIRC).  I'm also interested in the GC-flavor performance trade offs but here I'm just asking about feasibility.
> 
What our closest competition, Nim, is up to with their mark-and-sweep replacement ORC [1]:

ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle collector

[...]

ARC is Nim’s pure reference-counting GC, however, many reference count operations are optimized away: Thanks to move semantics, the construction of a data structure does not involve RC operations. And thanks to “cursor inference”, another innovation of Nim’s ARC implementation, common data structure traversals do not involve RC operations either!
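As an analogy only (this is C++, not Nim, and says nothing about how Nim's ARC is implemented), the effect of move semantics on reference counts can be sketched with `std::shared_ptr`. Moving a handle transfers ownership without touching the count, whereas copying it adds an owner. The `build` function is a hypothetical helper for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Build a container of counted nodes. Each node is moved into the vector,
// so ownership is transferred instead of being shared: no extra owner is
// ever created while the structure is constructed.
std::vector<std::shared_ptr<int>> build(int n) {
    assert(n >= 0);
    std::vector<std::shared_ptr<int>> nodes;
    for (int i = 0; i < n; ++i) {
        auto node = std::make_shared<int>(i);
        nodes.push_back(std::move(node));  // move: count stays at 1
    }
    return nodes;
}
```

After `auto v = build(3);` every element reports `use_count() == 1`. Copying an element (`auto c = v[0];`) raises its count to 2, and moving that copy elsewhere leaves the count at 2 while `c` becomes empty.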

[...]

Benchmark:

Metric/algorithm         ORC    Mark&Sweep
Latency (Avg)      320.49 us      65.31 ms
Latency (Max)        6.24 ms     204.79 ms
Requests/sec        30963.96        282.69
Transfer/sec         1.48 MB      13.80 KB
Max memory           137 MiB       153 MiB

That’s right, ORC is over 100 times faster than the M&S GC. The reason is that ORC only touches memory that the mutator touches, too.
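The "over 100 times" figure can be checked against the table with a few divisions. The sketch below is C++ written for this purpose; the constant and function names are made up, and the numbers are copied from the table above. By this arithmetic throughput differs by roughly 110x, average latency by roughly 204x, and maximum latency by roughly 33x.

```cpp
#include <cassert>
#include <cstdio>

// Figures copied from the benchmark table quoted above.
constexpr double kOrcRequestsPerSec = 30963.96;
constexpr double kMsRequestsPerSec  = 282.69;
constexpr double kOrcAvgLatencyUs   = 320.49;
constexpr double kMsAvgLatencyUs    = 65.31 * 1000.0;  // 65.31 ms in microseconds
constexpr double kOrcMaxLatencyMs   = 6.24;
constexpr double kMsMaxLatencyMs    = 204.79;

// Each ratio is "how many times better ORC is" on that metric.
double throughput_ratio()  { return kOrcRequestsPerSec / kMsRequestsPerSec; }
double avg_latency_ratio() { return kMsAvgLatencyUs / kOrcAvgLatencyUs; }
double max_latency_ratio() { return kMsMaxLatencyMs / kOrcMaxLatencyMs; }

void print_ratios() {
    assert(kMsRequestsPerSec > 0.0);  // guard against dividing by zero
    std::printf("throughput:      %.1fx\n", throughput_ratio());
    std::printf("average latency: %.1fx\n", avg_latency_ratio());
    std::printf("maximum latency: %.1fx\n", max_latency_ratio());
}
```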

[...]

- uses 2x less memory than classical GCs
- can be orders of magnitudes faster in throughput
- offers sub-millisecond latencies
- suited for (hard) realtime systems
- no “stop the world” phase
- oblivious to the size of the heap or the used stack space.

There's also some discussion on /r/programming [2] and hackernews [3], but it hasn't taken off yet.

[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
[2] https://old.reddit.com/r/programming/comments/k95cc5/introducing_orc_nim_nextgen_memory_management/
[3] https://news.ycombinator.com/item?id=25345770
