More radical ideas about gc and reference counting (page 33) - D Programming Language Discussion Forum

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Steven Schveighoffer
in reply to Manu

On Mon, 12 May 2014 03:39:12 -0400, Manu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On 12 May 2014 17:24, Dicebot via Digitalmars-d

>> You will like Don's talk this year ;)
>
> I'm super-disappointed I can't make it this year!

?!! http://dconf.org/2014/talks/evans.html

> We were evicted from
> our house, have to move, and I can't bail for a week and leave that
> all on my mrs while she kicks along the fulltime job :(

Oh that sucks...

-Steve

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Steven Schveighoffer
in reply to Walter Bright

On Sun, 11 May 2014 16:33:04 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> On 5/11/2014 2:48 AM, Benjamin Thaut wrote:
>> Mostly precise doesn't help. It's either fully precise or being stuck with an
>> imprecise mark & sweep.
>
> This is not correct. It helps because most of the false pointers will be in the heap, and the heap will be accurately scanned, nearly eliminating false references to garbage.

It doesn't matter where the false pointers are. The largest issue with false pointers is not how many there are; what matters is how large the block they "point" at is. The larger your blocks get, the more likely they are to be "pointed" at by the stack. On 32-bit systems, allocate 1/256th of your address space (i.e. 16MB), and the likelihood of any given random word on the stack pointing at it is roughly 1/256. This problem is just about eliminated with 64-bit pointers.
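A back-of-the-envelope check of that arithmetic, as a minimal C++ sketch. It assumes each stack word is a uniformly random value, which real stack data is not, so treat it as an illustration only; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that one uniformly random word of `address_bits` bits
// lands inside a block of `block_bytes` bytes: block / 2^bits.
double hit_probability(double block_bytes, int address_bits) {
    return block_bytes / std::ldexp(1.0, address_bits);
}

// Probability that at least one of `words` independent random stack words
// lands inside the block: 1 - (1 - p)^words.
double any_hit_probability(double block_bytes, int address_bits, int words) {
    double p = hit_probability(block_bytes, address_bits);
    return 1.0 - std::pow(1.0 - p, words);
}
```

A block of 2^24 bytes in a 32-bit address space gives p = 2^24 / 2^32 = 1/256 per word; the same block in a 64-bit space gives 2^-40. Under this model, 1000 random-looking words on a 32-bit stack would pin the block with probability of roughly 0.98.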

> Yes. D, for example, requires that objects not be self-referential for this reason.

As previously stated, self-referencing does not preclude a moving GC. This statement is simply false: you can self-reference in D with objects. You cannot with structs, but not because of the possibility of a moving GC; it's because struct instances must be movable (the compiler is free to move them with a bitwise copy).

And in fact, even if it's forbidden, "requires" is too strong a word -- there is no static or runtime prevention of this.
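To illustrate the struct case, here is a rough C++ analogue (not D code) in which std::memcpy stands in for a bitwise move; the type and function names are hypothetical. The interior pointer is copied verbatim, so after the move it still refers to the old instance.

```cpp
#include <cassert>
#include <cstring>

// A struct whose `cursor` points into its own `buffer`.
struct SelfRef {
    char buffer[16];
    char* cursor;
};

// Initialise so that cursor refers to this instance's own buffer.
void init(SelfRef& s) {
    std::memset(s.buffer, 0, sizeof s.buffer);
    s.cursor = s.buffer;
}

// Simulate a bitwise move: copy the raw bytes to a new location.
// The pointer value is copied unchanged, so it still points at the OLD buffer.
void bitwise_move(SelfRef& dst, const SelfRef& src) {
    std::memcpy(&dst, &src, sizeof(SelfRef));
}

// True if the instance's interior pointer still points into itself.
bool is_consistent(const SelfRef& s) {
    return s.cursor == s.buffer;
}
```

By contrast, a moving collector that knows precisely where the pointers are can fix up an interior pointer in a relocated object the same way it fixes up any other reference, which is why self-reference alone does not rule out a moving GC.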

-Steve

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Manu
in reply to Walter Bright

On 12 May 2014 18:45, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 5/12/2014 12:12 AM, Manu via Digitalmars-d wrote:
>>
>> What? You've never offered me a practical solution.
>
>
> I have, you've just rejected them.
>
>
>> What do I do?
>
>
> 1. you can simply do C++ style memory management. shared_ptr<>, etc.

I already have C++. I don't want another one.

> 2. you can have the non-pausable code running in a thread that is not registered with the gc, so the gc won't pause it. This requires that this thread not allocate gc memory, but it can use gc memory allocated by other threads, as long as those other threads retain a root to it.

In practice it still sounds the same as manual memory management,
though. Like you say, the other thread must maintain a root to it,
which means I need to manually retain it somehow, and when the worker
thread finishes with it, it needs to send a signal or something back
to say it's done so it can be released... it sounds more inconvenient
than direct manual memory management in practice.
Sounds slow too. Dec-ing a ref is certainly faster than inter-thread
communication.
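A minimal C++ sketch of the two hand-off styles being compared, assuming a worker thread that only reads a buffer produced elsewhere; the function names are hypothetical. With std::shared_ptr the worker just drops its reference when it is done; in the "owner keeps a root" style, the owner must hold the buffer until the worker signals completion, here via std::promise/std::future.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

using Buffer = std::vector<int>;

// Style A: shared ownership. The worker's lambda holds its own copy of the
// shared_ptr; that copy is destroyed when the thread's callable is destroyed,
// which just decrements the count. Nothing is sent back to the owner.
long sum_shared(std::shared_ptr<const Buffer> data) {
    long result = 0;
    std::thread worker([data, &result] {
        result = std::accumulate(data->begin(), data->end(), 0L);
    });
    worker.join();
    return result;
}

// Style B: the owner keeps the only reference (its "root") alive until the
// worker explicitly signals that it no longer needs the memory.
long sum_owner_keeps_root() {
    auto owned = std::make_unique<Buffer>(Buffer{1, 2, 3, 4});
    std::promise<long> done;
    std::future<long> finished = done.get_future();
    std::thread worker([raw = owned.get(), &done] {
        done.set_value(std::accumulate(raw->begin(), raw->end(), 0L));
    });
    long result = finished.get();  // wait for the "I'm done" signal...
    owned.reset();                 // ...and only then release the memory
    worker.join();
    return result;
}
```

Style B needs an explicit signal through a synchronisation primitive before the owner may free anything, which is the extra bookkeeping described above; style A needs only the reference-count decrement.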

This also effectively turns library calls into RPCs if I can't call into them from the active threads.

How long is a collect liable to take in the event the GC threads need to collect? Am I likely to lose my service threads for 100s of milliseconds at a time?

I'll think on it, but I don't think there's anything practically applicable here, and it really sounds like it creates a lot more trouble and complexity than it addresses.

> 3. D allows you to create and use any memory management scheme you want. You are simply not locked into GC. For example, I rewrote my Empire game into D and it did not do any allocation at all - no GC, not even malloc. I know that you'll need to do allocation, I'm just pointing out that GC allocations and pauses are hardly inevitable.

C++ lets me create any memory management scheme I like by the same argument.
I lose all the parts of the language that implicitly depend on the GC,
and 3rd party libs (that don't care about me and my project).
Why isn't it a reasonable argument to say that not having access to
libraries is completely unrealistic? You can't write modern software
without extensive access to libraries. Period.

I've said before, I don't want to be a second class citizen with access to only a subset of the language.

> 4. for my part, I have implemented @nogc so you can track down gc usage in code. I have also been working towards refactoring Phobos to eliminate unnecessary GC allocations and provide alternatives that do not allocate GC memory. Unfortunately, these PR's just sit there.

The effort is appreciated, but it was never a solution. I said @nogc was the exact wrong approach to my situation right from the start, and I predicted that would be used as an argument the moment it appeared. Tracking down GC usage isn't helpful when it leads you to a lib call that you can't change. And again, eliminating useful and productive parts of the language is not a goal we should be shooting for.

I'll find it useful in the high-performance realtime bits; i.e., the
bits that I typically disassemble and scrutinise after every compile.
But that's not what we're discussing here.
I'm happy with D for my realtime code; I have the low-level tools I
need to make the real-time code run fast. @nogc is a little bonus that
will allow me to guarantee no sneaky allocations are finding their way
into the fast code, and that might save a little time, but I never
really saw that as a significant problem in the first place.

What we're talking about is productivity, convenience and safety in the non-realtime code. The vast majority of code, that programmers spend most of their days working on.

Consider it this way... why do you have all these features in D that
cause implicit allocation if you don't feel they're useful and
important parts of the language?
Assuming you do feel they're important parts of the language, why do
you feel it's okay to tell me I don't deserve access to them?
Surely I'm *exactly* the target market for D...? High-pressure,
intensive production environments, still depending exclusively on
native code, with code teams often in the realm of 50-100, containing
many juniors, aggressive schedules which can't afford to waste
engineering hours... this is a code environment that's prone to MANY
bugs, and countless wasted hours as a consequence.
Convenience and safety are important to me... I don't know what you
think I'm interested in D for if you think I should be happy to
abandon a whole chunk of the language, just because I have a couple of
realtime threads :/

> 5. you can divide your app into multiple processes that communicate via interprocess communication. One of them pausing will not pause the others. You can even do things like turn off the GC collections in those processes, and when they run out of memory just kill them and restart them. (This is not an absurd idea, I've heard of people doing that effectively.)

Most of the platforms I work on barely have operating systems.

> 6. If you call C++ libs, they won't be allocating memory with the D GC. D
> code can call C++ code. If you run those C++ libs in separate threads, they
> won't get paused, either (see (2)).

Whether this is practical or not depends entirely on the lib.
Maybe this concept can be applicable in some small places, but it's
not a salvation. I don't think this sufficiently addresses the problems.
None of the problems are actually going away, they're just moved
somewhere else.

> 7. The Warp program I wrote avoids GC pauses by allocating ephemeral memory with malloc/free, and (ironically) only using GC for persistent data structures that should never be free'd. Then, I just turned off GC collections, because they'd never free anything anyway.

That idea is obviously not applicable in my environment. Resource usage is dynamic and fluid.

> 8. you can disable and enable collections, and you can cause collections to be run at times when nothing is happening (like when the user has not input anything for a while).

If I disable collections, then I just crash when I receive that network packet? I'm back to manual memory management in practice.

I also don't think it's reasonable to assume there will just be 'times
when nothing is happening'. That's not how games work.
Games are often really fast paced, and even if they're not,
significant stuttering in the animation is usually considered a
non-shippable bug.

https://www.youtube.com/watch?v=rqjOXR9QnMo
https://www.youtube.com/watch?v=giiZMktZrNI
https://www.youtube.com/watch?v=LoPC_ibBJiQ
Where would you manually issue collects?

> The point is, the fact that D has 'new' that allocates GC memory simply does not mean you are obliged to use it.

D also has ~, closures, dynamic arrays, even array literals. There are various things that create implicit GC allocations. And library calls...

> The GC is not going to pause your
> program if you don't allocate with it. Nor will it ever run a collection at
> uncontrollable, random, asynchronous times.

Those claims come with massive dependency on very specific restrictions, like abandoning part of the language and moving library calls to separate threads and accessing them via RPC or something like that.

None of your suggestions sound practical, or like they'd result in any
less effort or complexity than manual management in the first place
which everyone is already accustomed to. I'm almost certainly
sacrificing safety in every case.
You can't then go on to say you gave me plenty of options, but I
rejected them, when none of them were really options.

I wonder if you have a good conception of the scope/scale of the
software we write. It's not comparable to Empire, or a linker, or a
compiler, or a web server, or many things at all really. Games are
some of the biggest, broadest software projects there are, very
tightly integrated, with some of the most stringent operating
requirements. They're also growing steadily... it's harder and harder
to manage the scope without helpful language tools; this is why you
see so many gamedevs in the independent space flirting with 'modern'
languages like C#. For games with extremely small scope that don't
push the platform (indie/casual games), this is sometimes okay, but
there are plenty of cases where it has been a complete disaster as the
scope has grown towards a more traditional 'big game'. My mates' game,
which I helped them with last weekend, is 'mid-scoped', but it's grown to
saturate the PS4, and the GC is causing them a nightmare... right at
the end of the project when trying to finalise the build for shipping,
precisely as I've always predicted.
C# is a better productivity experience than C++, and it allowed them
to do a lot more work in a lot less time with a lot fewer people. But
it's clearly not really compatible with the workload, and I think the
future of the industry needs to do a lot better.

I've said before, we are an industry in desperate need of salvation, it's LONG overdue, and I want something that actually works well for us, not a crappy set of compromises because the language has a fundamental incompatibility with my industry :/ ... It doesn't have to be that way.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by bearophile
in reply to Manu

Manu:

> we are an industry in desperate need of salvation,
> it's LONG overdue, and I want something that actually works well for us, not a crappy set of compromises because the
> language has a fundamental incompatibility with my industry :/

Perhaps the game industry has to start creating a language designed for its needs, like the scientific people have done (Julia), the browser people (Rust), the Web people, etc. With a lot of work, in less than ten years you can have a usable language.

Bye,
bearophile

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Ola Fosheim Grøstad
in reply to Walter Bright

On Monday, 12 May 2014 at 08:45:56 UTC, Walter Bright wrote:
> 2. you can have the non-pausable code running in a thread that is not registered with the gc, so the gc won't pause it. This requires that this thread not allocate gc memory, but it can use gc memory allocated by other threads, as long as those other threads retain a root to it.

This and @nogc are a very promising trend, but you should still be able to partition the GC search space.

The key to controlled real-time performance is to partition the search space; that goes for anything algorithmic, memory management included. That applies to any scheme like owned, ARC, GC etc. It makes little sense to trace everything if only the physics engine is churning memory like crazy.
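As a swapped-in, non-GC illustration of the partitioning idea (an arena allocator, not a partitioned collector), here is a hypothetical per-subsystem bump arena in C++: the physics system churns through its own arena and resets it each frame, so reclaiming that memory never requires touching or tracing anything owned by other subsystems.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// A fixed-capacity bump allocator owned by one subsystem.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : storage_(new unsigned char[capacity]), capacity_(capacity) {}

    // Returns `size` bytes aligned to `align` (a power of two, no larger than
    // alignof(std::max_align_t)), or nullptr when the arena is full.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start + size > capacity_) return nullptr;
        used_ = start + size;
        return storage_.get() + start;
    }

    // Reclaims everything at once; the cost does not depend on how many
    // objects were allocated since the last reset.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

Resetting a `physics` arena at the end of a frame leaves a separate `ui` arena untouched, which is the property the partitioning argument is after.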

And fork() is not a solution without extensive structuring of allocations. Stuff like physics touches all the pages the physics objects live on 100+ times per second, so you need to group allocations into pages based on usage patterns. (No point in forking if you get write traps on 50,000 pages the next time the physics engine runs :-).

> 3. D allows you to create and use any memory management scheme you want. You are simply not locked into GC. For example, I rewrote my Empire game into D and it did not do any allocation at all - no GC, not even malloc. I know that you'll need to do allocation, I'm just pointing out that GC allocations and pauses are hardly inevitable.

This is no doubt the best approach for an MMO client. You have a window on the world and cache as much as possible both to memory and disk. Basically get as much memory from the OS as you can hoard (with headroom set by heuristics) when your application has focus, and release caches that are outside the window when you lose focus to another application. This means you need a dedicated runtime for games that can delay GC collection and eat into the caches when you are low on computational resources.

You also need to distinguish between memory that is locked to RAM and memory that can swap. You should always lock memory for real-time threads. So if you want to GC this, you need a GC that supports multiple heaps.

(Some hardware might also distinguish between RAM that is accessible by the GPU or that has different characteristics in areas such as persistence or performance.)

> 5. you can divide your app into multiple processes that communicate via interprocess communication. One of them pausing will not pause the others. You can even do things like turn off

Why processes and not threads with their own local GC?

> 6. If you call C++ libs, they won't be allocating memory with the D GC. D code can call C++ code. If you run those C++ libs

But what happens if that C++ code does "new HeapStuff(D_allocated_memory)" and then calls back to D? You cannot presume that C++ coders have the discipline to always allocate local memory from the stack, so basically you cannot GC collect while there are C++ functions on the stack. In order to get there the GC collector needs to understand the malloc heap and trace that one too.

Auditing all C++ libraries I want to use is too much work, and tracing the malloc heap is too time consuming, so at the end of the day you'll get a more robust environment by only scanning (tracing) the stacks when there are only D function calls on the stack, with a precise collector.

That means you need to partition the search space otherwise the collector might not run in time.

Freezing the world is really ugly. Most applications are actually soft real time.

Games are part hard real time, part soft real time. The difference between games and other applications is that there is less headroom so you have to do more work to make the "glitches" and "stuttering" occur sufficiently seldom to be perceived as acceptable by the end user. But games are not special.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Walter Bright
in reply to Marco Leise

On 5/12/2014 3:18 AM, Marco Leise wrote:
> You were arguing against Michel Fortin's proposal on the
> surface, when your requirement cannot even be fulfilled
> theoretically it seems.

Lots of people use ARC without a GC.


> Which could mean that you don't like
> the idea of replacing D's GC with an ARC solution.

I don't like the idea of replacing D's GC with ARC. But for different reasons.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Walter Bright
in reply to Timon Gehr

On 5/12/2014 5:15 AM, Timon Gehr wrote:
> On 05/12/2014 10:54 AM, Walter Bright wrote:
>> On 5/11/2014 10:57 PM, Marco Leise wrote:
>>> Am Sun, 11 May 2014 17:50:25 -0700
>>> schrieb Walter Bright <newshound2@digitalmars.com>:
>>>
>>>> As long as those pointers don't escape. Am I right in that one cannot
>>>> store a
>>>> borrowed pointer into a global data structure?
>>>
>>> Right, and that's the point and entirely positive-to-do™.
>>
>> This means that a global data structure in Rust has to decide what
>> memory allocation scheme its contents must use,
>
> Global variables are banned in Rust code outside of unsafe blocks.

Global can also mean assigning through a reference passed as a parameter.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Walter Bright
in reply to Dicebot

On 5/12/2014 2:12 AM, Dicebot wrote:
> I think this is more of library writing culture problem than engineering
> problem. High quality library shouldn't rely on any internal allocations at all,
> deferring this decision to user code. Otherwise you will eventually have
> problems, GC or not.

Consider my PR:

 https://github.com/D-Programming-Language/phobos/pull/2149

This is exactly what it does - it 'pushes' the decisions about allocating memory up out of the library to the user. I suspect a great deal of storage allocation can be removed from Phobos with this technique, without sacrificing performance, flexibility, or memory safety. (In fact, it improves on performance and flexibility!)
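As a generic illustration of the technique (not the code in that PR), here is a C++ sketch of a hypothetical library function that writes its result through a caller-supplied output iterator instead of allocating a container itself; the caller decides where the characters go.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Writes the decimal digits of `value` to `out` and returns the advanced iterator.
// The function itself never allocates; the destination is the caller's choice.
template <typename OutputIt>
OutputIt write_decimal(unsigned long long value, OutputIt out) {
    char digits[20];  // enough for any 64-bit unsigned value
    int n = 0;
    do {
        digits[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n > 0) *out++ = digits[--n];  // emit the most significant digit first
    return out;
}
```

Writing into a fixed `char` buffer involves no heap allocation at all, while passing `std::back_inserter` on a `std::string` lets a caller who is happy to allocate do exactly that.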

I also agree with your larger point that if you are relying on an unknown library for time critical code, and that library was not designed with time criticality guarantees in mind, you're going to have nothing but trouble. Regardless of GC or RC.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Manu
in reply to bearophile

On 13 May 2014 02:16, bearophile via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Manu:
>
>
>> we are an industry in desperate need of salvation,
>> it's LONG overdue, and I want something that actually works well for us,
>> not a crappy set of compromises because the
>> language has a fundamental incompatibility with my industry :/
>
>
> Perhaps the game industry has to start creating a language designed for its needs, like the scientific people have done (Julia), the browser people (Rust), the Web people, etc. With a lot of work, in less than ten years you can have a usable language.

But D is *so close*... and I like it! >_<

I have to say that this discussion has certainly left me somewhat
intrigued by Rust though.
I've never given it a fair go because I find the syntax so distasteful
and deterring.
I wonder if there's a market for a Rust fork that re-skins the language ;)

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Dicebot
in reply to Walter Bright

On Monday, 12 May 2014 at 17:03:18 UTC, Walter Bright wrote:
> On 5/12/2014 2:12 AM, Dicebot wrote:
>> I think this is more of library writing culture problem than engineering
>> problem. High quality library shouldn't rely on any internal allocations at all,
>> deferring this decision to user code. Otherwise you will eventually have
>> problems, GC or not.
>
> Consider my PR:
>
>  https://github.com/D-Programming-Language/phobos/pull/2149
>
> This is exactly what it does - it 'pushes' the decisions about allocating memory up out of the library to the user. I suspect a great deal of storage allocation can be removed from Phobos with this technique, without sacrificing performance, flexibility, or memory safety. (In fact, it improves on performance and flexibility!)

We have already had a discussion where I stated that the current @nogc implementation is not robust enough, and I failed to explain the use case for a weaker @nogc clearly. The conclusion was that we should return to this topic after Don's DConf talk ;)

Copyright © 1999-2021 by the D Language Foundation