More radical ideas about gc and reference counting (page 32) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » More radical ideas about gc and reference counting (page 32)

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Paulo Pinto
in reply to John Colvin

Paulo Pinto

Posted in reply to John Colvin

On Monday, 12 May 2014 at 09:05:39 UTC, John Colvin wrote:
> On Monday, 12 May 2014 at 08:45:56 UTC, Walter Bright wrote:
>> On 5/12/2014 12:12 AM, Manu via Digitalmars-d wrote:
>>> What? You've never offered me a practical solution.
>>
>> I have, you've just rejected them.
>>
>>
>>> What do I do?
>>
>> 1. you can simply do C++ style memory management. shared_ptr<>, etc.
>>
>> 2. you can have the non-pausible code running in a thread that is not registered with the gc, so the gc won't pause it. This requires that this thread not allocate gc memory, but it can use gc memory allocated by other threads, as long as those other threads retain a root to it.
>>
>> 3. D allows you to create and use any memory management scheme you want. You are simply not locked into GC. For example, I rewrote my Empire game into D and it did not do any allocation at all - no GC, not even malloc. I know that you'll need to do allocation, I'm just pointing out that GC allocations and pauses are hardly inevitable.
>>
>> 4. for my part, I have implemented @nogc so you can track down gc usage in code. I have also been working towards refactoring Phobos to eliminate unnecessary GC allocations and provide alternatives that do not allocate GC memory. Unfortunately, these PR's just sit there.
>>
>> 5. you can divide your app into multiple processes that communicate via interprocess communication. One of them pausing will not pause the others. You can even do things like turn off the GC collections in those processes, and when they run out of memory just kill them and restart them. (This is not an absurd idea, I've heard of people doing that effectively.)
>>
>> 6. If you call C++ libs, they won't be allocating memory with the D GC. D code can call C++ code. If you run those C++ libs in separate threads, they won't get paused, either (see (2)).
>>
>> 7. The Warp program I wrote avoids GC pauses by allocating ephemeral memory with malloc/free, and (ironically) only using GC for persistent data structures that should never be free'd. Then, I just turned off GC collections, because they'd never free anything anyway.
>>
>> 8. you can disable and enable collections, and you can cause collections to be run at times when nothing is happening (like when the user has not input anything for a while).
>>
>>
>> The point is, the fact that D has 'new' that allocates GC memory simply does not mean you are obliged to use it. The GC is not going to pause your program if you don't allocate with it. Nor will it ever run a collection at uncontrollable, random, asynchronous times.
>
> The only solutions to the libraries problem that I can see here require drastic separation of calls to said libraries from any even vaguely time critical code. This is quite restrictive.
>
> Yes, calling badly optimised libraries from a hot loop is a bad idea anyway, but the GC changes this from
>
> "well it might take a little more time than usual, but we can spare a few nano-seconds and it'll show up easily in the profiler"
>
> to
>
> "it might, sometimes, cause the GC to run a full collection on our 3.96 / 4.00 GB heap with an associated half-second pause."
>
> And here we go again, "I can't use that library, it's memory management scheme is incompatible with my needs, I'll have to rewrite it myself..."

A badly placed malloc() in library code can also trigger OS
virtualization mechanisms and make processes being swapped out to
disk, with the respective overhead in disk access and time spent
on kernel code.

So it is just not the "we can spare a few nano-seconds".

--
Paulo

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Marco Leise
in reply to Walter Bright

Marco Leise

Posted in reply to Walter Bright

Am Sun, 11 May 2014 22:11:28 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> > But I thought ARC cannot be designed without GC to resolve cycles.
> 
> It can be, there are various schemes to deal with that, including "don't create cycles". GC is just one of them.
> 
> http://en.wikipedia.org/wiki/Reference_counting#Dealing_with_reference_cycles

Yes that article mentions:
a) "avoid creating them"
b) "explicitly forbid reference cycles"
c) "Judicious use of weak references"
d) "manually track that data structure's lifetime"
e) "tracing garbage collector"
f) adding to a root list all objects whose reference
   count is decremented to a non-zero value and periodically
   searching all objects reachable from those roots.

To pick up your statement again: »Your proposal still relies on a GC to provide the memory safety, […] it is a hybrid ARC/GC system.«

a) and b) let's assume never creating cycles is not a feasible
          option in a systems programming language
c) and d) don't provide said memory safety
e) and f) ARE tracing garbage collectors

ergo: »But I thought ARC cannot be designed without GC to resolve cycles.«

Your were arguing against Michel Fortin's proposal on the surface, when your requirement cannot even be fulfilled theoretically it seems. Which could mean that you don't like the idea of replacing D's GC with an ARC solution.

»This is the best horse I could find for the price. It is
 pretty fast and ...«
»No, it still has four legs.«

-- 
Marco

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Marco Leise
in reply to Walter Bright

Marco Leise

Posted in reply to Walter Bright

Am Mon, 12 May 2014 01:54:58 -0700
schrieb Walter Bright <newshound2@digitalmars.com>:

> On 5/11/2014 10:57 PM, Marco Leise wrote:
> > Am Sun, 11 May 2014 17:50:25 -0700
> > schrieb Walter Bright <newshound2@digitalmars.com>:
> >
> >> As long as those pointers don't escape. Am I right in that one cannot store a borrowed pointer into a global data structure?
> >
> > Right, and that's the point and entirely positive-to-do™.
> 
> This means that a global data structure in Rust has to decide what memory allocation scheme its contents must use, and cannot (without tagging) mix memory allocation schemes.
> 
> For example, let's say a compiler has internally a single hash table of strings. With a GC, those strings can be statically allocated, or on the GC heap, or anything with a lifetime longer than the table's. But I don't see how this could work in Rust.

:( Good question. I have no idea.

-- 
Marco

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Tommi
in reply to sclytrack

Tommi

Posted in reply to sclytrack

On Sunday, 11 May 2014 at 21:43:06 UTC, sclytrack wrote:
> I like this owner/unique, borrow thing.
>
> @ is managed (currently reference counted)
> ~ is owner
> & is borrow

I like it too. But a few notes:

1) The managed pointer @T has been deprecated and you should use the standard library types Gc<T> and Rc<T> instead.

2) The owned pointer ~T has been largely removed from the language and you should use the standard library type Box<T> instead.

The basic idea is that if a function needs to have ownership of its argument, the function should take its argument by value. And if the function doesn't need the ownership, it should take its argument either by a mutable or immutable reference (they don't like to call it "borrowed pointer" anymore, it's called simply a "reference" now). Owned types get moved by default when you pass them to a function that takes its argument by value. You call the 'clone' method to make a copy of a variable of an owned type.

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Marco Leise
in reply to Paulo Pinto

Marco Leise

Posted in reply to Paulo Pinto

Am Mon, 12 May 2014 09:32:58 +0000
schrieb "Paulo Pinto" <pjmlp@progtools.org>:

> On Monday, 12 May 2014 at 09:05:39 UTC, John Colvin wrote:
> > On Monday, 12 May 2014 at 08:45:56 UTC, Walter Bright wrote:
> >> On 5/12/2014 12:12 AM, Manu via Digitalmars-d wrote:
> >>> What? You've never offered me a practical solution.
> >>
> >> I have, you've just rejected them.
> >>
> >>
> >>> What do I do?
> >>
> >> 1. you can simply do C++ style memory management. shared_ptr<>, etc.
> >>
> >> 2. you can have the non-pausible code running in a thread that is not registered with the gc, so the gc won't pause it. This requires that this thread not allocate gc memory, but it can use gc memory allocated by other threads, as long as those other threads retain a root to it.
> >>
> >> 3. D allows you to create and use any memory management scheme you want. You are simply not locked into GC. For example, I rewrote my Empire game into D and it did not do any allocation at all - no GC, not even malloc. I know that you'll need to do allocation, I'm just pointing out that GC allocations and pauses are hardly inevitable.
> >>
> >> 4. for my part, I have implemented @nogc so you can track down gc usage in code. I have also been working towards refactoring Phobos to eliminate unnecessary GC allocations and provide alternatives that do not allocate GC memory. Unfortunately, these PR's just sit there.
> >>
> >> 5. you can divide your app into multiple processes that communicate via interprocess communication. One of them pausing will not pause the others. You can even do things like turn off the GC collections in those processes, and when they run out of memory just kill them and restart them. (This is not an absurd idea, I've heard of people doing that effectively.)
> >>
> >> 6. If you call C++ libs, they won't be allocating memory with the D GC. D code can call C++ code. If you run those C++ libs in separate threads, they won't get paused, either (see (2)).
> >>
> >> 7. The Warp program I wrote avoids GC pauses by allocating ephemeral memory with malloc/free, and (ironically) only using GC for persistent data structures that should never be free'd. Then, I just turned off GC collections, because they'd never free anything anyway.
> >>
> >> 8. you can disable and enable collections, and you can cause collections to be run at times when nothing is happening (like when the user has not input anything for a while).
> >>
> >>
> >> The point is, the fact that D has 'new' that allocates GC memory simply does not mean you are obliged to use it. The GC is not going to pause your program if you don't allocate with it. Nor will it ever run a collection at uncontrollable, random, asynchronous times.
> >
> > The only solutions to the libraries problem that I can see here require drastic separation of calls to said libraries from any even vaguely time critical code. This is quite restrictive.
> >
> > Yes, calling badly optimised libraries from a hot loop is a bad idea anyway, but the GC changes this from
> >
> > "well it might take a little more time than usual, but we can spare a few nano-seconds and it'll show up easily in the profiler"
> >
> > to
> >
> > "it might, sometimes, cause the GC to run a full collection on our 3.96 / 4.00 GB heap with an associated half-second pause."
> >
> > And here we go again, "I can't use that library, it's memory management scheme is incompatible with my needs, I'll have to rewrite it myself..."
> 
> A badly placed malloc() in library code can also trigger OS virtualization mechanisms and make processes being swapped out to disk, with the respective overhead in disk access and time spent on kernel code.
> 
> So it is just not the "we can spare a few nano-seconds".
> 
> --
> Paulo

Yes, it could easily extend to a longer wait. I think we all know programs that hang while the system is swapping out. Don't let it get to that! A PC game would typically reduce caches or texture resolutions before running out of RAM.

Linux has a threshold of free pages it tries to keep available at any time to satisfy occasional small allocations. http://www.science.unitn.it/~fiorella/guidelinux/tlk/node39.html

All-in-all malloc is less likely to cause long pauses. It just allocates and doesn't ask itself if there might be dead memory to salvage to satisfy a request.

Time will tell if all "well written" D libraries will be @nogc to move the question of allocations to the user.

-- 
Marco

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by John Colvin
in reply to Marco Leise

John Colvin

Posted in reply to Marco Leise

On Monday, 12 May 2014 at 10:51:33 UTC, Marco Leise wrote:

> Time will tell if all "well written" D libraries will be @nogc
> to move the question of allocations to the user.

If there was such a thing as "weakly &nogc" then we would could do this

//some library function
void foo(OR, IR)(OR o, IR i) @weak-nogc
{
    //take things from i and put them in o
}

allocations would be possible during the execution of foo, but *only* in the implementations of OR and IR, which means that the developer gets the control and guarantees they need, but doesn't have to explicitly pre-allocate (which might not even be possible).

I don't see how it would work with UFCS though...

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Timon Gehr
in reply to Walter Bright

Timon Gehr

Posted in reply to Walter Bright

On 05/12/2014 02:50 AM, Walter Bright wrote:
> On 5/11/2014 1:59 PM, Timon Gehr wrote:
>> On 05/11/2014 10:05 PM, Walter Bright wrote:
>>> That's clearly an additional benefit of the borrowed pointer notion. But
>>> have you examined generated Rust code for the cost of inc/dec? I
>>> haven't, but I don't see any way they could avoid this (very expensive)
>>> cost without borrowed pointers.
>> Sure, but performance is the additional benefit.
>
> One constant theme in this thread, one I find baffling, is the regular
> dismissal of the performance implications of inc/dec.

Irrelevant, I'm not doing that.

(And again, reference counting is not the only allocation mechanism in Rust. AFAICT, most allocations use owned pointers.)

> Borrowed pointers are not necessary to support raw pointers  - this can be (and is in some
> systems) supported by simply wrapping the raw pointer with a dummy
> reference count.
> ...

I have no idea what this part is trying to bring across.

> The reason for borrowed pointers is performance.

No, it is safety. Raw pointers give you all of the performance.

> Rust would be non-viable without them.
>

True in that it would fail to meet its design goals. Rust provides a tracing garbage collector as well, so it is at least as viable as D regarding performance of safe memory management. (Probably more, actually, because it does not provide precision-unfriendly constructs such as undiscriminated unions.)

> I strongly suggest writing a snippet in [[insert your favorite proven
> technology RC language here]] and disassembling the result, and have a
> look at what inc/dec entails.
> ...

I don't have trouble seeing the cost of reference counting. (And it's you who claimed that this is going to be the only memory allocation scheme in use in Rust code.)

>
>>> The thing is, if the compiler is capable of figuring out these lifetimes
>>> by examining the code,
>>
>> There are explicit lifetime annotations in function signatures.
>
> Yes, because the compiler cannot figure it out itself, so the programmer
> has to annotate.
> ...

You are saying 'if the compiler is capable of figuring out these lifetimes by examining the code, then ...' and then you are saying that the compiler is incapable of figuring them out itself. What is it that we are arguing here? Are you saying the Rust compiler should infer all memory management automatically or that it cannot possibly do that, or something else?

>
>> It is simply not true that type systems are inherently restricted to
>> checking
>> trivial properties. They can be made as strong as mathematical logic
>> without
>> much fuss.
>
> Again, Rust would not need borrowed pointers nor the annotations for
> them if this knowledge could be deduced by the compiler. Heck, if the
> compiler can deduce lifetimes accurately,

It does not deduce anything. It checks that borrowed pointers do not outlive their source. Lifetime parameters are used to transport the required information across function signatures.

> you can get rid of GC and RC,
> and just have the compiler insert malloc/free in the right spots.
> ...

That's a form of GC, and I already acknowledged that global region inference exists, but noted that this is not what is used.

> Note that there is a Java version that does this partway, sometimes it
> will replace a GC object with a stack allocated one if it is successful
> in deducing that the object lifetime does not exceed the lifetime of the
> function.
> ...

I know. Also, inference is harder to control and less efficient than simply making the relevant information part of type signatures.

http://en.wikipedia.org/wiki/Region-based_memory_management#Region_inference

"This work was completed in 1995[9] and integrated into the ML Kit, a version of ML based on region allocation in place of garbage collection. This permitted a direct comparison between the two on medium-sized test programs, yielding widely varying results ("between 10 times faster and four times slower") depending on how "region-friendly" the program was; compile times, however, were on the order of minutes."

>
>>> Yes, one is & and the other is @.
>>
>> No, actually currently one is & and the other is RC<T> AFAIK.
>
> Then Rust changed again. The document I read on borrowed pointers was
> likely out of date, though it had no date on it.
> ...

Yes, most documents on Rust are at least slightly out of date.

>
>> RC<T> is not more general. It cannot refer to stack-allocated data,
>> for instance.
>
> So there is no general pointer type that has an unbounded lifetime?
> ...

How can it be general and have an unbounded lifetime and be safe?

>
>> Sure, borrowing is very lightweight, but ultimately what is most
>> important is
>> that it solves the problem of multiple incompatible pointer types and
>> makes the
>> type system more expressive as well.
>
> Adding more pointer types makes a type system more expressive, by
> definition.
> ...

No. Also, this is irrelevant, because I was highlighting the _importance_ of the fact that it does in this case.

>
>> A function that uses none of the specific pointer capabilities is more
>> general,
>> so what other choice of 'default' makes sense?
>
> A function that doesn't have restrictions on what can be done with the
> pointers passed to it.

You mean a function that only accepts GC-managed pointers?
(Not having _any_ restrictions at all does not make any sense.)
I don't see how this is better, because such a function is usable in less contexts.

> Borrowed pointers have restrictions on their
> usage - this is explicitly stated in the Rust documentation.
> ...

Values of every type have restrictions on their usage. If your function requires less capabilities from it's arguments, then it is usable in more cases. Why is this not the most sensible thing to do whenever possible?

>
>> Convenience and reusable functions means using borrowed pointers
>> whenever possible.
>
> Of course. And writing fast code means making your functions fast
> whenever possible!
> ...

What is your point here?

>
>>> It seems clear that the decisions of borrow/managed are going to pervade
>>> Rust code.
>>
>> But they are often obvious.
>
> I've written a lot of ref counted code in the past, enough to know that
> such is a pretty optimistic statement. Dealing with was a significant
> and ongoing drain on my time, especially when the programs and data
> structures got more complex.
> ...

You said you don't have any experience writing Rust code, and claimed that it is not a proven technology, so I fail to see how you can have any relevant experience with borrowing.

>...
>
>> Borrowed pointers are not even superficially similar to near*. They are
>> compatible with everything else, because they can store data that was
>> borrowed
>> from anywhere else.
>
> As long as those pointers don't escape. Am I right in that one cannot
> store a borrowed pointer into a global data structure?

Sure. If you are going to escape a pointer, you need to think about how that memory is managed. That's how it is. And there are many options: You could simply transfer ownership, you might use reference counting, or a tracing GC.

> The similarity is
> that there are one way conversions from one to the other, and one of the
> types is more general.

And the difference is that the one you can convert to is faster here.

> I infer from your other statements about Rust
> that it doesn't actually have a general pointer type.

Memory can be borrowed no matter how it was allocated. That's what I call general. How do you fill that term?

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by bearophile
in reply to Timon Gehr

bearophile

Posted in reply to Timon Gehr

Timon Gehr:

> (Probably more, actually, because it does not provide precision-unfriendly constructs such as undiscriminated unions.)

I suggested to add an optional method named "onGC" to unions that if present is called at run-time by the GC to know what's the real type of stored data, to make tracing more precise.

Bye,
bearophile

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Marc Schütz
in reply to Rainer Schuetze

Marc Schütz

Posted in reply to Rainer Schuetze

On Sunday, 11 May 2014 at 18:18:41 UTC, Rainer Schuetze wrote:
> For a reasonable GC I currently see 2 possible directions:
>
> 1. Use a scheme that takes a snapshot of the heap, stack and registers at the moment of collection and do the actual collection in another thread/process while the application can continue to run. This is the way Leandro Lucarellas concurrent GC works (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork" that doesn't exist on every OS/architecture. A manual copy of the memory won't scale to very large memory, though it might be compressed to possible pointers. Worst case it will need twice as much memory as the current heap.
>
> It would be very interesting how far we can push this model on the supported platforms.
>
> 2. Change the compiler to emit (library defined) write barriers for modifications of (possible) pointers. This will allow to experiment with more sophisticated GC algorithms (if the write barrier is written in D, we might also need pointers without barriers to implement it). I know Walter is against this, and I am also not sure if this adds acceptable overhead, but we don't have proof of the opposite, too.
>
> As we all know, the usual eager reference counting with atomic operations is not memory-safe, so my current favorite is "concurrent buffered reference counting" (see chapter 18.2/3 "The garbage collection handbook" by Richard Jones et al): reference count modifications are not performed directly by the write barrier, but it just logs the operation into a thread local buffer. This is then processed by a collector thread which also detects cycles (only on candidates which had their reference count decreased during the last cycle). Except for very large reference chains this scales with the number of executed pointer modifications and not with the heap size.

I'm surprised that you didn't include:

3. Thread-local GC, isolated zones (restricting where references to objects of a particular heap can be placed), exempting certain threads from GC completely, ...

May 12, 2014

Re: More radical ideas about gc and reference counting

Posted by Timon Gehr
in reply to Walter Bright

Timon Gehr

Posted in reply to Walter Bright

On 05/12/2014 10:54 AM, Walter Bright wrote:
> On 5/11/2014 10:57 PM, Marco Leise wrote:
>> Am Sun, 11 May 2014 17:50:25 -0700
>> schrieb Walter Bright <newshound2@digitalmars.com>:
>>
>>> As long as those pointers don't escape. Am I right in that one cannot
>>> store a
>>> borrowed pointer into a global data structure?
>>
>> Right, and that's the point and entirely positive-to-do™.
>
> This means that a global data structure in Rust has to decide what
> memory allocation scheme its contents must use,

Global variables are banned in Rust code outside of unsafe blocks.

> and cannot (without tagging) mix memory allocation schemes.
> ...

Tagging won't help with all memory allocation schemes.

> For example, let's say a compiler has internally a single hash table of
> strings. With a GC, those strings can be statically allocated, or on the
> GC heap, or anything with a lifetime longer than the table's.
> But I don't see how this could work in Rust.
>

It's possible if you don't make the table global.
(OTOH in D this is not going to work at all.)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation