May 11, 2014
On Sunday, 11 May 2014 at 09:53:59 UTC, Manu via Digitalmars-d wrote:
> On 11 May 2014 17:52, Benjamin Thaut via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> Am 06.05.2014 05:40, schrieb Manu via Digitalmars-d:
>>
>>> I support the notion that if the GC isn't removed as a foundational
>>> feature of D, then destructors should probably be removed from D.
>>>
>>> That said, I really want my destructors, and would be very upset to
>>> see them go. So... ARC?
>>>
>>
>> I think ARC could work, but should be extended with some sort of ownership
>> notation. Often a block of memory (e.g. an array of data) is exclusively
>> owned by a single object, so it would be absolutely unnecessary to reference
>> count that block of memory. Instead I would want something like Rust's
>> borrowed pointers. (We actually already have that, "scope", but it's not
>> defined nor implemented for anything but delegates.)
>
> Indeed, I also imagine that implementation of 'scope' would allow for
> a really decent ARC experience. D already has some advantages over
> other languages, but that one would be big.

Yes, together with an opImplicitCast of some sort it could probably even be implemented as a pure library type (std.typecons.RefCounted). std.typecons.scoped can also benefit from this.

This would allow safe implicit casting to non-RC types, i.e. passing to a function that accepts a non-RC scope parameter. This is extremely important if we want to be able to use RC types only in some cases; otherwise it would need to "infect" everything in order to be safe.
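For illustration, here is a rough C++ sketch (all names invented, not an actual Phobos design) of the kind of conversion being proposed: a reference-counted handle that implicitly converts to a non-owning pointer, so a function taking a plain "scope"-style parameter can accept it without any reference-count traffic:

```cpp
#include <cassert>

// Sketch only; names are invented. A reference-counted handle that
// implicitly converts to a plain non-owning pointer, so functions taking
// a "scope"-style parameter accept it without touching the count.
template <typename T>
class RefCounted {
    struct Payload { T value; int refs; };
    Payload* p;
public:
    explicit RefCounted(T v) : p(new Payload{v, 1}) {}
    RefCounted(const RefCounted& o) : p(o.p) { ++p->refs; }
    RefCounted& operator=(const RefCounted&) = delete;
    ~RefCounted() { if (--p->refs == 0) delete p; }
    int refs() const { return p->refs; }
    // The implicit "borrow": no reference count traffic at the call site.
    operator const T*() const { return &p->value; }
};

// A function with a non-RC, non-owning ("scope") parameter: it may read
// the value but must not keep the pointer beyond the call.
int read(const int* borrowed) { return *borrowed; }
```

The safety of the conversion rests entirely on the callee not escaping the pointer, which is exactly what a checked 'scope' would have to enforce.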
May 11, 2014
The vast majority of software, at least as far as I can see, uses web services. That makes up the vast majority of software on my Android phone. Garbage collection is definitely applicable for web servers, so there is a huge area where D and a garbage collector can apply nicely. I think the argument that the vast majority of software should be real time now is very weak; I wouldn't argue that. I would simply argue that garbage collection isn't applicable to real time software, because that is a given.

I'm really not sure how anything but a manual memory management allocation scheme could work with real time software. It seems to me that if you are writing software where any cost in time is absolutely critical, and you must know exactly when you are allocating and deallocating, then the best you can hope to do is to write these things yourself.

I don't think it's possible for a computer out there to manage time for you at the most fundamental level, managing memory. If I were to write a real time application, I would not interact with a garbage collector and use primarily small data structures on a stack. If I needed to allocate objects on a heap, I would use something I could resize and destroy pretty manually, or at least in a scoped manner, like std::vector. I can't see how garbage collection or automatic reference counting would help me. I would want to have primarily scoped or unique references to data, not shared references.
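As a concrete illustration of that scoped, mostly-manual style (my example, not a specific real-time codebase), here is the std::vector pattern: allocate once up front, reuse the buffer inside the time-critical loop without allocating, and let RAII free it deterministically at end of scope:

```cpp
#include <cassert>
#include <vector>

// Illustration of the scoped, manual style described above: heap
// allocation happens once up front, the time-critical loop reuses the
// buffer without allocating, and RAII releases it deterministically.
int sum_frames(int frames, int samples_per_frame) {
    std::vector<int> buffer;
    buffer.reserve(samples_per_frame);   // one allocation, outside the hot path
    int total = 0;
    for (int f = 0; f < frames; ++f) {
        buffer.clear();                  // keeps capacity: no reallocation
        for (int s = 0; s < samples_per_frame; ++s)
            buffer.push_back(f + s);     // stand-in for real per-frame work
        for (int v : buffer)
            total += v;
    }
    return total;                        // buffer is freed here, deterministically
}
```

The point is that every allocation and deallocation happens at a moment the programmer chose, not at a moment a collector chose.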
May 11, 2014
Am 11.05.2014 12:57, schrieb w0rp:
> The vast majority of software, at least as far as I can see, uses web
> services. That makes up the vast majority of software on my Android
> phone. Garbage collection is definitely applicable for web servers, so
> there is a huge area where D and a garbage collector can apply nicely. I
> think the argument that the vast majority of software should be real
> time now is very weak; I wouldn't argue that. I would simply argue that
> garbage collection isn't applicable to real time software, because that
> is a given.
>
> I'm really not sure how anything but a manual memory management
> allocation scheme could work with real time software. It seems to me
> that if you are writing software where any cost in time is absolutely
> critical, and you must know exactly when you are allocating and
> deallocating, then the best you can hope to do is to write these things
> yourself.
>
> I don't think it's possible for a computer out there to manage time for
> you at the most fundamental level, managing memory. If I were to write a
> real time application, I would not interact with a garbage collector and
> use primarily small data structures on a stack. If I needed to allocate
> objects on a heap, I would use something I could resize and destroy
> pretty manually, or at least in a scoped manner, like std::vector. I
> can't see how garbage collection or automatic reference counting would
> help me. I would want to have primarily scoped or unique references to
> data, not shared references.

Apparently the customers of Aicas, Aonix, IBM and IS2T have a different opinion.

--
Paulo
May 11, 2014
On Tuesday, 6 May 2014 at 03:40:47 UTC, Manu via Digitalmars-d wrote:
> On 3 May 2014 18:49, Benjamin Thaut via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> Am 30.04.2014 22:21, schrieb Andrei Alexandrescu:
>>>
>>> Walter and I have had a long chat in which we figured our current
>>> offering of abstractions could be improved. Here are some thoughts.
>>> There's a lot of work ahead of us on that and I wanted to make sure
>>> we're getting full community buy-in and backup.
>>>
>>> First off, we're considering eliminating destructor calls from within
>>> the GC entirely. It makes for a faster and better GC, but the real
>>> reason here is that destructors are philosophically bankrupt in a GC
>>> environment. I think there's no need to argue that in this community.
>>>
>>> The GC never guarantees calling destructors even today, so this decision
>>> would be just a point in the definition space (albeit an extreme one).
>>>
>>> That means classes that need cleanup (either directly or by having
>>> fields that are structs with destructors) would need to garner that by
>>> other means, such as reference counting or manual. We're considering
>>> deprecating ~this() for classes in the future.
>>>
>>> Also, we're considering a revamp of built-in slices, as follows. Slices
>>> of types without destructors stay as they are.
>>>
>>> Slices T[] of structs with destructors shall be silently lowered into
>>> RCSlice!T, defined inside object.d. That type would occupy THREE words,
>>> one of which being a pointer to a reference count. That type would
>>> redefine all slice primitives to update the reference count accordingly.
>>>
>>> RCSlice!T will not convert implicitly to void[]. Explicit cast(void[])
>>> will be allowed, and will ignore the reference count (so if a void[]
>>> extracted from a T[] via a cast outlives all slices, dangling pointers
>>> will ensue).
>>>
>>> I foresee any number of theoretical and practical issues with this
>>> approach. Let's discuss some of them here.
>>>
>>>
>>> Thanks,
>>>
>>> Andrei
>>
>>
>> Honestly, that sounds like the entirely wrong approach to me. You're
>> approaching the problem in this way:
>>
>> "We cannot implement a proper GC in D because the language design prevents
>> us from doing so. So let's remove destructors to mitigate the issue of false
>> pointers."
>>
>> While the approach should be:
>>
>> "The language does not allow implementing a proper GC (anything other than
>> a dirty mark & sweep), so what needs to be changed to allow an
>> implementation of a more sophisticated GC?"
>
> Couldn't agree more.
> Abandoning destructors is a disaster.
> Without destructors, you effectively have manual memory management, or
> rather, manual 'resource' management, which is basically the same
> thing, even if you have a GC.
> It totally undermines the point of memory management as a foundational
> element of the language if most things are to require manual
> release/finalisation/destruction or whatever you wanna call it.
>
>
>> Also let me tell you that at work we have a large C# codebase which heavily
>> relies on resource management. So basically every class in there inherits
>> from C#'s IDisposable interface which is used to manually call the finalizer
>> on the class (but the C# GC will also call that finalizer!). Basically the
>> entire codebase feels like manual memory management. You have to think about
>> manually destroying every class and the entire advantage of having a GC,
>> e.g. not having to think about memory management and thus being more
>> productive, vanished. It really feels like writing C++ with C# syntax. Do we
>> really want that for D?
>
> This is interesting to hear someone else say this. I have always found
> C# - an alleged GC language - to result in extensive manual memory
> management in practice too.
> I've ranted enough about it already, but I have come to the firm
> conclusion that the entire premise of a mark&sweep GC is practically
> corrupt. Especially in D.
> Given this example that you raise with C#, and my own experience that
> absolutely parallels your example, I realise that GC's failure extends
> into far more cases than just the ones I'm usually representing.
>
> I also maintain that GC isn't future-proof in essence. Computers grow
> exponentially, and GC performance inversely tracks the volume of
> memory in the system. Anything with an exponential growth curve is
> fundamentally not future-proof.
> I predict a 2025 Wikipedia entry: "GC was a cute idea that existed for
> a few years in the early 2000's while memory ranged in the 100's mb -
> few gb's, but quickly became unsustainable as computer technology
> advanced".
>
>
>> And what if I want unsafe slices of structs with destructors, for
>> performance? Maybe I perfectly know that the memory behind the slice will
>> outlive the slice, and I don't want the overhead of all the reference
>> counting behind it?
>>
>> If you actually deprecate ~this, there would be two options for me.
>> 1) Migrate my entire codebase to some user-defined finalizer function (which
>> doesn't have compiler support), which would be a lot of work.
>
> Does ~this() actually work, or just usually work?
> Do you call your destructors manually like C#?
>
>> 2) Quit D. (which is becoming more and more an option when reading the
>> recent news group discussions.)
>
> I'm starting to fear the same outcome for myself.
> I don't have any idea how to reconcile this problem in my working
> environment, and very little community sympathy. I'm not seeing real
> solutions emerging, and the only one I can imagine that's even
> theoretically possible is wildly unpopular (ARC).
> For years, I just (naively?) assumed that the GC was immature, and
> would improve with time. Never gave it much thought; assumed there
> were people much smarter than me with a plan...
>
> I can't imagine any way out of this without significant breaking
> changes to the type system. Probably a new pointer type at least.
>
>
> This thread is starting to convince me that the GC should probably be
> repositioned as a convenience library provided *beside* the language,
> rather than a foundation of the language. It should be exclusively
> opt-in, not opt-out, and upon opting-in, you accept the associated
> problems.
> The revelation that established GC languages like C# are actually
> broken too hadn't occurred to me until now, but if I had to nominate
> the single biggest disaster in C# from my experience, that's certainly
> it; it's built on a GC, but isn't really compatible with it either.
> I'm constantly cleaning up manually in C#, which leaves very little
> value in the feature, and a definite tendency to produce unreliability
> by users who aren't experts on garbage collection and presume it
> should 'just work'.
>
> I support the notion that if the GC isn't removed as a foundational
> feature of D, then destructors should probably be removed from D.
> That said, I really want my destructors, and would be very upset to
> see them go. So... ARC?

You make some good arguments for ARC, but I think most of the discussion here is lacking in expertise. We need at least one serious memory management expert, possibly academic, to really set straight what is and isn't possible in D.
May 11, 2014
On Sunday, 11 May 2014 at 08:59:42 UTC, Walter Bright wrote:
>
> D also cannot be performance competitive with C++ if pervasive ARC is used and memory safety is retained. Rust is attempting to solve this problem by using 'borrowed' pointers, but this is unproven technology, see my reply to Manu about it.

I work in a C++ shop, and as I see it, resource management is becoming a solved problem:

- resource owners hold a std::unique_ptr<T> on them. Resource release is normally taken care of by C++ destructors. That means that to be exception-safe, each resource type had better have its own class.
- resource users eventually "borrow" the resource by taking a raw pointer out of the unique pointer. What Rust would do with lifetimes here is ensure that the resource is still there, since move semantics seem pervasive in that language. In C++ we ensure the resource holder outlives the users.
- std::shared_ptr is not needed with such constraints. This means no cycles in the object graph. TBH I have yet to find a dependency scheme that can't be done that way.

When I use D, I can't help but think that releasing resources feels more manual and error-prone ("oops, that resource should have been a struct, not a class" and similar traps).

I do not have huge concerns about D's GC, but I would be glad to have more support for owned pointers (i.e. Unique!T in Phobos, or better). I have no idea how to make it safe, i.e. ensure the resource outlives its users.
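For concreteness, the owner/borrower scheme described above can be sketched in a few lines of C++ (type names invented for illustration):

```cpp
#include <cassert>
#include <memory>

// The ownership scheme above, sketched directly (type names invented):
// the owner holds a std::unique_ptr, users borrow a raw pointer, and
// std::shared_ptr never appears.
struct Texture { int width; };

class Renderer {
    std::unique_ptr<Texture> tex;        // exclusive owner; freed in ~Renderer
public:
    Renderer() : tex(new Texture{256}) {}
    // Non-owning borrow: the borrowed pointer must not outlive this Renderer.
    Texture* borrow() { return tex.get(); }
};

int width_of(const Texture* t) { return t->width; }  // a "user" of the resource
```

Nothing here is checked by the compiler; it works only because the holder is guaranteed by convention to outlive every borrower, which is exactly the gap Rust's lifetimes close.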

May 11, 2014
On 2014-05-11 08:29:13 +0000, Walter Bright <newshound2@digitalmars.com> said:

> Again, O-C and C++/CX ARC are not memory safe because in order to make it perform they provide unsafe escapes from it.

But D could provide memory-safe escapes. If we keep the current GC to collect cycles, we could also allow raw pointers managed by the GC alongside ARC.

Let's say we have two kinds of pointers: rc+gc pointers (the default) and gc_only pointers (on demand). When assigning from a rc+gc pointer to a gc_only pointer, the compiler emits code that disables destruction via the reference counting. This makes the GC solely responsible for destructing and deallocating that memory block. You can still assign the pointer to a rc+gc pointer later on, but the reference count is no longer reliable which is why RC-based destruction has been disabled.
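To make the proposal concrete, here is a toy C++ model of the idea (all names invented; in real D the compiler would emit these operations, and the tracing GC would do the actual reclamation):

```cpp
#include <cassert>

// Toy model of the proposal; all names are invented. Each block carries a
// reference count plus a flag. Creating a gc_only alias sets the flag,
// permanently disabling RC-driven destruction; in D, the tracing GC would
// then be solely responsible for reclaiming the block.
struct Block {
    int refs = 1;
    bool rc_disabled = false;   // set once a gc_only pointer exists
    bool destroyed = false;     // stands in for running the destructor
};

void retain(Block* b)  { if (!b->rc_disabled) ++b->refs; }

void release(Block* b) {
    if (b->rc_disabled) return;            // the GC now owns destruction
    if (--b->refs == 0) b->destroyed = true;
}

Block* as_gc_only(Block* b) {              // the compiler-emitted escape
    b->rc_disabled = true;
    return b;
}
```

The interesting property is that the escape is memory-safe: the worst that can happen is that destruction is deferred from the RC machinery to the collector.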

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

May 11, 2014
On Sunday, 11 May 2014 at 12:52:29 UTC, Michel Fortin wrote:
> On 2014-05-11 08:29:13 +0000, Walter Bright <newshound2@digitalmars.com> said:
>
>> Again, O-C and C++/CX ARC are not memory safe because in order to make it perform they provide unsafe escapes from it.
>
> But D could provide memory-safe escapes. If we keep the current GC to collect cycles, we could also allow raw pointers managed by the GC alongside ARC.
>
> Let's say we have two kinds of pointers: rc+gc pointers (the default) and gc_only pointers (on demand). When assigning from a rc+gc pointer to a gc_only pointer, the compiler emits code that disables destruction via the reference counting. This makes the GC solely responsible for destructing and deallocating that memory block. You can still assign the pointer to a rc+gc pointer later on, but the reference count is no longer reliable which is why RC-based destruction has been disabled.

You know, this doesn't sound that far off from what Python does, unless I'm completely wrong about it. I believe Python uses reference counting and uses a GC to collect cycles, much as you have described. I'm not sure how efficient it is; Python people don't tend to talk about speed that much.
May 11, 2014
On 05/11/2014 10:29 AM, Walter Bright wrote:
> ...
>
> ------------- A Comment on Rust ------------------
>
> This is based on my very incomplete knowledge of Rust, i.e. just reading
> a few online documents on it. If I'm wrong, please correct me.
> ...

Well, region-based memory management is not new, and Rust's approach is natural, even more so when given some background.

Have you ever read up on type systems? Eg:

http://www.cis.upenn.edu/~bcpierce/tapl/
http://www.cis.upenn.edu/~bcpierce/attapl/

> Rust's designers apparently are well aware of the performance cost of
> pervasive ARC. Hence, they've added the notion of a "borrowed" pointer,
> which is an escape from ARC.

It is not an escape from ARC per se. It is a way to write type safe code which is not dependent on the allocation strategy of the processed data. (One can e.g. safely borrow mutable data as immutable and the type system ensures that during the time of the borrow, the data does not mutate.)

> The borrowed pointer is made memory safe by:
>
> 1. Introducing restrictions on what can be done with a borrowed pointer
> so the compiler can determine its lifetime. I do not know the extent of
> these restrictions.
> ...

The type system tracks lifetimes across function and data structure boundaries. The main restriction is that such a pointer cannot be escaped. (But borrowed pointers may still be stored in data structures.)

> 2. Introducing an annotation to distinguish a borrowed pointer from an
> ARC pointer. If you don't use the annotation, you get pervasive ARC with
> all the poor performance that entails.
> ...

No, this is completely inaccurate: Both choices are explicit, and reference counting is just one of the possible memory management schemes. (Hence borrowed pointers are used unless there is a reason not to.)

> Implicit in borrowed pointers is Rust did not solve the problem of
> having the compiler eliminate unnecessary inc/dec.
> ...

Avoiding inc/dec is not what justifies borrowed pointers.

>
> My experience with pointer annotations to improve performance is pretty
> compelling - almost nobody adds those annotations. They get their code
> to work with the default, and never get around to annotating it.

The 'default' way to pass by reference is by borrowed pointer.

> ...
> They do not mix. A function taking one type of
> pointer cannot be called with the other type.
> ...

A function taking a borrowed pointer can be called with a reference counted pointer. Abstracting over allocation strategy is the point of borrowing.
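A rough C++ analogy of that point (with the caveat that C++ does not check lifetimes the way Rust's borrow checker does): a function that merely borrows takes a plain reference and is callable regardless of how its argument is owned:

```cpp
#include <cassert>
#include <memory>

// The borrowing function takes a plain reference and neither knows nor
// cares how its argument is owned.
int doubled(const int& borrowed) { return borrowed * 2; }

// The same function serves stack data, uniquely owned data, and
// reference-counted data alike.
int demo() {
    int on_stack = 3;
    auto uniq = std::make_unique<int>(4);    // unique ownership
    auto shared = std::make_shared<int>(5);  // reference counted
    return doubled(on_stack) + doubled(*uniq) + doubled(*shared);
}
```

In Rust the equivalent borrow would additionally be proven not to outlive any of the three owners; here that proof is left to the programmer.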

> Worse, these effects are transitive, making a function hierarchy rather
> inflexible.
>
> Are these valid concerns with Rust?

I don't think they are.

May 11, 2014
Am Wed, 07 May 2014 06:50:33 +0000
schrieb "Paulo Pinto" <pjmlp@progtools.org>:

> A *nix package manager is a brain-dead idea for software development, as it ties the language libraries to the specific OS one is using.

What do you have in mind here? I don't quite get the picture. Are you opposed to a developer writing a package only for his pet distribution of Linux and ignoring all others?

The typical packages I know come with some configure script
and offer enough hooks to custom-tailor the installation, so
the library/application can work on any distribution.
Most Linux software is open source, and its packages are
maintained by the community around the specific
distribution. That doesn't preclude the use of package
managers like Cabal, CPAN, Maven, you name it. But for a
systems language, integration with the existing C/C++ is of
utmost importance. After all, D compiles to 'regular' native
binaries. Typically when an application is useful, someone
will add it to their favorite distribution as a package,
including all the libraries as dependencies.

> Good luck getting packages if the author did not consider your OS. Especially if they are only available in binary format, as is standard in the enterprise world.

Then use dub. Oh wait... the packages on code.dlang.org are open source, too. And at least since the curl debacle we know that there is not one binary for all *nix systems. I don't know where you are trying to get with this argument. I think it has nothing to do with what dub strives for and is worth a separate topic "Binary D library distribution".

> With a language package manager I can produce a package XPTO that will work on all OSes; it won't conflict with the system packages, which is especially important on servers used for CI of multiple projects.
> 
> --
> Paulo

What is this XPTO that will magically work on all OSes? I
never heard of it, but I'm almost certain it has to do with
languages that compile at most to machine-independent byte
code.
And why do you run CI on the live system instead of a chroot
environment, if you are afraid of messing it up? :) I do trust
my package manager to correctly install libraries into a
chroot. It is as simple as prepending an environment variable
override. As a bonus you can then cleanly uninstall/update
libs in the chroot environment with all the sanity checks the
package manager may offer.
A language package manager is a good idea, but there are
certain limits to it once you leave the development stage. At
that point the system package manager takes over. Both should
be considered with equal care.

-- 
Marco

May 11, 2014

On 11.05.2014 10:22, Benjamin Thaut wrote:
> Am 10.05.2014 19:54, schrieb Andrei Alexandrescu:
>>
>>> The next sentence goes on to list the advantages of RC (issues we have
>>> wrestled with, like destructors), and then goes on to say the recent
>>> awesome RC is within 10% of "the fastest tracing collectors".
>>> Are you suggesting that D's GC is among 'the fastest tracing
>>> collectors'? Is such a GC possible in D?
>>
>> I believe it is.
>>
>
> While it might be possible to implement a good GC in D, it would require
> major changes in the language and its libraries. In my opinion it would
> be way more work to implement a proper GC than to implement ARC.
>
> Every state of the art GC requires precise knowledge of _all_ pointers.
> And that's exactly what we currently don't have in D.

I think most garbage collectors can work with a number of false pointers. The referenced memory area has to be treated as pinned and cannot be moved. Limiting the false pointers to the stack and registers seems like a reasonable compromise, though most of the stack could even be annotated. Code for this already exists in the debug info generation, though I suspect stack tracing could be unreliable.

Here's my current stance on the GC discussions:

I agree that the current GC is pretty lame, even if it were precise. "Stop-the-World" with complete tracing does not work for any interactive application that uses more than a few hundred MB of garbage collected memory (with or without soft-realtime requirements). Other applications with larger allocation requirements are easily dominated by collection time. Proposing to use manual memory management instead is admitting failure to me.

For a reasonable GC I currently see 2 possible directions:

1. Use a scheme that takes a snapshot of the heap, stack and registers at the moment of collection and does the actual collection in another thread/process while the application continues to run. This is the way Leandro Lucarella's concurrent GC works (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork", which doesn't exist on every OS/architecture. A manual copy of the memory won't scale to very large heaps, though it might be compressed down to just the possible pointers. In the worst case it will need twice as much memory as the current heap.

It would be very interesting how far we can push this model on the supported platforms.

2. Change the compiler to emit (library-defined) write barriers for modifications of (possible) pointers. This will allow experimenting with more sophisticated GC algorithms (if the write barrier is written in D, we might also need pointers without barriers to implement it). I know Walter is against this, and I am also not sure whether it adds acceptable overhead, but we don't have proof of the opposite either.

As we all know, the usual eager reference counting with atomic operations is not memory-safe, so my current favorite is "concurrent buffered reference counting" (see chapters 18.2/18.3 of "The Garbage Collection Handbook" by Richard Jones et al.): reference count modifications are not performed directly by the write barrier; instead it just logs the operation into a thread-local buffer. This is then processed by a collector thread, which also detects cycles (only on candidates which had their reference count decreased during the last cycle). Except for very long reference chains, this scales with the number of executed pointer modifications and not with the heap size.
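A drastically simplified, single-threaded sketch of that buffered scheme in C++ (my own illustration; the real algorithm in the book is concurrent and also performs the cycle detection mentioned above):

```cpp
#include <cassert>
#include <vector>

// The write barrier never touches a count: it only logs the old and new
// targets of a pointer store. A later "collector" pass applies the
// deferred increments and decrements in batch. (The real algorithm also
// handles threads and cycle detection.)
struct Obj { int rc = 0; bool freed = false; };

struct LogEntry { Obj* old_target; Obj* new_target; };
std::vector<LogEntry> log_buffer;          // thread-local in a real system

// Barrier for `*slot = new_target`: log the operation, then store.
void write_ptr(Obj** slot, Obj* new_target) {
    log_buffer.push_back({*slot, new_target});
    *slot = new_target;
}

// Collector pass: apply the buffered count changes, free dead objects.
void process_log() {
    for (const LogEntry& e : log_buffer) {
        if (e.new_target) ++e.new_target->rc;
        if (e.old_target && --e.old_target->rc == 0) e.old_target->freed = true;
    }
    log_buffer.clear();
}
```

Because the mutator only appends to a local buffer, pointer writes stay cheap and need no atomic read-modify-write; all the expensive work moves to the collector pass.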