January 24, 2003
Hello.

Mind what you say.

Though recursive destruction is sufficient for a great number of problems, for some it is not. I have caught myself organising data nicely in a tree, then handing out a pointer to some small branch somewhere and destroying the tree. Done. The program doesn't use a single byte too much memory. Even less than required. Perfect, right?

Furthermore, I've accidentally deleted an object which was still in use, and the holder never gets notified of it. It just *has* to crash as soon as any access is made.

Such situations make it necessary to keep track of how many users an object has, and this was *sometimes hard* in Delphi. And error-prone.

This manner of organising data in recursively-destroyed structures, with some central point to keep it all, is typical of experienced programmers, and is not required in most cases where GC is used. GC simply requires a different way of managing data.

Avoid common roots. Only actual users should hold references to objects. Then objects are collected as soon as they are of no use and in the way.

People with old habits may someday try GC on some program of theirs which was working well, and notice that GC has spoiled it. The usual cause, though, is not obeying these simple rules, which simplify the design rather than complicate it.

For some uses, though, a GC appears to be of no use. For these cases I see the necessity of a separate heap for by-design non-GCed objects, since in a multi-module program there may still be objects which require GC, like strings and such. This restricts nothing, and the time spent in the "mark" phase would shrink vastly if a significant portion of the heap is not GCed; not to mention it would already be very short once mark methods are implemented.

A simple implementation would be that "non-GCed" objects have an empty or programmer-written "mark" method (that is, a method called by the GC so that an object can mark the objects it is using), and are not deleted even if no other object has marked them. This can be done simply by having them mark themselves, or by more sophisticated means.
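A minimal sketch of this idea, with all names and structure mine (purely illustrative, not any real GC's API): the collector calls a mark() method on each reachable object, and "non-GCed" objects survive every cycle by marking themselves.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical sketch: every object exposes a mark() method the collector
// calls; "non-GCed" objects pin themselves by marking their own entry,
// so the sweep never frees them even when nothing else references them.
struct Object {
    std::vector<Object*> refs;   // objects this one is using
    bool nonGCed = false;        // by-design manually managed

    void mark(std::unordered_set<Object*>& marked) {
        if (!marked.insert(this).second) return;  // already visited
        for (Object* r : refs) r->mark(marked);
    }
};

// One mark phase over a set of roots; returns the set of survivors.
std::unordered_set<Object*> markPhase(const std::vector<Object*>& roots,
                                      const std::vector<Object*>& heap) {
    std::unordered_set<Object*> marked;
    for (Object* r : roots) r->mark(marked);
    for (Object* o : heap)
        if (o->nonGCed) o->mark(marked);  // "marks itself": never collected
    return marked;
}
```

Anything not in the returned set is garbage; the non-GCed heap could simply be skipped by the sweep instead, which is the separate-heap variant described above.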

I admit that my idea, that every object which provides a corresponding method would be informed of the information the GC collects, is not always implementable. In some implementations it is quite possible that the only thing an object learns is whether it was marked at all, but not how often. But I'd like to know how many references to "me" exist at the time of the mark phase, if that's possible.

-i.

PS. Embedded comments follow

 |
 |
\|/
 '

Bill Cox wrote:
> Hi.
> 
> I'm new to this news group, but not to language design.  D can't be all things to all people, so we might as well live with its focus.  Garbage collection is already there, and that defines the scope of the D language about as much as any other feature.
> 
> That said, here's my thoughts about GC in general:
> 
> GC in general buys very little.  Anyone writing complicated programs already has to write recursive destructors to remove objects from relationships with other objects.  The value of GC is simply that one extra line (delete myObject;) doesn't have to be typed.

It is a vast help in simple cases. Don't tell me that no one writes simple programs, or doesn't want to write a complex program in a simple manner. Simple units can very well evolve into complex programs, in D, I hope, to a much greater extent than in C++.

> People argue that GC helps reduce dangling pointer bugs, but rather than a dangling pointer, you've got a memory leak.

Only without a GC, or with non-GC-friendly structures, which in my experience tend not to appear by themselves, but rather by design under conventional memory management. And don't tell me you've never leaked memory... You may have leaked tiny objects and not noticed. I did notice sometimes, but only occasionally.

A good language-supported GC (e.g. Caml's) *can't* leak memory across a full mark cycle, because it collects all memory not being referenced. Simply null out your pointers and you're done. As soon as your application, or possibly some other application, needs memory, the garbage is collected. Do you really care what your "mem-meter" shows if you don't intend to use that memory?

The Boehm GC can leak memory, but the reason is the absence of language support, which leaves it unable to be sure of some things.

> While GC doesn't really hurt a language like D for most applications, it trashes its use (as D's author points out) for real time programming (not going to see any kernels written in D any time soon).

Why? D is not Java. GC is not forced. And otherwise D is inherently not one bit less efficient than C. Rather more, if Walter decides to drop the C calling convention for ordinary functions. :>

> There's also another weak reason I dislike GC: synthesis of datastructures into hardware really doesn't work in a GC based language.

???

>>> If you're doing your own memory allocation code, you have to do a LOT of work, and write a LOT of bug free code, to get to where the Java allocator is.
> 
> 
> In any current language, you'd be right.  However, highly optimized memory management can and should be generated by compilers.  We do this now where I work.  I type no extra code, but get automatically generated free-lists, and bug-free recursive destructors.  It's kind of nice.

Bug-free recursive destruction is not a problem in my eyes. But making sure that you always recursively copy objects rather than give away references to them, or that you manipulate the refcounter correctly, is tedious. GC would improve performance in certain cases.

Copy-on-write? Then you need a ref-counter, or garbage collection. A generational GC would do very well in that case, as Caml shows: it (almost) always uses copy-on-write, e.g. when you change something in a linked list.
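A minimal copy-on-write sketch using a reference count, as mentioned above (the `CowString` class and its names are mine, just an illustration): copies share one buffer, and only a write forces a real copy.

```cpp
#include <memory>
#include <string>

// Copy-on-write via reference counting: std::shared_ptr's use_count()
// serves as the ref-counter described in the text.
class CowString {
    std::shared_ptr<std::string> buf;  // shared buffer; count tracks users
public:
    explicit CowString(std::string s)
        : buf(std::make_shared<std::string>(std::move(s))) {}

    char get(std::size_t i) const { return (*buf)[i]; }

    void set(std::size_t i, char c) {
        if (buf.use_count() > 1)               // shared: copy before write
            buf = std::make_shared<std::string>(*buf);
        (*buf)[i] = c;
    }

    bool sharesBufferWith(const CowString& o) const { return buf == o.buf; }
};
```

The default copy constructor just bumps the count, so copying is cheap; the cost is the `use_count()` check on every write, which is exactly the bookkeeping a GC would let you drop.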

I'd rather argue for system-wide GC usage. Example: data passed to OpenGL is copied. That is ridiculous; simpler would be to use copy-on-change (and OpenGL usually doesn't change data) and to GC the leftovers. Very probably the data wouldn't have to be copied at all then, because applications rarely change most kinds of data while a frame is being drawn. Well, OpenGL includes ways to make data transfers as rare as possible, but their use becomes tedious.

> 
> Bill Cox
> 

January 25, 2003
Hi.

> Though recursive destruction is sufficient for a great number of problems, for some it is not. I have caught myself organising data nicely in a tree, then handing out a pointer to some small branch somewhere and destroying the tree. Done. The program doesn't use a single byte too much memory. Even less than required. Perfect, right?
> Furthermore, I've accidentally deleted an object which was still in use, and the holder never gets notified of it. It just *has* to crash as soon as any access is made.

I didn't make myself clear.  I don't write recursive destructors.  Our data management CASE tool generates them for us, 100% automatically.  It gets it right every time.  We never have any unaccounted-for pointers.  It's fast and productive.  In fact, we don't even buy Purify, even though we program in C.  We just don't need it.  Seg-v's are mostly a thing of the past for us.

People using GC are still writing their own destructors, and still getting them wrong.  A great way to improve productivity is to use CASE tools to do it for you.  I really don't see the need for GC.

As CASE tools are often a sign that the language needs to be extended, I've recently written a language that eliminates the need for the CASE tools we use at work.  You have to go further.  One design goal was for the language to work well with code generators.  Every relationship (container class) generates code in both the parent and child classes, and updates the recursive destructor in each class.  You can't do that with templates.  You don't need a new language to do this, though.  As I said, we do this at work in C with code generators.

>> There's also another weak reason I dislike GC: synthesis of datastructures into hardware really doesn't work in a GC based language.
> 
> 
> ???
> 

Basically, GC makes it harder to implement a program on hardware.  The field of reconfigurable computing is dedicated to doing this.  GC complicates things.

GC in D is fine.  You make your choices in language design, and obviously smart people often include GC.  It just makes the language less attractive for the few of us who need to code on the bleeding edge of speed (we write chip design software).

Bill Cox

January 25, 2003
Bill Cox wrote:
> Hi.
> 
> I didn't make myself clear.  I don't write recursive destructors.  Our data management CASE tool generates them for us, 100% automatically.  It gets it right every time.  We never have any unaccounted-for pointers.  It's fast and productive.  In fact, we don't even buy Purify, even though we program in C.  We just don't need it.  Seg-v's are mostly a thing of the past for us.

What further work does it do?
Recursive destruction is by far not the only problem. How is the rest of memory management dealt with? I mean, how do you prevent destruction of objects still in use, and how do you track down objects not used anymore?

> People using GC are still writing their own destructors, and still getting them wrong.  A great way to improve productivity is to use CASE tools to do it for you.  I really don't see the need for GC.

Destructors are not required anymore, except for the cases where you need to release some hardware resource together with an object. :>  Well, there are a few special cases.

People make mistakes with any model. But IMO making no mistakes with GC is very easy.

Although i'm an advocate of GC, i'm open to every other solution.

> As CASE tools are often a sign that the language needs to be extended, I've recently written a language that eliminates the need for the CASE tools we use at work.  You have to go further.  One design goal was for the language to work well with code generators.  Every relationship (container class) generates code in both the parent and child classes, and updates the recursive destructor in each class.  You can't do that with templates.  You don't need a new language to do this, though.  As I said, we do this at work in C with code generators.

Describe the extensions you consider useful in complete detail, otherwise it's of no use to Walter. Or give a link.

I appreciate that you have *done* something (esp. something that works), but what you've written here so far gives me no idea of what it is or how it should work. I'm not well-informed about CASE in general, and I bet most of the newsgroup isn't either.

There are currently multiple directions in which D should go. First, it has to become solid in its current form. Second, backends have to be written for different kinds of use, e.g. a very portable VM. And then I'll be making an effort to make writing compilers in D, and language extensions to D, easy, and I'll gladly take your proposal into account. Currently I'm looking at writing a COCOM-like library for D, enabling fast and flexible parsing and syntax tree transformation.

> Basically, GC makes it harder to implement a program on hardware.  The field of reconfigurable computing is dedicated to doing this.  GC complicates things.

Wouldn't you want to write a program in VHDL or the like if you wanted to get hardware as output? Well, I can imagine that nowadays it is possible with higher-level languages. But isn't D overkill?

Oh, you're working for an ASIC company. Then obviously "hard" management is better for *you* than a GC. But how should it work? A ref-counter can be a drop-in replacement for a GC (less reliable though), and how do you integrate the CASE tool?

-i.

January 25, 2003
Hi, Ilya.

Ilya Minkov wrote:
> Bill Cox wrote:
> 
>> Hi.
>>
>> I didn't make myself clear.  I don't write recursive destructors.  Our data management CASE tool generates them for us, 100% automatically.  It gets it right every time.  We never have any unaccounted-for pointers.  It's fast and productive.  In fact, we don't even buy Purify, even though we program in C.  We just don't need it.  Seg-v's are mostly a thing of the past for us.
> 
> 
> What further work does it do?
> Recursive destruction is by far not the only problem. How is the rest of memory management dealt with? I mean, how do you prevent destruction of objects still in use, and how do you track down objects not used anymore?

Ok.  Here's a description of our CASE tool:

The CASE tool (DataDraw) is pretty simple.  It's open-source and compiles under Visual C++ on Windows.  I use it on Linux with Wine (a Windows emulator).  You draw your class schema much as you would with a UML editor, or a database entity-relationship schema editor.  You add properties to classes by double-clicking on them and adding the fields.  Then, to add relationships (container classes) between classes, you select a relationship drawing tool and draw arrows between the classes.  By double-clicking on the relationship, you get to set its type: one-to-one, linked list, tail-linked list (linked list with a pointer to the last element), doubly-linked list, static array, dynamic array, name-based hash table, or static inheritance (standard C++-style inheritance).  You also get to specify whether or not the children should be deleted when the parent is deleted.  There are a couple of other options: relationship names, and whether or not there is a back pointer to the parent embedded in the child.

After describing your data structures, it will generate code for you. There are several code generators to choose from, mostly variations of C coding styles, and one C++ style.  We use one of the C styles because it gives us dynamic inheritance, which is a big deal for the kind of software we write (integrated chip design tools that run off a common database).  Static inheritance is great when you are constructing objects from scratch.  Our objects already exist in the database, and we need to extend them on the fly.  The code generator we use does this with no memory or speed penalty, since it represents all the class data in individual arrays of data.  To extend all the objects of a given class, the memory manager just allocates more arrays.  Instead of using C-style pointers, the code generator typedefs an index as the reference type for the class.  We get strong type checking in debug mode by typedefing the handles as pointers to dummy classes, and casting them to ints when accessed.  That sounds ugly, but it's all buried under the hood by automatically generated access macros that hide it all.

The memory managers in two of the code generators are quite advanced. First, each class has options you can select for how to manage its memory.  Objects of some classes are never individually deleted, only all at once.  For those classes, the structures are put into a dynamic array.  Allocating an object typically is just a compare and an increment (three machine instructions, I think).  For most classes, however, you need both create and delete on the fly.  The code generator that generates more normal C code allocates memory one page at a time, and then sub-allocates your objects of a given class out of those pages, which optimizes both paging and cache hits.  Free lists are automatically generated and maintained.  Structures are also automatically aligned, with fields sorted by size.  The funkier code generator we use (the one that uses indexes into arrays instead of pointers to structures) uses dynamic arrays for all properties of all classes.  We've tested the performance of our code with both generators, and generally see a 10-20% speed increase using the dynamic array representation instead of structures.  This memory manager uses the trick of allocating 50% more memory each time a dynamic array needs to grow.
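The "indexes into arrays" representation described above can be sketched roughly as follows (the names `NodeRef` and `NodeTable` are mine, not DataDraw's; this only illustrates the layout, not the generated macros):

```cpp
#include <cstdint>
#include <vector>

// Each property of a class lives in its own dynamic array, and an object
// reference is a plain index into those parallel arrays, not a pointer.
using NodeRef = std::uint32_t;           // the "pointer" type for Node

struct NodeTable {
    std::vector<int>     value;          // one array per property
    std::vector<NodeRef> next;           // embedded linked-list field

    NodeRef create(int v) {              // allocate = append to each array
        value.push_back(v);              // vector growth amortises like the
        next.push_back(0);               // 50%-overallocation trick above
        return static_cast<NodeRef>(value.size() - 1);
    }
};
```

Extending all objects of a class at once is then just adding another property array, which is how the dynamic-inheritance trick avoids any per-object cost.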

Recursive destructors are also generated, as are some other useful functions.  The recursive destructors delete each child in any relationship marked as needing the children deleted.  Other relationships are patched.  For example, code to remove the object from linked lists is generated.  Of particular interest to us is the very efficient code for our container classes (relationships).  A linked-list relationship generated by DataDraw embeds the next pointer in the child class, which is a far more efficient implementation, in terms of both speed and memory.

In debug mode, all memory accesses are checked, including array indexing.

Some of the code generators also generate binary load/save, and full database integrity self-checks.  However, we've come to the conclusion that simple binary dumps make a poor save format, since it's hard to remain backwards compatible.  Also, since our pointers almost never get trashed, the database integrity self-check is no longer used either.

There are ways our database can be trashed.  Users of fscanf run into it.  We sometimes get seg-v's with too few parameters to printf.  All in all, though, our coding is pretty safe from these kinds of bugs.

...

>> As CASE tools are often a sign that the language needs to be extended, I've recently written a language that eliminates the need for the CASE tools we use at work.  You have to go further.  One design goal was for the language to work well with code generators.  Every relationship (container class) generates code in both the parent and child classes, and updates the recursive destructor in each class.  You can't do that with templates.  You don't need a new language to do this, though.  As I said, we do this at work in C with code generators.
> 
> 
> Describe the extentions you consider useful in complete detail, else it's no use for Walter. Or give a link.

I've written a simple code translator from a language that incorporates many of the features of the DataDraw CASE tool.  To add the full power of code generators, you need to be able to compile and run code that generates code to be compiled.  This is basically templates on steroids.

I'm not familiar with the basic container class packages in D yet, so I'll just write out pseudo-code for an example code generator to add fields for a linked list to two classes.

generateRelFields(Relationship rel)
{
    Class parent = rel.parentClass;
    Class child = rel.childClass;
    if(rel.type == LINKED_LIST) {
        generate(parentName:parent.name, childName:child.name, relName:rel.name) {
            extend class <parentName> {
                struct <relName> {
                    <childName> first;
                }
                add(<childName> child) {
                    child.<relName>.next = this.<relName>.first;
                    this.<relName>.first = child;
                }
                remove(<childName> child) {
                    ...
                }
            }
            extend class <childName> {
                struct <relName> {
                    <childName> next;
                    <parentName> owner;
                }
            }
        }
        if(rel.deleteChildren) {
            generate(parentName:parent.name, childName:child.name, relName:rel.name) {
                extend class <parentName> {
                    extend destroy() {
                        <childName> child, nextChild;
                        for(child = this.<relName>.first; child != null; child = nextChild) {
                            nextChild = child.<relName>.next;
                            child.destroy();
                        }
                    }
                }
                extend class <childName> {
                    extend destroy() {
                        if(child.<relName>.owner != null) {
                            child.<relName>.owner.remove(child);
                        }
                    }
                }
            }
        }
    }
}

Note the addition of the generate statement, which is similar to templates.  Also notice that classes, methods, and functions can all be extended.  This is a code-generator friendly syntax.

>> Basically, GC makes it harder to implement a program on hardware.  The field of reconfigurable computing is dedicated to doing this.  GC complicates things.
> 
> 
> Wouldn't you want to write a program in VHDL or such if you wanted to get hardware as output? Well, i can imagine that nowadays it is possible with higher-level languages. But isn't D an overkill?
> 
> Oh, you're working for an ASIC company. Then obviously "hard" management is better for *you* than a GC. But how should it work? A ref-counter can be a drop-in replacement for a GC (less reliable though), and how do you make the inegration of CASE?

VHDL (or Verilog) is the standard way to describe hardware.  Note that VHDL has a generate statement for generating hardware.  IMO, it's VHDL's single most important advantage over Verilog.

The reconfigurable computing guys are trying to compile normal programs into hardware.  It shouldn't be a goal of D, but it is something I dabble in.

Bill Cox

January 28, 2003
Hello.

There exists a GC algorithm which can run in parallel, without stopping execution at all, and is thus suitable for realtime systems.

Three Colour Marking GC

Each object has a marking field, which can hold one of three colours:
 - white: objects to which no references have been found yet. All objects start out white;
 - grey: the object is known to be reachable (referenced), but hasn't been scanned for references yet;
 - black: completely inspected. It shall not be revisited by the scanner unless it changes colour.

The only runtime modification is that whenever pointers are modified, the objects containing them have to be re-coloured from black to grey. Once there are no more grey objects, the white ones can be collected.

That's how it should work in principle. You can imagine sufficiently efficient implementations.
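The scheme above, including the write barrier that re-greys a black object when one of its pointers changes, can be sketched like this (an illustrative toy, not a production collector; all names are mine):

```cpp
#include <deque>
#include <vector>

enum class Colour { White, Grey, Black };

struct Obj {
    Colour colour = Colour::White;   // all objects start out white
    std::vector<Obj*> refs;
};

struct TriColourGC {
    std::deque<Obj*> grey;           // worklist of grey objects

    void shade(Obj* o) {             // white -> grey
        if (o && o->colour == Colour::White) {
            o->colour = Colour::Grey;
            grey.push_back(o);
        }
    }

    // Write barrier: run on every pointer store into src. A black
    // container goes back to grey so it will be rescanned.
    void writeBarrier(Obj* src) {
        if (src->colour == Colour::Black) {
            src->colour = Colour::Grey;
            grey.push_back(src);
        }
    }

    // One incremental step; can be interleaved with the mutator.
    // Returns false when no grey objects remain (whites are garbage).
    bool step() {
        if (grey.empty()) return false;
        Obj* o = grey.front(); grey.pop_front();
        for (Obj* r : o->refs) shade(r);
        o->colour = Colour::Black;
        return true;
    }
};
```

Marking starts by shading the roots; each `step()` is small and bounded, which is what makes the algorithm attractive for realtime use.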

This algorithm, in a modified form, is also used in OCaml; that's how it comes that games are written in it. Since its origins in the mid-seventies it has spawned a number of related algorithms.

Hm. This leads again to the question of flexible code generation, so that it's possible to plug in one or another GC system, or possibly other memory managers. Or structural optimisations. A complex feature, not only in itself but also in terms of interface design. But on the other hand it could take some burden off the compiler writer and pass it to library writers.

-i.

January 30, 2003
C++ can be fully garbage-collected. Just make a GC superclass of all garbage-collected classes that upon construction adds the object to a global internal list, then clear the list from time to time.

For example:


class GC
{
public:
    GC();
    virtual ~GC();
    static void Clear();
};


static CArray<GC*> _gc;


GC::GC()
{
   _gc += this;
}


GC::~GC()
{
    _gc -= this;
}

void GC::Clear()
{
    for(int i = 0; i < _gc.GetSize(); i++)
    {
        GC *gc = _gc[i];
        _gc[i] = 0;
        delete gc;
    }
    _gc.SetSize(0);
}


class Foo : public GC
{
};


int main()
{
    //foo is garbage collected
    Foo *foo = new Foo;

    //clear dangling references; will delete 'foo'
    GC::Clear();
}


C++ is so flexible!!! D should be that way too. Garbage collection should be an option for the programmer.


"rafael baptista" <rafael_member@pathlink.com> wrote in message news:avsfoq$6nc$1@digitaldaemon.com...
> I just saw the D spec today... and it is quite wonderful. There are so many ways that C and C++ could be better and D addresses them. But I think D is a little too ambitious to its detriment. It has a good side: it fixes syntactic and semantic problems with C/C++. It eliminates all kinds of misfeatures and cruft in C++.
>
> And D has a bad side: it attempts to implement all kinds of features that are best left to the programmer - not the language spec: built in string comparison, dynamic arrays, associative arrays, but the worst in my mind is the inclusion of garbage collection.
>
> As is pointed out in the D overview garbage collection makes D unusable for real time programming - this makes it unusable for Kernel programming, device drivers, computer games, any real time graphics, industrial control etc. etc. It limits the language in a way that is really not necessary. Why not just add in language features that make it easier to integrate GC without making it a part of the language.
>
> For my part I wouldn't use GC on any project in any case. My experience has been that GC is inferior to manually managing memory in every case anyway. Why?
>
> Garbage collection is slower: There is an argument in the Garbage collection section of the D documents that GC is somehow faster than manually managing memory. Essentially the argument boils down to that totally bungling manual memory management is slower than GC. Ok that is true. If I program with tons of redundant destructors, and have massive memory leaks, and spend lots of time manually allocating and deallocating small buffers, and constantly twiddling thousands of ref counts then managing memory manually is a disaster. Most programs manage memory properly and the overhead of memory management is not a problem.
>
> The argument that GC does not suffer from memory leaks is also invalid. In a GC'd system, if I leave around references to memory that I never intend to use again I am leaking memory just as surely as if I did not call delete on an allocated buffer in C++.
>
> Finally there is the argument that manually managing memory is somehow difficult and error prone. In C++ managing memory is simple - you create a few templatized container classes at the start of a project. They will allocate and free *all* of the memory in your program. You verify that they work correctly and then write the rest of your program using these classes. If you are calling "new" all over your C++ code you are asking for trouble. The advantage is that you have full control over how memory is managed in your program. I can plan exactly how I will deal with the different types of memory, where my data structures are placed in memory. I can code explicitly to make sure that certain data structures will be in cache memory at the same time.
>
> I had a similar reaction to having the compiler automatically build me hash tables, string classes and dynamic arrays. I normally code these things myself with particular attention to how they are going to be used. The unit test stuff is also redundant. I can choose to implement that stuff whether the language supports it or not. I normally make unit test programs that generate output that I can regress against earlier builds - but I wouldn't want to saddle arbitrary programs with having to run my often quite extensive and time consuming unit tests at startup. Imaginary numbers and bit fields should similarly be left up to the programmer.
>
> I suppose I can just choose not to use these features if I don't want to - but it strikes me as unnecessary cruft - and I subscribe to the idea beautifully expressed in the D overview that removing cruft makes a language better.
>
> The GC is worse though - because it's in there operating - arbitrarily stalling my program whether I use it or not.
>
> I urge the implementor of D to think about it this way: you have lots of ideas. Many of them are really good - but it is likely that despite your best efforts not all of them are. It would make your project much more likely to succeed if you decoupled the ideas as much as possible such that the good ones can succeed without being weighed down by the bad. So wherever possible I think you should move functionality from the language spec to a standard library. This way the library can continue to improve, while the language proper could stabilize early.
>
> This is probably old ground - I assume lots of people have probably already commented on the GC. I looked for a thread on GC and could not find one.
>
> I for one would find D a lot more attractive if it did not have GC.


January 31, 2003
Hi, Achillefs.

Achillefs Margaritis wrote:
> C++ can be fully garbage-collected. Just make a GC super class of all
> garbage collected classes that upon construction adds the object to a global
> internal list, then clear the list from time to time.
> 
> For example:
> 
> 
> class GC
> {
> public:
>     GC();
>     virtual ~GC();
>     static void Clear();
> };
> 
> 
> static CArray<GC*> _gc;
> 
> 
> GC::GC()
> {
>    _gc += this;
> }
> 
> 
> GC::~GC()
> {
>     _gc -= this;
> }
> 
> void GC::Clear()
> {
>     for(int i = 0; i < _gc.GetSize(); i++)
>     {
>         GC *gc = _gc[i];
>         _gc[i] = 0;
>         delete gc;
>     }
>     _gc.SetSize(0);
> }
> 
> 
> class Foo : public GC
> {
> };
> 
> 
> int main()
> {
>     //foo is garbage collected
>     Foo *foo = new Foo;
> 
>     //clear dangling references; will delete 'foo'
>     GC::Clear();
> }
> 
> 
> C++ is so flexible!!! D should be that way also. Garbage collection should
> be an option of the programmer.

This looks like fairly useful code in some cases, but it's not real GC.  It simply deletes ALL GC-derived objects that have ever been allocated but not manually deleted.

To make C++ do real GC is both very slow and very hard.  You can use handle classes that are actual class objects.  They get added to the root set when they are declared as local variables.  When they're destroyed, they are removed from the root set.  The container classes need to use handles that aren't in the root set, but which can be traversed given a pointer to the containing GC-derived object.  Yuck.

The more common way in C++ is reference counting, which can be done with handle classes easily, but at a significant performance cost.  Also, it doesn't really work.  It can GC tree structures, but it fails to delete anything with a loop in it, which most data structures have.  Even a linked list typically has the child objects pointing back at the parent.  That's a loop.
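The loop problem can be demonstrated with plain reference counting (here via std::shared_ptr, which is modern shorthand for the handle classes described; the `Node` type is just an illustration): a parent and child that point at each other keep each other's count above zero and are never freed.

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> other;   // back-pointer completes the loop
    static int alive;              // count of live Nodes, for observation
    Node()  { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

void makeLeakyPair() {
    auto parent = std::make_shared<Node>();
    auto child  = std::make_shared<Node>();
    parent->other = child;         // parent -> child
    child->other  = parent;        // child -> parent: the loop
}   // locals drop, but each count is still 1, so neither destructor runs
```

A tracing GC collects this pair the moment nothing outside the loop references it; a pure refcounter never does, which is why even a doubly-linked list defeats it.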

If you want good GC, it needs to be supported directly in the language.

I just don't have any use for GC.  As I've said before, it doesn't really buy you much.  All it does is change a delete statement into a direct call to the destructor, which you have to write anyway, or you won't get the object out of its relationships with other objects.  A better way is to automatically generate recursive destructors.  That's how it's done in relational databases (well, almost -- they do it in an interpreted way).  Just declare that a relationship deletes its children when the parent is deleted.  It couldn't be easier.

I've written languages that use GC (a Scheme interpreter).  I even used the cool in-place GC (not mark-and-sweep), that runs just as fast as stop-and-copy (faster if you take caching and paging into account). Sure GC can be fast.  But why complicate the language?  It just isn't worth it.

Bill

January 31, 2003
Achillefs Margaritis wrote:
> C++ is so flexible!!! D should be that way also. Garbage collection should
> be an option of the programmer.

Walter considers C++ too flexible. And he doesn't rob you of C++; he gives you another language. And you can't implement an efficient GC in C. However, I can't see why such a C++-style one couldn't be implemented in D, if you consider it better, of course.

A GC can profit a lot from modified code generation. A compiler implementation could choose one of several automatic memory management solutions which require no changes in code, that is, they are equivalent from the programmer's point of view but have different efficiency characteristics:

 - a simple mark-and-sweep collector, like D is using now, is just the beginning; I have heard that it doesn't profit from code generation yet, and hence keeps executable size small;
 - the mark phase can be accelerated if the compiler automatically generates an optimised marking method for each class. This is going to have a small size overhead, but reduces the total cost of GC to almost zero;
 - a colouring collector: can work in the background, and has the very good run-time characteristics that it doesn't stop the main process, doesn't necessarily cause wild paging, and keeps the memory footprint rather tight. It might slow down an application a bit, in part because it requires wrapping pointer writes in a GC method. It might also increase the code size;
 - smart reference counting: I have stumbled over a document describing a reference-counting memory manager which can also deal with self-referencing structures. Its advantage is that the application uses exactly as much memory as it needs, but the slowdown would be much higher than with a colouring GC, pauses might occur, and the code size would increase significantly.

I'd say an implementation should be free to provide these or other automatic memory management systems, as long as they are compliant with a GC interface.

I suppose that the memory manager interface is minimal on purpose, since things I have proposed before, like "mark notification", are not possible with any manager except mark-and-sweep.

I've read yet another, and this time better done, comparison of "static" and "scripting" languages, and it turns out that the decisions of Perl and Python to let the programmer do complex tasks in a simple way usually shave off half (or more) of the time required for writing an application. It may not be the optimal way, but it is generally good for programme correctness. There, hundreds of programmers wrote a simple application in C, C++, Perl, Python, Ruby, Java, and Tcl. It shows that though C and C++ might lead to an optimal solution, most often they lead to a very bad one, since the programmer tries to accomplish the task in limited time and simplifies the job. In that study, all entries written in languages with native, simple support for associative arrays were correct and ran reasonably fast, while the others led to a number of broken programmes. It shows again that Walter's decision makes sense, since it makes it possible to write utilities in D about as simply as in Perl, with more confidence and faster resulting execution. I'm pretty much sure D would devastatingly win this comparison.

-i.

January 31, 2003
Hi.

Bill Cox wrote:
> The case tool (DataDraw) is pretty simple.  It's open-source and compiles under Visual C++ on Windows. 

Seems to be kept secret though. I was searching for it and was unable to find it.

The description is very interesting, but I still don't seem to understand some of the mechanics of object deletion. It all seems to be about having a central object repository one way or another, and the scope of objects has to be known explicitly. That is, AFAIU, it automates the normal C/C++/Delphi data management idioms?

I still don't understand how this simple (and for D vital) example is
handled: string operations.

Imagine some function has created a string and returns it. Now some
operations are done on the string: copying, slicing, and so on. Copying
and some cases of slicing (I guess) don't change the string, so it is
not actually copied, for efficiency reasons. Now the same piece of data
has multiple users, and one of them makes a change, desiring to keep
the changes private. Then there are two cases:
  - No one else owns the data: it can be changed in place;
  - The data has other users: it has to be copied.
This means it is necessary to know whether the piece of data has further
users. Delphi solves this for strings by maintaining a reference counter.
GC simplifies it by allowing "late" copying of data and then finding out
itself which data needs to be removed. It has been shown that GC is more
efficient than reference counting in the general case of tiny objects
and strings.

Come to think of it: why did X Window servers constantly leak memory before they got a GC attached? Because they handle small structures, not centralized ones. And how do you get to know that you no longer need a certain part of a centralized data storage? Or explain to me what I am getting wrong.

-i.

January 31, 2003
Ilya Minkov wrote:
> A GC can profit a lot from modified code generation. A compiler implementation could choose one of several automatic memory management schemes, which would not require any changes in code - that is, they are equivalent from the programmer's point of view but have different efficiency characteristics.

Let's first see what happens when type-based scanning is in; nobody knows what effect it'll have on scanning, but in a small utility I'm making (a fractal browsing sample for dig), scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data.  Once that is pared down, I predict I'll be able to call a full collection every second without having to worry about delays in almost every program.

The exception will be VERY heavily pointer-oriented programs.  We're talking Lisp-level programs, and programs where you have an "Int" class.  I don't care about those.