August 18, 2003
"Achilleas Margaritis" <axilmar@b-online.gr> wrote in message news:bg0g9d$286f$1@digitaldaemon.com...
>
> But if it is a thread, it means that for every pointer that can be accessed by the GC, it has to provide synchronization. Which in turn means providing a mutex lock for each pointer. Which in turn means entering the kernel a lot of times. Now, a program can have thousands of pointers lying around. I am asking you, what is the fastest way? To enter the kernel 1000 times to protect each pointer, or to pause a little and clean up the memory?
> I know what I want. Furthermore, a 2nd thread makes the implementation terribly complicated. When Java's GC kicks in, although in theory running in parallel, the program freezes.
That is not the case; there are ways to do a fully concurrent GC without any locks on pointers. You need a write barrier (a check that an unwalked object is not being stored into a fully walked object) and a return barrier (a check when you leave a function), but they only need to be active while the GC is running.
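To make the write-barrier half of that concrete, here is a rough sketch; GcObject, Color and writeBarrier are made-up names for illustration, not any particular collector's implementation:

// Write-barrier sketch: if an unwalked (white) object is stored into a
// fully walked (black) one, re-shade it so the marker will visit it again.
// It only pays its cost while a collection is in progress.
enum Color { white, gray, black }   // white = unwalked, black = fully walked

class GcObject
{
    Color color = Color.white;
    GcObject[] refs;                // outgoing references
}

bool collectorRunning;

// Conceptually called on every pointer store such as parent.refs[i] = child.
void writeBarrier(GcObject parent, GcObject child)
{
    if (collectorRunning
        && parent.color == Color.black
        && child !is null
        && child.color == Color.white)
    {
        child.color = Color.gray;   // put it back on the mark queue
    }
}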

> GC is a mistake, in my opinion. I've never had memory leaks with C++, since I always 'delete' what I 'new'.
I fear it is you who are mistaken. Your faith that you've deleted all you've newed implies to me that you have either not found the leaks yet, have only worked on projects that require simple data structures, or use a large number of stack-based objects and/or lots of copying.

GC is a good idea. It gives you certainty (assuming you trust the GC writer) not only that your objects will get cleaned up, but that you will never delete an object you should not have (or delete something twice). Once you start working with data structures that have more than one "owner", GC allows you to design much more compact structures and potentially faster code (no copies, no manual checks, no ref counts, etc.).

> But if you have to hand-tune the allocation type, it breaks the promise of "just allocate the objects you want, and forget about everything else". And this "hand-tuning" that you are talking about is a tough nut to crack. For example, a lot of code goes into our Java applications for reusing objects. Well, if I have to make such a big effort to "hand-tune", I'd better take over memory allocation and delete the objects myself.
You may find that this was good on older JVMs, but more modern ones hold cache chains of heap blocks, so allocation of frequently used objects is fast (there should be a block of the right size already waiting for use). By holding a set of objects "live" but outside your program you are doing two things that may not be desirable: one, you're back to the situation where a reference may still be held to the object you have manually cached, so when it is "re-allocated" someone else will get a shock; and two, you are potentially increasing the work the GC does by giving it a large root set (all your cached objects).

> And I am talking again about real-life programming languages.
What about real-life programming?

> > > But GC uses reference counting. If it did not, how will the GC mechanism know if something is referenced or not?
> >
> > No, it doesn't. A GC tracks allocation of all objects, and whenever the time comes it scans the stack for pointers to allocated objects. These are in turn scanned for pointers. Each object which the GC comes across in this process is marked as "reachable". Afterwards, all objects which have not been marked can be deleted.
>
> It can't be using a stack, since a stack is a LIFO thing. Pointers can be nullified in any order. Are you saying that each 'pointer' is allocated from a special area in memory? If it is so, what happens with member pointers? What is their implementation in reality? Is a member pointer a pointer to a pointer in reality? If it is so, it's bad. Really bad.
No one mentioned stacks in the LIFO sense; what gets scanned is the call stack. The GC subsystem knows whether a pointer-sized area of memory points to an object or is just a value of some kind.
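Roughly speaking, a conservative collector only asks whether a word could point into memory it handed out; this toy check assumes one contiguous pool (real collectors, D's included, keep per-pool tables), and the names are illustrative:

// Illustrative only: [poolLo, poolHi) stands in for the allocator's real
// bookkeeping of which address ranges belong to the GC heap.
size_t poolLo, poolHi;

// Treat a pointer-sized value as a possible reference only if it falls
// inside memory the allocator actually owns; everything else is "a value".
bool looksLikePointer(size_t word)
{
    return word >= poolLo && word < poolHi;
}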

> And how does the GC mark an object as unreachable? It has to count how many pointers track it. Otherwise, it does not know how many references there are to it. So, it means reference counting, in reality.
I think you need to spend some time reading up on GC; it's obvious that you do not understand the basics.
Good GCs do not reference count. They do not care how many refs there are; all they care about is whether there are more than 0 refs.
Think of GC as a process that takes a piece of string and ties it to every object it can find, starting at the "root set", which is all statics and the stacks of any running threads. Then, once it has "walked" all the objects, it pulls the piece of string: anything not attached is obviously garbage, i.e. it is "unreachable".

> If it does not use any kind of reference counting as you imply, it first has to reset the 'reachable' flag for every object, then scan pointers and set the 'reachable' flag for those objects that have pointers pointing to them. And I am asking you, how is that more efficient than simple reference counting (which is local, i.e. only when a new pointer is created/destroyed is the actual reference counter integer affected)?
this is called "mark and sweep", first you mark all objects (as described
above)
the mark phase requires one of two things, either a method of determining is
a pointer sized (and aligned usually) value is a pointer to object or not
(this is what D does)
or by having "ref bits" somewhere on the stack and within the object header
etc to determine the "object tree".
next the sweep, now you walk the heap(s) which it a linear walk either
resetting the objects header to "unwalked" or adding to the free list if it
is still "unwalked".

> So, as you can see, automated refcounting works like a breeze. And you also get the benefit of determinism: you know when destructors are called; and then, you can have stack objects that, when destroyed, do away with all the side effects (for example, a File object closes the file automatically when destroyed).
GCs are more efficient than ref counting as they only do the work when needed.
D has "auto" objects to give you this determinism.
Personally, I prefer "try, catch, finally" for doing close-on-exit; using stack objects can cause problems if you pass them to someone else's library code (and it, for some reason, holds onto them).
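For illustration, here are the two styles side by side; File is a stand-in class, not a particular library type, and in 2003-era D the RAII form is spelled "auto" (later D spells it "scope"):

// Hypothetical File wrapper used only for this example.
class File
{
    this(const(char)[] name) { /* open the OS handle */ }
    void close() { /* release the OS handle */ }
    ~this() { close(); }
}

void withAutoObject()
{
    // Written "auto File f = ..." in 2003-era D, "scope" later; either way
    // the destructor runs deterministically when the scope exits.
    scope File f = new File("data.txt");
    // ... use f ...
}   // f's destructor -> close() runs here

void withTryFinally()
{
    File f = new File("data.txt");
    try
    {
        // ... use f ...
    }
    finally
    {
        f.close();      // runs even if an exception was thrown
    }
}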

> >
> > Thus, it turns out that "total" GC is significantly less overhead than "total" reference counting.
>
> Nope, it does not, as I have demonstrated above.
You have not, and I doubt you ever will. I too was a sceptic, until I worked closely with a GC designer for a while and started to see how GCs actually reduce the work done to perform automated resource management.

> If the working set is not in the cache, it means a lot of cache misses, thus a slow program. Refcounting only adds 4 bytes extra to each object. If you really want to know when to delete an object, I'll tell you the right moment: when it is no longer referenced. And how do you achieve that? With refcounting.
GC can give 0 overhead (apart from the 4 bytes needed by the heap manager to store the length).
You only need 2 bits for GC info, so having 4-byte-aligned object starts (which you want anyway [or 16-byte for speed on some systems]) gives you 2 free bits in the heap length field.
But most GCs do also have a header, and again it's usually 4 bytes.
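As a rough illustration of where those 2 free bits come from (the constants and helpers are made up for the example, not D's actual heap layout):

// With 4-byte-aligned block sizes the low 2 bits of the stored length word
// are always zero, so the collector can borrow them for mark/flag info.
enum size_t MARK_BIT   = 0b01;
enum size_t FLAGS_MASK = 0b11;

size_t blockLength(size_t header) { return header & ~FLAGS_MASK; }
bool   isMarked(size_t header)    { return (header & MARK_BIT) != 0; }
size_t setMark(size_t header)     { return header | MARK_BIT; }
size_t clearMark(size_t header)   { return header & ~MARK_BIT; }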

> As I told earlier, the trick is to use refcounting where it must be used. In other words, not for pointers allocated on the stack.
Ref counting has to do more work than GC, though the work is more spread out and deterministic, and it is true that in a hard real-time environment this determinism can be better.
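To see why the work is "more spread out", here is a toy intrusive ref count; Node, retain, release and assign are made-up names for the example:

// Every pointer store pays this cost, whether or not memory is ever tight;
// a tracing GC does nothing at all on an ordinary assignment.
class Node
{
    int refs;
}

void retain(Node n)
{
    if (n !is null) ++n.refs;
}

void release(Node n)
{
    if (n !is null && --n.refs == 0)
    {
        // last reference gone: free n (and release everything it points to)
    }
}

void assign(ref Node dst, Node src)
{
    retain(src);    // retain first, in case dst and src are the same object
    release(dst);
    dst = src;
}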

> >
> > In Sather, e.g. INT is a library object, however, because it's immutable it works just as fast as C int. And in fact resolves one-to-one to it, with stack storage, copying, and all. You can create your own types which behave like that easily.
>
> Real-life programming languages only, please. You still don't give me an example of how initialization fails with aliasing.
What do you call real world? I know of commercial projects that use Python, Perl and Java.
What makes Sather "not real world"?

> >
> > How come it doesn't have memory leaks? Sorry, I don't know ADA. Either it uses a kind of automatic memory management, or it *does* have memory leaks. What kind of constraint is there? I have some Delphi experience, and Pascal/Delphi is quite prone to leaks, even if they are not so frequent due to some reason, be it possibilities for better program organisation or similar things.
>
> At first I thought too that ADA was similar to PASCAL. Well, it is syntactically similar, but that's about it. Its pointer usage is constrained. For example, you can do pointer arithmetic, but it is bounds-checked. You can't have pointer casting, unless it is explicitly specified as an alias on the stack.
>
You can haemorrhage memory in Delphi just as easily as you can in C++, and you can free stuff you should not from under your own feet too. Pointers might be a bit more restricted in their use, but that's not the cause of most memory leaks.

> All the other languages are fine on a theoretical basis. The only problem I see with C++ is the lack of standard (and free!!!) libraries across different operating systems, especially for the UI.
I guess GTK being on all Unix, Mac and Win32 is not enough; do a web search for VxWindow or VGui, and I believe you can get MFC for Mac and Unix (you'll have to pay though).



September 08, 2003
"Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bfv0op$oeh$1@digitaldaemon.com...
> I've (thankfully) never written a line of COBOL, but no less an authority than Robert Glass says it is still the preeminent language for certain classes of business software, and I believe him. Why? Because no language is perfect, almost all features are useful to someone at some time, and the idea that a single language and its set of features will in any way compare in importance to the intelligence and experience of practitioners is fanciful and does a disservice to us all.

That was true, up until D came along. <g>


September 09, 2003
Honk!

LOL

"Walter" <walter@digitalmars.com> wrote in message news:bjiia0$1dnr$1@digitaldaemon.com...
>
> "Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bfv0op$oeh$1@digitaldaemon.com...
> > I've (thankfully) never written a line of COBOL, but no less an authority than Robert Glass says it is still the preeminent language for certain classes of business software, and I believe him. Why? Because no language is perfect, almost all features are useful to someone at some time, and the idea that a single language and its set of features will in any way compare in importance to the intelligence and experience of practitioners is fanciful and does a disservice to us all.
>
> That was true, up until D came along. <g>
>
>

