February 01, 2003
This is why functional languages always copy data.

I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users of the data, in which case it can optimize to do the operation in place.  An array passed by reference would be exempt from this;  however D currently has no way at all to pass arrays by value, so we're forced to deal with the "is it mine, is it yours, is it ours" mess on every front (user code, library code, system code).

Ownership is the kind of thing that can't be verified in a precondition or postcondition;  there's no way to tell in general without knowing the entire program.  That is a job better suited to a machine than a human programmer.

That is why I would like the default to be pass by value, copy on write, *unless* either the programmer requests pass by reference or the compiler can tell for sure, due to scoping/data visibility/data flow analysis that it's safe not to copy on write.  Err on the side of safety.

Sean

"Ilya Minkov" <ilminkov@planet-interkom.de> wrote in message news:b1ek0e$u42$1@digitaldaemon.com...
> Hi.
>
> Bill Cox wrote:
> > The case tool (DataDraw) is pretty simple.  It's open-source and compiles under Visual C++ on Windows.
>
> Seems to be kept secret though. I was searching for it, and was unable to find.
>
> The description is though very interesting, i still don't seem to understand some mechanics about object deletion. This all seems to be about having a central object repository this or another way, and scope of objects has to be known explicitly. That is, AFAIU it automates the normal C/C++/Delphi data management ideoms?
>
> I still don't understand how this simple (and for D vital) example is
> handled: string operations.
>
> Imagine, some function has created a string, and returns it. Now, some
> operations were done on the string: copying, slicing, and so on. Copying
> and some cases of slicing (i guess) don't change the string, so it is
> not being copied for efficiency reasons. Now, the same piece of data has
> multiple users. Now some user makes a change in data, desiring to keep
> the changes private. Then there are two cases:
>    - Noone else owns the data. It can be changed in-place;
>    - Data has another users. It has to be copied.
> This means it needs to know whether the piece of data has further users.
> Delphi solves this for strings by maintainig the reference counter.
> GC simplifies it, by allowing for "late" copying of data and then it
> would find out what data needs to be removed. It was possible to prove,
> that GC is more efficient that reference counting in a general case of
> tiny objects and strings.
>
> Come to think of it: why did the X-Window servers constantly leak memory before they became a GC attached? Because they handle small structures, not cenralized. And how do you get to know that you don't need a certain part in a centralized data storage? Or explain me what i get wrong.
>
> -i.
>


February 02, 2003
"Sean L. Palmer" <seanpalmer@directvinternet.com> wrote in message news:b1ge7e$29kd$1@digitaldaemon.com...
> This is why functional languages always copy data.
>
> I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users of
the
> data, in which case it can optimize to do the operation in place.  An
array
> passed by reference would be exempt from this;  however D currently has no way at all to pass arrays by value, so we're forced to deal with the "is
it
> mine, is it yours, is it ours" mess on every front (user code, library
code,
> system code).

it is, but there is an issue with threading,

if the function has a big delay, say it signal one event and then waits on
an other event, before reading or writing the item, which it expects is a
copy, and another thread is waiting for that first signal modifies the
original (which is was passed by ref or owns)
and signals the second event, the value with be the changed and not the
previous.
this can be avoided by copying in the function prolog if a write is detected
any where in the function, or programmer understanding the effects of lazy
copy on write. arrays already have odd 'in' behaviour, so I guess its o.k.
to perform a lazy copy as I doubt the condition would occur often and as
long as there is a way to force a copy
array.length += 0 ; // for instance (safe to do on a 0 length array unlike
array[0] = array[0];)

Mike.



February 02, 2003
"Sean L. Palmer" <seanpalmer@directvinternet.com> wrote in message news:b1ge7e$29kd$1@digitaldaemon.com...
> This is why functional languages always copy data.
>
> I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users of
the
> data, in which case it can optimize to do the operation in place.  An
array
> passed by reference would be exempt from this;  however D currently has no way at all to pass arrays by value, so we're forced to deal with the "is
it
> mine, is it yours, is it ours" mess on every front (user code, library
code,
> system code).

it is, but there is an issue with threading,

if the function has a big delay, say it signal one event and then waits on
an other event, before reading or writing the item, which it expects is a
copy, and another thread is waiting for that first signal modifies the
original (which is was passed by ref or owns)
and signals the second event, the value with be the changed and not the
previous.
this can be avoided by copying in the function prolog if a write is detected
any where in the function, or programmer understanding the effects of lazy
copy on write. arrays already have odd 'in' behaviour, so I guess its o.k.
to perform a lazy copy as I doubt the condition would occur often and as
long as there is a way to force a copy
array.length += 0 ; // for instance (safe to do on a 0 length array unlike
array[0] = array[0];)

Mike.





February 02, 2003
Hi, Ilya.

Ilya Minkov wrote:
> Hi.
> 
> Bill Cox wrote:
> 
>> The case tool (DataDraw) is pretty simple.  It's open-source and compiles under Visual C++ on Windows. 
> 
> 
> Seems to be kept secret though. I was searching for it, and was unable
> to find.

Try: www.viasic.com/download/datadraw.tar.gz

There ins't any active community working on it.  The code dates back to 1991, and is a bit dated, and it's gotten pretty hacked up over the years.

Tools like these generally don't get used much.  I find it really odd. It seems to me that programmers aren't really focused on productivity. The entire field of CASE is full of good tools that no one ever uses.

> The description is though very interesting, i still don't seem to
> understand some mechanics about object deletion. This all seems to be
> about having a central object repository this or another way, and scope
> of objects has to be known explicitly. That is, AFAIU it automates the normal C/C++/Delphi data management ideoms?

Most of the code generators generate simple custom object databases. There are ways, for example, of traversing all the objects of a given class.  However, there's really no magic here.  It just outputs code that you might be tempted to write, given the class definitions and relationships.  The nice thing is that it automates writing code that you might not have time to get right every time, like allocating memory for objects of a given class in pages, and then sub-allocating individual objects out of each page, and maintaining free lists for the class.

> I still don't understand how this simple (and for D vital) example is
> handled: string operations.
> 
> Imagine, some function has created a string, and returns it. Now, some
> operations were done on the string: copying, slicing, and so on. Copying
> and some cases of slicing (i guess) don't change the string, so it is
> not being copied for efficiency reasons. Now, the same piece of data has
> multiple users. Now some user makes a change in data, desiring to keep
> the changes private. Then there are two cases:
>   - Noone else owns the data. It can be changed in-place;
>   - Data has another users. It has to be copied.
> This means it needs to know whether the piece of data has further users.
> Delphi solves this for strings by maintainig the reference counter.
> GC simplifies it, by allowing for "late" copying of data and then it
> would find out what data needs to be removed. It was possible to prove,
> that GC is more efficient that reference counting in a general case of
> tiny objects and strings.

Strings fall into a difficult space.  They're not really quit full blown
objects that represent something concrete.  DataDraw doesn't deal with
them, other than to allow you to copy them in and out of objects in the
database.

Strings, vectors, and matricies are "value" classes, rather than "object" classes.  "Values" tend to live on the stack, and go away at the end of scope.  "Objects" tend to live on the heap and have complex relationships to other objects.  The problem with strings is having to copy all that data around as you would a simple int value.  Yuk.

> Come to think of it: why did the X-Window servers constantly leak memory before they became a GC attached? Because they handle small structures, not cenralized. And how do you get to know that you don't need a certain part in a centralized data storage? Or explain me what i get wrong.

Objects in the centralized storage go away when deleted.  It's really not much different than in a GC based system, since you still need to remove objects from their relationships to other objects when you want them to go away.  The anoying value objects with associated memory are a problem, but not for the centralized storage if you don't allow them there.

String manipulation is typically a significant focus for both language design and benchmarks.  Strings are a tricky issues that I feel tends to distract us from other important language applications.  D seems to have strings nailed.

Bill

February 04, 2003
"Achillefs Margaritis" <axilmar@b-online.gr> wrote in message news:b1ce36$1ir3$1@digitaldaemon.com...
> C++ can be fully garbage-collected. Just make a GC super class of all garbage collected classes that upon construction adds the object to a
global
> internal list, then clear the list from time to time.

I know, I use GC in my C++ programs. There are some problems, though, because the C++ compiler provides no type information to the runtime. The way a C++ compiler is implemented restricts the kind of GC algorithms that can be used.


February 04, 2003
"Burton Radons" <loth@users.sourceforge.net> wrote in message news:b1eo7q$15m6$1@digitaldaemon.com...
> Let's see what happens when type-based scanning is in first; nobody knows what effect it'll have on scanning, but in a small utility I'm making (fractal browsing sample for dig) scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data.  Once that is pared down I predict I'll be able to call a full collection every second without having to worry about delays in almost every program.
>
> The exception will be VERY heavily pointer-oriented programs.  We're talking Lisp level and programs where you have an "Int" class.  I don't care about those.

I have a lot of ideas about improving GC, such as making it a copying generational collector, implementing hardware write barriers, and using type info to mark whole regions as "not containing any pointers", etc. There is a LOT that can be done to boost performance.


February 05, 2003
"Sean L. Palmer" <seanpalmer@directvinternet.com> wrote in message news:b1ge7e$29kd$1@digitaldaemon.com...
> I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users of
the
> data, in which case it can optimize to do the operation in place.  An
array
> passed by reference would be exempt from this;  however D currently has no way at all to pass arrays by value, so we're forced to deal with the "is
it
> mine, is it yours, is it ours" mess on every front (user code, library
code,
> system code).

The best solution is, as you suggest, copy on write. In D, you don't really need to know who owns it. Just practice copy on write. Any extra copies will get cleaned up by the gc.


February 05, 2003
Walter wrote:
> 
> The best solution is, as you suggest, copy on write. In D, you don't really
> need to know who owns it. Just practice copy on write. Any extra copies will
> get cleaned up by the gc.
> 

Granted the GC is on. If it's off, these all will pollute the memory. That's why a was arguing for 2 separate heaps, so that objects are created on GCed heap by default, or can (if the programmer is confident) created on a heap completely ignored by a GC.
I know that being on GC heap shouldn't be a performance hit, after GC is improved. Except for the cases when objects are very pointer-rich (trees, lists and such), but in some cases it just doesn't make sense to scan objects, because the programmer has decided to follow his old C habits and they are all referenced from somewhere anyway.
These should also be leak- and dangler-guarded at debug compile, and completely ignored by GC at release compile.

February 05, 2003
"Ilya Minkov" <midiclub@8ung.at> wrote in message news:b1rrn0$gps$1@digitaldaemon.com...
> Granted the GC is on. If it's off, these all will pollute the memory.
> That's why a was arguing for 2 separate heaps, so that objects are
> created on GCed heap by default, or can (if the programmer is confident)
> created on a heap completely ignored by a GC.
> I know that being on GC heap shouldn't be a performance hit, after GC is
> improved. Except for the cases when objects are very pointer-rich
> (trees, lists and such), but in some cases it just doesn't make sense to
> scan objects, because the programmer has decided to follow his old C
> habits and they are all referenced from somewhere anyway.
> These should also be leak- and dangler-guarded at debug compile, and
> completely ignored by GC at release compile.

I hear you, but I really think it's worth just taking advantage of the GC and not worrying about it.


February 09, 2003
Walter wrote:
> "Burton Radons" <loth@users.sourceforge.net> wrote in message
> news:b1eo7q$15m6$1@digitaldaemon.com...
> 
>>Let's see what happens when type-based scanning is in first; nobody
>>knows what effect it'll have on scanning, but in a small utility I'm
>>making (fractal browsing sample for dig) scanning takes 60 milliseconds
>>but is scanning... 802 pointers over 1120001 dwords of data, so 1396
>>times too much data.  Once that is pared down I predict I'll be able to
>>call a full collection every second without having to worry about delays
>>in almost every program.
>>
>>The exception will be VERY heavily pointer-oriented programs.  We're
>>talking Lisp level and programs where you have an "Int" class.  I don't
>>care about those.
> 
> 
> I have a lot of ideas about improving GC, such as making it a copying
> generational collector, implementing hardware write barriers, and using type
> info to mark whole regions as "not containing any pointers", etc. There is a
> LOT that can be done to boost performance.

Implementing hardware write barriers?  I didn't think that the hardware was there on most memory controllers.

Evan