September 07, 2008
Jarrett Billingsley:

> ..Or just, you know, arrays.  Since that's exactly what they are.

Right, but I don't want to use them because dynamic arrays always set their memory to .init. The point of that data structure is to have a fast allocation too (it's not necessary to clean it if you know how many items you are using and the items don't contain a GC-managed pointer (see at the bottom of this post)).


> > addRoot(new_block.next);
> > addRoot(new_block.data);
> Adding roots is certainly possible but it's not very elegant.

It's not just inelegant: it seems the program doesn't work yet:

import std.gc: malloc, hasNoPointers, addRoot, fullCollect, hasPointers; import std.stdio: writefln;

int min(int a, int b) {
    return a <= b ? a : b;
}


class IntStack {
    struct Block {
        int len;
        Block* next;
        int* data;
    }

    Block* list_head, list_tail;
    int total_len;
    int used_last_block; // how many items are used in the last (tail) block


    this() {
        this.list_head = block_alloc(4);
        this.list_tail = list_head;
    }


    ubyte* mem_alloc(int nbytes) {
        auto p  = cast(ubyte*)malloc(nbytes);
        if (p is null)
            throw new Error("not enough mem");
        hasNoPointers(p);
        return p;
    }


    Block* block_alloc(int nints) {
        auto new_block = cast(Block*)mem_alloc(Block.sizeof);
        new_block.data = cast(int*)mem_alloc(nints * int.sizeof);
        new_block.len = nints;
        new_block.next = null;
        addRoot(new_block.next);
        addRoot(new_block.data);
        return new_block;
    }


    final void opCatAssign(int x) {
        if (this.used_last_block == this.list_tail.len) {
            // then last block is full and we must create a new one
            this.used_last_block = 0;
            auto new_block = block_alloc(this.list_tail.len * 2);

            // append to the linked list
            this.list_tail.next = new_block;
            this.list_tail = new_block;
        }

        this.list_tail.data[this.used_last_block] = x;
        this.used_last_block++;
        this.total_len++;
    }


    int[] toarray() {
        if (!this.total_len)
            return null;

        int* result_ptr = cast(int*)mem_alloc(this.total_len * int.sizeof);
        int[] result = result_ptr[0 .. this.total_len];

        Block* aux_ptr = this.list_head;
        int pos;

        while (aux_ptr != null) {
            // in the last block copy only up to the total_len
            int ncopy = min(aux_ptr.len, this.total_len - pos);
            writefln(ncopy, " ", aux_ptr.len); // ****

            result[pos .. pos + ncopy] = aux_ptr.data[0 .. ncopy];
            pos += ncopy;
            aux_ptr = aux_ptr.next;
        }

        return result;
    }
}


void main() {
    auto st = new IntStack();

    for (int i; i < 70_000; i++)
        st ~= i;

    fullCollect();
    auto aux = st.toarray;
}




> It means that that object will not be collected until you remove it as a root manually.  You can very easily end up with a memory leak that way.

Uhm... I don't understand fully: do you mean roots don't become deleted when they have nothing that points to them?
The docs say:
"Roots are references to memory allocated by the collector that are maintained in memory outside the collector pool."

If roots don't become deleted when they have nothing that points to them then I don't want to use roots in my code. What's the way to add a pointer to the pool of pointers scanned by the GC that become deleted when nothing points to it? :-)
(In this data structure I can remove roots manually during inside the ~this(), but I'd prefer to not have to).


>The simplest implementation, though, is to use arrays, like you thought earlier.

I'm trying to learn how to use the GC because I'd like to create quite more complex data structures.
I have already solved this specific stack problem in a different way, that's the faster that I have found (it just keeps a single memory zone that I realloc() to make it grow geometrically. This solution is the faster among many different ones, I don't know why).

I have created a recursive template that tells if a data structure contains a reference. If they aren't present, I don't need to set memory to their .init. My template is not perfect, because it doesn't tell apart normal references from GC-references, so if you create a stack (ArrayBuilder) of not-GC pointers, they become set to null even if they don't need such thing, wasting a bit of time. On the other hand the only way to tell if a reference is GC-managed is to use (at runtime) TypeInfo.flags(), but someone has told that it's buggy (Phobos).


>You don't need to muck with the GC really at all in this case, other than to set the allocated arrays as having no pointers.<
> ...
> auto d = new int[nints];
> hasNoPointers(d.ptr);
> ...

That hasNoPointers is quite useless, because the GC knows that 'd' is a dynamic array of ints, so it knows its memory doesn't contain GC-managed pointers. Isn't muching with the GC nice? :-)

Thank you very much,
bearophile
September 07, 2008
On Sun, Sep 7, 2008 at 12:59 PM, bearophile <bearophileHUGS@lycos.com> wrote:
> Jarrett Billingsley:
>
>> ..Or just, you know, arrays.  Since that's exactly what they are.
>
> Right, but I don't want to use them because dynamic arrays always set their memory to .init. The point of that data structure is to have a fast allocation too (it's not necessary to clean it if you know how many items you are using and the items don't contain a GC-managed pointer (see at the bottom of this post)).

The array is only initialized when you allocate it.  So, just allocate a raw block of uninitialized memory and slice it to get an array.  I'm not so much trying to say that using a struct is the wrong way to do it, but rather that it's pointless to use a struct { int* data, size_t length } when that's _precisely_ what an array is, and arrays give you nicer syntax.

> Uhm... I don't understand fully: do you mean roots don't become deleted when they have nothing that points to them?
> The docs say:
> "Roots are references to memory allocated by the collector that are maintained in memory outside the collector pool."

I think what you should be using instead is addRange.  addRoot more or less tells the GC "never collect memory referenced by this pointer", but it doesn't actually scan the memory _pointed to_ by the roots for pointers.  addRange tells the GC "scan this range of memory for pointers", which is what you want to tell the GC to do for Blocks. But then you run into another problem -- addRange and removeRange are terribly inefficient.  So.. stop making things hard on yourself ;)

> If roots don't become deleted when they have nothing that points to them then I don't want to use roots in my code. What's the way to add a pointer to the pool of pointers scanned by the GC that become deleted when nothing points to it? :-)
> (In this data structure I can remove roots manually during inside the ~this(), but I'd prefer to not have to).

Uh, you don't have to do anything special.  You just use "new" to allocate a block of memory, and if it could contain pointers, and that block of memory is transitively accessible from the stack/registers/globals, then any pointers it contains will be scanned, and any memory pointed to by it will not be freed.

> I'm trying to learn how to use the GC because I'd like to create quite more complex data structures.

I think you're making things way more complicated than they need to be.

> I have created a recursive template that tells if a data structure contains a reference. If they aren't present, I don't need to set memory to their .init. My template is not perfect, because it doesn't tell apart normal references from GC-references, so if you create a stack (ArrayBuilder) of not-GC pointers, they become set to null even if they don't need such thing, wasting a bit of time. On the other hand the only way to tell if a reference is GC-managed is to use (at runtime) TypeInfo.flags(), but someone has told that it's buggy (Phobos).

You are obsessed with performance.  Please stop it, it's not everything.

>>You don't need to muck with the GC really at all in this case, other than to set the allocated arrays as having no pointers.<
>> ...
>> auto d = new int[nints];
>> hasNoPointers(d.ptr);
>> ...
>
> That hasNoPointers is quite useless, because the GC knows that 'd' is a dynamic array of ints, so it knows its memory doesn't contain GC-managed pointers. Isn't muching with the GC nice? :-)

Ah.

Well, you can leave it in if you replace "new int[nints]" with "cast(int[])std.gc.malloc(nints * int.sizeof)".  That'll get you an uninitialized array, but I think you still need to explicitly set the "has no pointers" flag.
September 07, 2008
Jarrett Billingsley:
> You are obsessed with performance.  Please stop it, it's not everything.

I use D only when/because I need more performance, when performance isn't strictly essential, I always use Python (or other languages).
The <vector> of the C++ STL that I can use with MinGW is 30-40% faster than the faster code I have written for the stack in D. I think the designers of that vector are more obsessed than me. If most D programmers don't care for efficient data structures, and they are happy with the current bad ones (I am not talking about Tango now), then maybe I can just stop bothering with D and stick with the much uglier/crancky/bug-prone C++.

bearophile
September 07, 2008
On Sun, 07 Sep 2008 14:11:09 -0400, Jarrett Billingsley wrote:

> You are obsessed with performance.  Please stop it, it's not everything.

Being obsessed with performance is a good thing if it's a piece of code that is used often and can be wrapped in a library.

Such piece of library does often yet exist, but sometimes it does not.
September 07, 2008
On Mon, Sep 8, 2008 at 1:59 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> Jarrett Billingsley:
>
>> ..Or just, you know, arrays.  Since that's exactly what they are.
>
> Right, but I don't want to use them because dynamic arrays always set their memory to .init. The point of that data structure is to have a fast allocation too (it's not necessary to clean it if you know how many items you are using and the items don't contain a GC-managed pointer (see at the bottom of this post)).

At some point you used to be able to create variables using an "=void" initializer to prevent the automatic initialization.  Is that no longer the case?

--bb
September 07, 2008
Bill Baxter:
> At some point you used to be able to create variables using an "=void" initializer to prevent the automatic initialization.  Is that no longer the case?

I think the = void can be used with static arrays/variables, but not with dynamic arrays.

Note: I'll write a better answer to the gentle Jarrett Billingsley, sorry for my nervous answer of before :-)

Bye,
bearophile
September 07, 2008
On Mon, Sep 8, 2008 at 6:52 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> Bill Baxter:
>> At some point you used to be able to create variables using an "=void" initializer to prevent the automatic initialization.  Is that no longer the case?
>
> I think the = void can be used with static arrays/variables, but not with dynamic arrays.

If there is no way to do the same thing with dynamic arrays, I think that's a hole in the language.

--bb
September 07, 2008
On Sun, Sep 7, 2008 at 5:58 PM, Bill Baxter <wbaxter@gmail.com> wrote:
> On Mon, Sep 8, 2008 at 6:52 AM, bearophile <bearophileHUGS@lycos.com> wrote:
>> Bill Baxter:
>>> At some point you used to be able to create variables using an "=void" initializer to prevent the automatic initialization.  Is that no longer the case?
>>
>> I think the = void can be used with static arrays/variables, but not with dynamic arrays.
>
> If there is no way to do the same thing with dynamic arrays, I think that's a hole in the language.
>
> --bb
>

There's no way to do it with any heap-allocated value, which does seem a bit like a hole to me.
September 07, 2008
Jarrett Billingsley:
> There's no way to do it with any heap-allocated value, which does seem a bit like a hole to me.

I think there's both a syntax problem (what syntax to use?), and the fact that when possible D wants to avoid programming mistakes, so it wants to initialize memory to a clean state (this is even more important for the GC).

A first possible syntax?
auto a = new int[10] = void;

That syntax is also usable to replace this:
auto a = new int[10];
a[] = 5;

With:
auto a = new int[10] = 5;

But maybe this isn't important enough.

Bye,
bearophile
September 08, 2008
Jarrett Billingsley:

>I think what you should be using instead is addRange.  addRoot more or less tells the GC "never collect memory referenced by this pointer", but it doesn't actually scan the memory _pointed to_ by the roots for pointers.  addRange tells the GC "scan this range of memory for pointers", which is what you want to tell the GC to do for Blocks. But then you run into another problem -- addRange and removeRange are terribly inefficient.<

Uhm... saddening.


>Uh, you don't have to do anything special.  You just use "new" to allocate a block of memory, and if it could contain pointers, and that block of memory is transitively accessible from the stack/registers/globals, then any pointers it contains will be scanned, and any memory pointed to by it will not be freed.<

That may be not enough to design a bit more complex data strutures, I don't know, we'll see.


>I think you're making things way more complicated than they need to be.<

For this toy program yes, but to create more complex data structures I'll need to learn more things about the GC.

Bye,
bearophile