Thread overview
My Language Feature Requests
Dec 22, 2007
Craig Black
Dec 22, 2007
Craig Black
Dec 22, 2007
Craig Black
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Craig Black
Dec 23, 2007
Craig Black
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Craig Black
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Craig Black
Dec 23, 2007
Christopher Wright
Dec 24, 2007
Craig Black
Dec 23, 2007
Frits van Bommel
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Craig Black
Dec 23, 2007
Craig Black
Dec 23, 2007
Christopher Wright
Dec 23, 2007
Craig Black
December 22, 2007
So now that the const thing seems like it might be settled, I would like to put my vote in as to what features should be included next.  I can only think of 2 features that would be high on my priority list.  My votes go for the following language features that would enhance the performance of D.

1)  Adding better support for structs including ctors/dtors, inheritance, copy semantics, etc.  This would allow for more efficient data structures that perform heap allocation without relying on the GC.  Making applications GC-lite is the fastest path to high-performance D applications.  The big reason I want this is that I would then be able to write an efficient array template that does not rely on the GC.  Since I use arrays so much, this would provide a huge performance improvement for me.

2) Adding language features that would allow for a moving GC.  A modern, moving GC would also be a huge performance win.  I think we would have a safety problem if we currently implemented a moving GC.  Languages that have moving GC greatly restrict what can be done with pointers.  We need to provide a syntax that will allow pointers to be used when memory is explicitly managed, but disallow pointers for GC memory.
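
To make point 1 concrete, here is a minimal sketch of a GC-free array, written against present-day D, which has since gained struct constructors, destructors, and a way to disable copying. The name MallocArray and its members are invented for illustration. The sketch assumes plain-data element types that hold no GC pointers, since malloc'd memory is not scanned by the collector unless it is registered with it.

```d
import core.stdc.stdlib : realloc, free;

// Hypothetical growable array that never touches the GC heap.
struct MallocArray(T)
{
    private T* data;
    private size_t len, cap;

    // Copying is disabled to keep the sketch short; real copy semantics
    // would need a postblit or copy constructor.
    @disable this(this);

    ~this()
    {
        free(data);   // free(null) is a no-op
    }

    void put(T value)
    {
        if (len == cap)
        {
            immutable newCap = cap ? cap * 2 : 4;
            auto p = cast(T*) realloc(data, newCap * T.sizeof);
            if (p is null) assert(0, "out of memory");
            data = p;
            cap = newCap;
        }
        data[len++] = value;
    }

    ref T opIndex(size_t i)
    {
        assert(i < len);
        return data[i];
    }

    size_t length() const { return len; }
}
```

Declaring `MallocArray!int a;` on the stack and calling `a.put(42);` allocates with realloc, and the destructor releases the block when `a` goes out of scope, so no collection is ever triggered by the array itself.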

So, here's one idea for making D more safe for moving GC.

a) Disallow overloading new and delete for classes, and make classes strictly for GC, perhaps with an exception for classes instantiated on the stack using scope.
b) Allow new and delete to work with structs, and allocate them on the malloc heap.  I would still want to be able to override new and delete for structs, specifically to be able to use nedmalloc.

Then the compiler could disallow taking the address of a class field, since we know the resulting pointer would point to the GC heap.  Note that this would be a compile-time check, and so would not degrade run-time performance.

Another idea would be to be able to pin GC objects.  C# allows this via the fixed keyword.  In D, it could work like this:

a) Preceding a pointer declaration with fixed would allow that pointer to take the address in the GC heap.
b) Pointer arithmetic would be disallowed for fixed pointers.
c) A fixed pointer will mark the corresponding GC object as "pinned" so that the GC knows not to move the object.
d) When the fixed pointer is changed or deallocated, it will unpin the object, and pin any new object that it refers to.

The fixed pointer will have to know whether or not it points to GC memory so that it doesn't pin non-GC objects.  Using the first idea, we can determine at compile time whether a pointer points to the heap or not.

Yes, this would be a big change, but not as big as const IMO.  I feel if any feature warrants breaking some code, it would be high-performance GC.  But maybe someone else can find a solution that doesn't break compatibility.

Thoughts?

-Craig


December 22, 2007
> Using the first idea, we can determine at compile time whether a pointer points to the heap or not.

Another option would be to only allow fixed pointers to point to the heap. It might simplify the implementation.

December 22, 2007
"Craig Black" <craigblack2@cox.net> wrote in message news:fkk79p$226d$1@digitalmars.com...
>> Using the first idea, we can determine at compile time whether a pointer points to the heap or not.
>
> Another option would be to only allow fixed pointers to point to the heap. It might simplify the implementation.

When I said "heap", I meant the GC heap of course. 

December 23, 2007
Craig Black wrote:
> 2) Adding language features that would allow for a moving GC.  A modern, moving GC would also be a huge performance win.  I think we would have a safety problem if we currently implemented a moving GC.  Languages that have moving GC greatly restrict what can be done with pointers.  We need to provide a syntax that will allow pointers to be used when memory is explicitly managed, but disallow pointers for GC memory.
> 
> So, here's one idea for making D more safe for moving GC.
> 
> a) Disallow overloading new and delete for classes, and make classes strictly for GC, perhaps with an exception for classes instantiated on the stack using scope.

Don't see the point of this. You'd map a single old value to a single new value...or map an old range to a new one. You're changing one equality check and one assignment to two comparisons and an addition. And this is when you're looking through the entire address space of the program.

> b) Allow new and delete to work with structs, and allocate them on the malloc heap.  I would still want to be able to override new and delete for structs, specifically to be able to use nedmalloc.

This can allow polymorphism for structs, actually, but it is a bit of a performance hit.

> Then the compiler could disallow taking the address of a class field, since we know the resulting pointer would point to the GC heap.  Note that this would be a compile-time check, and so would not degrade run-time performance.

Ugly.

What do you do for taking the address of a class variable? Well, okay, you have to take the address of the reference; you can't take the address of the variable directly. The current method is ugly and undefined behavior:
*cast(void**)&obj;

And you can assume that all pointers that point to that region of memory have to be moved.

The problem is granularity.

class Foo {
   Foo next;
   size_t i, j, k, l, m, n, o, p;
}

Here, the current regime would mark *Foo as hasPointers. If i, j, k, l, m, n, o, or p just happened to look like a pointer, they'd be changed. You'd need to find where each object begins, then you'd need to go through the offset type info to see which elements are really pointers.

Since you're running the garbage collector, that's doable, if the offset type info is currently available (I think it wasn't, last I checked, but I don't really recall).
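
As a rough illustration of the "which elements are really pointers" step, the sketch below computes the byte offsets of pointer-like fields at compile time using present-day D reflection. This is an assumption about how such a table could be built, not a description of what the runtime's offset type info provides. The helper name pointerOffsets is hypothetical, and the sketch deliberately ignores arrays, delegates, interfaces and base-class fields.

```d
import std.traits : isPointer;

class Foo
{
    Foo next;
    size_t i, j, k, l, m, n, o, p;
}

// Byte offsets of the fields of T that are class references or raw pointers.
size_t[] pointerOffsets(T)()
{
    size_t[] offsets;
    foreach (idx, F; typeof(T.tupleof))
    {
        static if (is(F == class) || isPointer!F)
            offsets ~= T.tupleof[idx].offsetof;
    }
    return offsets;
}
```

For Foo this yields a single offset, that of next, so a collector using such a table would never mistake i through p for pointers.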

> Another idea would be to be able to pin GC objects.  C# allows this via the fixed keyword.  In D, it could work like this:
> 
> a) Preceding a pointer declaration with fixed would allow that pointer to take the address in the GC heap.
> b) Pointer arithmetic would be disallowed for fixed pointers.

Why?

fixed float* four_floats = cast(float*) std.gc.malloc(4 * float.sizeof);
fixed float* float_one = four_floats;
fixed float* float_two = four_floats + 1;
fixed float* float_three = four_floats + 2;
fixed float* float_four = four_floats + 3;

Seems fine to me. You might go beyond the allocated space, but that's already undefined behavior.

> c) A fixed pointer will mark the corresponding GC object as "pinned" so that the GC knows not to move the object.
> d) When the fixed pointer is changed or deallocated, it will unpin the object, and pin any new object that it refers to.

While there is a fixed reference to the GC object, it is pinned. If that reference is rebound to another GC object, the original object is unpinned and the new one is pinned.

How to mark these is a difficult problem. On a 64-bit machine, I'd say you just use the most significant bit as a flag; you're not going to use petabytes of address space.
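
A minimal sketch of the flag-bit idea, with the pointer held as an integer word. The names fixedBit, markFixed, isFixed and address are made up for illustration, and the sketch assumes the top bit is never part of a real user-space address. The flag would have to be stripped before the word is used as a pointer again.

```d
// Most significant bit of a machine word, used as the "fixed" flag.
enum size_t fixedBit = cast(size_t) 1 << (size_t.sizeof * 8 - 1);

// Tag a word as holding a fixed (pinning) pointer.
size_t markFixed(size_t addr) { return addr | fixedBit; }

// Does this word carry the flag?
bool isFixed(size_t word) { return (word & fixedBit) != 0; }

// Recover the plain address by clearing the flag.
size_t address(size_t word) { return word & ~fixedBit; }
```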

> The fixed pointer will have to know whether or not it points to GC memory so that it doesn't pin non-GC objects.  Using the first idea, we can determine at compile time whether a pointer points to the heap or not.

The fixed pointer will just stand there shouting "I am a fixed pointer! Look on me and despair!" And the garbage collector will look where it's pointing; if it is pointing at GC memory, the garbage collector will indeed look on it and despair. Otherwise, it will ignore the fixedness.

> Yes, this would be a big change, but not as big as const IMO.  I feel if any feature warrants breaking some code, it would be high-performance GC.  But maybe someone else can find a solution that doesn't break compatibility.
> 
> Thoughts?
> 
> -Craig
> 
> 
December 23, 2007
"Christopher Wright" <dhasenan@gmail.com> wrote in message news:fkkm0i$2oa9$1@digitalmars.com...
> Craig Black wrote:
>> 2) Adding language features that would allow for a moving GC.  A modern, moving GC would also be a huge performance win.  I think we would have a safety problem if we currently implemented a moving GC.  Languages that have moving GC greatly restrict what can be done with pointers.  We need to provide a syntax that will allow pointers to be used when memory is explicitly managed, but disallow pointers for GC memory.
>>
>> So, here's one idea for making D more safe for moving GC.
>>
>> a) Disallow overloading new and delete for classes, and make classes strictly for GC, perhaps with an exception for classes instantiated on the stack using scope.
>
> Don't see the point of this. You'd map a single old value to a single new value...or map an old range to a new one. You're changing one equality check and one assignment to two comparisons and an addition. And this is when you're looking through the entire address space of the program.

I'm not exactly sure what you are talking about, but you mention computation performed at run-time.  The concept here is that it will be a compile-time restriction.

The reason to disallow new and delete is to ensure that all instances of a class not instantiated using "scope" will be GC objects.  This gives the compiler the information necessary to enforce pointer assignment restrictions at compile-time.

>> b) Allow new and delete to work with structs, and allocate them on the malloc heap.  I would still want to be able to override new and delete for structs, specifically to be able to use nedmalloc.
>
> This can allow polymorphism for structs, actually, but it is a bit of a performance hit.

Yes, polymorphism for structs could be allowed.  I don't know why you think that would be a performance hit.  C++ structs and classes allow polymorphism, but do not take any performance hit or memory overhead when polymorphism is not used.  If polymorphism is used, it doesn't affect the performance of non-polymorphic functions, and only requires a pointer to be stored in each object in order to reference the vtable.

Maybe you think I am implying that ALL structs will be allocated on the malloc heap.  No, no, no.  I am suggesting that a struct could be allocated on the heap or on the stack.  How would the syntax look?  Structs allocated on the stack would retain the same syntax.  The ones allocated on the heap would be allocated with the new operator.  These could be referenced using pointers, or maybe some form of reference type.  But the reference types would need to be explicitly declared, like this:

struct A { A(int x) {} }

A a = A(1); // stack allocation
A *a = new A(1); // possible syntax for heap allocation
A &a = new A(1); // another possible syntax, I'm sure there are other ideas.

>> Then the compiler could disallow taking the address of a class field, since we know the resulting pointer would point to the GC heap.  Note that this would be a compile-time check, and so would not degrade run-time performance.
>
> Ugly.
>
> What do you do for taking the address of a class variable? Well, okay, you have to take the address of the reference; you can't take the address of the variable directly. The current method is ugly and undefined behavior:
> *cast(void**)&obj;
>
> And you can assume that all pointers that point to that region of memory have to be moved.
>
> The problem is granularity.
>
> class Foo {
>    Foo next;
>    size_t i, j, k, l, m, n, o, p;
> }
>
> Here, the current regime would mark *Foo as hasPointers. If i, j, k, l, m, n, o, or p just happened to look like a pointer, they'd be changed. You'd need to find where each object begins, then you'd need to go through the offset type info to see which elements are really pointers.
>
> Since you're running the garbage collector, that's doable, if the offset type info is currently available (I think it wasn't, last I checked, but I don't really recall).

I'm not sure you understand what I'm proposing.  What you are talking about is run-time information used by the garbage collector.  I'm talking about a compile-time restriction.  No checking anything at run-time, and so no performance hit.  Maybe the confusion stems from the fact that I didn't describe in detail how this would work.  That's because I haven't thought it through yet.  But I'm confident that there is a good way this restriction could be enforced at compile-time.

>> Another idea would be to be able to pin GC objects.  C# allows this via the fixed keyword.  In D, it could work like this:
>>
>> a) Preceding a pointer declaration with fixed would allow that pointer to take the address in the GC heap.
>> b) Pointer arithmetic would be disallowed for fixed pointers.
>
> Why?
>
> fixed float* four_floats = cast(float*) std.gc.malloc(4 * float.sizeof);
> fixed float* float_one = four_floats;
> fixed float* float_two = four_floats + 1;
> fixed float* float_three = four_floats + 2;
> fixed float* float_four = four_floats + 3;
>
> Seems fine to me. You might go beyond the allocated space, but that's already undefined behavior.

Ok, point taken.  Pointer arithmetic might be useful.  I'm just trying to make it as safe as possible, and maybe disallowing this is going too far. However, your above example could be implemented without pointer arithmetic using a static array.

>> c) A fixed pointer will mark the corresponding GC object as "pinned" so that the GC knows not to move the object.
>> d) When the fixed pointer is changed or deallocated, it will unpin the object, and pin any new object that it refers to.
>
> While there is a fixed reference to the GC object, it is pinned. If that reference is rebound to another GC object, the original object is unpinned and the new one is pinned.

Right.  Pointer is the wrong word.  Sorry.

> How to mark these is a difficult problem. On a 64-bit machine, I'd say you just use the most significant bit as a flag; you're not going to use petabytes of address space.

I'm not sure what the best way would be because I don't know a lot of details about D's GC.

>> The fixed pointer will have to know whether or not it points to GC memory so that it doesn't pin non-GC objects.  Using the first idea, we can determine at compile time whether a pointer points to the heap or not.
>
> The fixed pointer will just stand there shouting "I am a fixed pointer! Look on me and despair!" And the garbage collector will look where it's pointing; if it is pointing at GC memory, the garbage collector will indeed look on it and despair. Otherwise, it will ignore the fixedness.

Yes, that will work, but requires a run-time check (and a branch).  The run-time overhead for what you propose might end up being trivial, but I think it could be done at compile-time.

>> Yes, this would be a big change, but not as big as const IMO.  I feel if any feature warrants breaking some code, it would be high-performance GC. But maybe someone else can find a solution that doesn't break compatibility.
>>
>> Thoughts?
>>
>> -Craig
>> 
December 23, 2007
> struct A { A(int x) {} }

Sorry ... I use C++ a lot at work.  Should read:

struct A { this(int x) {} }

(Further, this code is based on the hypothesis that we may get struct ctors.) 

December 23, 2007
Christopher Wright wrote:
> While there is a fixed reference to the GC object, it is pinned. If that reference is rebound to another GC object, the original object is unpinned and the new one is pinned.
> 
> How to mark these is a difficult problem. On a 64-bit machine, I'd say you just use the most significant bit as a flag; you're not going to use petabytes of address space.

Since "fixedness" as proposed would be a compile-time property, and you already need metadata to find pointers to implement a moving GC, such a flag could be in that metadata instead of in the pointer itself. (The OffsetTypeInfo could say "there's a pointer at offset 8, of type Object, and it's fixed")

If run-time pinning is used instead (where whether the GC cell pointed to by a pointer is pinned is not known at compile time), it could be a simple (synchronized) counter that starts out at 0 for each memory cell, that's incremented when pinned and decremented when unpinned. The GC is then only allowed to move cells whose counter is 0.
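
The run-time counter variant could be sketched roughly as follows. Cell, pin, unpin, movable and FixedRef are hypothetical names, not part of any D runtime, and atomic operations from core.atomic stand in for the "synchronized" counter.

```d
import core.atomic : atomicOp, atomicLoad;

// Hypothetical per-cell header kept by the collector.
struct Cell
{
    shared int pinCount;   // 0 means the collector may move this cell
}

void pin(ref Cell c)     { atomicOp!"+="(c.pinCount, 1); }
void unpin(ref Cell c)   { atomicOp!"-="(c.pinCount, 1); }
bool movable(ref Cell c) { return atomicLoad(c.pinCount) == 0; }

// A "fixed" reference: pins its target for as long as it refers to it.
struct FixedRef
{
    private Cell* target;

    @disable this(this);   // copying would unpin twice

    void rebind(Cell* next)
    {
        if (next !is null) pin(*next);        // pin first, so rebinding to the same cell is safe
        if (target !is null) unpin(*target);  // then release the old target
        target = next;
    }

    ~this() { rebind(null); }   // going out of scope unpins
}
```

With this shape, rebinding a FixedRef from one cell to another unpins the first and pins the second, and the collector only needs to call movable before relocating a cell.
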
December 23, 2007
Craig Black wrote:
> 
> "Christopher Wright" <dhasenan@gmail.com> wrote in message news:fkkm0i$2oa9$1@digitalmars.com...
>> Craig Black wrote:
>>> 2) Adding language features that would allow for a moving GC.  A modern, moving GC would also be a huge performance win.  I think we would have a safety problem if we currently implemented a moving GC.  Languages that have moving GC greatly restrict what can be done with pointers.  We need to provide a syntax that will allow pointers to be used when memory is explicitly managed, but disallow pointers for GC memory.
>>>
>>> So, here's one idea for making D more safe for moving GC.
>>>
>>> a) Disallow overloading new and delete for classes, and make classes strictly for GC, perhaps with an exception for classes instantiated on the stack using scope.
>>
>> Don't see the point of this. You'd map a single old value to a single new value...or map an old range to a new one. You're changing one equality check and one assignment to two comparisons and an addition. And this is when you're looking through the entire address space of the program.
> 
> I'm not exactly sure what you are talking about, but you mention computation performed at run-time.  The concept here is that it will be a compile-time restriction.
> 
> The reason to disallow new and delete is to ensure that all instances of a class not instantiated using "scope" will be GC objects.  This gives the compiler the information necessary to enforce pointer assignment restrictions at compile-time.

I misplaced the text and am now feeling stupid.

>>> b) Allow new and delete to work with structs, and allocate them on the malloc heap.  I would still want to be able to override new and delete for structs, specifically to be able to use nedmalloc.
>>
>> This can allow polymorphism for structs, actually, but it is a bit of a performance hit.
> 
> Yes, polymorphism for structs could be allowed.  I don't know why you think that would be a performance hit.  C++ structs and classes allow polymorphism, but do not take any performance hit or memory overhead when polymorphism is not used.  If polymorphism is used, it doesn't affect the performance of non-polymorphic functions, and only requires a pointer to be stored in each object in order to reference the vtable.

It requires you to store a struct by reference. Thus, performance hit.

> Maybe you think I am implying that ALL structs will be allocated on the malloc heap.  No, no, no.  I am suggesting that a struct could be allocated on the heap or on the stack.  How would the syntax look?  Structs allocated on the stack would retain the same syntax.  The ones allocated on the heap would be allocated with the new operator.  These could be referenced using pointers, or maybe some form of reference type.  But the reference types would need to be explicitly declared, like this:
> 
> struct A { A(int x) {} }
> 
> A a = A(1); // stack allocation
> A *a = new A(1); // possible syntax for heap allocation
> A &a = new A(1); // another possible syntax, I'm sure there are other ideas.
> 
>>> Then the compiler could disallow taking the address of a class field, since we know the resulting pointer would point to the GC heap.  Note that this would be a compile-time check, and so would not degrade run-time performance.
>>
>> Ugly.
>>
>> What do you do for taking the address of a class variable? Well, okay, you have to take the address of the reference; you can't take the address of the variable directly. The current method is ugly and undefined behavior:
>> *cast(void**)&obj;
>>
>> And you can assume that all pointers that point to that region of memory have to be moved.
>>
>> The problem is granularity.
>>
>> class Foo {
>>    Foo next;
>>    size_t i, j, k, l, m, n, o, p;
>> }
>>
>> Here, the current regime would mark *Foo as hasPointers. If i, j, k, l, m, n, o, or p just happened to look like a pointer, they'd be changed. You'd need to find where each object begins, then you'd need to go through the offset type info to see which elements are really pointers.
>>
>> Since you're running the garbage collector, that's doable, if the offset type info is currently available (I think it wasn't, last I checked, but I don't really recall).
> 
> I'm not sure you understand what I'm proposing.  What you are talking about is run-time information used by the garbage collector.  I'm talking about a compile-time restriction.  No checking anything at run-time, and so no performance hit.  Maybe the confusion stems from the fact that I didn't describe in detail how this would work.  That's because I haven't thought it through yet.  But I'm confident that there is a good way this restriction could be enforced at compile-time.

Okay, I swapped that section of text with the previous one that was out of place.

>>> The fixed pointer will have to know whether or not it points to GC memory so that it doesn't pin non-GC objects.  Using the first idea, we can determine at compile time whether a pointer points to the heap or not.
>>
>> The fixed pointer will just stand there shouting "I am a fixed pointer! Look on me and despair!" And the garbage collector will look where it's pointing; if it is pointing at GC memory, the garbage collector will indeed look on it and despair. Otherwise, it will ignore the fixedness.
> 
> Yes, that will work, but requires a run-time check (and a branch).  The run-time overhead for what you propose might end up being trivial, but I think it could be done at compile-time.

I'm not so sure. You'd have to make it undefined behavior to assign a non-fixed address to a fixed pointer. The reverse is fine, of course.

Since class references are pointers, you'd have to have the fixed storage class apply to them as well. Any reference type, really.
December 23, 2007
This is to fix the stuff I botched with my other reply.

Craig Black wrote:
> a) Disallow overloading new and delete for classes, and make classes strictly for GC, perhaps with an exception for classes instantiated on the stack using scope.

You are just making sure that the garbage collector is handling all memory that is associated with objects. I don't see a point to this. The collector won't try to move memory that it doesn't control.

You could do bad things with overloading new/delete, but those are hardly unique situations.

> Then the compiler could disallow taking the address of a class field,
> since we know the resulting pointer would point to the GC heap.
> Note that this would be a compile-time check, and so would not degrade
> run-time performance.

That's not necessary, since you can map a source range to a destination range. It would be a simplifying assumption that improves performance, by changing two comparisons and an addition for each pointer (plus one subtraction per move) to one comparison and one assignment for each pointer. But you're going through a large amount of memory, so that's not a serious concern, I think.
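
The range mapping can be sketched like this, with addresses held as plain integers. Move, makeMove and relocate are hypothetical names. The subtraction happens once per moved block, and each candidate pointer then costs two comparisons and an addition, which also covers interior pointers such as the address of a class field.

```d
// Hypothetical record of one moved block.
struct Move
{
    size_t oldBase;  // first byte of the block before the move
    size_t oldEnd;   // one past the last byte before the move
    size_t delta;    // newBase - oldBase, wrapping if the block moved downward
}

Move makeMove(size_t oldBase, size_t length, size_t newBase)
{
    return Move(oldBase, oldBase + length, newBase - oldBase);   // one subtraction per move
}

// Relocated value of p; pointers outside the moved block are returned unchanged.
size_t relocate(size_t p, Move m)
{
    if (p >= m.oldBase && p < m.oldEnd)   // two comparisons
        return p + m.delta;               // one addition (unsigned wrap-around is well defined)
    return p;
}
```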

> a) Preceding a pointer declaration with fixed would allow that pointer to take the address in the GC heap.

It'd be undefined behavior to do otherwise. But safe as long as no collections happen before you use the pointer.

> 
> The fixed pointer will have to know whether or not it points to GC memory so that it doesn't pin non-GC objects.  Using the first idea, we can determine at compile time whether a pointer points to the heap or not.
> 
> Yes, this would be a big change, but not as big as const IMO.  I feel if any feature warrants breaking some code, it would be high-performance GC.  But maybe someone else can find a solution that doesn't break compatibility.
> 
> Thoughts?
> 
> -Craig
> 
> 
December 23, 2007
> It requires you to store a struct by reference. Thus, performance hit.

No it doesn't.  Structs will be able to be allocated on the stack, without any referencing.  As an OPTION, you will be able to store a struct by reference.  C++ does this very same thing and it is very efficient.

>> Yes, that will work, but requires a run-time check (and a branch).  The run-time overhead for what you propose might end up being trivial, but I think it could be done at compile-time.
>
> I'm not so sure. You'd have to make it undefined behavior to assign a non-fixed address to a fixed pointer. The reverse is fine, of course.
>
> Since class references are pointers, you'd have to have the fixed storage class apply to them as well. Any reference type, really.

Yes and all class fields would be fixed as well, unless the class object was instantiated using scope.  This means that when you take the address of them, it results in a fixed pointer. 
