April 28, 2011
On 28.04.2011 12:46, Daniel Gibson wrote:

> ... so IMHO it makes sense for D to have the same syntax as the majority of these languages for the same feature (if that syntax doesn't have big enough drawbacks like <> for templates).

  That's my point too. And this is especially true when a feature has already been there for a long time.

/Alexander
April 28, 2011
On Wed, 27 Apr 2011 18:15:26 -0400, Alexander <aldem+dmars@nk7.net> wrote:

> On 27.04.2011 22:42, Steven Schveighoffer wrote:
>
>> For non-garbage-collected languages, yes.  For GC languages, delete is to be discouraged (that is what the GC is for).
>
>   delete() is in 99% of cases an O(1) operation (thanks to free lists),

Not exactly, it still needs to update the metadata for the block, which involves a binary search (O(lg n)) + you have to take the global lock, which means you could be making your multi-threaded application worse than a single-threaded one.  I have seen this in practice when memory is allocated or deallocated frequently.
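To illustrate the lookup cost, here is a minimal sketch in C (a hypothetical layout for illustration, not the actual druntime code): the collector keeps pool descriptors sorted by base address and binary-searches them to find the metadata for a raw pointer, which is where the O(lg n) comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool descriptor: each pool covers [base, base + size). */
typedef struct {
    uintptr_t base;
    size_t    size;
} Pool;

/* Binary search over pools sorted by base address: O(lg n) per lookup.
 * Returns the index of the pool containing p, or -1 if none does. */
static int find_pool(const Pool *pools, int n, const void *p) {
    uintptr_t addr = (uintptr_t)p;
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (addr < pools[mid].base)
            hi = mid - 1;
        else if (addr >= pools[mid].base + pools[mid].size)
            lo = mid + 1;
        else
            return mid;
    }
    return -1;
}
```

Every delete/free has to pay for one such lookup before it can touch the block's flags, on top of acquiring the global lock.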

clear is definitely O(1) (well, as much as you thought delete was O(1), it depends on the dtor).

> while invocation of GC is O(?) (no one knows how many objects are pending deallocation, and when exactly it will be invoked).

The point is that it runs infrequently enough to avoid hindering performance.  D's GC is pretty horrible at performance, so there are definitely improvements yet to be seen there (David recently fixed some really bad ones, the next version of dmd should be much faster at allocation).

Yes, if you run the collector every time you are done using an object, performance will be much worse than using delete to destroy the object, but that is also not encouraged.

>   I agree that in normal applications (mostly) this is rarely an issue, but there are applications that are not normal, which I mentioned previously - RT & OS, and some others.

As mentioned, RT applications and OS kernels would need a specialized runtime, D does not support that with the current GC/runtime.  This does not mean that you would be using delete for those, you'd probably use a specialized runtime function using that specialized runtime.

>   Additionally, memory management hooks supported by the compiler are faster than any other solution (templates & co).

Absolutely untrue.  Runtime hooks cannot be inlined for instance.

>> Java and C# code do not have much use for delete either
>
>   What about those coming from C++ and D1 (including D2 up to this point)?

C++ users will have to adjust to an environment that provides a GC.  D1/D2 users should use delete sparingly anyways.  Migrating those few cases to use a new method shouldn't be too difficult.

>   But, actually, I am really interested in only one thing... I agree that some may feel discomfort using delete, but what is the reason to remove it from the language? Probably I've missed something, but why not leave it as is for those who need it?

I think (and I'm not sure, Andrei was the one behind this) that the issue is that a keyword operation is very simple/easy to write, and deleting memory should be discouraged.  Having a dedicated keyword makes it seem sanctioned and promoted by the language, when in fact, it should almost never be used.

-Steve
April 28, 2011
On 28.04.2011 16:52, Steven Schveighoffer wrote:

>>   delete() is in 99% of cases an O(1) operation (thanks to free lists),
> 
> Not exactly, it still needs to update the metadata for the block, which involves a binary search (O(lg n)) + you have to take the global lock, which means you could be making your multi-threaded application worse than a single-threaded one.

  Well, I was assuming that the metadata is attached to the block - there is no need to search. Of course, it depends on the implementation.

  OTOH, when deletes are performed in a bunch (+ time to scan the data for references), it could really take time. Taking a lock, IMHO, is a bit less expensive compared to pausing all threads. But I may be wrong...

> The point is that it runs infrequently enough to avoid hindering performance.

  Then there is another catch lurking around - the less frequently it runs, the more objects there will be to collect, so performance is hindered even more.

> D's GC is pretty horrible at performance, so there are definitely improvements yet to be seen there (David recently fixed some really bad ones, the next version of dmd
> should be much faster at allocation).

  Is it possible to adapt (or use directly) Boehm's GC? Just an idea...

> As mentioned, RT applications and OS kernels would need a specialized runtime, D does not support that with the current GC/runtime.  This does not mean that you would be using delete for those, you'd probably use a specialized runtime function using
> that specialized runtime.

  Right, but using "delete" is a bit more "natural". IMHO, of course :)

>>   Additionally, memory management hooks supported by the compiler are faster than any other solution (templates & co).
> 
> Absolutely untrue.  Runtime hooks cannot be inlined for instance.

  Unless the compiler *knows* that there are hooks. Some believe that the compiler must not handle anything specially, but, to be really effective, this is difficult to avoid. Intrinsic functions are very useful.

> Having a dedicated keyword makes it seem sanctioned and promoted by the language, when in fact, it should almost never be used.

  D has some features which are discouraged anyway (like pointers & co.), but D is a tool, and I believe that the decision of how to use a tool should be left to the user, that's all :)

/Alexander
April 28, 2011
On Thu, 28 Apr 2011 11:11:41 -0400, Alexander <aldem+dmars@nk7.net> wrote:

> On 28.04.2011 16:52, Steven Schveighoffer wrote:
>
>>>   delete() is in 99% of cases an O(1) operation (thanks to free lists),
>>
>> Not exactly, it still needs to update the metadata for the block, which involves a binary search (O(lg n)) + you have to take the global lock, which means you could be making your multi-threaded application worse than a single-threaded one.
>
>   Well, I was assuming that the metadata is attached to the block - there is no need to search. Of course, it depends on the implementation.

This is not how it's implemented.  The metadata is kept in a separate page from the memory itself.

>
>   OTOH, when deletes are performed in a bunch (+ time to scan the data for references), it could really take time. Taking a lock, IMHO, is a bit less expensive compared to pausing all threads. But I may be wrong...

The point is, you only do it once in a while.  If you delete objects 100 times a second, and the GC runs only every 10 seconds, then you have to consider the cost of pausing all threads and running a collection cycle against the cost of taking the lock 1,000 times.

IMO, the GC runs too often in D, but we can improve that metric.

>> The point is that it runs infrequently enough to avoid hindering performance.
>
>   Then there is another catch lurking around - the less frequently it runs, the more objects there will be to collect, so performance is hindered even more.

Not really.  The more objects there are to collect, the less scanning needs to be done.  You only process blocks that have references to them.

There is also a certain amount of overhead that runs regardless of how many objects are ready to be collected.

>
>> D's GC is pretty horrible at performance, so there are definitely improvements yet to be seen there (David recently fixed some really bad ones, the next version of dmd
>> should be much faster at allocation).
>
>   Is it possible to adapt (or use directly) Boehm's GC? Just an idea...

Yes, the GC is swappable.  You need to re-compile the runtime with a different GC, all the hooks are movable.  In fact, there's a GC stub that just calls C malloc for all allocations.
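A rough sketch of what such a pluggable design looks like (in C for illustration; the names and the vtable shape are hypothetical, not the actual druntime API): the runtime calls through a table of function pointers, and the malloc-backed stub simply forwards to the C heap and never collects.

```c
#include <stdlib.h>

/* Hypothetical allocator vtable the runtime would call through. */
typedef struct {
    void *(*alloc)(size_t size);
    void  (*dealloc)(void *p);
    void  (*collect)(void);
} GCInterface;

/* Stub "GC" that just forwards to the C heap. */
static void *stub_alloc(size_t size) { return malloc(size); }
static void  stub_dealloc(void *p)   { free(p); }
static void  stub_collect(void)      { /* no-op: nothing is ever collected */ }

/* Swapping GCs amounts to filling this table differently and recompiling. */
static GCInterface gc = { stub_alloc, stub_dealloc, stub_collect };
```

With such an indirection in place, dropping in a different collector (e.g. an adapted Boehm GC) is a matter of providing another set of functions.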

>> As mentioned, RT applications and OS kernels would need a specialized runtime, D does not support that with the current GC/runtime.  This does not mean that you would be using delete for those, you'd probably use a specialized runtime function using
>> that specialized runtime.
>
>   Right, but using "delete" is a bit more "natural". IMHO, of course :)

For some, free might be more "natural" :)  I think you can find another term to free data that is not a keyword easily enough.

>>>   Additionally, memory management hooks supported by the compiler are faster than any other solution (templates & co).
>>
>> Absolutely untrue.  Runtime hooks cannot be inlined for instance.
>
>   Unless the compiler *knows* that there are hooks. Some believe that the compiler must not handle anything specially, but, to be really effective, this is difficult to avoid. Intrinsic functions are very useful.

The compiler cannot possibly make any assumptions about how memory management is done.  It must dutifully call the runtime function, or else you could not implement interesting things in the GC.

>> Having a dedicated keyword makes it seem sanctioned and promoted by the language, when in fact, it should almost never be used.
>
>   D has some features which are discouraged anyway (like pointers & co.), but D is a tool, and I believe that the decision of how to use a tool should be left to the user, that's all :)

But delete doesn't allow you to do things that you can't already do otherwise.  Delete is not going away without being replaced by equivalent functionality.  Pointers and casting are a necessary evil for optimizing code and for forcing things past the type system.  There is no way to implement pointers and casting in the runtime.

-Steve
April 28, 2011
On 28.04.2011 17:32, Steven Schveighoffer wrote:

> This is not how it's implemented.  The metadata is kept in a separate page from the memory itself.

  Huh? Could you please elaborate - what do you mean by "metadata", and why does it have to be kept on a separate page?

  I've seen at least one GC implementation with free lists and metadata preceding the allocated block, so even most allocations took O(1) time (unless there was a shortage of memory).
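That layout can be sketched as follows (a toy illustration, not any particular GC's code): the metadata is a header placed immediately before the user-visible block, so deallocation finds it with pointer arithmetic and pushes the block onto a free list in O(1), and a matching allocation pops it back off.

```c
#include <stdlib.h>

/* Hypothetical scheme: metadata lives in a header immediately before the
 * user-visible block, so freeing needs no search at all. */
typedef struct Header {
    size_t size;
    struct Header *next;  /* free-list link, valid while the block is free */
} Header;

static Header *freelist = NULL;

static void *alloc_block(size_t size) {
    /* Reuse from the free list (exact size match, for simplicity). */
    for (Header **h = &freelist; *h; h = &(*h)->next) {
        if ((*h)->size == size) {
            Header *hit = *h;
            *h = hit->next;
            return hit + 1;           /* user data starts right after header */
        }
    }
    Header *h = malloc(sizeof(Header) + size);
    if (!h) return NULL;
    h->size = size;
    return h + 1;
}

static void free_block(void *p) {
    Header *h = (Header *)p - 1;      /* O(1): step back to the metadata */
    h->next = freelist;
    freelist = h;
}
```

The trade-off is that a stray pointer into the middle of a block cannot find its header this way, which is one reason a scanning GC may prefer a separate metadata table.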

> Not really.  The more objects there are to collect, the less scanning needs to be done.  You only process blocks that have references to them.

  Theoretically, any heap/stack-allocated block may have references. Stack scanning is relatively cheap (we know its current size, and usually it is small), but the number of objects on the heap may be quite big, and the heap itself as well. Or have I misunderstood something?

/Alexander
April 28, 2011
On Thu, 28 Apr 2011 11:57:00 -0400, Alexander <aldem+dmars@nk7.net> wrote:

> On 28.04.2011 17:32, Steven Schveighoffer wrote:
>
>> This is not how it's implemented.  The metadata is kept in a separate page from the memory itself.
>
>   Huh? Could you please elaborate - what do you mean by "metadata", and why does it have to be kept on a separate page?

Flags for the memory, such as whether it is free, or whether it's reachable (during a scan), whether it has pointers, etc.

Why it is kept on a separate page, I'm not sure - I didn't write that part.  It would seem that making the metadata a constant offset from the data page would be better, but of course, that cannot work for multi-page blocks.  If you have ideas on how to improve the performance, I encourage you to learn how the GC works and submit some patches!  It actually can be pretty fun.

>   I've seen at least one GC implementation with free lists and metadata preceding the allocated block, so even most allocations took O(1) time (unless there was a shortage of memory).

Allocation is going to cost less, because the memory is hopefully in a free list, and hopefully the free list points at the metadata (not sure about the current GC implementation).

Yes, the allocation and free performance could be improved, but it doesn't change the fact that deleting manually is not a huge performance gain, if it is one at all.  If you want to gain performance, don't use the GC at all (for example, scope classes), or use a custom allocator that is tailored to your needs.

>> Not really.  The more objects there are to collect, the less scanning needs to be done.  You only process blocks that have references to them.
>
>   Theoretically, any heap/stack-allocated block may have references. Stack scanning is relatively cheap (we know its current size, and usually it is small), but the number of objects on the heap may be quite big, and the heap itself as well. Or have I misunderstood something?

If an object is to be collected, there will be no references to that object.  This means you do not have to scan that object's memory to see if it points to something else.  Less memory needs to be scanned.

You start from your roots, and then start scanning from there.  Your roots are your thread local storage, stack space, globals, and roots added manually.
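The mark phase can be sketched in a few lines (a toy model in C with typed links; a real GC scans raw memory conservatively, but the reachability logic is the same): only objects reachable from the roots are ever visited, which is why dead objects add no scanning work.

```c
#include <stddef.h>

/* Toy heap object: up to two outgoing references plus a mark bit. */
typedef struct Obj {
    struct Obj *ref[2];
    int marked;
} Obj;

/* Mark everything reachable from a root. Objects never reached here
 * are garbage; the collector never scans their contents. */
static void mark(Obj *o) {
    if (o == NULL || o->marked)
        return;                 /* already visited, or not an object */
    o->marked = 1;
    mark(o->ref[0]);
    mark(o->ref[1]);
}
```

Running `mark` over every root (stacks, TLS, globals, manually added roots) partitions the heap: marked blocks survive, unmarked ones go back on the free lists.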

-Steve
April 28, 2011
Steven Schveighoffer wrote:
> Yes, the allocation and free performance could be improved, but it doesn't change the fact that deleting manually is not a huge performance gain, if at all.

My benchmark shows that in the current implementation manual deletion of GC memory is at least 15 times faster than leaving it up to the GC. (better cache behavior of the data unmeasured)

> If you want to gain performance, don't use the GC at all (for
> example scope classes), or use a custom allocator that is tailored to your
> needs.

Agreed. But writing a very well performing custom allocator that scales well with the application is a not-exactly-trivial task.
April 28, 2011
On 4/28/11 2:27 PM, Timon Gehr wrote:
> Steven Schveighoffer wrote:
>> Yes, the allocation and free performance could be improved, but it doesn't
>> change the fact that deleting manually is not a huge performance gain, if at
>> all.
>
> My benchmark shows that in the current implementation manual deletion of GC memory
> is at least 15 times faster than leaving it up to the GC. (better cache behavior
> of the data unmeasured)

Thanks for the data point. Another good one would be comparing new/delete with malloc/free.

Andrei
April 28, 2011
On Thu, 28 Apr 2011 15:27:16 -0400, Timon Gehr <timon.gehr@gmx.ch> wrote:

> Steven Schveighoffer wrote:
>> Yes, the allocation and free performance could be improved, but it doesn't
>> change the fact that deleting manually is not a huge performance gain, if at
>> all.
>
> My benchmark shows that in the current implementation manual deletion of GC memory
> is at least 15 times faster than leaving it up to the GC. (better cache behavior
> of the data unmeasured)

I would need to see what your benchmark is before I'd say it was conclusive.

>
>> If you want to gain performance, don't use the GC at all (for
>> example scope classes), or use a custom allocator that is tailored to your
>> needs.
>
> Agreed. But writing a very well performing custom allocator that scales well with
> the application is a not-exactly-trivial task.

True.  Generally, GC has good enough performance for most applications.  But again, all of this is somewhat moot since delete isn't exactly going away, it's just going away as a keyword.

-Steve
April 28, 2011
Steven Schveighoffer wrote:
> I would need to see what your benchmark is before I'd say it was conclusive.

Ok. The version with delete is GC_hints (delete does not give many semantic guarantees, ergo it is a hint); the version without delete is GC. It is also attached for convenience.

This benchmark puts quite a high load on the GC heap, so for usual applications the GC won't perform all that badly. But then again, usual applications don't come with a memory allocation bottleneck.

@Andrei: The benchmark also includes a comparison to malloc/free (version c_heap).

Timon


/* Benchmark GC vs GC & hints vs freelist vs C Heap
 * My measurements on an Intel Core Duo 2x2.4 GHz on Ubuntu 10.04 (lucid) amd64
 * version=GC:       1m45s
 * version=GC_hints: ~6.8s
 * //using custom allocators:
 * version=freelist: ~3.7s
 * version=c_heap:   ~4.5s
 */


import std.stdio;
import std.c.stdlib;
import std.c.string;
import std.algorithm:swap;


version=GC;//default
//version=GC_hints;
//version=freelist;
//version=c_heap;


version(freelist) version=calldel;
version(GC_hints) version=calldel;
version(c_heap) version=calldel;

class foo{
	char[1020] data=void;
	static foo freelist;
	foo next;
	version(freelist){
		new(size_t s){
			if(freelist !is null){
				void *r=cast(void*)freelist;
				freelist=freelist.next;
				return r;
			}
			return cast(void*)new char[s];//could use malloc(s); effects are similar
		}
		delete(void *p){
			if(p){
				foo oldfl=freelist;
				freelist=cast(foo)p;
				freelist.next=oldfl;
			}
		}
	}
	version(c_heap){
		new(size_t s){return malloc(s);}
		delete(void *p){if(p) free(p);}
	}
};

foo[100000] a;

void main(){
	srand(100);
	int top;
	int dir;//0: expected growing, 1: expected shrinking
	foreach(t;0..10000000){
		if(!top){
			a[top++]=new foo;
			dir=0;
		}else if(top==100000){
			top--;
			version(calldel) delete a[top];
			dir=1;
		}else if(dir^!(rand()%3)){
			top--;
			version(calldel) delete a[top];
		}else a[top++]=new foo;
		if(!(rand()%100)) for(int i=0;i<100;i++) swap(a[0],a[rand()%top]);//mess around
	}
}