April 24, 2006
"Dan" <Dan_member@pathlink.com> wrote in message news:e2im9o$29gi$1@digitaldaemon.com...
>
> Actually, having "if statements" is absolutely TERRIBLE for trying to
> improve
> branch prediction.  By it's very nature, if statements are branches, so
> you are
> causing at least one branch misprediction anyways.
>
> Furthermore, pointers are not O(1) algorithms.  What the computer is doing
> is
> performing a:
>
> LEA (reg) [(v_pointer)];
> J(cc)|JMP [(reg)];

Maybe you are just way over my head but I'm not quite following how this is anything but O(1).

> My understanding is that the v_pointer address is known at compile time,
> however
> it implies:
>
> that the v_table's cache has to be loaded every time a function is called
> that extra lea buggers instruction pairing tagging on several cycles
> the potential that the v_table is a cache miss (highly unlikely a page
> miss)
>
> Interestingly however, if one were to prepend some Self Modifying Code to
> the
> program that instead loaded the vtable into a set of addresses attached to
> the
> vtable, then one could achieve the benefits of static functions and most
> of the
> benefits of virtual functions at the cost of having an extra void*.sizeof
> for
every function call, and the SMC code overhead.

OK, now you are definitely over my head. :)
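To make Dan's dispatch sequence concrete (the names below are illustrative, not from any real codebase): a virtual call in C++ is one load of the object's vtable pointer plus one indirect call through a slot in that table. The lookup itself is constant-time; the cost Dan is describing is the extra memory access and the indirect branch, not the asymptotics.

```cpp
#include <cassert>

// A virtual call is dispatched through the object's vtable pointer:
// conceptually "load vptr into a register, then jump indirect through
// slot N".  The lookup is O(1) -- the cost is the extra memory access
// and the indirect branch, not the asymptotic complexity.
struct Shape {
    virtual int sides() const { return 0; }   // vtable slot for sides()
    virtual ~Shape() {}
};

struct Triangle : Shape {
    int sides() const { return 3; }           // overrides that slot
};

int call_sides(const Shape& s) {
    // Compiles to roughly: reg = s.vptr; call [reg + slot_offset]
    return s.sides();
}
```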

> The problem is that *some* functions would thereafter be virtual so that a
> single pointer could be used to point to any of a set of functions - a
> case not
> easily rendered with static functions (but still possible remembering that
> all
> code is just data)
>
> That said, I think the overhead involved in using virtual functions is
> quite
> reasonable and comparable to the overhead in using 'calling conventions'
> versus
> a internally-optimized system with primarily register arguments.  It's 7-8
> clocks every function call with a little bit of cache overhead and the
> potential
> for a cache miss.
>
> Ultimately though, I think the GC causes the highest level of performance
> troubles on the benchmarks.  I love it to death though and wish someone
> would
> get their hands dirty making it smaller and sexier.

Or (truly) optional. :)  The notion that the GC is optional in D is really misleading.  Everything uses it.  While you could certainly make it faster at the expense of more complexity, I don't think that a GC that scans the stack testing for non-null pointers will ever provide the performance of C's malloc and free.  GC is simply performing much more work.  I have heard many arguments to the contrary, but I still don't see how it is possible that GC could compete with malloc and free.

-Craig


April 24, 2006
In article <e2inmo$2bjp$1@digitaldaemon.com>, Craig Black says...
>
>"Dan" <Dan_member@pathlink.com> wrote in message news:e2im9o$29gi$1@digitaldaemon.com...
>>
>> Ultimately though, I think the GC causes the highest level of performance
>> troubles on the benchmarks.  I love it to death though and wish someone
>> would
>> get their hands dirty making it smaller and sexier.
>
>Or (truly) optional. :)  The notion that the GC is optional in D is really misleading.  Everything uses it.  While you could certainly make it faster at the expense of more complexity, I don't think that GC that scans the stack testing for non-null pointers will ever provide the performance of C's malloc and free.  GC is simply performing much more work.  I have heard many arguments to the contrary, but I still don't see how it is possible that GC could compete with malloc and free.
>


A bad GC will definitely make for bad performance. And any GC will be slower than an optimum malloc/free solution. However (and I’m still undecided on this one) I think the argument is that in any non-trivial project, manually freeing all of the memory without any memory leaks or axing something too soon would require more overhead than using GC. The trade-off is not in the freeing time but in the can-I-free-this check.


April 24, 2006
"BCS" <BCS_member@pathlink.com> wrote in message news:e2isuc$2jmd$1@digitaldaemon.com...
> In article <e2inmo$2bjp$1@digitaldaemon.com>, Craig Black says...
>>
>>"Dan" <Dan_member@pathlink.com> wrote in message news:e2im9o$29gi$1@digitaldaemon.com...
>>>
>>> Ultimately though, I think the GC causes the highest level of
>>> performance
>>> troubles on the benchmarks.  I love it to death though and wish someone
>>> would
>>> get their hands dirty making it smaller and sexier.
>>
>>Or (truly) optional. :)  The notion that the GC is optional in D is really
>>misleading.  Everything uses it.  While you could certainly make it faster
>>at the expense of more complexity, I don't think that GC that scans the
>>stack testing for non-null pointers will ever provide the performance of
>>C's
>>malloc and free.  GC is simply performing much more work.  I have heard
>>many
>>arguments to the contrary, but I still don't see how it is possible that
>>GC
>>could compete with malloc and free.
>>
>
>
> A bad GC will definitely make for bad performance. And any GC will be
> slower
> than an optimum malloc/free solution. However (and I'm still undecided on
> this
> one) I think the argument is that in any non trivial project, manually
> freeing
> all of the memory without any memory leaks or axing something too soon
> would
> require more overhead than using GC. The trade-off is not in the freeing
> time
> but in the can-I-free-this check.

I have heard similar.  "The code required to manually free everything outweighs the overhead of automatic GC."  Unless you are doing something ridiculous in your code when freeing heap objects, this statement is false. Manual freeing does not require gobs of code or eat up lots of processor cycles.  Typically it involves testing a pointer to see if it is null, calling free() if it is not, and then setting it to null.  How could this ever be slower than scanning the entire stack for non-null pointers?  GC is simply doing way more work.  IMO this statement is simply absurd.
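The manual pattern I mean is nothing more than this sketch (hypothetical `Node`/`release` names; note that free(NULL) is already defined as a no-op, so the null test costs a compare and little else):

```cpp
#include <cassert>
#include <cstdlib>

// The test-free-null pattern: free only live pointers, then null them
// so a dangling use or double free is impossible.  (free(NULL) is a
// defined no-op anyway, so the test is for clarity, not correctness.)
struct Node { int value; };

void release(Node** p) {
    if (*p != NULL) {     // anything to free?
        std::free(*p);    // hand the block back to the allocator
        *p = NULL;        // guard against a later double free
    }
}
```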

The benefit of GC is not performance, but convenience.  If you want the convenience of not having to free memory manually then use GC.  However if you want performance, use malloc and free (new and delete).

That is why GC should be completely optional, not just on a per-class basis. If I want to completely eliminate GC from my application, I should be able to do so and still be able to make use of any standard library capabilities.

-Craig


April 24, 2006
BCS wrote:
> 
> A bad GC will definitely make for bad performance. And any GC will be slower
> than an optimum malloc/free solution. However (and I’m still undecided on this
> one) I think the argument is that in any non trivial project, manually freeing
> all of the memory without any memory leaks or axing something too soon would
> require more overhead than using GC. The trade-off is not in the freeing time
> but in the can-I-free-this check.

I believe this is indeed the common argument.  In C++, smart pointers have become a very popular way to manage memory, and while smart pointers are technically a garbage collection strategy, I think they aren't considered as such here.  In any case, the cost comes from updating the reference count, not from the malloc/free, as the updates must typically be performed atomically.  That said, an accurate performance comparison between GC and manual memory management is extremely difficult because implementation details and use strategies have a tremendous impact on results.  For anyone interested, there's been an extensive argu- um, dialog on the topic recently on comp.lang.c++.moderated titled "Can GC be benifical":

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/84253d37f970dd2b/#
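The atomic-update cost can be sketched as follows (a hypothetical intrusive count, written with the later std::atomic for brevity; period code would use platform interlocked primitives):

```cpp
#include <cassert>
#include <atomic>

// Every copy of a counted handle does an atomic increment; every
// destruction an atomic decrement.  It is this synchronized traffic,
// not the malloc/free itself, that dominates refcounting cost.
struct RefCounted {
    std::atomic<int> refs;
    RefCounted() : refs(1) {}          // creator holds the first reference
};

void acquire(RefCounted* r) {
    r->refs.fetch_add(1, std::memory_order_relaxed);   // one copy, one inc
}

bool release(RefCounted* r) {
    // Returns true for the last owner, who would then delete the object.
    return r->refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```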


Sean
April 24, 2006
Craig Black wrote:
> "BCS" <BCS_member@pathlink.com> wrote in message news:e2isuc$2jmd$1@digitaldaemon.com...
> 
>>In article <e2inmo$2bjp$1@digitaldaemon.com>, Craig Black says...
>>
>>>"Dan" <Dan_member@pathlink.com> wrote in message
>>>news:e2im9o$29gi$1@digitaldaemon.com...
>>>
>>>>Ultimately though, I think the GC causes the highest level of performance
>>>>troubles on the benchmarks.  I love it to death though and wish someone
>>>>would
>>>>get their hands dirty making it smaller and sexier.
>>>
>>>Or (truly) optional. :)  The notion that the GC is optional in D is really
>>>misleading.  Everything uses it.  While you could certainly make it faster
>>>at the expense of more complexity, I don't think that GC that scans the
>>>stack testing for non-null pointers will ever provide the performance of C's
>>>malloc and free.  GC is simply performing much more work.  I have heard many
>>>arguments to the contrary, but I still don't see how it is possible that GC
>>>could compete with malloc and free.
>>>
>>
>>
>>A bad GC will definitely make for bad performance. And any GC will be slower
>>than an optimum malloc/free solution. However (and I'm still undecided on this
>>one) I think the argument is that in any non trivial project, manually freeing
>>all of the memory without any memory leaks or axing something too soon would
>>require more overhead than using GC. The trade-off is not in the freeing time
>>but in the can-I-free-this check.
> 
> 
> I have heard similar.  "The code required to manually free everything outweighs the overhead of automatic GC."  Unless you are doing something ridiculous in your code when freeing heap objects, this statement is false. Manual freeing does not require gobs of code or eat up lots of processor cycles.  Typically it involves testing a pointer to see if it is null, calling free() if it is not, and then setting it to null.  How could this ever be slower than scanning the entire stack for non-null pointers?  GC is simply doing way more work.  IMO this statement is simply absurd.
> 
> The benefit of GC is not performance, but convenience.  If you want the convenience of not having to free memory manually then use GC.  However if you want performance, use malloc and free (new and delete).
> 
> That is why GC should be completely optional, not just on a per class basis. If I want to completely eliminate GC from my application, I should be able to do so and still be able to make use of any standard library capabilities.
> 
> -Craig 
> 
> 


The nice thing about D, as a language, is that it supports both approaches. The bad thing, for your needs, is that phobos is tightly bound to the GC. My particular gripes are with the GC activity caused via utf conversions, std.string and so on. There's a surprising, perhaps troublesome, amount of GC activity generated from within phobos itself. This is one of the reasons alternative libraries exist.
April 25, 2006
> The nice thing about D, as a language, is that it supports both approaches. The bad thing, for your needs, is that phobos is tightly bound to the GC. My particular gripes are with the GC activity caused via utf conversions, std.string and so on. There's a surprising, perhaps troublesome, amount of GC activity generated from within phobos itself. This is one of the reasons alternative libraries exist.

Correct.  But since the language core relies on Phobos I don't think that there is a reasonable way to completely decouple D from GC.  To me this situation is quite ugly.  It would be nice if there was a way to replace Phobos completely.  After writing standard libraries that don't rely on GC, we could easily produce D non-GC vs. D GC benchmarks.

Some steps would have to be taken in order to keep some source compatibility between the two modes, so that it wouldn't turn into two completely different languages.  Manual memory management code could be versioned out when compiled in GC mode.

-Craig


April 25, 2006
Craig Black wrote:
> "BCS" <BCS_member@pathlink.com> wrote in message news:e2isuc$2jmd$1@digitaldaemon.com...
>> In article <e2inmo$2bjp$1@digitaldaemon.com>, Craig Black says...
>>> "Dan" <Dan_member@pathlink.com> wrote in message
>>> news:e2im9o$29gi$1@digitaldaemon.com...
>>>> Ultimately though, I think the GC causes the highest level of performance
>>>> troubles on the benchmarks.  I love it to death though and wish someone
>>>> would
>>>> get their hands dirty making it smaller and sexier.
>>> Or (truly) optional. :)  The notion that the GC is optional in D is really
>>> misleading.  Everything uses it.  While you could certainly make it faster
>>> at the expense of more complexity, I don't think that GC that scans the
>>> stack testing for non-null pointers will ever provide the performance of C's
>>> malloc and free.  GC is simply performing much more work.  I have heard many
>>> arguments to the contrary, but I still don't see how it is possible that GC
>>> could compete with malloc and free.
>>>
>>
>> A bad GC will definitely make for bad performance. And any GC will be slower
>> than an optimum malloc/free solution. However (and I'm still undecided on this
>> one) I think the argument is that in any non trivial project, manually freeing
>> all of the memory without any memory leaks or axing something too soon
>> require more overhead than using GC. The trade-off is not in the freeing time
>> but in the can-I-free-this check.
> 
> I have heard similar.  "The code required to manually free everything outweighs the overhead of automatic GC."  Unless you are doing something ridiculous in your code when freeing heap objects, this statement is false. Manual freeing does not require gobs of code or eat up lots of processor cycles.  Typically it involves testing a pointer to see if it is null, calling free() if it is not, and then setting it to null.  How could this ever be slower than scanning the entire stack for non-null pointers?  GC is simply doing way more work.  IMO this statement is simply absurd.

There's more to it than that. One of the big benefits of GC is that fewer allocations are necessary - and a sure way to better allocation performance is to do less of them. The reason fewer allocations are necessary is that the normal behavior in C/C++, when faced with two pointers to the same object, is to copy the object. That way, there's no grief with worrying about freeing the object that someone else points to.

To see this effect at work, see www.digitalmars.com/d/cppstrings.html.
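The copying convention shows up in miniature below (a hypothetical helper; in D the same operation would return a slice of the original buffer with no allocation, which the GC makes safe):

```cpp
#include <cassert>
#include <string>

// Extracting a word from a C++ string conventionally copies the
// characters into a fresh allocation, because two manual owners of
// one buffer are hard to manage.  A GC makes the no-copy slice safe.
std::string copy_word(const std::string& line, std::size_t pos, std::size_t len) {
    return line.substr(pos, len);   // allocates and copies len chars
}
```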

This problem is bad enough that the notion of shared_ptr<> has appeared. shared_ptr<> implements reference counting to eliminate the need for extra copies. But it has its own overhead problems:

1) for every allocation, shared_ptr<> allocates an *extra* block of memory to hold its accounting information. So you've got, out of the box, twice as many allocations/deallocations happening.

2) for every copy of the shared_ptr<>, like passing one to a function, returning one, etc., the reference count has to be incremented, then decremented. The inc/dec must be done in an exception safe manner, i.e. they need to be unwindable. Even worse, for multithreaded apps these inc/dec's need to be synchronized.

And even with all that falderol, shared_ptr<> still can't do slicing, which is an important efficiency improvement.
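Point 1 can be shown directly (hypothetical Widget; note that std::make_shared was added in later C++ precisely to fold the two allocations into one):

```cpp
#include <cassert>
#include <memory>

struct Widget { int id; };

// Constructing shared_ptr from a raw pointer performs a *second* heap
// allocation for the reference-count block -- the "extra block" above.
std::shared_ptr<Widget> two_allocations() {
    return std::shared_ptr<Widget>(new Widget{7});
}

// make_shared (added in later C++) coalesces the object and the count
// block into one allocation, directly addressing this overhead.
std::shared_ptr<Widget> one_allocation() {
    return std::make_shared<Widget>(Widget{7});
}
```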
April 25, 2006
"Walter Bright" <newshound@digitalmars.com> wrote in message news:e2k12i$19bi$1@digitaldaemon.com...
> Craig Black wrote:
>> "BCS" <BCS_member@pathlink.com> wrote in message news:e2isuc$2jmd$1@digitaldaemon.com...
>>> In article <e2inmo$2bjp$1@digitaldaemon.com>, Craig Black says...
>>>> "Dan" <Dan_member@pathlink.com> wrote in message news:e2im9o$29gi$1@digitaldaemon.com...
>>>>> Ultimately though, I think the GC causes the highest level of
>>>>> performance
>>>>> troubles on the benchmarks.  I love it to death though and wish
>>>>> someone
>>>>> would
>>>>> get their hands dirty making it smaller and sexier.
>>>> Or (truly) optional. :)  The notion that the GC is optional in D is
>>>> really
>>>> misleading.  Everything uses it.  While you could certainly make it
>>>> faster
>>>> at the expense of more complexity, I don't think that GC that scans the
>>>> stack testing for non-null pointers will ever provide the performance
>>>> of C's
>>>> malloc and free.  GC is simply performing much more work.  I have heard
>>>> many
>>>> arguments to the contrary, but I still don't see how it is possible
>>>> that GC
>>>> could compete with malloc and free.
>>>>
>>>
>>> A bad GC will definitely make for bad performance. And any GC will be
>>> slower
>>> than an optimum malloc/free solution. However (and I'm still undecided
>>> on this
>>> one) I think the argument is that in any non trivial project, manually
>>> freeing
>>> all of the memory without any memory leaks or axing something too soon
>>> would
>>> require more overhead than using GC. The trade-off is not in the freeing
>>> time
>>> but in the can-I-free-this check.
>>
>> I have heard similar.  "The code required to manually free everything outweighs the overhead of automatic GC."  Unless you are doing something ridiculous in your code when freeing heap objects, this statement is false. Manual freeing does not require gobs of code or eat up lots of processor cycles.  Typically it involves testing a pointer to see if it is null, calling free() if it is not, and then setting it to null.  How could this ever be slower than scanning the entire stack for non-null pointers?  GC is simply doing way more work.  IMO this statement is simply absurd.
>
> There's more to it than that. One of the big benefits of GC is that fewer allocations are necessary - and a sure way to better allocation performance is to do less of them. The reason fewer allocations are necessary is that the normal behavior in C/C++, when faced with two pointers to the same object, is to copy the object. That way, there's no grief with worrying about freeing the object that someone else points to.
>
> To see this effect at work, see www.digitalmars.com/d/cppstrings.html.
>
> This problem is bad enough that the notion of shared_ptr<> has appeared. shared_ptr<> implements reference counting to eliminate the need for extra copies. But it has its own overhead problems:
>
> 1) for every allocation, shared_ptr<> allocates an *extra* block of memory to hold its accounting information. So you've got, out of the box, twice as many allocations/deallocations happening.
>
> 2) for every copy of the shared_ptr<>, like passing one to a function, returning one, etc., the reference count has to be incremented, then decremented. The inc/dec must be done in an exception safe manner, i.e. they need to be unwindable. Even worse, for multithreaded apps these inc/dec's need to be synchronized.
>
> And even with all that falderol, shared_ptr<> still can't do slicing, which is an important efficiency improvement.

Hmmm.  I have glanced at the word count example many times but was not aware of why it was faster in D until now.  Is it really the GC paradigm that allows this performance?  I think I'm beginning to finally see some method to the madness.  Perhaps I need to rethink my stance on GC a little.

It would seem then that GC can come out on top if there is a lot of sharing of allocated memory e.g. multiple references to each object.  However, if you are certain that there will be only a single reference to each object, manual memory management should always outperform GC.  Am I on the right track here?

-Craig


April 25, 2006
Craig Black wrote:
> Hmmm.  I have glanced at the word count example many times but was not aware of why it was faster in D until now.  Is it really the GC paradigm that allows this performance?

Yes. It turns out to be very difficult for C++ to match it.

> I think I'm beginning to finally see some method to the madness.  Perhaps I need to rethink my stance on GC a little.
> 
> It would seem then that GC can come out on top if there is a lot of sharing of allocated memory e.g. multiple references to each object.  However, if you are certain that there will be only a single reference to each object, manual memory management should always outperform GC.  Am I on the right track here?

Even in that case, gc can win. For example, for explicit memory allocation in the presence of exception handling, one has to carefully set up handlers that will delete any temporaries if an exception is thrown:

	S *p = new S();
	p->a = foo();
	bar(p);
	delete p;

What if foo() throws an exception? We've got a memory leak in p, so we've got to add code to fix that. Dealing with that is an endless source of bugs and tedium in C++. With gc, nothing needs to be added.
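The code that has to be added in C++ is typically an owning wrapper rather than explicit handlers (sketched here with std::unique_ptr from later C++; in 2006 one would hand-roll the same RAII idiom or use std::auto_ptr):

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct S { int a; S() : a(0) {} };

int foo() { throw std::runtime_error("boom"); }   // always fails here
void bar(S*) {}

bool no_leak_on_throw() {
    try {
        std::unique_ptr<S> p(new S());   // wrapper owns the allocation
        p->a = foo();                    // throws...
        bar(p.get());
    } catch (const std::exception&) {
        return true;   // ...and ~unique_ptr freed p during unwinding
    }
    return false;
}
```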

Another advantage of gc (that isn't implemented, but could be, in D's gc) is that allocated memory can be compacted. Compacting existing allocated memory means that allocating a new object is as easy as bumping a pointer, whereas the usual malloc algorithm requires searching a free list for a best match.
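The bump-pointer idea looks like this in miniature (a toy arena for illustration, not a real compacting collector):

```cpp
#include <cassert>
#include <cstddef>

// After compaction, all free space sits above one watermark, so
// allocation is a bounds check plus a pointer increment -- no
// free-list search as in a typical malloc.
struct Arena {
    unsigned char buf[1024];
    std::size_t top;
    Arena() : top(0) {}

    void* alloc(std::size_t n) {
        if (top + n > sizeof(buf)) return 0;   // arena exhausted
        void* p = buf + top;                   // next free byte
        top += n;                              // bump the pointer
        return p;
    }
};
```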

So, the obvious question then becomes, if gc is faster, why is Java slower than C++? (I know, lots of Java people will claim it really isn't slower <g>.) The reason is that Java's design requires a lot more objects to be allocated than in C++ - for example, you cannot aggregate arrays or other objects inside other objects, they have to be allocated separately. More objects allocated means lower performance.
April 25, 2006
Craig Black wrote:
>>The nice thing about D, as a language, is that it supports both approaches. The bad thing, for your needs, is that phobos is tightly bound to the GC. My particular gripes are with the GC activity caused via utf conversions, std.string and so on. There's a surprising, perhaps troublesome, amount of GC activity generated from within phobos itself. This is one of the reasons alternative libraries exist.
> 
> 
> Correct.  But since the language core relies on Phobos I don't think that there is a reasonable way to completely decouple D from GC.  To me this situation is quite ugly.  It would be nice if there was a way to replace Phobos completely.


That's really quite easy, Craig; start with Ares, and then apply whatever you need on top. For example, Ares and Mango go together really well, and both make a point about not allocating memory where there's a sensible alternative. In fact, the Mango HttpServer does not touch the GC even once per service request, once it warms up -- all hail to the groovy array-slice, and behold the power of the trout-slap :)

(kudos to IRC #d)

There again, I suspect fear of the GC is somewhat overplayed; although it certainly troubles me when it is abused. Thus, it doesn't bother me at all that Ares kinda' needs a GC also. Perhaps it's more a question of libraries, and code in general, treating it with a little respect :)

After all, using any kind of malloc() should perhaps be a question-mark for high-performance code.

- Kris