April 25, 2006
In article <e2k12i$19bi$1@digitaldaemon.com>, Walter Bright says...
>
>1) for every allocation, shared_ptr<> allocates an *extra* block of memory to hold its accounting information. So you've got, out of the box, twice as many allocations/deallocations happening.

Not true for intrusive refcounting. (This isn't always feasible if you're dealing with third-party classes, which is why shared_ptr isn't intrusive, but if you _can_ use it it's very close to ideal.)

>2) for every copy of the shared_ptr<>, like passing one to a function, returning one, etc., the reference count has to be incremented, then decremented.

Again, not true for intrusive refcounting. You can assign an intrusive smartpointer from a raw pointer or reference, so most of the time you can use those to pass/return. Refcount twiddling is only needed when changing an "owning" pointer, plus the rare "safety" case where you have a raw pointer but are calling code which might drop the last "real" owning ref.

>And even with all that falderol, shared_ptr<> still can't do slicing, which is an important efficiency improvement.

Agreed, but couldn't you do something very similar with a simple ptr class containing a pointer to the (refcounted) base string plus a couple of range end indices? That's worked well for me in the past, but there may be benefits of D slices that I'm missing.

[And in a subsequent posting]

> Another advantage of gc (that isn't implemented, but could be, in D's gc) is that allocated memory can be compacted.

Has there ever been a language that successfully combined compacting GC with raw pointers? How would this work? And it's going to be tough to drop raw pointers without breaking ease-of-use for C libraries.

cheers
Mike


April 25, 2006
"Walter Bright" <newshound@digitalmars.com> wrote in message news:e2kffd$1rl5$1@digitaldaemon.com...
> Craig Black wrote:
>> Hmmm.  I have glanced at the word count example many times but was not aware of why it was faster in D until now.  Is it really the GC paradigm that allows this performance?
>
> Yes. It turns out to be very difficult for C++ to match it.
>
>> I think I'm beginning to finally see some method to the madness.  Perhaps I need to rethink my stance on GC a little.
>>
>> It would seem then that GC can come out on top if there is a lot of sharing of allocated memory e.g. multiple references to each object. However, if you are certain that there will be only a single reference to each object, manual memory management should always outperform GC.  Am I on the right track here?
>
> Even in that case, gc can win. For example, for explicit memory allocation in the presence of exception handling, one has to carefully set up handlers that will delete any temporaries if an exception is thrown:
>
> S *p = new S();
> p->a = foo();
> bar(p);
> delete p;
>
> What if foo() throws an exception? We've got a memory leak in p, so we've got to add code to fix that. Dealing with that is an endless source of bugs and tedium in C++. With gc, nothing needs to be added.

I don't see a performance issue here.  Nor do I think this would be terribly difficult to solve in D, especially with the fancy-shmancy new scope() features, as long as scope(exit) works properly with exceptions.  This would, however, pose a problem for C++.

> Another advantage of gc (that isn't implemented, but could be, in D's gc) is that allocated memory can be compacted. Compacting existing allocated memory means that allocating a new object is as easy as bumping a pointer, whereas the usual malloc algorithm requires searching a free list for a best match.

I can see the obvious performance improvement here, but I am skeptical of whether it can actually be done with D, given its liberal syntax with raw pointers.  Seems to me that if the GC started moving allocated memory around, the compiler would have to enforce more safety rules with regard to pointers.

> So, the obvious question then becomes, if gc is faster, why is Java slower than C++? (I know, lots of Java people will claim it really isn't slower <g>.) The reason is that Java's design requires a lot more objects to be allocated than in C++ - for example, you cannot aggregate arrays or other objects inside other objects, they have to be allocated separately. More objects allocated means lower performance.

Well, I think you can do this with C# since it supports stack objects.  Even so, C# is typically slower than Java (not by much, though).  I confess that I don't know all the technical details of VMs, but I have run my own benchmarks.  VM-based code is just way slower than native code.  I can't really tell you exactly why, because I don't really know or care to know what goes on inside a VM.  (Except for the purpose of explaining to people why it is so much slower.)

-Craig


April 25, 2006
> Another advantage of gc (that isn't implemented, but could be, in D's gc) is that allocated memory can be compacted. Compacting existing allocated memory means that allocating a new object is as easy as bumping a pointer, whereas the usual malloc algorithm requires searching a free list for a best match.

It is also worth mentioning that compacting can be performed without scanning the stack.  Manual memory management can still be used along with a compacting allocator.  A compacting allocator would certainly outperform CRT malloc and free.  There is a tradeoff here, because you can no longer use pedal-to-the-metal algorithms that make use of raw pointers and pointer arithmetic; or if you do, you would have to turn the allocator off temporarily to ensure safety.

-Craig


April 25, 2006
"Mike Capp" <mike.capp@gmail.com> wrote in message news:e2l4g1$2n5s$1@digitaldaemon.com...
> In article <e2k12i$19bi$1@digitaldaemon.com>, Walter Bright says...
>>
>>1) for every allocation, shared_ptr<> allocates an *extra* block of memory to hold its accounting information. So you've got, out of the box, twice as many allocations/deallocations happening.
>
> Not true for intrusive refcounting. (This isn't always feasible if you're
> dealing with third-party classes, which is why shared_ptr isn't intrusive,
> but if you _can_ use it it's very close to ideal.)

What's intrusive refcounting?  Please use small words. My brain is almost full right now. :)

>>2) for every copy of the shared_ptr<>, like passing one to a function, returning one, etc., the reference count has to be incremented, then decremented.
>
> Again, not true for intrusive refcounting. You can assign an intrusive
> smartpointer from a raw pointer or reference, so most of the time you can
> use those to pass/return. Refcount twiddling is only needed when changing
> an "owning" pointer, plus the rare "safety" case where you have a raw
> pointer but are calling code which might drop the last "real" owning ref.
>
>>And even with all that falderol, shared_ptr<> still can't do slicing, which is an important efficiency improvement.
>
> Agreed, but couldn't you do something very similar with a simple ptr class
> containing a pointer to the (refcounted) base string plus a couple of
> range end indices? That's worked well for me in the past, but there may be
> benefits of D slices that I'm missing.

If you think this is doable, then you should write a C++ version of wordcount based on this principle.  If you can get comparable performance to the D version, then you may be on to something.

-Craig


April 25, 2006
kris wrote:
> 
> There again, I suspect fear of the GC is somewhat overplayed; although it certainly troubles me when it is abused. Thus, it doesn't bother me at all that Ares kinda' needs a GC also. Perhaps it's more a question of libraries, and code in general, treating it with a little respect :)

At the very least, a GC is required if you want to do much of anything useful with dynamic arrays.  For apps with a huge footprint, I may follow this route and define a custom allocator for all object types.

> After all, using any kind of malloc() should perhaps be a question-mark for high-performance code

True enough.


Sean
April 25, 2006
Craig Black wrote:
> 
> Hmmm.  I have glanced at the word count example many times but was not aware of why it was faster in D until now.  Is it really the GC paradigm that allows this performance?  I think I'm beginning to finally see some method to the madness.  Perhaps I need to rethink my stance on GC a little.

For what it's worth, the performance improvement may be attributed to two factors: the C++ map class is a balanced binary tree while D uses a hash table, and C++ indeed copies strings around while D passes references.

> It would seem then that GC can come out on top if there is a lot of sharing of allocated memory e.g. multiple references to each object.  However, if you are certain that there will be only a single reference to each object, manual memory management should always outperform GC.  Am I on the right track here?

D allows manual memory management as well, so you aren't forced to use the GC if you don't want it in most cases.  However, the lack of copy ctors does essentially eliminate the possibility of using smart pointers.  One downside of a GC is that it needs to store bookkeeping data, which may amount to as much as 50% memory overhead.  This can be problematic for very large applications.


Sean
April 25, 2006
Craig Black wrote:
> 
> Well, I think you can do this with C# since it supports stack objects. Even still, C# is typically slower than Java (not by much though).  I confess that I don't know all the technical details with VM's, but I have run my own benchmarks.  VM-based code is just way slower than native code.  Can't really tell you exactly why though because I don't really know or care to know what goes on inside a VM. (Except for the purpose of explaining to people why it is so much slower.)

One reason is that Java compilers perform basically no compile-time optimization, and the JIT does very little as well.  C/C++, on the other hand, can see huge performance improvements with optimization turned on.  C# does a bit better in this regard, but not tremendously so. However, some VM languages (such as the Lisp variants) seem to perform quite well compared to natively compiled code, though I don't have the experience to say why.


Sean
April 25, 2006
Craig Black wrote:
> "Mike Capp" <mike.capp@gmail.com> wrote in message news:e2l4g1$2n5s$1@digitaldaemon.com...
>> In article <e2k12i$19bi$1@digitaldaemon.com>, Walter Bright says...
>>> 1) for every allocation, shared_ptr<> allocates an *extra* block of
>>> memory to hold its accounting information. So you've got, out of the
>>> box, twice as many allocations/deallocations happening.
>> Not true for intrusive refcounting. (This isn't always feasible if you're
>> dealing with third-party classes, which is why shared_ptr isn't intrusive, but
>> if you _can_ use it it's very close to ideal.)
> 
> What's intrusive refcounting?  Please use small words. My brain is almost full right now. :)

Maintaining the reference count within the class being pointed to instead of within the shared pointer code.

>>> 2) for every copy of the shared_ptr<>, like passing one to a function,
>>> returning one, etc., the reference count has to be incremented, then
>>> decremented.
>> Again, not true for intrusive refcounting. You can assign an intrusive
>> smartpointer from a raw pointer or reference, so most of the time you can use
>> those to pass/return. Refcount twiddling is only needed when changing an
>> "owning" pointer, plus the rare "safety" case where you have a raw pointer but
>> are calling code which might drop the last "real" owning ref.
>>
>>> And even with all that falderol, shared_ptr<> still can't do slicing,
>>> which is an important efficiency improvement.
>> Agreed, but couldn't you do something very similar with a simple ptr class
>> containing a pointer to the (refcounted) base string plus a couple of range end
>> indices? That's worked well for me in the past, but there may be benefits of D
>> slices that I'm missing.
> 
> If you think this is doable, then you should write a C++ version of wordcount based on this principle.  If you can get comparable performance to the D version, then you may be on to something.

I'd meant to do this and never got around to it.  However, the wordcount example is meant to compare performance using typical programming techniques for each language, and writing your own specialized containers isn't typical :-)


Sean
April 25, 2006
"Sean Kelly" <sean@f4.ca> wrote in message news:e2lgsh$5tf$1@digitaldaemon.com...
> Craig Black wrote:
>> "Mike Capp" <mike.capp@gmail.com> wrote in message news:e2l4g1$2n5s$1@digitaldaemon.com...
>>> In article <e2k12i$19bi$1@digitaldaemon.com>, Walter Bright says...
>>>> 1) for every allocation, shared_ptr<> allocates an *extra* block of memory to hold its accounting information. So you've got, out of the box, twice as many allocations/deallocations happening.
>>> Not true for intrusive refcounting. (This isn't always feasible if you're
>>> dealing with third-party classes, which is why shared_ptr isn't intrusive,
>>> but if you _can_ use it it's very close to ideal.)
>>
>> What's intrusive refcounting?  Please use small words. My brain is almost full right now. :)
>
> Maintaining the reference count within the class being pointed to instead of within the shared pointer code.
>
>>>> 2) for every copy of the shared_ptr<>, like passing one to a function, returning one, etc., the reference count has to be incremented, then decremented.
>>> Again, not true for intrusive refcounting. You can assign an intrusive
>>> smartpointer from a raw pointer or reference, so most of the time you can
>>> use those to pass/return. Refcount twiddling is only needed when changing
>>> an "owning" pointer, plus the rare "safety" case where you have a raw
>>> pointer but are calling code which might drop the last "real" owning ref.
>>>
>>>> And even with all that falderol, shared_ptr<> still can't do slicing, which is an important efficiency improvement.
>>> Agreed, but couldn't you do something very similar with a simple ptr
>>> class containing a pointer to the (refcounted) base string plus a couple
>>> of range end indices? That's worked well for me in the past, but there
>>> may be benefits of D slices that I'm missing.
>>
>> If you think this is doable, then you should write a C++ version of wordcount based on this principle.  If you can get comparable performance to the D version, then you may be on to something.
>
> I'd meant to do this and never got around to it.  However, the wordcount example is meant to compare performance using typical programming techniques for each language, and writing your own specialized containers isn't typical :-)

That's not the point.  It demonstrates that the same performance can be achieved without GC and all the baggage that accompanies it.  So, performance-wise, it would be like having your cake and eating it too.  If this were possible, perhaps D could be streamlined to take advantage of it somehow.

-Craig


April 25, 2006
In article <e2lhhq$6qv$1@digitaldaemon.com>, Craig Black says...
>
>
>"Sean Kelly" <sean@f4.ca> wrote in message news:e2lgsh$5tf$1@digitaldaemon.com...
>> Craig Black wrote:
>>>
>>> If you think this is doable, then you should write a C++ version of wordcount based on this principle.  If you can get comparable performance to the D version, then you may be on to something.
>>
>> I'd meant to do this and never got around to it.  However, the wordcount example is meant to compare performance using typical programming techniques for each language, and writing your own specialized containers isn't typical :-)
>
>That's not the point.  It demonstrates that the same performance can be achieved without GC and all the baggage that accompanies it.  So, performance-wise, it would be like having your cake and eating it too.  If this were possible, perhaps D could be streamlined to take advantage of it somehow.

I merely meant that that's not the point of the wordcount example.  However, I'll try to find the time to recode it using slices and hashtables in the next few days.  IIRC that brings performance in line with the D version, so I suppose that means you can "have your cake and eat it too."  It's worth mentioning, however, that slicing with explicit memory management doesn't have quite the utility it does with garbage collection.  It works great for apps where the entire buffer can be loaded en masse, as with the wordcount example, but in other cases it's a bit more complicated to manage the lifetime of the referenced data.  Thus the GC version is still a clear win in terms of ease of use, whether or not its performance can be matched by specialized code elsewhere.


Sean