May 12, 2004
Walter wrote:

> I do the same sorts of optimizations in C++ when I want performance. Storage
> allocation is always a ripe target for performance tuning.

One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et. al). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time.

I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting.


-scooter
May 12, 2004
> Even better for nearly sorted data is bidirectional short-circuit bubble sort.

Hmm, I couldn't find anything on this do you have some more info ?

Charlie

On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998@yahoo.com> wrote:

> J Anderson wrote:
>
> <snip>
>> Right!  Even sort algorithms like short-circuit bubble sort can be useful if they are used in right application.  Short-circuit bubble sort is most useful when you know that most-of-the the array is sorted because then you get o(n). With standard quicksort you'd get O(n^2) for a sorted array.
>
> Even better for nearly sorted data is bidirectional short-circuit bubble sort.
>
> Stewart.
>



-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
May 12, 2004
In article <c7tlva$22e8$1@digitaldaemon.com>, -scooter- says...
>
>Walter wrote:
>
>> I do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.
>
>One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et. al). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf
>
>The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time.
>
>I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting.
>
>
>-scooter

In a GC program, you can avoid GC overhead by keeping free lists of anything you allocate.  The garbage collector won't run because you aren't calling new(). This was a tip for "real time" programming.  Note that you are avoiding benefit #1 of a garbage collector: not having to manage delete[] yourself.  You also run the risk of old pointers to supposedly "new" objects, if it wasn't ready to be deleted.  On the plus side, if you are not sure its garbage, dont put it on the list: the GC will eventually hunt it down if you are, in fact, leaking.

Kevin






May 12, 2004
"christopher diggins" <cdiggins@users.sourceforge.net> wrote in message news:c7t5qi$19oe$1@digitaldaemon.com...
> In retrospect, I am starting to be of the opinion that the example may
fact
> rigged as you mentioned. There is no way that I can justify a design that uses ICompare as a class with virtual functions.

I'd also be careful about the GetMagnitude() function. The time spent in the two multiplies may swamp other effects. I've seen many benchmark programs that purport to measure X, when they actually are measuring Y that the benchmark writer thought was innocuous. Compiler vendors have been known to exploit these naive benchmarks by optimizing the heck out of Y (which may be much easier than optimizing X), and letting the benchmark reporter misinterpret it into saying that X was optimized.

I've even seen crafty vendors create benchmarks specifically to mislead people into thinking X was being optimized when it was actually Y. (This is why I tend to not bother creating benchmarks for any but my own use, I like to use benchmarks written by people not affiliated with DM to avoid even the appearance of impropriety.)

In your benchmark, I'd check the multiplies, I'd also check to see if maybe one compiler does a better job of function inlining or register allocation before concluding that the speed difference is due to virtual dispatch. Running a disassembler over the generated code and being familiar with performance assembler coding is essential. The results can be quite surprising.


May 12, 2004
Here's something interesting for everyone.  I took the heapsort.d example, redid
it to be a bit more D-
like (used a true array rather than an array class, used opCmp instead of a
Compare method, etc.)

Anyways, I decided to test my version against Walter's QuickSort.  Here are the
results (the QuickSort
and HeapSort files are identical except one calls HeapSort(a) and the other
calls a.sort):

DMD QuickSort - 93 msecs
DMD HeapSort - 62 msecs

GDC QuickSort - 76 msecs
GDC HeapSort - 62 msecs

The times are more or less constant across runs.  Only the GDC HeapSort varies,
and it only be an msec
or two.

This brings to light two things: First, that the HeapSort is potentially more
viable than the QuickSort.
And secondly, that the GDC backend is a good bit more powerful than the DMD
backend.  Also, I'm
pretty sure that GDC's libphobos has debugging enabled, so it might be able to
go faster!

Comment? Thoughts?

Owen

P.S. I've attached the code for this test, in case I've made a silly mistake that is favoring the HeapSort.

In article <c7sncs$k3e$1@digitaldaemon.com>, Walter says...
>I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)


May 12, 2004
Oops.  Forgot the attachment.  Here it is.

In article <c7sncs$k3e$1@digitaldaemon.com>, Walter says...
>I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)


May 12, 2004
Walter wrote:

> "christopher diggins" <cdiggins@users.sourceforge.net> wrote in message
> news:c7r63p$1bv1$1@digitaldaemon.com...
> 
>>Thanks for the feedback, Walter. I had only used -O but not -inline
>>and -release which sped up the execution to 703 msec and the memory usage
>>dropped to 2MB on my system. I have now updated the web page as a result.
> 
> 
> Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
> www.digitalmars.com/d/ ? You'll also get faster results if you use 'final'
> on all member functions that are not overrided (essentially all the
> functions you don't mark as 'virtual' in the C++ version).
> 

I think I've read in the docs that the D compiler autodetects if a function should be virtual or not. Why is there such a speedgain by explicitely marking methods as final, when the compiler can do this implicitely?

Regards,
Jörg
May 12, 2004
On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:

> 
> You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.
> 

Why are your not using introspective sort ?? http://www.nist.gov/dads/HTML/introspectiveSort.html
May 13, 2004
resistor AT nospam DOT mac DOT com wrote:

>Here's something interesting for everyone.  I took the heapsort.d example, redid
>it to be a bit more D-
>like (used a true array rather than an array class, used opCmp instead of a
>Compare method, etc.)
>
>Anyways, I decided to test my version against Walter's QuickSort.  Here are the
>results (the QuickSort and HeapSort files are identical except one calls HeapSort(a) and the other
>calls a.sort):
>
>DMD QuickSort - 93 msecs
>DMD HeapSort - 62 msecs
>
>GDC QuickSort - 76 msecs
>GDC HeapSort - 62 msecs
>
>The times are more or less constant across runs.  Only the GDC HeapSort varies,
>and it only be an msec or two.
>
>This brings to light two things: First, that the HeapSort is potentially more
>viable than the QuickSort.  And secondly, that the GDC backend is a good bit more powerful than the DMD
>backend.  Also, I'm pretty sure that GDC's libphobos has debugging enabled, so it might be able to
>go faster!
>
>Comment? Thoughts?
>
>Owen
>
>P.S. I've attached the code for this test, in case I've made a silly mistake
>that is favoring the HeapSort.
>
>In article <c7sncs$k3e$1@digitaldaemon.com>, Walter says...
>  
>
>>I'd have to do testing to make sure, but I'm pretty sure it's due to
>>quicksort being algorithmically superior. One of the reasons I built sort
>>into arrays is because people implement inefficient sorts over and over, and
>>I wanted to make it real easy to use the best algorithm. (A job I had once
>>was to optimize a large program written by others; in poking around in it I
>>found 3 different hacky implementations of bubblesort. Replaced them all
>>with a call to C's qsort(), and I was an instant hero for the speedup <g>.)
>>    
>>
Interesting.

It would also be interesting to see how something like radix sort compares.

-- 
-Anderson: http://badmama.com.au/~anderson/
May 13, 2004
C wrote:

>> Even better for nearly sorted data is bidirectional short-circuit bubble sort.
>
>
> Hmm, I couldn't find anything on this do you have some more info ?
>
> Charlie
>
> On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998@yahoo.com> wrote:
>
>> J Anderson wrote:
>>
>> <snip>
>>
>>> Right!  Even sort algorithms like short-circuit bubble sort can be useful if they are used in right application.  Short-circuit bubble sort is most useful when you know that most-of-the the array is sorted because then you get o(n). With standard quicksort you'd get O(n^2) for a sorted array.
>>
>>
>> Even better for nearly sorted data is bidirectional short-circuit bubble sort.
>>
>> Stewart.
>>
>
>
>
In a bubble sort small elements are *bubbled* up to the top of the array.  In a bidirectional sort elements sink down to the bottom as well.  Of course short-circuit bubble sort still wins in some circumstances.

Here's an interesting page on sorting (includes bidirectional):
http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html

BTW: In uni I had to do a project with 18 completely different sort algorithms.  So I really got to know them well.

-- 
-Anderson: http://badmama.com.au/~anderson/