May 13, 2004
"Knud Sørensen" <knud@NetRunner.all-technology.com> wrote in message news:pan.2004.05.12.22.37.18.149835@NetRunner.all-technology.com...
> On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:
>
> >
> > You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.
> >
>
> Why are you not using introspective sort? http://www.nist.gov/dads/HTML/introspectiveSort.html

Actually, it does just that (switch to heapsort as necessary).


May 13, 2004
"Jörg Rüppel" <joerg@sharky-x.de> wrote in message news:c7u7ru$4dt$1@digitaldaemon.com...
> Walter wrote:
>
> > "christopher diggins" <cdiggins@users.sourceforge.net> wrote in message news:c7r63p$1bv1$1@digitaldaemon.com...
> >
> >>Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory
usage
> >>dropped to 2MB on my system. I have now updated the web page as a
result.
> >
> >
> > Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overridden (essentially all the functions you don't mark as 'virtual' in the C++ version).
> >
>
> I think I've read in the docs that the D compiler autodetects if a function should be virtual or not. Why is there such a speed gain by explicitly marking methods as final, when the compiler can do this implicitly?

Because the compiler is not sophisticated enough to do it automatically yet.


May 13, 2004
In article <c7tbbi$1i8q$1@digitaldaemon.com>, Stewart Gordon says...
>
>Jan-Eric Duden wrote:
>
>>>Can you give an example of an O(n) case of quicksort?
>> 
>> A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occurred.
>
>By 'swaps' do you mean the swapping of an element with the hole into which the key element is to go?
>
>How can one element being already in the right place imply that the whole array is sorted?
>
><snip>
>> Yep, that's why sometimes heapsort is preferred over quicksort.
>> 
>> My question was more related to the constants. I wasn't aware that heapsort is that much slower.
>
>What constants?

O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations required
is affected by the size of the input set.  This is called "big O notation".

We say a sort takes O(N^2) when the time required is proportional to the square
of the input set size (N).  This is called "order of N squared".

The real speed of the algorithm is something like K * N^2.  If one algorithm is twice as slow as another for every value of N, it is said to have K=2 while the other has K=1.  K is the constant.

All of this analysis ignores effects like cache locality, because they are difficult to summarize in any simple and general way.  Normally, O(...), or "order", is the most important criterion.  If O(...) is the same for both designs, then the secondary effects apply.

For large data sets, cache and locality issues are most important, whereas for small data sets, the "constant" is probably more important.  "Small" here probably means "fits, or almost fits, in memory".

Special data sets can trip up some algorithms; for example, a carelessly written qsort will take forever on already-sorted data.

Bubble sort is the classic "bad" sorting algorithm.  It still has its defenders for special cases.  Like insertion sort, it probably survives because it is easy to code for small hand-written linked lists in languages like C.

Often, large old C programs will have dozens or hundreds of data structures of the "small-structs-in-a-short-list" type; half of these will have a ten-line hand-coded insertion sort (shudder).  "Don't get out of the boat".

I always liked "shell sorts" and "natural merge sorts" but no one uses those any more.  Probably because they're really slow... oh well, to each his own I guess.

Kevin



May 13, 2004
"-scooter-" <scottm@cs.ucla.edu> wrote in message news:c7tlva$22e8$1@digitaldaemon.com...
> Walter wrote:
>
> > I do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.
>
> One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et al.). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf
>
> The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time.
>
> I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting.

The article is right in that sometimes performance optimizations yield unexpected results. Just goes to show one should rely on a good performance profiler when doing optimization.


May 13, 2004
C wrote:

>> Even better for nearly sorted data is bidirectional short-circuit bubble sort.
> 
> 
> Hmm, I couldn't find anything on this. Do you have some more info?
<snip>

Several years ago I wrote a simple database application.  When adding a record, it would add it to the end of the file.  When deleting one it would fill the gap with the record from the end.  When updating it would just update in place.

The sort implementation was a bidirectional bubble sort.  Hence the records added since the last sort would bubble up to their correct positions, and the records moved to fill gaps would bubble down back towards the end.  Updated records (where the change rendered the record out of order) could bubble in either direction.

The algorithm was also quite suited to the program's rather simplistic dynamic record loading.  It would keep 50 consecutive records in memory at a time, moving the 'window' as necessary.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.
May 13, 2004
Kevin Bealer wrote:

<snip>
> O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations required
> is affected by the size of the input set.  This is called "big O notation".
> 
> We say a sort takes O(N^2) when the time required is proportional to the square
> of the input set size (N).  This is called "order of N squared".

You seldom get direct proportionality in practice.  In fact, it means that there exist constants K and N0 such that F(N) < K*N^2 for all N > N0.

And if you replace < with >, you get big Omega notation.  And if you consider both together, you get big Theta notation.

I guess that people commonly use big O when they really mean big Theta, probably because O is on most keyboards/character sets, and considering that anything Theta(N^2) is O(N^2) anyway.

> The real speed of the algorithm is something like K * N^2.  If one algorithm is twice as slow as another for every value of N, it is said to have K=2 while the other has K=1.  K is the constant.

Yes.  Though it would depend on such things as the relative costs of comparison and assignment/swapping.

<snip>
> Bubble sort is the classic "bad" sorting algorithm.  It still has its defenders,
> for special cases.  Like insertion sort, it probably survives because it is easy
> to code for small hand written linked lists in languages like C.

I guess for linked lists, insertion sort is as easy or perhaps easier to code than bubble sort.

Just thinking about it, on a linked list, insertion sort has the advantage that an insertion is O(1).  OTOH, on an array it has the advantage that a search for the right place is O(log N).  I suppose the constant for the whole alg is smaller for the linked list....

I guess selection sort (or shaker sort, which I hadn't heard of before but is a perfectly intuitive improvement I had thought up) is equally easy and thus defensible.  It even has its strengths, such as sorting tournament crosstables....

<snip>
> I always liked "shell sorts" and "natural merge sorts" but no one uses those any
> more.  Probably because they're really slow... oh well, to each his own I guess.

Strange - JA's source shows shell sort being marginally quicker than heap sort.

Stewart.

May 13, 2004
Stewart Gordon wrote:

>
> Yes.  Though it would depend on such things as the relative costs of comparison and assignment/swapping.

<snip>

One of the great things about Ada generics is that you could easily replace the comparison operator and test how many comparisons were actually performed.  Of course you could box all the variables you use in the sort algorithm, but that's just not the same.  I wish D templates could accept operators/functions as parameters.  At the very least it's a good way to profile that particular aspect of an algorithm.


-- 
-Anderson: http://badmama.com.au/~anderson/
May 13, 2004
Stewart Gordon wrote:

>
>> I always liked "shell sorts" and "natural merge sorts" but no one uses those any
>> more.  Probably because they're really slow... oh well, to each his own I guess.
>
>
> Strange - JA's source shows shell sort being marginally quicker than heap sort.
>
> Stewart.
>
If your talking about the website I provided, did you read that shell sort was slightly broken?


-- 
-Anderson: http://badmama.com.au/~anderson/
May 13, 2004
>>
>>I think I've read in the docs that the D compiler autodetects if a
>>function should be virtual or not. Why is there such a speed gain by
>>explicitly marking methods as final, when the compiler can do this
>>implicitly?
> 
> 
> Because the compiler is not sophisticated enough to do it automatically yet.
> 
> 
Thanks, that's all I wanted to hear ;)

Regards,
Jörg
May 13, 2004
J Anderson wrote:
<snip>
> If your talking

My talking?

> about the website I provided, did you read that shell sort was slightly broken?

No.  I still don't.  Where's that little bit of info buried then?

Stewart.
