October 09, 2010
On Fri, 08 Oct 2010 14:58:27 +0300, Steven Schveighoffer <schveiguy@yahoo.com> wrote:

> On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake@fakeemail.com> wrote:
>
>> On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>>> Except that when you're dealing with generic code which has to deal
>> with
>>> multiple container types (like std.algorithm), you _need_ certain
>> complexity
>>> guarantees about an operation since it could happen on any
>> container that it's
>>
>> Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
>
> That kind of "documentation" is useless, it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it.  When I call
>
> sort(x);
>
> and it performs horribly, am I going to blame x or sort?  Certainly, I'll never think it's my own fault :)
>
> -Steve

Sure, write some random strings and compile it, if it doesn't compile, you can always blame Walter, right?
If documentation is useless, so is most of the programmers, you got to accept it :)

Question is, should this affect compiler design? If you think it should, you can't write a single line that calls "some other guy"'s code, it doesn't matter if he use "in" or "out",
operators or just simple functions.

Thanks!

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
October 09, 2010
Seth Hoenig wrote:
> 2010/10/8 "Jérôme M. Berger" <jeberger@free.fr>
> 
>> Steven Schveighoffer wrote:
>>> On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>>>> I can't say I've ever cared about the big-O complexity of an operation.
>>> Then you don't understand how important it is.
>>         If big O complexity is so important, then why does everyone use
>> quicksort (which is O(n**2)) and not heap sort or merge sort (which
>> are O(n*log(n)))?
>>
>>                Jerome
> 
> Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.
> 
	I rest my case.

		Jerome
-- 
mailto:jeberger@free.fr
http://jeberger.free.fr
Jabber: jeberger@jabber.fr



October 09, 2010
On 08/10/2010 00:39, Tomek Sowiński wrote:
<snip>
> What's the halting problem?

Basically, it's the challenge of determining algorithmically whether an arbitrary algorithm given arbitrary input will eventually halt or carry on running forever.

The point is that the halting problem is known to be unsolvable.  The standard proof of this is as follows.  Suppose the halt analyser algorithm we seek exists.  Call it WillHalt(Algorithm, Input).  Then we can consider WillHalt(Algorithm, Algorithm).

Then we can define a new algorithm, LoopIfHaltsOnItself(Algorithm), defined as

if WillHalt(Algorithm, Algorithm) then
    loop forever
else
    return

Now try to analyse the outcome of LoopIfHaltsOnItself(LoopIfHaltsOnItself).


Personally, I think it's a shame that the halting problem can't be solved.  If it could, we could use it to solve many mathematical problems that have as it happens remained unsolved for centuries.

Stewart.
October 09, 2010
On 9/10/10 3:11 PM, Stewart Gordon wrote:
> Personally, I think it's a shame that the halting problem can't be
> solved. If it could, we could use it to solve many mathematical problems
> that have as it happens remained unsolved for centuries.

But solving those problems would mean nothing in that hypothetical situation, because, for the halting problem to be solvable, it would require that P <=> ~P, so any "theorem" would be meaningless.

Besides, I don't care to think about universes where P <=> ~P  :-)
October 09, 2010
On 10/9/10 9:05 CDT, "Jérôme M. Berger" wrote:
> Seth Hoenig wrote:
>> 2010/10/8 "Jérôme M. Berger"<jeberger@free.fr>
>>
>>> Steven Schveighoffer wrote:
>>>> On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke<rainerd@eldwood.com>
>>>> wrote:
>>>>> I can't say I've ever cared about the big-O complexity of an operation.
>>>> Then you don't understand how important it is.
>>>          If big O complexity is so important, then why does everyone use
>>> quicksort (which is O(n**2)) and not heap sort or merge sort (which
>>> are O(n*log(n)))?
>>>
>>>                 Jerome
>>
>> Because on average quicksort is faster than heap sort, and uses much less
>> space than merge sort. Also, trivial guards can be put in place to avoid
>> running quicksort in a worst case (pre-sorted data) scenario.
>>
> 	I rest my case.
>
> 		Jerome

In fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner.

http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity

http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ

Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).


Andrei
October 09, 2010
Andrei:

> Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default?

Bye,
bearophile
October 09, 2010
On 9/10/10 4:53 PM, bearophile wrote:
> Andrei:
>
>> Currently std.algorithm.getPivot picks the middle of the range as the
>> pivot, but I made it modular such that it can be improved in a number of
>> ways (i.e. randomized, median-of-3, median-of-5, etc).
>
> Isn't median of 3 a better default?
>
> Bye,
> bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot.

Middle - Fastest, not very good
Median-of-3 - Slower, but slightly better
Median-of-5 - Slower still, but even better

Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.
October 09, 2010
On 10/9/10 11:23 CDT, Peter Alexander wrote:
> On 9/10/10 4:53 PM, bearophile wrote:
>> Andrei:
>>
>>> Currently std.algorithm.getPivot picks the middle of the range as the
>>> pivot, but I made it modular such that it can be improved in a number of
>>> ways (i.e. randomized, median-of-3, median-of-5, etc).
>>
>> Isn't median of 3 a better default?
>>
>> Bye,
>> bearophile
>
> There is no "best". They are all trade-offs between the time taken to
> select the pivot and the suitability of the pivot.
>
> Middle - Fastest, not very good
> Median-of-3 - Slower, but slightly better
> Median-of-5 - Slower still, but even better
>
> Randomized - Speed depends on the random number algorithm. Suitability
> is random, so could be good, could be bad, but on average it's good
> enough. Has the added quality that it's practically impossible to devise
> worst-case input for it (you'd need to know the generator and seed),
> whereas for the other 3, a malicious user could provide data that gives
> you O(n^2) complexity.

Also: in-place sorting for the median cases.

Andrei
October 09, 2010
Andrei Alexandrescu Wrote:
> 
> In fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner.
> 
> http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity
> 
> http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ
> 
> Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).

It would be nice if it used insertion sort for small ranges as well.  There's also a quicksort variant that partitions equal-to elements separately from less-than and greater-than values, which is faster for ranges with a lot of equal elements (and no slower for ranges without).
October 09, 2010
Peter Alexander Wrote:

> On 9/10/10 4:53 PM, bearophile wrote:
> > Andrei:
> >
> >> Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).
> >
> > Isn't median of 3 a better default?
> >
> > Bye,
> > bearophile
> 
> There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot.
> 
> Middle - Fastest, not very good
> Median-of-3 - Slower, but slightly better
> Median-of-5 - Slower still, but even better
> 
> Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.

Introsort (a quicksort variant) degrades to heapsort for sorts that are going quadratic (it tracks the recursion depth and transitions when the depth is more than a desired threshold for the size of the range).  The sort routine I wrote for Tango uses this approach plus media-of-3, the insertion sort fallback, and the quicksort variant that separately tracks equal values.  All told, it's as fast or faster than everything else I've tested, even for contrived degenerate cases.  Perhaps I'll convert it to use ranges and see how it does.