April 05, 2006
I believe that I have replied to this already. My first suspisions was with
comparisons, but there is nothing wrong with it. Comparisons is updated prior
to call to the base case psearch, so why should it be updated again during the
base case, since in the base case, no recursive call to psearch is being made.
Doing so might actually result in 256 or so as you claim, but that would be
introducing a bug and not eliminating it. Sorry, these arguments have only
served to convince me even further of the correctness and workability
of this algorithm.

In article <e0uug0$2mq8$1@digitaldaemon.com>, kwilson@nowhere.com says...
>
>Hi there Sandresh,
>
>The "bug" within your program is simply that your base case (or terminating
>case) for the recursion is (size == 1) and you increment/decrement a variable
>(ie. depth)  outside of this base case "if" statement.
>
>This leads to a recursively maximized value of 9 (or log base 2 of 512), as pointed out. Then, since no variable is updated in the base case "if" statement, you don't actually add to the comparison total accordingly. Thus you mistakenly get a maximal depth of 9 for each search. This is not actually the depth to find the element you are searching for, but rather the depth to the "size == 1" base case.
>
>Now you will say that "this is not true" since the base case is reached many times during the search process before the search is terminated....not just once after depth equals nine. The static "depth" variable is not being updated properly, though.
>
>Once you try the changes suggested below you will see that the "Average depth" of the 2048 searches per distribution will be approximately 256. Single "depths" or comparisons will range from 1 to 512 as would be expected. The average per distribution is approximately N/2 as several others in this thread have pointed out, as the correct average case search for an element in a randomized array of size N. If you want to argue that the "depth" of the actual comparisons never exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but that doesn't mean that the number of comparisons is 9. Sorry ;)
>
>Hope the code change and viewing the results will clear things up if anyone can't follow my explanation above (my fault for not explaining things well enough, I'm afraid).
>
>
>Try these simple changes:
>
>in psearch.c
>
>on line 9:
>add ------>  ++comparisons;
>
>
>on line 18:
>comment out --------> //comparisons = ++depth;
>
>
>on line 31:
>comment out --------> //--depth;
>
>Thanks,
>K.Wilson
>
>
>
>
>


April 05, 2006
sundaresh wrote:
> I believe that I have replied to this already. My first suspisions was with
> comparisons, but there is nothing wrong with it. Comparisons is updated prior
> to call to the base case psearch, so why should it be updated again during the
> base case, since in the base case, no recursive call to psearch is being made.
> Doing so might actually result in 256 or so as you claim, but that would be
> introducing a bug and not eliminating it. Sorry, these arguments have only
> served to convince me even further of the correctness and workability
> of this algorithm.


Jolly good! Capital stuff, what!

Now then, my good man ~ you've qualified for tonight's free gift: a front-row ticket to the greatest show on the planet!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      The Cast (in order of appearance.)
      M= Man looking for an argument
      R= Receptionist
      Q= Abuser
      A= Arguer (John Cleese)
      C= Complainer (Eric Idle)
      H= Head Hitter


M:   Ah. I'd like to have an argument, please.
R:    Certainly sir. Have you been here before?
M:   No, I haven't, this is my first time.
R:     I see. Well, do you want to have just one argument, or were you thinking of taking a course?
M:   Well, what is the cost?
R:    Well, It's one pound for a five minute argument, but only eight pounds for a course of ten.
M:   Well, I think it would be best if I perhaps started off with just the one and then see how it goes.
R:     Fine. Well, I'll see who's free at the moment.
Pause
R:    Mr. DeBakey's free, but he's a little bit conciliatory.
Ahh yes, Try Mr. Barnard; room 12.
M:    Thank you.

(Walks down the hall. Opens door.)

Q:   WHAT DO YOU WANT?
M:   Well, I was told outside that...
Q:   Don't give me that, you snotty-faced heap of parrot droppings!
M:   What?
Q:   Shut your festering gob, you tit! Your type really makes me puke, you vacuous, coffee-nosed, maloderous, pervert!!!
M:   Look, I CAME HERE FOR AN ARGUMENT, I'm not going to just stand...!!
Q:   OH, oh I'm sorry, but this is abuse.
M:   Oh, I see, well, that explains it.
Q:   Ah yes, you want room 12A, Just along the corridor.
M:   Oh, Thank you very much. Sorry.
Q:   Not at all.
M:   Thank You.
(Under his breath) Stupid git!!

(Walk down the corridor)
M: (Knock)
A:   Come in.
M:   Ah, Is this the right room for an argument?
A:   I told you once.
M:   No you haven't.
A:   Yes I have.
M:   When?
A:    Just now.
M:   No you didn't.
A:   Yes I did.
M:  You didn't
A:   I did!
M:  You didn't!
A:   I'm telling you I did!
M:  You did not!!
A:   Oh, I'm sorry, just one moment. Is this a five minute argument or the full half hour?
M:  Oh, just the five minutes.
A:   Ah, thank you. Anyway, I did.
M:  You most certainly did not.
A:   Look, let's get this thing clear; I quite definitely told you.
M:  No you did not.
A:   Yes I did.
M:   No you didn't.
A:   Yes I did.
M:   No you didn't.
A:   Yes I did.
M:   No you didn't.
A:   Yes I did.
M:  You didn't.
A:   Did.
M:  Oh look, this isn't an argument.
A:   Yes it is.
M:   No it isn't. It's just contradiction.
A:   No it isn't.
M:  It is!
A:   It is not.
M:  Look, you just contradicted me.
A:   I did not.
M:  Oh you did!!
A:   No, no, no.
M:  You did just then.
A:   Nonsense!
M:  Oh, this is futile!
A:   No it isn't.
M:  I came here for a good argument.
A:   No you didn't; no, you came here for an argument.
M:  An argument isn't just contradiction.
A:   It can be.
M:  No it can't. An argument is a connected series of statements intended to establish a proposition.
A:   No it isn't.
M:  Yes it is! It's not just contradiction.
A:   Look, if I argue with you, I must take up a contrary position.
M:  Yes, but that's not just saying 'No it isn't.'
A:   Yes it is!
M:   No it isn't!

A:   Yes it is!
M:  Argument is an intellectual process. Contradiction is just the automatic gainsaying of any statement the other person makes.
(short pause)
A:  No it isn't.
M:  It is.
A:  Not at all.
M:  Now look.
A: (Rings bell)  Good Morning.
M:  What?
A:   That's it. Good morning.
M:   I was just getting interested.
A:   Sorry, the five minutes is up.
M:  That was never five minutes!
A:   I'm afraid it was.
M:  It wasn't.
Pause
A:   I'm sorry, but I'm not allowed to argue anymore.
M:  What?!
A:   If you want me to go on arguing, you'll have to pay for another five minutes.
M:  Yes, but that was never five minutes, just now. Oh come on!
A:  (Hums)
M:  Look, this is ridiculous.
A:   I'm sorry, but I'm not allowed to argue unless you've paid!
M:  Oh, all right.
(pays money)
A:   Thank you.
short pause
M:  Well?
A:   Well what?
M:   That wasn't really five minutes, just now.
A:    I told you, I'm not allowed to argue unless you've paid.
M:   I just paid!
A:   No you didn't.
M:   I DID!
A:   No you didn't.
M:  Look, I don't want to argue about that.
A:  Well, you didn't pay.
M:  Aha. If I didn't pay, why are you arguing? I Got you!
A:   No you haven't.
M:  Yes I have. If you're arguing, I must have paid.
A:   Not necessarily. I could be arguing in my spare time.
M:  Oh I've had enough of this.
A:   No you haven't.
M:  Oh Shut up.

(Walks down the stairs. Opens door.)

M:  I want to complain.
C:  You want to complain! Look at these shoes. I've only had them three weeks and the heels are worn right through.
M:  No, I want to complain about...
C:   If you complain nothing happens, you might as well not bother.
M:  Oh!
C:   Oh my back hurts, it's not a very fine day and I'm sick and tired of this office.


(Slams door. walks down corridor, opens next door.)

M:  Hello, I want to... Ooooh!
H:   No, no, no. Hold your head like this, then go Waaah. Try it again.
M:  uuuwwhh!!
H:   Better, Better, but Waah, Waah! Put your hand there.
M:  No.
H:   Now..
M:  Waaaaah!!!
H:   Good, Good! That's it.
M:  Stop hitting me!!
H:  What?
M:  Stop hitting me!!
H:   Stop hitting you?
M:  Yes!
H:   Why did you come in here then?
M:   I wanted to complain.
H:   Oh no, that's next door. It's being-hit-on-the-head lessons in here.
M:  What a stupid concept.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Sundaresh-dude;

If you want to argue the merits of your outstanding algorithm, you need to choose the right room to begin. This is the room for discussion upon the D language itself ~ not language-agnostic algorithms. Might I suggest you try the /slashdot/ room instead? Just down the hallway, and to the right a bit. Yes, that's right. You may get a good argument there
April 05, 2006
Sorry about that, you were right, for all but the base case, the value of comparisons is incremented in the else part. So here is the corrected version. But the search time is still O(lg(n)), since avg comparisons is just one more at 10,and I think the worst case comparisons is 11, so this still seems to be comparable to binary search.Here is the corrected version.

#include<stdlib.h>

int comparisons;

void *psearch(void *base, int size, int width, void *element, int
(*compare)(void *, void *)) {
static int depth = 0;

if(size <= 1) {
++comparisons;
return ! size || compare(base, element) ? NULL : base;
}
else {
int half, right;
void *found;

half = size / 2;
right = abs(rand()) % 2;
comparisons = ++depth;
if(right) {
found = psearch(base + half * width, size - half, width, element, compare);
if(! found) {
found = psearch(base, half, width, element, compare);
}
}
else {
found = psearch(base, half, width, element, compare);
if(! found) {
found = psearch(base + half * width, size - half, width, element, compare);
}
}
--depth;
return found;
}
}


In article <e0uug0$2mq8$1@digitaldaemon.com>, kwilson@nowhere.com says...
>
>Hi there Sandresh,
>
>The "bug" within your program is simply that your base case (or terminating
>case) for the recursion is (size == 1) and you increment/decrement a variable
>(ie. depth)  outside of this base case "if" statement.
>
>This leads to a recursively maximized value of 9 (or log base 2 of 512), as pointed out. Then, since no variable is updated in the base case "if" statement, you don't actually add to the comparison total accordingly. Thus you mistakenly get a maximal depth of 9 for each search. This is not actually the depth to find the element you are searching for, but rather the depth to the "size == 1" base case.
>
>Now you will say that "this is not true" since the base case is reached many times during the search process before the search is terminated....not just once after depth equals nine. The static "depth" variable is not being updated properly, though.
>
>Once you try the changes suggested below you will see that the "Average depth" of the 2048 searches per distribution will be approximately 256. Single "depths" or comparisons will range from 1 to 512 as would be expected. The average per distribution is approximately N/2 as several others in this thread have pointed out, as the correct average case search for an element in a randomized array of size N. If you want to argue that the "depth" of the actual comparisons never exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but that doesn't mean that the number of comparisons is 9. Sorry ;)
>
>Hope the code change and viewing the results will clear things up if anyone can't follow my explanation above (my fault for not explaining things well enough, I'm afraid).
>
>
>Try these simple changes:
>
>in psearch.c
>
>on line 9:
>add ------>  ++comparisons;
>
>
>on line 18:
>comment out --------> //comparisons = ++depth;
>
>
>on line 31:
>comment out --------> //--depth;
>
>Thanks,
>K.Wilson
>
>
>
>
>


April 05, 2006
"sundaresh" <sundaresh_member@pathlink.com> wrote in message news:e0vqej$12o5$1@digitaldaemon.com...
>I believe that I have replied to this already. My first suspisions was with
> comparisons, but there is nothing wrong with it. Comparisons is updated
> prior
> to call to the base case psearch, so why should it be updated again during
> the
> base case, since in the base case, no recursive call to psearch is being
> made.

That's just it. You don't count how many comparisons it takes to find a
match, just
how deep it has to go. And with an original size of 512, that always leads
to
log2(512).

Since you only count the depth of the search, not the amount of comparisons,
you'll
find it's almost always 9 - it may be shorter, depending on which item was
randomly picked to be searched for.

> Doing so might actually result in 256 or so as you claim, but that would
> be
> introducing a bug and not eliminating it.

Frankly, you are so wrong in this. Now you only count the depth, not how many items it searches. And it's the items that matter, not the depth. It's the amount of comparisons that tells what time the algorithm has - and in the case of this, it's most definitely O(N).

> Sorry, these arguments have only served to convince me even further of the correctness and workability of this algorithm.

*sigh* A fool, then.

Many have shown you where you've gone wrong, but you seem to be "I'm right and I know it, you'd just introduce a bug in it"... If you refuse to listen, go ahead.

Mathematically, it is possible to prove that no search of unordered data can be done in time less than O(N) - you always have to in the worst case test *all* occurences.

I'm attaching an altered version of psearch with this message, so you can
test it yourself. It's a zip archive with MS-DOS line endings (since that's
the system I happen to be on). I strongly recommend after unzipping and
compiling that you invoke it by directing standard output to a file - it's
going to display so much data on the amount of comparisons it took to find
something, as well as how it found it. Then you'll see it's not
O(log N) but O(N).



April 05, 2006
sundaresh wrote:
> This (attached) search function of an unordered array of elements appears to have a logarithmic search time, like binary search of an ordered array.
> 
> 

ok, I just wrote a D version of the algorithm, and used the CPU counter to benchmark pefromance.

before I post the code, here's how you can use it, assuming you compile it into dps.exe

 >dps > log.txt

(give it a few seconds to run all the test cases, shouldn't take too long).

you can examine the log file, and you will see clearly that the increase in time is linear related to the increase in array size.

or, you can use gnuplot to see it more clearly.

 >gnuplot
 >>set terminal png
 >>set output "graph.png"
 >>plot "log.txt" with lines

I've attached my results.

and here's the D code, according to what I understood of the algorithm. if something is wrong in my understanding, please point me to it.

---------
import std.stdio;
import std.random;

template psearch(T)
{
     bool psearch( T[] arr, T element )
     {
         if( arr.length == 1 )
         {
             if( arr[0] == element ) return true;
             else return false;
         }

         int choice = rand() % 2;

         T[] first, second;
         int middle = arr.length / 2;

         if( choice == 1 )
         {
             first = arr[0..middle];
             second = arr[middle..$];
         }
         else
         {
             first = arr[middle..$];
             second = arr[0..middle];
         }

         if( !psearch( first, element ) )
         {
             return psearch( second, element );
         }
         return false;
     }
}

alias psearch!(int) probSearch;

void main()
{
     CPUTimer timer = new CPUTimer();
     for( int i = 1000; i < 100000; i+=500 )
     {
         int[] arr = randomArray( i ); //random int array of size i
         timer.timein();  //start cpu timer
         probSearch( arr, 27 ); //look for 27
         timer.timeout(); //stop cpu timer
	//print: length   time
         writefln( "%d %d", i, timer.getSmallTime() );
     }
}

class CPUTimer
{
     long time;

     void timein()
     {
         time = rdtsc();
     }
     void timeout()
     {
         time = rdtsc() - time;
     }

     //a small version of the time ..
     int getSmallTime()
     {
         return (time/1000L);
     }
}

//x86 only
long rdtsc()
{
     asm
     {
         naked;
         rdtsc;
         ret;
     }
}

int[] randomArray( int size )
{
     int[] arr = new int[size];
     for( int i = 0; i < size; i++ )
     {
         arr[i] = rand() % (size/2); //random data, mod by size/2 to
give 27 a reasonably chance of appearing??!!
     }
     return arr;
}




April 05, 2006
My apologies, making these changes does result in a linear worst case behaviour, so it is no better than sequential search.

In article <e0uug0$2mq8$1@digitaldaemon.com>, kwilson@nowhere.com says...
>
>Hi there Sandresh,
>
>The "bug" within your program is simply that your base case (or terminating
>case) for the recursion is (size == 1) and you increment/decrement a variable
>(ie. depth)  outside of this base case "if" statement.
>
>This leads to a recursively maximized value of 9 (or log base 2 of 512), as pointed out. Then, since no variable is updated in the base case "if" statement, you don't actually add to the comparison total accordingly. Thus you mistakenly get a maximal depth of 9 for each search. This is not actually the depth to find the element you are searching for, but rather the depth to the "size == 1" base case.
>
>Now you will say that "this is not true" since the base case is reached many times during the search process before the search is terminated....not just once after depth equals nine. The static "depth" variable is not being updated properly, though.
>
>Once you try the changes suggested below you will see that the "Average depth" of the 2048 searches per distribution will be approximately 256. Single "depths" or comparisons will range from 1 to 512 as would be expected. The average per distribution is approximately N/2 as several others in this thread have pointed out, as the correct average case search for an element in a randomized array of size N. If you want to argue that the "depth" of the actual comparisons never exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but that doesn't mean that the number of comparisons is 9. Sorry ;)
>
>Hope the code change and viewing the results will clear things up if anyone can't follow my explanation above (my fault for not explaining things well enough, I'm afraid).
>
>
>Try these simple changes:
>
>in psearch.c
>
>on line 9:
>add ------>  ++comparisons;
>
>
>on line 18:
>comment out --------> //comparisons = ++depth;
>
>
>on line 31:
>comment out --------> //--depth;
>
>Thanks,
>K.Wilson
>
>
>
>
>


April 05, 2006
sundaresh wrote:
> #include<stdlib.h>
> 
> int comparisons;
> 
> void *psearch(void *base, int size, int width, void *element, int
> (*compare)(void *, void *)) {
> static int depth = 0;
> 
> if(size <= 1) {

Replace this line
> ++comparisons;
with
  if ( size ) ++comparisons;

> return ! size || compare(base, element) ? NULL : base;
> }
> else {
> int half, right;
> void *found;
> 
> half = size / 2;
> right = abs(rand()) % 2;

and remove this line
> comparisons = ++depth;

> if(right) {
> found = psearch(base + half * width, size - half, width, element, compare);
> if(! found) {
> found = psearch(base, half, width, element, compare);
> }
> }
> else {
> found = psearch(base, half, width, element, compare);
> if(! found) {
> found = psearch(base + half * width, size - half, width, element, compare);
> }
> }
> --depth;
> return found;
> }
> }

then you will have *REAL* count of compare() function call
even better we to supply test compare function that would increment comparisons before returning comparison result.

If you don't apply mentioned changes, your code will count something completely different from "comparison number"

Nic Tiger
April 05, 2006
this is your latest code, with proper comparisons counting and some simple test;
just compile it and run.
last printed line will state:
  average_cmp=247
which is essentially ~(N/2) for 512 item test

as a note, you should write compare(*(void**)base, element), otherwise it is difficult to use any of existing comparison functions (strcmp or smth.)

Nic Tiger.

-------- test suite for psearch --------
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int comparisons = 0;

int compare (void *p1, void *p2) {
  comparisons ++;
  return strcmp ( *(const char**)p1, (const char*)p2 );
}

void *psearch(void *base, int size, int width, void *element, int (*compare)(void *, void *)) {
  static int depth = 0;

  if(size <= 1) {
    return ! size || compare(base, element) ? NULL : base;
  } else {
    int half, right;
    void *found;

    half = size / 2;
    right = abs(rand()) % 2;
    //comparisons = ++depth;
    if(right) {
      found = psearch((char*)base + half * width, size - half, width, element, compare);
      if(! found) {
        found = psearch((char*)base, half, width, element, compare);
      }
    }
    else {
      found = psearch((char*)base, half, width, element, compare);
      if(! found) {
        found = psearch((char*)base + half * width, size - half, width, element, compare);
      }
    }
    --depth;
    return found;
  }
}

main () {
  char *list[512];
  int i;
  for ( i = 0; i < 512; i ++ ) {
    char buf[32];
    sprintf ( buf, "str%d", i );
    list[i] = strdup(buf);
  }

  int total_cmp = 0;
  for ( i = 0; i < 512; i ++ ) {
    comparisons = 0;
    void *res = psearch ( list, 512, 4, list[i], compare );
    if ( res )
      printf ( "%d comparisons to find %s (found as %d index of list)\n", comparisons, list[i], (char**)res - list );
    else
      printf ( "%s not found, %d comparisons\n", list[i], comparisons );
    total_cmp += comparisons;
  }
  printf ( "average_cmp=%d", total_cmp/512 );
}
April 05, 2006
Hasan Aljudy wrote:
> 
> and here's the D code, according to what I understood of the algorithm.
> if something is wrong in my understanding, please point me to it.
> 
> ---------
> import std.stdio;
> import std.random;
> 
> template psearch(T)
> {
>     bool psearch( T[] arr, T element )
>     {
>         if( arr.length == 1 )
>         {
>             if( arr[0] == element ) return true;
>             else return false;
>         }
> 
>         int choice = rand() % 2;
> 
>         T[] first, second;
>         int middle = arr.length / 2;
> 
>         if( choice == 1 )
>         {
>             first = arr[0..middle];
>             second = arr[middle..$];
>         }
>         else
>         {
>             first = arr[middle..$];
>             second = arr[0..middle];
>         }
> 
>         if( !psearch( first, element ) )
>         {
>             return psearch( second, element );
>         }
>         return false;
>     }
> }

Thanks for posting this.  I hadn't taken the time to look at the algorithm yet, and this got me interested :-)  The algorithm is obviously O(N) for single-element searches, and equivalent in theoretical complexity to the naive approach for substring searches.  In practical terms it's even worse because of the N recursive function calls, but it could always be rewritten as a loop with some bookkeeping.

It's obvious that the algorithm favors certain pattern positions in the search string, ie. (1/4)N, (3/4)N, and so on, while traditional searches typically perform best if the pattern is near the beginning of the string.  Though this raises an interesting point--the probabilistic search above is not a "find first" or "find last" algorithm.  Probably not a big deal for unordered sets, but it might be otherwise.


Sean
April 05, 2006
Sean Kelly wrote:
> Hasan Aljudy wrote:
> 
>>
>> and here's the D code, according to what I understood of the algorithm.
>> if something is wrong in my understanding, please point me to it.
>>
>> ---------
>> import std.stdio;
>> import std.random;
>>
>> template psearch(T)
>> {
>>     bool psearch( T[] arr, T element )
>>     {
>>         if( arr.length == 1 )
>>         {
>>             if( arr[0] == element ) return true;
>>             else return false;
>>         }
>>
>>         int choice = rand() % 2;
>>
>>         T[] first, second;
>>         int middle = arr.length / 2;
>>
>>         if( choice == 1 )
>>         {
>>             first = arr[0..middle];
>>             second = arr[middle..$];
>>         }
>>         else
>>         {
>>             first = arr[middle..$];
>>             second = arr[0..middle];
>>         }
>>
>>         if( !psearch( first, element ) )
>>         {
>>             return psearch( second, element );
>>         }
>>         return false;
>>     }
>> }
> 
> 
> Thanks for posting this.  I hadn't taken the time to look at the algorithm yet, and this got me interested :-)  The algorithm is obviously O(N) for single-element searches, and equivalent in theoretical complexity to the naive approach for substring searches.  In practical terms it's even worse because of the N recursive function calls, but it could always be rewritten as a loop with some bookkeeping.
> 
> It's obvious that the algorithm favors certain pattern positions in the search string, ie. (1/4)N, (3/4)N, and so on, while traditional searches typically perform best if the pattern is near the beginning of the string.  Though this raises an interesting point--the probabilistic search above is not a "find first" or "find last" algorithm.  Probably not a big deal for unordered sets, but it might be otherwise.
> 
> 
> Sean


I think I have a mistake in the last bit of it ..

I wrote:

        if( !psearch( first, element ) )
        {
            return psearch( second, element );
        }
        return false;

when it should be:

        if( psearch( first, element ) )
        {
            return true;
        }
        else
        {
             return psearch( second, element );
        }
1 2 3
Next ›   Last »