April 04, 2006
John Demme wrote:
> Andrew Fedoniouk wrote:
> 
>> "Sean Kelly" <sean@f4.ca> wrote in message
>> news:e0u9gl$1ppd$1@digitaldaemon.com...
>>> Oskar Linde wrote:
>>>> I'm not sure if this is intended as a joke or not, but I am sorry to
>>>> tell you that your code is not logarithmic. It is easy to prove that
>>>> there is no better algorithm than O(n) for searching an unordered
>>>> array on a deterministic single thread.
>>> Perhaps, but some algorithms have a better average case than others
>>> assuming certain (typical) conditions.  If there were simply no point,
>>> why all the bother back in the 60s?
>>>
>>> By the way, as I just woke up, I haven't had a chance to look at the
>>> algorithm yet.
>>>
>> Take your time, no rush, it is impossible anyway.
>> To find something in an unordered set requires at least N comparison
>> operations. Always.
>>
>> As morning tea reading, I suggest US Patent 5,533,051 on compression of
>> random data. It claims to compress any two arbitrary bits into one bit.
>>
>> http://gailly.net/05533051.html
>>
>>
>> Andrew Fedoniouk.
>> http://terrainformatica.com
> 
> Come now... I'm sure he's talking about the average case, not the worst
> case.

I don't think it matters.  No matter the algorithm, you must examine every element of the search string.  It's just that typical searches actually do worse than N comparisons, as they often examine each element more than once.
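
To make "worse" concrete: the naive approach restarts the comparison at every offset, re-reading haystack characters it has already seen. A rough C sketch (illustrative only; nothing from the attachment):

#include <stddef.h>
#include <string.h>

/* Naive substring search: try a match at every haystack offset.  On a
 * mismatch the scan restarts one position later, re-examining characters
 * it has already seen, so it can perform up to N*M character comparisons
 * for a haystack of length N and a needle of length M. */
const char *naive_search(const char *haystack, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0)
        return haystack;                /* empty needle matches at once */

    for (const char *p = haystack; *p != '\0'; p++) {
        size_t i = 0;
        while (i < m && p[i] == needle[i])
            i++;                        /* may re-read p[1..] next round */
        if (i == m)
            return p;                   /* full match starting at p */
    }
    return NULL;                        /* not found */
}

Searching for "aaab" in a haystack of all 'a's shows the effect: every offset burns four comparisons before the mismatch.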


Sean
April 04, 2006
"Sean Kelly" <sean@f4.ca> wrote in message news:e0ud2e$1tvn$1@digitaldaemon.com...
> John Demme wrote:
>> Andrew Fedoniouk wrote:
>>
>>> "Sean Kelly" <sean@f4.ca> wrote in message news:e0u9gl$1ppd$1@digitaldaemon.com...
>>>> Oskar Linde wrote:
>>>>> I'm not sure if this is intended as a joke or not, but I am sorry to
>>>>> tell you that your code is not logarithmic. It is easy to prove that
>>>>> there is no better algorithm than O(n) for searching an unordered
>>>>> array on a deterministic single thread.
>>>> Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions.  If there were simply no point, why all the bother back in the 60s?
>>>>
>>>> By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.
>>>>
>>> Take your time, no rush, it is impossible anyway.
>>> To find something in an unordered set requires at least N comparison
>>> operations. Always.
>>>
>>> As morning tea reading, I suggest US Patent 5,533,051 on compression of
>>> random data. It claims to compress any two arbitrary bits into one bit.
>>>
>>> http://gailly.net/05533051.html
>>>
>>>
>>> Andrew Fedoniouk.
>>> http://terrainformatica.com
>>
>> Come now... I'm sure he's talking about the average case, not the worst case.
>
> I don't think it matters.  No matter the algorithm, you must examine every element of the search string.  It's just that typical searches actually do worse than N comparisons, as they often examine each element more than once.

You have to examine every element of the search string? How come I haven't heard that? :)

Try this (approximate code - my C(++) background shows through too much):

char *needleInHaystack(char *haystack, char needle)
{
    if( !haystack )
        return null;

    // scan until the terminating zero or the first match
    for( char *loc = haystack; *loc != '\0'; loc++ )
        if( *loc == needle )
            return loc;    // found - stop without touching the rest

    return null;           // no match anywhere in the string
}

This definitely does not examine every element of the string, except in the worst-case scenario - just enough to find the first occurrence, no matter what the string is or which character the needle is.


April 04, 2006
FYI, there is absolutely nothing wrong with that statement. It is
certainly not the bug, if there is one, and I am not ruling out that
possibility. comparisons is never decremented or decreased, so how could
it be reduced to the value of 9 if it increases above 9, as you
claim/assume it does? The value of comparisons is set prior to any/all
recursive calls to psearch. You might claim, on the other hand, that it
is a characteristic of the random number generator, since it makes the
decision of which half to search first; but that decision is binary, and
I think that any random number generator should generate as many even
numbers as odd numbers, at least for a reasonably sized sample.

I am not one to contest theory, and if this algorithm is correct, then
there is probably a valid and sound theoretical explanation for why it
works; but neither am I one to reject evidence in the face of it,
especially evidence of such an astounding nature. It is well acknowledged
that empirical testing of methods leads to a convincing degree of
assurance in them and is a reasonable way to assess their reliability.

My apologies if I have irked you.

To have an open mind, while looking at it is all I asked for.

The reason I posted it here was because I was not sure myself.



In article <e0u7jg$1n3i$1@digitaldaemon.com>, Oskar Linde says...
>
>sundaresh skrev:
>> I was told to post here by Walter. I did think of posting it in the C
>> newsgroup but felt otherwise, since D is apparently upward compatible with
>> C, or so the specs on the site told me, so I presumed that every C program
>> was a D program. As far as proof goes, I have included a test program. The
>> results clearly indicate that for an array of size 512 random numbers, with
>> 2048 trials of different distributions, and for each trial up to 2048
>> random selections of elements in the array to look up, the number of
>> comparisons is a constant at 9, or lg(512). Please run the test program. If
>> it is a bug and you can point it out, I would be indebted. But that was why
>> I posted it here. If it isn't a bug, I was hoping it would be my
>> contribution.
>> 
>> I was hoping for an enlightened response.
>> 
>> You can lie all you want, but that doesn't change the truth.
>> The sun does rise in the east. There are only 4 seasons in a year, and
>> the value of pi is fixed.
>> 
>> I would encourage users to please try it out and respond. The proof of the pudding is in the eating.
>> 
>> Not many experts here, Walter. I am disappointed.
>
>I'm not sure if this is intended as a joke or not, but I am sorry to tell you that your code is not logarithmic. It is easy to prove that there is no better algorithm than O(n) for searching an unordered array on a deterministic single thread. I guess the error in your code is the computation of the number of comparisons:
>
>comparisons = ++depth;
>
>Cheers,
>
>/Oskar


April 04, 2006
"Rioshin an'Harthen" <rharth75@hotmail.com> wrote in message news:e0ugi0$229d$1@digitaldaemon.com...
> "Sean Kelly" <sean@f4.ca> wrote in message news:e0ud2e$1tvn$1@digitaldaemon.com...
>> John Demme wrote:
>>> Andrew Fedoniouk wrote:
>>>
>>>> "Sean Kelly" <sean@f4.ca> wrote in message news:e0u9gl$1ppd$1@digitaldaemon.com...
>>>>> Oskar Linde wrote:
>>>>>> I'm not sure if this is intended as a joke or not, but I am sorry to
>>>>>> tell you that your code is not logarithmic. It is easy to prove that
>>>>>> there is no better algorithm than O(n) for searching an unordered
>>>>>> array on a deterministic single thread.
>>>>> Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions.  If there were simply no point, why all the bother back in the 60s?
>>>>>
>>>>> By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.
>>>>>
>>>> Take your time, no rush, it is impossible anyway.
>>>> To find something in an unordered set requires at least N comparison
>>>> operations. Always.
>>>>
>>>> As morning tea reading, I suggest US Patent 5,533,051 on compression of
>>>> random data. It claims to compress any two arbitrary bits into one bit.
>>>>
>>>> http://gailly.net/05533051.html
>>>>
>>>>
>>>> Andrew Fedoniouk.
>>>> http://terrainformatica.com
>>>
>>> Come now... I'm sure he's talking about the average case, not the worst case.
>>
>> I don't think it matters.  No matter the algorithm, you must examine every element of the search string.  It's just that typical searches actually do worse than N comparisons, as they often examine each element more than once.
>
> You have to examine every element of the search string? How come I haven't heard that? :)
>
> Try this (approximate code - my C(++) background shows through too much):
>
> char *needleInHaystack(char *haystack, char needle)
> {
>    if( !haystack )
>        return null;
>
>    // scan until the terminating zero or the first match
>    for( char *loc = haystack; *loc != '\0'; loc++ )
>        if( *loc == needle )
>            return loc;    // found - stop without touching the rest
>
>    return null;           // no match anywhere in the string
> }
>
> This definitely does not examine every element of the string, except in the worst-case scenario - just enough to find the first occurrence, no matter what the string is or which character the needle is.
>

Let's assume that the unordered set (string) here has a flat distribution of element (char) occurrences.

You can cry, pray, read C++ books, whatever you like, but it will take you
N/2 comparisons on average, and in the worst case N comparisons, to find
the first occurrence of something there.

The only thing that affects the N/2 average is the quality of your random
generator (the flatness of its distribution). And that quality is the
usual source of "wonders" in such discoveries.
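
If in doubt, measure. A throwaway C simulation - my own sketch, assuming the needle is an element known to be present in an array of distinct values - lands right on N/2:

#include <stdio.h>
#include <stdlib.h>

/* Rough check of the N/2 claim: shuffle the values 0..N-1 so every
 * element is distinct, pick one at random, and count how many elements
 * a plain linear scan examines before finding it. */
int main(void)
{
    enum { N = 512, TRIALS = 100000 };
    static int a[N];
    long long total = 0;

    srand(42);
    for (int t = 0; t < TRIALS; t++) {
        for (int i = 0; i < N; i++)
            a[i] = i;
        for (int i = N - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
            int j = rand() % (i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        int needle = rand() % N;               /* guaranteed to be present */
        int k = 0;
        while (a[k] != needle)                 /* plain linear search */
            k++;
        total += k + 1;                        /* comparisons performed */
    }
    printf("average comparisons: %.1f  (N/2 = %d)\n",
           (double)total / TRIALS, N / 2);
    return 0;
}

With N = 512 it prints an average within noise of 256.5.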

Andrew Fedoniouk.
http://terrainformatica.com

April 04, 2006
sundaresh wrote:
> FYI, there is absolutely nothing wrong with that statement. It is
> certainly not the bug, if there is one, and I am not ruling out that
> possibility. comparisons is never decremented or decreased, so how could
> it be reduced to the value of 9 if it increases above 9, as you
> claim/assume it does? The value of comparisons is set prior to any/all
> recursive calls to psearch. You might claim, on the other hand, that it
> is a characteristic of the random number generator, since it makes the
> decision of which half to search first; but that decision is binary, and
> I think that any random number generator should generate as many even
> numbers as odd numbers, at least for a reasonably sized sample.
> 
> I am not one to contest theory, and if this algorithm is correct, then
> there is probably a valid and sound theoretical explanation for why it
> works; but neither am I one to reject evidence in the face of it,
> especially evidence of such an astounding nature. It is well acknowledged
> that empirical testing of methods leads to a convincing degree of
> assurance in them and is a reasonable way to assess their reliability.
> 
> My apologies if I have irked you. 
> 
> To have an open mind, while looking at it is all I asked for.
> 
> The reason I posted it here was because I was not sure myself.
> 
I haven't looked at the code because I don't have anything loaded that opens a .tar.gz, so...

How about trying a different dataset? Grab chunks out of "Alice in Wonderland" or something like that. I have heard of cases where it was possible to get more performance than you otherwise could by leveraging predictable non-randomness in a data set.
	An example: in English the letter "Q" is almost always followed by "U"; therefore, if you are searching for "U" and you find a "Q", check the next char.
April 04, 2006
Andrew Fedoniouk wrote:
> 
> Let's assume that the unordered set (string) here has a flat distribution of element (char) occurrences.
> 
> You can cry, pray, read C++ books, whatever you like, but it will take you N/2 comparisons on average, and in the worst case N comparisons, to find the first occurrence of something there.
> 
> The only thing that affects the N/2 average is the quality of your random generator (the flatness of its distribution). And that quality is the usual source of "wonders" in such discoveries.
> 
> Andrew Fedoniouk.
> http://terrainformatica.com
> 

That is true if you are using N for the size of your alphabet.

consider:

B is a string of random bits with even distribution of 1 or 0.

Find the first 1 bit.

The mean search length is 2: the first 1 lands at position k with probability (1/2)^k, so the expected number of bits examined is the sum over k of k/2^k = 2, no matter how long B is.
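
A quick sanity check (a throwaway C sketch, not anything from the attachment):

#include <stdio.h>
#include <stdlib.h>

/* Average number of fair random bits examined up to and including the
 * first 1.  The count is geometric with p = 1/2, so the mean tends to 2
 * regardless of how long the bit strings are.  The bit is taken from
 * the top half of rand()'s range because the low bits of some rand()
 * implementations are weak. */
int main(void)
{
    enum { TRIALS = 1000000 };
    long long total = 0;

    srand(1);
    for (int t = 0; t < TRIALS; t++) {
        int k = 1;                       /* bits examined so far */
        while (rand() <= RAND_MAX / 2)   /* "0" bit: keep looking */
            k++;
        total += k;
    }
    printf("mean bits examined: %.3f (theory: 2)\n",
           (double)total / TRIALS);
    return 0;
}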
April 04, 2006
Rioshin an'Harthen wrote:
> 
> You have to examine every element of the search string? How come I haven't heard that? :)

Potentially, yes.  Unless the algorithm has notably unique best-case behavior (such as quicksort), there's really no point in considering anything else.


Sean
April 04, 2006
Point 1: we do have to allow for the different expression of content and intent that arises from writers who come from a fundamentally different cultural background. (Possibly this is made even harder if such a person uses automatic translation between languages.)

So, I vote for understanding, meditation and deep thought, before dismissing the ideas presented.

Point 2: there are several "test cases" for algorithms. One is "worst case", another is "truly random data", another is "best case", yet another is "average case", and finally we have "plausible data". (There are others, I know.)

Since _no_ interesting data is truly random (because by definition, such data does not contain information!), we can dismiss the "truly random data" case.

Now, depending on the application domain, it may or may not be significant what the "worst case" performance is. Similarly, "best case" usually doesn't count, but its weight has to be evaluated on a case-by-case basis.

At this point we are left with the "average case" and "plausible data". Here we actually can find surprising results, if we look hard enough. For all practical purposes, the algorithm that performs best _given_ "actual-like" data would be the one to choose. (Assuming we are even able to produce such "actual-like" data in the first place.)

---

In summary, instead of quibbling over theoretical merits of this particular method, we might do better by trying to evaluate which kind of data this algorithm may perform best with, and thereafter compare it to those generally suggested for that kind of data.


sundaresh wrote:
> FYI, there is absolutely nothing wrong with that statement. It is
> certainly not the bug, if there is one, and I am not ruling out that
> possibility. comparisons is never decremented or decreased, so how
> could it be reduced to the value of 9 if it increases above 9, as
> you claim/assume it does? The value of comparisons is set prior to
> any/all recursive calls to psearch. You might claim, on the other
> hand, that it is a characteristic of the random number generator,
> since it makes the decision of which half to search first; but that
> decision is binary, and I think that any random number generator
> should generate as many even numbers as odd numbers, at least for a
> reasonably sized sample.
> 
> I am not one to contest theory, and if this algorithm is correct,
> then there is probably a valid and sound theoretical explanation for
> why it works; but neither am I one to reject evidence in the face of
> it, especially evidence of such an astounding nature. It is well
> acknowledged that empirical testing of methods leads to a convincing
> degree of assurance in them and is a reasonable way to assess their
> reliability.
> 
> My apologies if I have irked you.
> 
> To have an open mind, while looking at it is all I asked for.
> 
> The reason I posted it here was because I was not sure myself.
> 
> 
> 
> In article <e0u7jg$1n3i$1@digitaldaemon.com>, Oskar Linde says...
> 
>> sundaresh skrev:
>> 
>>> I was told to post here by Walter. I did think of posting it in
>>> the C newsgroup but felt otherwise, since D is apparently upward
>>> compatible with C, or so the specs on the site told me, so I
>>> presumed that every C program was a D program. As far as proof
>>> goes, I have included a test program. The results clearly
>>> indicate that for an array of size 512 random numbers, with
>>> 2048 trials of different distributions, and for each trial up to
>>> 2048 random selections of elements in the array to look up, the
>>> number of comparisons is a constant at 9, or lg(512). Please run
>>> the test program. If it is a bug and you can point it out, I
>>> would be indebted. But that was why I posted it here. If it isn't
>>> a bug, I was hoping it would be my contribution.
>>> 
>>> I was hoping for an enlightened response.
>>> 
>>> You can lie all you want, but that doesn't change the truth. The
>>> sun does rise in the east. There are only 4 seasons in a year,
>>> and the value of pi is fixed.
>>> 
>>> I would encourage users to please try it out and respond. The
>>> proof of the pudding is in the eating.
>>> 
>>> Not many experts here, Walter. I am disappointed.
>> 
>> I'm not sure if this is intended as a joke or not, but I am sorry
>> to tell you that your code is not logarithmic. It is easy to prove
>> that there is no better algorithm than O(n) for searching an
>> unordered array on a deterministic single thread. I guess the error
>> in your code is the computation of the number of comparisons:
>> 
>> comparisons = ++depth;
>> 
>> Cheers,
>> 
>> /Oskar
> 
> 
> 
April 04, 2006
Hi there Sundaresh,

The "bug" within your program is simply that your base case (or terminating
case) for the recursion is (size == 1) and you increment/decrement a variable
(ie. depth)  outside of this base case "if" statement.

This leads to a recursively maximized value of 9 (log base 2 of 512), as pointed out. Then, since no variable is updated in the base case "if" statement, you never actually add to the comparison total. Thus you mistakenly get a maximal depth of 9 for each search - but this is not the depth at which the element you are searching for is found; it is the depth of the "size == 1" base case.

Now you will say "this is not true", since the base case is reached many times during the search process before the search terminates - not just once, after depth equals nine. The static "depth" variable is not being updated properly, though.

Once you try the changes suggested below, you will see that the "average depth" of the 2048 searches per distribution is approximately 256. Individual "depths" (comparison counts) will range from 1 to 512, as would be expected. The average per distribution is approximately N/2, which, as several others in this thread have pointed out, is the correct average case for searching for an element in a randomized array of size N.

If you want to argue that the "depth" of the actual comparisons never exceeds 9, then that is obviously true, because log base 2 of 512 is 9 - but that doesn't mean that the number of comparisons is 9. Sorry ;)

I hope trying the code changes and viewing the results will clear things up for anyone who can't follow my explanation above (my fault for not explaining things well enough, I'm afraid).


Try these simple changes:

in psearch.c

on line 9:
add ------>  ++comparisons;


on line 18:
comment out --------> //comparisons = ++depth;


on line 31:
comment out --------> //--depth;
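
For anyone reading along without the attachment, the pattern at fault looks roughly like this - a reconstruction based on the fixes above, not the actual psearch.c:

#include <stdlib.h>

/* Reconstruction of the suspect pattern: a recursive halving "search"
 * of an unordered array.  Each call splits the range in two and tries
 * a random half first, so `depth` can never exceed log2(N); recording
 * `comparisons = ++depth` therefore caps out at 9 for N = 512.  That
 * is the recursion depth, not the number of comparisons performed. */
static int depth, comparisons;

static const int *psearch(const int *base, int size, int key)
{
    if (size == 1) {
        /* the fix on line 9 above belongs here: ++comparisons; */
        return *base == key ? base : 0;
    }

    comparisons = ++depth;      /* the bug: never rises above log2(N) */

    int half = size / 2;
    const int *hit;
    if (rand() & 1) {           /* pick a random half to try first */
        hit = psearch(base, half, key);
        if (!hit)
            hit = psearch(base + half, size - half, key);
    } else {
        hit = psearch(base + half, size - half, key);
        if (!hit)
            hit = psearch(base, half, key);
    }

    --depth;                    /* removed by the fix as well */
    return hit;
}

Count inside the base case instead, and the averages come out at N/2, exactly as described above.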

Thanks,
K.Wilson

April 05, 2006
sundaresh wrote:
> This (attached) search function of an unordered array of elements appears
> to have a logarithmic search time, like binary search of an ordered array.
> 
> 

It doesn't compile; on VC++ 2005 I get:

psearch.c(19) : error C2036: 'void *' : unknown size
psearch.c(27) : error C2036: 'void *' : unknown size

I fixed it by casting base to int: (int)base

Anyway, I'd like (please) to see human-readable pseudocode that doesn't involve mangling with pointers, just array indices.
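
For what it's worth, casting a pointer to int is itself unportable (it truncates on 64-bit targets). The usual fix is to do the arithmetic on a char *, so offsets are counted in bytes. A minimal sketch, with illustrative names rather than whatever psearch.c actually uses:

#include <stddef.h>
#include <stdio.h>

/* Pointer arithmetic on void* has no defined element size in C, which
 * is what VC++ complains about.  Cast to char* so the offset is in
 * bytes; width is the size of one element. */
static void *nth_element(void *base, size_t width, size_t i)
{
    return (char *)base + i * width;
}

int main(void)
{
    int a[4] = { 10, 20, 30, 40 };
    int *third = nth_element(a, sizeof a[0], 2);
    printf("%d\n", *third);     /* prints 30 */
    return 0;
}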