```Sean Kelly wrote:
> I'm not terribly fond of the added verbosity however, or that this seems
> like I couldn't use the property form:
>
>     assumeSorted("abcd").find('c')
>
> Truth be told, my initial inclination would be to repackage the binary
> search as a one-liner with a different name, which kind of sabotages the
> whole idea.  But I'll try to resist this urge.

I understand. This is rather new, but I found it irresistibly cool to
unify find() routines under one name and specify structure in the
arguments' types. Usage is very easy, there's little to remember, and
every piece of structure is where it should be. Consider:

int[] a = [ 1, 2, 3, 4 ];
int[] b = [ 2. 3 ];

These algorithms each performs search a different way because each is
informed in different ways about the structure of their arguments:

find(a, b);
find(assumeSorted(a), b);
find(a, assumeSorted(b));
find(assumeSorted(a), assumeSorted(b));
find(a, boyerMooreFinder(b));

There's three names to remember that compose modularly. The
run-of-the-mill approach is:

find(a, b);
binaryFind(a, b);
findRhsSorted(a, b);
binaryFindRhsSorted(a, b);
boyerMooreFind(a, b);

To add insult to injury, boyerMooreFind is not enough because it hides
the structure created around b. So there's also need for one extra type
e.g. BoyerMooreFinder!(int[]) for cases when there are multiple searches
of the same thing. It's just onerous.

Andrei

P.S. By the way, this is the running example used in Chapter 4 of TDPL.
```
```bearophile wrote:
> Andrei Alexandrescu:
>> The approach I took with the new phobos is:
>>
>> int[] haystack; int[] needle; ... auto pos1 = find(haystack,
>> needle); // linear sort(haystack); auto pos2 =
>> find(assumeSorted(haystack), needle);
>
> In my dlibs I do in a simpler and shorter way:
>
> auto pos1 = haystack.index(needle); haystack.sort(); auto pos2 =
> haystack.bisect(needle);
>
> Here there's no need to give the same name to two very different
> functions.

They do the exact same thing. Unifying them under the same name is good
abstraction.

> If you really like to use a single function mame, with named
> arguments you may also do (bisect is false by default):
>
> auto pos1 = haystack.index(needle); haystack.sort();
> auto pos2 = haystack.index(needle, bisect=true);

I prefer encoding structural information in types.

Andrei
```
```Andrei Alexandrescu:
> int[] a = [ 1, 2, 3, 4 ];
> int[] b = [ 2. 3 ];

I guess you meant:

int[] b = [ 2, 3 ];

> find(a, b);
> find(assumeSorted(a), b);
> find(a, assumeSorted(b));
> find(assumeSorted(a), assumeSorted(b));
> find(a, boyerMooreFinder(b));

Are you talking about finding the position of a subarray into a bigger array? Then the two useful cases are:
a.index(b);
a.indexBoyerMoore(b);
The other cases aren't common enough, I think.

Bye,
bearophile
```
```bearophile wrote:
> Andrei Alexandrescu:
>> int[] a = [ 1, 2, 3, 4 ];
>> int[] b = [ 2. 3 ];
>
> I guess you meant:
>
> int[] b = [ 2, 3 ];
>
>> find(a, b);
>> find(assumeSorted(a), b);
>> find(a, assumeSorted(b));
>> find(assumeSorted(a), assumeSorted(b));
>> find(a, boyerMooreFinder(b));
>
> Are you talking about finding the position of a subarray into a bigger array? Then the two useful cases are:
> a.index(b);
> a.indexBoyerMoore(b);
> The other cases aren't common enough, I think.
>
> Bye,
> bearophile

Binary search is rather common.

As an aside, your use of "index" suggests you return integrals out of
the function. IMHO that's strongly unrecommended.

Andrei
```
```Andrei Alexandrescu:
> Binary search is rather common.

Oh, yes, sorry, I meant among the ones you have listed there...

> As an aside, your use of "index" suggests you return integrals out of
> the function. IMHO that's strongly unrecommended.

I don't want to use too much of your time (that it may be better spent with your new child), but I don't understand what you mean.
That index() function is meant the index position of the item or sub-sequence into the bigger array (or iterable), and it returns -1 if not found. This is an usual design.
Some people think that such controls for -1 value aren't always done, so to avoid that and some bugs, it's better to raise something like IndexException when the needle isn't found.

Bye,
bearophile
```
```bearophile wrote:
> Andrei Alexandrescu:
>> Binary search is rather common.
>
> Oh, yes, sorry, I meant among the ones you have listed there...

Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
of set intersection (find sorted in sorted), and the odd one is:

find(a, assumeSorted(b));

This is rare but has excellent best-case complexity (is it O(a.length /
b.length)?) and is easy to add for completeness.

>> As an aside, your use of "index" suggests you return integrals out
>> of the function. IMHO that's strongly unrecommended.
>
> I don't want to use too much of your time (that it may be better
> spent with your new child), but I don't understand what you mean.
> That index() function is meant the index position of the item or
> sub-sequence into the bigger array (or iterable), and it returns -1

This is an extremely sloppy design. That it is usual doesn't make things
any better!

> Some people think that such controls for -1 value aren't always done,
> so to avoid that and some bugs, it's better to raise something like
> IndexException when the needle isn't found.

Yah, this is the subject of a long rant but in short: returning int from
find means that essentially that find is unusable with anything except
random access structures. This in turn means you'd have to have
different means, APIs, and user code to deal with e.g. lists, in spite
of the fact that linear search is the same boring thing for all: look at
the current thing, yes/no, move on to the next thing. IMHO ever since
the STL has seen the light of day there is no excuse, not even sheer
ignorance, to ever traffic in integers as a mean to access elements in
containers, in any language that has even the most modest parameterized
types capability.

Returning int from find is an insult. To add injury to it, have find
also return an int for a list.

"Is this item in this list?"
"Yeppers. It's the 538th element. Took me a hike to find it."
"Well I wanted to do something with it."
"Then go get it. I'm telling you, it'll take exactly 538 steps."

Andrei
```
```Andrei Alexandrescu:
> Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
> of set intersection (find sorted in sorted),

Yeah. Sorted-array-based sets are probably common enough in C++ code (but I generally use something equivalent to hash-sets for this purpose, I have even mostly implemented such class for D1 in the dlibs, with a nice API).
(I see sorted-array-based sets as an optimization to use in special cases, while hash-sets are for the general case when you want to manage sets (in D hashing needs a comparison too because of the chains are external and tree-based, but some other languages for hash-sets you need just hashability and not sortability of items)).

> and the odd one is:
> find(a, assumeSorted(b));
> This is rare but has excellent best-case complexity (is it O(a.length /
> b.length)?) and is easy to add for completeness.

I suggest you to not add this uncommon case, and add it only if later there are enough people asking for it. Less code to write and maintain.
Creating functions and algorithms is a matter of balance between over-generalization (that often leads to useless complexity and longer syntax) and too much "special casing" that has other problems. Both extrema aren't good.
Life isn't easy, I guess we are asking you to be a cross between Alexander Stepanov and Guido van Rossum :o)

> Returning int from find is an insult. To add injury to it, have find
> also return an int for a list.

So you return an iterator, and no exception is raised, I see. I like this enough.

Bye,
bearophile
```
```On Wed, 04 Mar 2009 08:12:48 -0800, Sean Kelly wrote:

> So I guess the real question is whether a function is expected to
> validate its parameters.  I'd argue that it isn't, but then I'm from a
> C/C++ background.  For me, validation is a debugging tool, or at least
> an optional feature for applications that want the added insurance.

The rule-of-thumb that I use is that a function needs to validate a
parameter if that parameter /can/ come from user input and /may not/ have
been previously validated and is /critical/ to the success of the
function's behaviour.

If all of these are true, it means that the function has a potential to
fail if it doesn't take the responsibility of parameter validation.

If a parameter can only come from other functions, which are already
guaranteed to only emit validate data, the parameter data does not need
re-validation. However, even for some of these functions a 'contract'
validation of input parameters might be needed if you are attempting to
validate the logic or data flow, rather than the contents of the data
itself.

Contract validation of function results is not the same thing as input
validation. Output validation is an attempt to prove that the function's
logic is correct.

Input validation is not a debugging tool. It is a chance to inform the
program's user that they might have given the program some wrong
information to work with.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
```
```On Wed, 04 Mar 2009 13:55:10 -0800, Andrei Alexandrescu wrote:

> bearophile wrote:
>> Andrei Alexandrescu:
>>> Binary search is rather common.
>>
>> Oh, yes, sorry, I meant among the ones you have listed there...
>
> Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
> of set intersection (find sorted in sorted), and the odd one is:
>
> find(a, assumeSorted(b));
>
> This is rare but has excellent best-case complexity (is it O(a.length /
> b.length)?) and is easy to add for completeness.
>
>>> As an aside, your use of "index" suggests you return integrals out of
>>> the function. IMHO that's strongly unrecommended.
>>
>> I don't want to use too much of your time (that it may be better spent
>> with your new child), but I don't understand what you mean. That
>> index() function is meant the index position of the item or
>> sub-sequence into the bigger array (or iterable), and it returns -1 if
>
> This is an extremely sloppy design. That it is usual doesn't make things
> any better!
>
>> Some people think that such controls for -1 value aren't always done,
>> so to avoid that and some bugs, it's better to raise something like
>> IndexException when the needle isn't found.
>
> Yah, this is the subject of a long rant but in short: returning int from
> find means that essentially that find is unusable with anything except
> random access structures. This in turn means you'd have to have
> different means, APIs, and user code to deal with e.g. lists, in spite
> of the fact that linear search is the same boring thing for all: look at
> the current thing, yes/no, move on to the next thing. IMHO ever since
> the STL has seen the light of day there is no excuse, not even sheer
> ignorance, to ever traffic in integers as a mean to access elements in
> containers, in any language that has even the most modest parameterized
> types capability.
>
> Returning int from find is an insult. To add injury to it, have find
> also return an int for a list.
>
> "Is this item in this list?"
> "Yeppers. It's the 538th element. Took me a hike to find it." "Well I
> wanted to do something with it." "Then go get it. I'm telling you, it'll
> take exactly 538 steps."

BTW, what happens if you pass a sorted list into find?  Intuitively,
you'd assume you can pass as assumeSorted?  But you can't really do
anything but linear search?

Then what if you pass a tree structure into find?  It's sorted, but not
exactly random access...

I think find should return an iterator/pointer fine, but I don't think
there's a find template that can be used on all possible container
types.  Probably the best solution I think is to implement find inside
the container itself, and have the global find function advance a range
to the point of the element (or return an empty range if not found), with
"quick" searches reserved for sorted random-access ranges.  Note that stl
works this way.

-Steve
```
```Andrei Alexandrescu wrote:
>
> Returning int from find is an insult. To add injury to it, have find
> also return an int for a list.

tango.core.Array deals exclusively with indexes, but its aim is to make
D's built-in arrays look more like a robust type than to provide a
general set of algorithms usable with containers, etc.  So in that
instance, I think the decision is justifiable.  It certainly makes for
some nifty code when combined with the slice syntax.
```
6 7 8 9 10 11