March 04, 2009
Andrei Alexandrescu:
> Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
> of set intersection (find sorted in sorted),

Yeah. Sorted-array-based sets are probably common enough in C++ code (but I generally use something equivalent to hash-sets for this purpose, I have even mostly implemented such class for D1 in the dlibs, with a nice API).
(I see sorted-array-based sets as an optimization to use in special cases, while hash-sets are for the general case when you want to manage sets (in D hashing needs a comparison too because of the chains are external and tree-based, but some other languages for hash-sets you need just hashability and not sortability of items)).


> and the odd one is:
> find(a, assumeSorted(b));
> This is rare but has excellent best-case complexity (is it O(a.length /
> b.length)?) and is easy to add for completeness.

I suggest you to not add this uncommon case, and add it only if later there are enough people asking for it. Less code to write and maintain.
Creating functions and algorithms is a matter of balance between over-generalization (that often leads to useless complexity and longer syntax) and too much "special casing" that has other problems. Both extrema aren't good.
Life isn't easy, I guess we are asking you to be a cross between Alexander Stepanov and Guido van Rossum :o)


> Returning int from find is an insult. To add injury to it, have find also return an int for a list.

So you return an iterator, and no exception is raised, I see. I like this enough.

Bye,
bearophile
March 04, 2009
On Wed, 04 Mar 2009 08:12:48 -0800, Sean Kelly wrote:

> So I guess the real question is whether a function is expected to validate its parameters.  I'd argue that it isn't, but then I'm from a C/C++ background.  For me, validation is a debugging tool, or at least an optional feature for applications that want the added insurance.

The rule-of-thumb that I use is that a function needs to validate a parameter if that parameter /can/ come from user input and /may not/ have been previously validated and is /critical/ to the success of the function's behaviour.

If all of these are true, it means that the function has a potential to fail if it doesn't take the responsibility of parameter validation.

If a parameter can only come from other functions, which are already guaranteed to only emit validate data, the parameter data does not need re-validation. However, even for some of these functions a 'contract' validation of input parameters might be needed if you are attempting to validate the logic or data flow, rather than the contents of the data itself.

Contract validation of function results is not the same thing as input validation. Output validation is an attempt to prove that the function's logic is correct.

Input validation is not a debugging tool. It is a chance to inform the program's user that they might have given the program some wrong information to work with.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 04, 2009
On Wed, 04 Mar 2009 13:55:10 -0800, Andrei Alexandrescu wrote:

> bearophile wrote:
>> Andrei Alexandrescu:
>>> Binary search is rather common.
>> 
>> Oh, yes, sorry, I meant among the ones you have listed there...
> 
> Of five, three are frequent (linear, binary, Boyer-Moore), one is a form
> of set intersection (find sorted in sorted), and the odd one is:
> 
> find(a, assumeSorted(b));
> 
> This is rare but has excellent best-case complexity (is it O(a.length /
> b.length)?) and is easy to add for completeness.
> 
>>> As an aside, your use of "index" suggests you return integrals out of the function. IMHO that's strongly unrecommended.
>> 
>> I don't want to use too much of your time (that it may be better spent
>> with your new child), but I don't understand what you mean. That
>> index() function is meant the index position of the item or
>> sub-sequence into the bigger array (or iterable), and it returns -1 if
>> not found. This is an usual design.
> 
> This is an extremely sloppy design. That it is usual doesn't make things any better!
> 
>> Some people think that such controls for -1 value aren't always done, so to avoid that and some bugs, it's better to raise something like IndexException when the needle isn't found.
> 
> Yah, this is the subject of a long rant but in short: returning int from find means that essentially that find is unusable with anything except random access structures. This in turn means you'd have to have different means, APIs, and user code to deal with e.g. lists, in spite of the fact that linear search is the same boring thing for all: look at the current thing, yes/no, move on to the next thing. IMHO ever since the STL has seen the light of day there is no excuse, not even sheer ignorance, to ever traffic in integers as a mean to access elements in containers, in any language that has even the most modest parameterized types capability.
> 
> Returning int from find is an insult. To add injury to it, have find also return an int for a list.
> 
> "Is this item in this list?"
> "Yeppers. It's the 538th element. Took me a hike to find it." "Well I
> wanted to do something with it." "Then go get it. I'm telling you, it'll
> take exactly 538 steps."

BTW, what happens if you pass a sorted list into find?  Intuitively, you'd assume you can pass as assumeSorted?  But you can't really do anything but linear search?

Then what if you pass a tree structure into find?  It's sorted, but not exactly random access...

I think find should return an iterator/pointer fine, but I don't think there's a find template that can be used on all possible container types.  Probably the best solution I think is to implement find inside the container itself, and have the global find function advance a range to the point of the element (or return an empty range if not found), with "quick" searches reserved for sorted random-access ranges.  Note that stl works this way.

-Steve
March 05, 2009
Andrei Alexandrescu wrote:
> 
> Returning int from find is an insult. To add injury to it, have find also return an int for a list.

tango.core.Array deals exclusively with indexes, but its aim is to make D's built-in arrays look more like a robust type than to provide a general set of algorithms usable with containers, etc.  So in that instance, I think the decision is justifiable.  It certainly makes for some nifty code when combined with the slice syntax.
March 05, 2009
Steve Schveighoffer wrote:
> BTW, what happens if you pass a sorted list into find?  Intuitively, you'd assume you can pass as assumeSorted?  But you can't really do anything but linear search?

It's up to the designer of the API. Passing a sorted forward range into sort will cut the average search time in half, which does not improve complexity.

> Then what if you pass a tree structure into find?  It's sorted, but not exactly random access...
> 
> I think find should return an iterator/pointer fine, but I don't think there's a find template that can be used on all possible container types.  Probably the best solution I think is to implement find inside the container itself, and have the global find function advance a range to the point of the element (or return an empty range if not found), with "quick" searches reserved for sorted random-access ranges.  Note that stl works this way.

Yah, that's right. The coolness of the technique comes mostly when the structure that can help searching is not obvious at the type system level (e.g. "is sorted") or is present in the needle, not the haystack (Boyer-Moore). I believe this approach is superior to STL's.

I still think it's cool to have find operate on a variety of haystacks and needles. That way users don't need to fiddle with details of changing call syntax etc.


Andrei
March 05, 2009
Sean Kelly wrote:
> Why should contracts be limited to parameter checking of internally used
> functions only?  If I write a function and document parameter constraints
> then I certainly expect those constraints to be followed regardless of
> whether I'm calling the function or someone else is calling the function.
> Checking these via a contract simply provides an optional means of
> ensuring that a logic error didn't occur within the program as a whole.
> 
> If you're talking about application input however, then I agree completely.
> ie. stuff typed in by the user, read from a file, etc, should never be validated
> within a contract because an input failure at that level doesn't represent
> a program logic error but rather user error.  An assertion failure isn't
> a terribly good way of notifying the user that they shouldn't have put an
> alphabetic character in a box intended to receive an integer :-)

Your "users" are anyone external to your built binary. That means that dll's should not use contracts to validate arguments passed to the dll's entry points.

If you're doing a library to be statically linked, it is debatable, and a decision you (as the library developer) need to make.
March 05, 2009
On Wed, 04 Mar 2009 12:14:53 -0800, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

>Max Samukha wrote:
>> On Wed, 04 Mar 2009 10:27:55 -0800, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> 
>>> Max Samukha wrote:
>>>> If you intruduce a dummy type, why not make it perform validation in a debug build when sumthing like debug=slowButSafe is set?
>>> Because in the case of binarySearch slowButSafe quickly becomes slowAndUseless. It's happened to me - I had an assert(isSorted) in binary search (I guess it's in one of the older phobos releases!) and when I was using the debug version, my program would take forever to run.
>>>
>>> A debug build should at most change the constant multiplying the complexity, not the complexity.
>>>
>>>
>>> Andrei
>> 
>> I intentionaly proposed a special debug mode, not regular asserts, which are on in any debug build. I would like, knowing that I can wait a couple of days to make sure my program is correct, to be able to turn on validation.
>
>I am waiting a couple of days in release mode with all optimizations
>turned on and wind from behind.
>Andrei

Ok
March 05, 2009
Andrei Alexandrescu wrote:
> Steve Schveighoffer wrote:
>> BTW, what happens if you pass a sorted list into find?  Intuitively, you'd assume you can pass as assumeSorted?  But you can't really do anything but linear search?
> 
> It's up to the designer of the API. Passing a sorted forward range into sort will cut the average search time in half, which does not improve complexity.

How does this work? find in a sorted linked list has the same expected runtime as in an unsorted one -- on average, 0.5 * length. Assuming that comparisons and iterating are equally expensive, that is. If you assume that comparison is more expensive, you can do better, though cheaper comparisons don't benefit you at all.
March 06, 2009
Christopher Wright wrote:
> Andrei Alexandrescu wrote:
>> Steve Schveighoffer wrote:
>>> BTW, what happens if you pass a sorted list into find?  Intuitively, you'd assume you can pass as assumeSorted?  But you can't really do anything but linear search?
>>
>> It's up to the designer of the API. Passing a sorted forward range into sort will cut the average search time in half, which does not improve complexity.
> 
> How does this work? find in a sorted linked list has the same expected runtime as in an unsorted one -- on average, 0.5 * length. Assuming that comparisons and iterating are equally expensive, that is. If you assume that comparison is more expensive, you can do better, though cheaper comparisons don't benefit you at all.

We're saying the same thing.

Andrei
1 2 3 4 5 6 7 8 9 10 11
Next ›   Last »