October 07, 2010 Re: "in" everywhere

Andrei Alexandrescu:
> I'm a bit leery of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it.
>
> One thing that could be done is to allow "in" with literal arrays to their right:
>
> if (x in ["abcde", "asd"]) { ... }
>
> The size of the operand is constant, known, and visible.
The "in" is useful to tell if an item is present in a collection (dynamic arrays, fixed-sized arrays, sorted arrays, tuple, typetuple, hash set, tree set, hash map, tree map, trees, graph, deque, skip list, finger tree, etc), and it's useful to tell if a substring is present in a string. Those two purposes are very commonly useful in programs.
Despite those purposes are very common, there is no need for an "in" syntax. In dlibs1 I have created an isIn() function that was good enough.
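A minimal sketch of such a helper (not the actual dlibs1 code, which isn't shown here) is little more than a linear scan:

```d
// Hypothetical reconstruction: a linear-scan membership test for arrays.
bool isIn(T)(T item, T[] items)
{
    foreach (x; items)
        if (x == item)
            return true;
    return false;
}

unittest
{
    assert(isIn(2, [1, 2, 3]));
    assert(!isIn(5, [1, 2, 3]));
}
```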
D supports the "in" operator for associative arrays, and in its operator overloading (my Set()/set() implementation supports the "in" operator), so it's natural for people to desire that operator to be usable for the other kind of collections too, like arrays. Maybe those people have used Python, where the in operator (__contains__) is widey used.
Regarding generic programming, Python doesn't have templates, but it has dynamic typing, that also has purposes similar to generic programming. To a python function that uses the "in" operator you may give both a list and an associative array, or other objects that support the __contains__. In this case the Python code performs an O(n) or O(1) operation according to the dynamic type given to the function.
I can live without the "in" operation for arrays and strings (useful to look for substrings too). So one solution is not not change the D language, and not add support for "in" to arrays.
Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.
Another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like @complexity(), plus an Enum that allows to specify the worst case complexity. So associative arrays are annotated with @complexity(O.linear), while the function that searches for items/substrings in arrays/strings is @complexity(O.constant). At compile-time the generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't enforce functions annotated with @complexity(O.constant) to actually not perform a linear search (but I don't think it's a problem, today if the Range protocol asks an operation to not be worse than O(n ln n) the compiler doesn't enforce it).
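Without new language support, a rough approximation of the idea is possible today with a compile-time constant by convention (all names here are invented for illustration):

```d
enum O { constant, logarithmic, linear }

struct HashSet(T)
{
    enum containsComplexity = O.constant; // hash lookup
    bool[T] data;
    bool contains(T x) const { return (x in data) !is null; }
}

struct ArraySet(T)
{
    enum containsComplexity = O.linear; // plain scan
    T[] data;
    bool contains(T x) const
    {
        foreach (e; data)
            if (e == x)
                return true;
        return false;
    }
}

// Generic code can then reject containers with a slow membership test.
void requireFastLookup(S)(S s)
    if (S.containsComplexity <= O.logarithmic)
{
    // ...
}
```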
I don't like the idea of allowing the "in" operator for sorted arrays, tuples, fixed-sized arrays, skip lists, maps, sets and trees while disallowing it for dynamic arrays. It looks a bit silly. People will then spend the next ten years asking for "in" to be extended to unsorted dynamic arrays and substring search too, you can bet on it.
Bye,
bearophile

October 07, 2010 Re: "in" everywhere

Posted in reply to bearophile

On 10/07/2010 09:40 PM, bearophile wrote:
> Andrei Alexandrescu:
>
>> I'm a bit leery of adopting this feature (it has been discussed). To me
>> "in" implies a fast operation and substring searching isn't quite it.
>>
>> One thing that could be done is to allow "in" with literal arrays to
>> their right:
>>
>> if (x in ["abcde", "asd"]) { ... }
>>
>> The size of the operand is constant, known, and visible.
>
> The "in" is useful to tell if an item is present in a collection (dynamic arrays, fixed-sized arrays, sorted arrays, tuple, typetuple, hash set, tree set, hash map, tree map, trees, graph, deque, skip list, finger tree, etc), and it's useful to tell if a substring is present in a string. Those two purposes are very commonly useful in programs.
>
> Despite those purposes are very common, there is no need for an "in" syntax. In dlibs1 I have created an isIn() function that was good enough.
>
> D supports the "in" operator for associative arrays, and in its operator overloading (my Set()/set() implementation supports the "in" operator), so it's natural for people to desire that operator to be usable for the other kind of collections too, like arrays. Maybe those people have used Python, where the in operator (__contains__) is widey used.
>
> Regarding generic programming, Python doesn't have templates, but it has dynamic typing, that also has purposes similar to generic programming. To a python function that uses the "in" operator you may give both a list and an associative array, or other objects that support the __contains__. In this case the Python code performs an O(n) or O(1) operation according to the dynamic type given to the function.
>
> I can live without the "in" operation for arrays and strings (useful to look for substrings too). So one solution is not not change the D language, and not add support for "in" to arrays.
>
> Another solution is just to accept O(n) as the worst complexity for the "in" operator. I don't understand what's the problem in this.
>
> Another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like @complexity(), plus an Enum that allows to specify the worst case complexity. So associative arrays are annotated with @complexity(O.linear), while the function that searches for items/substrings in arrays/strings is @complexity(O.constant). At compile-time the generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't enforce functions annotated with @complexity(O.constant) to actually not perform a linear search (but I don't think it's a problem, today if the Range protocol asks an operation to not be worse than O(n ln n) the compiler doesn't enforce it).
>
> I don't like the idea of allowing the "in" operator for sorted arrays, tuples, fixed-sized arrays, skip lists, maps, sets, trees and disallow it for dynamic arrays. It looks a bit silly. People for the next ten years will then ask "in" to be extended for unsorted dynamic arrays& substring search too, you can bet on it.
>
> Bye,
> bearophile
I think it's better to know that "in" always has logarithmic or better complexity. Otherwise, there is always canFind (which needs a new name :-)
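For reference, the canFind spelling in Phobos handles both element and subrange search:

```d
import std.algorithm : canFind;

void main()
{
    assert([1, 2, 3].canFind(2));    // element search, O(n)
    assert("abcdef".canFind("cde")); // subrange (substring) search
}
```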

October 07, 2010 Re: "in" everywhere

Posted in reply to bearophile

On 10/7/10 14:40 CDT, bearophile wrote:
> Another solution is just to accept O(n) as the worst-case complexity for the "in" operator. I don't see what the problem is with this.

That means we'd have to define another operation, i.e. "quickIn", that has an O(log n) bound.
Andrei

October 07, 2010 Re: "in" everywhere

Posted in reply to Andrei Alexandrescu

On 10/7/2010 13:57, Andrei Alexandrescu wrote:
> On 10/7/10 14:40 CDT, bearophile wrote:
>> Another solution is just to accept O(n) as the worst-case complexity for the "in" operator. I don't see what the problem is with this.
>
> That means we'd have to define another operation, i.e. "quickIn", that has an O(log n) bound.

Why?

I can't say I've ever cared about the big-O complexity of an operation. All I care about is that it's "fast enough", which is highly context-dependent and may have nothing to do with complexity. I can't see myself replacing my 'int[]' arrays with the much slower and bigger 'int[MAX_SIZE]' arrays just to satisfy the compiler. I shouldn't have to. The type system shouldn't encourage me to.

I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking about performance specifications on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.

--
Rainer Deyke - rainerd@eldwood.com

October 07, 2010 Re: "in" everywhere

Posted in reply to Rainer Deyke

On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
> On 10/7/2010 13:57, Andrei Alexandrescu wrote:
>> On 10/7/10 14:40 CDT, bearophile wrote:
>>> Another solution is just to accept O(n) as the worst-case complexity for the "in" operator. I don't see what the problem is with this.
>>
>> That means we'd have to define another operation, i.e. "quickIn", that has an O(log n) bound.
>
> Why?
>
> I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is. Here is an example of how caring about the big-O complexity cut the runtime of dmd to about 1/3:

http://d.puremagic.com/issues/show_bug.cgi?id=4721

Big-O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care.

What I'd suggest is to write your own function that uses 'in' when possible and find when not possible. Then use that in your code.

-Steve
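A sketch of such a helper (hypothetical, not part of Phobos): it uses the container's own 'in' when the type supports one, and falls back to a linear scan otherwise.

```d
import std.algorithm : canFind;

// Hypothetical helper: prefer the container's 'in' (e.g. O(1) for
// associative arrays), fall back to a linear canFind otherwise.
bool isIn(T, C)(T item, C container)
{
    static if (is(typeof(item in container)))
        return (item in container) !is null; // AA-style 'in' yields a pointer
    else
        return container.canFind(item);
}

unittest
{
    int[string] aa = ["a" : 1];
    assert(isIn("a", aa));      // hash lookup via the AA's 'in'
    assert(isIn(2, [1, 2, 3])); // linear scan via canFind
}
```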

October 07, 2010 Re: "in" everywhere

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd@eldwood.com> wrote:
>>
>> I can't say I've ever cared about the big-O complexity of an operation.
>
> Then you don't understand how important it is. Here is an example of how caring about the big-O complexity cut the runtime of dmd to about 1/3:
>
> http://d.puremagic.com/issues/show_bug.cgi?id=4721
>
> Big-O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care.
>
> What I'd suggest is to write your own function that uses 'in' when possible and find when not possible. Then use that in your code.
>
Personally, I'd vote for the inclusion of such a function in Phobos. The big problem with C++ containers was and is the absence of a uniform interface. Typedefing those templates to save typing was good, but it didn't completely solve the problem of interchangeability. E.g., you change a list to a vector and suddenly realize that remove() doesn't work anymore.

Having a uniform interface for insertion/removal/lookup is plainly awesome.

What if Phobos defined a function, say, E* contains(C,E)(C container, E element), that'd behave like the 'in' operator as best it could for type C? It'd fit frequent demands without making big promises - in other words, it would still be largely useful.

Yes, I understand that a good (performance-wise) generic solution isn't that simple to discover and implement. But having an easily accessible (i.e. standard), even if not maximally efficient, solution is a good thing nevertheless. A sketch of the array case is below.
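For built-in arrays, that signature (returning a pointer to the match, as 'in' does for associative arrays) could look like this:

```d
// Sketch of the proposed uniform lookup for dynamic arrays: return a
// pointer to the matching element, or null if it is absent.
E* contains(E)(E[] container, E element)
{
    foreach (ref e; container)
        if (e == element)
            return &e;
    return null;
}

unittest
{
    auto a = [1, 2, 3];
    assert(contains(a, 2) !is null);
    assert(contains(a, 5) is null);
}
```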

October 07, 2010 Re: "in" everywhere

Posted in reply to Rainer Deyke

On 10/7/10 15:23 CDT, Rainer Deyke wrote:
> On 10/7/2010 13:57, Andrei Alexandrescu wrote:
>> On 10/7/10 14:40 CDT, bearophile wrote:
>>> Another solution is just to accept O(n) as the worst-case complexity for the "in" operator. I don't see what the problem is with this.
>>
>> That means we'd have to define another operation, i.e. "quickIn", that has an O(log n) bound.
>
> Why?
>
> I can't say I've ever cared about the big-O complexity of an operation.

Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before.

As an example, only last week I was working on cramming a sort of index of the entire Wikipedia onto one machine. I was waiting for the indexer, which ran slower and slower and slower. In the end I figured out there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data, and it was the last thing I looked at! Yet it made the run time impossible to endure.

> All I care about is that it's "fast enough", which is highly context-dependent and may have nothing to do with complexity. I can't see myself replacing my 'int[]' arrays with the much slower and bigger 'int[MAX_SIZE]' arrays just to satisfy the compiler. I shouldn't have to. The type system shouldn't encourage me to.
>
> I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking about performance specifications on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.

std.container introduces the convention that O(n) methods start with "linear".

Andrei
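The anecdote is about C++, but the same trap is easy to reproduce in D; a minimal sketch (hypothetical function names) contrasting the quadratic pattern with an amortized-linear one:

```d
import std.array : appender;

// Quadratic in miniature: building a new array on every append
// copies all previous elements, for O(n^2) work overall.
size_t[] lengthsQuadratic(size_t n)
{
    size_t[] result;
    foreach (i; 0 .. n)
        result = result ~ i; // full copy each iteration
    return result;
}

// Amortized O(n): Appender grows its buffer geometrically.
size_t[] lengthsLinear(size_t n)
{
    auto app = appender!(size_t[])();
    foreach (i; 0 .. n)
        app.put(i);
    return app.data;
}
```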

October 07, 2010 Re: "in" everywhere

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before.
>
> As an example, only last week I was working on cramming a sort of index of the entire Wikipedia onto one machine. I was waiting for the indexer, which ran slower and slower and slower. In the end I figured out there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data, and it was the last thing I looked at! Yet it made the run time impossible to endure.

But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines the rules for its methods, not vice versa).

>> I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking about performance specifications on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.
>
> std.container introduces the convention that O(n) methods start with "linear".

I find such a convention useful indeed, though it brings a fact to the surface: if we need to emphasize methods that make strong promises about complexity with prefixes/suffixes, or say by putting them into a separate module, then why don't we have a non-emphasized counterpart that doesn't make strong promises but fits a wider range of containers? After all, std.algorithm offers different search mechanisms with varying complexity (e.g. find() vs boyerMooreFinder()), as in the example below.
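Both spellings exist in std.algorithm; the Boyer-Moore variant preprocesses the needle in exchange for faster scanning on repeated or long searches:

```d
import std.algorithm : boyerMooreFinder, find;

void main()
{
    int[] haystack = [-1, 0, 1, 2, 3, 4, 5];
    int[] needle = [1, 2, 3];

    // Generic find: a straightforward linear scan.
    assert(find(haystack, needle) == [1, 2, 3, 4, 5]);

    // Boyer-Moore: builds a skip table from the needle first.
    assert(find(haystack, boyerMooreFinder(needle)) == [1, 2, 3, 4, 5]);
}
```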

October 07, 2010 Re: "in" everywhere

Posted in reply to Stanislav Blinov

Stanislav Blinov wrote:
> ... doesn't make strong promises but fits a wider range of containers? After all, std.algorithm offers ...

Sorry, that should be 'cases', not 'containers' there.

October 07, 2010 Re: "in" everywhere

Posted in reply to bearophile

bearophile wrote:
> Another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like @complexity(), plus an enum that allows specifying the worst-case complexity. So associative arrays are annotated with @complexity(O.constant), while the function that searches for items/substrings in arrays/strings is @complexity(O.linear). At compile time the generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't force a function annotated with @complexity(O.constant) to actually avoid a linear search (but I don't think that's a problem: today, if the range protocol asks an operation to be no worse than O(n log n), the compiler doesn't enforce that either).

Good idea. It would make a nice use case for user-defined attributes in D3. Making the language aware of @complexity specifically doesn't buy much; all you need is:

__traits(getAttribute, opIn, @complexity).bigOh == O.constant

--
Tomek
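User-defined attributes did later land in D2; in today's syntax (__traits(getAttributes, ...) rather than the hypothetical getAttribute above), the idea looks roughly like this:

```d
import std.algorithm : canFind;

enum O { constant, logarithmic, linear }
struct complexity { O bigOh; }

@complexity(O.linear)
bool contains(int[] haystack, int needle)
{
    return haystack.canFind(needle);
}

// Query the annotation at compile time.
static assert(__traits(getAttributes, contains)[0].bigOh == O.linear);
```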