Jump to page: 1 2
Thread overview
in not working for arrays is silly, change my view
Feb 29, 2020
JN
Mar 02, 2020
Andrea Fontana
Mar 02, 2020
aliak
Mar 02, 2020
aliak
Mar 02, 2020
H. S. Teoh
Mar 03, 2020
aliak
Mar 03, 2020
H. S. Teoh
Feb 29, 2020
Ali Çehreli
Mar 02, 2020
JN
Mar 02, 2020
aliak
February 29, 2020
assert(1 in [1, 2, 3]);

Error: incompatible types for (1) in ([1, 2, 3]): int and int[

Yes, I know about .canFind(), but this is something that trips people over and over.

I think it would be better if "in" worked for both assoc arrays and normal arrays, or didn't work at all, for added consistency.
February 29, 2020
On 2/29/20 2:38 PM, JN wrote:
> assert(1 in [1, 2, 3]);
> 
> Error: incompatible types for (1) in ([1, 2, 3]): int and int[
> 
> Yes, I know about .canFind(), but this is something that trips people over and over.
> 
> I think it would be better if "in" worked for both assoc arrays and normal arrays, or didn't work at all, for added consistency.

1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
2. Array elements are not guaranteed to be comparable. AA keys are.
3. As this cannot be done via library, the compiler would have to do it (at least process the syntax), we need to ensure its something we want to do. I don't see it adding much to the language.

What MAY be a good idea is to have a specialized error telling people to import std.algorithm.canFind and use it when they try this.

Another thing that may be useful is having something like "isIn" instead of "canFind", which swaps the left and right parameters to read more naturally.

assert([1,2,3].canFind(1));
vs.
assert(1.isIn([1, 2, 3]));

I'd also add an overload to allow:

assert(1.isIn(1, 2, 3));

-Steve
February 29, 2020
On 2/29/20 11:38 AM, JN wrote:

> assert(1 in [1, 2, 3]);

Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."

Unless 'in' works with arrays to mean "is this index valid", then I don't see the benefit. If we had it, I think more people would ask "why does 'in' work differently for arrays?"

Are there other languages that support this semantic? Checking... Ok, Python has it, highly likely because they don't have arrays to begin with.

Ali

March 02, 2020
On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."

Does it? I always viewed it as "is this value in list of keys"

> Unless 'in' works with arrays to mean "is this index valid", then I don't see the benefit. If we had it, I think more people would ask "why does 'in' work differently for arrays?"
> Are there other languages that support this semantic? Checking... Ok, Python has it, highly likely because they don't have arrays to begin with.
>

Well, Python lists are for most purposes equivalent to arrays and it hasn't really been confusing for people.

March 02, 2020
On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).

Probably it should work if we're using a "SortedRange".


int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
auto p = assumeSorted(a);

assert(3 in p);


March 02, 2020
On 3/2/20 6:52 AM, Andrea Fontana wrote:
> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
> 
> Probably it should work if we're using a "SortedRange".
> 
> 
> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
> auto p = assumeSorted(a);
> 
> assert(3 in p);
> 
> 

That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.

It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:

assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);

-Steve
March 02, 2020
On 3/2/20 6:39 AM, JN wrote:
> On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
>> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."
> 
> Does it? I always viewed it as "is this value in list of keys"

Array keys are the element index..

So essentially:

int[int] c1;
int[] c2 = new int[4];
c1[3] = 10;
c2[3] = 10;

assert(3 in c1); // true
assert(3 in c2); // what should this do?

-Steve
March 02, 2020
On Monday, 2 March 2020 at 15:50:08 UTC, Steven Schveighoffer wrote:
> On 3/2/20 6:39 AM, JN wrote:
>> On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
>>> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."
>> 
>> Does it? I always viewed it as "is this value in list of keys"
>
> Array keys are the element index..
>
> So essentially:
>
> int[int] c1;
> int[] c2 = new int[4];
> c1[3] = 10;
> c2[3] = 10;
>
> assert(3 in c1); // true
> assert(3 in c2); // what should this do?
>
> -Steve

If in were to mean "is this value in list of keys" then to be consistent:

3 in c2 == 3 < c2.length
March 02, 2020
On Monday, 2 March 2020 at 15:47:26 UTC, Steven Schveighoffer wrote:
> On 3/2/20 6:52 AM, Andrea Fontana wrote:
>> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
>> 
>> Probably it should work if we're using a "SortedRange".
>> 
>> 
>> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
>> auto p = assumeSorted(a);
>> 
>> assert(3 in p);
>> 
>> 
>
> That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.
>
> It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:
>
> assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);
>
> -Steve

There's no guarantee that checking if a value is in a sorted list is any faster than checking if it's in a non sorted list. It's why sort usually switches from a binary-esque algorithm to a linear one at a certain size. The list could potentially need to be _very_ large for p.contains to make a significant impact over canFind(p) AFAIK.

Here's a small test program, try playing with the numbers and see what happens:

import std.random;
import std.range;
import std.algorithm;
import std.datetime.stopwatch;
import std.stdio;

void main()
{
    auto count = 1_000;
    auto max = int.max;

    alias randoms = generate!(() => uniform(0, max));

    auto r1 = randoms.take(count).array;
    auto r2 = r1.dup.sort;
    auto elem = r1[uniform(0, count)];

    benchmark!(
        () => r1.canFind(elem),
        () => r2.contains(elem),
    )(1_000).writeln;
}

Use LDC and -O3 of course. I was hard pressed to get the sorted contains to be any faster than canFind.

This begs the question then: do these requirements on in make any sense? An algorithm can be log n (ala the sorted search) but still be a magnitude slower than a linear search... what has the world come to 🤦‍♂️

PS: Why is it named contains if it's on a SortedRange and canFind otherwise?

March 02, 2020
On 3/2/20 3:52 PM, aliak wrote:
> On Monday, 2 March 2020 at 15:47:26 UTC, Steven Schveighoffer wrote:
>> On 3/2/20 6:52 AM, Andrea Fontana wrote:
>>> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>>>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
>>>
>>> Probably it should work if we're using a "SortedRange".
>>>
>>>
>>> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
>>> auto p = assumeSorted(a);
>>>
>>> assert(3 in p);
>>>
>>>
>>
>> That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.
>>
>> It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:
>>
>> assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);
>>
> There's no guarantee that checking if a value is in a sorted list is any faster than checking if it's in a non sorted list. It's why sort usually switches from a binary-esque algorithm to a linear one at a certain size.

Well of course! A binary search needs Lg(n) comparisons for pretty much any value, whereas a linear search is going to end early when it finds it. So there's no guarantee that searching for an element in the list is going to be faster one way or the other. But Binary search is going to be faster overall because the complexity is favorable.

> The list could potentially need to be _very_ large for p.contains to make a significant impact over canFind(p) AFAIK.
> 
> Here's a small test program, try playing with the numbers and see what happens:
> 
> import std.random;
> import std.range;
> import std.algorithm;
> import std.datetime.stopwatch;
> import std.stdio;
> 
> void main()
> {
>      auto count = 1_000;
>      auto max = int.max;
> 
>      alias randoms = generate!(() => uniform(0, max));
> 
>      auto r1 = randoms.take(count).array;
>      auto r2 = r1.dup.sort;
>      auto elem = r1[uniform(0, count)];

auto elem = r1[$-1]; // try this instead

> 
>      benchmark!(
>          () => r1.canFind(elem),
>          () => r2.contains(elem),
>      )(1_000).writeln;
> }
> 
> Use LDC and -O3 of course. I was hard pressed to get the sorted contains to be any faster than canFind.
> 
> This begs the question then: do these requirements on in make any sense? An algorithm can be log n (ala the sorted search) but still be a magnitude slower than a linear search... what has the world come to 🤦‍♂️
> 
> PS: Why is it named contains if it's on a SortedRange and canFind otherwise?
> 

A SortedRange uses O(lgn) steps vs. canFind which uses O(n) steps.

If you change your code to testing 1000 random numbers, instead of a random number guaranteed to be included, then you will see a significant improvement with the sorted version. I found it to be about 10x faster. (most of the time, none of the other random numbers are included). Even if you randomly select 1000 numbers from the elements, the binary search will be faster. In my tests, it was about 5x faster.

Note that the compiler can do a lot more tricks for linear searches, and CPUs are REALLY good at searching sequential data. But complexity is still going to win out eventually over heuristics. Phobos needs to be a general library, not one that only caters to certain situations.

-Steve
« First   ‹ Prev
1 2