in not working for arrays is silly, change my view - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » in not working for arrays is silly, change my view

Thread overview

in not working for arrays is silly, change my view
Feb 29, 2020 JN
Feb 29, 2020 Steven Schveighoffer
Mar 02, 2020 Andrea Fontana
Mar 02, 2020 Steven Schveighoffer
Mar 02, 2020 aliak
Mar 02, 2020 Steven Schveighoffer
Mar 02, 2020 aliak
Mar 02, 2020 Steven Schveighoffer
Mar 02, 2020 H. S. Teoh
Mar 03, 2020 aliak
Mar 03, 2020 Steven Schveighoffer
Mar 03, 2020 H. S. Teoh
Feb 29, 2020 Ali Çehreli
Mar 02, 2020 JN
Mar 02, 2020 Steven Schveighoffer
Mar 02, 2020 aliak

February 29, 2020

in not working for arrays is silly, change my view

Posted by JN

JN

assert(1 in [1, 2, 3]);

Error: incompatible types for (1) in ([1, 2, 3]): int and int[

Yes, I know about .canFind(), but this is something that trips people over and over.

I think it would be better if "in" worked for both assoc arrays and normal arrays, or didn't work at all, for added consistency.

February 29, 2020

Re: in not working for arrays is silly, change my view

Posted by Steven Schveighoffer
in reply to JN

Steven Schveighoffer

Posted in reply to JN

On 2/29/20 2:38 PM, JN wrote:
> assert(1 in [1, 2, 3]);
> 
> Error: incompatible types for (1) in ([1, 2, 3]): int and int[
> 
> Yes, I know about .canFind(), but this is something that trips people over and over.
> 
> I think it would be better if "in" worked for both assoc arrays and normal arrays, or didn't work at all, for added consistency.

1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
2. Array elements are not guaranteed to be comparable. AA keys are.
3. As this cannot be done via library, the compiler would have to do it (at least process the syntax), we need to ensure its something we want to do. I don't see it adding much to the language.

What MAY be a good idea is to have a specialized error telling people to import std.algorithm.canFind and use it when they try this.

Another thing that may be useful is having something like "isIn" instead of "canFind", which swaps the left and right parameters to read more naturally.

assert([1,2,3].canFind(1));
vs.
assert(1.isIn([1, 2, 3]));

I'd also add an overload to allow:

assert(1.isIn(1, 2, 3));

-Steve

February 29, 2020

Re: in not working for arrays is silly, change my view

Posted by Ali Çehreli
in reply to JN

Ali Çehreli

Posted in reply to JN

On 2/29/20 11:38 AM, JN wrote:

> assert(1 in [1, 2, 3]);

Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."

Unless 'in' works with arrays to mean "is this index valid", then I don't see the benefit. If we had it, I think more people would ask "why does 'in' work differently for arrays?"

Are there other languages that support this semantic? Checking... Ok, Python has it, highly likely because they don't have arrays to begin with.

Ali

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by JN
in reply to Ali Çehreli

JN

Posted in reply to Ali Çehreli

On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."

Does it? I always viewed it as "is this value in list of keys"

> Unless 'in' works with arrays to mean "is this index valid", then I don't see the benefit. If we had it, I think more people would ask "why does 'in' work differently for arrays?"
> Are there other languages that support this semantic? Checking... Ok, Python has it, highly likely because they don't have arrays to begin with.
>

Well, Python lists are for most purposes equivalent to arrays and it hasn't really been confusing for people.

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by Andrea Fontana
in reply to Steven Schveighoffer

Andrea Fontana

Posted in reply to Steven Schveighoffer

On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).

Probably it should work if we're using a "SortedRange".


int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
auto p = assumeSorted(a);

assert(3 in p);

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by Steven Schveighoffer
in reply to Andrea Fontana

Steven Schveighoffer

Posted in reply to Andrea Fontana

On 3/2/20 6:52 AM, Andrea Fontana wrote:
> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
> 
> Probably it should work if we're using a "SortedRange".
> 
> 
> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
> auto p = assumeSorted(a);
> 
> assert(3 in p);
> 
> 

That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.

It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:

assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);

-Steve

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by Steven Schveighoffer
in reply to JN

Steven Schveighoffer

Posted in reply to JN

On 3/2/20 6:39 AM, JN wrote:
> On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
>> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."
> 
> Does it? I always viewed it as "is this value in list of keys"

Array keys are the element index..

So essentially:

int[int] c1;
int[] c2 = new int[4];
c1[3] = 10;
c2[3] = 10;

assert(3 in c1); // true
assert(3 in c2); // what should this do?

-Steve

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by aliak
in reply to Steven Schveighoffer

aliak

Posted in reply to Steven Schveighoffer

On Monday, 2 March 2020 at 15:50:08 UTC, Steven Schveighoffer wrote:
> On 3/2/20 6:39 AM, JN wrote:
>> On Saturday, 29 February 2020 at 21:56:51 UTC, Ali Çehreli wrote:
>>> Because you mentioned canFind, I think you want the semantics to be "is there an element with this value." If so, it would be confusing to use the same operator for two different things: For associative arrays, it means "is there an element accessible with this key."
>> 
>> Does it? I always viewed it as "is this value in list of keys"
>
> Array keys are the element index..
>
> So essentially:
>
> int[int] c1;
> int[] c2 = new int[4];
> c1[3] = 10;
> c2[3] = 10;
>
> assert(3 in c1); // true
> assert(3 in c2); // what should this do?
>
> -Steve

If in were to mean "is this value in list of keys" then to be consistent:

3 in c2 == 3 < c2.length

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by aliak
in reply to Steven Schveighoffer

aliak

Posted in reply to Steven Schveighoffer

On Monday, 2 March 2020 at 15:47:26 UTC, Steven Schveighoffer wrote:
> On 3/2/20 6:52 AM, Andrea Fontana wrote:
>> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
>> 
>> Probably it should work if we're using a "SortedRange".
>> 
>> 
>> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
>> auto p = assumeSorted(a);
>> 
>> assert(3 in p);
>> 
>> 
>
> That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.
>
> It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:
>
> assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);
>
> -Steve

There's no guarantee that checking if a value is in a sorted list is any faster than checking if it's in a non sorted list. It's why sort usually switches from a binary-esque algorithm to a linear one at a certain size. The list could potentially need to be _very_ large for p.contains to make a significant impact over canFind(p) AFAIK.

Here's a small test program, try playing with the numbers and see what happens:

import std.random;
import std.range;
import std.algorithm;
import std.datetime.stopwatch;
import std.stdio;

void main()
{
    auto count = 1_000;
    auto max = int.max;

    alias randoms = generate!(() => uniform(0, max));

    auto r1 = randoms.take(count).array;
    auto r2 = r1.dup.sort;
    auto elem = r1[uniform(0, count)];

    benchmark!(
        () => r1.canFind(elem),
        () => r2.contains(elem),
    )(1_000).writeln;
}

Use LDC and -O3 of course. I was hard pressed to get the sorted contains to be any faster than canFind.

This begs the question then: do these requirements on in make any sense? An algorithm can be log n (ala the sorted search) but still be a magnitude slower than a linear search... what has the world come to 🤦‍♂️

PS: Why is it named contains if it's on a SortedRange and canFind otherwise?

March 02, 2020

Re: in not working for arrays is silly, change my view

Posted by Steven Schveighoffer
in reply to aliak

Steven Schveighoffer

Posted in reply to aliak

On 3/2/20 3:52 PM, aliak wrote:
> On Monday, 2 March 2020 at 15:47:26 UTC, Steven Schveighoffer wrote:
>> On 3/2/20 6:52 AM, Andrea Fontana wrote:
>>> On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighoffer wrote:
>>>> 1. in is supposed to be O(lg(n)) or better. Generic code may depend on this property. Searching an array is O(n).
>>>
>>> Probably it should work if we're using a "SortedRange".
>>>
>>>
>>> int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
>>> auto p = assumeSorted(a);
>>>
>>> assert(3 in p);
>>>
>>>
>>
>> That could work. Currently, you need to use p.contains(3). opIn could be added as a shortcut.
>>
>> It only makes sense if you have it as a literal though, as p.contains(3) isn't that bad to use:
>>
>> assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);
>>
> There's no guarantee that checking if a value is in a sorted list is any faster than checking if it's in a non sorted list. It's why sort usually switches from a binary-esque algorithm to a linear one at a certain size.

Well of course! A binary search needs Lg(n) comparisons for pretty much any value, whereas a linear search is going to end early when it finds it. So there's no guarantee that searching for an element in the list is going to be faster one way or the other. But Binary search is going to be faster overall because the complexity is favorable.

> The list could potentially need to be _very_ large for p.contains to make a significant impact over canFind(p) AFAIK.
> 
> Here's a small test program, try playing with the numbers and see what happens:
> 
> import std.random;
> import std.range;
> import std.algorithm;
> import std.datetime.stopwatch;
> import std.stdio;
> 
> void main()
> {
>      auto count = 1_000;
>      auto max = int.max;
> 
>      alias randoms = generate!(() => uniform(0, max));
> 
>      auto r1 = randoms.take(count).array;
>      auto r2 = r1.dup.sort;
>      auto elem = r1[uniform(0, count)];

auto elem = r1[$-1]; // try this instead

> 
>      benchmark!(
>          () => r1.canFind(elem),
>          () => r2.contains(elem),
>      )(1_000).writeln;
> }
> 
> Use LDC and -O3 of course. I was hard pressed to get the sorted contains to be any faster than canFind.
> 
> This begs the question then: do these requirements on in make any sense? An algorithm can be log n (ala the sorted search) but still be a magnitude slower than a linear search... what has the world come to 🤦‍♂️
> 
> PS: Why is it named contains if it's on a SortedRange and canFind otherwise?
> 

A SortedRange uses O(lgn) steps vs. canFind which uses O(n) steps.

If you change your code to testing 1000 random numbers, instead of a random number guaranteed to be included, then you will see a significant improvement with the sorted version. I found it to be about 10x faster. (most of the time, none of the other random numbers are included). Even if you randomly select 1000 numbers from the elements, the binary search will be faster. In my tests, it was about 5x faster.

Note that the compiler can do a lot more tricks for linear searches, and CPUs are REALLY good at searching sequential data. But complexity is still going to win out eventually over heuristics. Phobos needs to be a general library, not one that only caters to certain situations.

-Steve

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation