December 19, 2021

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:

>

I wouldn't mind giving in this meaning so you can do easy bounds checks in slices as well, but I think it will be too confusing for newcomers expecting in to work like in Python.

But your proposal would kinda work like in Python?

In [3]: 'a' in {'a':1, 'b':2}
Out[3]: True
In [4]: 'a' in ['a','b']
Out[4]: True

December 19, 2021

On 12/18/21 3:46 PM, Dennis wrote:

>

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:

>

The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that, being O(n).

Why do people keep repeating that?

  • Worst case time complexity of the in operator on AAs is actually O(n). This can easily happen in practice when your toHash has a poor implementation, which has happened to me before

Well, yeah. But that doesn't mean it happens often or even regularly. You need to have a bad toHash function.

We shouldn't lower the bar for in just because it's possible to make it perform poorly.

>
  • In custom types you can overload the in operator to something with arbitrarily bad time complexity.

There's your answer -- make a custom type that allows in with an array-like structure.

Or make a UFCS function.
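For illustration, a minimal sketch of both options; the names contains and LinearSet are made up here, not anything from Phobos:

import std.algorithm.searching : canFind;

// Hypothetical UFCS helper: the linear search is spelled out at the call site.
bool contains(T)(const(T)[] haystack, T needle) {
    return haystack.canFind(needle);
}

// Hypothetical wrapper type that deliberately opts in to the `in` syntax,
// so the O(n) cost is a documented choice of the type's author.
struct LinearSet(T) {
    T[] items;

    // `needle in set` is rewritten to set.opBinaryRight!"in"(needle)
    bool opBinaryRight(string op : "in")(T needle) const {
        return items.canFind(needle);
    }
}

unittest {
    assert([1, 2, 3].contains(2));
    auto s = LinearSet!int([1, 2, 3]);
    assert(2 in s);
    assert(!(5 in s));
}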

>
  • Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.

Complexity is not some old-fashioned idea that has no relevance on today's hardware. Linear searching is a real bottleneck, and identifying in to be sub-linear is a useful design decision.

If you are searching your entire memory by using an opCmp, I bet it will be more than a few seconds.

>

The whole "it ruins performance guarantees" so far seems completely hypothetical and is not something that came from experience with actual code.

The design is sound. You use in, you get sub-linear (i.e. fast) performance. That kind of expectation is useful from a generic point of view. It's similar to opIndex for a linked list -- it's possible, it makes it so you can use linked lists in places that might expect arrays, but generic code that uses it might perform really badly as the code expects opIndex to be fast.
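To make that analogy concrete, here is a bare-bones, purely illustrative singly linked list whose opIndex compiles and "works" but walks the list:

// Minimal singly linked list: opIndex exists, but each lookup is O(i),
// so generic code assuming O(1) random access degrades badly on it.
struct List(T) {
    static struct Node { T value; Node* next; }
    Node* head;

    void prepend(T value) { head = new Node(value, head); }

    ref T opIndex(size_t i) {
        auto n = head;
        foreach (_; 0 .. i)
            n = n.next; // one pointer hop per index step
        return n.value;
    }
}

unittest {
    List!int l;
    foreach (v; [3, 2, 1])
        l.prepend(v);
    assert(l[0] == 1 && l[2] == 3);
}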

>

The problem with in for slices to me is that it would be inconsistent with how it works for AAs, where in searches for a key and returns a pointer to a value. The same for slices would look like this:

[snip]

>

But people suggest using in to search for a value in a slice.

Yes, this is a good point against. But the thing is, we also have find, which works fine; I'm not sure why we need in support.
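For reference, the existing spelling with std.algorithm looks like this (only the sample values are made up):

import std.algorithm.searching : canFind, find;

unittest {
    int[] a = [10, 20, 30];

    // Boolean membership test: a linear scan, and the name says so.
    assert(a.canFind(20));

    // find returns the rest of the range starting at the match.
    assert(a.find(20) == [20, 30]);
    assert(a.find(99).length == 0);
}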

>

I've once overloaded the in operator for a custom 2D array type used to represent a maze in which path finding was done. There it was useful for bounds checking since writing comparisons with width and height all the time is cumbersome. I wouldn't mind giving in this meaning so you can do easy bounds checks in slices as well, but I think it will be too confusing for newcomers expecting in to work like in Python.

Yes, that's not a terrible use case for it. But you are right that in will be confusing to newcomers, especially when you learn that an array is a collection of data. if(x in arr) doesn't seem to suggest that it will be looking at indexes.
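As a rough sketch of that bounds-check reading, with a made-up Grid type standing in for the maze:

// Hypothetical 2D grid where `pos in grid` means "is this coordinate in bounds?"
struct Grid {
    int width, height;

    bool opBinaryRight(string op : "in")(const int[2] pos) const {
        return pos[0] >= 0 && pos[0] < width
            && pos[1] >= 0 && pos[1] < height;
    }
}

unittest {
    auto maze = Grid(4, 3);
    int[2] inside = [2, 1];
    int[2] outside = [4, 0];
    assert(inside in maze);     // within the 4x3 grid
    assert(!(outside in maze)); // x == width, so out of bounds
}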

-Steve

December 19, 2021

On 12/19/21 6:45 AM, Ola Fosheim Grøstad wrote:

>

Now, if the AA were implemented as a hash table with a constant upper bound on the linked lists (say 100), then we could say that "in" would be O(1).

D AAs do not use linked lists any more; they use open addressing. But having a bound on how many collisions can happen is ineffective against a bad hash function (say, one that always returns 0).

The expectation of the hash table is that the hash function is good. Without a good hash function, the complexity goes to crap.
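To illustrate the degenerate case, here is a deliberately bad, made-up key type; with a constant toHash, every key lands on the same probe sequence:

struct BadKey {
    int value;

    size_t toHash() const @safe pure nothrow {
        return 0; // constant hash: the worst case for any hash table
    }

    bool opEquals(const BadKey other) const @safe pure nothrow {
        return value == other.value;
    }
}

unittest {
    int[BadKey] aa;
    foreach (i; 0 .. 1000)
        aa[BadKey(i)] = i;     // every insert probes past all prior keys
    assert(BadKey(999) in aa); // still correct, just no longer O(1)
}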

>

We could also claim that inserts would be O(1) amortized (ignoring rehashing, and assuming a distribution that is relatively flat).

That is what we claim (and what hash algorithms in general claim): amortized O(1) lookups and inserts.

-Steve

December 19, 2021

On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:

>

That is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts.

Yes, if you don't shrink.

You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups.

So you need to prove that you always do O(N) operations between rehashes, so you get O(N) operations of complexity O(1) + one rehash of complexity O(N) = O(N) in total, i.e. O(1) amortized per operation.
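A back-of-the-envelope version of that argument, assuming the usual grow-by-doubling scheme (an implementation assumption, not something the language spec promises):

// Counts element moves for n inserts into a table that doubles when full.
// Total moves stay within a small constant factor of n (about 2 per insert
// here), i.e. amortized O(1) per insert.
void main() {
    import std.stdio : writefln;

    size_t capacity = 1, size = 0, moves = 0;
    enum n = 1_000_000;
    foreach (i; 0 .. n) {
        if (size == capacity) {
            moves += size;   // rehash: every existing element is moved once
            capacity *= 2;
        }
        ++size;
        ++moves;             // the insert itself
    }
    writefln("%s inserts, %s moves (%.2f per insert)", n, moves,
             cast(double) moves / n);
}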

December 20, 2021

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:

>

I'm not sure if the speed rule is written down or not, but generally in is supposed to have the same speed requirement as [n], which is sub-linear.

It's in std.container, bottom of the page.

I remember a discussion on complexity that prompted Andrei to create a library for marking functions with their complexity. I'm not sure if this was the original source for the complexities given in std.container.

My brain is also telling me the values were sourced from Alexander Stepanov's Elements of Programming, but I can't find a source for that, and I'm too lazy to look very deep.

--
Biotronic

December 20, 2021

On Sunday, 19 December 2021 at 17:11:23 UTC, Ola Fosheim Grøstad wrote:

>

But your proposal would kinda work like in Python?

In [3]: 'a' in {'a':1, 'b':2}
Out[3]: True
In [4]: 'a' in ['a','b']
Out[4]: True

In my proposal, 'a' in ['a', 'b'] looks for the key 'a', and since the ASCII value of 'a' is 0x61 while the array length is only 2, it returns null.
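A rough sketch of those semantics as a hypothetical free function (keyIn is made up; the proposal would of course spell this as the in operator itself):

// Hypothetical stand-in for the proposed `key in slice`:
// the left operand is treated as an index, just like an AA key.
T* keyIn(T)(size_t key, T[] slice) {
    return key < slice.length ? &slice[key] : null;
}

unittest {
    auto arr = ['a', 'b'];
    assert(keyIn('a', arr) is null); // 'a' is 0x61, far past length 2
    assert(*keyIn(0, arr) == 'a');   // index 0 is in bounds
}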

December 20, 2021

On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:

>

There's your answer -- make a custom type that allows in with an array-like structure.

For the record, I'm not arguing in favor of in for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.

>

That kind of expectation is useful from a generic point of view.

And this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the in operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer.

What is this function? Does it exist in Phobos currently?

December 20, 2021

On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:

>

On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:

>

There's your answer -- make a custom type that allows in with an array-like structure.

For the record, I'm not arguing in favor of in for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.

>

That kind of expectation is useful from a generic point of view.

And this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the in operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer.

What is this function? Does it exist in Phobos currently?

And how come it cannot store the result of the "in" operation in a local variable for re-use? And how exactly is it not a problem to repeatedly call an oh-so-efficient sub-linear O(log N) function when you could just save the result in a local?

December 20, 2021

On Monday, 20 December 2021 at 11:28:52 UTC, Araq wrote:

>

On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:

>

And this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the in operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer.

What is this function? Does it exist in Phobos currently?

And how come it cannot store the result of the "in" operation in a local variable for re-use? And how exactly is it not a problem to repeatedly call an oh-so-efficient sub-linear O(log N) function when you could just save the result in a local?

bool isSubset(T1, T2)(T1 needles, T2 haystack) {
    foreach (needle; needles) {
        if (!(needle in haystack)) return false;
    }
    return true;
}

If 'in' is O(1), that's a fairly sensible implementation. If it's O(N), it's not. Since the lookup uses different arguments each time, local variables can't help you.

Note that, of course, there's no problem if x in arr is a simple bounds check -- if your bounds check on an array is O(N), you're doing it wrong.

--
Simen

December 20, 2021

On Monday, 20 December 2021 at 10:34:44 UTC, Dennis wrote:

>

In my proposal, 'a' in ['a', 'b'] looks for the key 'a', and since the ASCII value of 'a' is 0x61 while the array length is only 2, it returns null.

Oh yeah, I'm sorry, you looked at the index as a key. I'm not sure how I managed to mix that up…

But yeah, just doing what Python does, but with a pointer to a value or null instead of a boolean, makes sense in D.