Suggestion: Operator `in` for slices
December 18, 2021

The operator in between T and T[] has been requested many times. Most people suggest it should return a bool indicating whether some element of the array on the right-hand side is equal (==) to the T value on the left-hand side. This is the obvious thing to do.

I guess it has never been implemented because this information (true/false) is so much less than it could be. If D used 1-based indexing, in could return the index if found, or 0 if not found. Because 0 == false, one could write if (auto x in array). However, D's indexing is 0-based, so 0 is a valid index; the in operator cannot return a plain size_t.

My idea is inspired by how the in operator for AAs does not return a (near-useless) bool, but a value that can be used like a bool while providing much more valuable information: a T*. Returning a T* for slices' in would work as well, but it is much less useful than the index. (One could recover the index from the pointer using pointer subtraction, but pointer arithmetic is not considered @safe.) The index practically is the pointer.

Can we instead return an object that is, for almost all intents and purposes, usable as an index, but that, when evaluated via if, yields true or false depending on whether the value was found?

So, my suggestion is that the in operator for slices returns a value of this type:

struct InResult
{
    size_t index = size_t.max;   // sentinel value meaning "not found"
    alias index this;            // usable wherever an index is expected
    bool opCast(T : bool)() const { return index != size_t.max; }
}

In fact, this is very much a simple optional type: it takes a valid but very unlikely value of its underlying type and reinterprets it as the "not found" value.
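
For illustration, here is a minimal usage sketch, with a free function found standing in for the proposed slice in operator (the name found is just a placeholder, not part of the proposal):

import std.algorithm.searching : countUntil;

struct InResult
{
    size_t index = size_t.max;   // sentinel meaning "not found"
    alias index this;            // usable wherever an index is expected
    bool opCast(T : bool)() const { return index != size_t.max; }
}

// stand-in for the proposed `needle in haystack`
InResult found(T)(T needle, T[] haystack)
{
    immutable i = haystack.countUntil(needle);
    return InResult(i < 0 ? size_t.max : cast(size_t) i);
}

void main()
{
    int[] a = [10, 20, 30];
    if (auto r = found(20, a))      // proposed spelling: `20 in a`
        assert(a[r] == 20);         // r converts to a size_t index via alias this
    assert(!found(42, a));          // false when nothing was found
}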

December 18, 2021

On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll wrote:

> I guess it has never been implemented because this information (true/false) is so much less than it could be.

Nah, the general rule against it is due to speed/computational complexity. The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that being O(n).

I'm not sure whether the speed rule is written down anywhere, but generally in is supposed to have the same speed requirement as [n], which is sub-linear.

December 18, 2021

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:

> On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll wrote:
>
>> I guess it has never been implemented because this information (true/false) is so much less than it could be.
>
> Nah, the general rule against it is due to speed/computational complexity. The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that being O(n).

How is that even a criterion? foreach on arrays has O(n) complexity as well. If the answer is that foreach is okay because it obviously has to be that way, the same can be said for linear search: it doesn't take a genius to see that searching an unordered structure cannot be better than O(n) in the worst case.

December 18, 2021

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:

> The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that being O(n).

Why do people keep repeating that?

  • Worst-case time complexity of the in operator on AAs is actually O(n). This can easily happen in practice when your toHash has a poor implementation, which has happened to me before.

  • In custom types you can overload the in operator to something with arbitrarily bad time complexity.

  • Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.

The whole "it ruins performance guarantees" argument so far seems completely hypothetical, and not something that came from experience with actual code.

The problem with in for slices, to me, is that it would be inconsistent with how it works for AAs, where in searches for a *key* and returns a pointer to a *value*. The same for slices would look like this:

T* opIn(T)(size_t i, T[] arr)
{
    return i >= arr.length ? null : &arr[i];
}

void main()
{
    int[3] a = [10, 20, 30];
    assert(opIn(10, a[]) is null);  // `10 in a`: index 10 is out of bounds
    assert(opIn(2, a[]) == &a[2]);  // `2 in a`: a pointer to the element
}

But people suggest in should search for a value in a slice.

I once overloaded the in operator for a custom 2D array type used to represent a maze in which path finding was done. There it was useful for bounds checking, since writing comparisons with width and height all the time is cumbersome. I wouldn't mind giving in this meaning so you can do easy bounds checks on slices as well, but I think it would be too confusing for newcomers expecting in to work like in Python.
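
For reference, a minimal sketch of that kind of overload (the Grid and Point types below are made up for illustration, not the actual maze code):

struct Point { size_t x, y; }

struct Grid
{
    size_t width, height;

    // `p in grid` is true when p lies within the grid's bounds
    bool opBinaryRight(string op : "in")(Point p) const
    {
        return p.x < width && p.y < height;
    }
}

void main()
{
    auto g = Grid(4, 3);
    assert(Point(2, 1) in g);
    assert(Point(4, 0) !in g);
}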

December 19, 2021

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:

> • Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.

Also, if the array is sorted it would be O(log N). You can achieve this either by adding a high-level IR that can deduce that the array is sorted, or by making it part of the type system.
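
As an aside, D's standard library already offers the type-system route: std.range.assumeSorted wraps a range in a SortedRange, whose contains does a binary search:

import std.range : assumeSorted;

void main()
{
    int[] arr = [1, 3, 5, 7, 9];
    auto s = arr.assumeSorted;   // sortedness is now carried in the type
    assert(s.contains(7));       // binary search, O(log n)
    assert(!s.contains(4));
}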

December 19, 2021

On Saturday, 18 December 2021 at 20:17:51 UTC, Quirin Schroll wrote:

>> Nah, the general rule against it is due to speed/computational complexity. The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that being O(n).
>
> How is that even a criterion? foreach on arrays has O(n) complexity as well. You cannot say it's okay for foreach because it obviously has to be that way.

foreach is not supposed to give that guarantee; in is, just like the indexing operator (range[index]). That does not mean you can't define a function that checks for an element in O(n) time, but then it should be called something other than in.

It's the same reason we don't define range[index] as range.save.dropExactly(index).front for all forward ranges that do not explicitly declare opIndex. It's not that you can't index a range like that; it's that we usually want a compile error rather than silent O(n) indexing when we are expecting O(1) or O(log n) indexing.

> The same can be said for linear search. It doesn't take a genius that searching an unordered structure cannot be better than O(n) in the worst case.

But it isn't always immediately clear to the reader what the type of the data structure is.

December 19, 2021

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:

> On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
>
>> The in operator on AAs has a speed of O(log n) or O(1). in on slices would exceed that being O(n).
>
> Why do people keep repeating that?
>
> • Worst-case time complexity of the in operator on AAs is actually O(n). This can easily happen in practice when your toHash has a poor implementation, which has happened to me before.
>
> • In custom types you can overload the in operator to something with arbitrarily bad time complexity.

These are bugs. The O(log n) assumption still makes sense when your code works as it's supposed to.

> • Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.

A few seconds is far too much. Remember, the data structure might be fed to a function designed by introspection. The function sees "Ah, an in operator. in is on average O(log N) at worst, I can call it hundreds of times without significant slowdown", and compiles such that it takes a quarter of an hour to execute.

Had the in operator respected the O(log N) assumption, the introspecting function would have failed to compile, alerting the programmer, or would have used an alternative, quicker implementation.
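
To make the scenario concrete, here is a hedged sketch of such an introspecting function (the name contains and the fallback choice are purely illustrative):

import std.algorithm.searching : canFind;

// Prefers the container's `in` operator (assumed sub-linear by convention)
// and only falls back to an explicit linear scan when `in` does not exist.
bool contains(C, K)(C container, K key)
{
    static if (is(typeof(key in container)))
        return (key in container) ? true : false;  // fast path, relies on the convention
    else
        return container.canFind(key);             // visible O(n) fallback
}

void main()
{
    int[string] aa = ["a": 1];
    assert(contains(aa, "a"));       // AA: uses the built-in `in`
    assert(contains([1, 2, 3], 2));  // slice: takes the canFind fallback today
}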

December 19, 2021

On Sunday, 19 December 2021 at 10:10:21 UTC, Dukc wrote:

> The function sees "Ah, an in operator. in is on average O(log N) at worst, I can call it hundreds of times without significant slowdown", and compiles such that it takes a quarter of an hour to execute.

I know this is a common abuse of notation in C++ circles, but O(…) means worst case.

Also, if something is O(log N), then it is also O(N) or any bigger bound. The notation O(N) means that N is sufficient to put an upper bound on the growth of the computation (in terms of input size) for each and every possible input of any size.

Average case is not given with Big-Oh-notation, but done "accurately" by presenting a model for the distribution of input data and then solving it. (usually a very big task)

When some C++ libraries talk of amortized complexity, they probably should not use O-notation, but we get what they mean. E.g. if you do a long series of operations, the costly reallocations will be amortized, so they ignore them in the analysis and "pretend" they do not happen.

But it isn't on average. To do on-average calculations you need a model of the input and the operations, and the calculations can take a scholar many days to figure out. E.g. one strategy is to translate the model and patterns of operations into recurrence relations and solve them as integrals… a very tedious and time-consuming job.

December 19, 2021

On Sunday, 19 December 2021 at 11:07:30 UTC, Ola Fosheim Grøstad wrote:

> Average case is not given with Big-Oh-notation, but done "accurately" by presenting a model for the distribution of input data and then solving it. (usually a very big task)

By this I mean that using "O(…)" alone means worst case.

We can say that quicksort is O(N²). Not qualifying the statement implies worst case.

If we consider all inputs equally likely, then we can also claim that the average-case complexity is limited by O(N log N).
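
For completeness, the standard textbook recurrences behind those two figures (stated without derivation; the average case assumes a uniformly random pivot position):

T_{\mathrm{worst}}(n) = T_{\mathrm{worst}}(n-1) + \Theta(n) \;\Rightarrow\; T_{\mathrm{worst}}(n) = \Theta(n^2)

T_{\mathrm{avg}}(n) = (n-1) + \frac{2}{n} \sum_{k=0}^{n-1} T_{\mathrm{avg}}(k) \;\Rightarrow\; T_{\mathrm{avg}}(n) = \Theta(n \log n)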

Now, if AA were implemented as a hashtable with a constant upper bound on the linked lists (say 100), then we could say that "in" would be O(1).

We could also claim that inserts would be O(1) amortized (ignoring rehashing, assuming a relatively flat distribution).

So in general, we should be able to have AAs where all operations (insert, delete and membership tests) are O(1) amortized.
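
A toy sketch of that idea (the names and the bucket cap are invented for illustration; this is not how D's built-in AA works):

// Membership tests inspect at most MaxBucket entries, so they are O(1) in the
// worst case; the price is that insert can refuse instead of growing a bucket
// (a real table would grow/rehash at that point).
struct BoundedSet(T, size_t Buckets = 256, size_t MaxBucket = 100)
{
    T[][Buckets] buckets;

    bool insert(T value)
    {
        auto b = &buckets[hashOf(value) % Buckets];
        if ((*b).length >= MaxBucket)
            return false;
        *b ~= value;
        return true;
    }

    // `value in set` scans a single bounded bucket only
    bool opBinaryRight(string op : "in")(T value) const
    {
        foreach (e; buckets[hashOf(value) % Buckets])
            if (e == value)
                return true;
        return false;
    }
}

void main()
{
    BoundedSet!int s;
    s.insert(42);
    assert(42 in s);
    assert(7 !in s);
}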

December 19, 2021
On 12/18/21 12:46 PM, Dennis wrote:

> The problem with `in` for slices to me is that it would be inconsistent
> with how it works for AAs, where `in` searches for a *key* and returns a
> pointer to a *value*.

That.

Ali
