Jump to page: 1 2
Thread overview
[Issue 1323] New: Implement opIn_r for arrays
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 09, 2007
d-bugmail
Jul 10, 2007
d-bugmail
Jul 11, 2007
d-bugmail
Jul 11, 2007
d-bugmail
Nov 04, 2007
d-bugmail
Oct 31, 2012
Roy Crihfield
Oct 31, 2012
Jonathan M Davis
July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323

           Summary: Implement opIn_r for arrays
           Product: D
           Version: 2.002
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: wbaxter@gmail.com


int[] someInts = getInts();
if (0 in someInts) {
   handleZero(someInts);
}

>> ERROR

I just did this again for the umpteenth time so I thought I'd go ahead and post it as an enhancement request.

In Python at least the 'in' operator works on any sequence type.  There's no
reason why it shouldn't work on regular arrays in D too.  The implementation
should basically be the linear search equivalent of what associative arrays do.
 Something like:

T* opIn_r(T searchval) {
 for(size_t i=0; i<array.length; i++) {
    if (array[i]==searchval) { return &array[i]; }
 }
 return null;
}

Maybe == might not be the right equality check to use (since that would fail for an array with nulls).  Maybe it should be specialized depending on whether the elements are build-in/class/struct.  But the idea is that "x in array" should do /something/ useful.

--bb


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323


smjg@iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg@iname.com




------- Comment #1 from smjg@iname.com  2007-07-09 10:39 -------
It's been said before, no harm in repeating it here.

For associative arrays,

        X in A

means that there exists a Y for which

        A[X] == Y

According to the semantics that get proposed every time this is brought up, for linear arrays, it would mean the reverse: there exists a Y for which

        A[Y] == X

This inconsistency might be undesirable from a generic programming POV.

Besides, if we're going to have an operator to check whether a specified _value_ is in an array, shouldn't it be available for AAs as well?


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #2 from regan@netmail.co.nz  2007-07-09 10:56 -------
Workarounds... (that dont work if key and value type are the same)

bool contains(T)(T[] array, T value)
{
        foreach(v; array) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, V value)
{
        foreach(v; array.values) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, K key)
{
        foreach(k; array.keys) if (k == key) return true;
        return false;
}

int find(T)(T[] array, T value)
{
        foreach(i, v; array) if (v == value) return i;
        return -1;
}

K find(V, K)(V[K] array, V value)
{
        //sadly inefficient
        foreach(v; array.values)
        {
                if (v != value) continue;
                foreach(k; array.keys)
                {
                        if (array[k] == v) return k;
                }
        }
        return K.init;
}

V find(V, K)(V[K] array, K key)
{
        return array[key];
}

void main()
{
        string[char]    ctos;
        char[int]       itoc;
        char[]          carr;
        int[]           iarr;

        ctos['c'] = "string";
        assert(ctos.contains('c'));
        assert(ctos.contains("string"));

        itoc[5] = 'c';
        assert(itoc.contains(5));
        assert(itoc.contains('c'));

        carr ~= 'c';
        assert(carr.contains('c'));

        iarr ~= 5;
        assert(iarr.contains(5));

        assert(ctos.find('c') == "string");
        assert(ctos.find("string") == 'c');

        assert(itoc.find(5) == 'c');
        assert(itoc.find('c') == 5);

        assert(carr.find('c') == 0);
        assert(iarr.find(5) == 0);
}


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #3 from smjg@iname.com  2007-07-09 13:18 -------
(In reply to comment #2)
> Workarounds... (that dont work if key and value type are the same)

Only because you've chosen to give the functions the same names.

> bool contains(V, K)(V[K] array, K key)
> {
>         foreach(k; array.keys) if (k == key) return true;
>         return false;
> }

What's wrong with
    return cast(bool) (k in array);
?


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #4 from wbaxter@gmail.com  2007-07-09 14:04 -------
(In reply to comment #1)
> It's been said before, no harm in repeating it here.
> 
> For associative arrays,
> 
>         X in A
> 
> means that there exists a Y for which
> 
>         A[X] == Y
> 
> According to the semantics that get proposed every time this is brought up, for linear arrays, it would mean the reverse: there exists a Y for which
> 
>         A[Y] == X
> 
> This inconsistency might be undesirable from a generic programming POV.

I can't see how.  Arrays and associative arrays are different beasts.  You can't really generically interchange the two, regardless of what the behavior of 'in' is, because of other semantic differences.  And note that currently they certainly behave very differently regarding how 'in' works, and there hasn't been a meltdown because of the lack of ability to generically interchange arrays and associative arrays on account of that.

In Python since everything is dynamic and uses 'duck typing' it's really got more to lose from loss of genericity than D.  In Python you're effectively *always* writing generic template code.  And yet in Python "foo in alist" checks if foo is in the alist, and "foo in aa" checks if foo is a key in aa (a 'dict' in Python).  And there has been no outcry over this as far as I know. Probably because you usually know before you start coding whether you need a dict or an array.  It's pretty obvious that if I want to look up people by their employee ID's that the ID shouldn't be the index in an regular array, an associative array would be better.


> Besides, if we're going to have an operator to check whether a specified _value_ is in an array, shouldn't it be available for AAs as well?

Yes, it would be spelled "foo in AA.values".

Ok, that wouldn't be so efficient since it builds a big temporary array, but that's a problem with AA.values.  AA's need some keys() and values()-type methods that return lightweight iterators.  But the iterators situation in D is a whole 'nother mess.


Arguing from another direction, there's only one 'opIn_r', and that's not likely to change, so which version of opIn_r is going to save you more work if implemented for arrays?  An opIn_r for arrays that does (0<=k && k<A.length) or one which does

  bool found = false;
  foreach (tmp; A) { if (tmp==k) { found=true; break; } }

The former is already an expression, and so can be used in an if condition, for example; the latter is a) more typing and b) not an expression.  If you want to use it as an expression you need to do more typing to wrap it in an function.

I'd say the former is already convenient enough (and you can even get away with just k<A.length if k is unsigned).  And really the actual implementation of the latter should probably be a little more complex to dance around the ticking time-bomb behavior of == for classes.  Something like:

bool contains(T[] A, T k) {
  bool found = false;
  foreach (v; A) {
    static if (is(T==class)) {
       if (v is k || (k !is null && k==v)) { found=true; break; }
    }
    else {
       if (v==k) { found=true; break; }
    }
  }
  return found;
}

And maybe I've missed something there too.  Like maybe the v should be 'ref' for better efficiency when it's a big struct type?  Probably it shouldn't be ref for class types though, so maybe I need another static if there.  Anyway, that's part of why I want it taken care of for me in the language.  I just want to type 'x in some_array' and let the system find it for me in the most efficient way possible (which may include some sort of fancy parallel scanning in the future).


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #5 from shro8822@uidaho.edu  2007-07-09 14:12 -------
How about make

(i in array)

go to

((i>=0 && i<array.length) ? &array[i] : null)


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #6 from wbaxter@gmail.com  2007-07-09 14:38 -------
(In reply to comment #5)
> How about make
> 
> (i in array)
> 
> go to
> 
> ((i>=0 && i<array.length) ? &array[i] : null)
> 

Whichever meaning it is given it should probably return a useful pointer like the opIn for AA's does.  But I'd argue the above is pretty useless or at least brief enough already to not be worth the effort.  Especially since now you've got a pointer that might or might not be null, so the next thing you're probably going to do is turn around and check if it's null and if not dereference it.  So it's like saying:

auto x = ((i>=0 && i<array.length) ? &array[i] : null);
if (x !is null) {
   auto X = *x;
   ...
}
That's just silly when you can instead do
if (i>=0 && i<array.length) {
   auto X = array[i];
}

Do you folks actually have some killer use case for wanting to pretend that indices are like AA keys?  Or is it just an ideological thing?  Because I agree, you're totally right on the idealogical front.  We can think of them both as a map or partial function from one set to another.  Gotcha.

But it's just not useful.  Like I said, very seldom in my experience are you ever tempted to throw AA's and regular arrays at the same problem.  You're more likely to switch an array implementation to a Set implementation of some sort. And guess what, if you implement said Set using AA's then you'll be putting the contents of the arrays in the _keys_, not the values, using a bool[T] or somesuch.  So if X in Y checks keys for AA's then, in this common use case, it would be better for "generic code" if X in Y checked the values of the arrays rather than the indices.


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #7 from shro8822@uidaho.edu  2007-07-09 15:28 -------
as to in returning null, that is what 'in' does with AA's; if the key is in the AA return a pointer to the value, otherwise, return null.

In fact, the code I wrote works the same as 'in' for AA's

The usual way to use the return from 'in' where it might give null is this:

if(auto v = i in j)
{
  use(*v);
}
else
{
  // i not in j
}


-- 

July 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #8 from wbaxter@gmail.com  2007-07-09 16:39 -------
(In reply to comment #7)
> as to in returning null, that is what 'in' does with AA's; if the key is in the AA return a pointer to the value, otherwise, return null.
> 
> In fact, the code I wrote works the same as 'in' for AA's
> 
> The usual way to use the return from 'in' where it might give null is this:
> 
> if(auto v = i in j)
> {
>   use(*v);
> }
> else
> {
>   // i not in j
> }

Sure, but my point was that it doesn't gain you any efficiency or clarity in the case of arrays.  You might as well do:

if(0<=i && i<j.length)
{
  use(j[i]);
}
else
{
  // i not in j
}

It's only a few characters longer and clearer in my opinion.


-- 

July 10, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1323





------- Comment #9 from smjg@iname.com  2007-07-10 11:50 -------
(In reply to comment #4)
<snip>
>> This inconsistency might be undesirable from a generic programming POV.
> 
> I can't see how.  Arrays and associative arrays are different beasts.  You can't really generically interchange the two, regardless of what the behavior of 'in' is, because of other semantic differences.

Other than in how you can add/remove elements, are there many differences that actually compromise their interchangeability?

For example, suppose you want a function that iterates over the keys and values, and possibly changes some of the values it encounters.  This is already something that can be done for LAs and AAs alike with the same code.

> And note that currently they certainly behave very differently regarding how 'in' works,

Associative arrays: in checks if the key is there
Linear arrays: in doesn't work at all

So why is this an issue?

> and there hasn't been a meltdown because of the lack of ability to generically interchange arrays and associative arrays on account of that.

That doesn't mean it won't happen.

Suppose you want a container indexed by non-negative integers, and a moderate proportion of indexes within some range will actually have some useful value on the other end.  You could use an AA, or you could use an LA and fill in the blanks with ValueType.init or something (which might be more efficient).  At some point you change your mind about which to use.  You're going to need to change some code anyway, and the compiler would likely catch quite a few instances.  But if in does something semantically completely different because you've changed the internal data structure, it could easily lead to bugs.

<snip>
>> Besides, if we're going to have an operator to check whether a specified _value_ is in an array, shouldn't it be available for AAs as well?
> 
> Yes, it would be spelled "foo in AA.values".

Yes, you have a point there.  The .values property could even be defined for LAs to just return the array, which would enable values of either to be searched in the same way....

<snip>
> And really the actual implementation of the latter should
> probably be a little more complex to dance around the ticking
> time-bomb behavior of == for classes.  Something like:
> bool contains(T[] A, T k) {
<snip>
> }

That depends on whether it's implemented using templates or TypeInfo.  But TypeInfo_C.equals should probably be changed to do take nulls into account. After all, TypeInfo_C.compare already does take null references into account.


-- 

« First   ‹ Prev
1 2