January 27, 2010
>"Bill Baxter" <wbaxter@gmail.com> wrote in message news:mailman.34.1264542189.4461.digitalmars-d-learn@puremagic.com... On Tue, Jan 26, 2010 at 1:21 PM, bearophile <bearophileHUGS@lycos.com> wrote:
>> Nick Sabalausky:
>>> Aside from that being how Python does it, why do you see that as preferable?
>>
>> Because:
>> 1) linear searches in an array are damn common. I don't remember the
>> results of my benchmarks, but until your integer arrays is quite longer
>> than 30-50 items, performing a linear search is faster than a lookup in
>> an AA, on DMD. On Tango this number is probably 70% higher
>> 1b) In Python if you perform a "foo" in "barfoo" the language doesn't
>> perform a linear search, it uses a much smarter search that has a
>> complexity lower than the product of the two lengths, using a custom
>> algorithm. So in D you can use the same syntax to search for
>> substrings/subarrays. Where such smarter search is not possible, D can
>> use a naive search.
>> 2) It's really handy. I use isIn(item, items) to search on arrays in D,
>> but having a item in items is nicer.
>> 3) You can use the same syntax to search into anything that's lazily
>> iterable too (a Range). This is very handy.
>>
>>
>>> So having a single syntax work on the outputs for
>>> regular arrays, but then on the inputs for AAs, seems highly
>>> inconsistent
>>> and error-prone to me.
>>
>> I have followed many Python newbies personally, I am following the Python newsgroups, and I have programmed for years in Python, and while I have seen many different kinds of bugs, I have not seen a significant amount of bugs in this. Python programmers just learn that dicts and lists are a little different in this regard. At the same way they learn that a set and a dict are different data structures, with different capabilities and usages.
>
>It's not even really  inconsistent if you just think about these data
>structures in terms of function rather than form.
>An array is often used as a simple set of things.  "O in Array" means
>"is O in that set of things"
>An AA is a set of things that also have some associated data.  "O in
>AA" means "is O in that set of things" (not the ancillary data)
>If you have an actual "set" data structure for containing a set of of
>things, then "O in Set" means, again, "is O in that set of things".
>(In fact the closest thing D has to a built-in set type is an AA with
>"don't care" associated data, reinforcing the notion of AA as a set
>plus extra data.)
>
>

Even looking at function rather than form, I still think its innacurate to consider the keys to be the elements of an AA. In most uses of an AA, the key is primarily something convenient with which to look up data. They hold significance, but typically not as much as the data that is looked up with it. What you've described is very much like (and quite literally the same as, in the case of many dynamic languages) thinking of a variable's name as the actual data, and thinking of the value it holds merely as "ancillary data".

Keep in mind too, even with a regular array, the index can still hold significance as data. For instace, I could have an array of Foo's, declare that any element with an odd index has property 'A' and any with an even index has property 'B', and treat them as such. May seem strange at a glance, but such things are common in low-level, low-resoruce and performance-oriented code. Bottom line, though, is that "Property 'A' or 'B'" is data that now been encoded in the array's index, but despite that, the indicies still aren't considered the array's elements. And the data they lookup still isn't considered "ancillary data".

And yes, a Hashed Set may likely be *implemented* as an AA with just keys, but that's just form, it doesn't imply a similarity in regard to function. The *function* of a HashSet is that of an unordered array that's been optimized for "contains X / doesn't contain X" on large data sets.

Containers and their functions:
- AA: Store A's with label B, A is fairly important, B may or may not be.
- Array: Store A's with label B, A is fairly important, B may or may not be,
B has certain restrictions, container overall has different performance
characteristacs from an AA.
- Hashed Set: Store A's.


January 27, 2010
bearophile wrote:
> Nick Sabalausky:
>> Aside from that being how Python does it, why do you see that as preferable?
> 
> Because: 1) linear searches in an array are damn common. I don't remember the results of my benchmarks, but until your integer arrays is quite longer than 30-50 items, performing a linear search is faster than a lookup in an AA, on DMD. On Tango this number is probably 70% higher 1b) In Python if you perform a "foo" in "barfoo" the language doesn't perform a linear search, it uses a much smarter search that has a complexity lower than the product of the two lengths, using a custom algorithm. So in D you can use the same syntax to search for substrings/subarrays. Where such smarter search is not possible, D can use a naive search. 2) It's really handy. I use isIn(item, items) to search on arrays in D, but having a item in items is nicer. 3) You can use the same syntax to search into anything that's lazily iterable too (a Range). This is very handy.

I would add to that:
4) Because 'in' is an operator, and operators are expected to bear a
greater weight than ordinary functions.

If 'in' was an ordinary method, say 'a.contains(b)', then I would choose
different method names for searching an array for a value and searching
an associative array for a key.  Probably something like:
  array.contains(value)
  associative_array.containsKey(key)

However, since 'in' already is an infix operator, it should have the most widely applicable semantics.  Operators are heavyweight syntactic sugar for function calls.  There is no room in D for operators that are only rarely useful.


-- 
Rainer Deyke - rainerd@eldwood.com
January 27, 2010
On Tue, Jan 26, 2010 at 6:16 PM, Nick Sabalausky <a@a.a> wrote:
>>"Bill Baxter" <wbaxter@gmail.com> wrote in message news:mailman.34.1264542189.4461.digitalmars-d-learn@puremagic.com... On Tue, Jan 26, 2010 at 1:21 PM, bearophile <bearophileHUGS@lycos.com> wrote:
>>> Nick Sabalausky:
>>>> Aside from that being how Python does it, why do you see that as preferable?
>>>
>>> Because:
>>> 1) linear searches in an array are damn common. I don't remember the
>>> results of my benchmarks, but until your integer arrays is quite longer
>>> than 30-50 items, performing a linear search is faster than a lookup in
>>> an AA, on DMD. On Tango this number is probably 70% higher
>>> 1b) In Python if you perform a "foo" in "barfoo" the language doesn't
>>> perform a linear search, it uses a much smarter search that has a
>>> complexity lower than the product of the two lengths, using a custom
>>> algorithm. So in D you can use the same syntax to search for
>>> substrings/subarrays. Where such smarter search is not possible, D can
>>> use a naive search.
>>> 2) It's really handy. I use isIn(item, items) to search on arrays in D,
>>> but having a item in items is nicer.
>>> 3) You can use the same syntax to search into anything that's lazily
>>> iterable too (a Range). This is very handy.
>>>
>>>
>>>> So having a single syntax work on the outputs for
>>>> regular arrays, but then on the inputs for AAs, seems highly
>>>> inconsistent
>>>> and error-prone to me.
>>>
>>> I have followed many Python newbies personally, I am following the Python newsgroups, and I have programmed for years in Python, and while I have seen many different kinds of bugs, I have not seen a significant amount of bugs in this. Python programmers just learn that dicts and lists are a little different in this regard. At the same way they learn that a set and a dict are different data structures, with different capabilities and usages.
>>
>>It's not even really  inconsistent if you just think about these data
>>structures in terms of function rather than form.
>>An array is often used as a simple set of things.  "O in Array" means
>>"is O in that set of things"
>>An AA is a set of things that also have some associated data.  "O in
>>AA" means "is O in that set of things" (not the ancillary data)
>>If you have an actual "set" data structure for containing a set of of
>>things, then "O in Set" means, again, "is O in that set of things".
>>(In fact the closest thing D has to a built-in set type is an AA with
>>"don't care" associated data, reinforcing the notion of AA as a set
>>plus extra data.)
>>
>>
>
> Even looking at function rather than form, I still think its innacurate to consider the keys to be the elements of an AA. In most uses of an AA, the key is primarily something convenient with which to look up data. They hold significance, but typically not as much as the data that is looked up with it. What you've described is very much like (and quite literally the same as, in the case of many dynamic languages) thinking of a variable's name as the actual data, and thinking of the value it holds merely as "ancillary data".
>
> Keep in mind too, even with a regular array, the index can still hold significance as data. For instace, I could have an array of Foo's, declare that any element with an odd index has property 'A' and any with an even index has property 'B', and treat them as such. May seem strange at a glance, but such things are common in low-level, low-resoruce and performance-oriented code. Bottom line, though, is that "Property 'A' or 'B'" is data that now been encoded in the array's index, but despite that, the indicies still aren't considered the array's elements. And the data they lookup still isn't considered "ancillary data".
>
> And yes, a Hashed Set may likely be *implemented* as an AA with just keys, but that's just form, it doesn't imply a similarity in regard to function. The *function* of a HashSet is that of an unordered array that's been optimized for "contains X / doesn't contain X" on large data sets.

> Containers and their functions:
> - AA: Store A's with label B, A is fairly important, B may or may not be.
> - Array: Store A's with label B, A is fairly important, B may or may not be,
> B has certain restrictions, container overall has different performance
> characteristacs from an AA.
> - Hashed Set: Store A's.


All I am trying to say is that there are multiple ways of looking at
the functionality offered by different containers.
And there exists a way of looking at it where the Python-style 'in'
operator can be seen as behaving consistently.

You asserted it is inconsistent.  I'm just saying it's only inconsistent if you insist that one particular way of looking at the containers is the "right" way.

--bb
1 2 3
Next ›   Last »