August 28, 2008
On 2008-08-28 00:24:50 +0200, Dee Girl <deegirl@noreply.com> said:

> Derek Parnell Wrote:
> 
>> On Wed, 27 Aug 2008 17:08:47 -0400, Nick Sabalausky wrote:
>> 
>>> The way I see it, encapsulation is all about the black box idea. And the
>>> only things you can see from outside the black box are the inputs and
>>> outputs.
>> 
>> Well said.
> 
> I am sorry I will say my opinion. This sounds good but is simplistic. Black box is good in principle. But it assume you found right interface for the black box. If you define bad interface you have a bad black box. Also please remember that iterator is black box also. But it defines right interface.

I agree with the meaning, but I disagree with the example.
I think that iterators are an example of bad interface, as also others brought up the iterator as good example I though that I should say something.

An iterator should be like a generator, have a method next, and one at_end or something similar packaged (and maybe prev() and at_start() if it can also go back) in a single struct, furthermore it should work seamlessly with a kind of for_each(x;iterator) construct.

Instead C++ choose to have begin & end iterators, simply because with that construct it is trivial for the compiler to optimize it for arrays, and you can use pointers as iterators without a cast/constructor.

This means a worse interface for 99% of the uses, apart form arrays and vectors I think one is better off without end iterator, and even when this is not the case writing something like for_each(x;FromTo(a,b)), with FromTo constructing a generator is (I think) better than for(i=t.begin();i!=t.end();++t), and the implementation of an a generator itself is easier (no ==,!=,increment, decrement(pre/post),... for performance reasons)

As I believe that the optimizations to make the better interface be as efficient as the iterator one are perfectly doable (some work, yes, but not extremely much so), I see no good reason for C++ design.

Fawzi

August 28, 2008
On 2008-08-28 11:47:22 +0200, Fawzi Mohamed <fmohamed@mac.com> said:

> On 2008-08-28 00:24:50 +0200, Dee Girl <deegirl@noreply.com> said:
> 
>> Derek Parnell Wrote:
>> 
>>> On Wed, 27 Aug 2008 17:08:47 -0400, Nick Sabalausky wrote:
>>> 
>>>> The way I see it, encapsulation is all about the black box idea. And the
>>>> only things you can see from outside the black box are the inputs and
>>>> outputs.
>>> 
>>> Well said.
>> 
>> I am sorry I will say my opinion. This sounds good but is simplistic. Black box is good in principle. But it assume you found right interface for the black box. If you define bad interface you have a bad black box. Also please remember that iterator is black box also. But it defines right interface.
> 
> I agree with the meaning, but I disagree with the example.
> I think that iterators are an example of bad interface, as also others brought up the iterator as good example I though that I should say something.
> 
> An iterator should be like a generator, have a method next, and one at_end or something similar packaged (and maybe prev() and at_start() if it can also go back) in a single struct, furthermore it should work seamlessly with a kind of for_each(x;iterator) construct.
> 
> Instead C++ choose to have begin & end iterators, simply because with that construct it is trivial for the compiler to optimize it for arrays, and you can use pointers as iterators without a cast/constructor.
> 
> This means a worse interface for 99% of the uses, apart form arrays and vectors I think one is better off without end iterator, and even when this is not the case writing something like for_each(x;FromTo(a,b)), with FromTo constructing a generator is (I think) better than for(i=t.begin();i!=t.end();++t), and the implementation of an a generator itself is easier (no ==,!=,increment, decrement(pre/post),... for performance reasons)
> 
> As I believe that the optimizations to make the better interface be as efficient as the iterator one are perfectly doable (some work, yes, but not extremely much so), I see no good reason for C++ design.
> 
> Fawzi

Please note before the discussion gets messy, that I find the hierarchy and kinds of iterators in c++ well done, (input,output,...random), but the interface of them (especially the most used) flawed.

August 28, 2008
Nick Sabalausky wrote:
> "Don" <nospam@nospam.com.au> wrote in message news:g95ks5$2aon$1@digitalmars.com...
>> Nick Sabalausky wrote:
>>> "Dee Girl" <deegirl@noreply.com> wrote in message news:g94j7a$2875$1@digitalmars.com...
>>>> I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity.
>>>>
>>>> I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful.
>>>>
>>>> I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
>>>>
>>>>> If an algoritm uses [] and doesn't know the
>>>>> complexity of the []...good! It shouldn't know, and it shouldn't care. It's
>>>>> the code that sends the collection to the algoritm that knows and cares.
>>>> I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.
>>>>
>>> If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).
>>>
>>> A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.
>> Yes, it will work.
>>
>>> Another STL question: It is possible to use STL to do a "linear find" using a custom comparison? If so, it is possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>> This will work too.
>>
>> IF you follow the conventions THEN the STL gives you the guarantees.
> 
> I'm not sure that's really a "guarantee" per se, but that's splitting hairs.
> 
> In any case, it sounds like we're all arguing more or less the same point:
> 
> Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):
> 
> interface ICollection(T)
> {
>     T getElement(index);
>     int getSize();
> }
> 
> int find(T)(ICollection(T) c, T elem)
> {
>     for(int i=0; i<c.size(); i++)
>     {
>  if(c.getElement(i) == elem)
>             return i;
>     }
> }
> 
> It sounds like STL's approach is to do something roughly like that and say:
> 
> "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)"
> 
> What I've been advocating is, again, doing something like the code above and saying:
> 
> "find()'s complexity is dependant on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)".
> 
> But, you see, those two statements are effectively equivilent.

They are. But...
if you don't adhere to the conventions, your code gets really hard to reason about.

"This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used.

It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.


August 28, 2008
bearophile wrote:

>> Biggest of all: using the wrong tool. I.e. using a hash map for a maximal populated key range.
> 
> But if the hash machinery is good it must work well in this common situation too.

This statement seems to be as true as the statement:

    But if a house is good it must perform well
    in the common situation of an off shore
    speed boat race.

The problem with your approach is, that you have close to no idea, whether your candidates are speed boats or houses. The only thing you know seems to be, that in both candidates some humans can live for some time.

Your first design seems to have placed both candidates off shore. Now you have introduced some randomness in the location.

However, you might only be designing some overly complicated tool for computing the percentage of landmass not covered by water on some random planet.

-manfred

--
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)

August 28, 2008
On 2008-08-28 05:47:22 -0400, Fawzi Mohamed <fmohamed@mac.com> said:

> An iterator should be like a generator, have a method next, and one at_end or something similar packaged (and maybe prev() and at_start() if it can also go back) in a single struct, furthermore it should work seamlessly with a kind of for_each(x;iterator) construct.

I perfectly agree with this. This is why I prefer much Objective-C's NSEnumerator over C++ iterators: they're much simpler to use.

Some time ago, Walter asked if he could change D arrays to have internally an end pointer instead of a length value. He said it was to allow arrays, or slices, to be used as iterator efficiently. I hope he hasn't given up on that idea since slices are much easier to understand and use than C++-style iterators.


> Instead C++ choose to have begin & end iterators, simply because with that construct it is trivial for the compiler to optimize it for arrays, and you can use pointers as iterators without a cast/constructor.

I would add that if you are using many iterators for the same array (vector) in a given algorithm or in function parameters, having only one "end" pointer save some space.


-- 
Michel Fortin  michel.fortin@michelf.com  http://michelf.com/

August 28, 2008
bearophile wrote:
> Tomas Lindquist Olsen:
>> Might I ask where that site is?
> 
> I have sent you an email with information and more things, etc.
> 
> Bye,
> bearophile

Got it. Thanx :) I'll give it a go over the weekend :)

Tomas
August 28, 2008
On 2008-08-28 13:33:47 +0200, "Manfred_Nowak" <svv1999@hotmail.com> said:

> Walter Bright wrote:
> 
>> I invite you to look at the code in internal/aaA.d and do some
>> testing!
> 
> This invitation is a red herring without the offer to change the
> language, because there exists no implementation for AA covering all
> possible use cases.
> 
> The bare minimum to get anything out of fiddling with the
> implemenations for AA is the posibility to use the results of adaptions
> without considerable overhead especially for the declarations. However,
> currenty I do not see any elegant solution, because types of
> implementation of maps are given implicitely.
> 
> At least something like prototyping seems to be necessary:
> 
> |    int[ char[]] map;
> |    Prototype= typeof( map);
> |    Prototype.implementation= MyAA;
> 
> where MyAA is some class type implementing the interface required for
> AA.
> 
> Are you willing to do something in this direction?

I think that the invitation should be read as the possibility to experiment with some changes for AA, see their effect, and if worthwhile provide them back, so that they can be applied to the "official" version.

making the standard version changeable seems just horrible form the portability and maintainability and clarity of the code: if the standard version is not ok for your use you should explicitly use another one, otherwise mixing codes that use two different standard versions becomes a nightmare.
On the other hand if you think that you can improve the standard version for everybody, changing internal/aaA.d is what you should do...

Fawzi
> 
> -manfred


August 28, 2008
"Nick Sabalausky" wrote
> "Steven Schveighoffer" wrote in message news:g95b89$1jii$1@digitalmars.com...
>>
>> When someone sees an index operator the first thought is that it is a quick lookup.
>
> This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That's was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmentic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.

Perhaps it is a mistake to assume it, but it is a common mistake.  And the expectation is intuitive.  You don't see people making light switches that look like outlets, even though it's possible.  You might perhaps make a library where opIndex is a linear search in your list, but I would expect that people would not use that indexing feature correctly.  Just as if I plug my lamp into the light switch that looks like an outlet, I'd expect it to get power, and be confused when it doesn't.  Except the opIndex mistake is more subtle because I *do* get what I actually want, but I just am not realizing the cost of it.

> The '+' operator means "add". Addition is typically O(1). But vectors can be added, and that's an O(n) operation. Should opAdd never be used for vectors?

As long as it's addition, I have no problem with O(n) operation (and it depends on your view of n).  Indexing implies speed, look at all other cases where indexing is used.  For addition to be proportional to the size of the element is natural and expected.

>> You can force yourself to think differently, but the reality is that most people think that because of the universal usage of square brackets (except for VB, and I feel pity for anyone who needs to use VB) to mean 'lookup by key', and usually this is only useful on objects where the lookup is quick ( < O(n) ).  Although there is no requirement, nor enforcement, the 'quick' contract is expected by the user, no matter how much docs you throw at them.
>>
>> Look, for instance, at Tango's now-deprecated LinkMap, which uses a linked-list of key/value pairs (copied from Doug Lea's implementation). Nobody in their right mind would use link map because lookups are O(n), and it's just as easy to use a TreeMap or HashMap.  Would you ever use it?
>>
>>> If you *really* need that sort of guarantee (and I can imagine it may be useful in some cases), then the implementation of the guarantee does *not* belong in the realm of "implements vs doesn't-implement a particular operator overload". Doing so is an abuse of operator overloading, since operator overloading is there for defining syntactic sugar, not for acting as a makeshift contract.
>>
>> I don't think anybody is asking for a guarantee from the compiler or any specific tool.  I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick.  You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'.  You can say 'hey, it's not meant to be used that way', but then why can the user use it that way?  A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections.
>>
>> And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do.  Notice that the operator for + is opAdd, not opPlus.  This is because opAdd is supposed to mean you are performing an addition operation.  Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design.  opIndex is supposed to be an index function, not a linear search.  It's not called opSearch for a reason.  Sure you can redefine it however you want semantically, but it's considered bad design.  That's all we're saying.
>>
>
> Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.

It is a search, you are searching for the element whose key matches.  When the key can be used to lookup the element efficiently, we call that indexing.  Indexing is a more constrained form of searching.

-Steve


August 28, 2008
Manfred_Nowak:
> This statement seems to be as true as the statement:<

No, it's a different statement.
I'll try to post another more real-looking benchmark later, anyway.

In the meantime I'll keep using Python for many of my purposes where I need hash maps and sets (and regular expressions, etc), instead of D, because in tons of real-world tests I have seen Python dicts are quite faster.

Bye,
bearophile
August 28, 2008
2008/8/28 Fawzi Mohamed <fmohamed@mac.com>:
> On 2008-08-28 00:24:50 +0200, Dee Girl <deegirl@noreply.com> said:
>
>> Derek Parnell Wrote:
>>
>>> On Wed, 27 Aug 2008 17:08:47 -0400, Nick Sabalausky wrote:
>>>
>>>> The way I see it, encapsulation is all about the black box idea. And the only things you can see from outside the black box are the inputs and outputs.
>>>
>>> Well said.
>>
>> I am sorry I will say my opinion. This sounds good but is simplistic. Black box is good in principle. But it assume you found right interface for the black box. If you define bad interface you have a bad black box. Also please remember that iterator is black box also. But it defines right interface.
>
> I agree with the meaning, but I disagree with the example.
> I think that iterators are an example of bad interface, as also others
> brought up the iterator as good example I though that I should say
> something.
>
> An iterator should be like a generator, have a method next, and one at_end
> or something similar packaged (and maybe prev() and at_start() if it can
> also go back) in a single struct, furthermore it should work seamlessly with
> a kind of for_each(x;iterator) construct.
>
> Instead C++ choose to have begin & end iterators, simply because with that construct it is trivial for the compiler to optimize it for arrays, and you can use pointers as iterators without a cast/constructor.
>
> This means a worse interface for 99% of the uses, apart form arrays and
> vectors I think one is better off without end iterator, and even when this
> is not the case writing something like for_each(x;FromTo(a,b)), with FromTo
> constructing a generator is (I think) better than
> for(i=t.begin();i!=t.end();++t), and the implementation of an a generator
> itself is easier (no ==,!=,increment, decrement(pre/post),... for
> performance reasons)

It is worse in simple "for each element" uses, but one of the cooler things about C++ iterators is exactly that they are exchangeable for any pointer-like object, and that it's trivial to iterate over a subset of a container. Emulating pointers means that in some ways, once you grok how they work, they are more trivial than e.g. Python or D since there is no further specialised syntax beyond "*", "++", and maybe "--".

I love the concise D and Python syntaxes, just playing devil's advocate here. :)


>
> As I believe that the optimizations to make the better interface be as efficient as the iterator one are perfectly doable (some work, yes, but not extremely much so), I see no good reason for C++ design.
>
> Fawzi
>
>