August 28, 2008
Re: Why Strings as Classes?
"Don" <nospam@nospam.com.au> wrote in message 
news:g95td3$2tu0$1@digitalmars.com...
> Nick Sabalausky wrote:
>> "Don" <nospam@nospam.com.au> wrote in message 
>> news:g95ks5$2aon$1@digitalmars.com...
>>> Nick Sabalausky wrote:
>>>> "Dee Girl" <deegirl@noreply.com> wrote in message 
>>>> news:g94j7a$2875$1@digitalmars.com...
>>>>> I appreciate your view point. Please allow me explain. The view point 
>>>>> is in opposition with STL. In STL each algorithm defines what kind of 
>>>>> iterator it operates with. And it requires what iterator complexity.
>>>>>
>>>>> I agree that other design can be made. But STL has that design. In my 
>>>>> opinion is much part of what make STL so successful.
>>>>>
>>>>> I disagree that algorithm that knows complexity of iterator is 
>>>>> concrete. I think exactly contrary. Maybe it is good that you read 
>>>>> book about STL by Josuttis. STL algorithms are the most generic I ever 
>>>>> find in any language. I hope std.algorithm in D will be better. But 
>>>>> right now std.algorithm works only with array.
>>>>>
>>>>>> If an algoritm uses [] and doesn't know the
>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't 
>>>>>> care. It's
>>>>>> the code that sends the collection to the algoritm that knows and 
>>>>>> cares.
>>>>> I think this is mistake. Algorithm should know. Otherwise "linear 
>>>>> find" is not "linear find"! It is "cuadratic find" (spell?). If you 
>>>>> want to define something called linear find then you must know 
>>>>> iterator complexity.
>>>>>
>>>> If a generic algorithm describes itself as "linear find" then I know 
>>>> damn well that it's referring to the behavior of *just* the function 
>>>> itself, and is not a statement that the function *combined* with the 
>>>> behavior of the collection and/or a custom comparison is always going 
>>>> to be O(n).
>>>>
>>>> A question about STL: If I create a collection that, internally, is 
>>>> like a linked list, but starts each indexing operation from the 
>>>> position of the last indexing operation (so that a "find first" would 
>>>> run in O(n) instead of O(n*n)), is it possible to send that collection 
>>>> to STL's generic "linear find first"? I would argue that it should 
>>>> somehow be possible *even* if the STL's generic "linear find first" 
>>>> guarantees a *total* performance of O(n) (Since, in this case, it would 
>>>> still be O(n) anyway). Because otherwise, the STL wouldn't be very 
>>>> extendable, which would be a bad thing for a library of "generic" 
>>>> algorithms.
>>> Yes, it will work.
>>>
>>>> Another STL question: It is possible to use STL to do a "linear find" 
>>>> using a custom comparison? If so, it is possible to make STL's "linear 
>>>> find" function use a comparison that just happens to be O(n)? If so, 
>>>> doesn't that violate the linear-time guarantee, too? If not, how does 
>>>> it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>>> This will work too.
>>>
>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>
>> I'm not sure that's really a "guarantee" per se, but that's splitting 
>> hairs.
>>
>> In any case, it sounds like we're all arguing more or less the same 
>> point:
>>
>> Setting aside the issue of "should opIndex be used and when?", suppose I 
>> have the following collection interface and find function (not guaranteed 
>> to compile):
>>
>> interface ICollection(T)
>> {
>>     T getElement(int index);
>>     int getSize();
>> }
>>
>> int find(T)(ICollection!(T) c, T elem)
>> {
>>     for(int i=0; i<c.getSize(); i++)
>>     {
>>         if(c.getElement(i) == elem)
>>             return i;
>>     }
>>     return -1;
>> }
>>
>> It sounds like STL's approach is to do something roughly like that and 
>> say:
>>
>> "find()'s parameter 'c' should be an ICollection for which getElement() 
>> is O(1), in which case find() is guaranteed to be O(n)"
>>
>> What I've been advocating is, again, doing something like the code above 
>> and saying:
>>
>> "find()'s complexity is dependant on the complexity of the ICollection's 
>> getElement(). If getElement()'s complexity is O(m), then find()'s 
>> complexity is guaranteed to be O(m * n). Of course, this means that the 
>> only way to get ideal complexity from find() is to use an ICollection for 
>> which getElement() is O(1)".
>>
>> But, you see, those two statements are effectively equivilent.
>
> They are. But...
> if you don't adhere to the conventions, your code gets really hard to 
> reason about.
>
> "This class has an opIndex which is in O(n). Is that OK?" Well, that 
> depends on what it's being used for. So you have to look at all of the 
> places where it is used.
>
> It's much simpler to use the convention that opIndex _must_ be fast; this 
> way the performance requirements for containers and algorithms are 
> completely decoupled from each other. It's about good design.
>

Taking a slight detour, let me ask you this... Which of the following 
strategies do you consider to be better:

//-- A --
value = 0;
for(int i=1; i<=10; i++)
{
   value += i*2;
}

//-- B --
value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2.

Strategy A makes the low-level implementation details very clear, but IMO, 
it comes at the expense of high-level clarity. This is because the code 
intermixes the high-level "what do I want to accomplish?" with the low-level 
details.

Strategy B much more closely resembles the high-level desired result, and 
thus makes the high-level intent more clear. But this comes at the cost of 
hiding the low-level details behind a layer of abstraction.

I may very well be wrong on this, but from what you've said it sounds like 
you (as well as the other people who prefer [] to never be O(n)) are the 
type of coder who would prefer "Strategy A". In that case, I can completely 
understand your viewpoint on opIndex, even though I don't agree with it (I'm 
a "Strategy B" kind of person).

Of course, if I'm wrong on that assumption, then we're back to square one ;)
August 28, 2008
Re: Why Strings as Classes?
"Nick Sabalausky" wrote
> "Steven Schveighoffer" wrote in message 
> news:g9650h$cp9$1@digitalmars.com...
>> "Nick Sabalausky" wrote
>>> "Steven Schveighoffer" wrote in message 
>>> news:g95b89$1jii$1@digitalmars.com...
>>>>
>>>> When someone sees an index operator the first thought is that it is a 
>>>> quick lookup.
>>>
>>> This seems to be a big part of the disagreement. Personally, I think 
>>> it's insane to look at [] and just assume it's a cheap lookup. That's 
>>> was true in pure C where, as someone else said, it was nothing more than 
>>> a shorthand for a specific pointer arithmentic operation, but carrying 
>>> that idea over to more modern languages (especially one that also uses 
>>> [] for associative arrays) is a big mistake.
>>
>> Perhaps it is a mistake to assume it, but it is a common mistake. And the 
>> expectation is intuitive.
>
> Why in the world would any halfway competent programmer ever look at a 
> *linked list* and assume that the linked lists's [] (if implemented) is 
> O(1)?

You are writing this function:

void foo(IOrderedContainer cont)
{
   ....
}

IOrderedContainer implements opIndex(uint).  The problem is that you can't 
tell whether the object itself is a list or not, so you are powerless to 
make the decision as to whether the container has fast indexing.  In that 
case, your only choice (if speed is an issue) is to not use opIndex.

> As for the risk that could create of accidentially sending a linked list 
> to a "search" (ie, a "search for an element which contains data X") that 
> uses [] internally instead of iterators (but then, why wouldn't it just 
> use iterators anyway?): I'll agree that in a case like this there should 
> be some mechanism for automatic choosing of an algorithm, but that 
> mechanism should be at a separate level of abstraction. There would be a 
> function "search" that, through either RTTI or template constraints or 
> something else, says "does collection 'c' implement 
> ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the 
> collection have attribute ForewardDirectionIndexingComplexity that is set 
> equal to Complexity.Constant?", and based on that passes control to either 
> IndexingSearch or IteratorSearch.

To me, this is a bad design.  It's my opinion, but one that is shared among 
many people.  You can do stuff this way, but it is not intuitive.  I'd much 
rather reserve opIndex to only quick lookups, and avoid the possibility of 
accidentally using it incorrectly.

In general, I'd say if you are using lists and frequently looking up the nth 
value in the list, you have chosen the wrong container for the job.

>>  You don't see people making light switches that look like outlets, even 
>> though it's possible.  You might perhaps make a library where opIndex is 
>> a linear search in your list, but I would expect that people would not 
>> use that indexing feature correctly.  Just as if I plug my lamp into the 
>> light switch that looks like an outlet, I'd expect it to get power, and 
>> be confused when it doesn't.  Except the opIndex mistake is more subtle 
>> because I *do* get what I actually want, but I just am not realizing the 
>> cost of it.
>>
>>> The '+' operator means "add". Addition is typically O(1). But vectors 
>>> can be added, and that's an O(n) operation. Should opAdd never be used 
>>> for vectors?
>>
>> As long as it's addition, I have no problem with O(n) operation (and it 
>> depends on your view of n).  Indexing implies speed, look at all other 
>> cases where indexing is used.  For addition to be proportional to the 
>> size of the element is natural and expected.
>>
>
> When people look at '+' they typically think "integer/float addition". Why 
> would, for example, the risk of mistaking an O(n) "big int" addition for 
> an O(1) integer/float addition be any worse than the risk of mistaking an 
> O(n) linked list "get element at index" for an O(1) array "get element at 
> index"?

What good are integers that can't be added?  In this case, it is not 
possible to have quick addition, no matter how you implement your 
arbitrary-precision integer.  I think the time penalty is understood and 
accepted.  With opIndex, the time penalty is not expected.  Like it or not, 
this is how many users look at it.

>>>> You can force yourself to think differently, but the reality is that 
>>>> most people think that because of the universal usage of square 
>>>> brackets (except for VB, and I feel pity for anyone who needs to use 
>>>> VB) to mean 'lookup by key', and usually this is only useful on objects 
>>>> where the lookup is quick ( < O(n) ).  Although there is no 
>>>> requirement, nor enforcement, the 'quick' contract is expected by the 
>>>> user, no matter how much docs you throw at them.
>>>>
>>>> Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
>>>> linked-list of key/value pairs (copied from Doug Lea's implementation). 
>>>> Nobody in their right mind would use link map because lookups are O(n), 
>>>> and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
>>>> it?
>>>>
>>>>> If you *really* need that sort of guarantee (and I can imagine it may 
>>>>> be useful in some cases), then the implementation of the guarantee 
>>>>> does *not* belong in the realm of "implements vs doesn't-implement a 
>>>>> particular operator overload". Doing so is an abuse of operator 
>>>>> overloading, since operator overloading is there for defining 
>>>>> syntactic sugar, not for acting as a makeshift contract.
>>>>
>>>> I don't think anybody is asking for a guarantee from the compiler or 
>>>> any specific tool.  I think what we are saying is that violating the 
>>>> 'opIndex is fast' notion is bad design because you end up with users 
>>>> thinking they are doing something that's quick.  You end up with people 
>>>> posting benchmarks on your containers saying 'why does python beat the 
>>>> pants off your list implementation?'.  You can say 'hey, it's not meant 
>>>> to be used that way', but then why can the user use it that way?  A 
>>>> better design is to nudge the user into using a correct container for 
>>>> the job by only supporting operations that make sense on the 
>>>> collections.
>>>>
>>>> And as far as operator semantic meaning, D's operators are purposely 
>>>> named after what they are supposed to do.  Notice that the operator for 
>>>> + is opAdd, not opPlus.  This is because opAdd is supposed to mean you 
>>>> are performing an addition operation.  Assigning a different semantic 
>>>> meaning is not disallowed by the compiler, but is considered bad 
>>>> design. opIndex is supposed to be an index function, not a linear 
>>>> search.  It's not called opSearch for a reason.  Sure you can redefine 
>>>> it however you want semantically, but it's considered bad design. 
>>>> That's all we're saying.
>>>>
>>>
>>> Nobody is suggesting using [] to invoke a search (Although we have 
>>> talked about using [] *in the search function's implementation*). Search 
>>> means you want to get the position of a given element, or in other 
>>> words, "element" -> search -> "key/index". What we're talking about is 
>>> the reverse: getting the element at a given position, ie, "key/index" -> 
>>> [] -> "element". It doesn't matter if it's an array, a linked list, or a 
>>> super-duper-collection-from-2092: that's still indexing, not searching.
>>
>> It is a search, you are searching for the element whose key matches. 
>> When the key can be used to lookup the element efficiently, we call that 
>> indexing.  Indexing is a more constrained form of searching.
>>
>
> If you've got a linked list, and you want to get element N, are you 
> *really* going to go reaching for a function named "search"? How often do 
> you really see a generic function named "search" or "find" that takes a 
> numeric index as a the "to be found" parameter instead of something to be 
> matched against the element's value? I would argue that that would be 
> confusing for most people. Like I said in a different post farther down, 
> the implementation of a "getAtIndex()" is obviously going to work like a 
> search, but from "outside the box", what you're asking for is not the 
> same.

If you are indexing into a tree, it is effectively a binary search; if you 
are indexing into a hash, it becomes a search at some point to deal with 
collisions.  People don't think about indexing as being a search, but in 
reality it is.  A really fast search.

And I don't think 'search' would be the name of the member function; it 
should be something like 'getNth', which returns a cursor that points to the 
element.

-Steve
August 28, 2008
Re: Why Strings as Classes?
"Nick Sabalausky" wrote
> "Don" <nospam@nospam.com.au> wrote in message 
> news:g95td3$2tu0$1@digitalmars.com...
>> Nick Sabalausky wrote:
>>> "Don" <nospam@nospam.com.au> wrote in message 
>>> news:g95ks5$2aon$1@digitalmars.com...
>>>> Nick Sabalausky wrote:
>>>>> "Dee Girl" <deegirl@noreply.com> wrote in message 
>>>>> news:g94j7a$2875$1@digitalmars.com...
>>>>>> I appreciate your view point. Please allow me explain. The view point 
>>>>>> is in opposition with STL. In STL each algorithm defines what kind of 
>>>>>> iterator it operates with. And it requires what iterator complexity.
>>>>>>
>>>>>> I agree that other design can be made. But STL has that design. In my 
>>>>>> opinion is much part of what make STL so successful.
>>>>>>
>>>>>> I disagree that algorithm that knows complexity of iterator is 
>>>>>> concrete. I think exactly contrary. Maybe it is good that you read 
>>>>>> book about STL by Josuttis. STL algorithms are the most generic I 
>>>>>> ever find in any language. I hope std.algorithm in D will be better. 
>>>>>> But right now std.algorithm works only with array.
>>>>>>
>>>>>>> If an algoritm uses [] and doesn't know the
>>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't 
>>>>>>> care. It's
>>>>>>> the code that sends the collection to the algoritm that knows and 
>>>>>>> cares.
>>>>>> I think this is mistake. Algorithm should know. Otherwise "linear 
>>>>>> find" is not "linear find"! It is "cuadratic find" (spell?). If you 
>>>>>> want to define something called linear find then you must know 
>>>>>> iterator complexity.
>>>>>>
>>>>> If a generic algorithm describes itself as "linear find" then I know 
>>>>> damn well that it's referring to the behavior of *just* the function 
>>>>> itself, and is not a statement that the function *combined* with the 
>>>>> behavior of the collection and/or a custom comparison is always going 
>>>>> to be O(n).
>>>>>
>>>>> A question about STL: If I create a collection that, internally, is 
>>>>> like a linked list, but starts each indexing operation from the 
>>>>> position of the last indexing operation (so that a "find first" would 
>>>>> run in O(n) instead of O(n*n)), is it possible to send that collection 
>>>>> to STL's generic "linear find first"? I would argue that it should 
>>>>> somehow be possible *even* if the STL's generic "linear find first" 
>>>>> guarantees a *total* performance of O(n) (Since, in this case, it 
>>>>> would still be O(n) anyway). Because otherwise, the STL wouldn't be 
>>>>> very extendable, which would be a bad thing for a library of "generic" 
>>>>> algorithms.
>>>> Yes, it will work.
>>>>
>>>>> Another STL question: It is possible to use STL to do a "linear find" 
>>>>> using a custom comparison? If so, it is possible to make STL's "linear 
>>>>> find" function use a comparison that just happens to be O(n)? If so, 
>>>>> doesn't that violate the linear-time guarantee, too? If not, how does 
>>>>> it know that the custom comparison is O(n) instead of O(1) or O(log 
>>>>> n)?
>>>> This will work too.
>>>>
>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>
>>> I'm not sure that's really a "guarantee" per se, but that's splitting 
>>> hairs.
>>>
>>> In any case, it sounds like we're all arguing more or less the same 
>>> point:
>>>
>>> Setting aside the issue of "should opIndex be used and when?", suppose I 
>>> have the following collection interface and find function (not 
>>> guaranteed to compile):
>>>
>>> interface ICollection(T)
>>> {
>>>     T getElement(int index);
>>>     int getSize();
>>> }
>>>
>>> int find(T)(ICollection!(T) c, T elem)
>>> {
>>>     for(int i=0; i<c.getSize(); i++)
>>>     {
>>>         if(c.getElement(i) == elem)
>>>             return i;
>>>     }
>>>     return -1;
>>> }
>>>
>>> It sounds like STL's approach is to do something roughly like that and 
>>> say:
>>>
>>> "find()'s parameter 'c' should be an ICollection for which getElement() 
>>> is O(1), in which case find() is guaranteed to be O(n)"
>>>
>>> What I've been advocating is, again, doing something like the code above 
>>> and saying:
>>>
>>> "find()'s complexity is dependant on the complexity of the ICollection's 
>>> getElement(). If getElement()'s complexity is O(m), then find()'s 
>>> complexity is guaranteed to be O(m * n). Of course, this means that the 
>>> only way to get ideal complexity from find() is to use an ICollection 
>>> for which getElement() is O(1)".
>>>
>>> But, you see, those two statements are effectively equivilent.
>>
>> They are. But...
>> if you don't adhere to the conventions, your code gets really hard to 
>> reason about.
>>
>> "This class has an opIndex which is in O(n). Is that OK?" Well, that 
>> depends on what it's being used for. So you have to look at all of the 
>> places where it is used.
>>
>> It's much simpler to use the convention that opIndex _must_ be fast; this 
>> way the performance requirements for containers and algorithms are 
>> completely decoupled from each other. It's about good design.
>>
>
> Taking a slight detour, let me ask you this... Which of the following 
> strategies do you consider to be better:
>
> //-- A --
> value = 0;
> for(int i=1; i<=10; i++)
> {
>    value += i*2;
> }
>
> //-- B --
> value = sum(map(1..10, {n * 2}));
>
> Both strategies compute the sum of the first 10 multiples of 2.
>
> Strategy A makes the low-level implementation details very clear, but IMO, 
> it comes at the expense of high-level clarity. This is because the code 
> intermixes the high-level "what I want to accomplish?" with the low-level 
> details.
>
> Strategy B much more closely resembles the high-level desired result, and 
> thus makes the high-level intent more clear. But this comes at the cost of 
> hiding the low-level details behind a layer of abstraction.
>
> I may very well be wrong on this, but from what you've said it sounds like 
> you (as well as the other people who prefer [] to never be O(n)) are the 
> type of coder who would prefer "Strategy A". In that case, I can 
> completely understand your viewpoint on opIndex, even though I don't agree 
> with it (I'm a "Strategy B" kind of person).
>
> Of course, if I'm wrong on that assumption, then we're back to square one 
> ;)

For me at least, you are wrong :)  In fact, I view it the other way: you 
shouldn't have to care about the underlying implementation, as long as the 
runtime is well defined.  If you tell me strategy B may or may not take up 
to O(n^2) to compute, then you bet your ass I'm not going to even touch 
option B, 'cause I can always get O(n) time with option A :)  Your solution 
FORCES me to care about the details; it's not that I want to care about 
them.

-Steve
August 28, 2008
Re: Why Strings as Classes?
Nick Sabalausky wrote:
> Taking a slight detour, let me ask you this... Which of the following 
> strategies do you consider to be better:
> 
> //-- A --
> value = 0;
> for(int i=1; i<=10; i++)
> {
>     value += i*2;
> }
> 
> //-- B --
> value = sum(map(1..10, {n * 2}));
> 
> Both strategies compute the sum of the first 10 multiples of 2.
> 
> Strategy A makes the low-level implementation details very clear, but IMO, 
> it comes at the expense of high-level clarity. This is because the code 
> intermixes the high-level "what I want to accomplish?" with the low-level 
> details.
> 
> Strategy B much more closely resembles the high-level desired result, and 
> thus makes the high-level intent more clear. But this comes at the cost of 
> hiding the low-level details behind a layer of abstraction.

Didn't read the rest of the discussion, but I disagree here... Most 
programmers learn iterative languages first, and anyone who's taken 
Computer Science 101 can figure out what's going on in A. B takes a 
second to think about. I'm not into the zen of FP for sure, and that 
probably makes me a worse programmer... but I bet you that if you take a 
random candidate for a development position, she'll be more likely to 
figure out (and write) A than B. [That may be projection; I haven't 
seen/done any studies]

The big problem IMO is the number of primitive things you need to 
understand. In A, you need to understand variables, looping, and 
arithmetic operations. In B, you need to understand and think about 
closures/scoping, lists, the "map" function, aggregate functions, 
function composition, and arithmetic operations. What hit me when first 
looking at it was: "where the **** did n come from?"

I'm not saying the functional style isn't perfect for a lot of things; 
I'm just saying that this is not one of them.
August 28, 2008
Re: Why Strings as Classes?
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message 
news:g96vmq$3cc$1@digitalmars.com...
> "Nick Sabalausky" wrote
>> "Steven Schveighoffer" wrote in message 
>> news:g9650h$cp9$1@digitalmars.com...
>>> Perhaps it is a mistake to assume it, but it is a common mistake. And 
>>> the expectation is intuitive.
>>
>> Why in the world would any halfway competent programmer ever look at a 
>> *linked list* and assume that the linked lists's [] (if implemented) is 
>> O(1)?
>
> You are writing this function:
>
> void foo(IOrderedContainer cont)
> {
>    ....
> }
>
> IOrderedContainer implements opIndex(uint).  The problem is that you can't 
> tell whether the object itself is a list or not, so you are powerless to 
> make the decision as to whether the container has fast indexing.  In that 
> case, your only choice (if speed is an issue) is to not use opIndex.
>

Ok, so you want foo() to be able to tell if the collection has fast or slow 
indexing. What are you suggesting that foo() does when the collection does 
have slow indexing?

1. Should it fail to compile because foo's implementation uses [] and the 
slow-indexing collection doesn't implement []?

Well then how does foo know that it's the most important, most frequent 
thing being done on the collection? Suppose foo is something that needs to 
access elements in a somewhat random order, ie the kind of thing that lists 
are poorly suited for. Further suppose that collection C is some set of data 
that *usually* just gets insertions and deletions at nodes that the code 
already has direct references to. Further suppose that foo does *need* to be 
run on the collection, *but* very infrequently. So, should I *really* be 
forced to make C a collection that trades good insertion/deletion complexity 
for good indexing complexity, just because the occasionally-run foo() 
doesn't like it? And what if I want to run benchmarks to test what 
collection works best, in real-world use, for C? Should foo's intolerance of 
slow-indexing collections really be able to force me to exclude testing of 
such collections?

2. Should foo revert to an alternate branch of code that doesn't use []?

This behavior can be implemented via interfaces like I described. The 
benefit of that is that [] can still serve as the shorthand it's intended 
for (see below) and you never need to introduce the inconsistency of "Gee, 
how do I get the Nth element of a collection?" "Well, on some collections 
it's getNth(), and on other collections it's []."

3. Something else?

>> As for the risk that could create of accidentially sending a linked list 
>> to a "search" (ie, a "search for an element which contains data X") that 
>> uses [] internally instead of iterators (but then, why wouldn't it just 
>> use iterators anyway?): I'll agree that in a case like this there should 
>> be some mechanism for automatic choosing of an algorithm, but that 
>> mechanism should be at a separate level of abstraction. There would be a 
>> function "search" that, through either RTTI or template constraints or 
>> something else, says "does collection 'c' implement 
>> ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the 
>> collection have attribute ForewardDirectionIndexingComplexity that is set 
>> equal to Complexity.Constant?", and based on that passes control to 
>> either IndexingSearch or IteratorSearch.
>
> To me, this is a bad design.  It's my opinion, but one that is shared 
> among many people.  You can do stuff this way, but it is not intuitive. 
> I'd much rather reserve opIndex to only quick lookups, and avoid the 
> possibility of accidentally using it incorrectly.
>

Preventing a collection from ever being used in a function that would 
typically perform poorly on that collection just smacks of premature 
optimization. How do you, as the collection author, know that the collection 
will never be used in a way where *occasional* use in some specific 
sub-optimal manner might actually be necessary and/or acceptable?

If you omit [] then you've burnt the bridge (so to speak) and your only 
recourse is to add a standardized "getNth()" to every single collection 
which clutters the interface, hinders integration with third-party 
collections and algorithms, and is likely to still suffer from idiots who 
think that "get Nth element" is always better than O(n) (see below).

> In general, I'd say if you are using lists and frequently looking up the 
> nth value in the list, you have chosen the wrong container for the job.
>

If you're frequently looking up random elements in a list, then yes, you're 
probably using the wrong container. But that's beside the point. Even if you 
only do it once: If you have a collection with a natural order, and you want 
to get the nth element, you should be able to use the standard "get element 
at index X" notation, [].

I don't care how many people go around using [] and thinking they're 
guaranteed to get a cheap computation from it. In a language that supports 
overloading of [], the [] means "get the element at key/index X". Especially 
in a language like D where using [] on an associative array can trigger an 
unbounded allocation and GC run. Using [] in D (and various other languages) 
can be expensive, period, even in the standard lib (assoc array). So looking 
at a [] and thinking "guaranteed cheap", is incorrect, period. If most 
people think 2+2=5, you're not going to redesign arithmetic to work around 
that mistaken assumption.

>>
>> When people look at '+' they typically think "integer/float addition". 
>> Why would, for example, the risk of mistaking an O(n) "big int" addition 
>> for an O(1) integer/float addition be any worse than the risk of 
>> mistaking an O(n) linked list "get element at index" for an O(1) array 
>> "get element at index"?
>
> What good are integers that can't be added?  In this case, it is not 
> possible to have quick addition, no matter how you implement your 
> arbitrary-precision integer.  I think the time penalty is understood and 
> accepted.  With opIndex, the time penalty is not expected.  Like it or 
> not, this is how many users look at it.
>
>>
>> If you've got a linked list, and you want to get element N, are you 
>> *really* going to go reaching for a function named "search"? How often do 
>> you really see a generic function named "search" or "find" that takes a 
>> numeric index as a the "to be found" parameter instead of something to be 
>> matched against the element's value? I would argue that that would be 
>> confusing for most people. Like I said in a different post farther down, 
>> the implementation of a "getAtIndex()" is obviously going to work like a 
>> search, but from "outside the box", what you're asking for is not the 
>> same.
>
> If you are indexing into a tree, it is considered a binary search, if you 
> are indexing into a hash, it is a search at some point to deal with 
> collisions.  People don't think about indexing as being a search, but in 
> reality it is.  A really fast search.
>

It's implemented as a search, but I'd argue that the input/output 
specifications are different. And yes, I suppose that does put it into a bit 
of a grey area. But I wouldn't go so far as to say that, to the caller, it's 
the same thing, because there are differences. If you want to get an element 
based on its position in the collection, you call one function. If you want 
to get an element based on its content instead of its position, that's 
another function. If you want to get the position of an element based on its 
content or its identity, that's one or two more functions (depending, of 
course, on whether the element is a value type or reference type, 
respectively).

> And I don't think search would be the name of the member function, it 
> should be something like 'getNth', which returns a cursor that points to 
> the element.
>

Right, and outside of pure C, [] is the shorthand for and the standardized 
name for "getNth". If someone automatically assumes [] to be a simple 
lookup, chances are they're going to make the same assumption about anything 
named along the lines of "getNth". After all, that's what [] does, it gets 
the Nth.
August 29, 2008
Re: Why Strings as Classes?
Robert Fraser <fraserofthenight@gmail.com> wrote:
> First, a hash access is already time unbounded. hash["hello"] where 
> "hello" is not already in the hash will create a hash entry for hello. 
> This requires heap allocation, which can take arbitrarily long. So 
> having unbounded opIndex is in the language already!

Hash's opIndex() throws an ArrayBoundsError if given an unknown key. 
It's opIndexAssign() which allocates.

-- 
SnakE
August 29, 2008
Re: Why Strings as Classes?
"Nick Sabalausky" wrote
> Ok, so you want foo() to be able to tell if the collection has fast or 
> slow indexing. What are you suggesting that foo() does when the collection 
> does have slow indexing?

No, I don't want to be able to tell.  I don't want to HAVE to be able to 
tell.  In my ideal world, the collection does not implement opIndex unless 
it is fast, so there is no issue.  i.e. you cannot call foo with a linked 
list.

I'm really tired of this argument; you understand my point of view, I 
understand yours.  To you, the syntax sugar is more important than the 
complexity guarantees.  To me, what the syntax intuitively means should be 
what it does.  So I'll develop my collections library and you develop yours, 
fair enough?  I don't think either of us is right or wrong in the strict 
sense of the terms.

To be fair, I'll answer your other points as you took the time to write 
them.  And then I'm done.  I can't really be any clearer as to what I 
believe is the best design.

> 1. Should it fail to compile because foo's implementation uses [] and the 
> slow-indexing collection doesn't implement []?

No, foo will always compile because opIndex should always be fast, and then 
I can specify the complexity of foo without worry.

Using an O(n) lookup operation should be more painful because it requires 
more time.  It makes users use it less.

> 2. Should foo revert to an alternate branch of code that doesn't use []?
>
> This behavior can be implemented via interfaces like I described. The 
> benefit of that is that [] can still serve as the shorthand it's intended 
> for (see below) and you never need to introduce the inconsistency of "Gee, 
> how do I get the Nth element of a collection?" "Well, on some collections 
> it's getNth(), and on other collections it's []."

I believe that you shouldn't really ever be calling getNth on a linked list, 
and if you are, it should be a red flag, like a cast.

Furthermore [] isn't always equivalent to getNth, see below.

>>> As for the risk that could create of accidentially sending a linked list 
>>> to a "search" (ie, a "search for an element which contains data X") that 
>>> uses [] internally instead of iterators (but then, why wouldn't it just 
>>> use iterators anyway?): I'll agree that in a case like this there should 
>>> be some mechanism for automatic choosing of an algorithm, but that 
>>> mechanism should be at a separate level of abstraction. There would be a 
>>> function "search" that, through either RTTI or template constraints or 
>>> something else, says "does collection 'c' implement 
>>> ConstantTimeForwardDirectionIndexing?" or better yet IMO "does the 
>>> collection have attribute ForwardDirectionIndexingComplexity that is 
>>> set equal to Complexity.Constant?", and based on that passes control to 
>>> either IndexingSearch or IteratorSearch.
>>
>> To me, this is a bad design.  It's my opinion, but one that is shared 
>> among many people.  You can do stuff this way, but it is not intuitive. 
>> I'd much rather reserve opIndex to only quick lookups, and avoid the 
>> possibility of accidentally using it incorrectly.
>>
>
> Preventing a collection from ever being used in a function that would 
> typically perform poorly on that collection just smacks of premature 
> optimization. How do you, as the collection author, know that the 
> collection will never be used in a way such that *occasional* use in 
> certain specific sub-optimal manners might actually be necessary and/or 
> acceptable?

It's not premature optimization; it's not offering a feature that has little 
or no use.  It's like any contract for any object: you only want to define 
the interface for which your object is designed.  A linked list should not 
have an opIndex because it's not designed to be indexed.

If I designed a new car with which you could steer each front wheel 
independently, would that make you buy it?  It's another feature that the 
car has that other cars don't.  Who cares if it's useful, it's another 
*feature*!  Sometimes a good design is not that a feature is included but 
that a feature is *not* included.

> If you omit [] then you've burnt the bridge (so to speak) and your only 
> recourse is to add a standardized "getNth()" to every single collection 
> which clutters the interface, hinders integration with third-party 
> collections and algorithms, and is likely to still suffer from idiots who 
> think that "get Nth element" is always better than O(n) (see below).

I'd reserve getNth for linked lists only, if I implemented it at all.  It is 
a useless feature.  The only common feature for all containers should be 
iteration, because 'iterate next element' is always an O(1) operation 
(amortized in the case of trees).

>> In general, I'd say if you are using lists and frequently looking up the 
>> nth value in the list, you have chosen the wrong container for the job.
>>
>
> If you're frequently looking up random elements in a list, then yes, 
> you're probably using the wrong container. But that's beside the point. 
> Even if you only do it once: If you have a collection with a natural 
> order, and you want to get the nth element, you should be able to use the 
> standard "get element at index X" notation, [].

I respectfully disagree.  For the reasons I've stated above.

> I don't care how many people go around using [] and thinking they're 
> guaranteed to get a cheap computation from it. In a language that supports 
> overloading of [], the [] means "get the element at key/index X". 
> Especially in a language like D where using [] on an associative array can 
> trigger an unbounded allocation and GC run. Using [] in D (and various 
> other languages) can be expensive, period, even in the standard lib (assoc 
> array). So looking at a [] and thinking "guaranteed cheap", is incorrect, 
> period. If most people think 2+2=5, you're not going to redesign 
> arithmetic to work around that mistaken assumption.

Your assumption is that 'get the Nth element' is the only expectation for 
the opIndex interface.  My assumption is that 'get an element efficiently' 
is an important part of what opIndex implies.  We obviously disagree, 
and as I said above, neither of us is right or wrong, strictly speaking. 
It's a matter of what is intuitive to you.

Part of the problem I see with many bad designs is that the author thinks they 
see a fit for an interface, but it's not quite there.  They are so excited 
about fitting into an interface that they forget the importance of leaving 
out elements of the interface that don't make sense.  To me this is one of 
them.  An interface is a fit IMO if it fits exactly.  If you have to do 
things like implement functions that throw exceptions because they don't 
belong, or break the contract that the interface specifies, then either the 
interface is too specific, or you are not implementing the correct 
interface.

>>> If you've got a linked list, and you want to get element N, are you 
>>> *really* going to go reaching for a function named "search"? How often 
>>> do you really see a generic function named "search" or "find" that takes 
>>> a numeric index as the "to be found" parameter instead of something to 
>>> be matched against the element's value? I would argue that that would be 
>>> confusing for most people. Like I said in a different post farther down, 
>>> the implementation of a "getAtIndex()" is obviously going to work like a 
>>> search, but from "outside the box", what you're asking for is not the 
>>> same.
>>
>> If you are indexing into a tree, it is considered a binary search, if you 
>> are indexing into a hash, it is a search at some point to deal with 
>> collisions.  People don't think about indexing as being a search, but in 
>> reality it is.  A really fast search.
>>
>
> It's implemented as a search, but I'd argue that the input/output 
> specifications are different. And yes, I suppose that does put it into a 
> bit of a grey area. But I wouldn't go so far as to say that, to the 
> caller, it's the same thing, because there are differences. If you want 
> to get an element based on its position in the collection, you call one 
> function. If you want to get an element based on its content instead of 
> its position, that's another function. If you want to get the position of 
> an element based on its content or its identity, that's one or two more 
> functions (depending, of course, on whether the element is a value type or 
> reference type, respectively).

I disagree.  I view the numeric index of an ordered container as a 'key' 
into the container.  A keyed container has the ability to look up elements 
quickly with the key.

Take a quick look at dcollections' ArrayList.  It implements the Keyed 
interface, with uint as the key.  I have no key for LinkList, because I 
don't see a useful key.

>> And I don't think search would be the name of the member function, it 
>> should be something like 'getNth', which returns a cursor that points to 
>> the element.
>>
>
> Right, and outside of pure C, [] is the shorthand for and the standardized 
> name for "getNth". If someone automatically assumes [] to be a simple 
> lookup, chances are they're going to make the same assumption about 
> anything named along the lines of "getNth". After all, that's what [] 
> does, it gets the Nth.

I view [] as "getByIndex", index being a value that offers quick access to 
elements.  There is no implied 'get the nth element'.  Look at an 
associative array.  If I had a string[string] array, what would you expect 
to get if you passed an integer as the index?

So good luck with your linked-list-should-be-indexed battle.  I shall not be 
posting on this again.

-Steve
August 29, 2008
Re: Prototypes (was: Why Strings as Classes?)
On 2008-08-28 19:33:52 +0200, "Manfred_Nowak" <svv1999@hotmail.com> said:

> Fawzi Mohamed wrote:
> 
>> I think
> 
> Sorry. Although I cancelled my posting within seconds, you grabbed it
> even faster.
> 
> -manfred

Well, out of curiosity, how do you cancel a post? (That way I could have 
removed mine as well...)

Fawzi
August 29, 2008
Re: Why Strings as Classes?
Steven Schveighoffer wrote:
> "Nick Sabalausky" wrote
>> "Don" <nospam@nospam.com.au> wrote in message 
>> news:g95td3$2tu0$1@digitalmars.com...
>>> Nick Sabalausky wrote:
>>>> "Don" <nospam@nospam.com.au> wrote in message 
>>>> news:g95ks5$2aon$1@digitalmars.com...
>>>>> Nick Sabalausky wrote:
>>>>>> "Dee Girl" <deegirl@noreply.com> wrote in message 
>>>>>> news:g94j7a$2875$1@digitalmars.com...
>>>>>>> I appreciate your view point. Please allow me explain. The view point 
>>>>>>> is in opposition with STL. In STL each algorithm defines what kind of 
>>>>>>> iterator it operates with. And it requires what iterator complexity.
>>>>>>>
>>>>>>> I agree that other design can be made. But STL has that design. In my 
>>>>>>> opinion is much part of what make STL so successful.
>>>>>>>
>>>>>>> I disagree that algorithm that knows complexity of iterator is 
>>>>>>> concrete. I think exactly contrary. Maybe it is good that you read 
>>>>>>> book about STL by Josuttis. STL algorithms are the most generic I 
>>>>>>> ever find in any language. I hope std.algorithm in D will be better. 
>>>>>>> But right now std.algorithm works only with array.
>>>>>>>
>>>>>>>> If an algorithm uses [] and doesn't know the
>>>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't 
>>>>>>>> care. It's
>>>>>>>> the code that sends the collection to the algorithm that knows and 
>>>>>>>> cares.
>>>>>>> I think this is mistake. Algorithm should know. Otherwise "linear 
>>>>>>> find" is not "linear find"! It is "quadratic find". If you 
>>>>>>> want to define something called linear find then you must know 
>>>>>>> iterator complexity.
>>>>>>>
>>>>>> If a generic algorithm describes itself as "linear find" then I know 
>>>>>> damn well that it's referring to the behavior of *just* the function 
>>>>>> itself, and is not a statement that the function *combined* with the 
>>>>>> behavior of the collection and/or a custom comparison is always going 
>>>>>> to be O(n).
>>>>>>
>>>>>> A question about STL: If I create a collection that, internally, is 
>>>>>> like a linked list, but starts each indexing operation from the 
>>>>>> position of the last indexing operation (so that a "find first" would 
>>>>>> run in O(n) instead of O(n*n)), is it possible to send that collection 
>>>>>> to STL's generic "linear find first"? I would argue that it should 
>>>>>> somehow be possible *even* if the STL's generic "linear find first" 
>>>>>> guarantees a *total* performance of O(n) (Since, in this case, it 
>>>>>> would still be O(n) anyway). Because otherwise, the STL wouldn't be 
>>>>>> very extendable, which would be a bad thing for a library of "generic" 
>>>>>> algorithms.
>>>>> Yes, it will work.
>>>>>
>>>>>> Another STL question: Is it possible to use STL to do a "linear find" 
>>>>>> using a custom comparison? If so, is it possible to make STL's "linear 
>>>>>> find" function use a comparison that just happens to be O(n)? If so, 
>>>>>> doesn't that violate the linear-time guarantee, too? If not, how does 
>>>>>> it know that the custom comparison is O(n) instead of O(1) or O(log 
>>>>>> n)?
>>>>> This will work too.
>>>>>
>>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>> I'm not sure that's really a "guarantee" per se, but that's splitting 
>>>> hairs.
>>>>
>>>> In any case, it sounds like we're all arguing more or less the same 
>>>> point:
>>>>
>>>> Setting aside the issue of "should opIndex be used and when?", suppose I 
>>>> have the following collection interface and find function (not 
>>>> guaranteed to compile):
>>>>
>>>> interface ICollection(T)
>>>> {
>>>>     T getElement(int index);
>>>>     int getSize();
>>>> }
>>>>
>>>> int find(T)(ICollection!(T) c, T elem)
>>>> {
>>>>     for(int i=0; i<c.getSize(); i++)
>>>>     {
>>>>         if(c.getElement(i) == elem)
>>>>             return i;
>>>>     }
>>>>     return -1;
>>>> }
>>>>
>>>> It sounds like STL's approach is to do something roughly like that and 
>>>> say:
>>>>
>>>> "find()'s parameter 'c' should be an ICollection for which getElement() 
>>>> is O(1), in which case find() is guaranteed to be O(n)"
>>>>
>>>> What I've been advocating is, again, doing something like the code above 
>>>> and saying:
>>>>
>>>> "find()'s complexity is dependent on the complexity of the ICollection's 
>>>> getElement(). If getElement()'s complexity is O(m), then find()'s 
>>>> complexity is guaranteed to be O(m * n). Of course, this means that the 
>>>> only way to get ideal complexity from find() is to use an ICollection 
>>>> for which getElement() is O(1)".
>>>>
>>>> But, you see, those two statements are effectively equivalent.
>>> They are. But...
>>> if you don't adhere to the conventions, your code gets really hard to 
>>> reason about.
>>>
>>> "This class has an opIndex which is in O(n). Is that OK?" Well, that 
>>> depends on what it's being used for. So you have to look at all of the 
>>> places where it is used.
>>>
>>> It's much simpler to use the convention that opIndex _must_ be fast; this 
>>> way the performance requirements for containers and algorithms are 
>>> completely decoupled from each other. It's about good design.
>>>
>> Taking a slight detour, let me ask you this... Which of the following 
>> strategies do you consider to be better:
>>
>> //-- A --
>> value = 0;
>> for(int i=1; i<=10; i++)
>> {
>>    value += i*2;
>> }
>>
>> //-- B --
>> value = sum(map(1..10, {n * 2}));
>>
>> Both strategies compute the sum of the first 10 multiples of 2.
>>
>> Strategy A makes the low-level implementation details very clear, but IMO, 
>> it comes at the expense of high-level clarity. This is because the code 
>> intermixes the high-level "what do I want to accomplish" with the low-level 
>> details.
>>
>> Strategy B much more closely resembles the high-level desired result, and 
>> thus makes the high-level intent more clear. But this comes at the cost of 
>> hiding the low-level details behind a layer of abstraction.
>>
>> I may very well be wrong on this, but from what you've said it sounds like 
>> you (as well as the other people who prefer [] to never be O(n)) are the 
>> type of coder who would prefer "Strategy A". In that case, I can 
>> completely understand your viewpoint on opIndex, even though I don't agree 
>> with it (I'm a "Strategy B" kind of person).
>>
>> Of course, if I'm wrong on that assumption, then we're back to square one 
>> ;)
> 
> For me at least, you are wrong :)  In fact, I view it the other way, you 
> shouldn't have to care about the underlying implementation, as long as the 
> runtime is well defined.  If you tell me strategy B may or may not take up 
> to O(n^2) to compute, then you bet your ass I'm not going to even touch 
> option B, 'cause I can always get O(n) time with option A :)  Your solution 
> FORCES me to care about the details, it's not so much that I want to care 
> about them.

I agree.  It's about _which_ details you want to abstract away.  I 
don't care about the internals, but I _do_ care about their complexity.
August 29, 2008
Re: Prototypes (was: Why Strings as Classes?)
Fawzi Mohamed wrote:

> how do you cancel a post?

By using a news-client, that has this feature.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it 
cannot afford to have is a sense of proportion. (Douglas Adams)