Why Strings as Classes? (page 20) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Why Strings as Classes? (page 20)

August 28, 2008

Re: Why Strings as Classes?

Posted by Nick Sabalausky
in reply to Don

Nick Sabalausky

Posted in reply to Don

"Don" <nospam@nospam.com.au> wrote in message news:g95td3$2tu0$1@digitalmars.com...
> Nick Sabalausky wrote:
>> "Don" <nospam@nospam.com.au> wrote in message news:g95ks5$2aon$1@digitalmars.com...
>>> Nick Sabalausky wrote:
>>>> "Dee Girl" <deegirl@noreply.com> wrote in message news:g94j7a$2875$1@digitalmars.com...
>>>>> I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity.
>>>>>
>>>>> I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful.
>>>>>
>>>>> I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
>>>>>
>>>>>> If an algoritm uses [] and doesn't know the
>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't
>>>>>> care. It's
>>>>>> the code that sends the collection to the algoritm that knows and
>>>>>> cares.
>>>>> I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.
>>>>>
>>>> If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).
>>>>
>>>> A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.
>>> Yes, it will work.
>>>
>>>> Another STL question: It is possible to use STL to do a "linear find" using a custom comparison? If so, it is possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>>> This will work too.
>>>
>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>
>> I'm not sure that's really a "guarantee" per se, but that's splitting hairs.
>>
>> In any case, it sounds like we're all arguing more or less the same point:
>>
>> Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):
>>
>> interface ICollection(T)
>> {
>>     T getElement(index);
>>     int getSize();
>> }
>>
>> int find(T)(ICollection(T) c, T elem)
>> {
>>     for(int i=0; i<c.size(); i++)
>>     {
>>  if(c.getElement(i) == elem)
>>             return i;
>>     }
>> }
>>
>> It sounds like STL's approach is to do something roughly like that and say:
>>
>> "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)"
>>
>> What I've been advocating is, again, doing something like the code above and saying:
>>
>> "find()'s complexity is dependant on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)".
>>
>> But, you see, those two statements are effectively equivilent.
>
> They are. But...
> if you don't adhere to the conventions, your code gets really hard to
> reason about.
>
> "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used.
>
> It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
>

Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:

//-- A --
value = 0;
for(int i=1; i<=10; i++)
{
    value += i*2;
}

//-- B --
value = sum(map(1..10, {n * 2}));

Both strategies compute the sum of the first 10 multiples of 2.

Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity. This is because the code intermixes the high-level "what I want to accomplish?" with the low-level details.

Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.

I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person).

Of course, if I'm wrong on that assumption, then we're back to square one ;)

August 28, 2008

Re: Why Strings as Classes?

Posted by Steven Schveighoffer
in reply to Nick Sabalausky

Steven Schveighoffer

Posted in reply to Nick Sabalausky

"Nick Sabalausky" wrote
> "Steven Schveighoffer" wrote in message news:g9650h$cp9$1@digitalmars.com...
>> "Nick Sabalausky" wrote
>>> "Steven Schveighoffer" wrote in message news:g95b89$1jii$1@digitalmars.com...
>>>>
>>>> When someone sees an index operator the first thought is that it is a quick lookup.
>>>
>>> This seems to be a big part of the disagreement. Personally, I think it's insane to look at [] and just assume it's a cheap lookup. That's was true in pure C where, as someone else said, it was nothing more than a shorthand for a specific pointer arithmentic operation, but carrying that idea over to more modern languages (especially one that also uses [] for associative arrays) is a big mistake.
>>
>> Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive.
>
> Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked lists's [] (if implemented) is O(1)?

You are writing this function:

void foo(IOrderedContainer cont)
{
    ....
}

IOrderedContainer implements opIndex(uint).  The problem is that you can't tell whether the object itself is a list or not, so you are powerless to make the decision as to whether the container has fast indexing.  In that case, your only choice (if speed is an issue) is to not use opIndex.

> As for the risk that could create of accidentially sending a linked list to a "search" (ie, a "search for an element which contains data X") that uses [] internally instead of iterators (but then, why wouldn't it just use iterators anyway?): I'll agree that in a case like this there should be some mechanism for automatic choosing of an algorithm, but that mechanism should be at a separate level of abstraction. There would be a function "search" that, through either RTTI or template constraints or something else, says "does collection 'c' implement ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the collection have attribute ForewardDirectionIndexingComplexity that is set equal to Complexity.Constant?", and based on that passes control to either IndexingSearch or IteratorSearch.

To me, this is a bad design.  It's my opinion, but one that is shared among many people.  You can do stuff this way, but it is not intuitive.  I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.

In general, I'd say if you are using lists and frequently looking up the nth value in the list, you have chosen the wrong container for the job.

>>  You don't see people making light switches that look like outlets, even
>> though it's possible.  You might perhaps make a library where opIndex is
>> a linear search in your list, but I would expect that people would not
>> use that indexing feature correctly.  Just as if I plug my lamp into the
>> light switch that looks like an outlet, I'd expect it to get power, and
>> be confused when it doesn't.  Except the opIndex mistake is more subtle
>> because I *do* get what I actually want, but I just am not realizing the
>> cost of it.
>>
>>> The '+' operator means "add". Addition is typically O(1). But vectors can be added, and that's an O(n) operation. Should opAdd never be used for vectors?
>>
>> As long as it's addition, I have no problem with O(n) operation (and it depends on your view of n).  Indexing implies speed, look at all other cases where indexing is used.  For addition to be proportional to the size of the element is natural and expected.
>>
>
> When people look at '+' they typically think "integer/float addition". Why would, for example, the risk of mistaking an O(n) "big int" addition for an O(1) integer/float addition be any worse than the risk of mistaking an O(n) linked list "get element at index" for an O(1) array "get element at index"?

What good are integers that can't be added?  In this case, it is not possible to have quick addition, no matter how you implement your arbitrary-precision integer.  I think the time penalty is understood and accepted.  With opIndex, the time penalty is not expected.  Like it or not, this is how many users look at it.

>>>> You can force yourself to think differently, but the reality is that most people think that because of the universal usage of square brackets (except for VB, and I feel pity for anyone who needs to use VB) to mean 'lookup by key', and usually this is only useful on objects where the lookup is quick ( < O(n) ).  Although there is no requirement, nor enforcement, the 'quick' contract is expected by the user, no matter how much docs you throw at them.
>>>>
>>>> Look, for instance, at Tango's now-deprecated LinkMap, which uses a linked-list of key/value pairs (copied from Doug Lea's implementation). Nobody in their right mind would use link map because lookups are O(n), and it's just as easy to use a TreeMap or HashMap.  Would you ever use it?
>>>>
>>>>> If you *really* need that sort of guarantee (and I can imagine it may be useful in some cases), then the implementation of the guarantee does *not* belong in the realm of "implements vs doesn't-implement a particular operator overload". Doing so is an abuse of operator overloading, since operator overloading is there for defining syntactic sugar, not for acting as a makeshift contract.
>>>>
>>>> I don't think anybody is asking for a guarantee from the compiler or any specific tool.  I think what we are saying is that violating the 'opIndex is fast' notion is bad design because you end up with users thinking they are doing something that's quick.  You end up with people posting benchmarks on your containers saying 'why does python beat the pants off your list implementation?'.  You can say 'hey, it's not meant to be used that way', but then why can the user use it that way?  A better design is to nudge the user into using a correct container for the job by only supporting operations that make sense on the collections.
>>>>
>>>> And as far as operator semantic meaning, D's operators are purposely named after what they are supposed to do.  Notice that the operator for + is opAdd, not opPlus.  This is because opAdd is supposed to mean you are performing an addition operation.  Assigning a different semantic meaning is not disallowed by the compiler, but is considered bad design. opIndex is supposed to be an index function, not a linear search.  It's not called opSearch for a reason.  Sure you can redefine it however you want semantically, but it's considered bad design. That's all we're saying.
>>>>
>>>
>>> Nobody is suggesting using [] to invoke a search (Although we have talked about using [] *in the search function's implementation*). Search means you want to get the position of a given element, or in other words, "element" -> search -> "key/index". What we're talking about is the reverse: getting the element at a given position, ie, "key/index" -> [] -> "element". It doesn't matter if it's an array, a linked list, or a super-duper-collection-from-2092: that's still indexing, not searching.
>>
>> It is a search, you are searching for the element whose key matches. When the key can be used to lookup the element efficiently, we call that indexing.  Indexing is a more constrained form of searching.
>>
>
> If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as a the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.

If you are indexing into a tree, it is considered a binary search, if you are indexing into a hash, it is a search at some point to deal with collisions.  People don't think about indexing as being a search, but in reality it is.  A really fast search.

And I don't think search would be the name of the member function, it should be something like 'getNth', which returns a cursor that points to the element.

-Steve

August 28, 2008

Re: Why Strings as Classes?

Posted by Steven Schveighoffer
in reply to Nick Sabalausky

Steven Schveighoffer

Posted in reply to Nick Sabalausky

"Nick Sabalausky" wrote
> "Don" <nospam@nospam.com.au> wrote in message news:g95td3$2tu0$1@digitalmars.com...
>> Nick Sabalausky wrote:
>>> "Don" <nospam@nospam.com.au> wrote in message news:g95ks5$2aon$1@digitalmars.com...
>>>> Nick Sabalausky wrote:
>>>>> "Dee Girl" <deegirl@noreply.com> wrote in message news:g94j7a$2875$1@digitalmars.com...
>>>>>> I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity.
>>>>>>
>>>>>> I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful.
>>>>>>
>>>>>> I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
>>>>>>
>>>>>>> If an algoritm uses [] and doesn't know the
>>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't
>>>>>>> care. It's
>>>>>>> the code that sends the collection to the algoritm that knows and
>>>>>>> cares.
>>>>>> I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.
>>>>>>
>>>>> If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).
>>>>>
>>>>> A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.
>>>> Yes, it will work.
>>>>
>>>>> Another STL question: It is possible to use STL to do a "linear find" using a custom comparison? If so, it is possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>>>> This will work too.
>>>>
>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>
>>> I'm not sure that's really a "guarantee" per se, but that's splitting hairs.
>>>
>>> In any case, it sounds like we're all arguing more or less the same point:
>>>
>>> Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):
>>>
>>> interface ICollection(T)
>>> {
>>>     T getElement(index);
>>>     int getSize();
>>> }
>>>
>>> int find(T)(ICollection(T) c, T elem)
>>> {
>>>     for(int i=0; i<c.size(); i++)
>>>     {
>>>  if(c.getElement(i) == elem)
>>>             return i;
>>>     }
>>> }
>>>
>>> It sounds like STL's approach is to do something roughly like that and say:
>>>
>>> "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)"
>>>
>>> What I've been advocating is, again, doing something like the code above and saying:
>>>
>>> "find()'s complexity is dependant on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)".
>>>
>>> But, you see, those two statements are effectively equivilent.
>>
>> They are. But...
>> if you don't adhere to the conventions, your code gets really hard to
>> reason about.
>>
>> "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used.
>>
>> It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
>>
>
> Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:
>
> //-- A --
> value = 0;
> for(int i=1; i<=10; i++)
> {
>    value += i*2;
> }
>
> //-- B --
> value = sum(map(1..10, {n * 2}));
>
> Both strategies compute the sum of the first 10 multiples of 2.
>
> Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity. This is because the code intermixes the high-level "what I want to accomplish?" with the low-level details.
>
> Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.
>
> I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person).
>
> Of course, if I'm wrong on that assumption, then we're back to square one ;)

For me at least, you are wrong :)  In fact, I view it the other way, you shouldn't have to care about the underlying implementation, as long as the runtime is well defined.  If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :)  Your solution FORCES me to care about the details, it's not so much that I want to care about them.

-Steve

August 28, 2008

Re: Why Strings as Classes?

Posted by Robert Fraser
in reply to Nick Sabalausky

Robert Fraser

Posted in reply to Nick Sabalausky

Nick Sabalausky wrote:
> Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:
> 
> //-- A --
> value = 0;
> for(int i=1; i<=10; i++)
> {
>     value += i*2;
> }
> 
> //-- B --
> value = sum(map(1..10, {n * 2}));
> 
> Both strategies compute the sum of the first 10 multiples of 2.
> 
> Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity. This is because the code intermixes the high-level "what I want to accomplish?" with the low-level details.
> 
> Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.

Didn't read the rest of the discussion, but I disagree here... Most programmers learn iterative languages first, and anyone whose taken Computer Science 101 can figure out what's going on in A. B takes a second to think about. I'm not into the zen of FP for sure, and that probably makes me a worse programmer... but I bet you if you take a random candidate for a development position, she'll be more likely to figure out (and write) A than B. [That may be projection; I haven't seen/done any studies]

The big problem IMO is the number of primitive things you need to understand. In A, you need to understand variables, looping and arithmetic operations. In B, you need to understand and think about closures/scoping, lists, the "map" function, aggregate functions, function compositions, and arithmetic operations. What hit me when first looking at it "where the **** did n come from?"

I'm not saying the functional style isn't perfect for a lot of things, I'm just saying that this not one of them.

August 28, 2008

Re: Why Strings as Classes?

Posted by Nick Sabalausky
in reply to Steven Schveighoffer

Nick Sabalausky

Posted in reply to Steven Schveighoffer

"Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:g96vmq$3cc$1@digitalmars.com...
> "Nick Sabalausky" wrote
>> "Steven Schveighoffer" wrote in message news:g9650h$cp9$1@digitalmars.com...
>>> Perhaps it is a mistake to assume it, but it is a common mistake. And the expectation is intuitive.
>>
>> Why in the world would any halfway competent programmer ever look at a *linked list* and assume that the linked lists's [] (if implemented) is O(1)?
>
> You are writing this function:
>
> void foo(IOrderedContainer cont)
> {
>    ....
> }
>
> IOrderedContainer implements opIndex(uint).  The problem is that you can't tell whether the object itself is a list or not, so you are powerless to make the decision as to whether the container has fast indexing.  In that case, your only choice (if speed is an issue) is to not use opIndex.
>

Ok, so you want foo() to be able to tell if the collection has fast or slow indexing. What are you suggesting that foo() does when the collection does have slow indexing?

1. Should it fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []?

Well then how does foo know that it's the most important, most frequent thing being done on the collection? Suppose foo is something that needs to access elements in a somewhat random order, ie the kind of thing that lists are poorly suited for. Further suppose that collection C is some set of data that *usually* just gets insertions and deletions at nodes that the code already has direct references to. Further suppose that foo does *need* to be run on the collection, *but* very infrequently. So, should I *really* be forced to make C a collection that trades good insertion/deletion complexity for good indexing complexity, just because the occasionally-run foo() doesn't like it? And what if I want to run benchmarks to test what collection works best, in real-world use, for C? Should foo's intolerance of slow-indexing collections really be able force me to exclude testing of such collections?

2. Should foo revert to an alternate branch of code that doesn't use []?

This behavior can be implemented via interfaces like I described. The benefit of that is that [] can still serve as the shorthand it's intended for (see below) and you never need to introduce the inconsistency of "Gee, how do I get the Nth element of a collection?" "Well, on some collections it's getNth(), and on other collections it's []."

3. Something else?

>> As for the risk that could create of accidentially sending a linked list to a "search" (ie, a "search for an element which contains data X") that uses [] internally instead of iterators (but then, why wouldn't it just use iterators anyway?): I'll agree that in a case like this there should be some mechanism for automatic choosing of an algorithm, but that mechanism should be at a separate level of abstraction. There would be a function "search" that, through either RTTI or template constraints or something else, says "does collection 'c' implement ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the collection have attribute ForewardDirectionIndexingComplexity that is set equal to Complexity.Constant?", and based on that passes control to either IndexingSearch or IteratorSearch.
>
> To me, this is a bad design.  It's my opinion, but one that is shared among many people.  You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.
>

Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in certain specific sub-optimal a manner might actually be necessary and/or acceptable?

If you omit [] then you've burnt the bridge (so to speak) and your only recourse is to add a standardized "getNth()" to every single collection which clutters the interface, hinders integration with third-party collections and algorithms, and is likely to still suffer from idiots who think that "get Nth element" is always better than O(n) (see below).

> In general, I'd say if you are using lists and frequently looking up the nth value in the list, you have chosen the wrong container for the job.
>

If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, [].

I don't care how many people go around using [] and thinking they're guaranteed to get a cheap computation from it. In a language that supports overloading of [], the [] means "get the element at key/index X". Especially in a language like D where using [] on an associative array can trigger an unbounded allocation and GC run. Using [] in D (and various other languages) can be expensive, period, even in the standard lib (assoc array). So looking at a [] and thinking "guaranteed cheap", is incorrect, period. If most people think 2+2=5, you're not going to redesign arithmetic to work around that mistaken assumption.

>>
>> When people look at '+' they typically think "integer/float addition". Why would, for example, the risk of mistaking an O(n) "big int" addition for an O(1) integer/float addition be any worse than the risk of mistaking an O(n) linked list "get element at index" for an O(1) array "get element at index"?
>
> What good are integers that can't be added?  In this case, it is not possible to have quick addition, no matter how you implement your arbitrary-precision integer.  I think the time penalty is understood and accepted.  With opIndex, the time penalty is not expected.  Like it or not, this is how many users look at it.
>
>>
>> If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as a the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.
>
> If you are indexing into a tree, it is considered a binary search, if you are indexing into a hash, it is a search at some point to deal with collisions.  People don't think about indexing as being a search, but in reality it is.  A really fast search.
>

It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want get an element based on it's position in the collection, you call one function. If you want to get an element based on it's content instead of it's position, that's another function. If you want to get the position of an element based on it's content or it's identity, that's one or two more functions (depending, of course, if the element is a value type or reference type, respectively).

> And I don't think search would be the name of the member function, it should be something like 'getNth', which returns a cursor that points to the element.
>

Right, and outside of pure C, [] is the shorthand for and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does, it gets the Nth.

August 29, 2008

Re: Why Strings as Classes?

Posted by Sergey Gromov
in reply to Robert Fraser

Sergey Gromov

Posted in reply to Robert Fraser

Robert Fraser <fraserofthenight@gmail.com> wrote:
> First, a hash access is already time unbounded. hash["hello"] where "hello" is not already in the hash will create a hash entry for hello. This requires heap allocation, which can take arbitrarily long. So having unbounded opIndex is in the language already!

Hash's opIndex() throws an ArrayBoundsError if given an unknown key. It's opIndexAssign() which allocates.

-- 
SnakE

August 29, 2008

Re: Why Strings as Classes?

Posted by Steven Schveighoffer
in reply to Nick Sabalausky

Steven Schveighoffer

Posted in reply to Nick Sabalausky

"Nick Sabalausky" wrote
> Ok, so you want foo() to be able to tell if the collection has fast or slow indexing. What are you suggesting that foo() does when the collection does have slow indexing?

No, I don't want to be able to tell.  I don't want to HAVE to be able to tell.  In my ideal world, the collection does not implement opIndex unless it is fast, so there is no issue.  i.e. you cannot call foo with a linked list.

I'm really tired of this argument, you understand my point of view, I understand yours.  To you, the syntax sugar is more important than the complexity guarantees.  To me, what the syntax intuitively means should be what it does.  So I'll develop my collections library and you develop yours, fair enough?  I don't think either of us is right or wrong in the strict sense of the terms.

To be fair, I'll answer your other points as you took the time to write them.  And then I'm done.  I can't really be any clearer as to what I believe is the best design.

> 1. Should it fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []?

No, foo will always compile because opIndex should always be fast, and then I can specify the complexity of foo without worry.

Using an O(n) lookup operation should be more painful because it requires more time.  It makes users use it less.

> 2. Should foo revert to an alternate branch of code that doesn't use []?
>
> This behavior can be implemented via interfaces like I described. The benefit of that is that [] can still serve as the shorthand it's intended for (see below) and you never need to introduce the inconsistency of "Gee, how do I get the Nth element of a collection?" "Well, on some collections it's getNth(), and on other collections it's []."

I believe that you shouldn't really ever be calling getNth on a link-list, and if you are, it should be a red flag, like a cast.

Furthermore [] isn't always equivalent to getNth, see below.

>>> As for the risk that could create of accidentially sending a linked list to a "search" (ie, a "search for an element which contains data X") that uses [] internally instead of iterators (but then, why wouldn't it just use iterators anyway?): I'll agree that in a case like this there should be some mechanism for automatic choosing of an algorithm, but that mechanism should be at a separate level of abstraction. There would be a function "search" that, through either RTTI or template constraints or something else, says "does collection 'c' implement ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the collection have attribute ForewardDirectionIndexingComplexity that is set equal to Complexity.Constant?", and based on that passes control to either IndexingSearch or IteratorSearch.
>>
>> To me, this is a bad design.  It's my opinion, but one that is shared among many people.  You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.
>>
>
> Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in certain specific sub-optimal a manner might actually be necessary and/or acceptable?

It's not premature optimization, it's not offering a feature that has little or no use.  It's like any contract for any object, you only want to define the interface for which your object is designed.  A linked list should not have an opIndex because it's not designed to be indexed.

If I designed a new car with which you could steer each front wheel independently, would that make you buy it?  It's another feature that the car has that other cars don't.  Who cares if it's useful, its another *feature*!  Sometimes a good design is not that a feature is included but that a feature is *not* included.

> If you omit [] then you've burnt the bridge (so to speak) and your only recourse is to add a standardized "getNth()" to every single collection which clutters the interface, hinders integration with third-party collections and algorithms, and is likely to still suffer from idiots who think that "get Nth element" is always better than O(n) (see below).

I'd reserve getNth for linked lists only, if I implemented it at all.  It is a useless feature.  The only common feature for all containers should be iteration, because 'iterate next element' is always an O(1) operation (amortized in the case of trees).

>> In general, I'd say if you are using lists and frequently looking up the nth value in the list, you have chosen the wrong container for the job.
>>
>
> If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, [].

I respectfully disagree.  For the reasons I've stated above.

> I don't care how many people go around using [] and thinking they're guaranteed to get a cheap computation from it. In a language that supports overloading of [], the [] means "get the element at key/index X". Especially in a language like D where using [] on an associative array can trigger an unbounded allocation and GC run. Using [] in D (and various other languages) can be expensive, period, even in the standard lib (assoc array). So looking at a [] and thinking "guaranteed cheap", is incorrect, period. If most people think 2+2=5, you're not going to redesign arithmetic to work around that mistaken assumption.

Your assumption is that 'get the Nth element' is the only expectation for opIndex interface.  My assumption is that opIndex implies 'get an element efficiently' is an important part of the interface.  We obviously disagree, and as I said above, neither of us is right or wrong, strictly speaking. It's a matter of what is intuitive to you.

Part of the problems I see with many bad designs is the author thinks they see a fit for an interface, but it's not quite there.  They are so excited about fitting into an interface that they forget the importance of leaving out elements of the interface that don't make sense.  To me this is one of them.  An interface is a fit IMO if it fits exactly.  If you have to do things like implement functions that throw exceptions because they don't belong, or break the contract that the interface specifies, then either the interface is too specific, or you are not implementing the correct interface.

>>> If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as a the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.
>>
>> If you are indexing into a tree, it is considered a binary search, if you are indexing into a hash, it is a search at some point to deal with collisions.  People don't think about indexing as being a search, but in reality it is.  A really fast search.
>>
>
> It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want get an element based on it's position in the collection, you call one function. If you want to get an element based on it's content instead of it's position, that's another function. If you want to get the position of an element based on it's content or it's identity, that's one or two more functions (depending, of course, if the element is a value type or reference type, respectively).

I disagree.  I view the numeric index of an ordered container as a 'key' into the container.  A keyed container has the ability to look up elements quickly with the key.

Take a quick look at dcollections' ArrayList.  It implements the Keyed interface, with uint as the key.  I have no key for LinkList, because I don't see a useful key.

>> And I don't think search would be the name of the member function, it should be something like 'getNth', which returns a cursor that points to the element.
>>
>
> Right, and outside of pure C, [] is the shorthand for and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does, it gets the Nth.

I view [] as "getByIndex", index being a value that offers quick access to elements.  There is no implied 'get the nth element'.  Look at an associative array.  If I had a string[string] array, what would you expect to get if you passed an integer as the index?

So good luck with your linked-list-should-be-indexed battle.  I shall not be posting on this again.

-Steve

August 29, 2008

Re: Prototypes (was: Why Strings as Classes?)

Posted by Fawzi Mohamed
in reply to Manfred_Nowak

Fawzi Mohamed

Posted in reply to Manfred_Nowak

On 2008-08-28 19:33:52 +0200, "Manfred_Nowak" <svv1999@hotmail.com> said:

> Fawzi Mohamed wrote:
> 
>> I think
> 
> Sorry. Although I cancelled my posting within seconds, you grabbed it
> even faster.
> 
> -manfred

well out of curiosity how do you cancel a post? (that way I could have removed also mine...

Fawzi

August 29, 2008

Re: Why Strings as Classes?

Posted by Don
in reply to Steven Schveighoffer

Don

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> "Nick Sabalausky" wrote
>> "Don" <nospam@nospam.com.au> wrote in message news:g95td3$2tu0$1@digitalmars.com...
>>> Nick Sabalausky wrote:
>>>> "Don" <nospam@nospam.com.au> wrote in message news:g95ks5$2aon$1@digitalmars.com...
>>>>> Nick Sabalausky wrote:
>>>>>> "Dee Girl" <deegirl@noreply.com> wrote in message news:g94j7a$2875$1@digitalmars.com...
>>>>>>> I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity.
>>>>>>>
>>>>>>> I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful.
>>>>>>>
>>>>>>> I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
>>>>>>>
>>>>>>>> If an algoritm uses [] and doesn't know the
>>>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't care. It's
>>>>>>>> the code that sends the collection to the algoritm that knows and cares.
>>>>>>> I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.
>>>>>>>
>>>>>> If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).
>>>>>>
>>>>>> A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.
>>>>> Yes, it will work.
>>>>>
>>>>>> Another STL question: It is possible to use STL to do a "linear find" using a custom comparison? If so, it is possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>>>>> This will work too.
>>>>>
>>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>> I'm not sure that's really a "guarantee" per se, but that's splitting hairs.
>>>>
>>>> In any case, it sounds like we're all arguing more or less the same point:
>>>>
>>>> Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):
>>>>
>>>> interface ICollection(T)
>>>> {
>>>>     T getElement(index);
>>>>     int getSize();
>>>> }
>>>>
>>>> int find(T)(ICollection(T) c, T elem)
>>>> {
>>>>     for(int i=0; i<c.size(); i++)
>>>>     {
>>>>  if(c.getElement(i) == elem)
>>>>             return i;
>>>>     }
>>>> }
>>>>
>>>> It sounds like STL's approach is to do something roughly like that and say:
>>>>
>>>> "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)"
>>>>
>>>> What I've been advocating is, again, doing something like the code above and saying:
>>>>
>>>> "find()'s complexity is dependant on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)".
>>>>
>>>> But, you see, those two statements are effectively equivilent.
>>> They are. But...
>>> if you don't adhere to the conventions, your code gets really hard to reason about.
>>>
>>> "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used.
>>>
>>> It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
>>>
>> Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:
>>
>> //-- A --
>> value = 0;
>> for(int i=1; i<=10; i++)
>> {
>>    value += i*2;
>> }
>>
>> //-- B --
>> value = sum(map(1..10, {n * 2}));
>>
>> Both strategies compute the sum of the first 10 multiples of 2.
>>
>> Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity. This is because the code intermixes the high-level "what I want to accomplish?" with the low-level details.
>>
>> Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.
>>
>> I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person).
>>
>> Of course, if I'm wrong on that assumption, then we're back to square one ;)
> 
> For me at least, you are wrong :)  In fact, I view it the other way, you shouldn't have to care about the underlying implementation, as long as the runtime is well defined.  If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :)  Your solution FORCES me to care about the details, it's not so much that I want to care about them.

I agree.  It's about _which_ details do you want to abstract away. I don't care about the internals. But I _do_ care about the complexity of them.

August 29, 2008

Re: Prototypes (was: Why Strings as Classes?)

Posted by Manfred_Nowak
in reply to Fawzi Mohamed

Manfred_Nowak

Posted in reply to Fawzi Mohamed

Fawzi Mohamed wrote:

> how do you cancel a post?

By using a news-client, that has this feature.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation