Prototypes (was: Why Strings as Classes?) (page 21) - D Programming Language Discussion Forum

August 29, 2008

Re: Prototypes (was: Why Strings as Classes?)

Posted by Michiel Helvensteijn
in reply to Manfred_Nowak

Manfred_Nowak wrote:

>> how do you cancel a post?
> 
> By using a news-client, that has this feature.

But the news-server also needs to have this feature, and not all do. (Does
this one?)

-- 
Michiel

August 29, 2008

Posted by Walter Bright
in reply to Fawzi Mohamed

Fawzi Mohamed wrote:
> I think that the invitation should be read as the possibility to experiment with some changes for AA, see their effect, and if worthwhile provide them back, so that they can be applied to the "official" version.
> 
> making the standard version changeable seems just horrible for the portability, maintainability, and clarity of the code: if the standard version is not ok for your use you should explicitly use another one, otherwise mixing codes that use two different standard versions becomes a nightmare.

I agree.

> On the other hand if you think that you can improve the standard version for everybody, changing internal/aaA.d is what you should do...

Right.

August 29, 2008

Re: Why Strings as Classes?

Posted by Nick Sabalausky
in reply to Steven Schveighoffer

 "Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message
news:g983f5$2ns2$1@digitalmars.com...
> "Nick Sabalausky" wrote
>> Ok, so you want foo() to be able to tell if the collection has fast or slow indexing. What are you suggesting that foo() does when the collection does have slow indexing?
>
> No, I don't want to be able to tell.  I don't want to HAVE to be able to tell.

You're missing the point. Since, as you say below, you want foo to not be callable with the collection since it doesn't implement opIndex, your answer is clearly "#1, The program should fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []".

> In my ideal world, the collection does not implement opIndex unless it is fast, so there is no issue.  i.e. you cannot call foo with a linked list.
>
> I'm really tired of this argument, you understand my point of view, I understand yours.

..(line split for clarity)..
> To you, the syntax sugar is more important than the complexity guarantees.

Not at all. And to that effect, I've already presented a way that we can have both syntactic sugar and, when desired, complexity guarantees. In fact, the method I presented actually provides more protection against poor complexity than your method (Since the guarantee doesn't break when faced with code from people with my viewpoint on [], which as you admit below is neither more right nor more wrong than your viewpoint on []). Just because I don't agree with your method of implementing complexity guarantees, doesn't mean I don't think they can be valuable.

> To me, what the syntax intuitively means should be what it does.

I absolutely agree that "What the syntax intuitively means should be what it does". Where we disagree is on "what the [] syntax intuitively means".

> So I'll develop my collections library and you develop yours, fair enough? I don't think either of us is right or wrong in the strict sense of the terms.
>
> To be fair, I'll answer your other points as you took the time to write them.  And then I'm done.  I can't really be any clearer as to what I believe is the best design.
>
>> 1. Should it fail to compile because foo's implementation uses [] and the slow-indexing collection doesn't implement []?
>
> No, foo will always compile because opIndex should always be fast, and then I can specify the complexity of foo without worry.
>
> Using an O(n) lookup operation should be more painful because it requires more time.  It makes users use it less.
>
>> 2. Should foo revert to an alternate branch of code that doesn't use []?
>>
>> This behavior can be implemented via interfaces like I described. The benefit of that is that [] can still serve as the shorthand it's intended for (see below) and you never need to introduce the inconsistency of "Gee, how do I get the Nth element of a collection?" "Well, on some collections it's getNth(), and on other collections it's []."
>
> I believe that you shouldn't really ever be calling getNth on a link-list, and if you are, it should be a red flag, like a cast.
>
> Furthermore [] isn't always equivalent to getNth, see below.
>

Addressed below...

>>>> As for the risk that could create of accidentally sending a linked list to a "search" (ie, a "search for an element which contains data X") that uses [] internally instead of iterators (but then, why wouldn't it just use iterators anyway?): I'll agree that in a case like this there should be some mechanism for automatic choosing of an algorithm, but that mechanism should be at a separate level of abstraction. There would be a function "search" that, through either RTTI or template constraints or something else, says "does collection 'c' implement ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the collection have attribute ForewardDirectionIndexingComplexity that is set equal to Complexity.Constant?", and based on that passes control to either IndexingSearch or IteratorSearch.
>>>
>>> To me, this is a bad design.  It's my opinion, but one that is shared among many people.  You can do stuff this way, but it is not intuitive. I'd much rather reserve opIndex to only quick lookups, and avoid the possibility of accidentally using it incorrectly.
>>>
>>
>> Preventing a collection from ever being used in a function that would typically perform poorly on that collection just smacks of premature optimization. How do you, as the collection author, know that the collection will never be used in a way such that *occasional* use in a certain specific sub-optimal manner might actually be necessary and/or acceptable?
>
> It's not premature optimization, it's not offering a feature that has little or no use.  It's like any contract for any object, you only want to define the interface for which your object is designed.  A linked list should not have an opIndex because it's not designed to be indexed.
>

Addressed below...

> If I designed a new car with which you could steer each front wheel independently, would that make you buy it?  It's another feature that the car has that other cars don't.  Who cares if it's useful, it's another *feature*!  Sometimes a good design is not that a feature is included but that a feature is *not* included.
>

So, in other words, it sounds like you're saying that in my scenario above, you think that a linked list should not be usable, even if it is faster in the greater context (Without actually saying so directly). Or do you claim that the scenario can never happen?

>> If you omit [] then you've burnt the bridge (so to speak) and your only recourse is to add a standardized "getNth()" to every single collection which clutters the interface, hinders integration with third-party collections and algorithms, and is likely to still suffer from idiots who think that "get Nth element" is always better than O(n) (see below).
>
> I'd reserve getNth for linked lists only, if I implemented it at all.  It is a useless feature.  The only common feature for all containers should be iteration, because 'iterate next element' is always an O(1) operation (amortized in the case of trees).
>
>>> In general, I'd say if you are using lists and frequently looking up the nth value in the list, you have chosen the wrong container for the job.
>>>
>>
>> If you're frequently looking up random elements in a list, then yes, you're probably using the wrong container. But that's beside the point. Even if you only do it once: If you have a collection with a natural order, and you want to get the nth element, you should be able to use the standard "get element at index X" notation, [].
>
> I respectfully disagree.  For the reasons I've stated above.
>
>> I don't care how many people go around using [] and thinking they're guaranteed to get a cheap computation from it. In a language that supports overloading of [], the [] means "get the element at key/index X". Especially in a language like D where using [] on an associative array can trigger an unbounded allocation and GC run. Using [] in D (and various other languages) can be expensive, period, even in the standard lib (assoc array). So looking at a [] and thinking "guaranteed cheap", is incorrect, period. If most people think 2+2=5, you're not going to redesign arithmetic to work around that mistaken assumption.
>
> Your assumption is that 'get the Nth element' is the only expectation for opIndex interface.  My assumption is that opIndex implies 'get an element efficiently' is an important part of the interface.  We obviously disagree, and as I said above, neither of us is right or wrong, strictly speaking. It's a matter of what is intuitive to you.
>
> Part of the problems I see with many bad designs is the author thinks they see a fit for an interface, but it's not quite there.  They are so excited about fitting into an interface that they forget the importance of leaving out elements of the interface that don't make sense.  To me this is one of them.  An interface is a fit IMO if it fits exactly.  If you have to do things like implement functions that throw exceptions because they don't belong, or break the contract that the interface specifies, then either the interface is too specific, or you are not implementing the correct interface.
>

(From the above "Addressed below..."'s)

I fully agree that leaving the wrong things out of an interface is just as important as putting the right things in. But I don't think that's applicable here.

An array can do anything a linked list can do (even insert). A linked list can do anything an array can do (even sort). They are both capable of the same exact set of basic operations (insert, delete, get at position, get position of, append, iterate, etc.). The only thing that ever differs is how well each type of collection scales on each of those basic operations. The *whole point* of having both arrays and linked lists is that they provide different performance tradeoffs, not that they "implement different interfaces", because obviously they're all capable of doing the same things. It's the performance tradeoffs that are the whole point of "array vs linked list". But it's rarely as simple as just looking at the basic operations individually...

It's rare that a collection would ever be used for just one basic operation. What's the point of sorting a collection if you're never going to insert anything into it? What's the point of inserting data if you're never going to retrieve any? In most cases, you're going to be doing multiple types of operations on the collection, therefore the choice of collection becomes "Which set of tradeoffs is the most worthwhile for my overall usage patterns?"

You can speculate and analyze all you want about the usage patterns and the appropriate tradeoffs, and that's good, you certainly should. But it ultimately comes down to the real-world test: profiling. And if you're profiling, you're going to want to compare the performance of different types of collections. And if you're going to do that, why should you prevent yourself from making it a one-line change ("Vector myBunchOfStuff" <-> "List myBunchOfStuff")? Fear of someone using an array for an insert-intensive purpose, or a list for a random-access-intensive purpose, would have driven you to design your code in a way that forces a single change of type to (in many cases) be an all-out refactoring, and it'll be the type of refactoring that no automatic refactoring tool is going to do for you.
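To make the "one-line change" concrete, here is a minimal sketch in D. The `Vector` and `List` types, their members, and the `total` function are hypothetical stand-ins written for this illustration, not any real library's API; the only point is that both containers expose the same `[]` and `length`, so swapping the alias is the whole change.

```d
import std.stdio : writeln;

// Hypothetical array-backed container: [] is O(1).
struct Vector(T)
{
    private T[] items;

    void append(T value) { items ~= value; }
    size_t length() const { return items.length; }
    T opIndex(size_t i) const { return items[i]; }
}

// Hypothetical singly linked list: [] walks from the head, so it is O(n).
struct List(T)
{
    private static struct Node { T value; Node* next; }
    private Node* head, tail;
    private size_t count;

    void append(T value)
    {
        auto node = new Node(value, null);
        if (tail is null) head = node; else tail.next = node;
        tail = node;
        ++count;
    }

    size_t length() const { return count; }

    T opIndex(size_t i) const
    {
        assert(i < count, "index out of range");
        const(Node)* p = head;
        foreach (_; 0 .. i) p = p.next;
        return p.value;
    }
}

// The single line to change when profiling one container against the other.
alias Stuff = Vector!int;   // or: alias Stuff = List!int;

// Client code written against [] and length only.
int total(C)(ref C c)
{
    int sum = 0;
    foreach (i; 0 .. c.length) sum += c[i];
    return sum;
}

void main()
{
    Stuff myBunchOfStuff;
    foreach (n; [3, 1, 4]) myBunchOfStuff.append(n);
    writeln(total(myBunchOfStuff));
}
```

With `List!int`, `total` degrades to O(n^2), which is exactly the cost the other side of this argument wants to rule out; the sketch only shows that the swap itself requires no refactoring.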

And suppose you do successfully find that optimal container, through your method or mine. Then a program feature/requirement is changed/added/removed, and all of a sudden, the usage patterns have changed! Now you get to do it all again! Major refactor then profile or change a line then profile?

You're looking at guaranteeing the performance of very narrow slices of a program. I'll agree that can be useful in some cases (hence, my proposal for how to implement performance guarantees). But in many cases, that's effectively a "taken out of context" fallacy and can lead to trouble.
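The proposal mentioned here (quoted earlier in this post: a `search` that checks an indexing-complexity attribute and hands off to either an indexing search or an iterator search) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Complexity` enum, the `indexingComplexity` member, and the two toy containers are hypothetical, and compile-time `static if` is just one of the mechanisms the quoted text allows for.

```d
import std.stdio : writeln;

enum Complexity { constant, linear }

// Hypothetical array-backed container advertising O(1) indexing.
struct ArrayBox
{
    enum indexingComplexity = Complexity.constant;
    int[] items;

    size_t length() const { return items.length; }
    int opIndex(size_t i) const { return items[i]; }
}

// Hypothetical linked list: still offers [], but advertises it as O(n).
struct ListBox
{
    enum indexingComplexity = Complexity.linear;
    static struct Node { int value; Node* next; }
    Node* head;

    void prepend(int v) { head = new Node(v, head); }

    int opIndex(size_t i) const
    {
        const(Node)* p = head;
        foreach (_; 0 .. i) p = p.next;
        return p.value;
    }

    // Minimal forward range so the list can be iterated in O(n) total.
    static struct Range
    {
        const(Node)* p;
        bool empty() const { return p is null; }
        int front() const { return p.value; }
        void popFront() { p = p.next; }
    }
    Range opSlice() const { return Range(head); }
}

// Uses [] directly; only sensible when indexing is cheap.
ptrdiff_t indexingSearch(C)(ref C c, int needle)
{
    foreach (i; 0 .. c.length)
        if (c[i] == needle) return cast(ptrdiff_t) i;
    return -1;
}

// Walks the collection once, never touching [].
ptrdiff_t iteratorSearch(C)(ref C c, int needle)
{
    ptrdiff_t i = 0;
    foreach (x; c[])
    {
        if (x == needle) return i;
        ++i;
    }
    return -1;
}

// The separate level of abstraction: pick the algorithm from the metadata.
ptrdiff_t search(C)(ref C c, int needle)
{
    static if (C.indexingComplexity == Complexity.constant)
        return indexingSearch(c, needle);
    else
        return iteratorSearch(c, needle);
}

void main()
{
    auto a = ArrayBox([10, 20, 30]);
    ListBox l;
    foreach (v; [30, 20, 10]) l.prepend(v);   // list order: 10, 20, 30

    writeln(search(a, 30));
    writeln(search(l, 30));
}
```

Both calls stay O(n) overall, and the list keeps its `[]` for the occasional one-off lookup.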

>>>> If you've got a linked list, and you want to get element N, are you *really* going to go reaching for a function named "search"? How often do you really see a generic function named "search" or "find" that takes a numeric index as the "to be found" parameter instead of something to be matched against the element's value? I would argue that that would be confusing for most people. Like I said in a different post farther down, the implementation of a "getAtIndex()" is obviously going to work like a search, but from "outside the box", what you're asking for is not the same.
>>>
>>> If you are indexing into a tree, it is considered a binary search, if you are indexing into a hash, it is a search at some point to deal with collisions.  People don't think about indexing as being a search, but in reality it is.  A really fast search.
>>>
>>
>> It's implemented as a search, but I'd argue that the input/output specifications are different. And yes, I suppose that does put it into a bit of a grey area. But I wouldn't go so far as to say that, to the caller, it's the same thing, because there are differences. If you want to get an element based on its position in the collection, you call one function. If you want to get an element based on its content instead of its position, that's another function. If you want to get the position of an element based on its content or its identity, that's one or two more functions (depending, of course, if the element is a value type or reference type, respectively).
>
> I disagree.  I view the numeric index of an ordered container as a 'key' into the container.  A keyed container has the ability to look up elements quickly with the key.
>
> Take a quick look at dcollections' ArrayList.  It implements the Keyed interface, with uint as the key.  I have no key for LinkList, because I don't see a useful key.
>
>>> And I don't think search would be the name of the member function, it should be something like 'getNth', which returns a cursor that points to the element.
>>>
>>
>> Right, and outside of pure C, [] is the shorthand for and the standardized name for "getNth". If someone automatically assumes [] to be a simple lookup, chances are they're going to make the same assumption about anything named along the lines of "getNth". After all, that's what [] does, it gets the Nth.
>
> I view [] as "getByIndex", index being a value that offers quick access to elements.  There is no implied 'get the nth element'.  Look at an associative array.  If I had a string[string] array, what would you expect to get if you passed an integer as the index?
>

You misunderstand. I'm well aware of the sequentially-indexed array vs associative array issues. I was just using "sequentially-indexed array" terminology to avoid cluttering the explanations with more general terms that would have distracted from bigger points. By "getNth", what I was getting at was "getByPosition". Maybe I should have been saying "getByPosition" from the start, my mistake. As you can see, I still consider the key of an associative array to be its position. I'll explain why:

An associative array is the dynamic/runtime equivalent of a static/compiletime named variable (After all, in many dynamic languages, like PHP (not that I like PHP), named variables literally are keys into an implicit associative array). In a typical static or dynamic language, all variables are essentially made up of two parts: The raw data and a label. The label, obviously, is what's used to refer to the data. The label can be one of two things, an identifier or (in a non-sandboxed language) a dereferenced memory address.

So, borrowing the usual pointer metaphor of "memory as a series of labeled boxes", we can have the data "7" in the 0xA04D6'th "box" which is also labeled with the identifier "myInt". The memory address, obviously, is the position of the data. The identifier is another way to refer to the same position. "CPU: Where should I put this 7?" "High-level Code: In the location labeled with the identifier myInt".

The data of a variable corresponds to an element of any collection (array, assoc array, list). The memory addresses not only correspond to, but literally are sequential indices into the array of addressable memory (ie, the key/position in a sequentially-indexed array). The identifier corresponds to the key of an associative array or other such collection. "CPU: Where, within the assoc array, should I put this 7?" "High-level Code: In the assoc array's box/element labeled myInt"

(With a linked list, of course, there's nothing that corresponds to the key of an assoc array, but it does have a natural sequential order.)

Maybe I can explain the "sorting" distinction I see a little bit better with our terminology hopefully now in closer sync: For any collection, each element has a concept of position (index/key/nth/whatever) and a concept of data. A collection is a series of "boxes". On the outside of each box is a label (position/index/key/nth/whatever). On the inside of each box is data. If the collection's base type is a reference type, then this "inside data" is, of course, a pointer/reference to more data somewhere else. There are two basic conceptual operations: "outside label -> inside data", and "inside data -> outside label".

The "inside data -> outside label" is always a search (although if the inside data contains a cached copy of its outside label, then that's somewhat of a grey area. Personally, I would count it as a "cached search": usable just like a search, but faster).

The "outside label -> inside data" is, of course, our disputed "getAtPosition". In a linked list, it's a grey area similar to what I called a "cached search" above. It's usable like an ordinary "getAtPosition", but slower. Sure, the implementation is done via a search algorithm, but if you call it a search that means that for a linked list, "getAtPosition" and search are the same thing (for whatever that implies, I don't have time to go any further on that ATM, so take it as you will).

I do understand though, that you're defining "index" and "search" essentially as "fast" and "slow" versions (respectively) of "X" -> "Y" regardless of which of X or Y is "outside label" and which is "inside data". Personally, I find that awkward and somewhat less useful since that means "index" and "search" each have multiple "input vs. output" behaviors (Ie, there's still the question of "Am I giving the outside position and getting the inside data, or vice versa?").
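The two conceptual operations can be sketched in D for a sequential array and an associative array. The function names `getAtPosition` and `positionOf` follow the terminology used in this post, but their signatures are assumptions invented for this illustration.

```d
import std.stdio : writeln;

// "outside label -> inside data": the same [] notation serves both
// a sequentially-indexed array and an associative array.
auto getAtPosition(C, K)(C c, K label)
{
    return c[label];
}

// "inside data -> outside label" on an array: always a search.
// Returns true and stores the index in `label` when the data is found.
bool positionOf(T)(T[] c, T data, out size_t label)
{
    foreach (i, x; c)
        if (x == data) { label = i; return true; }
    return false;
}

// The same operation on an associative array: the label is the key.
bool positionOf(K, V)(V[K] c, V data, out K label)
{
    foreach (k, v; c)
        if (v == data) { label = k; return true; }
    return false;
}

void main()
{
    int[] boxes = [7, 8, 9];
    int[string] named = ["myInt": 7];

    writeln(getAtPosition(boxes, 1));        // label 1 -> data
    writeln(getAtPosition(named, "myInt"));  // label "myInt" -> data

    size_t pos;
    string key;
    writeln(positionOf(boxes, 9, pos), " ", pos);   // data 9 -> label
    writeln(positionOf(named, 7, key), " ", key);   // data 7 -> label
}
```

In both containers the direction of the lookup, not its speed, decides which function is called, which is the distinction argued for above.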

August 29, 2008

Posted by Manfred_Nowak
in reply to Walter Bright

Walter Bright wrote:

>  use two different standard versions becomes a nightmare.
> I agree.

I retracted my posting immediately because it wasn't well thought out. However, the last thing I wanted was to have "several" "standard" versions.

So we all agree on this.

But even when I read my retracted posting again, I cannot imagine how one could come to the conclusion that I wanted to have several.

> 
>> On the other hand if you think that you can improve the standard version for everybody, changing internal/aaA.d is what you should do...
> Right.

I wrote about that some years ago and got no answer:
   what is an improvement for everybody---or
   what is the general usage?

Without an agreed definition on that, every change will make someone else cry.

-manfred

-- 
If life is going to exist in this Universe, then the one thing it cannot afford to have is a sense of proportion. (Douglas Adams)

August 30, 2008

Re: Why Strings as Classes?

Posted by Christopher Wright
in reply to Don

Don wrote:
> Steven Schveighoffer wrote:
>> "Nick Sabalausky" wrote
>>> "Don" <nospam@nospam.com.au> wrote in message news:g95td3$2tu0$1@digitalmars.com...
>>>> Nick Sabalausky wrote:
>>>>> "Don" <nospam@nospam.com.au> wrote in message news:g95ks5$2aon$1@digitalmars.com...
>>>>>> Nick Sabalausky wrote:
>>>>>>> "Dee Girl" <deegirl@noreply.com> wrote in message news:g94j7a$2875$1@digitalmars.com...
>>>>>>>> I appreciate your view point. Please allow me explain. The view point is in opposition with STL. In STL each algorithm defines what kind of iterator it operates with. And it requires what iterator complexity.
>>>>>>>>
>>>>>>>> I agree that other design can be made. But STL has that design. In my opinion is much part of what make STL so successful.
>>>>>>>>
>>>>>>>> I disagree that algorithm that knows complexity of iterator is concrete. I think exactly contrary. Maybe it is good that you read book about STL by Josuttis. STL algorithms are the most generic I ever find in any language. I hope std.algorithm in D will be better. But right now std.algorithm works only with array.
>>>>>>>>
>>>>>>>>> If an algorithm uses [] and doesn't know the
>>>>>>>>> complexity of the []...good! It shouldn't know, and it shouldn't care. It's
>>>>>>>>> the code that sends the collection to the algorithm that knows and cares.
>>>>>>>> I think this is mistake. Algorithm should know. Otherwise "linear find" is not "linear find"! It is "cuadratic find" (spell?). If you want to define something called linear find then you must know iterator complexity.
>>>>>>>>
>>>>>>> If a generic algorithm describes itself as "linear find" then I know damn well that it's referring to the behavior of *just* the function itself, and is not a statement that the function *combined* with the behavior of the collection and/or a custom comparison is always going to be O(n).
>>>>>>>
>>>>>>> A question about STL: If I create a collection that, internally, is like a linked list, but starts each indexing operation from the position of the last indexing operation (so that a "find first" would run in O(n) instead of O(n*n)), is it possible to send that collection to STL's generic "linear find first"? I would argue that it should somehow be possible *even* if the STL's generic "linear find first" guarantees a *total* performance of O(n) (Since, in this case, it would still be O(n) anyway). Because otherwise, the STL wouldn't be very extendable, which would be a bad thing for a library of "generic" algorithms.
>>>>>> Yes, it will work.
>>>>>>
>>>>>>> Another STL question: Is it possible to use STL to do a "linear find" using a custom comparison? If so, is it possible to make STL's "linear find" function use a comparison that just happens to be O(n)? If so, doesn't that violate the linear-time guarantee, too? If not, how does it know that the custom comparison is O(n) instead of O(1) or O(log n)?
>>>>>> This will work too.
>>>>>>
>>>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>>> I'm not sure that's really a "guarantee" per se, but that's splitting hairs.
>>>>>
>>>>> In any case, it sounds like we're all arguing more or less the same point:
>>>>>
>>>>> Setting aside the issue of "should opIndex be used and when?", suppose I have the following collection interface and find function (not guaranteed to compile):
>>>>>
>>>>> interface ICollection(T)
>>>>> {
>>>>>     T getElement(int index);
>>>>>     int getSize();
>>>>> }
>>>>>
>>>>> int find(T)(ICollection!(T) c, T elem)
>>>>> {
>>>>>     for(int i=0; i<c.getSize(); i++)
>>>>>     {
>>>>>         if(c.getElement(i) == elem)
>>>>>             return i;
>>>>>     }
>>>>>     return -1; // not found
>>>>> }
>>>>>
>>>>> It sounds like STL's approach is to do something roughly like that and say:
>>>>>
>>>>> "find()'s parameter 'c' should be an ICollection for which getElement() is O(1), in which case find() is guaranteed to be O(n)"
>>>>>
>>>>> What I've been advocating is, again, doing something like the code above and saying:
>>>>>
>>>>> "find()'s complexity is dependent on the complexity of the ICollection's getElement(). If getElement()'s complexity is O(m), then find()'s complexity is guaranteed to be O(m * n). Of course, this means that the only way to get ideal complexity from find() is to use an ICollection for which getElement() is O(1)".
>>>>>
>>>>> But, you see, those two statements are effectively equivalent.
>>>> They are. But...
>>>> if you don't adhere to the conventions, your code gets really hard to reason about.
>>>>
>>>> "This class has an opIndex which is in O(n). Is that OK?" Well, that depends on what it's being used for. So you have to look at all of the places where it is used.
>>>>
>>>> It's much simpler to use the convention that opIndex _must_ be fast; this way the performance requirements for containers and algorithms are completely decoupled from each other. It's about good design.
>>>>
>>> Taking a slight detour, let me ask you this... Which of the following strategies do you consider to be better:
>>>
>>> //-- A --
>>> value = 0;
>>> for(int i=1; i<=10; i++)
>>> {
>>>    value += i*2;
>>> }
>>>
>>> //-- B --
>>> value = sum(map(1..10, {n * 2}));
>>>
>>> Both strategies compute the sum of the first 10 multiples of 2.
>>>
>>> Strategy A makes the low-level implementation details very clear, but IMO, it comes at the expense of high-level clarity. This is because the code intermixes the high-level "what I want to accomplish?" with the low-level details.
>>>
>>> Strategy B much more closely resembles the high-level desired result, and thus makes the high-level intent more clear. But this comes at the cost of hiding the low-level details behind a layer of abstraction.
>>>
>>> I may very well be wrong on this, but from what you've said it sounds like you (as well as the other people who prefer [] to never be O(n)) are the type of coder who would prefer "Strategy A". In that case, I can completely understand your viewpoint on opIndex, even though I don't agree with it (I'm a "Strategy B" kind of person).
>>>
>>> Of course, if I'm wrong on that assumption, then we're back to square one ;)
>>
>> For me at least, you are wrong :)  In fact, I view it the other way, you shouldn't have to care about the underlying implementation, as long as the runtime is well defined.  If you tell me strategy B may or may not take up to O(n^2) to compute, then you bet your ass I'm not going to even touch option B, 'cause I can always get O(n) time with option A :)  Your solution FORCES me to care about the details, it's not so much that I want to care about them.
> 
> I agree.  It's about _which_ details do you want to abstract away. I don't care about the internals. But I _do_ care about the complexity of them.

We all agree about this. What we disagree about is how to find out about the complexity of an operation -- by whether it overloads an operator or by some metadata.

In terms of code, the difference is:
/* Operator overloading */
void foo(T)(T collection)
{
	static if (is (typeof (collection[0]))) { ... }
}

/* Metadata */
void foo(T)(ICollection!(T) collection)
{
	if ((cast(FastIndexedCollection)collection) !is null) { ... }
}


You do need a metadata solution, whichever you choose. Otherwise you can't differentiate at runtime.
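A fuller sketch of the two approaches, under stated assumptions: `ICollection` and `FastIndexedCollection` are the names used in the snippet above, but their members (`first`, `opIndex`), the two concrete classes, and the `describe` functions are hypothetical and exist only to make the example compile. It shows why the compile-time check alone cannot see through an interface-typed reference, while the cast works at run time.

```d
import std.stdio : writeln;

// Operator-overloading approach: a compile-time check on the static type.
string describe(T)(T collection)
{
    static if (is(typeof(collection[0])))
        return "indexable";
    else
        return "not indexable";
}

// Metadata approach: a marker interface that can be queried at run time.
interface ICollection { int first(); }
interface FastIndexedCollection : ICollection { int opIndex(size_t i); }

class ArrayCollection : FastIndexedCollection
{
    private int[] items;
    this(int[] items) { this.items = items; }
    int first() { return items[0]; }
    int opIndex(size_t i) { return items[i]; }
}

class ListCollection : ICollection
{
    private int head;
    this(int head) { this.head = head; }
    int first() { return head; }
}

string describeAtRuntime(ICollection collection)
{
    // In D, a failed class or interface cast yields null.
    if ((cast(FastIndexedCollection) collection) !is null)
        return "fast indexing";
    return "no fast indexing";
}

void main()
{
    writeln(describe([1, 2, 3]));

    // The static type is ICollection, so the compile-time check
    // cannot see the opIndex of the object behind it.
    ICollection c = new ArrayCollection([1, 2, 3]);
    writeln(describe(c));
    writeln(describeAtRuntime(c));
    writeln(describeAtRuntime(new ListCollection(1)));
}
```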

August 30, 2008

Re: Why Strings as Classes?

Posted by Walter Bright
in reply to Robert Fraser

Robert Fraser wrote:
> The big problem IMO is the number of primitive things you need to understand. In A, you need to understand variables, looping and arithmetic operations. In B, you need to understand and think about closures/scoping, lists, the "map" function, aggregate functions, function compositions, and arithmetic operations. What hit me when first looking at it was "where the **** did n come from?"

I think B should be clearer and more intuitive, it's just that I'm not used to B at all whereas A style has worn a very deep groove in my brain.

August 30, 2008

Re: Why Strings as Classes?

Posted by bearophile
in reply to Walter Bright

Walter Bright:

>I think B should be clearer and more intuitive, it's just that I'm not used to B at all whereas A style has worn a very deep groove in my brain.<

Well, if you use D 2 you write it this way:

value = 0;
foreach (i; 1 .. 11)
    value += i * 2;

Using my libs you can write:

auto value = sum(map((int i){return i * 2;}, range(1, 11)));

But that creates two intermediate lists, so you may want to go all lazy instead:

auto value = sum(xmap((int i){return i * 2;}, xrange(1, 11)));

That's short and fast and uses very little (a constant amount of) memory, but you have to count the open and closed brackets to be sure the expression is correct...
So for me the most clear solution is the Python (lazy) one:

value = sum(i * 2 for i in xrange(1, 11))

That's why I suggested a similar syntax for D too ;-)

Bye,
bearophile
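For comparison, the same lazy computation can be sketched with the D standard library as it exists in later D 2 releases. This is an assumption layered on top of the post: it uses Phobos (`std.range.iota`, `std.algorithm.iteration.map` and `sum`), not the `xmap`/`xrange` functions from the libs mentioned above, and `doubledSum` is a hypothetical helper name.

```d
import std.algorithm.iteration : map, sum;
import std.range : iota;
import std.stdio : writeln;

// Sum of i * 2 for i in first .. last (inclusive), computed lazily:
// iota yields the numbers one at a time, map doubles each on demand,
// and sum folds them without building an intermediate list.
int doubledSum(int first, int last)
{
    return iota(first, last + 1).map!(i => i * 2).sum;
}

void main()
{
    writeln(doubledSum(1, 10)); // prints 110
}
```

The chained call syntax reads left to right, which sidesteps the bracket counting mentioned above.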
