August 27, 2008
Dee Girl wrote:
> Michiel Helvensteijn Wrote:
>> That's simple. a[i] looks much nicer than a.nth(i).
> 
> It is not nicer. It is more deceiving (is that the correct spelling?). If you look at the code, it looks like array code.
> 
> foreach (i; 0 .. a.length)
> {
>     a[i] += 1;
> }
> 
> For an array it works nicely. But for a list it is terrible! Many operations just to increment even a small list.

Well, that's what you get with operator overloading.

The same thing could be said for "+" or "-". They're inherently deceiving, because they look like builtin operations on primitive data types.

For expensive operations (like performing division on an unlimited-precision decimal object), should the author of the code use "opDiv" or should he implement a separate "divide" function?

Forget opIndex for a moment, and ask the more general question about all overloaded operators. Should they imply any sort of asymptotic complexity guarantee?

Personally, I don't think so.

I don't like "nth".

I'd rather use the opIndex. And if I'm using a linked list, I'll be aware of the fact that it'll exhibit linear-time indexing, and I'll be cautious about which algorithms to use.
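To make that cost concrete, the opIndex of a singly linked list has to walk from the head on every call. A rough sketch (hypothetical code, not any particular library):

class List(T)
{
    static class Node { T value; Node next; }
    Node head;
    size_t count;

    // Walks i links from the head, so each a[i] costs O(i); a loop that
    // indexes every element does roughly n*n/2 link hops in total.
    T opIndex(size_t i)
    {
        auto n = head;
        foreach (j; 0 .. i)
            n = n.next;
        return n.value;
    }

    size_t length() { return count; }
}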

--benji
August 27, 2008
Michiel Helvensteijn Wrote:

> Dee Girl wrote:
> 
> >> Yes, the first 'trick' makes it a different datastructure. The second does not. Would you still be opposed to using opIndex if its time-complexity is O(log n)?
> > 
> > This is different question. And tricks are not answer for problem. Problem is list has other access method than array.
> 
> And what's the answer?

I accept logarithmic complexity with []. A logarithm grows slowly.

> >> With that second trick the loop would have the same complexity for lists.
> > 
> > Not for singly linked lists.
> 
> Yeah, also for singly linked lists.

Maybe it is not interesting to discuss the trick more. I am sure many tricks can be done, and many serious things. They can be done and have been done. But they make a list not a list any more.

> >> But putting that aside for the moment, are you saying you would allow yourself to be deceived by a syntax detail? No, mentally attaching O(1) to the *subscripting operator* is simply a legacy from C, where it is syntactic sugar for pointer arithmetic.
> > 
> > I do not think so. I am sorry. If a[n] is not allowed then other array
> > access primitive is allowed. Give index(a, n) as example. If language say
> > index(a, n) is array access then it is big mistake for list to also define
> > index(a, n). List maybe should define findAt(a, n). Then array also can
> > define findAt(a, n). It is not mistake.
> 
> Yes, I agree for "array access". That term implies O(1), since it uses the
> word "array". But I was argueing against the subscripting operator to be
> forced into O(1).

I am sorry. I do not understand your logic. My logic was this: the language has a[n] for array indexing. My opinion was that a[n] should then not be a linear search. I also said you can replace a[n] with index(a, n) and my reason is the same. What are you arguing?

I did not want to get in this discussion. I see how it is confusing fast ^_^.

> >> Lists allocate memory for bare nodes, but never have to copy their elements. Arrays have to move their whole content to a larger memory location each time they are outgrown. For more complex data-types that means potentially very expensive copies.
> > 
> > I think this is mistake. I think you should google "amortized complexity". Maybe that can help much.
> 
> Amortized complexity has nothing to do with it. Dynamic arrays have to copy their elements and lists do not. It's as simple as that.

No, it is not. I am sorry! In STL there is a copy. In D there is std.move; I think it only copies the data bits and clears the source. And amortized analysis shows there are O(1) bit copies per append over many appends.
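For example, a dynamic array that doubles its capacity only copies bits occasionally. A small sketch (hypothetical code, not the real D runtime):

void append(T)(ref T[] buf, ref size_t len, T value)
{
    if (len == buf.length)
    {
        // Double the capacity and bit-copy the old elements over.
        auto bigger = new T[buf.length == 0 ? 1 : buf.length * 2];
        bigger[0 .. len] = buf[0 .. len];
        buf = bigger;
    }
    buf[len++] = value;
}

Over n appends the copies total 1 + 2 + 4 + ... < 2n, so the average is a constant number of bit copies per append.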

> >> But don't you understand that if this tool did exist, and the language had a standard notation for time/space-complexity, I could simply write:
> >> 
> >> sequence<T> s;
> >> /* fill sequence */
> >> sort(s);
> >> 
> >> And the compiler (in cooperation with this 'tool') could automatically find the most effective combination of data-structure and algorithm. The code would be more readable and efficient.
> > 
> > Michiel-san, STL does that. Or I misunderstand you?
> 
> STL will choose the right sorting algorithm, given a specific data-structure. But I am saying it may be possible also for the data-structure to be automatically chosen, based on what the programmer does with it.

I think this is interesting. Then why argue for bad container design? I do not understand. Thank you, Dee Girl.
August 27, 2008
bearophile wrote:

> anyone is invited to spot problems

Biggest of all: using the wrong tool, i.e. using a hash map for a maximally populated key range.

-manfred


-- 
Maybe some knowledge of the types of disagreement and how they relate can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/
August 27, 2008
Benji Smith Wrote:

> Dee Girl wrote:
> > Michiel Helvensteijn Wrote:
> >> That's simple. a[i] looks much nicer than a.nth(i).
> > 
> > It is not nicer. It is more deceiving (is that the correct spelling?). If you look at the code, it looks like array code.
> > 
> > foreach (i; 0 .. a.length)
> > {
> >     a[i] += 1;
> > }
> > 
> > For an array it works nicely. But for a list it is terrible! Many operations just to increment even a small list.
> 
> Well, that's what you get with operator overloading.

I am sorry. I disagree. I think that is what you get with bad design.

> The same thing could be said for "+" or "-". They're inherently deceiving, because they look like builtin operations on primitive data types.
> 
> For expensive operations (like performing division on an unlimited-precision decimal object), should the author of the code use "opDiv" or should he implement a separate "divide" function?

The cost of + and - is proportional to the number of digits in the number. For a small number of digits the computer does it fast in hardware. For many digits the cost grows. The number of digits is log n. I think + and - are fine for a big integer. I am not surprised.
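For example, schoolbook addition of two numbers stored as digit arrays touches each digit once. A sketch (the representation is made up, least significant digit first):

ubyte[] add(ubyte[] a, ubyte[] b)
{
    auto n = a.length > b.length ? a.length : b.length;
    auto result = new ubyte[n + 1];
    int carry = 0;
    foreach (i; 0 .. n)
    {
        int sum = carry;
        if (i < a.length) sum += a[i];
        if (i < b.length) sum += b[i];
        result[i] = cast(ubyte)(sum % 10);   // one digit of work per position
        carry = sum / 10;
    }
    result[n] = cast(ubyte)carry;
    return result;
}

The loop runs once per digit, and a value n has about log10(n) digits, so the cost of + grows only with the logarithm of the value.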

> Forget opIndex for a moment, and ask the more general question about all overloaded operators. Should they imply any sort of asymptotic complexity guarantee?

I think it depends on good design. For example, consider ++ or -- for an iterator. If it is O(n), it is bad design. Bad design makes people say, like you, "This is what you get with operator overloading".

> Personally, I don't think so.
> 
> I don't like "nth".
> 
> I'd rather use the opIndex. And if I'm using a linked list, I'll be aware of the fact that it'll exhibit linear-time indexing, and I'll be cautious about which algorithms to use.

But inside the algorithm you do not know if you are using a linked list or a vector. You lost that information in a bad abstraction. The abstraction is also bad because if you change the data structure you have conceptual errors that still compile. And run until tomorrow ^_^.
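For example, generic code like this (hypothetical) compiles for any container that defines length and opIndex, but its cost changes silently with the container:

int sum(C)(C a)
{
    int total = 0;
    foreach (i; 0 .. a.length)
        total += a[i];    // O(1) per access for an array, O(i) for a linked list
    return total;
}

Pass an array and it runs in O(n); swap in a linked list whose opIndex walks the nodes and the same code still compiles, but now runs in O(n^2) with no warning.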

I also like or do not like things. But a good reason can convince me. Thank you, Dee Girl.
August 27, 2008
Dee Girl wrote:

>> >> But putting that aside for the moment, are you saying you would allow yourself to be deceived by a syntax detail? No, mentally attaching O(1) to the *subscripting operator* is simply a legacy from C, where it is syntactic sugar for pointer arithmetic.
>> > 
>> > I do not think so. I am sorry. If a[n] is not allowed then other array
>> > access primitive is allowed. Give index(a, n) as example. If language
>> > say index(a, n) is array access then it is big mistake for list to also
>> > define index(a, n). List maybe should define findAt(a, n). Then array
>> > also can define findAt(a, n). It is not mistake.
>> 
>> Yes, I agree for "array access". That term implies O(1), since it uses
>> the word "array". But I was argueing against the subscripting operator to
>> be forced into O(1).
> 
> I am sorry. I do not understand your logic. My logic was this: the language has a[n] for array indexing. My opinion was that a[n] should then not be a linear search. I also said you can replace a[n] with index(a, n) and my reason is the same. What are you arguing?

Let me try again. I agree that you may impose complexity restrictions in function contracts. If you write a function called index(a, n), you may impose an O(log n) restriction, for all I care. But the a[n] syntax is so convenient that I would hate for it to be restricted in the same way. I would like to use it for lists and associative containers, where the complexity may be O(n). The programmer should just be careful.

> I did not want to get in this discussion. I see how it is confusing fast ^_^.

I find much of this subthread confusing. (Starting with the discussion of Benji and superdan). It looks to me like 80% of the discussion is based on misunderstandings.

>> Amortized complexity has nothing to do with it. Dynamic arrays have to copy their elements and lists do not. It's as simple as that.
> 
> No, it is not. I am sorry! In STL there is a copy. In D there is std.move; I think it only copies the data bits and clears the source. And amortized analysis shows there are O(1) bit copies per append over many appends.

Yes, a bit-copy would be ok. I was thinking of executing the potentially more expensive copy constructor. It's nice that D doesn't have to do this.

>> STL will choose the right sorting algorithm, given a specific data-structure. But I am saying it may be possible also for the data-structure to be automatically chosen, based on what the programmer does with it.
> 
> I think this is interesting. Then why argue for bad container design? I do not understand. Thank you, Dee Girl.

Where am I arguing for bad design? All I've been arguing for is looser restrictions for the subscripting operator. You should be able to use it on a list, even though the complexity is O(n). But if it is used often enough (in a deeply nested loop), the compiler will probably automatically use an array instead.

-- 
Michiel

August 27, 2008
Michiel Helvensteijn Wrote:

> Dee Girl wrote:
> 
> >> >> But putting that aside for the moment, are you saying you would allow yourself to be deceived by a syntax detail? No, mentally attaching O(1) to the *subscripting operator* is simply a legacy from C, where it is syntactic sugar for pointer arithmetic.
> >> > 
> >> > I do not think so. I am sorry. If a[n] is not allowed then other array
> >> > access primitive is allowed. Give index(a, n) as example. If language
> >> > say index(a, n) is array access then it is big mistake for list to also
> >> > define index(a, n). List maybe should define findAt(a, n). Then array
> >> > also can define findAt(a, n). It is not mistake.
> >> 
> >> Yes, I agree for "array access". That term implies O(1), since it uses
> >> the word "array". But I was argueing against the subscripting operator to
> >> be forced into O(1).
> > 
> > I am sorry. I do not understand your logic. My logic was this: the language has a[n] for array indexing. My opinion was that a[n] should then not be a linear search. I also said you can replace a[n] with index(a, n) and my reason is the same. What are you arguing?
> 
> Let me try again. I agree that you may impose complexity restrictions in function contracts. If you write a function called index(a, n), you may impose an O(log n) restriction, for all I care. But the a[n] syntax is so convenient that I would hate for it to be restricted in the same way. I would like to use it for lists and associative containers, where the complexity may be O(n). The programmer should just be careful.

Thank you for trying again. Thank you! I understand. Yes, a[n] is very convenient! And I would have agreed 100% with you if a[n] were not built into the language for array access. But because of that I 100% disagree ^_^.

I also think there is an objective mistake. In concrete code the programmer can be careful. But in generic code the programmer cannot be careful. I think this must be explained better. But I am not sure I can.

> > I did not want to get in this discussion. I see how it is confusing fast ^_^.
> 
> I find much of this subthread confusing. (Starting with the discussion of Benji and superdan). It looks to me like 80% of the discussion is based on misunderstandings.
> 
> >> Amortized complexity has nothing to do with it. Dynamic arrays have to copy their elements and lists do not. It's as simple as that.
> > 
> > No, it is not. I am sorry! In STL there is a copy. In D there is std.move; I think it only copies the data bits and clears the source. And amortized analysis shows there are O(1) bit copies per append over many appends.
> 
> Yes, a bit-copy would be ok. I was thinking of executing the potentially more expensive copy constructor. It's nice that D doesn't have to do this.
> 
> >> STL will choose the right sorting algorithm, given a specific data-structure. But I am saying it may be possible also for the data-structure to be automatically chosen, based on what the programmer does with it.
> > 
> > I think this is interesting. Then why argue for bad container design? I do not understand. Thank you, Dee Girl.
> 
> Where am I arguing for bad design? All I've been arguing for is looser restrictions for the subscripting operator. You should be able to use it on a list, even though the complexity is O(n). But if it is used often enough (in a deeply nested loop), the compiler will probably automatically use an array instead.

The idea is nice. But I think it cannot be done. The tool is not a mind reader. If I do some inserts and some indexing, they want different structures. How does the tool know which one I want to be fast?

I say you want bad design because the tool does not exist. So we do not have the tool. But we can make a good library with what we have. I am sure you can write a library with a[n] in O(n). And it works. But I say it is an inferior design compared to STL, because your library allows things that should not work and does not warn the programmer.
August 27, 2008
Manfred_Nowak:
> Biggest of all: using the wrong tool, i.e. using a hash map for a maximally populated key range.

But if the hash machinery is good it must work well in this common situation too.

Anyway, let's see how I can write a benchmark that you may like. I can use a very fast random generator to create random integer keys. This will probably put the Python version at a disadvantage, because that language isn't able to do integer operations as fast as a compiled language (adding two integers may be 100 times slower in Python). This problem can be solved by pre-computing the numbers first and putting them into the associative array later.
Is this enough to satisfy you?
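Something like this sketch is what I have in mind (the size and the generator are placeholders):

import std.stdio;

void main()
{
    const int N = 1_000_000;
    auto keys = new uint[N];

    // Pre-compute pseudo-random keys with a simple LCG, so the timed part
    // below measures only the associative array and not the generator.
    uint seed = 123456789;
    foreach (ref k; keys)
    {
        seed = seed * 1664525 + 1013904223;
        k = seed;
    }

    int[uint] aa;
    foreach (k; keys)
        aa[k] = 0;    // this is the loop to time

    writefln("filled %s entries", aa.length);
}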

Note that in the meantime I have created another associative array benchmark; this one is string-based, and this time Python-Psyco comes out only about 2-2.5 times faster. I'll show it when I can...

Bye,
bearophile
August 27, 2008
Dee Girl wrote:

>> Where am I arguing for bad design? All I've been arguing for is looser restrictions for the subscripting operator. You should be able to use it on a list, even though the complexity is O(n). But if it is used often enough (in a deeply nested loop), the compiler will probably automatically use an array instead.
> 
> The idea is nice. But I think it cannot be done. The tool is not a mind reader. If I do some inserts and some indexing, they want different structures. How does the tool know which one I want to be fast?

In the future it may be possible to do such analysis. If the indexing is in a deeper loop, it may weigh more than the insertions you are doing. But failing that, the programmer might give the compiler 'hints' on which functions he/she wants faster.

-- 
Michiel

August 27, 2008
Michiel Helvensteijn:
> In the future it may be possible to do such analysis. If the indexing is in a deeper loop, it may weigh more than the insertions you are doing. But failing that, the programmer might give the compiler 'hints' on which functions he/she wants faster.

For example, you can write a Deque data structure made of a doubly linked list of small arrays. At run time it can collect a few simple statistics about its usage, and it can grow or shrink the length of the arrays according to the cache line length and its usage patterns at runtime. There's a boolean constant that at compile time can switch off such collection of statistics, to make the data structure a bit faster but not adaptive. You may want the data structure to be non-adaptive if you know very well what its future usage will be in the program, or in programs that run for a few minutes/seconds. In programs that run for hours or days you may prefer a more adaptive data structure.
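A very rough sketch of the layout I mean (names invented, adaptive logic not shown):

struct Deque(T, bool collectStats = true)
{
    static struct Chunk
    {
        T[] items;          // small array; its length is the tunable parameter
        size_t used;
        Chunk* prev, next;  // doubly linked list of chunks
    }

    Chunk* head, tail;
    size_t chunkSize = 16;  // could be adapted to the cache line length

    static if (collectStats)
    {
        // Simple usage counters; an adaptive version could inspect these
        // to decide whether to grow or shrink chunkSize over time.
        size_t appends, removals, randomAccesses;
    }
}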

You can create similar data structures in languages that compile statically, but those operations seem a better fit when there's a virtual machine (HotSpot, for example, compiles and de-compiles code dynamically). LLVM looks like it can be used in both situations :-)

Bye,
bearophile
August 27, 2008
"Dee Girl" wrote
> I think it depends on good design. For example, consider ++ or -- for an iterator. If it is O(n), it is bad design. Bad design makes people say, like you, "This is what you get with operator overloading".

Slightly off topic: when I was developing dcollections, I was a bit annoyed that there was no opInc or opDec; instead you have to use opAddAssign and opSubAssign. What this means is that for a list iterator, if you want to allow the syntax:

iterator it = list.find(x);
(++it).value = 5;

or such, you have to define the operator opAddAssign.  This makes it possible to do:

it += 10;

Which I don't like, for the same reason we are arguing about this: it suggests a simple operation, when in fact it is O(n). But there's no way around it, as you can't define ++it without defining +=. Of course, I could throw an exception, but I decided against that. Instead, I just warn the user in the docs to only ever use the ++it form.
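In other words, something like this (a sketch, not the actual dcollections source):

struct Node(T)
{
    T value;
    Node!(T)* next;
}

struct ListIterator(T)
{
    Node!(T)* node;

    // ++it is rewritten to it += 1, so opAddAssign has to exist; but the same
    // operator then also accepts it += 10, and every step follows one link,
    // so advancing by n is O(n).
    ListIterator opAddAssign(size_t n)
    {
        foreach (i; 0 .. n)
            node = node.next;
        return this;
    }
}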

Annoying...

-Steve