August 26, 2008
bearophile wrote:
> Walter Bright:
>>> We've already observed that D assoc arrays are less performant than even Python maps, so the extra cost of lookup operations is unwelcome.
>> Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.
> 
> Really? I must have missed those conclusions then, despite reading all the posts on the subject. What solutions do you propose for the problem then? I recall that disabling the GC didn't improve the situation much. So the problem now becomes how to improve the D GC? 
> 
> In my site I am keeping a gallery of tiny benchmarks where D code (compiled with DMD) is 10 or more times slower than closely equivalent Python, C, or Java code (I have about 12 programs so far, all quite different from each other; there's a benchmark regarding associative arrays too). Hopefully it will become useful once people start tuning D implementations.
> 
> Bye,
> bearophile

Might I ask where that site is?
I'd like to compare them against LLVMDC if possible

Tomas
August 26, 2008
Christopher Wright Wrote:

> superdan wrote:
> >> class LinkedList(T)
> >> {
> >> 	Node!(T) head;
> >>
> >> 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
> >> the list. Time complexity: O(N) for a list of length N. This operation
> >> is provided for completeness and not recommended for frequent use in
> >> large lists. */
> >> 	T opIndex(int i)
> >> 	{
> >> 		auto current = head;
> >> 		while (i--)
> >> 		{
> >> 			current = current.next;
> >> 		}
> >> 		return current.value;
> >> 	}
> >> }
> > 
> > oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
> 
> WRONG!
> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> for this linked list.

you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

> > this design is a stillborn.
> > 
> > what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.
> 
> You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list.

the correct opIndex runs in o(1).

> Any sorting algorithm you name that would work on an array would also work on this linked list. Admittedly, insertion sort would be much faster than qsort, but if your library provides that, you, knowing that you are using a linked list, would choose the insertion sort algorithm.

no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching.

it is a stillborn design.

> >>> how do you make a hashtable, an array, and a linked list obey the same interface? guess hashtable has stuff that others don't support?
> >> You have an interface for collections (you can add, remove, and get the length, maybe a couple other things).
> > 
> > this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.
> 
> interface List(T) : Collection!(T) {}
> 
> class Vector(T) : List!(T) {}
> 
> class HashMap(T, U) : Collection!(KeyValuePair!(T, U)) {}
> 
> Have a look at C#'s collection classes. They solved this problem.

dood. i know how the problem is solved. stl solved that before c#. my point was that making vector!(string) and hash!(int, string) offer the same interface is a tenuous proposition.

> >> You have an interface for lists (they're collections, and you can index them).
> > 
> > wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.
> 
> If you create a linked list with O(1) indexing, that might suffice to get you a PhD. If you claim that you can do so in an interview, you should be required to show proof; and should you fail to do so, you will probably be shown the door.
> 
> Even if you did prove it in the interview, they would probably consider you overqualified, unless the company's focus was data structures.

my point was opIndex should not be written for a list to begin with.

> >> Then you can use all the collection-oriented stuff with lists, and you can do special list-type things with them if you want.
> >>
> >>> these are serious questions. not in jest not rhetorical and not trick.
> >> Yes, but they're solved problems.
> > 
> > apparently not since you failed at'em.
> 
> You claimed that a problem with an inefficient solution has no solution, and then you execrate me for providing an inefficient solution. Why?

because the correct answer was: a list cannot implement opIndex. it must be in a different hierarchy branch than a vector. which reveals one of the wrongs in the post i answered.
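
here's a sketch of what i mean. names are made up and syntax is d2-style, so take it as a shape, not gospel: indexing lives in its own branch, sort demands that branch, and a list physically can't sneak in.

// every container can do these
interface Collection(T)
{
    void add(T item);
    size_t length();
}

// only containers with constant-time indexing join this branch
interface RandomAccess(T) : Collection!(T)
{
    T opIndex(size_t i);  // guaranteed O(1)
}

class Vector(T) : RandomAccess!(T)
{
    private T[] data;
    void add(T item) { data ~= item; }
    size_t length() { return data.length; }
    T opIndex(size_t i) { return data[i]; }  // O(1), as promised
}

class LinkedList(T) : Collection!(T)
{
    private static class Node { T value; Node next; }
    private Node head;
    private size_t count;
    void add(T item)
    {
        auto n = new Node;
        n.value = item;
        n.next = head;
        head = n;
        count++;
    }
    size_t length() { return count; }
    // no opIndex here. a linear walk can be a named function, never [].
}

// sort takes the random-access branch, so passing a LinkedList won't compile
void sort(T)(RandomAccess!(T) r) { /* some O(n log n) sort via r[i] */ }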

> >>>> Someone else comes along and writes a library of algorithms. The algorithms can operate on any container that implements the necessary operations.
> >>> hm. things are starting to screech a bit. but let's see your answers to your questions above.
> >>>
> >>>> When someone clever comes along and writes a new sorting algorithm, I can plug my new container class right into it, and get the algorithm for free. Likewise for the guy with the clever new collection class.
> >>> things ain't that simple.
> >> Collection-oriented library code will care sufficiently about performance that this mix-and-match stuff is not feasible.
> > 
> > what's that supposed to mean? you sayin' stl don't exist?
> 
> No, I'm saying that for efficiency, you need to know about the internals of a data structure to implement a number of collection-oriented algorithms. Just like the linked list example.

wrong. you only need to define your abstract types appropriately. e.g. stl defines forward and random iterators. a forward iterator has ++ but no []. a random iterator has both. so a random iterator can be substituted for a forward iterator, but not the other way around. bottom line, sort won't compile on a forward iterator. your design allowed it to compile. which makes the design wrong.
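
same deal at compile time, d template style (a sketch; the constraint syntax is d2):

// stand-in for stl's iterator categories: true only if C supports [] and length
template hasRandomAccess(C)
{
    enum hasRandomAccess = is(typeof(C.init[0])) && is(typeof(C.init.length));
}

// instantiates for arrays and vectors, refuses forward-only containers
void sort(C)(C c) if (hasRandomAccess!(C))
{
    // quicksort via c[i] goes here
}

// linear algorithms are happy with anything foreach-able
size_t count(C)(C c)
{
    size_t n;
    foreach (x; c) ++n;
    return n;
}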

> >> Almost anything else doesn't care enough to take only an AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive array.
> > 
> > wrong for the reasons above.
> 
> They expose very similar interfaces. You might care about which you choose because of the efficiency of various operations, but most of your code won't care which type it gets; it would still be correct.

no. you got the wrong notion of correctness.

> Well, sets have a property that no element appears in them twice, so that is an algorithmic consideration sometimes.

finally a good point. true. thing is, that can't be told with types. indexing can.

> >>> saw this flick "the devil wears prada", an ok movie but one funny remark stayed with me. "you are in desperate need of chanel."
> >>>
> >>> i'll paraphrase. "you are in desperate need of stl." you need to learn stl and then you'll figure why you can't plug a new sorting algorithm into a container. you need more guarantees. and you need iterators.
> >>>
> >>>> We don't bat an eye at the idea of containers & algorithms connecting to one another using a reciprocal set of interfaces.
> >>> i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.
> >> I don't. If I'm just going to iterate through the items of a collection, I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.
> > 
> > you too are in desperate need for stl.
> 
> You are in desperate need of System.Collections.Generic. Or tango.util.container.

guess my advice fell on deaf ears eh.

btw my respect for tango improved when i found no opIndex in their list container.
August 26, 2008
"superdan" wrote
> Christopher Wright Wrote:
>
>> superdan wrote:
>> >> class LinkedList(T)
>> >> {
>> >> Node!(T) head;
>> >>
>> >> /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
>> >> the list. Time complexity: O(N) for a list of length N. This operation
>> >> is provided for completeness and not recommended for frequent use in
>> >> large lists. */
>> >> T opIndex(int i)
>> >> {
>> >> auto current = head;
>> >> while (i--)
>> >> {
>> >> current = current.next;
>> >> }
>> >> return current.value;
>> >> }
>> >> }
>> >
>> > oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
>>
>> WRONG!
>> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
>> for this linked list.
>
> you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

O(n^2 log n) is still considered solved.  Anything with a polynomial running time like that is in P, meaning it can be solved in polynomial time.  The problems generally considered intractable are the NP-complete ones, for which no polynomial-time algorithm is known.  Intractable running times usually have n in an exponent, e.g.:

O(2^n).

That being said, it doesn't take a genius to figure out that a standard sorting algorithm on a linked list while trying to use random access is going to run longer than the same sorting algorithm on a random-access list. But there are ways around this.  For instance, you can sort a linked list in O(n log n) time with (in pseudocode):

vector v = list; // copy all elements to v, O(n)
v.sort; // O(n lgn)
list.replaceAll(v); // O(n)

So the total is O(2n + n lgn), and we all know you always take the most significant part of the polynomial, so it then becomes:

O(n lgn)

Can I have my PhD now? :P
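
In concrete D that trick is only a few lines.  Here's a sketch; the List interface and its replaceAll method are hypothetical stand-ins for whatever your list type actually provides:

import std.algorithm : sort;

// minimal list interface, just enough for the trick
interface List(T)
{
    int opApply(int delegate(ref T) dg); // foreach support
    void replaceAll(T[] items);          // overwrite contents in order, O(n)
}

// O(n) copy out + O(n lgn) sort + O(n) copy back = O(n lgn) total
void sortViaArray(T)(List!(T) list)
{
    T[] v;
    foreach (x; list) v ~= x; // copy all elements to v, O(n)
    sort(v);                  // random access is cheap here, O(n lgn)
    list.replaceAll(v);       // O(n)
}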

In all seriousness though, with the way you can call functions on arrays as if they were member functions, it almost seems like arrays are already classes.  One thing I have against making a string class the default is that you can take a substring of a string in D without any heap allocation, and it is super-fast.  And I think substring slicing is one of the best features D has.
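
That slicing never allocates is easy to demonstrate:

// a slice is just a new pointer/length pair over the same memory
void main()
{
    string s = "hello, world";
    string sub = s[0 .. 5];   // "hello" -- O(1), no copy, no allocation
    assert(sub == "hello");
    assert(sub.ptr is s.ptr); // same underlying bytes
}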

FWIW, you can have both a string class and an array representing a string, and you can define the string class to use an array as its backing storage.  I do this in dcollections (ArrayList).  If you want the interface, wrap the array; if you want the speed of an array, it is accessible as a member.  This lets you decide which one to use in each case.  You can even use algorithms like sort on the array via that member, because you are accessing the actual storage of the ArrayList.
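
The shape of it is roughly this (a sketch of the idea, not the actual dcollections code):

// class interface on the outside, raw array reachable on the inside
class ArrayList(T)
{
    T[] storage; // the backing array, deliberately accessible

    void add(T item)    { storage ~= item; }
    T opIndex(size_t i) { return storage[i]; }
    size_t length()     { return storage.length; }
}

Pass the class around when you want the interface; grab the storage member when you want raw array speed or want to run array algorithms directly on the elements.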

-Steve


August 26, 2008
Steven Schveighoffer Wrote:

> "superdan" wrote
> > Christopher Wright Wrote:
> >
> >> superdan wrote:
> >> >> class LinkedList(T)
> >> >> {
> >> >> Node!(T) head;
> >> >>
> >> >> /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
> >> >> the list. Time complexity: O(N) for a list of length N. This operation
> >> >> is provided for completeness and not recommended for frequent use in
> >> >> large lists. */
> >> >> T opIndex(int i)
> >> >> {
> >> >> auto current = head;
> >> >> while (i--)
> >> >> {
> >> >> current = current.next;
> >> >> }
> >> >> return current.value;
> >> >> }
> >> >> }
> >> >
> >> > oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
> >>
> >> WRONG!
> >> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> >> for this linked list.
> >
> > you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?
> 
> O(n^2 log n) is still considered solved.

of course. for traveling salesman that'd be a dream. just not for sorting. an o(n^2 log n) sorting algo, that's a no-go. notice also i didn't say solved/unsolved. i said scalable. stuff beyond n log n don't scale.

>  Anything with a polynomial running time like that is in P, meaning it
> can be solved in polynomial time.  The problems generally considered
> intractable are the NP-complete ones, for which no polynomial-time
> algorithm is known.  Intractable running times usually have n in an exponent, e.g.:
> 
> O(2^n).

sure thing.

> That being said, it doesn't take a genius to figure out that a standard sorting algorithm on a linked list while trying to use random access is going to run longer than the same sorting algorithm on a random-access list. But there are ways around this.  For instance, you can sort a linked list in O(n log n) time with (in pseudocode):
> 
> vector v = list; // copy all elements to v, O(n)
> v.sort; // O(n lgn)
> list.replaceAll(v); // O(n)

sure thing. problem is, you must know it's a list. otherwise you wouldn't make a copy.

don't forget how this all started when answering tiny bits of my post. it started with a dood claiming vector and list both have opIndex and then sort works with both without knowing the details. it don't work with both.

> So the total is O(2n + n lgn), and we all know you always take the most significant part of the polynomial, so it then becomes:
> 
> O(n lgn)
> 
> Can I have my PhD now? :P

sure. i must have seen an email with an offer somewhere ;)

> In all seriousness though, with the way you can call functions with arrays as the first argument like member functions, it almost seems like they are already classes.  One thing I have against having a string class be the default, is that you can use substring on a string in D without any heap allocation, and it is super-fast.  And I think substring (slicing) is one of the best features that D has.
> 
> FWIW, you can have both a string class and an array representing a string, and you can define the string class to use an array as it's backing storage. I do this in dcollections (ArrayList).  If you want the interface, wrap the array, if you want the speed of an array, it is accessible as a member. This allows you to decide whichever one you want to use.  You can even use algorithms on the array like sort by using the member because you are accessing the actual storage of the ArrayList.

that sounds better.
August 26, 2008
Jb Wrote:

> 
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g90iia$2jc4$3@digitalmars.com...
> > Yigal Chripun wrote:
> >> a) people here said that a virtual call will make it slow. How much slow? how much of an overhead is it on modern hardware considering also that this is a place where hardware manufacturers spend time on optimizations?
> >
> > Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.
> 
> Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than direct calls.
> 
> See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago.
> 
> I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

you're right. but direct calls don't speculate. they don't need speculation because they're direct jumps. so they are loaded straight into the pipeline.

walt was right but used the wrong term.
August 26, 2008
"superdan" <super@dan.org> wrote in message news:g912vh$mbe$1@digitalmars.com...
> Jb Wrote:
>
>>
>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g90iia$2jc4$3@digitalmars.com...
>> > Yigal Chripun wrote:
>> >> a) people here said that a virtual call will make it slow. How much
>> >> slow? how much of an overhead is it on modern hardware considering
>> >> also
>> >> that this is a place where hardware manufacturers spend time on
>> >> optimizations?
>> >
>> > Virtual function calls have been a problem for hardware optimization.
>> > Direct function calls can be speculatively executed, but not virtual
>> > ones,
>> > because the hardware cannot predict where it will go. This means
>> > virtual
>> > calls can be much slower than direct function calls.
>>
>> Modern x86 branch prediction treats indirect calls the same as
>> conditional
>> branches. They get a slot in the branch target buffer, so they do get
>> speculatively executed. And if correctly predicted it's only a couple of
>> cycles more costly than direct calls.
>>
>> See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago.
>>
>> I cited the technical docs and a few doubters ran benchmarks, which
>> proved
>> that virtual methods are not as evil as many people think. In fact they
>> are
>> no more evil than a conditional branch.
>
> you're right. but direct calls don't speculate. they don't need speculation because they're
> direct jumps. so they are loaded straight into the pipeline.
>
> walt was right but used the wrong term.

Walter said "the hardware cannot predict where a virtual call will go".

It does in fact predict them, and speculatively execute them, and as pretty much any benchmark will show it gets it right the vast majority of the time. (On x86 anyway.)

That's what I was saying.
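
If anyone wants to rerun the experiment, here's a rough sketch of such a benchmark (it assumes a Phobos with std.datetime.stopwatch, so newer than this thread; numbers depend heavily on compiler inlining and hardware, so treat it as a starting point, not proof):

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

class Base           { int f(int x) { return x + 1; } }       // virtual by default
class Derived : Base { override int f(int x) { return x + 2; } }

final class Direct   { final int f(int x) { return x + 1; } } // non-virtual

void main()
{
    Base v = new Derived; // call through the base: indirect, via the vtable
    auto d = new Direct;  // direct call, no indirection
    enum N = 100_000_000;

    auto sw = StopWatch(AutoStart.yes);
    int a; foreach (i; 0 .. N) a = v.f(a);
    writefln("virtual: %s (a=%s)", sw.peek, a); // one target -> well predicted

    sw.reset();
    int b; foreach (i; 0 .. N) b = d.f(b);
    writefln("direct:  %s (b=%s)", sw.peek, b);
}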


August 26, 2008
Jb wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g90iia$2jc4$3@digitalmars.com...
>> Yigal Chripun wrote:
>>> a) people here said that a virtual call will make it slow. How much
>>> slow? how much of an overhead is it on modern hardware considering also
>>> that this is a place where hardware manufacturers spend time on
>>> optimizations?
>> Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.
> 
> Modern x86 branch prediction treats indirect calls the same as conditional branches. They get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted it's only a couple of cycles more costly than direct calls.
> 
> See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago.
> 
> I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

That's x86 hardware.  Try something like the PS3.  That system has little or no cache, and it has to jump to the vtable, which is in a totally different location from the class.  Note I'm not in the camp that thinks virtual functions should never be used on these systems; however, I think you should use them smartly, and profile, profile, profile.

One technique I've used in C++ to help improve things a little is to swap the vtable for one that's at the same location as, or close to, the class.  The wrapper looked something like this:

class A { /* ... */ };

A* a = new LocalVirtualTable<A>(); // i.e. LocalVirtualTable<A> is a bolt-on template derived from A

However, performance only improved in cases where I could flush the cache.  In many cases it was slightly worse on x86, so you had to try it, profile, and see whether it had a positive or negative effect in each case.  I imagine when you've got hundreds of these classes it's simply more memory to process, so on a system with a big cache it can be counterproductive.  I never tried it on a PS3, so it might be more effective there.

-Joel
August 26, 2008
== Quote from Christopher Wright (dhasenan@gmail.com)'s article
> WRONG!
> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> for this linked list.

My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type
that allows for efficient iteration (O(n) to go through a list of n elements, or
for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So
even for a linked list, those three algorithms, which are probably the most common
sorting algorithms used, will still be efficient. Unless the person who wrote them
was braindead and used indexing to iterate rather than the class's defined opApply.
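
For what it's worth, merge sort doesn't need indexing at all on a linked list; it can work purely by splitting and splicing nodes. A sketch:

// merge sort on a bare singly linked list: O(n log n), pointer walks only
class Node(T)
{
    T value;
    Node!(T) next;
    this(T v) { value = v; }
}

Node!(T) mergeSort(T)(Node!(T) head)
{
    if (head is null || head.next is null) return head;

    // find the midpoint with slow/fast pointers, O(n)
    Node!(T) slow = head, fast = head.next;
    while (fast !is null && fast.next !is null)
    {
        slow = slow.next;
        fast = fast.next.next;
    }
    auto second = slow.next;
    slow.next = null; // cut the list in two

    return merge(mergeSort(head), mergeSort(second));
}

Node!(T) merge(T)(Node!(T) a, Node!(T) b)
{
    auto dummy = new Node!(T)(T.init);
    auto tail = dummy;
    while (a !is null && b !is null)
    {
        if (a.value <= b.value) { tail.next = a; a = a.next; }
        else                    { tail.next = b; b = b.next; }
        tail = tail.next;
    }
    tail.next = (a is null) ? b : a; // splice in the leftovers
    return dummy.next;
}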
August 26, 2008
Christopher Wright Wrote:

> == Quote from Christopher Wright (dhasenan@gmail.com)'s article
> > WRONG!
> > Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> > for this linked list.
> 
> My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type
> that allows for efficient iteration (O(n) to go through a list of n elements, or
> for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So
> even for a linked list, those three algorithms, which are probably the most common
> sorting algorithms used, will still be efficient. Unless the person who wrote them
> was braindead and used indexing to iterate rather than the class's defined opApply.

sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.