August 26, 2008
Walter Bright wrote:
> Yigal Chripun wrote:
>> a) people here said that a virtual call will make it slow. How much
>> slow? how much of an overhead is it on modern hardware considering also
>> that this is a place where hardware manufacturers spend time on
>> optimizations?
> 
> Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

What about for software optimization?

I seem to remember reading something about the Objective-C compiler maybe six or eight months ago talking about some of its optimization techniques.

Obj-C uses a message-passing idiom, and all messages use dynamic dispatch, since the list of messages an object can receive is not fixed at compile-time.

If I remember correctly, this article said that the dynamic dispatch expense only had to be incurred once, upon the first invocation of each message type. After that, the address of the appropriate function was re-written in memory, so that it pointed directly to the correct code. No more dynamic dispatch. Although the message handlers aren't resolved until runtime, once invoked, they'll always use the same target.

Or something like that.

It was an interesting read. I'll see if I can find it.

--benji
August 26, 2008
superdan wrote:
> Christopher Wright Wrote:
> 
>> superdan wrote:
>>>> class LinkedList(T)
>>>> {
>>>> 	Node!(T) head;
>>>>
>>>> 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of the list. Time complexity: O(N) for a list of length N. This operation is provided for completeness and not recommended for frequent use in large lists. */
>>>> 	T opIndex(int i)
>>>> 	{
>>>> 		auto current = head;
>>>> 		while (i--)
>>>> 		{
>>>> 			current = current.next;
>>>> 		}
>>>> 		return current.value;
>>>> 	}
>>>> }
>>> oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
>> WRONG!
>> Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.
> 
> you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?

No, YOU got the wrong definition of correct. "Correct" and "scalabale" are different words. As are "correct" and "viable". In Java, I've been known to index into linked lists... usually ones with ~5 elements, but I've done it.

>>> this design is a stillborn.
>>>
>>> what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.
>> You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list.
> 
> the correct opIndex runs in o(1).

No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in O(n^n) and still be correct.
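To make the distinction concrete, here's a sketch (Python, hypothetical container) where indexing is perfectly correct but each access walks the list from the head, so an index-based loop over N elements does O(N^2) node hops:

```python
# A singly linked list with a correct but O(n) index operation, plus a
# hop counter that exposes the quadratic cost of index-based traversal.

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class LinkedList:
    def __init__(self, items):
        self.head = None
        self.steps = 0          # counts node hops across all accesses
        for item in reversed(items):
            self.head = Node(item, self.head)

    def __getitem__(self, i):   # correct, but O(n) per access
        current = self.head
        while i:
            current = current.next
            self.steps += 1
            i -= 1
        return current.value

lst = LinkedList(range(100))
total = sum(lst[i] for i in range(100))  # N accesses -> O(N^2) hops
print(total, lst.steps)   # 4950 4950: 0+1+...+99 hops for 100 accesses
```

Every answer it returns is right; it's only the running time that blows up -- which is exactly the correct-vs-scalable split.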

>> Any sorting algorithm you name that would work on an array would also work on this linked list. Admittedly, insertion sort would be much faster than qsort, but if your library provides that, you, knowing that you are using a linked list, would choose the insertion sort algorithm.
> 
> no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching.

Again, WORKABLE, just not SCALABLE. You should wrap your head around this concept, since it's been around for about 30 years now.

> it is a stillborn design.

Tell that to Java, the world's most used programming language for new projects. Or C#, the world's fastest growing programming language. Or Tango, one of D's standard libraries.

> my point was opIndex should not be written for a list to begin with.

Yes it should be. Here's a fairly good example: Say you have a GUI control that displays a list and allows the user to insert or remove items from the list. It also allows the user to double-click on an item at a given position. Looking up what position maps to what item is an opIndex. Would this problem be better solved using an array (Vector)? Maybe. Luckily, if you used a List interface throughout your code, you can change one line, and it'll work either way.

> wrong. you only need to define your abstract types appropriately. e.g. stl defines forward and random iterators. a forward iterators has ++ but no []. random iterator has both. so a random iterator can be substituted for a forward iterator. but not the other way. bottom line, sort won't compile on a forward iterator. your design allowed it to compile. which makes the design wrong.

STL happens to be one design and one world-view. It's a good one, but it's not the only one. My main problem with the STL is that it takes longer to learn than the Java/.NET standard libraries -- and thus the cost of a programmer who knows it is higher. But there are language considerations in there too, and this is a topic for another day.
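The same constraint can be expressed in other vocabularies, too. A rough Python analogue (illustrative only -- Python enforces this through a static type checker such as mypy rather than at compile time): forward-only algorithms take Iterable, random-access algorithms take Sequence, and a forward-only container passed where random access is required gets flagged.

```python
# Approximating STL iterator categories with typing protocols:
# Iterable = forward traversal only, Sequence = O(1) random access.

from typing import Iterable, Sequence, TypeVar

T = TypeVar("T")

def count_matches(items: Iterable[T], target: T) -> int:
    # Forward-only traversal: any iterable qualifies, even a generator.
    return sum(1 for x in items if x == target)

def binary_search(items: Sequence[int], target: int) -> bool:
    # Needs O(1) indexing: only random-access containers qualify.
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        elif items[mid] > target:
            hi = mid
        else:
            return True
    return False

data = [1, 3, 5, 7, 9]
print(count_matches(data, 5))      # 1
print(binary_search(data, 7))      # True
# A generator is fine for count_matches, but a type checker rejects
# it for binary_search -- the design documents the complexity contract.
print(count_matches(iter(data), 5))  # 1
```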

> btw my respect for tango improved when i found no opIndex in their list container.

http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176
August 26, 2008
Robert Fraser wrote:

> superdan wrote:

>> btw my respect for tango improved when i found no opIndex in their list container.
> 
> http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176

This particular collection package is deprecated.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
August 26, 2008
Lars Ivar Igesund wrote:
> Robert Fraser wrote:
> 
>> superdan wrote:
> 
>>> btw my respect for tango improved when i found no opIndex in their list
>>> container.
>>
> http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176
> 
> This particular collection package is deprecated.

The new package has this feature too:
http://www.dsource.org/projects/tango/browser/trunk/tango/util/container/Slink.d#L248

It's a good feature to have (I wouldn't consider a list class complete without it), it just shouldn't be abused.
August 26, 2008
Reply to Robert,

> BCS wrote:
> 
>> Reply to Benji,
>> 
>>> The new JSON parser in the Tango library operates on templated
>>> string arrays. If I want to read from a file or a socket, I have to
>>> first slurp the whole thing into a character array, even though the
>>> character-streaming would be more practical.
>>> 
>> Unless you are only going to parse the start of the file or are going
>> to be throwing away most of it *while you parse it, not after*, the
>> best way to parse a file is to load it all in one OS system call and
>> then run a slicing parser (like the Tango XML parser) on that.
>> One memory allocation and one load or a mmap, and then only the meta
>> structures get allocated later.
> There are cases where you might want to parse an XML file that won't
> fit easily in main memory. I think a stream processing SAX parser
> would be a good addition (perhaps not replacement for) the existing
> one.
> 

If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or process and discard the data, that might make a difference.
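To illustrate the slicing idea (a toy Python tokenizer, nothing like the actual Tango parser): the whole input is loaded once, and every token the parser hands out is a zero-copy view into that single buffer rather than a freshly allocated string.

```python
# Load once, slice many: tokens are views into one buffer, so parsing
# allocates no per-token string copies.

data = b"alpha,beta,gamma,delta"   # stands in for one read/mmap of a file
view = memoryview(data)            # slicing a memoryview copies nothing

def tokens(buf):
    start = 0
    for i, byte in enumerate(buf):
        if byte == ord(","):
            yield buf[start:i]     # a slice of the original buffer
            start = i + 1
    yield buf[start:]

parts = [bytes(t) for t in tokens(view)]   # materialized only for display
print(parts)   # [b'alpha', b'beta', b'gamma', b'delta']
```

Only the small metadata (offsets, the generator frame) is allocated during the parse; the payload stays in the one original buffer.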


August 26, 2008
BCS wrote:
> Reply to Robert,
> 
>> BCS wrote:
>>
>>> Reply to Benji,
>>>
>>>> The new JSON parser in the Tango library operates on templated
>>>> string arrays. If I want to read from a file or a socket, I have to
>>>> first slurp the whole thing into a character array, even though the
>>>> character-streaming would be more practical.
>>>>
>>> Unless you are only going to parse the start of the file or are going
>>> to be throwing away most of it *while you parse it, not after*, the
>>> best way to parse a file is to load it all in one OS system call and
>>> then run a slicing parser (like the Tango XML parser) on that.
>>> One memory allocation and one load or a mmap, and then only the meta
>>> structures get allocated later.
>> There are cases where you might want to parse an XML file that won't
>> fit easily in main memory. I think a stream processing SAX parser
>> would be a good addition (perhaps not replacement for) the existing
>> one.
>>
> 
> If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or process and discard the data, that might make a difference.

Well, for something like a DOM parser, it's pretty much impossible to parse a file that won't fit into memory. But a SAX parser doesn't actually create any objects. It just calls events, while processing XML data from a stream. A good SAX parser can operate without ever allocating anything on the heap, leaving the consumer to create any necessary objects from the parse process.
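For instance, with Python's stdlib SAX API (just to illustrate the shape of the interface -- any SAX implementation looks similar), the handler receives events as the data streams past and keeps only what it wants:

```python
# Event-callback parsing: the parser builds no tree; the handler
# decides what (if anything) to retain from the stream.

import io
import xml.sax

class CountingHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.items += 1     # keep a count, discard everything else

doc = io.StringIO("<list><item/><item/><item/></list>")
handler = CountingHandler()
xml.sax.parse(doc, handler)
print(handler.items)   # 3
```

The document could be gigabytes long and the handler's memory use would not change -- only the state the consumer chooses to accumulate grows.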

--benji
August 26, 2008
BCS wrote:
> Reply to Robert,
> 
>> BCS wrote:
>>
>>> Reply to Benji,
>>>
>>>> The new JSON parser in the Tango library operates on templated
>>>> string arrays. If I want to read from a file or a socket, I have to
>>>> first slurp the whole thing into a character array, even though the
>>>> character-streaming would be more practical.
>>>>
>>> Unless you are only going to parse the start of the file or are going
>>> to be throwing away most of it *while you parse it, not after*, the
>>> best way to parse a file is to load it all in one OS system call and
>>> then run a slicing parser (like the Tango XML parser) on that.
>>> One memory allocation and one load or a mmap, and then only the meta
>>> structures get allocated later.
>> There are cases where you might want to parse an XML file that won't
>> fit easily in main memory. I think a stream processing SAX parser
>> would be a good addition (perhaps not replacement for) the existing
>> one.
>>
> 
> If you can't fit the data file in memory, then I find it hard to believe you will be able to hold the parsed file in memory. If you can program the parser to dump unneeded data on the fly, or process and discard the data, that might make a difference.

I think that's one of the reasons to use a streaming parser -- so you can dump data on the fly.
August 26, 2008
"Robert Fraser" <fraserofthenight@gmail.com> wrote in message news:g91fva$1if4$1@digitalmars.com...
> Lars Ivar Igesund wrote:
>> Robert Fraser wrote:
>>
>>> superdan wrote:
>>
>>>> btw my respect for tango improved when i found no opIndex in their list container.
>>>
>> http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176
>>
>> This particular collection package is deprecated.
>
> The new package has this feature too: http://www.dsource.org/projects/tango/browser/trunk/tango/util/container/Slink.d#L248
>
> It's a good feature to have (I wouldn't consider a list class complete without it), it just shouldn't be abused.

First, Slink is not really the public interface, it is the unit that LinkedList (and other containers) use to build linked lists.

Second, LinkedList implements a 'lookup by index' through the get function, but note that it is not implemented as an opIndex function. An opIndex implies fast lookup (at least sub-linear, i.e., better than O(n)).

I don't think these functions were intended to be used in sorting routines.

-Steve


August 26, 2008
Reply to Benji,

> BCS wrote:
> 
>> Reply to Robert,
>> 
>>> BCS wrote:
>>> 
>>>> Reply to Benji,
>>>> 
>>>>> The new JSON parser in the Tango library operates on templated
>>>>> string arrays. If I want to read from a file or a socket, I have
>>>>> to first slurp the whole thing into a character array, even though
>>>>> the character-streaming would be more practical.
>>>>> 
>>>> Unless you are only going to parse the start of the file or are
>>>> going to be throwing away most of it *while you parse it, not
>>>> after*, the best way to parse a file is to load it all in one OS
>>>> system call and then run a slicing parser (like the Tango XML
>>>> parser) on that.
>>>> One memory allocation and one load or a mmap, and then only the
>>>> meta structures get allocated later.
>>> There are cases where you might want to parse an XML file that won't
>>> fit easily in main memory. I think a stream processing SAX parser
>>> would be a good addition (perhaps not replacement for) the existing
>>> one.
>>> 
>> If you can't fit the data file in memory, then I find it hard to
>> believe you will be able to hold the parsed file in memory. If you
>> can program the parser to dump unneeded data on the fly or process
>> and discard the data, that might make a difference.
>> 
> Well, for something like a DOM parser, it's pretty much impossible to
> parse a file that won't fit into memory. But a SAX parser doesn't
> actually create any objects. It just calls events, while processing
> XML data from a stream. A good SAX parser can operate without ever
> allocating anything on the heap, leaving the consumer to create any
> necessary objects from the parse process.
> 
> --benji
> 

Interesting; I've worked with parsers* that function something like that but never thought of them in that way. OTOH I can think of only a very limited domain where this would be useful. If I needed to process that much data I'd load it into a database and go from there.

*In fact my parser generator could be used that way.


August 26, 2008
Robert Fraser Wrote:

> superdan wrote:
> > Christopher Wright Wrote:
> > 
> >> superdan wrote:
> >>>> class LinkedList(T)
> >>>> {
> >>>> 	Node!(T) head;
> >>>>
> >>>> 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
> >>>> the list. Time complexity: O(N) for a list of length N. This operation
> >>>> is provided for completeness and not recommended for frequent use in
> >>>> large lists. */
> >>>> 	T opIndex(int i)
> >>>> 	{
> >>>> 		auto current = head;
> >>>> 		while (i--)
> >>>> 		{
> >>>> 			current = current.next;
> >>>> 		}
> >>>> 		return current.value;
> >>>> 	}
> >>>> }
> >>> oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
> >> WRONG!
> >> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> >> for this linked list.
> > 
> > you got the wrong definition of correct. sort became a non-scalable algo from a scalable algo. whacha gonna say next. bubble sort is viable?!?
> 
> No, YOU got the wrong definition of correct. "Correct" and "scalabale" are different words. As are "correct" and "viable". In Java, I've been known to index into linked lists... usually ones with ~5 elements, but I've done it.

yeah. for starters my dictionary fails to list "scalabale".

> >>> this design is a stillborn.
> >>>
> >>> what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.
> >> You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list.
> > 
> > the correct opIndex runs in o(1).
> 
> No, the SCALABLE opIndex runs in O(1). The CORRECT opIndex can run in O(n^n) and still be correct.

stepanov has shown that for composable operations the complexity must be part of the specification. otherwise composition easily leads to high-order polynomial running times that fail to terminate in reasonable time. opIndex is an indexing operator expected to run in constant time, and algorithms rely on that. so no. opIndex running in o(n^n) is incorrect because it fails its spec.

> >> Any sorting algorithm you name that would work on an array would also work on this linked list. Admittedly, insertion sort would be much faster than qsort, but if your library provides that, you, knowing that you are using a linked list, would choose the insertion sort algorithm.
> > 
> > no. the initial idea was for a design that allows that cool mixing and matching gig thru interfaces without knowing what is where. but ur design leads to a lot of unworkable mixing and matching.
> 
> Again, WORKABLE, just not SCALABLE. You should wrap your head around this concept, since it's been around for about 30 years now.

guess it would do you good to entertain just for a second the idea that i know what i'm talking about and you don't get me.

> > it is a stillborn design.
> 
> Tell that to Java, the world's most used programming language for new projects. Or C#, the world's fastest growing programming language. Or Tango, one of D's standard libraries.

here you hint you don't understand what i'm talking about indeed. none of java, c#, or tango defines a[n] to run in o(n). they define named functions, which i'm perfectly fine with.

> > my point was opIndex should not be written for a list to begin with.
> 
> Yes it should be. Here's a fairly good example: Say you have a GUI control that displays a list and allows the user to insert or remove items from the list. It also allows the user to double-click on an item at a given position. Looking up what position maps to what item is an opIndex. Would this problem be better solved using an array (Vector)? Maybe. Luckily, if you used a List interface throughout your code, you can change one line, and it'll work either way.

funny you should mention that. the window manager in windows 3.1 worked exactly like that. users noticed that the more windows they opened, the longer it took to open a new window. with new systems and more memory people would have many windows. before long this became a big issue. windows 95 fixed that.

never misunderestimate scalability.

> > wrong. you only need to define your abstract types appropriately. e.g. stl defines forward and random iterators. a forward iterators has ++ but no []. random iterator has both. so a random iterator can be substituted for a forward iterator. but not the other way. bottom line, sort won't compile on a forward iterator. your design allowed it to compile. which makes the design wrong.
> 
> STL happens to be one design and one world-view. It's a good one, but it's not the only one. My main problem with the STL is that it takes longer to learn than the Java/.NET standard libraries -- and thus the cost of a programmer who knows it is higher. But there are language considerations in there too, and this is a topic for another day.

cool. don't see how this all relates to the problem at hand.

> > btw my respect for tango improved when i found no opIndex in their list container.
> 
> http://www.dsource.org/projects/tango/browser/trunk/tango/util/collection/LinkSeq.d#L176

here you gently provide irrefutable proof you don't get what i'm sayin'. schveiguy did. the page fails to list opIndex. there is a function called 'get'. better yet. the new package lists a function 'nth' suggesting a linear walk to the nth element. good job tango fellas.