August 27, 2008
Walter Bright:

>In my experience with such programs, disabling the collection cycles brought the speed up to par.<

In my experience there's some difference still.
The usual disclaimer: benchmarks are tricky things, so anyone is invited to spot problems in my code.


A very simple benchmark:

// D without GC
import std.gc: disable;
void main() {
    int[int] d;             // built-in associative array
    disable();              // stop the GC from running collection cycles
    for (int i; i < 10_000_000; ++i)
        d[i] = 0;
}


# Python+Psyco without GC
from gc import disable
def main():
    d = {}
    disable()                    # stop the cyclic garbage collector
    for i in xrange(10000000):
        d[i] = 0
import psyco; psyco.full()       # JIT-compile as much code as possible
main()


hash without GC, n = 10_000_000:
  D:     9.12 s
  Psyco: 1.45 s

hash2 with GC, n = 10_000_000:
  D:     9.80 s
  Psyco: 1.46 s


If Psyco isn't used, the Python version without GC requires 2.02 seconds. This means that 2.02 - 1.45 = 0.57 s is needed by the Python virtual machine just to run those 10_000_000 loop iterations :-)

Warm tests, best of 3, performed with Python 2.5.2, Psyco 1.6, on Win XP, and the latest DMD with -O -release -inline.

Python integers are objects, rather bigger than 4 bytes, and they can grow "naturally" to become multi-precision integers:

>>> a = 2147483647
>>> a
2147483647
>>> a + 1
2147483648L
>>> type(a)
<type 'int'>
>>> type(a + 1)
<type 'long'>
>>> type(7 ** 5)
<type 'int'>
>>> type(7 ** 55)
<type 'long'>

Bye,
bearophile
August 27, 2008
superdan wrote:
> Christopher Wright Wrote:
> 
>> == Quote from Christopher Wright (dhasenan@gmail.com)'s article
>>> WRONG!
>>> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
>>> for this linked list.
>> My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type
>> that allows for efficient iteration (O(n) to go through a list of n elements, or
>> for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So
>> even for a linked list, those three algorithms, which are probably the most common
>> sorting algorithms used, will still be efficient. Unless the person who wrote them
>> was braindead and used indexing to iterate rather than the class's defined opApply.
> 
> sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.

You need to pick a random pivot in order to guarantee that runtime, in fact. And you can do that in linear time, and you're doing a linear scan through the elements anyway, so you get the same asymptotic time. It's going to double your runtime at worst if you choose a poor data structure for the task.
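
As a rough illustration of that one-pass pivot selection (the Node type and randomPivot are made up for the example, and the syntax is present-day D), a uniformly random pivot can be picked from a forward-only list with reservoir sampling:

import std.random : uniform;

// Hypothetical minimal singly-linked list node.
class Node {
    int value;
    Node next;
    this(int v) { value = v; }
}

// One linear pass: element i replaces the current pick with probability
// 1/(i+1), so every element is equally likely to end up as the pivot.
int randomPivot(Node head) {
    int pick = head.value;
    size_t seen = 1;
    for (auto n = head.next; n !is null; n = n.next) {
        ++seen;
        if (uniform(0, seen) == 0)   // probability 1/seen
            pick = n.value;
    }
    return pick;
}

Pivot selection stays O(n), which disappears into the linear partitioning pass quicksort does anyway.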
August 27, 2008
bearophile wrote:
> Walter Bright:
> 
>> In my experience with such programs, disabling the collection
>> cycles brought the speed up to par.<
> 
> In my experience there's some difference still. The usual disclaimer:
> benchmarks are tricky things, so anyone is invited to spot problems
> in my code.

I invite you to look at the code in internal/aaA.d and do some testing!
August 27, 2008
BCS wrote:
> Reply to Nick,
> 
>> "bearophile" <bearophileHUGS@lycos.com> wrote in message
>> news:g8vmda$sd4$1@digitalmars.com...
>>
>>> BCS:
>>>
>>>> If you must have that sort of interface, pick a different language,
>>>> because D isn't intended to work that way.
>>>>
>>> I suggest Benji to try C# 3+, despite all the problems it has and the
>>> borg-like nature of such software, etc, it will be used way more than
>>> D, and it has all the nice things Benji asks for.
>>>
>> (pet peeve) As much as there is that I like about C#, the lack of an
>> IArithmetic or operator constrains tends to gimp its template system
>> in a number of cases.
>>
> 
> C# generics are *Crippled*. They more or less do nothing but map types around.

Yes, but oh, the syntax!
August 27, 2008
Christopher Wright Wrote:

> superdan wrote:
> > Christopher Wright Wrote:
> > 
> >> == Quote from Christopher Wright (dhasenan@gmail.com)'s article
> >>> WRONG!
> >>> Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
> >>> for this linked list.
> >> My mistake. Merge sort, qsort, and heap sort are all O(n log n) for any list type
> >> that allows for efficient iteration (O(n) to go through a list of n elements, or
> >> for heap sort, O(n log n)) and O(1) appending (or, for heap sort, O(log n)). So
> >> even for a linked list, those three algorithms, which are probably the most common
> >> sorting algorithms used, will still be efficient. Unless the person who wrote them
> >> was braindead and used indexing to iterate rather than the class's defined opApply.
> > 
> > sigh. your mistake indeed. just not where you thot. quicksort needs random access fer the pivot. not fer iterating. quicksort can't guarantee good runtime if pivot is first element. actually any of first k elements. on a forward iterator quicksort does quadratic time if already sorted or almost sorted.
> 
> You need to pick a random pivot in order to guarantee that runtime, in fact. And you can do that in linear time, and you're doing a linear scan through the elements anyway, so you get the same asymptotic time. It's going to double your runtime at worst if you choose a poor data structure for the task.

damn man you're right. yeah it's still o(n log n). i was wrong. 'pologies.
August 27, 2008
On 2008-08-26 13:25:40 -0400, Benji Smith <dlanguage@benjismith.net> said:

> I seem to remember reading something about the Objective-C compiler maybe six or eight months ago talking about some of its optimization techniques.
> 
> Obj-C uses a message-passing idiom, and all messages use dynamic dispatch, since the list of messages an object can receive is not fixed at compile-time.
> 
> If I remember correctly, this article said that the dynamic dispatch expense only had to be incurred once, upon the first invocation of each message type. After that, the address of the appropriate function was re-written in memory, so that it pointed directly to the correct code. No more dynamic dispatch. Although the message handlers aren't resolved until runtime, once invoked, they'll always use the same target.
> 
> Or some thing like that.
> 
> It was an interesting read. I'll see if I can find it.

Hum, I believe you're talking about the cache for method calls. What Objective-C does is that it caches methods by selector in a lookup table. There is one such table for each class, and it gets populated as methods are called on that class.

Once a method is in the cache, it's very efficient to find where to branch: you take the selector's pointer and apply the mask to get value n, then branch on the method pointer from the nth bucket in the table.

All messages are passed by calling the objc_msgSend function. Here's how you can implement some of that in D:

	id objc_msgSend(id self, SEL op, ...) {
		// hash the selector pointer into the class's method cache
		auto n = cast(uint)op & self.isa.cache.mask;
		auto func = cast(id function(id, SEL, ...))self.isa.cache.buckets[n];
		if (func !is null) {
			<set instruction pointer to func>
			// never returns; the function pointed to by func returns instead
		}
		<find func pointer by other means, fill cache, etc.>
	}

I've read somewhere that it's almost as fast as virtual functions. While I haven't verified that, it's much more flexible: you can add functions at runtime to any class. That's how Objective-C allows you to add methods to classes you do not control and you can still override them in derived classes.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

August 27, 2008
"superdan" <super@dan.org> wrote in message news:g921nb$2qqq$1@digitalmars.com...
> Nick Sabalausky Wrote:
>
>> "Denis Koroskin" <2korden@gmail.com> wrote in message news:op.ugie28dxo7cclz@proton.creatstudio.intranet...
>> > On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super@dan.org> wrote: [snip]
>> >>> The D spec certainly doesn't make any guarantees about
>> >>> the time/memory complexity of opIndex; it's up to the implementing
>> >>> class
>> >>> to do so.
>> >>
>> >> it don't indeed. it should. that's a problem with the spec.
>> >>
>> >
>> > I agree. You can't rely on function invokation, i.e. the following
>> > might
>> > be slow as death:
>> >
>> > auto n = collection.at(i);
>> > auto len = collection.length();
>> >
>> > but index operations and properties getters should be real-time and
>> > have
>> > O(1) complexity by design.
>> >
>> > auto n = collection[i];
>> > auto len = collection.length;
>>
>> I disagree. That strategy strikes me as a very clear example of breaking
>> encapsulation by having implementation details dictate certain aspects of
>> the API. At the very least, that will make the API overly rigid,
>> hindering
>> future changes that could otherwise have been non-breaking,
>> behind-the-scenes changes.
>
> take this:
>
> auto c = new Customer;
> c.loadCustomerInfo("123-12-1234");
>
> that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.
>
> now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.
>

Choosing a sort method is a separate task from the actual sorting. Any sort that expects to be able to index will still work correctly if given a collection that has O(n) or worse indexing, just as it will still work correctly if given an O(n) or worse comparison delegate. It's up to the caller of the sort function to know that an O(n log n) sort (for instance) is only O(n log n) if the indexing and comparison are both O(1). And then it's up to them to decide if they still want to send it a linked list or a complex comparison. The sort shouldn't crap out at compile-time (or runtime) just because some novice might not know that doing a generalized bubble sort on a linked list scales poorly.

If you want automatic choosing of an appropriate sort algorithm (which is certainly a good thing to have), then that can be done at a separate, optional, level of abstraction using function overloading, template specialization, RTTI, etc. That way you're not imposing arbitrary restrictions on anyone.

> with that indexing and swapping complexity ain't implementation detail. they're part of the spec. guess stepanov's main contribution was to clarify that.
>

When I called indexing an implementation detail, I was referring to the collection itself. The method of indexing *is* an implementation detail of the collection. It should not be considered an implementation detail of the sort algorithm since it's encapsulated in the collection and thus hidden away from the sort algorithm. If you make a rule that collections with cheap indexing are indexed via opIndex and collections with expensive indexing are indexed via a function, then you've just defined the API in terms of the collection's implementation (and introduced an unnecessary inconsistency into the API).

If a sort function is desired that only accepts collections with O(1) indexing, then that can be accomplished at a higher level of abstraction (using function overloading, RTTI, etc.) without getting in the way when such a guarantee is not needed.
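
As a rough sketch of that higher level (using today's Phobos range traits purely for illustration; indexedSort is a made-up name), a sort that only accepts O(1)-indexable collections can simply refuse to instantiate for anything else:

import std.algorithm : sort;
import std.range : isRandomAccessRange;

// Only collections with random access (and therefore cheap indexing) are
// accepted; passing a forward-only container is a compile-time error rather
// than a silently O(n^2)-or-worse sort.
void indexedSort(R)(R r)
    if (isRandomAccessRange!R)
{
    sort(r);
}

A dynamic array satisfies the constraint; a singly-linked list does not, so the caller has to make the copy-into-an-array (or pick-another-algorithm) decision explicitly instead of getting a quietly slow sort.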

>> For realtime code, I can see the benefit to what you're saying. Although
>> in
>> many cases only part of a program needs to be realtime, and for the rest
>> of
>> the program's code I'd hate to have to sacrifice the encapsulation
>> benefits.
>
> realtime has nothin' to do with it.

For code that needs to run in realtime, I agree with Denis Koroskin that it could be helpful to be able to look at a piece of code and have some sort of guarantee that there is no behind-the-scenes overloading going on that is any more complex than the operators' default behaviors. But for code that doesn't need to finish within a maximum amount of time, that becomes less important and the encapsulation/syntactic-consistency gained from the use of such things becomes a more worthy pursuit. That's what I was saying about realtime.

> encapsulation ain't broken by making complexity part of the reqs. any more than any req ain't breakin' encapsulation. if it looks like a problem then encapsulation was misdesigned and needs change.
>
> case in point. all containers should provide 'nth' say is it's o(n) or better. then there's a subclass of container that is indexed_container. that provides opIndex and says it's o(log n) or better. it also provides 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.
>
> but forcing a list to blurt something for opIndex - that's just bad design.

I agree that not all collections should implement an opIndex. Anything without a natural sequence or mapping should lack opIndex (such as a tree or graph). But forcing the user of a collection that *does* have a natural sequence (like a linked list) to use function-call-syntax instead of standard indexing-syntax just because the collection is implemented in a way that causes indexing to be less scalable than other collections - that's bad design.

The way I see it, "group[n]" means "Get the nth element of group". Not "Get the element at location group.ptr + (n * sizeof(group_base_type)) or something else that's just as scalable." In plain C, those are one and the same. But when you start talking about generic collections, encapsulation and interface versus implementation, they are very different: the former is interface, the latter is implementation.


August 27, 2008
Walter Bright wrote:
> Jb wrote:
>> Walter said "the hardware cannot predict where a virtual call will go".
>>
>> It does in fact predict them, and speculatively execute them, and as pretty much any bechmark will show it gets it right the vast majority of the time. (On x86 anyway.)
>>
>> That's what I was saying. 
> 
> Looks like I keep falling behind on what modern CPUs are doing :-(
> 
> In any case, throughout all the revolutions in how CPUs work, there have been a few invariants that hold true well enough as an optimization guide:
> 
> 1. fewer instructions ==> faster execution
> 2. fewer memory accesses ==> faster execution
> 3. fewer conditional branches ==> faster execution

Also, you can't inline virtual calls (well, a smart compiler could, but that's another discussion). That means the compiler can't optimize as well by removing unnecessary operations around the call.

-Joel
August 27, 2008
Benji Smith Wrote:

> superdan wrote:
> > now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.
> > 
> > with that indexing and swapping complexity ain't implementation detail. they're part of the spec. guess stepanov's main contribution was to clarify that.
> 
> The other variable cost operation of a sort is the element comparison. Even if indexing and swapping are O(1), the cost of a comparison between two elements might be O(m), where m is proportional to the size of the elements themselves.
> 
> And since a typical sort algorithm will perform n log n comparisons, the cost of the comparison has to be factored into the total cost.
> 
> The performance of sorting...say, an array of strings based on a locale-specific collation...could be an expensive operation, if the strings themselves are really long. But that wouldn't make the implementation incorrect, and I'm always glad when a sorting implementation provides a way of passing a custom comparison delegate into the sort routine.

good points. i only know of one trick to save on comparisons. it's that -1/0/1 comparison. you compare once and get info on less/equal/greater. that cuts comparisons in half. too bad std.algorithm don't use it.
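
A tiny sketch of that three-way idea (cmp3 is a made-up name): one call tells you less/equal/greater, where two separate less-than tests would otherwise be needed to distinguish the three cases.

// Returns -1, 0 or 1 in a single call (bools promote to int in arithmetic).
int cmp3(int a, int b) { return (a > b) - (a < b); }

// With only '<' you may need two tests per pair:
//     if (a < b) { ... } else if (b < a) { ... } else { /* equal */ }
// With a three-way result a single call decides all three outcomes.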

then i moseyed 'round std.algorithm and saw all that schwartz xform business. not sure i grokked it. does it have to do with saving on comparisons?
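
For what it's worth, the Schwartz transform is less about saving comparisons and more about computing an expensive sort key only once per element: decorate each element with its precomputed key, sort on the cached key, then strip the keys. A rough sketch of the idea in current D (sortByLoweredKey is a made-up example, not the std.algorithm code):

import std.algorithm : map, sort;
import std.array : array;
import std.string : toLower;
import std.typecons : tuple;

// Decorate-sort-undecorate: the (possibly expensive) key is computed once
// per element instead of once per comparison.
string[] sortByLoweredKey(string[] words)
{
    auto decorated = words.map!(w => tuple(w.toLower, w)).array;  // decorate
    sort!((a, b) => a[0] < b[0])(decorated);                      // sort on key
    return decorated.map!(t => t[1]).array;                       // undecorate
}

This is essentially what the schwartzSort business in std.algorithm automates, with the key function supplied as a template argument.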
August 27, 2008
Nick Sabalausky Wrote:

> "superdan" <super@dan.org> wrote in message news:g921nb$2qqq$1@digitalmars.com...
> > Nick Sabalausky Wrote:
> >
> >> "Denis Koroskin" <2korden@gmail.com> wrote in message news:op.ugie28dxo7cclz@proton.creatstudio.intranet...
> >> > On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super@dan.org> wrote: [snip]
> >> >>> The D spec certainly doesn't make any guarantees about
> >> >>> the time/memory complexity of opIndex; it's up to the implementing
> >> >>> class
> >> >>> to do so.
> >> >>
> >> >> it don't indeed. it should. that's a problem with the spec.
> >> >>
> >> >
> >> > I agree. You can't rely on function invokation, i.e. the following
> >> > might
> >> > be slow as death:
> >> >
> >> > auto n = collection.at(i);
> >> > auto len = collection.length();
> >> >
> >> > but index operations and properties getters should be real-time and
> >> > have
> >> > O(1) complexity by design.
> >> >
> >> > auto n = collection[i];
> >> > auto len = collection.length;
> >>
> >> I disagree. That strategy strikes me as a very clear example of breaking
> >> encapsulation by having implementation details dictate certain aspects of
> >> the API. At the very least, that will make the API overly rigid,
> >> hindering
> >> future changes that could otherwise have been non-breaking,
> >> behind-the-scenes changes.
> >
> > take this:
> >
> > auto c = new Customer;
> > c.loadCustomerInfo("123-12-1234");
> >
> > that's all cool. there's no performance guarantee other than some best effort kinda thing `we won't sabotage things'. if you come with faster or slower methods to load customers, no problem. coz noone assumes any.
> >
> > now take sort. sort says. my input is a range that supports indexing and swapping independent of the range size. if you don't have that just let me know and i'll use a totally different method. just don't pretend.
> >
> 
> Choosing a sort method is a separate task from the actual sorting.

thot you were the one big on abstraction and encapsulation and all those good things. as a user i want to sort stuff. i let the library choose what's best for the collection at hand.

sort(stuff)
{
    1) figure out best algo for stuff
    2) have at it
}

i don't want to make that decision outside. for example i like one sort routine. not quicksort, heapsort, quicksort_with_median_of_5, or god forbid bubblesort.

> Any sort that expects to be able to index will still work correctly if given a collection that has O(n) or worse indexing, just as it will still work correctly if given an O(n) or worse comparison delegate.

i disagree but now that christopher gave me a black eye guess i have to shut up.

> It's up to the caller of the sort function to know that an O(n log n) sort (for instance) is only O(n log n) if the indexing and comparison are both O(1). And then it's up to them to decide if they still want to send it a linked list or a complex comparison. The sort shouldn't crap out at compile-time (or runtime) just because some novice might not know that doing a generalized bubble sort on a linked list scales poorly.

it should coz there's an obvious good choice. there's no good tradeoff in that. there never will be a case to call the bad sort on the bad range.

> If you want automatic choosing of an appropriate sort algorithm (which is certainly a good thing to have), then that can be done at a separate, optional, level of abstraction using function overloading, template specialization, RTTI, etc. That way you're not imposing arbitrary restrictions on anyone.

i think u missed a lil point i was making.

All collections: implement nth()
Indexable collections: implement opIndex

is all. there is no restriction. just use nth.
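
Roughly, in code (the interface and class names are invented for the sketch):

// Every container can fetch the i-th element, promising only O(n) or better.
interface Container(E) {
    E nth(size_t i);
}

// Indexable containers additionally promise cheap (say O(log n) or better)
// access through opIndex; nth can then simply forward to it.
interface IndexedContainer(E) : Container!E {
    E opIndex(size_t i);
}

// An array-backed container: O(1) opIndex, nth forwards to it.
class ArrayContainer(E) : IndexedContainer!E {
    private E[] data;
    this(E[] d) { data = d; }
    E opIndex(size_t i) { return data[i]; }
    E nth(size_t i) { return this[i]; }
}

// A linked list would implement only Container and expose nth (an O(n) walk),
// so the a[i] syntax is simply unavailable on it.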

> > with that indexing and swapping complexity ain't implementation detail. they're part of the spec. guess stepanov's main contribution was to clarify that.
> >
> 
> When I called indexing an implementation detail, I was referring to the collection itself. The method of indexing *is* an implementation detail of the collection.

not when it gets composed in higher level ops.

> It should not be considered an implementation detail of the sort algorithm since it's encapsulated in the collection and thus hidden away from the sort algorithm. If you make a rule that collections with cheap indexing are indexed via opIndex and collections with expensive indexing are indexed via a function, then you've just defined the API in terms of the collection's implementation (and introduced an unnecessary inconsistency into the API).

no. there is consistency. nth() is consistent across - o(n) or better indexing.

i think u have it wrong when u think of "cheap" as if it were "fewer machine instructions". no. it's about asymptotic complexity and that does matter.

> If a sort function is desired that only accepts collections with O(1) indexing, then that can be accomplished at a higher level of abstraction (using function overloading, RTTI, etc.) without getting in the way when such a guarantee is not needed.

exactly. nth() and opIndex() fit the bill. what's there not to love?

> >> For realtime code, I can see the benefit to what you're saying. Although
> >> in
> >> many cases only part of a program needs to be realtime, and for the rest
> >> of
> >> the program's code I'd hate to have to sacrifice the encapsulation
> >> benefits.
> >
> > realtime has nothin' to do with it.
> 
> For code that needs to run in realtime, I agree with Denis Koroskin that it could be helpful to be able to look at a piece of code and have some sort of guarantee that there is no behind-the-scenes overloading going on that is any more complex than the operators' default behaviors. But for code that doesn't need to finish within a maximum amount of time, that becomes less important and the encapsulation/syntactic-consistency gained from the use of such things becomes a more worthy pursuit. That's what I was saying about realtime.

i disagree but am in a rush now. guess i can't convince u.

> > encapsulation ain't broken by making complexity part of the reqs. any more than any req ain't breakin' encapsulation. if it looks like a problem then encapsulation was misdesigned and needs change.
> >
> > case in point. all containers should provide 'nth' say is it's o(n) or better. then there's a subclass of container that is indexed_container. that provides opIndex and says it's o(log n) or better. it also provides 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.
> >
> > but forcing a list to blurt something for opIndex - that's just bad design.
> 
> I agree that not all collections should implement an opIndex. Anything without a natural sequence or mapping should lack opIndex (such as a tree or graph). But forcing the user of a collection that *does* have a natural sequence (like a linked list) to use function-call-syntax instead of standard indexing-syntax just because the collection is implemented in a way that causes indexing to be less scalable than other collections - that's bad design.

no. it's great design. because it's not lyin'. you want o(1) indexing you say a[n]. you are ok with o(n) indexing you say a.nth(n). this is how generic code works, with consistent notation. not with lyin'.

> The way I see it, "group[n]" means "Get the nth element of group". Not "Get the element at location group.ptr + (n * sizeof(group_base_type)) or something else that's just as scalable."

no need to get that low. just say o(1) and understand o(1) has nothing to do with the count of assembler ops.

> In plain C, those are one and the same. But when you start talking about generic collections, encapsulation and interface versus implementation, they are very different: the former is interface, the latter is implementation.

so now would u say stl has a poor design? because it's all about stuff that you consider badly designed.