April 27, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to dsimcha

dsimcha wrote:
[snip]
>
> Output:
> Direct: 2343
> Virtual: 5695
> opApply: 3014
>
> Bottom line is that these pretty much come in the order you would expect them to,
> but there are no particularly drastic differences between any of them. To put
> these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a
> 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have
> code that is performance critical *and* where the contents of the loop don't take
> long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you
> need the iteration to be polymorphic?
I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).
Virtual was roughly 10 times slower on my machine (with ldc).
Unfortunately, I can't directly compare timings between ldc and dmd, because dmd is likely at a disadvantage, being 32-bit in a 64-bit world.
Although... the Virtual case takes about equal time with ldc- and dmd-compiled code, so maybe the slowness of Direct/dmd compared to Direct/ldc (the dmd code is a factor of 3 slower) is due to dmd apparently not register-allocating the loop variable.
The opApply case was another factor of 2 slower than Direct with dmd on my machine, probably because opApply and the loop body don't get inlined.
It seems gdc is the only compiler to realize the first loop can be optimized away completely. It's again about equally fast for Virtual, but for opApply it's roughly a factor of 3 slower than ldc; it seems to inline only opApply itself, not the loop body.
[1]: -O3 -release (with inlining), x86_64
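For concreteness, the three loop shapes under discussion look roughly like this (a hypothetical reconstruction, since dsimcha's benchmark was snipped above; the names Iterator, Counter, Loop, and N are illustrative, not from the original code):

    interface Iterator {
        bool done();
        void next();
    }

    class Counter : Iterator {
        uint i, n;
        this(uint n) { this.n = n; }
        bool done() { return i >= n; }
        void next() { ++i; }
    }

    struct Loop {
        uint n;
        // foreach over this struct is lowered to one delegate call per iteration.
        int opApply(int delegate(ref uint) dg) {
            for (uint i = 0; i < n; ++i)
                if (auto r = dg(i)) return r;
            return 0;
        }
    }

    void main() {
        enum uint N = 1_000_000_000;

        // Direct: a plain loop; the counter can live in a register.
        for (uint i = 0; i < N; ++i) {}

        // Virtual: two virtual calls per iteration through the interface.
        for (auto it = new Counter(N); !it.done(); it.next()) {}

        // opApply: a delegate call per iteration, unless it gets inlined.
        foreach (i; Loop(N)) {}
    }

On the measurements above, ldc compiles the first and third loops to identical inc/cmp/jne code, while dmd leaves the delegate call in place.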
April 27, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> There's an illusion. And that illusion comes from the D newsgroups having "wrong" names.
>> The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.
>
> Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
And what would we do the day D987 came out? Start another group called
digitalmars.d987-design? Or just create one called digitalmars.d.design,
and use that no matter the version of D being discussed?
--
Simen
April 27, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Simen Kjaeraas

Simen Kjaeraas wrote:
> Andrei Alexandrescu wrote:
>
>> Georg Wrede wrote:
>>> There's an illusion. And that illusion comes from the D newsgroups having "wrong" names.
>>> The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.
>>
>> Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
>
> And what would we do the day D987 came out? Start another group called
> digitalmars.d987-design? Or just create one called digitalmars.d.design,
> and use that no matter the version of D being discussed?
>
> --
> Simen
How about digitalmars.dnext?
April 27, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Christopher Wright

On Tue, 28 Apr 2009 01:38:27 +0400, Christopher Wright <dhasenan@gmail.com> wrote:
> Simen Kjaeraas wrote:
>> Andrei Alexandrescu wrote:
>>
>>> Georg Wrede wrote:
>>>> There's an illusion. And that illusion comes from the D newsgroups having "wrong" names.
>>>> The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.
>>>
>>> Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
>> And what would we do the day D987 came out? Start another group called
>> digitalmars.d987-design? Or just create one called digitalmars.d.design,
>> and use that no matter the version of D being discussed?
>> -- Simen
>
> How about digitalmars.dnext?
Maybe digitalmars.future? :)
April 28, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Frits van Bommel

On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> said:

> I edited this code to work with ldc (D1) + Tango, and saw the Direct
> and opApply cases generate identical code (inc, cmp, jne, with the
> loop counter in a register) [1], so they're equally fast (modulo
> process scheduling randomness).

Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that uses front, popFront, and empty. :-) Perhaps.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
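Michel's generic opApply would be only a few lines. A minimal sketch, assuming D2-style input ranges (genericOpApply is a hypothetical name, not an actual std.range symbol):

    import std.range;  // isInputRange, ElementType, and array primitives

    int genericOpApply(R)(R r, int delegate(ref ElementType!R) dg)
        if (isInputRange!R)
    {
        for (; !r.empty; r.popFront()) {
            auto e = r.front;          // local copy so the delegate gets a ref
            if (auto result = dg(e))
                return result;         // nonzero propagates break/goto/return
        }
        return 0;
    }

With the range primitives and the delegate both inlined, this should collapse to the same code as a range-based foreach -- which is exactly the "proper inlining" caveat.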
April 28, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Michel Fortin

Michel Fortin wrote:
> On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> said:
>
>> I edited this code to work with ldc (D1) + Tango, and saw the Direct
>> and opApply cases generate identical code (inc, cmp, jne, with the
>> loop counter in a register) [1], so they're equally fast (modulo
>> process scheduling randomness).
>
> Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

> I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that uses front, popFront, and empty. :-) Perhaps.

I suspect supporting ranges is just much easier.

-- Daniel
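To see why, here is roughly what zip's iteration looks like with ranges (a simplified sketch; std.range's actual Zip is more general). The consumer advances both sources in lockstep, which is impossible when each container insists on driving its own loop through opApply:

    import std.range;  // isInputRange, ElementType, and array primitives
    import std.typecons : Tuple, tuple;

    struct Zip2(R1, R2) if (isInputRange!R1 && isInputRange!R2)
    {
        R1 r1;
        R2 r2;

        bool empty() { return r1.empty || r2.empty; }

        // One step of *both* iterations -- there is no place to write this
        // when each source only offers an opApply that runs to completion.
        void popFront() { r1.popFront(); r2.popFront(); }

        Tuple!(ElementType!R1, ElementType!R2) front()
        {
            return tuple(r1.front, r2.front);
        }
    }

    // Usage: foreach (pair; Zip2!(int[], string[])([1, 2, 3], ["a", "b", "c"])) { ... }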
April 28, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Daniel Keep

On 2009-04-28 06:33:13 -0400, Daniel Keep <daniel.keep.lists@gmail.com> said:

> Michel Fortin wrote:
>> On 2009-04-27 10:51:22 -0400, Frits van Bommel
>> <fvbommel@REMwOVExCAPSs.nl> said:
>>
>>> I edited this code to work with ldc (D1) + Tango, and saw the Direct
>>> and opApply cases generate identical code (inc, cmp, jne, with the
>>> loop counter in a register) [1], so they're equally fast (modulo
>>> process scheduling randomness).
>>
>> Thank you for your timings. I think it shows my point: that by preferring
>> ranges over opApply we're just optimising around a deficiency in DMD's
>> optimizer.
>
> Not true. Here's an excellent reason to use ranges over opApply: you
> can't define zip with opApply. Because opApply uses inversion of
> control, you can't use more than one without bringing threads into the
> equation.

I guess I removed too much context from the above posts. We're just timing various foreach implementations. You're right when you say ranges are more versatile than opApply, and I'm all for keeping both ranges and opApply. I just want the compiler to prefer opApply over ranges when both are available when generating code for foreach, since with opApply you can sometimes optimize things in a way that you can't with ranges.

>> I'm thinking... with proper inlining, perhaps we could take the notion
>> of ranges out of the compiler and just define a generic opApply in
>> std.range that uses front, popFront, and empty. :-) Perhaps.
>
> I suspect supporting ranges is just much easier.

Sure, especially since they're already implemented in the compiler. Inlining of delegates known at compile time would have a greater reach, though.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
April 28, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to Daniel Keep

Daniel Keep wrote:
> Michel Fortin wrote:
>> On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> said:
>>
>>> I edited this code to work with ldc (D1) + Tango, and saw the Direct
>>> and opApply cases generate identical code (inc, cmp, jne, with the
>>> loop counter in a register) [1], so they're equally fast (modulo
>>> process scheduling randomness).
>>
>> Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.
>
> Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

>> I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that uses front, popFront, and empty. :-) Perhaps.
>
> I suspect supporting ranges is just much easier.
>
> -- Daniel
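The fiber trick looks roughly like this: run opApply inside a fiber and suspend it after each element, turning push-style iteration into a pull-style range. A sketch assuming druntime's core.thread.Fiber (Tango's stackthreads are analogous) and a container type C with a conventional opApply(int delegate(ref T)); OpApplyRange is a made-up name, and copy semantics and error propagation are glossed over:

    import core.thread : Fiber;

    class OpApplyRange(T, C)
    {
        private Fiber fib;
        private T current;
        private bool done;

        this(C container)
        {
            fib = new Fiber({
                container.opApply((ref T e) {
                    current = e;
                    Fiber.yield();  // pause until the consumer wants more
                    return 0;       // 0 means "keep iterating"
                });
            });
            popFront();  // run up to the first element
        }

        bool empty() { return done; }
        T front() { return current; }

        void popFront()
        {
            fib.call();                              // resume the producer
            done = (fib.state == Fiber.State.TERM);  // opApply finished?
        }
    }

Each element costs a context switch rather than a full thread, hence "far less overhead" -- though, as Steve notes below, still more than a plain range.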
April 28, 2009
Re: Yet another strike against the current AA implementation
Posted in reply to downs

On Tue, 28 Apr 2009 07:35:22 -0400, downs <default_357-line@yahoo.de> wrote:
> Daniel Keep wrote:
>>
>> Michel Fortin wrote:
>>> On 2009-04-27 10:51:22 -0400, Frits van Bommel
>>> <fvbommel@REMwOVExCAPSs.nl> said:
>>>
>>>> I edited this code to work with ldc (D1) + Tango, and saw the Direct
>>>> and opApply cases generate identical code (inc, cmp, jne, with the
>>>> loop counter in a register) [1], so they're equally fast (modulo
>>>> process scheduling randomness).
>>> Thank you for your timings. I think it shows my point: that by preferring
>>> ranges over opApply we're just optimising around a deficiency in DMD's
>>> optimizer.
>>
>> Not true. Here's an excellent reason to use ranges over opApply: you
>> can't define zip with opApply. Because opApply uses inversion of
>> control, you can't use more than one without bringing threads into the
>> equation.
>>
>
> Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
read: less overhead than full threads, not less overhead than ranges ;)
-Steve
April 28, 2009
Phobos2: zip, integral ranges, map, Any, All, Map
Posted in reply to Daniel Keep

Some more notes about Phobos2. Some of the things I say may be wrong because my experience with Phobos2 is still limited.

Daniel Keep:
> Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

I'll try to zip two ranges that return the leaves of two different binary trees. This could be a simple example to show off, to attract people to the D2 language :-)

So I've tried using zip:

    import std.range: zip;
    import std.stdio: writeln;

    void main() {
        auto a = [1, 2, 3];
        string[] b = ["a", "b", "c"];
        foreach (xy; zip(a, b))
            writeln(xy.at!(0), " ", xy.at!(1));
    }

That doesn't work:

    ...\dmd\src\phobos\std\range.d(1847): Error: template instance Zip!(int[3u],immutable(char)[][]) does not match template declaration Zip(R...) if (R.length && allSatisfy!(isInputRange,R))

probably because 'a' is a static array. Is isInputRange false for static arrays?

------------

The most basic and useful range is the one of integer numbers. How can I create with Phobos2 lazy and eager ranges like the following ones?

    >>> range(1, 5)
    [1, 2, 3, 4]
    >>> range(5, 1, -1)
    [5, 4, 3, 2]
    >>> list(xrange(5, 10))
    [5, 6, 7, 8, 9]
    >>> list(xrange(5, 10, 2))
    [5, 7, 9]

Similar ranges are useful with map, zip, and in many other situations. (I hope the x..y range syntax of D2 foreach will evolve into a syntax that can be used everywhere an integral range can be used.)

------------

The docs say about "map":

> Multiple functions can be passed to map. In that case, the element type of map is a tuple containing one element for each function.
> [...]
> foreach (e; map!("a + a", "a * a")(arr1))

Having the possibility to map two functions in parallel may be useful in some situations (where, for example, computing the items is costly), but mapping a function that takes two or more arguments can be quite a bit more useful. An example:

    >>> map(lambda x, y: x*y, ["a", "b", "c"], xrange(1, 4))
    ['a', 'bb', 'ccc']

If you don't have that, you are forced to use a zip inside the map, but then you are also forced to change the D2 mapping function to something like:

    (p){ return p.at!(0) * p.at!(1); }

----------

all() and any() functions are useful to test whether all or any items of an iterable are true (with an optional mapping function too). They are useful as templates too, so I suggest renaming allSatisfy to "All" and anySatisfy to "Any". A similar template named "Map" can be created, if not already present, to map a given template over a typetuple.

In the future I'll probably write more notes, suggestions and questions about Phobos2. I hope such notes are very useful.

Bye,
bearophile
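For what it's worth, the lazy integer range asked about above is only a few lines with the range primitives discussed in this thread. A minimal sketch with made-up names (std.range's iota covers similar ground): lazy, half-open, with an optional, possibly negative, step:

    struct IntRange
    {
        int lo, hi, step = 1;
        bool empty() { return step > 0 ? lo >= hi : lo <= hi; }
        int front() { return lo; }
        void popFront() { lo += step; }
    }

    IntRange intRange(int lo, int hi, int step = 1)
    {
        return IntRange(lo, hi, step);
    }

    // Mirroring the Python examples:
    //   foreach (i; intRange(1, 5))      // 1 2 3 4
    //   foreach (i; intRange(5, 1, -1))  // 5 4 3 2
    //   foreach (i; intRange(5, 10))     // 5 6 7 8 9
    //   foreach (i; intRange(5, 10, 2))  // 5 7 9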