July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | Will do, and that's goo to know :-)
On Jul 2, 2010, at 11:49 AM, Brad Roberts wrote:
> Please file bugs with reduced test cases where seemingly obvious inlining isn't done. I'd like to spend time on them. Ref parameters were one of the things that blocked inlining until this most recent release (or maybe the one before that). That works now. :)
>
> On Fri, 2 Jul 2010, Sean Kelly wrote:
>
>> Date: Fri, 2 Jul 2010 11:26:58 -0700
>> From: Sean Kelly <sean at invisibleduck.org>
>> Reply-To: Discuss the phobos library for D <phobos at puremagic.com>
>> To: Discuss the phobos library for D <phobos at puremagic.com>
>> Subject: Re: [phobos] std.algorithm.sort slow as molasses
>>
>> The rules for inlining are weird. When I wrote Array.sort I noticed that a nested function that took reference parameters (ie. swap) wasn't inlined either, so I changed mine to use indexes and accessed the outer array that way. For this level of performance tuning you really have to look at the ASM output to see what's going on.
>>
>> On Jul 2, 2010, at 9:39 AM, David Simcha wrote:
>>
>>> .
>>> On Fri, Jul 2, 2010 at 12:21 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
>>> One simple solution would be for you to contribute dstat's sort to Phobos. However, I'd be curious what the reason of std's sort slowness is. I suspect it might be the fact that I use qsort all the way down instead of switching to insertion sort. What is your sort's strategy?
>>>
>>>
>>> Insertion sort at 25 elements. This is based on fairly heavy empirical testing. I tried disabling this and doing qsort all the way down. This only explains a small part of the difference (about 30 milliseconds' worth).
>>>
>>> I looked at the std.algorithm code and I think I see at least part of the problem, but I don't know how to fix it w/o completely gutting the code and rewriting it:
>>>
>>> // This is probably not inlined b/c I don't think DMD can inline nested functions
>>> // that access the outer scope. Someone please confirm this
>>> bool pred(ElementType!(Range) a)
>>> {
>>> return less(a, r.back);
>>> }
>>> auto right = partition!(pred, ss)(r);
>>>
>>> _______________________________________________
>>> phobos mailing list
>>> phobos at puremagic.com
>>> http://lists.puremagic.com/mailman/listinfo/phobos
>>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos_______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | Brad Roberts, el 2 de julio a las 11:49 me escribiste: > Please file bugs with reduced test cases where seemingly obvious inlining isn't done. I'd like to spend time on them. Ref parameters were one of the things that blocked inlining until this most recent release (or maybe the one before that). That works now. :) See bug 859 http://d.puremagic.com/issues/show_bug.cgi?id=859 Is not a great bug report but there is a little test case. This is mostly D1 because of ranges, but using opApply with the current state of inlining is almost impossible for performance-aware applications. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Hello? Is there anybody in there? Just nod if you can hear me. Is there anyone at home? |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly |
Sean Kelly wrote:
> Will do, and that's goo to know :-)
>
>
I, for one, don't want to know about any goo!
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu |
Andrei Alexandrescu wrote:
> Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off.
>
Most heavily optimized implementations of things tend to have lost their elegance. qsort is no stranger to that. Just look at all the nasty things people do trying to make memcpy() run faster!
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright wrote:
>
>
> Andrei Alexandrescu wrote:
>> Working around that would mess much of the elegance of std.sort, sigh. The nice thing is that sort reuses partition instead of writing a dedicated version of it (as I had in my implementation for a while). I was quite glad I'd pulled that off.
>>
>
> Most heavily optimized implementations of things tend to have lost their elegance. qsort is no stranger to that. Just look at all the nasty things people do trying to make memcpy() run faster!
As I said before, I have no simpathy for that position. That crap is sometimes necessary doesn't mean we should endorse it. We should allow inlining of that function if that's the performance problem, not replace it with a workaround.
Andrei
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | On Fri, 2 Jul 2010, Leandro Lucarella wrote:
> See bug 859 http://d.puremagic.com/issues/show_bug.cgi?id=859
>
> Is not a great bug report but there is a little test case. This is mostly D1 because of ranges, but using opApply with the current state of inlining is almost impossible for performance-aware applications.
That one is on my radar already. I'm going to finish working on the migration of the dmd test suite before digging back into ticket, but this one is high on my list for what to tackle next.
Is it directly impacting this specific case (I haven't looked at the code involved)?
Later,
Brad
|
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | I have gotten around to doing some disassembly reading, and apparently I was wrong. DMD does inline nested functions passed as templates, even when they access the outer scope, at least in trivial cases. D code: import std.c.stdio; void main(string[] args) { uint num = args.length; // Make sure this isn't const folded. bool comp(uint otherNum) { return otherNum < num; } // Need to print to keep this from being optimized out entirely. printf("%d", evalPred!comp(300)); } bool evalPred(alias pred)(uint num) { // This line is to make sure that this function doesn't get inlined into // main. DMD doesn't inline functions that could throw. if(num == uint.max) { throw new Exception(""); } return pred(num); } Disassembly of evalPred!comp: _D4test4mainFAAyaZv45__T8evalPredS29_D4test4mainFAAyaZv4compMFkZbZ8evalPredMFkZb PROC NEAR ; COMDEF _D4test4mainFAAyaZv45__T8evalPredS29_D4test4mainFAAyaZv4compMFkZbZ8evalPredMFkZb push eax ; 0000 _ 50 cmp dword ptr [esp+8H], -1 ; 0001 _ 83. 7C 24, 08, FF jnz ?_006 ; 0006 _ 75, 27 mov ecx, offset FLAT:_D9Exception7__ClassZ ; 0008 _ B9, 00000000(segrel) push ecx ; 000D _ 51 call __d_newclass ; 000E _ E8, 00000000(rel) add esp, 4 ; 0013 _ 83. C4, 04 push dword ptr [?_004] ; 0016 _ FF. 35, 0000000C(segrel) push dword ptr [?_003] ; 001C _ FF. 35, 00000008(segrel) push 0 ; 0022 _ 6A, 00 call _D6object9Exception6__ctorMFAyaC6object9ThrowableZC9Exception; 0024 _ E8, 00000000(rel) push eax ; 0029 _ 50 call __d_throw at 4 ; 002A _ E8, 00000000(rel) ?_006: mov eax, dword ptr [esp] ; 002F _ 8B. 04 24 mov edx, dword ptr [eax] ; 0032 _ 8B. 10 mov eax, 1 ; 0034 _ B8, 00000001 cmp edx, dword ptr [esp+8H] ; 0039 _ 3B. 54 24, 08 ja ?_007 ; 003D _ 77, 02 xor eax, eax ; 003F _ 31. C0 ?_007: pop ecx ; 0041 _ 59 ret 4 ; 0042 _ C2, 0004 _D4test4mainFAAyaZv45__T8evalPredS29_D4test4mainFAAyaZv4compMFkZbZ8evalPredMFkZb ENDP -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/90a1c56d/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | I've figured out a major portion of the problem. I can shave ~100 ms off if I change the enforce in std.array.back(), which bounds checks even in release mode and also prevents inlining of std.array.back(), to an assert. It's an assert in std.array.front() already. This is consistent with the idea that arrays should be bounds checked only if asserts are enabled. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/ae82615f/attachment.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Sorry for the double paste. Please ignore everything before the second time you see _D3std9algorithm129__T8sortImplS793std10functional54__T13binaryFunImplVAyaa5_61203c2... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/33cc5689/attachment-0001.html> |
July 02, 2010 [phobos] std.algorithm.sort slow as molasses | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Simcha | Sorry, the list wouldn't post my accidental double paste anyhow because it was too big. Here's the disassembly of partition, showing only back() is getting called and pred seems to be getting inlined, contrary to what I had believed before. So changing enforce() in back() to assert() gets rid of about half the overhead, but I have no idea where the other half is coming from. _D3std9algorithm129__T8sortImplS793std10functional54__T13binaryFunImplVAyaa5_61203c2062VAyaa1_610889BE3F182A6EA5575A2D298A20D100 PROC NEAR ; COMDEF _D3std9algorithm129__T8sortImplS793std10functional54__T13binaryFunImplVAyaa5_61203c2062VAyaa1_610889BE3F182A6EA5575A2D298A20D100 sub esp, 36 ; 0000 _ 83. EC, 24 mov ecx, dword ptr [esp+2CH] ; 0003 _ 8B. 4C 24, 2C push ebx ; 0007 _ 53 mov ebx, dword ptr [esp+2CH] ; 0008 _ 8B. 5C 24, 2C test ebx, ebx ; 000C _ 85. DB push ebp ; 000E _ 55 push esi ; 000F _ 56 mov esi, eax ; 0010 _ 89. C6 push edi ; 0012 _ 57 jnz ?_020 ; 0013 _ 75, 0E pop edi ; 0015 _ 5F mov eax, ebx ; 0016 _ 8B. C3 mov edx, ecx ; 0018 _ 8B. D1 pop esi ; 001A _ 5E pop ebp ; 001B _ 5D pop ebx ; 001C _ 5B add esp, 36 ; 001D _ 83. C4, 24 ret 8 ; 0020 _ C2, 0008 ?_020: mov dword ptr [esp+3CH], ecx ; 0023 _ 89. 4C 24, 3C mov edx, dword ptr [esp+3CH] ; 0027 _ 8B. 54 24, 3C mov dword ptr [esp+38H], ebx ; 002B _ 89. 5C 24, 38 mov ebx, dword ptr [esp+38H] ; 002F _ 8B. 5C 24, 38 mov dword ptr [esp+14H], ebx ; 0033 _ 89. 5C 24, 14 mov dword ptr [esp+18H], edx ; 0037 _ 89. 54 24, 18 cmp dword ptr [esp+38H], 0 ; 003B _ 83. 7C 24, 38, 00 je ?_023 ; 0040 _ 0F 84, 00000088 ?_021: mov edx, dword ptr [esp+3CH] ; 0046 _ 8B. 54 24, 3C mov eax, dword ptr [esp+38H] ; 004A _ 8B. 44 24, 38 mov ebx, dword ptr [edx] ; 004E _ 8B. 1A push dword ptr [esi+0CH] ; 0050 _ FF. 76, 0C push dword ptr [esi+8H] ; 0053 _ FF. 76, 08 mov dword ptr [esp+18H], edx ; 0056 _ 89. 54 24, 18 call _D3std5array12__T4backTAiZ4backFNcAiZi ; 005A _ E8, 00000000(rel) cmp dword ptr [eax], ebx ; 005F _ 39. 18 jle ?_022 ; 0061 _ 7E, 31 mov edi, dword ptr [esp+38H] ; 0063 _ 8B. 7C 24, 38 mov ecx, dword ptr [esp+10H] ; 0067 _ 8B. 4C 24, 10 dec edi ; 006B _ 4F lea edx, [ecx+4H] ; 006C _ 8D. 51, 04 mov ebx, dword ptr [esp+14H] ; 006F _ 8B. 5C 24, 14 dec ebx ; 0073 _ 4B mov dword ptr [esp+3CH], edx ; 0074 _ 89. 54 24, 3C mov edx, dword ptr [esp+18H] ; 0078 _ 8B. 54 24, 18 mov eax, dword ptr [esp+14H] ; 007C _ 8B. 44 24, 14 mov dword ptr [esp+38H], edi ; 0080 _ 89. 7C 24, 38 add edx, 4 ; 0084 _ 83. C2, 04 mov dword ptr [esp+14H], ebx ; 0087 _ 89. 5C 24, 14 mov dword ptr [esp+18H], edx ; 008B _ 89. 54 24, 18 jmp ?_025 ; 008F _ E9, 000000AA ?_022: push dword ptr [esp+3CH] ; 0094 _ FF. 74 24, 3C push dword ptr [esp+3CH] ; 0098 _ FF. 74 24, 3C call _D3std5array12__T4backTAiZ4backFNcAiZi ; 009C _ E8, 00000000(rel) mov ebx, dword ptr [eax] ; 00A1 _ 8B. 18 push dword ptr [esi+0CH] ; 00A3 _ FF. 76, 0C push dword ptr [esi+8H] ; 00A6 _ FF. 76, 08 call _D3std5array12__T4backTAiZ4backFNcAiZi ; 00A9 _ E8, 00000000(rel) cmp dword ptr [eax], ebx ; 00AE _ 39. 18 jg ?_024 ; 00B0 _ 7F, 2E mov edi, dword ptr [esp+38H] ; 00B2 _ 8B. 7C 24, 38 dec edi ; 00B6 _ 4F mov eax, dword ptr [esp+38H] ; 00B7 _ 8B. 44 24, 38 mov dword ptr [esp+38H], edi ; 00BB _ 89. 7C 24, 38 mov edx, dword ptr [esp+3CH] ; 00BF _ 8B. 54 24, 3C mov dword ptr [esp+3CH], edx ; 00C3 _ 89. 54 24, 3C cmp dword ptr [esp+38H], 0 ; 00C7 _ 83. 7C 24, 38, 00 jnz ?_022 ; 00CC _ 75, C6 ?_023: mov edx, dword ptr [esp+18H] ; 00CE _ 8B. 54 24, 18 mov eax, dword ptr [esp+14H] ; 00D2 _ 8B. 44 24, 14 pop edi ; 00D6 _ 5F pop esi ; 00D7 _ 5E pop ebp ; 00D8 _ 5D pop ebx ; 00D9 _ 5B add esp, 36 ; 00DA _ 83. C4, 24 ret 8 ; 00DD _ C2, 0008 ?_024: push dword ptr [esp+3CH] ; 00E0 _ FF. 74 24, 3C push dword ptr [esp+3CH] ; 00E4 _ FF. 74 24, 3C call _D3std5array12__T4backTAiZ4backFNcAiZi ; 00E8 _ E8, 00000000(rel) mov edi, eax ; 00ED _ 89. C7 mov edx, dword ptr [esp+3CH] ; 00EF _ 8B. 54 24, 3C mov ebp, edx ; 00F3 _ 89. D5 mov eax, dword ptr [esp+38H] ; 00F5 _ 8B. 44 24, 38 ; Note: Zero displacement could be omitted mov ecx, dword ptr [edx] ; 00F9 _ 8B. 4A, 00 mov ebx, dword ptr [edi] ; 00FC _ 8B. 1F lea edx, [edx+4H] ; 00FE _ 8D. 52, 04 mov dword ptr [ebp], ebx ; 0101 _ 89. 5D, 00 mov eax, dword ptr [esp+38H] ; 0104 _ 8B. 44 24, 38 dec eax ; 0108 _ 48 mov dword ptr [edi], ecx ; 0109 _ 89. 0F mov ebx, dword ptr [esp+14H] ; 010B _ 8B. 5C 24, 14 mov ecx, dword ptr [esp+18H] ; 010F _ 8B. 4C 24, 18 mov dword ptr [esp+38H], eax ; 0113 _ 89. 44 24, 38 dec ebx ; 0117 _ 4B mov eax, dword ptr [esp+14H] ; 0118 _ 8B. 44 24, 14 mov dword ptr [esp+14H], ebx ; 011C _ 89. 5C 24, 14 add ecx, 4 ; 0120 _ 83. C1, 04 mov ebx, dword ptr [esp+38H] ; 0123 _ 8B. 5C 24, 38 mov dword ptr [esp+18H], ecx ; 0127 _ 89. 4C 24, 18 dec ebx ; 012B _ 4B mov eax, dword ptr [esp+38H] ; 012C _ 8B. 44 24, 38 mov dword ptr [esp+3CH], edx ; 0130 _ 89. 54 24, 3C mov ecx, edx ; 0134 _ 8B. CA mov dword ptr [esp+38H], ebx ; 0136 _ 89. 5C 24, 38 mov dword ptr [esp+3CH], ecx ; 013A _ 89. 4C 24, 3C ?_025: cmp dword ptr [esp+38H], 0 ; 013E _ 83. 7C 24, 38, 00 jne ?_021 ; 0143 _ 0F 85, FFFFFEFD ; Note: Immediate operand could be made smaller by sign extension jmp ?_023 ; 0149 _ E9, FFFFFF80 _D3std9algorithm129__T8sortImplS793std10functional54__T13binaryFunImplVAyaa5_61203c2062VAyaa1_610889BE3F182A6EA5575A2D298A20D100 ENDP On Fri, Jul 2, 2010 at 5:25 PM, David Simcha <dsimcha at gmail.com> wrote: > Sorry for the double paste. Please ignore everything before the second > time you see > _D3std9algorithm129__T8sortImplS793std10functional54__T13binaryFunImplVAyaa5_61203c2... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100702/11298817/attachment-0001.html> |
Copyright © 1999-2021 by the D Language Foundation