| Thread overview | |||||||||
|---|---|---|---|---|---|---|---|---|---|
|
December 25, 2013 dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Hello,
I am studying the difference between x86 generated code of DMD and C/C++ compilers on Windows (simply put: why exactly, and by what margin, DMD-compiled D code is often slower than GCC-compiled C/C++ equivalent).
Now, I have this simple D program:
-----
immutable int MAX_N = 1_000_000;
void main () {
int [MAX_N] a;
foreach (i; 0..MAX_N)
a[i] = i;
}
-----
(I know there's iota in std.range, and it turns out to be even slower - but that's a high level function, and I'm trying to understand the lower-level details now.)
The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) has the following piece corresponding to the cycle:
-----
L2C: mov -03D0900h[EDX*4][EBP],EDX
mov ECX,EDX
inc EDX
cmp EDX,0F4240h
jb L2C
-----
Now, I am not exactly fluent in assembler, but the "mov ECX, EDX" seems unnecessary. The ECX register is explicitly used three times in the whole program, and it looks like this instruction can at least be moved out of the loop, if not removed completely. Is it indeed a bug, or there's some reason here? And if the former, where do I report it - at http://d.puremagic.com/issues/, as with the front-end?
I didn't try GDC or LDC since I didn't find a clear instruction for using them under Win32. If there is one, please kindly point me to it. I found a few explanations for GDC, but had a hard time trying to figure out which is the most current one.
Note that the C++ version does the same with four instructions instead of five, as D version is expected to be if we remove the instruction in question. Indeed, it goes like (code inside the loop):
-----
L3:
movl %eax, _a(,%eax,4)
addl $1, %eax
cmpl $1000000, %eax
jne L3
-----
The full assembly listings, and the source codes (D and C++), are here:
http://acm.math.spbu.ru/~gassa/dlang/simple_loop/
I've tried a few other versions as well. Changing the loop to an explicit "for (int i = 0; i < MAX_N; i++)" (a2.d) does not affect the generated assembly. Making the array dynamic (a3.d) leads to five instructions, all seemingly important. A __gshared static array (a4.d) gives the same seemingly unneeded instruction but with EAX instead of ECX:
-----
L2: mov _D2a41aG1000000i[EDX*4],EDX
mov EAX,EDX
inc EDX
cmp EDX,0F4240h
jb L2
-----
Ivan Kazmenko.
| ||||
December 25, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | On Wednesday, 25 December 2013 at 12:03:08 UTC, Ivan Kazmenko wrote:
> Now, I am not exactly fluent in assembler, but the "mov ECX, EDX" seems unnecessary. The ECX register is explicitly used three times in the whole program, and it looks like this instruction can at least be moved out of the loop, if not removed completely.
> Is it indeed a bug, or there's some reason here? And if the former, where do I report it - at http://d.puremagic.com/issues/, as with the front-end?
Did you try something like:
for(immutable i; 0..MAX_N)
a[i] = i;
too? One thing to note is that, technically, i is a _copy_ of the iterated number. So things like
for(i; 0..5)
i++;
have no effect (it will loop 5 times regardless). Indeed, in your case, this could be optimized out, but in general the extra instruction is technically correct. I don't know if making i immutable would change things, but it might give the compiler enough of a hint to do the correct optimization here.
| |||
December 25, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Chris Cain | On Wednesday, 25 December 2013 at 12:43:05 UTC, Chris Cain wrote:
> Did you try something like:
>
> for(immutable i; 0..MAX_N)
> a[i] = i;
>
> too? One thing to note is that, technically, i is a _copy_ of the iterated number. So things like
>
> for(i; 0..5)
> i++;
>
> have no effect (it will loop 5 times regardless). Indeed, in your case, this could be optimized out, but in general the extra instruction is technically correct. I don't know if making i immutable would change things, but it might give the compiler enough of a hint to do the correct optimization here.
Thanks, that sounded reasonable. Still, in this particular case, the generated assembly remained the same.
| |||
December 25, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | Ivan Kazmenko:
> I am studying the difference between x86 generated code of DMD and C/C++ compilers on Windows (simply put: why exactly, and by what margin, DMD-compiled D code is often slower than GCC-compiled C/C++ equivalent).
>
> Now, I have this simple D program:
>
> -----
> immutable int MAX_N = 1_000_000;
> void main () {
> int [MAX_N] a;
> foreach (i; 0..MAX_N)
> a[i] = i;
> }
> -----
>
> (I know there's iota in std.range, and it turns out to be even slower - but that's a high level function, and I'm trying to understand the lower-level details now.)
>
> The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) has the following piece corresponding to the cycle:
>
> -----
> L2C: mov -03D0900h[EDX*4][EBP],EDX
> mov ECX,EDX
> inc EDX
> cmp EDX,0F4240h
> jb L2C
> -----
ldc2 optimizes the useless loop away:
__Dmain:
xorl %eax, %eax
ret
If I modify the code returning some value from the int main:
return a[7];
ldc2 gives the loop code:
LBB0_1:
movl %eax, 12(%esp,%eax,4)
incl %eax
cmpl $1000000, %eax
jne LBB0_1
If I use iota ldc2 copiles the loop to exactly the same asm:
foreach (i; MAX_N.iota)
Bye,
bearophile
| |||
December 26, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Wednesday, 25 December 2013 at 14:51:11 UTC, bearophile wrote: > ldc2 optimizes the useless loop away: > > __Dmain: > xorl %eax, %eax > ret > > If I modify the code returning some value from the int main: > return a[7]; > > ldc2 gives the loop code: > > LBB0_1: > movl %eax, 12(%esp,%eax,4) > incl %eax > cmpl $1000000, %eax > jne LBB0_1 > > If I use iota ldc2 copiles the loop to exactly the same asm: > > foreach (i; MAX_N.iota) Glad to know that! But what about DMD? Anyone?.. If someone with better knowledge in assembly confirms the instruction is unnecessary, I'll file a bug report (at http://d.puremagic.com/issues/ I presume). Ivan Kazmenko. | |||
December 26, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ivan Kazmenko | On 12/25/13, 20:03, Ivan Kazmenko wrote:
> -----
> L2C: mov -03D0900h[EDX*4][EBP],EDX
> mov ECX,EDX
> inc EDX
> cmp EDX,0F4240h
> jb L2C
> -----
You should have said that all instructions are redundant :) Looks like the array got optimized out, but then the optimizer stopped. The ECX likely refers to the 'i' loop variable. When the array write code got optimized out, the compile could have figured out that 'i' was in turn unused as well and remove it too. And then, the foreach, etc...
You can file backend bugs on the same site.
| |||
December 26, 2013 Re: dmd simple loop disassembly - redundant instruction? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Lionello Lunesu | On Thursday, 26 December 2013 at 08:08:09 UTC, Lionello Lunesu wrote: > You should have said that all instructions are redundant :) Looks like the array got optimized out, but then the optimizer stopped. The ECX likely refers to the 'i' loop variable. When the array write code got optimized out, the compile could have figured out that 'i' was in turn unused as well and remove it too. And then, the foreach, etc... I added the "return a[7];" part from bearophile's suggestion, but that did not change anything in the loop code. So, my guess is that the array does not get optimized out at all. > You can file backend bugs on the same site. Thanks, issue created! https://d.puremagic.com/issues/show_bug.cgi?id=11821 Ivan Kazmenko. | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply