Thread overview
-noboundscheck
Aug 19, 2012
Nvirjskly
Aug 19, 2012
bearophile
Aug 19, 2012
Jonathan M Davis
Aug 19, 2012
Nvirjskly
Aug 19, 2012
Jonathan M Davis
Aug 19, 2012
1100110
Aug 19, 2012
Nvirjskly
Aug 19, 2012
1100110
Aug 19, 2012
1100110
Aug 20, 2012
Nvirjskly
Aug 20, 2012
1100110
Aug 20, 2012
Nvirjskly
Aug 20, 2012
Nvirjskly
Aug 20, 2012
bearophile
August 19, 2012
Compiling my code with the -noboundscheck flag sped it up by almost 5 times (whilst passing all tests and working exactly the same way). Is bounds checking really that expensive, and what other simple optimisations can I perform other than -inline -O -noboundscheck?
August 19, 2012
Nvirjskly:

> is bounds checking really that expensive,

The D front-end is very dumb about this; as far as I know, it makes no attempt to remove those tests where they can't fail. Walter believes such optimizations don't gain much.
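
To make the point concrete, here is a minimal sketch of three ways to sum an array. The function names are made up for illustration. The indexed loop is the kind of code where each `a[i]` carries a check unless the build disables it, the `foreach` form has no user-written index to check, and the `.ptr` form sidesteps the check for that one loop only (at the cost of safety).

```d
import std.stdio : writeln;

// Indexed access: every a[i] is bounds-checked in a default build,
// even though i < a.length already guarantees it cannot fail.
uint sumIndexed(const(uint)[] a)
{
    uint total = 0;
    for (size_t i = 0; i < a.length; ++i)
        total += a[i];
    return total;
}

// foreach over the slice: no index expression is written by hand,
// so there is nothing for the compiler to check per element.
uint sumForeach(const(uint)[] a)
{
    uint total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// Raw pointer indexing is never bounds-checked; correctness is on the author.
uint sumPointer(const(uint)[] a)
{
    uint total = 0;
    const(uint)* p = a.ptr;
    foreach (i; 0 .. a.length)
        total += p[i];
    return total;
}

void main()
{
    uint[] data = [1, 2, 3, 4];
    writeln(sumIndexed(data), " ", sumForeach(data), " ", sumPointer(data));
}
```

All three return the same sum; only the amount of checking differs, so timing them against each other is one way to see how much of a hot loop's cost is the check itself.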


> what other simple optimisations can I perform other than -inline -O -noboundscheck?

Compiler options change across different compilers. What compiler are you using?

Bye,
bearophile
August 19, 2012
On Sunday, August 19, 2012 21:29:38 Nvirjskly wrote:
> Compiling my code with the -noboundscheck flag sped it up by almost 5 times (whilst passing all tests and working exactly the same way). Is bounds checking really that expensive, and what other simple optimisations can I perform other than -inline -O -noboundscheck?

It would depend entirely on your code. In most cases, I wouldn't expect to see a speed up anywhere near that large. But if you're constantly accessing arrays and doing little other computation, then maybe you do. I have no idea what your code is doing.
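
A rough way to check whether a program falls into that "constant array access, little other computation" category is to time such a loop with and without -noboundscheck. The sketch below is only an assumption about what such a test could look like (it is not taken from the code under discussion), and it uses the current `std.datetime.stopwatch` module rather than the 2012 API.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// A loop that does almost nothing except index into an array,
// so any per-access bounds check is a large share of the work.
ulong byteSum(const(ubyte)[] data, int rounds)
{
    ulong acc = 0;
    foreach (r; 0 .. rounds)
        for (size_t i = 0; i < data.length; ++i)
            acc += data[i];
    return acc;
}

void main()
{
    // 1 MiB of test data filled with a repeating 0..255 pattern.
    auto data = new ubyte[](1 << 20);
    foreach (i, ref b; data)
        b = cast(ubyte) i;

    auto sw = StopWatch(AutoStart.yes);
    immutable acc = byteSum(data, 200);
    sw.stop();

    writefln("checksum %s in %s ms", acc, sw.peek.total!"msecs");
}
```

Building this once with `dmd -O -release -inline` and once with `-noboundscheck` added, then comparing the two timings, gives a crude upper bound on what the flag can buy for array-bound code.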

dmd's optimizer isn't the best anyway. It compiles much faster than gdc and ldc do, but it usually generates slower code (the focus on dmd has generally been getting everything working correctly rather than optimizing everything to death, though that should change with time). Whatever the situation with your code is, I'd expect that the situation with its optimizations would change quite a bit with one of the other D compilers.

- Jonathan M Davis
August 19, 2012
On Sunday, 19 August 2012 at 20:07:32 UTC, Jonathan M Davis wrote:
> On Sunday, August 19, 2012 21:29:38 Nvirjskly wrote:
>> Compiling my code with the -noboundscheck flag sped it up by
>> almost 5 times (whilst passing all tests and working exactly the
>> same way). Is bounds checking really that expensive, and what
>> other simple optimisations can I perform other than -inline -O
>> -noboundscheck?
>
> It would depend entirely on your code. In most cases, I wouldn't expect to see
> a speed up anywhere near that large. But if you're constantly accessing arrays
> and doing little other computation, then maybe you do. I have no idea what
> your code is doing.
>

I am using dmd.

Yes, my code is extremely array-heavy with many array-based computations (shameless plug: https://github.com/Nvirjskly/cryptod)

> dmd's optimizer isn't the best anyway. It compiles much faster than gdc and
> ldc do, but it usually generates slower code (the focus on dmd has generally
> been getting everything working correctly rather than optimizing everything to
> death, though that should change with time). Whatever the situation with your
> code is, I'd expect that the situation with its optimizations would
> change quite a bit with one of the other D compilers.
>
> - Jonathan M Davis

Ah, that makes a lot of sense. If my goal is a fast running time would it then make sense to use another compiler? I heard that gdc development is lagging behind and that ldc might not even support D2 all that well?

August 19, 2012
On Sunday, August 19, 2012 22:13:15 Nvirjskly wrote:
> Ah, that makes a lot of sense. If my goal is a fast running time would it then make sense to use another compiler? I heard that gdc development is lagging behind and that ldc might not even support D2 all that well?

Both gdc and ldc support D2, though sometimes they're a release behind (especially right after a new dmd release). I don't remember the sites for them, but I do recall that one or both of them have had issues where their old site is generally the first one that you find, so it looks like they don't support D2. But that's an issue with Google's search results, not the compilers themselves.

But if you want your code to be as fast as possible, then use either gdc or ldc, though I don't know which is better (it probably depends on your code).

- Jonathan M Davis
August 19, 2012
I have gdc, dmd, and ldc installed on my computer.

I also forked your repo two minutes before reading this.


Tell me what you want, and I'll run whatever tests you want.
But in return, I'm stealing your whirlpool (with attribution, of course).
August 19, 2012
On Sunday, 19 August 2012 at 21:11:13 UTC, 1100110 wrote:
> I have gdc, dmd, and ldc installed on my computer.
>
> I also forked your repo two minutes before reading this.
>
>
> Tell me what you want, and I'll run whatever tests you want.
> But in return, I'm stealing your whirlpool (with attribution, of course).

Haha, I actually do not have whirlpool implemented yet (it's an empty file), but since you seem to want it, it's right at the top of my TODO list (if I'm lucky I'll get it done by the end of today, but my best bet is this time tomorrow; I already have the spec open).

benchmark.d contains a main function that runs some rudimentary benchmarks if you want to compile it with that...

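import std.process, std.stdio, std.file, std.path;
void main()
{
    // Collect every file under src/ into one space-separated argument list.
    string files = "";
    foreach (string name; dirEntries("src", SpanMode.breadth))
    {
        if (name.isFile())
            files ~= name ~ " ";
    }
    // Build everything together with benchmark.d: optimisations on, bounds checks off.
    string command = "dmd " ~ files ~ "benchmark.d -ofcryptod -noboundscheck -O -release -inline";
    // shell() runs the command and returns its output
    // (later Phobos releases dropped shell() in favour of executeShell()).
    writeln(shell(command));
}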

should compile that with dmd, I'm not sure about ldc or gdc and their compiler options, but it should be something similar...
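
For the other compilers, something along these lines might work. This is only a sketch: the `buildCommand` helper and the file names in `main` are hypothetical, the ldmd2 and gdc flag sets are the ones reported further down in this thread, and `-o` as gdc's output switch is the usual GCC convention.

```d
import std.array : join;
import std.stdio : writeln;

// Returns a build command line for the given compiler driver.
// Flag sets mirror the ones quoted in this thread; adjust to taste.
string buildCommand(string compiler, string[] sources)
{
    string srcs = sources.join(" ");
    switch (compiler)
    {
        case "dmd":
            return "dmd " ~ srcs ~ " -ofcryptod -noboundscheck -O -release -inline";
        case "ldmd2":
            // ldmd2 is ldc's dmd-compatible driver, so dmd-style flags apply.
            return "ldmd2 " ~ srcs ~ " -ofcryptod -O -release -noboundscheck";
        case "gdc":
            // gdc follows GCC conventions: -o for output, -f... for options.
            return "gdc " ~ srcs ~ " -o cryptod -O3 -frelease -fno-bounds-check -finline";
        default:
            throw new Exception("unknown compiler: " ~ compiler);
    }
}

void main()
{
    // Hypothetical file list; in practice it would come from dirEntries("src", ...).
    auto sources = ["src/example.d", "benchmark.d"];
    foreach (compiler; ["dmd", "ldmd2", "gdc"])
        writeln(buildCommand(compiler, sources));
}
```

The commands are only printed here, so the sketch can be run without any of the compilers installed.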
August 19, 2012
Yeah, I figured it out.  I did have to rename src though...

I ran a few tests; so far they're inconclusive, with no serious difference.
gdc is now compiling with -O3 -march=native -frelease -fno-bounds-check -finline -ffast-math.

But no, dmd has the shortest compile times, gdmd the longest.
I'm timing everything right now.

...My laptop is getting hot...

I want to see how bad it crashes.  =P
On Sun, 19 Aug 2012 17:17:02 -0500, Nvirjskly <nvirjskly@gmail.com> wrote:

> On Sunday, 19 August 2012 at 21:11:13 UTC, 1100110 wrote:
>> I have gdc, dmd, and ldc installed on my computer.
>>
>> I also forked your repo two minutes before reading this.
>>
>>
>> Tell me what you want, and I'll run whatever tests you want.
>> But in return, I'm stealing your whirlpool (with attribution, of course).
>
> Haha, I actually do not have whirlpool implemented yet (it's an empty file), but since you seem to want it, it's right at the top of my TODO list (if I'm lucky I'll get it done by the end of today, but my best bet is this time tomorrow; I already have the spec open).
>
> benchmark.d contains a main function that runs some rudimentary benchmarks if you want to compile it with that...
>
> import std.process, std.stdio, std.file, std.path;
> void main()
> {
>   string files = "";
>   foreach (string name; dirEntries("src", SpanMode.breadth))
>   {
>    if(name.isFile())
>    files ~= name ~ " ";
>   }
>   string command = "dmd " ~ files ~ "benchmark.d -ofcryptod -noboundscheck -O -release -inline";
>   writeln(shell(command));
> }
>
> should compile that with dmd, I'm not sure about ldc or gdc and their compiler options, but it should be something similar...


August 19, 2012
Here are my results! IIRC -release implies -noboundscheck.
Also I am on x64, and these files only compile to 32bit. So there could be
performance missing there.

rdmd --force -I../ -m32 -O -inline -release benchmark.d
26.00s user 0.23s system 99% cpu 26.386 total
---
2048 md2 in 1003 milliseconds: 15.9521 Mib/s
32768 md4 in 682 milliseconds: 375.367 Mib/s
32768 md5 in 426 milliseconds: 600.939 Mib/s
8192 ripemd160 in 779 milliseconds: 82.1566 Mib/s
4096 sha1 in 276 milliseconds: 115.942 Mib/s
16777216 ints generated by mersenne twister in 1146 milliseconds: 446.771 Mib/s
256 ints generated by BlumBlumShub in 812 milliseconds: 0.00962131 Mib/s
1048576 texts blowfish encrypted in 645 milliseconds: 99.2248 Mib/s
65536 texts threefish encrypted in 2774 milliseconds: 5.76784 Mib/s
131072 texts AES128 encrypted in 896 milliseconds: 17.8571 Mib/s
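
A note on units: the Mib/s column looks like mebibits per second. The Mersenne twister line produces 16777216 ints, which at 32 bits each is 512 Mibit, and 512 / 1.146 s matches the reported 446.771. The same holds for AES128 with 128-bit blocks. The item sizes are my inference from the numbers, not something read from the benchmark source, so treat this as a sketch of the arithmetic only.

```d
import std.stdio : writefln;

// Throughput in mebibits per second (1 Mib = 2^20 bits).
// bitsPerItem is an assumption about what one "item" in the log is.
double mibPerSec(ulong count, ulong bitsPerItem, ulong millis)
{
    immutable double mebibits = cast(double)(count * bitsPerItem) / (1 << 20);
    return mebibits / (millis / 1000.0);
}

void main()
{
    // Counts and timings are copied from the results above.
    writefln("mersenne twister: %.3f Mib/s", mibPerSec(16_777_216, 32, 1146));
    writefln("AES128:           %.3f Mib/s", mibPerSec(131_072, 128, 896));
}
```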

rdmd --force -I../ -m32 benchmark.d
16.79s user 0.19s system 99% cpu 17.048 total
---
2048 md2 in 1546 milliseconds: 10.3493 Mib/s
32768 md4 in 1240 milliseconds: 206.452 Mib/s
32768 md5 in 1558 milliseconds: 164.313 Mib/s
8192 ripemd160 in 1535 milliseconds: 41.6938 Mib/s
4096 sha1 in 616 milliseconds: 51.9481 Mib/s
16777216 ints generated by mersenne twister in 1510 milliseconds: 339.073 Mib/s
256 ints generated by BlumBlumShub in 816 milliseconds: 0.00957414 Mib/s
1048576 texts blowfish encrypted in 1094 milliseconds: 58.5009 Mib/s
65536 texts threefish encrypted in 3316 milliseconds: 4.82509 Mib/s
131072 texts AES128 encrypted in 1945 milliseconds: 8.22622 Mib/s
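
To compare the two dmd -m32 runs above (with and without -O -inline -release), the per-benchmark speedup is just the ratio of the two timings. A small sketch of that arithmetic, with the millisecond values copied from the two result blocks; the `Row` struct and `speedup` helper are names invented for this example.

```d
import std.stdio : writefln;

// One benchmark line: time without flags and time with -O -inline -release.
struct Row
{
    string name;
    double plainMs;
    double optMs;
}

// How many times faster the optimised build ran.
double speedup(double plainMs, double optMs)
{
    return plainMs / optMs;
}

void main()
{
    auto rows = [
        Row("md2", 1546, 1003),
        Row("md4", 1240, 682),
        Row("md5", 1558, 426),
        Row("ripemd160", 1535, 779),
        Row("sha1", 616, 276),
        Row("mersenne", 1510, 1146),
        Row("bbs", 816, 812),
        Row("blowfish", 1094, 645),
        Row("threefish", 3316, 2774),
        Row("aes128", 1945, 896),
    ];
    foreach (row; rows)
        writefln("%-10s %.2fx", row.name, speedup(row.plainMs, row.optMs));
}
```

The ratios vary a lot between algorithms, which fits the earlier remark that the gain depends entirely on what the code is doing.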


(ldc && gdc REALLY hate building 32bit code...)


rdmd --compiler=ldmd2 --force -I../ -m32 -O -release -noboundscheck benchmark.d
2048 md2 in 570 milliseconds: 28.0702 Mib/s
32768 md4 in 765 milliseconds: 334.641 Mib/s
32768 md5 in 840 milliseconds: 304.762 Mib/s
8192 ripemd160 in 571 milliseconds: 112.084 Mib/s
4096 sha1 in 263 milliseconds: 121.673 Mib/s
16777216 ints generated by mersenne twister in 747 milliseconds: 685.408 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure

real 0m8.957s
user 0m8.499s
sys 0m0.387s


rdmd --compiler=ldmd2 --force -I../ -m32 benchmark.d
2048 md2 in 2680 milliseconds: 5.97015 Mib/s
32768 md4 in 2088 milliseconds: 122.605 Mib/s
32768 md5 in 2465 milliseconds: 103.854 Mib/s
8192 ripemd160 in 2051 milliseconds: 31.2043 Mib/s
4096 sha1 in 742 milliseconds: 43.1267 Mib/s
16777216 ints generated by mersenne twister in 1580 milliseconds: 324.051 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure

real 0m14.722s
user 0m14.412s
sys 0m0.230s

I think gdc died...
binary /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/cc1d
version v2.059
parse benchmark
importall benchmark
import import import import import import import import import import
import import import impo
rt import import import import import import import import import import
import import import
import import import import import import import import import import
import import import im
port import import import import import import import import import import
import import import
import import import import import import import import import import
import import import
import import import import import import import import import import
import import import impo
rt import import import import import import import import import import
import import import
import import import import import import import import import import
import semantic benchmark
import import semantic2 benchmark
semantic3 benchmark
import import code benchmark
/usr/bin/ld: cannot find -lgphobos2
collect2: error: ld returned 1 exit status

real 0m15.950s
user 0m15.629s
sys 0m0.190s


I managed to force dmd and (partial) ldc builds for -m64
rdmd --force -O -m64 -release -noboundscheck -I../ benchmark.d
14.29s user 0.19s system 99% cpu 14.553 total
2048 md2 in 1026 milliseconds: 15.5945 Mib/s
32768 md4 in 737 milliseconds: 347.354 Mib/s
32768 md5 in 1078 milliseconds: 237.477 Mib/s
8192 ripemd160 in 922 milliseconds: 69.4143 Mib/s
4096 sha1 in 309 milliseconds: 103.56 Mib/s
16777216 ints generated by mersenne twister in 1079 milliseconds: 474.513 Mib/s
256 ints generated by BlumBlumShub in 3661 milliseconds: 0.00213398 Mib/s
1048576 texts blowfish encrypted in 593 milliseconds: 107.926 Mib/s
65536 texts threefish encrypted in 2376 milliseconds: 6.73401 Mib/s
131072 texts AES128 encrypted in 874 milliseconds: 18.3066 Mib/s

2048 md2 in 587 milliseconds: 27.2572 Mib/s
32768 md4 in 675 milliseconds: 379.259 Mib/s
32768 md5 in 752 milliseconds: 340.426 Mib/s
8192 ripemd160 in 539 milliseconds: 118.738 Mib/s
4096 sha1 in 236 milliseconds: 135.593 Mib/s
16777216 ints generated by mersenne twister in 684 milliseconds: 748.538 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure


dmd -O -release -m64 -noboundscheck
2048 md2 in 1079 milliseconds: 14.8285 Mib/s
32768 md4 in 804 milliseconds: 318.408 Mib/s
32768 md5 in 1042 milliseconds: 245.681 Mib/s
8192 ripemd160 in 972 milliseconds: 65.8436 Mib/s
4096 sha1 in 324 milliseconds: 98.7654 Mib/s
16777216 ints generated by mersenne twister in 1072 milliseconds: 477.612 Mib/s
256 ints generated by BlumBlumShub in 3611 milliseconds: 0.00216353 Mib/s
1048576 texts blowfish encrypted in 581 milliseconds: 110.155 Mib/s
65536 texts threefish encrypted in 2456 milliseconds: 6.51466 Mib/s
131072 texts AES128 encrypted in 878 milliseconds: 18.2232 Mib/s


Please hold while gdc is being recompiled....

August 20, 2012
On Sunday, 19 August 2012 at 23:48:36 UTC, 1100110 wrote:
> Here are my results! IIRC -release implies -noboundscheck.
> Also I am on x64, and these files only compile to 32bit. So there could be
> performance missing there.

Wow, thanks. It looks like ldc2 does not play nice with std.bigint, which is all the more reason for me to use my own version. If you want to see it run and not assert out, remove benchmark_bbs(); from main() in benchmark.d

std.bigint seems to have a lot of problems as I had to repeatedly mess around with things that SHOULD work. I think I should file a few bug reports :/

I think GDC is dying because I have scope imports scattered everywhere and it might not play nice with those... bah.

So it looks like ldc2 produces somewhat faster code, except that it does not play nice with std.bigint, and gdc does not follow the reference compiler in its support of scope imports... :/

So basically my code is dmd only atm and can be easily converted to support ldc2, and maybe gdc if scope imports are the only problem...
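
For anyone unfamiliar with the term, a scope (scoped) import is an import placed inside a function body instead of at the top of the module. A minimal sketch follows; the `hexByte` function is a made-up example, not code from cryptod. If gdc really does trip over these, hoisting the import to module level, as shown in the comment, would be the obvious workaround to try.

```d
import std.stdio : writeln; // ordinary module-level import

// Formats one byte as two lowercase hex digits.
string hexByte(ubyte b)
{
    // Scoped import: std.format is visible only inside this function.
    // Workaround to try for a compiler that rejects it: move this line
    // to the top of the module, next to the std.stdio import.
    import std.format : format;
    return format("%02x", b);
}

void main()
{
    writeln(hexByte(255), hexByte(10));
}
```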

On the topic of Whirlpool, I'm almost done with a naive, non-optimised version and just need to make the S-box mixin.