Thread overview
Is the current D slow or Java fast?
Feb 07, 2003: Mike Wynn
Feb 07, 2003: Burton Radons
Feb 07, 2003: Mike Wynn
Feb 07, 2003: Mike Wynn
Feb 08, 2003: Nic Tiger
Feb 07, 2003: Walter
Feb 07, 2003: Mike Wynn
Feb 08, 2003: Walter
Feb 09, 2003: Walter
February 07, 2003
I've been testing some crypto code, a basic port of some Java crypto to D (the C versions are all macro'ed),

and I got some disturbing results.

PC used: Athlon 1.33 GHz, 512 MB of DDR266 RAM

java version "1.4.0_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_03-b04)
Java HotSpot(TM) Client VM (build 1.4.0_03-b04, mixed mode)
testing md5 expect a 3 to 10 minute delay
1M blocks - 24065ms 24s
1K blocks - 21381ms 21s
1G hashed in
1M blocks - 24065ms 24s
1K blocks - 21381ms 21s
100K hashed in
1B blocks - 16704ms 16s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 38215ms 38s
1K blocks - 38085ms 38s
1G hashed in
1M blocks - 38215ms 38s
1K blocks - 38085ms 38s
100K hashed in
1B blocks - 18757ms 18s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

compiled with dmd, no options
testing md5 expect a 3 to 10 minute delay
start =1044598753, tm=1044598753
1M blocks - 40s (0m)
1K blocks - 39s (0m)
1G hashed in
1M blocks - 40s (0m)
1K blocks - 39s (0m)
100K hashed in
1B blocks - 35s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
start =1044598867, tm=1044598867
1M blocks - 76s (1m)
1K blocks - 76s (1m)
1G hashed in
1M blocks - 76s (1m)
1K blocks - 76s (1m)
100K hashed in
1B blocks - 44s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

with dmd -release things are a little better
testing md5 expect a 3 to 10 minute delay
start =1044599115, tm=1044599115
1M blocks - 36s (0m)
1K blocks - 36s (0m)
1G hashed in
1M blocks - 36s (0m)
1K blocks - 36s (0m)
100K hashed in
1B blocks - 9s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
start =1044599196, tm=1044599196
1M blocks - 50s (0m)
1K blocks - 49s (0m)
1G hashed in
1M blocks - 50s (0m)
1K blocks - 49s (0m)
100K hashed in
1B blocks - 12s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

apart from the 100K hashed as single bytes, it is still slower than Java

a very odd thing happened when I recompiled and ran with the jdk 1.1.8
testing md5 expect a 3 to 10 minute delay
1M blocks - 14511ms 14s
1K blocks - 14370ms 14s
1G hashed in
1M blocks - 14511ms 14s
1K blocks - 14370ms 14s
100K hashed in
1B blocks - 7030ms 7s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 29593ms 29s
1K blocks - 26869ms 26s
1G hashed in
1M blocks - 29593ms 29s
1K blocks - 26869ms 26s
100K hashed in
1B blocks - 8041ms 8s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

the same class files run under Java 2

testing md5 expect a 3 to 10 minute delay
1M blocks - 24054ms 24s
1K blocks - 21301ms 21s
1G hashed in
1M blocks - 24054ms 24s
1K blocks - 21301ms 21s
100K hashed in
1B blocks - 16443ms 16s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 40878ms 40s
1K blocks - 38185ms 38s
1G hashed in
1M blocks - 40878ms 40s
1K blocks - 38185ms 38s
100K hashed in
1B blocks - 17916ms 17s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

I have to try out C# and C (I know I can read from disk and hash at over 45 MB/s with the C version of MD5 I have).

I've included the source, so can someone else please verify my findings, and ideally work out where the performance hit is :)
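
The attached source is not reproduced in this archive. As a rough sketch of the kind of measurement the output above suggests (a fixed amount of data hashed in blocks of different sizes), the following uses the JDK's built-in `java.security.MessageDigest` rather than the ported crypto classes from the post. The class name `BlockHashTimer`, its methods and the 64 MiB total are assumptions chosen for illustration, not the original harness.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class BlockHashTimer {
    /** Feeds totalBytes zero bytes to MD5 in chunks of at most blockSize bytes. */
    static byte[] hashInBlocks(long totalBytes, int blockSize) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] block = new byte[blockSize];
        for (long remaining = totalBytes; remaining > 0; ) {
            int n = (int) Math.min(remaining, (long) blockSize);
            md.update(block, 0, n); // one call per block
            remaining -= n;
        }
        return md.digest();
    }

    /** Wall-clock time in milliseconds for one hashInBlocks run. */
    static long timeMillis(long totalBytes, int blockSize) {
        long start = System.nanoTime();
        hashInBlocks(totalBytes, blockSize);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long total = 64L * 1024 * 1024; // assumption: 64 MiB, far less than the 1G in the post
        int[] blockSizes = {1024 * 1024, 1024, 1};
        for (int size : blockSizes) {
            System.out.println(size + "-byte blocks - " + timeMillis(total, size) + "ms");
        }
    }
}
```

The digest is the same whatever the block size, so any difference between the rows comes from per-call overhead rather than from the hash arithmetic.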





February 07, 2003
Mike Wynn wrote:
> I've been testing some crypto code, basic port of some Java crypto to D
> (the C versions are all macro'ed).
> 
> and I got some disturbing results

Uh, yeah.  You need to enable optimisations.  When I enabled -O the first test went from 36s to 21s; when I enabled -inline as well it went from 21s to 14s.

February 07, 2003
well, it's as good as C# using dmd -release
(C# basic edition, which hasn't got the full optimiser etc.)
(I've only ported MD5 so far)
testing MD5 expect a 3 to 10 minute delay

start =10786002

1M blocks - 35670ms (35s)
1K blocks - 33165ms (33s)
1G hashed in
1M blocks - 35670ms (35s)
1K blocks - 33165ms (33m)

100K hashed in
1B blocks - 7130ms (7m)

I never noticed -inline or -O; yes, that helps, much better results
-release -inline
testing md5 expect a 3 to 10 minute delay
start =1044643391, tm=1044643391
1M blocks - 20s (0m)
1K blocks - 19s (0m)
1G hashed in
1M blocks - 20s (0m)
1K blocks - 19s (0m)
100K hashed in
1B blocks - 8s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

-release -inline -O

testing md5 expect a 3 to 10 minute delay
start =1044644195, tm=1044644195
1M blocks - 14s (0m)
1K blocks - 14s (0m)
1G hashed in
1M blocks - 14s (0m)
1K blocks - 14s (0m)
100K hashed in
1B blocks - 8s (0m)

now that's a lot more acceptable, and much closer to what I expected.
(not tried just -O)
looks like we're running on similar hardware
time to find out what a C version can do.

Mike.


February 07, 2003
D is actually faster than dmc :)

similar MD5 test (C version)

compiled with `dmc`
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 27s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 24s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 83s (block:1 [0k], count:1073741824)

compiled with `dmc -o+speed`
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 17s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 14s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 60s (block:1 [0k], count:1073741824)

gcc --version    -> 2.95.3-6  (mingw)
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 30s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 27s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 79s (block:1 [0k], count:1073741824)

gcc -O3
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 9s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 52s (block:1 [0k], count:1073741824)


compiled with VC++6 release
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 55s (block:1 [0k], count:1073741824)

and I was interested to find gcc -O3 is the same or faster than VC++

has anyone tried to port the Linux D gcc front end to mingw? (I've never managed to get gcc to build, so I'm not even going to start to try.)




February 07, 2003
"Mike Wynn" <mike.wynn@l8night.co.uk> wrote in message news:b210qd$fqe$1@digitaldaemon.com...
> looks like we're running on similar hardware
> time to find out what a C version can do.

When comparing to C, use DMC++. The reason is that DMD and DMC++ share the optimizer and back end code generator, so you really are comparing the languages rather than the back ends.


February 07, 2003
I used the dmc.exe that comes with the D alpha.
do you mean use dmc -cpp?
and is -o+speed the right option to get the fastest code?

the C version is a bit of a devil's advocate version really, and realistically I would expect any OO version to always be a bit slower hashing big blocks (overhead of virtual methods, and the code reuse), and by any amount when hashing 1-byte entities. it just shows how efficient the virtual call code is.

I think I started off trying to compare langs, and ended up comparing backends :)
with D, Java and C# the differences in the code are subtle. I've got to try a Java version that uses an interface, as Sun's VM used to be very poor with interface methods; a C# version that uses COM interfaces, unless it does anyway; and a C# version with `unsafe` code, as the D version currently uses pointers.
which means I should write a pointer-free D version, along with a D version that uses interfaces too.

I think what was yesterday a random query about performance will be converted into a real utility to test languages and backends, and methods of optimising for lang X impl Y.

at first I was very concerned that D was unexpectedly slow; it is not, I was doing the wrong things. however it has reasserted my faith in dynamic compilers, and that compilation speed is irrelevant: gcc, which is dog slow, comes out top.

so the gauntlet has been put down for the gcc front end coders to get D to compile to code at least as fast as the equivalent C++.

however, performance of such a dedicated app is almost irrelevant. jview (the MS Java) performs about the same as dmd with no options, yet on a real-world app it outperforms jdk 1.1.8 (or at least its GUI response is better).
likewise dmd outperforms the JDK on the single-byte-at-a-time hash, where the code path is basically a long chain of calls, branches and reads rather than the maths-intense md code, and that is IMHO a more important benchmark as it reflects the types of ops that a real useful app would be doing.
the 1M hash speed is a test of how good the optimiser for maths code and the register allocators are.
I've got a couple of other implementations and will have a play with seeing if the changes to code make any changes to performance.
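
To make the "long chain of calls, branches, and reads" concrete, here is a small sketch of how an OO digest base class typically buffers input into 64-byte blocks. It is not the code from the attached source: `BlockHasher`, `ToyHasher`, `CallOverheadDemo` and the call counters are hypothetical names added for illustration, and the toy mixing function stands in for the real MD5 or SHA block transform.

```java
abstract class BlockHasher {
    private final byte[] buffer = new byte[64];
    private int used;
    long updateCalls; // instrumentation only: how many update() calls were made
    long blockCalls;  // instrumentation only: how many 64-byte blocks were processed

    /** The per-block transform; a virtual call into the concrete algorithm. */
    protected abstract void processBlock(byte[] data, int offset);

    /** Single-byte path: one call, one store and one branch per input byte. */
    void update(byte b) {
        updateCalls++;
        buffer[used++] = b;
        flushIfFull();
    }

    /** Bulk path: whole blocks are processed straight from the caller's array. */
    void update(byte[] data, int offset, int length) {
        updateCalls++;
        while (length > 0) {
            if (used == 0 && length >= 64) {
                processBlock(data, offset);
                blockCalls++;
                offset += 64;
                length -= 64;
            } else {
                int n = Math.min(64 - used, length);
                System.arraycopy(data, offset, buffer, used, n);
                used += n;
                offset += n;
                length -= n;
                flushIfFull();
            }
        }
    }

    private void flushIfFull() {
        if (used == 64) {
            processBlock(buffer, 0);
            blockCalls++;
            used = 0;
        }
    }
}

/** Toy stand-in for a real digest: rotate-and-xor over each block. */
final class ToyHasher extends BlockHasher {
    int state;

    @Override
    protected void processBlock(byte[] data, int offset) {
        for (int i = 0; i < 64; i++) {
            state = Integer.rotateLeft(state, 1) ^ (data[offset + i] & 0xff);
        }
    }
}

final class CallOverheadDemo {
    static byte[] sample(int n) {
        byte[] data = new byte[n];
        for (int i = 0; i < n; i++) data[i] = (byte) i;
        return data;
    }

    static ToyHasher feedBulk(byte[] data) {
        ToyHasher h = new ToyHasher();
        h.update(data, 0, data.length);
        return h;
    }

    static ToyHasher feedPerByte(byte[] data) {
        ToyHasher h = new ToyHasher();
        for (byte b : data) h.update(b);
        return h;
    }

    public static void main(String[] args) {
        byte[] data = sample(1024);
        ToyHasher bulk = feedBulk(data);
        ToyHasher perByte = feedPerByte(data);
        System.out.println("bulk: " + bulk.updateCalls + " update calls, " + bulk.blockCalls + " blocks");
        System.out.println("per byte: " + perByte.updateCalls + " update calls, " + perByte.blockCalls + " blocks");
    }
}
```

Both paths run the block transform the same number of times and end in the same state. What differs is the number of `update` calls, which is why the 1-byte test measures call and branch overhead while the 1M test measures the arithmetic.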


Mike.



February 08, 2003
In reality, you shouldn't see much difference in that code between C/C++/D/Java/C#. The reason is it is integer math intensive, and does not do much with objects, strings, etc. Those languages all treat integer math about the same.
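
As an illustration of the integer-only arithmetic involved, this is the round-1 step of MD5 as defined in RFC 1321, `a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s)`, written in Java. The class and method names are made up for this sketch; the thread's own implementations are not shown here.

```java
final class Md5Step {
    /** RFC 1321 auxiliary function F: bitwise select of c or d under mask b. */
    static int f(int b, int c, int d) {
        return (b & c) | (~b & d);
    }

    /**
     * One round-1 step: 32-bit adds (wrapping on overflow), a bitwise
     * select and a left rotate. x is a message word, t a table constant,
     * s the rotate amount.
     */
    static int step(int a, int b, int c, int d, int x, int t, int s) {
        return b + Integer.rotateLeft(a + f(b, c, d) + x + t, s);
    }

    public static void main(String[] args) {
        // Arbitrary inputs, just to show the call shape.
        System.out.println(Integer.toHexString(step(1, 0, 0, 0, 0, 0, 7)));
    }
}
```

The same few operations map almost one to one onto C, C++, D and C#, which is why the generated code, and not the language, tends to decide the result.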


February 08, 2003
If MD5 uses floating point math, you should specify the -ff switch. It gives a real speed gain.

Nic Tiger.


February 09, 2003
"Mike Wynn" <mike.wynn@l8night.co.uk> wrote in message news:b1vlms$2o31$1@digitaldaemon.com...
> I've been testing some crypto code, basic port of some Java crypto to D (the C versions are all macro'ed).

Could you post/email the C version please, just so I'm using the same code? Thanks!