August 09, 2008
bearophile wrote:
> First benchmark, just D against itself, not used GCC yet, the results
> show that vector ops are generally slower, but maybe there's some
> bug/problem in my benchmark (note it needs just Phobos!), not tested
> on Linux yet:
[...]
> a3[] = a1[] / a2[];

I wouldn't be a bit surprised at that since / for int[]s does not have a custom asm routine for it. See phobos/internal/arrayint.d

If someone wants to write one, I'll put it in!
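For reference, here is a minimal C sketch of what the missing routine computes (the function name is made up for illustration; the real Phobos routines in phobos/internal/arrayint.d follow their own naming scheme). Worth noting: SSE2 has no packed integer-divide instruction, so even a hand-written version would likely stay scalar:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of element-wise a[] = b[] / c[] for int arrays.
 * Illustrative name, not the actual Phobos symbol. SSE2 offers
 * no packed integer divide, so a scalar loop is the natural
 * implementation even in a hand-tuned routine. */
void int_slice_div(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] / c[i];
}
```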
August 09, 2008
bearophile wrote:
> This output looks like a bug of the compiler anyway: [1,2,3,0,0,0,0]

Please post all bugs to Bugzilla! Thanks.
August 09, 2008
Walter Bright:
> I wouldn't be a bit surprised at that since / for int[]s does not have a custom asm routine for it.

I didn't know that. We may want to compile a list of such things.
But as you can see, I benchmarked with +, *, and /, not just /.

It's very easy to write wrong benchmarks, so I am careful, but from the little I have seen so far the speedups are absent or below 1x (a slowdown). And I haven't yet seen SSE2 asm in my compiled programs :-)


>>Is it able to compute a+b+c with a single loop (as all Fortran compilers do)?<<

>Yes.<

But later, on Reddit, Walter's answer was:

>This optimization is called "loop fusion", and is well known. It doesn't always result in a speedup, though. The dmd compiler doesn't do it, but that is not the fault of D.<

On a closer look the two questions are different; I think he meant:
a += b + c; => single loop
a += b; a += c; => two loops
I think this is acceptable.
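The distinction above can be sketched in C (hypothetical helper names; D's array-op notation would correspond to the fused form):

```c
#include <assert.h>
#include <stddef.h>

/* Fused: one pass over the data; a[] is read and written once. */
void add_fused(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] + c[i];
}

/* Unfused: two passes, so a[] is traversed twice and the
 * intermediate result makes an extra round trip through memory. */
void add_unfused(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) a[i] += b[i];
    for (size_t i = 0; i < n; i++) a[i] += c[i];
}
```

Both produce identical results; fusion only changes how many times the arrays stream through the cache.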

Bye,
bearophile
August 09, 2008
Walter Bright wrote:

> Lars Ivar Igesund wrote:
>> Jarrett Billingsley wrote:
>> 
>>> "Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:g7ias2$2kbo$1@digitalmars.com...
>>>> Walter Bright wrote:
>>>>
>>>>> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
>>>>>
>>>>> http://www.digitalmars.com/d/1.0/changelog.html http://ftp.digitalmars.com/dmd.1.034.zip
>>>> The array op docs aren't actually on the 1.0 array page. But great! I remember trying to use these 3 years ago :D
>>> Too bad Tango doesn't support them yet.  :C
>> 
>> Are you suggesting that Walter should have told us that he was implementing this feature ahead of releasing 1.034?
> 
> All Tango needs to do is copy the internal\array*.d files over and add them to the makefile.

I know :) No malice intended.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
August 09, 2008
== Quote from Walter Bright (newshound1@digitalmars.com)'s article
> Lars Ivar Igesund wrote:
> > Jarrett Billingsley wrote:
> >
> >> "Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:g7ias2$2kbo$1@digitalmars.com...
> >>> Walter Bright wrote:
> >>>
> >>>> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
> >>>>
> >>>> http://www.digitalmars.com/d/1.0/changelog.html http://ftp.digitalmars.com/dmd.1.034.zip
> >>> The array op docs aren't actually on the 1.0 array page. But great! I remember trying to use these 3 years ago :D
> >> Too bad Tango doesn't support them yet.  :C
> >
> > Are you suggesting that Walter should have told us that he was implementing this feature ahead of releasing 1.034?
> All Tango needs to do is copy the internal\array*.d files over and add them to the makefile.

I took care of this when 1.033 was released, since the files were first included then.  There are likely updates in 1.034, but nothing a few minutes with my merge tool can't handle.  Sadly, that particular merge tool is a bit broken at the moment, but I'll see about taking care of this anyway.


Sean
August 10, 2008
Very exciting stuff!  Keep up the good work.

Currently it only optimizes int and float.  I assume you could get it working for double pretty easily as well.  Is it extensible to user defined types like a Vector3 class?

-Craig 

August 10, 2008
Craig Black:
> Currently it only optimizes int and float.

Currently it optimizes very little, I think. I have posted C and D benchmarks:

http://codepad.org/BlwSIBKl

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=12718

Bye,
bearophile
August 10, 2008
bearophile wrote:
> It's very easy to write wrong benchmarks, so I am careful, but from
> the little I have seen so far the speedups are absent or below 1x (a
> slowdown).

If this happens, then it's worth verifying that the asm code is actually being run by inserting a printf in it.

> And I haven't yet seen SSE2 asm in my compiled programs :-)

The dmd compiler doesn't generate SSE2 instructions. But the routines in internal\array*.d do.
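For illustration, this C sketch shows the kind of SSE2 inner loop such routines use for int[] addition — intrinsics here rather than the inline asm Phobos actually uses, and with a scalar fallback so it compiles on non-x86 targets:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* a[i] = b[i] + c[i], four 32-bit ints per XMM register where
 * SSE2 is available. Illustrative only, not the Phobos code. */
void add_int_arrays(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(vb, vc));
    }
#endif
    for (; i < n; i++)  /* scalar tail (or the whole loop without SSE2) */
        a[i] = b[i] + c[i];
}
```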
August 10, 2008
bearophile wrote:
> D code with +:

I found the results to be heavily dependent on the data set size:

C:\mars>test5 1000 10000
array len= 8000  nloops= 10000
    vec time= 0.0926506 s
non-vec time= 0.626356 s

C:\mars>test5 2000 10000
array len= 16000  nloops= 10000
    vec time= 0.279727 s
non-vec time= 1.70048 s

C:\mars>test5 3000 10000
array len= 24000  nloops= 10000
    vec time= 0.795482 s
non-vec time= 2.47597 s

C:\mars>test5 4000 10000
array len= 32000  nloops= 10000
    vec time= 2.36905 s
non-vec time= 3.90906 s

C:\mars>test5 5000 10000
array len= 40000  nloops= 10000
    vec time= 3.12636 s
non-vec time= 3.70741 s

For smaller sets it's about a 6x speedup, for the largest only about 20%.

What we're seeing here is most likely the effects of the data set size exceeding the cache. It would be a fun project for someone to see if somehow the performance for such large data sets could be improved, perhaps by "warming" up the cache?
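A back-of-envelope check supports the cache explanation. Each pass of the benchmark touches three int arrays (two sources, one destination), so the working set is roughly 3 * len * 4 bytes; the sketch below compares that against a cache size, assuming a 4-byte int (the actual L2 size on the machine above is unknown, though 256-512 KB was typical for the period):

```c
#include <assert.h>
#include <stddef.h>

/* Approximate bytes touched per pass: three int arrays of length len
 * (two source slices and one destination). Assumes 4-byte int. */
long working_set_bytes(long len)
{
    return 3 * len * (long)sizeof(int);
}

/* 1 if the working set fits in a cache of the given size, else 0. */
int fits_in_cache(long len, long cache_bytes)
{
    return working_set_bytes(len) <= cache_bytes;
}
```

At len 40000 the working set is about 469 KB, which overflows a 256 KB L2 and approaches a 512 KB one; once a pass no longer fits in cache, the loop becomes memory-bound and the SSE2 advantage shrinks.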
August 10, 2008
Walter Bright:
> If this happens, then it's worth verifying that the asm code is actually being run by inserting a printf in it.

I presume I'll have to recompile Phobos for that.


> > And I haven't yet seen SSE2 asm in my compiled programs :-)
> The dmd compiler doesn't generate SSE2 instructions. But the routines in internal\array*.d do.

I know. I was talking about the parts of the code that, for example, add the arrays: according to the Phobos source code they use SSE2, but in the final compiled code those instructions are absent.

Bye,
bearophile