August 09, 2008
bearophile wrote:
> First benchmark, just D against itself, not used GCC yet, the results
> show that vector ops are generally slower, but maybe there's some
> bug/problem in my benchmark (note it needs just Phobos!), not tested
> on Linux yet:
[...]
> a3[] = a1[] / a2[];

I wouldn't be a bit surprised at that since / for int[]s does not have a custom asm routine for it. See phobos/internal/arrayint.d

If someone wants to write one, I'll put it in!
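For reference, here is a minimal C sketch of what the missing routine computes (the function name is made up for illustration; the real Phobos routines in phobos/internal/arrayint.d follow their own naming scheme). Worth noting: SSE2 has no packed integer-divide instruction, so even a hand-written version would likely stay scalar:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of element-wise a[] = b[] / c[] for int arrays.
 * Illustrative name, not the actual Phobos symbol. SSE2 offers
 * no packed integer divide, so a scalar loop is the natural
 * implementation even in a hand-tuned routine. */
void int_slice_div(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] / c[i];
}
```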
August 09, 2008
bearophile wrote:
> This output looks like a bug of the compiler anyway: [1,2,3,0,0,0,0]

Please post all bugs to Bugzilla! Thanks.
August 09, 2008
Walter Bright:
> I wouldn't be a bit surprised at that since / for int[]s does not have a custom asm routine for it.

I didn't know that. We may want to compile a list of such things.
But as you can see, I benchmarked with +, *, and /, not just /.

It's very easy to write wrong benchmarks, so I am careful, but from the little I have seen so far the speedups are absent or below 1x (a slowdown). And I haven't yet seen SSE2 asm in my compiled programs :-)


>>Is it able to compute a+b+c with a single loop (as all Fortran compilers do)?<<

>Yes.<

But later, on Reddit, Walter's answer was:

>This optimization is called "loop fusion", and is well known. It doesn't always result in a speedup, though. The dmd compiler doesn't do it, but that is not the fault of D.<

On a closer look the two questions are different; I think he meant:
a += b + c; => single loop
a += b; a += c; => two loops
I think this is acceptable.
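The distinction above can be sketched in C (hypothetical helper names; D's array-op notation would correspond to the fused form):

```c
#include <assert.h>
#include <stddef.h>

/* Fused: one pass over the data; a[] is read and written once. */
void add_fused(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] + c[i];
}

/* Unfused: two passes, so a[] is traversed twice and the
 * intermediate result makes an extra round trip through memory. */
void add_unfused(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) a[i] += b[i];
    for (size_t i = 0; i < n; i++) a[i] += c[i];
}
```

Both produce identical results; fusion only changes how many times the arrays stream through the cache.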

Bye,
bearophile
August 09, 2008
Walter Bright wrote:

> Lars Ivar Igesund wrote:
>> Jarrett Billingsley wrote:
>> 
>>> "Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:g7ias2$2kbo$1@digitalmars.com...
>>>> Walter Bright wrote:
>>>>
>>>>> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
>>>>>
>>>>> http://www.digitalmars.com/d/1.0/changelog.html http://ftp.digitalmars.com/dmd.1.034.zip
>>>> The array op docs aren't actually on the 1.0 array page. But great! I remember trying to use these 3 years ago :D
>>> Too bad Tango doesn't support them yet.  :C
>> 
>> Are you suggesting that Walter should have told us that he was implementing this feature ahead of releasing 1.034?
> 
> All Tango needs to do is copy the internal\array*.d files over and add them to the makefile.

I know :) No malice intended.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
August 09, 2008
== Quote from Walter Bright (newshound1@digitalmars.com)'s article
> Lars Ivar Igesund wrote:
> > Jarrett Billingsley wrote:
> >
> >> "Lars Ivar Igesund" <larsivar@igesund.net> wrote in message news:g7ias2$2kbo$1@digitalmars.com...
> >>> Walter Bright wrote:
> >>>
> >>>> This one has (finally) got array operations implemented. For those who want to show off their leet assembler skills, the initial assembler implementation code is in phobos/internal/array*.d. Burton Radons wrote the assembler. Can you make it faster?
> >>>>
> >>>> http://www.digitalmars.com/d/1.0/changelog.html http://ftp.digitalmars.com/dmd.1.034.zip
> >>> The array op docs aren't actually on the 1.0 array page. But great! I remember trying to use these 3 years ago :D
> >> Too bad Tango doesn't support them yet.  :C
> >
> > Are you suggesting that Walter should have told us that he was implementing this feature ahead of releasing 1.034?
> All Tango needs to do is copy the internal\array*.d files over and add them to the makefile.

I took care of this when 1.033 was released, since the files were first included then.  There are likely updates in 1.034, but nothing a few minutes with my merge tool can't handle.  Sadly, that particular merge tool is a bit broken at the moment, but I'll see about taking care of this anyway.


Sean
August 10, 2008
Very exciting stuff!  Keep up the good work.

Currently it only optimizes int and float.  I assume you could get it working for double pretty easily as well.  Is it extensible to user defined types like a Vector3 class?

-Craig 

August 10, 2008
Craig Black:
> Currently it only optimizes int and float.

Currently it optimizes very little, I think. I have posted C and D benchmarks:

http://codepad.org/BlwSIBKl

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=12718

Bye,
bearophile
August 10, 2008
bearophile wrote:
> It's very easy to write wrong benchmarks, so I am careful, but from
> the little I have seen so far the speedups are absent or below 1x (a
> slowdown).

If this happens, then it's worth verifying that the asm code is actually being run by inserting a printf in it.

> And I haven't yet seen SSE2 asm in my compiled programs :-)

The dmd compiler doesn't generate SSE2 instructions. But the routines in internal\array*.d do.
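For illustration, this C sketch shows the kind of SSE2 inner loop such routines use for int[] addition — intrinsics here rather than the inline asm Phobos actually uses, and with a scalar fallback so it compiles on non-x86 targets:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* a[i] = b[i] + c[i], four 32-bit ints per XMM register where
 * SSE2 is available. Illustrative only, not the Phobos code. */
void add_int_arrays(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(vb, vc));
    }
#endif
    for (; i < n; i++)  /* scalar tail (or the whole loop without SSE2) */
        a[i] = b[i] + c[i];
}
```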
August 10, 2008
bearophile wrote:
> D code with +:

I found the results to be heavily dependent on the data set size:

C:\mars>test5 1000 10000
array len= 8000  nloops= 10000
    vec time= 0.0926506 s
non-vec time= 0.626356 s

C:\mars>test5 2000 10000
array len= 16000  nloops= 10000
    vec time= 0.279727 s
non-vec time= 1.70048 s

C:\mars>test5 3000 10000
array len= 24000  nloops= 10000
    vec time= 0.795482 s
non-vec time= 2.47597 s

C:\mars>test5 4000 10000
array len= 32000  nloops= 10000
    vec time= 2.36905 s
non-vec time= 3.90906 s

C:\mars>test5 5000 10000
array len= 40000  nloops= 10000
    vec time= 3.12636 s
non-vec time= 3.70741 s

For smaller sets it's about a 6x speedup, for the largest only about 20%.

What we're seeing here is most likely the effects of the data set size exceeding the cache. It would be a fun project for someone to see if somehow the performance for such large data sets could be improved, perhaps by "warming" up the cache?
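A back-of-envelope check supports the cache explanation. Each pass of the benchmark touches three int arrays (two sources, one destination), so the working set is roughly 3 * len * 4 bytes; the sketch below compares that against a cache size, assuming a 4-byte int (the actual L2 size on the machine above is unknown, though 256-512 KB was typical for the period):

```c
#include <assert.h>
#include <stddef.h>

/* Approximate bytes touched per pass: three int arrays of length len
 * (two source slices and one destination). Assumes 4-byte int. */
long working_set_bytes(long len)
{
    return 3 * len * (long)sizeof(int);
}

/* 1 if the working set fits in a cache of the given size, else 0. */
int fits_in_cache(long len, long cache_bytes)
{
    return working_set_bytes(len) <= cache_bytes;
}
```

At len 40000 the working set is about 469 KB, which overflows a 256 KB L2 and approaches a 512 KB one; once a pass no longer fits in cache, the loop becomes memory-bound and the SSE2 advantage shrinks.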
August 10, 2008
Walter Bright:
> If this happens, then it's worth verifying that the asm code is actually being run by inserting a printf in it.

I presume I'll have to recompile Phobos for that.


> > And I haven't yet seen SSE2 asm in my compiled programs :-)
> The dmd compiler doesn't generate SSE2 instructions. But the routines in internal\array*.d do.

I know. I was talking about the parts of the code that, for example, add the arrays: according to the Phobos source code they use SSE2, but in the final compiled code those instructions are absent.

Bye,
bearophile