May 19, 2016
On Thursday, 19 May 2016 at 12:04:31 UTC, Andrei Alexandrescu wrote:
> On 5/19/16 4:12 AM, Jens Müller wrote:

> What test data did you use?

An instance for benchmarking is generated as follows, given nnz, the total number of non-zero entries in the input vectors a and b:

// gen is the RNG instance used throughout; Pair holds an index and a value.
import std.algorithm : map;
import std.array : array;
import std.random : randomSample, uniform;
import std.range : iota;

auto lengthA = uniform!"[]"(0, nnz, gen);
auto lengthB = nnz - lengthA;

auto a = iota(0, nnz).randomSample(lengthA, gen).map!(i => Pair(i, 10)).array();
auto b = iota(0, nnz).randomSample(lengthB, gen).map!(i => Pair(i, 10)).array();

So I take a random sample of {0, ..., nnz - 1} for each input.
Any better ideas? I've seen that people generate vectors with larger gaps.
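One way to get larger gaps (a sketch; the spread factor 10 is an arbitrary choice of mine) would be to sample the indices from a universe larger than nnz, so the average gap grows by roughly that factor:

auto universe = 10 * nnz; // arbitrary spread factor
auto a = iota(0, universe).randomSample(lengthA, gen).map!(i => Pair(i, 10)).array();
auto b = iota(0, universe).randomSample(lengthB, gen).map!(i => Pair(i, 10)).array();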

> 10%-20% win on dot product is significant because for many algorithms dot product is a kernel and dominates everything else. For those any win goes straight to the bottom line.

Sure. Still, I wasn't sure whether I got the idea from your talk, so maybe there is/was more to it.

>> The base line (dot1 in the graphs) is the straightforward version
>>
>> ---
>> size_t i,j = 0;
>> double sum = 0;
>> while (i < a.length && j < b.length)
>> {
>>      if (a[i].index < b[j].index) i++;
>>      else if (a[i].index > b[j].index) j++;
>>      else
>>      {
>>          assert(a[i].index == b[j].index);
>>          sum += a[i].value * b[j].value;
>>          i++;
>>          j++;
>>      }
>> }
>> return sum;
>> ---
>
> That does redundant checking. There's a better baseline:
>
> ---
> if (a.length == 0 || b.length == 0)
>     return 0;
> const amax = a.length - 1, bmax = b.length - 1;
> size_t i,j = 0;
> double sum = 0;
> for (;;)
> {
>     if (a[i].index < b[j].index) {
>         if (i++ == amax) break;
>     }
>     else if (a[i].index > b[j].index) {
>         bumpJ: if (j++ == bmax) break;
>     }
>     else
>     {
>         assert(a[i].index == b[j].index);
>         sum += a[i].value * b[j].value;
>         if (i++ == amax) break;
>         goto bumpJ;
>     }
> }
> return sum;
> ---

I'll check that.

> Then if you add the sentinel you only need the bounds tests in the third case.

I'll post the sentinel code later. Probably there is something to improve there as well.

>> BTW the effects vary greatly for different compilers.
>> For example with dmd the optimized version is slowest. The baseline is
>> best. Weird. With gdc the optimized is best and gdc's code is always
>> faster than dmd's code. With ldc it's really strange. Slower than dmd. I
>> assume I'm doing something wrong here.
>>
>> Used compiler flags
>> dmd v2.071.0
>> -wi -dw -g -O -inline -release -noboundscheck
>> gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
>> -Wall  -g -O3 -fomit-frame-pointer -finline-functions -frelease
>> -fno-bounds-check -ffast-math
>> ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
>> -wi -dw -g -O3 -enable-inlining -release -boundscheck=off
>>
>> Am I missing some flags?
>
> These look reasonable.

But ldc looks so bad.
Any comments from ldc users or developers? I see this in many other measurements as well, and I would love to have a second compiler that produces code as efficient as gdc's.
For example, what is the ldc equivalent of gdc's -ffast-math?

>> I uploaded my plots.
>> - running time https://www.scribd.com/doc/312951947/Running-Time
>> - speed up https://www.scribd.com/doc/312951964/Speedup
>
> What is dot2? Could you please redo the experiments with the modified code as well?

dot2 is an optimization for jumping over gaps more quickly: it replaces the first two if statements with while loops (see the sketch below).
But my benchmark data has no large gaps, and interestingly it still makes things worse.
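For reference, a sketch of what I mean by dot2 (the two skip branches become inner loops):

---
size_t i, j;
double sum = 0;
while (i < a.length && j < b.length)
{
    if (a[i].index < b[j].index)
    {
        // skip a whole gap in a at once
        do { i++; } while (i < a.length && a[i].index < b[j].index);
    }
    else if (a[i].index > b[j].index)
    {
        // skip a whole gap in b at once
        do { j++; } while (j < b.length && a[i].index > b[j].index);
    }
    else
    {
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---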

Jens
May 19, 2016
On Thursday, 19 May 2016 at 12:04:31 UTC, Andrei Alexandrescu wrote:
> On 5/19/16 4:12 AM, Jens Müller wrote:
>
> ---
> if (a.length == 0 || b.length == 0)
>     return 0;
> const amax = a.length - 1, bmax = b.length - 1;
> size_t i,j = 0;
> double sum = 0;
> for (;;)
> {
>     if (a[i].index < b[j].index) {
>         if (i++ == amax) break;
>     }
>     else if (a[i].index > b[j].index) {
>         bumpJ: if (j++ == bmax) break;
>     }
>     else
>     {
>         assert(a[i].index == b[j].index);
>         sum += a[i].value * b[j].value;
>         if (i++ == amax) break;
>         goto bumpJ;
>     }
> }
> return sum;
> ---
>
> Then if you add the sentinel you only need the bounds tests in the third case.

I'm not seeing it. Let me explain.
Consider the input a = [1] and b = [2, 3] (I only write the indices). The smallest back index is 1, i.e., a.back is the chosen sentinel. Now assume that we set b.back to a.back, restoring it after the loop. In the case a[i].index < b[j].index I still have to check whether a[i].index == a.back.index to break, because otherwise i is incremented past the end (and this case does occur here: a[i].index = 1 and b[j].index = 2 for i = 0 and j = 0). In the last case I only check a[i].index == a.back.index, since that implies b[j].index == a.back.index.
So in sum I have two bounds tests, and I think this is not what you are thinking of.
This does not look right.
Here are the plots for the implementations.
https://www.scribd.com/doc/313204510/Running-Time
https://www.scribd.com/doc/313204526/Speedup

dot1 is my baseline, which is indeed worse than your baseline (dot2), but only on gdc. I chose dot2 as the baseline for computing the speedup. dot3 is the sentinel version.

I removed the code that optimizes for large gaps, because it was only confusing. I may generate some benchmark data with larger gaps later to see whether it is worthwhile for such data.
The plots look much more regular now (ldc is still strange).

Jens
May 19, 2016
On 05/19/2016 05:36 PM, Jens Müller wrote:
> I'm not seeing it. Let me explain.
> Consider the input a = [1] and b = [2, 3] (I only write the indices).
> The smallest back index is 1, i.e., a.back is the chosen sentinel.

Nonono, you stamp the largest index over the smaller index. So you overwrite a = [3] and you leave b = [2, 3] as it is.

Now you know that you're multiplying two correct sparse vectors in which _definitely_ the last elements have equal indexes. So the inner loop is:

if (a[i].idx < b[j].idx) {
  i++; // no check needed
} else if (a[i].idx > b[j].idx) {
  j++; // no check needed
} else {
  // check needed
  r += a[i].val * b[j].val;
  if (i == imax || j == jmax) break;
  ++i;
  ++j;
}

At the end you need a fixup to make sure you account for the last index that you overwrote (which of course you need to save first).
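
Putting it all together, a minimal sketch (assuming the Pair type from the earlier posts; dotSentinel and the binary-search fixup are just one way to handle the saved element, untested):

---
// Sketch only: temporarily mutates a's last element and restores it,
// so it is not safe for concurrent readers of a.
double dotSentinel(Pair[] a, Pair[] b)
{
    import std.range : assumeSorted;

    if (a.length == 0 || b.length == 0)
        return 0;
    if (a[$ - 1].index > b[$ - 1].index)
        return dotSentinel(b, a); // make a the vector that gets stamped
    const saved = a[$ - 1];           // save the element we overwrite
    a[$ - 1].index = b[$ - 1].index;  // stamp the larger index over it
    scope(exit) a[$ - 1] = saved;     // restore on the way out

    double r = 0;
    size_t i, j;
    immutable imax = a.length - 1, jmax = b.length - 1;
    for (;;)
    {
        if (a[i].index < b[j].index)
        {
            i++; // no check: a's last index now equals b's last index
        }
        else if (a[i].index > b[j].index)
        {
            j++; // no check: same argument, symmetric
        }
        else
        {
            r += a[i].value * b[j].value;
            if (i == imax || j == jmax) break;
            ++i;
            ++j;
        }
    }
    // Fixup: if the stamp changed a's back index, the last product
    // was bogus; remove it and re-match the saved element against b.
    if (saved.index != b[$ - 1].index)
    {
        r -= saved.value * b[$ - 1].value;
        auto hit = b.assumeSorted!((x, y) => x.index < y.index)
                    .equalRange(Pair(saved.index, 0));
        if (!hit.empty)
            r += saved.value * hit.front.value;
    }
    return r;
}
---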

Makes sense?


Andrei
May 19, 2016
On 05/19/2016 05:36 PM, Jens Müller wrote:
> I removed the code to optimize for large gaps. Because it is only
> confusing. I may generate some benchmark data with larger gaps later to
> see whether it is worthwhile for such data.

For skipping large gaps quickly, check galloping search (google for it, we also have it in phobos). -- Andrei
May 19, 2016
On Thursday, 19 May 2016 at 22:02:53 UTC, Andrei Alexandrescu wrote:
> On 05/19/2016 05:36 PM, Jens Müller wrote:
>> I'm not seeing it. Let me explain.
>> Consider the input a = [1] and b = [2, 3] (I only write the indices).
>> The smallest back index is 1, i.e., a.back is the chosen sentinel.
>
> Nonono, you stamp the largest index over the smaller index. So you overwrite a = [3] and you leave b = [2, 3] as it is.
>
> Now you know that you're multiplying two correct sparse vectors in which _definitely_ the last elements have equal indexes. So the inner loop is:
>
> if (a[i].idx < b[j].idx) {
>   i++; // no check needed
> } else if (a[i].idx > b[j].idx) {
>   j++; // no check needed
> } else {
>   // check needed
>   r += a[i].val * b[j].val;
>   if (i == imax || j == jmax) break;
>   ++i;
>   ++j;
> }
>
> At the end you need a fixup to make sure you account for the last index that you overwrote (which of course you need to save first).
>
> Makes sense?

What if you stomped over an index in a that has an equal index in b (it could be anywhere in b)? After the loop finishes you restore the index in a. But how do you account for the values of the stomped-over index if needed?
For instance test it on
a = [2]
b = [2,3]
Note the 2 in b could be anywhere.
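
Tracing it through as I understand the scheme (values written va, vb2, vb3):

---
// a = [(2, va)], b = [(2, vb2), (3, vb3)]: stamp 3 over a's 2, save (2, va)
// i=0, j=0: a[0].idx = 3 >  b[0].idx = 2  ->  j++ (no check)
// i=0, j=1: a[0].idx = 3 == b[1].idx = 3  ->  r += va * vb3; break
// Restoring a[0] afterwards doesn't recover the real match va * vb2,
// because j has already moved past b[0].
---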

I think you can check for
if (a[i].idx == sentinelIdx) break;
instead of
if (i == imax || j == jmax) break;

Jens
May 19, 2016
On Thursday, 19 May 2016 at 22:04:56 UTC, Andrei Alexandrescu wrote:
> On 05/19/2016 05:36 PM, Jens Müller wrote:
>> I removed the code to optimize for large gaps. Because it is only
>> confusing. I may generate some benchmark data with larger gaps later to
>> see whether it is worthwhile for such data.
>
> For skipping large gaps quickly, check galloping search (google for it, we also have it in phobos). -- Andrei

Sure, I've already seen this. It's nice. But you have to include it in the sparse dot product (or list intersection) algorithm somehow. Then you require random access, and galloping is only beneficial if the gaps are large. As a library writer this is a difficult position, because it easily turns into over-engineering. Optimally one just exposes the primitives and the user plugs them together, ideally without too many knobs per algorithm.
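
For reference, this is roughly how such a skip could look (skipTo is a made-up name; Phobos' SortedRange.lowerBound accepts SearchPolicy.gallop and does the search itself):

---
import std.range : assumeSorted, SearchPolicy;

// Advance i to the first element of a with index >= target.
// lowerBound returns the subrange before that position, so its
// length is exactly the number of elements to skip.
size_t skipTo(Pair[] a, size_t i, size_t target)
{
    return i + a[i .. $]
        .assumeSorted!((x, y) => x.index < y.index)
        .lowerBound!(SearchPolicy.gallop)(Pair(target, 0))
        .length;
}
---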

Jens
May 20, 2016
On 19 May 2016 at 22:10, Andrei Alexandrescu via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
> On 5/18/16 7:42 AM, Manu via Digitalmars-d-announce wrote:
>>
>> On 16 May 2016 at 23:46, Andrei Alexandrescu via Digitalmars-d-announce <digitalmars-d-announce@puremagic.com> wrote:
>>>
>>> Uses D for examples, showcases Design by Introspection, and rediscovers a fast partition routine. It was quite well received. https://www.youtube.com/watch?v=AxnotgLql0k
>>>
>>> Andrei
>>
>>
This isn't the one where you said you were going to "destroy concepts", is it? At DConf you mentioned a talk due for release on some date I can't remember, and I got the impression you were going to show how C++'s concepts proposal is broken and, ideally, propose how we can nail it in D?
>
>
> That's the one - sorry to disappoint :o). -- Andrei

Ah. Okay, well while this is a very interesting talk, I was indeed hoping you were going to make a D concepts proposal... do you have such a thing in mind, or are you against concepts in D?
May 20, 2016
On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
> Ah. Okay, well while this is a very interesting talk, I was indeed
> hoping you were going to make a D concepts proposal... do you have
> such a thing in mind, or are you against concepts in D?

D has constraints. No point in adding concepts.

May 20, 2016
On Fri, May 20, 2016 at 01:26:31AM -0700, Walter Bright via Digitalmars-d-announce wrote:
> On 5/19/2016 11:50 PM, Manu via Digitalmars-d-announce wrote:
> > Ah. Okay, well while this is a very interesting talk, I was indeed hoping you were going to make a D concepts proposal... do you have such a thing in mind, or are you against concepts in D?
> 
> D has constraints.  No point in adding concepts.

Constraints have been a headache when it comes to generating user-friendly diagnostics. Not to mention the inconsistency in what exactly is being tested for: if you want to check whether something is an input range, do you use is(typeof(R.empty)), etc., or should you use __traits(compiles, R.init.empty), or is it is(typeof((R r){ r.empty; })), or any of the 15 or so slightly different ways of testing for the existence and type of some aggregate member, all subtly different from each other? Subtly different as in, for instance, testing for is(typeof((){ R r; bool x = r.empty; })) is different from is(typeof((R r){ bool x = r.empty; })), because the former doesn't work with an R that closes over a local scope, whereas the latter does.
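
For concreteness, here are two of those idioms side by side (hasEmpty1/hasEmpty2 are made-up names for illustration):

---
// Both intend to test "R has an .empty that yields bool":
enum hasEmpty1(R) = is(typeof((R r) { bool x = r.empty; }));
enum hasEmpty2(R) = is(typeof(() { R r; bool x = r.empty; }));
// hasEmpty2 declares "R r;" in a fresh scope, so it fails for
// voldemort types that need a context pointer; hasEmpty1 merely
// takes an already-constructed r as a parameter.
---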

And none of this has even begun addressing the issue of human-readable diagnostics.  Whereas if D had concepts, it would have been a simple matter of defining the prototypical range with struct-like syntax and calling it a day.
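
Something like this, say (entirely made-up syntax, purely to illustrate the struct-like shape; D has nothing of the sort today):

---
concept InputRange
{
    @property bool empty();
    @property E front(); // for some element type E
    void popFront();
}
---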

(I still think sig constraints are awesome, though. Just not to the point of replacing concepts. I don't see them as being mutually exclusive. In fact, they would complement each other quite nicely.)


T

-- 
Not all rumours are as misleading as this one.
May 20, 2016
On 05/20/2016 09:47 AM, H. S. Teoh via Digitalmars-d-announce wrote:
> Constraints have been a headache when it comes to generating
> user-friendly diagnostics.

This is a matter that can be addressed tactically. Every time we discuss this, a couple of good ideas come up, they don't get worked on, they get forgotten, and a couple of months later the discussion is reopened.

This is inefficient. Who would want to sit down and work on this?


Andrei