Jump to page: 1 2 3
Thread overview
Dynamic array comparison optimization: check for equal pointers?
Jun 02, 2020
Johan
Jun 02, 2020
Stefan Koch
Jun 02, 2020
Stanislav Blinov
Jun 02, 2020
Johan
Jun 02, 2020
Stanislav Blinov
Jun 02, 2020
Stanislav Blinov
Jun 03, 2020
Stanislav Blinov
Jun 03, 2020
Stanislav Blinov
Jun 02, 2020
Jonathan M Davis
Jun 02, 2020
Johan
Jun 02, 2020
NaN
Jun 02, 2020
Stanislav Blinov
Jun 03, 2020
Patrick Schluter
Jun 03, 2020
Patrick Schluter
Jun 03, 2020
Johan
Jun 03, 2020
Johan
June 02, 2020
Hi all,
  I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.

I was expecting something like:
```
  if ( sizes are not equal )
     return false;
  if ( pointers are equal  )
     return true;
  return memcmp;
```

I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?

cheers,
  Johan


June 02, 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
> Hi all,
>   I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
>
> I was expecting something like:
> ```
>   if ( sizes are not equal )
>      return false;
>   if ( pointers are equal  )
>      return true;
>   return memcmp;
> ```
>
> I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?
>
> cheers,
>   Johan

I would have expected this optimization to be present as well ...

It seems like a no-brainer.
June 02, 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:

> I was expecting something like:
> ```
>   if ( sizes are not equal )
>      return false;
>   if ( pointers are equal  )
>      return true;
>   return memcmp;
> ```

That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.

> I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?

There can't be a definitive answer for this, it's dependent on architecture as well as size of data. As far as array comparison goes, if a compiler (talking about ldc/gdc here) can infer pointer/length information, it'll be free to throw away the scan. In all other cases, I'd rather see the comparison function, well, do the comparison, and leave branches up to the user. It's much better than to inflict the branches on everyone unconditionally (pun intended): if your data shows that you can benefit from the short-circuit, you can always do just that. Short of rethinking the design, of course ;) I mean, I can't imagine a scenario where a program would have to routinely compare identical memory ranges. Doesn't mean there aren't such scenarios though.

June 02, 2020
On 6/2/20 3:38 PM, Johan wrote:
> Hi all,
>    I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
> 
> I was expecting something like:
> ```
>    if ( sizes are not equal )
>       return false;
>    if ( pointers are equal  )
>       return true;
>    return memcmp;
> ```
> 
> I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?
> 

I thought I remembered this being true, but maybe it was for class instances? Looking back in history, it seems prior to 2017, it was done by the compiler generating the code instead of a template, but seems like it did not include the pointer check.

-Steve
June 02, 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
> Hi all,
>   I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
>
> I was expecting something like:
> ```
>   if ( sizes are not equal )
>      return false;
>   if ( pointers are equal  )
>      return true;
>   return memcmp;
> ```
>
> I am surprised that memcmp itself does not implement the pointers are equal check.
I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible.
And how do you know this optimization is NOT in memcmp? It's extern C and the source may be not available - have you examined it? There are many implementations of memcmp available - maybe on you're platform the cost for this extra check is too high? (especially given the very low probability that it is ever called with src==dst).
June 02, 2020
On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:
> On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
>
>> I was expecting something like:
>> ```
>>   if ( sizes are not equal )
>>      return false;
>>   if ( pointers are equal  )
>>      return true;
>>   return memcmp;
>> ```
>
> That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.

The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.

>> I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?
>
> There can't be a definitive answer for this, it's dependent on architecture as well as size of data. As far as array comparison goes, if a compiler (talking about ldc/gdc here) can infer pointer/length information, it'll be free to throw away the scan. In all other cases, I'd rather see the comparison function, well, do the comparison, and leave branches up to the user. It's much better than to inflict the branches on everyone unconditionally (pun intended): if your data shows that you can benefit from the short-circuit, you can always do just that. Short of rethinking the design, of course ;) I mean, I can't imagine a scenario where a program would have to routinely compare identical memory ranges. Doesn't mean there aren't such scenarios though.

You are assuming that an if-statement costs significant time, but that doesn't have to be the case. The pointer values already must be read from memory (otherwise you cannot do the memory scan), and with speculative execution the comparison+branch only costs very little (codegen could be coaxed to make if(false) be the default execution path) and potentially saves many many memory reads.

Could someone take the effort and check the performance difference (for example the compiler itself) with this change?

Thanks!
 Johan

June 02, 2020
On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote:
> On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
>> Hi all,
>>   I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
>>
>> I was expecting something like:
>> ```
>>   if ( sizes are not equal )
>>      return false;
>>   if ( pointers are equal  )
>>      return true;
>>   return memcmp;
>> ```
>>
>> I am surprised that memcmp itself does not implement the pointers are equal check.
> I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible.

Pointers are nothing to be scared of.

> And how do you know this optimization is NOT in memcmp? It's extern C and the source may be not available - have you examined it?

That's what I wrote, yes.

> There are many implementations of memcmp available - maybe on you're platform the cost for this extra check is too high? (especially given the very low probability that it is ever called with src==dst).

glibc is a very commonly used library for memcmp. It does many other checks before starting the comparison (e.g. optimizing for pointer alignment), so the cost of the extra check is perhaps not even measurable.

-Johan


June 02, 2020
On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote:

> I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible.

How do you pass pointers to memcmp if you're not aware of them? ;)
June 02, 2020
On Tuesday, 2 June 2020 at 22:01:20 UTC, Johan wrote:
> On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:

>> That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.
>
> The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.

LDC (and I'm assuming GDC as well) is well aware of memcmp, it should not be a function call. At the very least in -Ox builds.

> You are assuming that an if-statement costs significant time, but that doesn't have to be the case. The pointer values already must be read from memory (otherwise you cannot do the memory scan), and with speculative execution the comparison+branch only costs very little (codegen could be coaxed to make if(false) be the default execution path) and potentially saves many many memory reads.

I'm not assuming anything. The branch(es) may, or may not, be more costly than just comparing some words. Depending on architecture and the amount of data to compare. That's literally what I wrote :) May, or may not - implementation has no business assuming either way. That's why such a check absolutely should *not* be there. It should be where it belongs - in user code, backed up by profiling as necessary.
Including it there may be beneficial for you specifically, but detrimental for everyone else.
June 02, 2020
On Tuesday, 2 June 2020 at 22:05:06 UTC, Johan wrote:
> On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote:
>
> glibc is a very commonly used library for memcmp. It does many other checks before starting the comparison (e.g. optimizing for pointer alignment), so the cost of the extra check is perhaps not even measurable.

Getting two pointers to the same memory location may be such a rare occurrence that it's not worth having the check for it. IE, Adding a tiny overhead on 1000 calls can still cost more than a very large overhead on 1 call.
« First   ‹ Prev
1 2 3