June 02, 2020
On 6/2/20 6:20 PM, Stanislav Blinov wrote:
> On Tuesday, 2 June 2020 at 22:01:20 UTC, Johan wrote:
>> On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:
> 
>>> That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.
>>
>> The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.
> 
> LDC (and I'm assuming GDC as well) is well aware of memcmp, it should not be a function call. At the very least in -Ox builds.

Let's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it.

What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases.

-Steve
June 02, 2020
On Tuesday, June 2, 2020 3:18:40 PM MDT Steven Schveighoffer via Digitalmars-d wrote:
> On 6/2/20 3:38 PM, Johan wrote:
> > Hi all,
> >
> >    I thought that we optimized dynamic array comparison with a check to
> >
> > see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
> >
> > I was expecting something like:
> > ```
> >
> >    if ( sizes are not equal )
> >       return false;
> >    if ( pointers are equal  )
> >       return true;
> >    return memcmp;
> >
> > ```
> >
> > I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?
>
> I thought I remembered this being true, but maybe it was for class instances? Looking back in history, it seems prior to 2017, it was done by the compiler generating the code instead of a template, but seems like it did not include the pointer check.

The free function opEquals for classes does check the references first and has for years (as well doing checks like whether either reference is null). I've never looked at what the array implementation does though.

- Jonathna M Davis



June 02, 2020
On 6/2/20 6:48 PM, Jonathan M Davis wrote:
> On Tuesday, June 2, 2020 3:18:40 PM MDT Steven Schveighoffer via Digitalmars-d
> wrote:
>> I thought I remembered this being true, but maybe it was for class
>> instances? Looking back in history, it seems prior to 2017, it was done
>> by the compiler generating the code instead of a template, but seems
>> like it did not include the pointer check.
> 
> The free function opEquals for classes does check the references first and
> has for years (as well doing checks like whether either reference is null).
> I've never looked at what the array implementation does though.

Sorry, to be clear I meant the array implementation prior to 2017 implemented opEquals by generating code with the compiler. After that, it used the linked template.

-Steve
June 02, 2020
On Tuesday, 2 June 2020 at 22:44:24 UTC, Steven Schveighoffer wrote:

> Let's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it.
>
> What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases.

That actually is irrelevant. Whether it gets inlined, whether even it's loop is unrolled by the compiler - that is all beside the point. If you hardcode the check in the library, that check is there for everyone using that library. On all hardware and for all data sizes. Irrespective of whether it can, on a given platform with a given array size, be actually faster to just do the call/loop/etc. than to do a check first. Irrespective of what a given compiler is capable of. There cannot be a universal optimization here.

Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime. Same goes for actual memcmp. So let them do the minimal work required, and let users make higher-level decisions at higher level.

If a program is attempting to "compare" identical memory ranges so often as to warrant a test - it's up to that program to deal with it, not the whole community.
June 02, 2020
On 6/2/20 7:35 PM, Stanislav Blinov wrote:
> On Tuesday, 2 June 2020 at 22:44:24 UTC, Steven Schveighoffer wrote:
> 
>> Let's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it.
>>
>> What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases.
> 
> That actually is irrelevant. Whether it gets inlined, whether even it's loop is unrolled by the compiler - that is all beside the point. If you hardcode the check in the library, that check is there for everyone using that library. On all hardware and for all data sizes. Irrespective of whether it can, on a given platform with a given array size, be actually faster to just do the call/loop/etc. than to do a check first. Irrespective of what a given compiler is capable of. There cannot be a universal optimization here.
> 
> Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.

I'm saying the compiler can unwrite druntime, if it knows a better way.

Happens all the time with LDC, it will remove loops that it knows don't do anything.

And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp.

-Steve
June 03, 2020
On Wednesday, 3 June 2020 at 00:04:43 UTC, Steven Schveighoffer wrote:

>> Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.
>
> I'm saying the compiler can unwrite druntime, if it knows a better way.
>
> Happens all the time with LDC, it will remove loops that it knows don't do anything.
>
> And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp.

...Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks.

For the third time, and let me be a little more explicit this time - if a program gives memcmp identical pointers so much as to warrant a test (which is found out by measuring) - the program is doing something wrong. *It* should do a test and not bother with memcmp at all. Or its logic needs to change so as to not have this problem in the first place. Same translates onto druntime's arrays.

There's no definitive answer to Johan's original question. It depends. On hardware. On data. Therefore, that check shouldn't be there, it should be delegated to someone capable of making such decision - to the user.
June 02, 2020
On 6/2/20 8:35 PM, Stanislav Blinov wrote:
> On Wednesday, 3 June 2020 at 00:04:43 UTC, Steven Schveighoffer wrote:
> 
>>> Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.
>>
>> I'm saying the compiler can unwrite druntime, if it knows a better way.
>>
>> Happens all the time with LDC, it will remove loops that it knows don't do anything.
>>
>> And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp.
> 
> ....Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks.

The worst case is opaque memcmp is called in all cases. No thanks. I'll take a few extra cycles over that any day.

-Steve
June 03, 2020
On Wednesday, 3 June 2020 at 01:10:55 UTC, Steven Schveighoffer wrote:

>> ....Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks.
>
> The worst case is opaque memcmp is called in all cases. No thanks. I'll take a few extra cycles over that any day.

With the check in place, the worst case would be a useless comparison plus the memcmp, or a failed branch on the off chance that you do try to compare identical ranges. Or a useless comparison and a failed branch + memcmp. Depends on your program. Whether the effect would be significant or immeasurable - also depends on your program, and your data, and your hardware.

"I'm not sure what my program is doing, so let's make everyone else pay for it". That's what including that test, in memcmp or in druntime, effectively would be.
June 03, 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
> Hi all,
>   I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal.
>
> I was expecting something like:
> ```
>   if ( sizes are not equal )
>      return false;
>   if ( pointers are equal  )
>      return true;
>   return memcmp;
> ```
>
> I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?

memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL.
It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.

June 03, 2020
On Wednesday, 3 June 2020 at 05:48:56 UTC, Patrick Schluter wrote:
> On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
>> [...]
>
> memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL.
> It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.

https://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp