Jump to page: 1 2
Thread overview
String compare in words?
May 29, 2016
qznc
May 29, 2016
Era Scarecrow
May 29, 2016
Jonathan M Davis
May 29, 2016
Era Scarecrow
May 29, 2016
qznc
May 29, 2016
qznc
May 29, 2016
qznc
May 29, 2016
Seb
May 29, 2016
Stefan Koch
May 30, 2016
chmike
May 30, 2016
chmike
May 30, 2016
ag0aep6g
May 30, 2016
Christophe Meessen
May 30, 2016
Márcio Martins
May 29, 2016
Given two string (or char[] or ubyte[]) objects, I want to compare them. The naive loop accesses the arrays byte-wise. How could I turn this into a word-wise compare for better performance?

Is a cast into size_t[] ok? Some Phobos helper functions?
May 29, 2016
On Sunday, 29 May 2016 at 17:13:49 UTC, qznc wrote:
> Given two string (or char[] or ubyte[]) objects, I want to compare them. The naive loop accesses the arrays byte-wise. How could I turn this into a word-wise compare for better performance?
>
> Is a cast into size_t[] ok? Some Phobos helper functions?

 Assuming you don't have codepoints and only want to do a raw compare, i believe you can as long as the sizes align.
May 29, 2016
On Sunday, May 29, 2016 17:13:49 qznc via Digitalmars-d-learn wrote:
> Given two string (or char[] or ubyte[]) objects, I want to compare them. The naive loop accesses the arrays byte-wise. How could I turn this into a word-wise compare for better performance?
>
> Is a cast into size_t[] ok? Some Phobos helper functions?

In what way are you trying to compare them? If all you're doing is comparing them for equality, then just use ==. e.g.

if(str1 == str2)
{
}

And if you're not simply comparing for equality, what are you looking to figure out? Without more information about what you're trying to do, it's kind of hard to help you.

- Jonathan M Davis

May 29, 2016
On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
> In what way are you trying to compare them? If all you're doing is comparing them for equality, then just use ==. e.g.
>
> if(str1 == str2)
> {
> }
>
> And if you're not simply comparing for equality, what are you looking to figure out? Without more information about what you're trying to do, it's kind of hard to help you.

 I'm reminded that the GNU stdlib has a string compare function which defaults to using larger double words to get a speedup, and I think he wants to do that the same way. Although unless they are both the same size and both divisible by the size of size_t, then it's a multi-stage process to do correctly.

 Worse I'm not sure if the code generation already does that and possibly does a better job than what we could do by hand...
May 29, 2016
On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
> And if you're not simply comparing for equality, what are you looking to figure out? Without more information about what you're trying to do, it's kind of hard to help you.

If I write the comparison naively, the assembly clearly shows a "movzbl" [0]. It loads a single byte! The other single byte load is encoded in the address mode of "cmp". Implementation:

bool stringcmp(string x, string y) {
  foreach(i; 0..x.length) {
    if (x[i] != y[i]) // byte compare
      return false;
  }
  return true;
}

It makes no sense to load single bytes here. Since we only want to check for equality, we could load two full words and compare four or eight bytes in one go.

This example is simplified and far-fetched. Actually, this is about the find algorithm [1].

[0] http://goo.gl/ttybAB
[1] http://forum.dlang.org/post/vdjraubhtoqtxeshjvqt@forum.dlang.org
May 29, 2016
On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
> On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
>> And if you're not simply comparing for equality, what are you looking to figure out? Without more information about what you're trying to do, it's kind of hard to help you.
>
> If I write the comparison naively, the assembly clearly shows a "movzbl" [0]. It loads a single byte! The other single byte load is encoded in the address mode of "cmp". Implementation:
>
> bool stringcmp(string x, string y) {
>   foreach(i; 0..x.length) {
>     if (x[i] != y[i]) // byte compare
>       return false;
>   }
>   return true;
> }
>
> It makes no sense to load single bytes here. Since we only want to check for equality, we could load two full words and compare four or eight bytes in one go.

Ok, to answer my own question, this looks good:

bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {
    pragma(inline, false);
    if (x.length != y.length) return false;
    int i=0;
    // word-wise compare is faster than byte-wise
    if (x.length > size_t.sizeof)
        for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
            size_t* xw = cast(size_t*) &x[i];
            size_t* yw = cast(size_t*) &x[i];
            if (*xw != *yw) return false;
        }
    // last sub-word part
    for (; i < x.length; i+=1) {
        if (x[i] != y[i]) // byte compare
            return false;
    }
    return true;
}

Any comments or recommendations?
May 29, 2016
On Sunday, 29 May 2016 at 17:42:48 UTC, Era Scarecrow wrote:
>  Worse I'm not sure if the code generation already does that and possibly does a better job than what we could do by hand...

Not with dmd v2.071.0 or ldc 0.17.1.

At least not in all the variations I tried to trick them with, like copying into fixed-size array. Well, they did insert a memcmp and libc is probably optimized like that, but they copied data first, which is unnecessary.
May 29, 2016
On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
> On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
>> [...]
>
> Ok, to answer my own question, this looks good:
>
> bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {
>     pragma(inline, false);
>     if (x.length != y.length) return false;
>     int i=0;
>     // word-wise compare is faster than byte-wise
>     if (x.length > size_t.sizeof)
>         for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
>             size_t* xw = cast(size_t*) &x[i];
>             size_t* yw = cast(size_t*) &x[i];
>             if (*xw != *yw) return false;
>         }
>     // last sub-word part
>     for (; i < x.length; i+=1) {
>         if (x[i] != y[i]) // byte compare
>             return false;
>     }
>     return true;
> }
>
> Any comments or recommendations?

Isn't that something that the compiler should optimize for you when you do an equality comparison?
Is it really faster than ldc (with all optimzations turned on)?
May 29, 2016
On Sunday, 29 May 2016 at 20:51:19 UTC, Seb wrote:
> On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
>> [...]
>
> Isn't that something that the compiler should optimize for you when you do an equality comparison?
> Is it really faster than ldc (with all optimzations turned on)?

It can be faster because of inlining.
memcmp is a runtime call.
May 30, 2016
On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
> On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
>> On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
>>> And if you're not simply comparing for equality, what are you looking to figure out? Without more information about what you're trying to do, it's kind of hard to help you.
>>
>> If I write the comparison naively, the assembly clearly shows a "movzbl" [0]. It loads a single byte! The other single byte load is encoded in the address mode of "cmp". Implementation:
>>
>> bool stringcmp(string x, string y) {
>>   foreach(i; 0..x.length) {
>>     if (x[i] != y[i]) // byte compare
>>       return false;
>>   }
>>   return true;
>> }
>>
>> It makes no sense to load single bytes here. Since we only want to check for equality, we could load two full words and compare four or eight bytes in one go.
>
> Ok, to answer my own question, this looks good:
>
> bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {
>     pragma(inline, false);
>     if (x.length != y.length) return false;
>     int i=0;
>     // word-wise compare is faster than byte-wise
>     if (x.length > size_t.sizeof)
>         for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
>             size_t* xw = cast(size_t*) &x[i];
>             size_t* yw = cast(size_t*) &x[i];
>             if (*xw != *yw) return false;
>         }
>     // last sub-word part
>     for (; i < x.length; i+=1) {
>         if (x[i] != y[i]) // byte compare
>             return false;
>     }
>     return true;
> }
>
> Any comments or recommendations?

I don't know if this would be faster, but here is my attempt.

It assumes the arrays start at an address multiple of 8.

if (x is y)
    return true;
if (x.length != y.length)
    return false;
size_t l = x.length;
ubyte* a = x.ptr, b = y.ptr;
for (size_t n = l>>3; n != 0; --n, a+=8, b+=8)
    if (*cast(long*)a ^ *cast(long*)b)
        return false;
if (l & 4)
{
    if (*cast(int*)a ^ *cast(int*)b)
        return false;
    a+= 4;
    b+= 4;
}
if (l & 2)
{
    if (*cast(short*)a ^ *cast(short*)b)
        return false;
    a+=2;
    b+=2;
}
return (l & 1) && (*a ^ *b) ? false : true;

If the pointers are not on an address multiple of 8, one has to inverse the trailing tests to consume the bytes in front of the array until the address becomes a multiple of 8.

The trailing tests could eventually be replaced by a simple sequential byte compare. I don't know which is faster.
« First   ‹ Prev
1 2