May 30, 2016 Re: String compare in words? | ||||
---|---|---|---|---|
| ||||
Posted in reply to chmike | On average there would be less than 4 bytes remaining to compare. So a simple straightforward byte comparison should do the job efficiently. |
May 30, 2016 Re: String compare in words? | ||||
---|---|---|---|---|
| ||||
Posted in reply to qznc | On 05/29/2016 10:40 PM, qznc wrote: > bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) { Having "string" in the function name may be a bit misleading. This doesn't have any special functionality for text/characters/Unicode, does it? Should have const parameters, not immutable. > pragma(inline, false); I think you have to put this pragma on the function signature, not in the body. Also, why prevent inlining of the function? > if (x.length != y.length) return false; > int i=0; int isn't large enough for array lengths. > // word-wise compare is faster than byte-wise > if (x.length > size_t.sizeof) > for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) { > size_t* xw = cast(size_t*) &x[i]; > size_t* yw = cast(size_t*) &x[i]; Typo: Should be `&y[i]` here. > if (*xw != *yw) return false; > } > // last sub-word part > for (; i < x.length; i+=1) { > if (x[i] != y[i]) // byte compare > return false; > } > return true; > } > > Any comments or recommendations? Did you benchmark this against the built-in `==`, with ldc or gdc? If this is correct and faster than the built-in `==`, why isn't it the built-in `==`? |
May 30, 2016 Re: String compare in words? | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | I didn't check assembly for '=='. What I have seen is that struct comparison in dmd is implemented as byte per byte compare even if the struct is 64bit long (e.g. Rebindable). I suppose dmd uses this strategy because struct/array may not be 64bit aligned, or they could have different alignment. In order to use the suggested optimization we need a good planet alignment. The effort to check that, or enforce it, is not worth the benefit on average. There could be a benefit in specific use cases where the alignment is ensured. I would be interested in such optimization with Rebindable. Can the 'is' operator be overloaded ? Le 30/05/2016 11:28, ag0aep6g via Digitalmars-d-learn a écrit : > On 05/29/2016 10:40 PM, qznc wrote: >> bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) { > > Having "string" in the function name may be a bit misleading. This doesn't have any special functionality for text/characters/Unicode, does it? > > Should have const parameters, not immutable. > >> pragma(inline, false); > > I think you have to put this pragma on the function signature, not in the body. Also, why prevent inlining of the function? > >> if (x.length != y.length) return false; >> int i=0; > > int isn't large enough for array lengths. > >> // word-wise compare is faster than byte-wise >> if (x.length > size_t.sizeof) >> for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) { >> size_t* xw = cast(size_t*) &x[i]; >> size_t* yw = cast(size_t*) &x[i]; > > Typo: Should be `&y[i]` here. > >> if (*xw != *yw) return false; >> } >> // last sub-word part >> for (; i < x.length; i+=1) { >> if (x[i] != y[i]) // byte compare >> return false; >> } >> return true; >> } >> >> Any comments or recommendations? > > Did you benchmark this against the built-in `==`, with ldc or gdc? > > If this is correct and faster than the built-in `==`, why isn't it the built-in `==`? -- Bien cordialement, Ch.Meessen |
May 30, 2016 Re: String compare in words? | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | On Monday, 30 May 2016 at 09:28:29 UTC, ag0aep6g wrote:
> On 05/29/2016 10:40 PM, qznc wrote:
>> bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {
>
> Having "string" in the function name may be a bit misleading. This doesn't have any special functionality for text/characters/Unicode, does it?
>
> Should have const parameters, not immutable.
>
>> pragma(inline, false);
>
> I think you have to put this pragma on the function signature, not in the body. Also, why prevent inlining of the function?
>
>> if (x.length != y.length) return false;
>> int i=0;
>
> int isn't large enough for array lengths.
>
>> // word-wise compare is faster than byte-wise
>> if (x.length > size_t.sizeof)
>> for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
>> size_t* xw = cast(size_t*) &x[i];
>> size_t* yw = cast(size_t*) &x[i];
>
> Typo: Should be `&y[i]` here.
>
>> if (*xw != *yw) return false;
>> }
>> // last sub-word part
>> for (; i < x.length; i+=1) {
>> if (x[i] != y[i]) // byte compare
>> return false;
>> }
>> return true;
>> }
>>
>> Any comments or recommendations?
>
> Did you benchmark this against the built-in `==`, with ldc or gdc?
>
> If this is correct and faster than the built-in `==`, why isn't it the built-in `==`?
I too expected it to compile to a memcmp call, but according to asm.dlang.org DMD with -O and -release, DMD compiles a == b to a byte-wise compare.
I suppose for very tiny strings this is the fastest, but for slightly larger strings, calling memcmp() would be faster. I think inlining a string comparison is also not great for code size. In the general case, for element types with trivial equality, a call to memcmp() will always be preferable, right?
|
Copyright © 1999-2021 by the D Language Foundation