Jump to page: 1 2 3
Thread overview
4x faster strlen with 4 char sentinel
Jun 26, 2016
Jay Norwood
Jun 26, 2016
David Nadlinger
Jun 26, 2016
Jay Norwood
Jun 27, 2016
chmike
Jun 27, 2016
chmike
Jun 27, 2016
Jay Norwood
Jun 27, 2016
Jay Norwood
Jun 27, 2016
Jay Norwood
Jun 27, 2016
Mike Parker
Jun 27, 2016
Brad Roberts
Jun 28, 2016
deadalnix
Jun 28, 2016
Jay Norwood
Jun 28, 2016
Jay Norwood
Jun 28, 2016
Bauss
Jun 28, 2016
deadalnix
Jun 28, 2016
Sebastiaan Koppe
Jun 28, 2016
Jay Norwood
Jun 28, 2016
qznc
Jun 28, 2016
Jay Norwood
June 26, 2016
After watching Andre's sentinel thing, I'm playing with strlen on char strings with 4 terminating 0s instead of a single one.  Seems to work and is 4x faster compared to the runtime version.

nothrow pure size_t strlen2(const(char)* c) {
 if (c is null)
   return 0;
 size_t l=0;
 while (*c){ c+=4; l+=4;}
 while (*c==0){ c--; l--;}
 return l+1;
}

This is the timing of my test case, which I can post if anyone is interested.
strlen\Release>strlen
2738
681


June 26, 2016
Hi Jay,

On Sunday, 26 June 2016 at 16:40:08 UTC, Jay Norwood wrote:
> After watching Andre's sentinel thing, I'm playing with strlen on char strings with 4 terminating 0s instead of a single one.  Seems to work and is 4x faster compared to the runtime version.

Please keep general discussions like this off the announce list, which would e.g. be suitable for announcing a fleshed out collection of high-performance string handling routines.

> nothrow pure size_t strlen2(const(char)* c) {
>  if (c is null)
>    return 0;
>  size_t l=0;
>  while (*c){ c+=4; l+=4;}
>  while (*c==0){ c--; l--;}
>  return l+1;
> }

A couple of quick hints:
 - This is not a correct implementation of strlen, as it already assumes that the array is terminated by four zero bytes. That iterating memory with a stride of 4 instead of 1 will be faster is a self-evident truth.
 - You should be benchmarking against a "proper" SIMD-optimised strlen implementation.

 — David
June 26, 2016
On Sunday, 26 June 2016 at 16:59:54 UTC, David Nadlinger wrote:
> Please keep general discussions like this off the announce list, which would e.g. be suitable for announcing a fleshed out collection of high-performance string handling routines.
>
> A couple of quick hints:
>  - This is not a correct implementation of strlen, as it already assumes that the array is terminated by four zero bytes. That iterating memory with a stride of 4 instead of 1 will be faster is a self-evident truth.
>  - You should be benchmarking against a "proper" SIMD-optimised strlen implementation.
>
>  — David


This is more of just an observation that the choice of the single zero sentinel for C string termination comes at a cost of 4x strlen speed vs using four terminating zeros.

I don't see a SIMD strlen implementation in the D libraries.

The strlen2 function I posted works on any string that is terminated by four zeros, and returns the same len as strlen in that case, but much faster.

How to get strings initialized with four terminating zeros at compile time is a separate issue.  I don't know the solution, else I might consider doing more with this.


June 27, 2016
If you store the string size in a four byte field, you get a hard to beat fast strlen. :)

June 27, 2016
On Monday, 27 June 2016 at 05:10:37 UTC, chmike wrote:
> If you store the string size in a four byte field, you get a hard to beat fast strlen. :)

If you have the constrain to work with null terminated strings, then it could be interesting to look at SIMD as was suggested or use one of the following algorithms to test the presence of a 0 byte in a 4 byte or 8 byte integer when simd is not available.

https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord

Ending strings with a single null byte/char is to save space. It was critical in the 70´s when C was created and memory space was very limited. That's not the case anymore and I guess the reason why you assume that ending a string with 4 null chars is OK.
June 27, 2016
On Monday, 27 June 2016 at 05:27:12 UTC, chmike wrote:
> Ending strings with a single null byte/char is to save space. It was critical in the 70´s when C was created and memory space was very limited. That's not the case anymore and I guess the

Not only to save space, some CPUs also had cheap incrementing load/stores and branching on zero is faster than sacrificing another register for a counter.

IIRC Pascal has a short string with a single byte length.

Besides there are plenty of other advantages to using a terminating sentinel depending on the use scenario. E.g. if you want many versions of the same tail or if you are splitting a string at white space (overwrite a white space char with a zero).
June 27, 2016
On Monday, 27 June 2016 at 06:31:49 UTC, Ola Fosheim Grøstad wrote:
> Besides there are plenty of other advantages to using a terminating sentinel depending on the use scenario. E.g. if you want many versions of the same tail or if you are splitting a string at white space (overwrite a white space char with a zero).

This strlen2 doesn't require special alignment or casting of char pointer types to some larger type. That keeps the strlen2 implementation fairly simple.

The implementation is only testing one char per increment.  It doesn't require the extra xor processing used in some of the examples.

I haven't checked if there is a strlen for dchar or wchar, but it should also speed up those.




June 27, 2016
On Monday, 27 June 2016 at 16:22:56 UTC, Jay Norwood wrote:
> This strlen2 doesn't require special alignment or casting of char pointer types to some larger type. That keeps the strlen2 implementation fairly simple.

Yes, and the idea of speeding up strings by padding out with zeros is not new. ;-) I recall suggesting it back in 1999 when discussing the benefits of having a big endian cpu when sorting strings. If it is big endian you can compare ascii as 32/64 bit integers, so if you align the string and pad out with zeros then you can speed up strcmp() by a significant factor. Oh, here it is:

http://disinterest.org/resource/MUD-Dev/1999q1/009759.html

Of course, this is all moot now, little endian + simd has made such tricks redundant. Simd probably makes your strlen2 redundant too. The bottle neck tends to be memory access/prefetching for simple algorithms.

June 27, 2016
On 6/26/2016 11:47 AM, Jay Norwood via Digitalmars-d-announce wrote:
> On Sunday, 26 June 2016 at 16:59:54 UTC, David Nadlinger wrote:
>> Please keep general discussions like this off the announce list, which
>> would e.g. be suitable for announcing a fleshed out collection of
>> high-performance string handling routines.
>>
>> A couple of quick hints:
>>  - This is not a correct implementation of strlen, as it already
>> assumes that the array is terminated by four zero bytes. That
>> iterating memory with a stride of 4 instead of 1 will be faster is a
>> self-evident truth.
>>  - You should be benchmarking against a "proper" SIMD-optimised strlen
>> implementation.
>>
>>  — David
>
>
> This is more of just an observation that the choice of the single zero
> sentinel for C string termination comes at a cost of 4x strlen speed vs
> using four terminating zeros.
>
> I don't see a SIMD strlen implementation in the D libraries.
>
> The strlen2 function I posted works on any string that is terminated by
> four zeros, and returns the same len as strlen in that case, but much
> faster.
>
> How to get strings initialized with four terminating zeros at compile
> time is a separate issue.  I don't know the solution, else I might
> consider doing more with this.

Yup.. there's a reason that many many hours have been spent optimizing strlen and other memory related length and comparison routines.  They are used a lot and the number of ways of making them fast varies almost as much as the number of cpu's that exist.  This effort is embedded in the code gen of compilers (other than dmd) and libc runtimes.  Trying to re-invent it is noble, and very educational, but largely redundant.
June 27, 2016
On Monday, 27 June 2016 at 16:38:58 UTC, Ola Fosheim Grøstad wrote:
> Yes, and the idea of speeding up strings by padding out with zeros is not new. ;-) I recall suggesting it back in 1999 when discussing the benefits of having a big endian cpu when sorting strings. If it is big endian you can compare ascii as 32/64 bit integers, so if you align the string and pad out with zeros then you can speed up strcmp() by a significant factor. Oh, here it is:

Your link's use of padding pads out with a variable number of zeros, so that a larger data type can be used for the compare operations.  This isn't the same as my example, which is simpler due to not having to fiddle with alignment and data type casting.

I didn't find a strlen implementation for dchar or wchar in the D libraries.

I also found it strange, the non-zero initialization values for char, dchar, wchar.  I suppose there's some reason?

int [100]  to zeros.
char [100]  to 0xff;
dchar [100]   to 0xffff;
wchar [100]   to 0xffff;







« First   ‹ Prev
1 2 3