June 27, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Monday, 27 June 2016 at 19:51:48 UTC, Jay Norwood wrote:
> Your link's use of padding pads out with a variable number of zeros, so that a larger data type can be used for the compare operations. This isn't the same as my example, which is simpler due to not having to fiddle with alignment and data type casting.
That's true, and it is fun to think about different string implementations. Just keep in mind that prior to the 90s, text was the essential datatype for many programmers and inventing new ways to do strings is heavily explored. I remember the first exercise we got at the university when doing the OS course was to implement "strlen", "strcpy" and "strcmp" in C or machine language. It can be fun.
Just keep in mind that the major bottleneck now is loading 64 bytes from memory into cache. So if you test performance you have to make sure to invalidate the caches before you test and test with spurious reads over a very large memory area to get realistic results.
But essentially, the operation is not heavy, so to speed it up you need to predict and prefetch from memory in time, meaning no library solution is sufficient. (you need to prefetch memory way before your library function is called)
|
June 27, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Monday, 27 June 2016 at 06:31:49 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 27 June 2016 at 05:27:12 UTC, chmike wrote:
>> Ending strings with a single null byte/char is to save space. It was critical in the 70´s when C was created and memory space was very limited. That's not the case anymore and I guess the
>
> Not only to save space, some CPUs also had cheap incrementing load/stores and branching on zero is faster than sacrificing another register for a counter.
I incidentally just found my 1992 implementation for Motorola 68K, to illustrate:
_mystrcpy
move.l 4(sp),a1 ; pointer for destination
move.l 8(sp),a0 ; pointer for source
mystrcpy move.l a0,d0
1$ move.b (a0)+,(a1)+ ; copy
bne.s 1$ ; jump back up if not zero
rts
As you can see it is a tight loop. Other CPUs are even tighter, and have single-instruction loops (even 8086?)
So not only storage, also performance on specific CPUs. Which is a good reason for keeping datatypes in standard libraries abstract, different CPUs favour different representations. Even on very basic datatypes.
|
June 27, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Monday, 27 June 2016 at 20:43:40 UTC, Ola Fosheim Grøstad wrote:
> Just keep in mind that the major bottleneck now is loading 64 bytes from memory into cache. So if you test performance you have to make sure to invalidate the caches before you test and test with spurious reads over a very large memory area to get realistic results.
>
> But essentially, the operation is not heavy, so to speed it up you need to predict and prefetch from memory in time, meaning no library solution is sufficient. (you need to prefetch memory way before your library function is called)
I doubt the external memory accesses are involved in these measurements. I'm using a 100KB char array terminated by four zeros, and doing strlen on substring pointers into it incremented by 1 for 100K times. The middle of the three timings is for strlen2, while the two outer timings are for strlen during the same program execution.
I'm initializing the 100KB immediately prior to the measurement. The 100KB array should all be in L1 or L2 cache by the time I make even the first of the three time measurements.
The prefetch shouldn't have a problem predicting this.
2749
688
2783
2741
683
2738
|
June 27, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Monday, 27 June 2016 at 21:41:57 UTC, Jay Norwood wrote:
> measurements. I'm using a 100KB char array terminated by four zeros, and doing strlen on substring pointers into it incremented by 1 for 100K times.
But this is a rather atypical use case for zero terminated strings? It would make more sense to test it on a massive amount of short strings stuffed into a very large hash-table. (filenames, keys, user names, email adresses etc)
|
June 27, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Monday, 27 June 2016 at 19:51:48 UTC, Jay Norwood wrote:
> I also found it strange, the non-zero initialization values for char, dchar, wchar. I suppose there's some reason?
>
> int [100] to zeros.
> char [100] to 0xff;
> dchar [100] to 0xffff;
> wchar [100] to 0xffff;
The same reason float and double are default initialized to nan. char, wchar and dchar are default initialized to invalid unicode values.
|
June 28, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Sunday, 26 June 2016 at 16:40:08 UTC, Jay Norwood wrote:
> After watching Andre's sentinel thing, I'm playing with strlen on char strings with 4 terminating 0s instead of a single one. Seems to work and is 4x faster compared to the runtime version.
>
> nothrow pure size_t strlen2(const(char)* c) {
> if (c is null)
> return 0;
> size_t l=0;
> while (*c){ c+=4; l+=4;}
> while (*c==0){ c--; l--;}
> return l+1;
> }
>
> This is the timing of my test case, which I can post if anyone is interested.
> strlen\Release>strlen
> 2738
> 681
If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"
|
June 28, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote:
> If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"
I'd say use this one instead, to avoid negative size_t. It is also a little faster for the same measurement.
nothrow pure size_t strlen2(const(char)* c) {
if (c is null)
return 0;
const(char)* c_save = c;
while (*c){ c+=4; }
while (*c==0){ c--; }
c++;
return c - c_save;
}
2738
540
2744
|
June 28, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Tuesday, 28 June 2016 at 03:11:26 UTC, Jay Norwood wrote:
> On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote:
>> If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"
oops. I see ... need to test for empty string.
nothrow pure size_t strlen2(const(char)* c) {
if (c is null || *c==0)
return 0;
const(char)* c_save = c;
while (*c){ c+=4; }
while (*c==0){ c--; }
c++;
return c - c_save;
}
|
June 28, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Tuesday, 28 June 2016 at 03:58:23 UTC, Jay Norwood wrote:
> On Tuesday, 28 June 2016 at 03:11:26 UTC, Jay Norwood wrote:
>> On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote:
>>> If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"
>
> oops. I see ... need to test for empty string.
>
> nothrow pure size_t strlen2(const(char)* c) {
> if (c is null || *c==0)
> return 0;
> const(char)* c_save = c;
> while (*c){ c+=4; }
> while (*c==0){ c--; }
> c++;
> return c - c_save;
> }
Why not just do
nothrow pure size_t strlen2(const(char)* c) {
if (!c)
return 0;
...
}
|
June 28, 2016 Re: 4x faster strlen with 4 char sentinel | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On Tuesday, 28 June 2016 at 03:11:26 UTC, Jay Norwood wrote:
> On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote:
>> If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"
>
> I'd say use this one instead, to avoid negative size_t. It is also a little faster for the same measurement.
>
> nothrow pure size_t strlen2(const(char)* c) {
> if (c is null)
> return 0;
> const(char)* c_save = c;
> while (*c){ c+=4; }
> while (*c==0){ c--; }
> c++;
> return c - c_save;
> }
>
> 2738
> 540
> 2744
If we were in an interview, I would insist.
|
Copyright © 1999-2021 by the D Language Foundation