March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to safety0ff | On 2014-03-24 06:08:30 +0000, "safety0ff" <safety0ff.dev@gmail.com> said: > Everything seems to work in your corrected versions: > http://dpaste.dzfl.pl/3b32535559ba > > Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization. Ok, so let's check what happens in actual usage. In other words, what happens when front is inlined and used in a loop. I'm counting the number of assembly lines of this main function for looping through each possible input using the various implementations: http://goo.gl/QjnA2k implementation line count of main in assembly andralex 285 - 9 labels = 276 instructions MFortin1 135 - 13 labels = 122 instructions MFortin2 150 - 14 labels = 136 instructions safety0ff -- not inlined (too big?) -- dnspies 161 - 15 labels = 146 instructions CWilliams 233 - 25 labels = 208 instructions For compactness, my first implementation seems to be the winner, both with and without inlining. That said, I think front should only assume a non-empty char[] and thus should check the length before accessing anything after the first byte to protect against incomplete multi-byte sequences such as [0x1000_0000]. So I added this line to my two implementations: if (1+tailLength >= s.length) return dchar.init; Now lets see what happens with those changes: http://goo.gl/XPCGYE implementation line count of main in assembly MFortin1Check 103 - 11 labels = 92 instructions MFortin2Check 135 - 13 labels = 122 instructions Now I'm baffled. Adding a test makes the code shorter? It actually make the standalone functions longer by 8 instructions (as expected), but the inlined version is shorter. My guess is that the optimizer creates something entirely different and it turns out that this different version optimises better after inlining. That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known: http://goo.gl/E2Q0Yu implementation line count of main in assembly andralex 384 - 9 labels = 375 instructions MFortin1 173 - 19 labels = 154 instructions MFortin2 182 - 18 labels = 164 instructions safety0ff -- not inlined -- dnspies -- not inlined -- CWilliams 229 - 23 labels = 206 instructions MFortin1Check 211 - 24 labels = 187 instructions MFortin2Check 218 - 21 labels = 197 instructions So again MFortin1 is the winner for compactness. Still, I maintain that we ought to at least check the length of the array for multi-byte sequences to protect against incomplete sequences that could lay at the end of the string, so I'd favor MFortin1Check. -- Michel Fortin michel.fortin@michelf.ca http://michelf.ca |
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Michel Fortin | On 2014-03-24 13:39:45 +0000, Michel Fortin <michel.fortin@michelf.ca> said: > That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known: Oops, wrong link, was from an intermediary test. This one will give you the results in the table: http://goo.gl/lwU4hv > implementation line count of main in assembly > andralex 384 - 9 labels = 375 instructions > MFortin1 173 - 19 labels = 154 instructions > MFortin2 182 - 18 labels = 164 instructions > safety0ff -- not inlined -- > dnspies -- not inlined -- > CWilliams 229 - 23 labels = 206 instructions > MFortin1Check 211 - 24 labels = 187 instructions > MFortin2Check 218 - 21 labels = 197 instructions -- Michel Fortin michel.fortin@michelf.ca http://michelf.ca |
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote: > Here's a baseline: http://goo.gl/91vIGc. Destroy! > > Andrei I've probably missed something, but here's 15 instructions: http://goo.gl/d3Mgj0 Also, I contacted the owner of d.godbolt.org and hopefully we'll have a more up-to-date gdc on there soon. Perhaps ldc as well :) |
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Monday, 24 March 2014 at 15:32:09 UTC, John Colvin wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> I've probably missed something, but here's 15 instructions: http://goo.gl/d3Mgj0
woops yeah, that's completely broken.
|
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote: > Here's a baseline: http://goo.gl/91vIGc. Destroy! > > Andrei http://forum.dlang.org/post/fsgdviklnbugdzjsgfsk@forum.dlang.org |
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Chris Williams | On 3/24/14, 12:53 AM, Chris Williams wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> http://goo.gl/TaZTNB
Nice! Why assert(ret != 0)?
Andrei
|
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to dnspies | On 3/24/14, 1:06 AM, dnspies wrote: > On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote: >> Here's a baseline: http://goo.gl/91vIGc. Destroy! >> >> Andrei > > Here's mine (the loop gets unrolled): > > dchar front(const char[] s) { > assert(s.length > 0); > byte c = s[0]; > dchar res = cast(ubyte)c; > if(c >= 0) { > return res; > } > c <<= 1; > assert(c < 0); > for(int i = 1; i < 4; i++) { > assert(i < s.length); > ubyte d = s[i]; > assert((d >> 6) == 0b10); > res = (res << 8) | d; > c <<= 1; > if(c >= 0) > return res; > } > assert(false); > } http://goo.gl/HBSv5T - looks nice! For minimum size replace the assert(false) with one inside the loop: http://goo.gl/eWs0ft Andrei |
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to dnspies | On 3/24/14, 1:56 AM, dnspies wrote:
> On Monday, 24 March 2014 at 08:06:53 UTC, dnspies wrote:
>> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>>> Andrei
>>
>> Here's mine (the loop gets unrolled):
>>
>> dchar front(const char[] s) {
>> assert(s.length > 0);
>> byte c = s[0];
>> dchar res = cast(ubyte)c;
>> if(c >= 0) {
>> return res;
>> }
>> c <<= 1;
>> assert(c < 0);
>> for(int i = 1; i < 4; i++) {
>> assert(i < s.length);
>> ubyte d = s[i];
>> assert((d >> 6) == 0b10);
>> res = (res << 8) | d;
>> c <<= 1;
>> if(c >= 0)
>> return res;
>> }
>> assert(false);
>> }
>
> Sorry, I misunderstood. We only want the x's in the output. Here's my
> revised solution
>
> http://goo.gl/PL729J
That turned out quite large. -- Andrei
|
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 3/24/14, 2:02 AM, monarch_dodra wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> Before we roll this out, could we discuss a strategy/guideline in
> regards to detecting and handling invalid UTF sequences?
I think std.array.front should return the invalid dchar on error, and popFront should attempt to resync on error.
Andrei
|
March 24, 2014 Re: Challenge: write a really really small front() for UTF8 | ||||
---|---|---|---|---|
| ||||
Posted in reply to dnspies | On 3/24/14, 3:30 AM, dnspies wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> Ok, I managed to make it smaller. I think this is the smallest so far
> with only 23 instructions (no loop unrolling in this one):
>
> http://goo.gl/RKF5Vm
This is nice because even the checks are relatively cheap. It's quite difficult to understand tho :o). -- Andrei
|
Copyright © 1999-2021 by the D Language Foundation