Challenge: write a really really small front() for UTF8
March 23, 2014
Here's a baseline: http://goo.gl/91vIGc. Destroy!

Andrei
March 23, 2014
24-Mar-2014 01:22, Andrei Alexandrescu wrote:
> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
Assertions to check encoding?!
I thought it would detect broken encoding and do a substitution at least.

> Andrei


-- 
Dmitry Olshansky
March 23, 2014
On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
> 24-Mar-2014 01:22, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
> Assertions to check encoding?!
> I thought it would detect broken encoding and do a substitution at least.

That implementation does zero effort to optimize checks themselves, and indeed puts them in asserts. I think there's value in having such a primitive down below.

Andrei


March 23, 2014
24-Mar-2014 01:34, Andrei Alexandrescu wrote:
> On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
>> 24-Mar-2014 01:22, Andrei Alexandrescu wrote:
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>> Assertions to check encoding?!
>> I thought it would detect broken encoding and do a substitution at least.
>
> That implementation does zero effort to optimize checks themselves, and
> indeed puts them in asserts. I think there's value in having such a
> primitive down below.

Just how much are you willing to assert? You don't even check the length.
In short: what are the specs of this primitive, and where do you see it being used?

>
> Andrei
>
>


-- 
Dmitry Olshansky
March 23, 2014
On 3/23/14, 3:10 PM, Dmitry Olshansky wrote:
> 24-Mar-2014 01:34, Andrei Alexandrescu wrote:
>> On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
>>> 24-Mar-2014 01:22, Andrei Alexandrescu wrote:
>>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>>
>>> Assertions to check encoding?!
>>> I thought it would detect broken encoding and do a substitution at
>>> least.
>>
>> That implementation does zero effort to optimize checks themselves, and
>> indeed puts them in asserts. I think there's value in having such a
>> primitive down below.
>
> Just how much are you willing to assert? You don't even check the length.

Array bounds checking takes care of that.

> In short: what are the specs of this primitive, and where do you see it
> being used?

A replacement for front() in arrays of char and wchar.


Andrei
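
For readers unfamiliar with the bounds-checking point, here is a minimal sketch of what it means in practice. It assumes a build with array bounds checks enabled (they can be switched off, for example with -release in non-@safe code, in which case this does not hold). The name `secondByteIsChecked` is made up for illustration and does not come from the thread.

```d
import core.exception : RangeError;

// Hypothetical helper: reports whether reading s[1] trips the bounds check.
// Assumes bounds checks are enabled for this build.
bool secondByteIsChecked(const(char)[] s)
{
    try
    {
        auto second = s[1];   // out of range when s holds a single byte
        return false;         // the read succeeded, nothing was caught
    }
    catch (RangeError e)
    {
        return true;          // a truncated sequence was caught by the runtime
    }
}
```

With checks on, a lone lead byte such as 0xC3 with its continuation byte missing makes the indexing fail instead of reading past the end of the array.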

March 23, 2014
dchar front(char[] s) {
    uint c = s[0];
    ubyte p = ~s[0];
    if (p>>7)
      return c;
    c = c<<8 | s[1];
    if (p>>5)
      return c;
    c = c<<8 | s[2];
    if (p>>4)
      return c;
    return c<<8 | s[3];
}
March 24, 2014
On 3/23/14, 4:28 PM, Anonymous wrote:
> dchar front(char[] s) {
>      uint c = s[0];
>      ubyte p = ~s[0];
>      if (p>>7)
>        return c;
>      c = c<<8 | s[1];
>      if (p>>5)
>        return c;
>      c = c<<8 | s[2];
>      if (p>>4)
>        return c;
>      return c<<8 | s[3];
> }

That's smaller but doesn't seem to do the same!

Andrei
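
To make the difference concrete: the snippet quoted above packs the raw bytes together, so for "é" (bytes 0xC3 0xA9) it returns 0xC3A9, whereas front() is expected to yield the decoded code point U+00E9. Below is a rough sketch of a version that strips the marker bits. It assumes well-formed input and leaves length errors to array bounds checking; `decodeFront` is a hypothetical name and this is not the baseline linked in the first post.

```d
// Hypothetical sketch: decodes the first code point of a UTF-8 string.
// Assumes valid UTF-8; performs no validation of continuation bytes.
dchar decodeFront(const(char)[] s)
{
    immutable uint b0 = s[0];
    if (b0 < 0x80)                       // 0xxxxxxx: ASCII
        return cast(dchar) b0;
    if ((b0 & 0xE0) == 0xC0)             // 110xxxxx 10xxxxxx
        return cast(dchar) (((b0 & 0x1F) << 6) | (s[1] & 0x3F));
    if ((b0 & 0xF0) == 0xE0)             // 1110xxxx 10xxxxxx 10xxxxxx
        return cast(dchar) (((b0 & 0x0F) << 12) | ((s[1] & 0x3F) << 6)
                            | (s[2] & 0x3F));
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return cast(dchar) (((b0 & 0x07) << 18) | ((s[1] & 0x3F) << 12)
                        | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F));
}
```

Each branch masks off the length marker in the lead byte and the 10 prefix in every continuation byte before shifting, which is the step the shorter snippet skips.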

March 24, 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Andrei

This example only considers encodings of up to 4 bytes, but UTF-8 can encode code points in as many as 6 bytes.  Is that not a concern?

Mike
March 24, 2014
On 3/23/2014 5:32 PM, Mike wrote:
> This example only considers encodings of up to 4 bytes, but UTF-8 can encode
> code points in as many as 6 bytes.  Is that not a concern?

It's not anymore. The 5 and 6 byte encodings are now illegal.

March 24, 2014
On 2014-03-24 00:32, Mike wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> This example only considers encodings of up to 4 bytes, but UTF-8 can
> encode code points in as many as 6 bytes.  Is that not a concern?
>
> Mike

RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to conform to constraints in UTF-16, removing all 5- and 6-byte sequences.

--
  Simen
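
As an illustration of the RFC 3629 restriction, here is a small sketch that maps a lead byte to its sequence length and rejects the bytes that used to start 5- and 6-byte forms. `sequenceLength` is a hypothetical helper, not something from Phobos or from the baseline, and it looks at the lead byte only.

```d
// Hypothetical sketch: length of the UTF-8 sequence started by `lead`,
// or 0 if the byte cannot start a sequence under RFC 3629.
uint sequenceLength(ubyte lead)
{
    if (lead < 0x80) return 1;   // ASCII
    if (lead < 0xC2) return 0;   // continuation bytes, plus overlong leads 0xC0 and 0xC1
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF5) return 4;   // 0xF0..0xF4 reach up to U+10FFFF
    return 0;                    // 0xF5..0xFF, including the old 5- and 6-byte leads
}
```

Under the older definition, 0xF8 and 0xFC would have announced 5 and 6 bytes; here they are simply invalid, which is why a front() limited to four bytes covers every legal sequence.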