Thread overview
UTF-8 strings and endianness
Oct 29, 2012
denizzzka
Oct 29, 2012
Adam D. Ruppe
Oct 29, 2012
denizzzka
Oct 30, 2012
Jesse Phillips
Oct 30, 2012
Tobias Pankrath
Oct 30, 2012
Dmitry Olshansky
Oct 30, 2012
Jesse Phillips
Oct 29, 2012
Jordi Sayol
Oct 29, 2012
denizzzka
Oct 29, 2012
denizzzka
October 29, 2012
Hi!

How do I convert a D string to big endian?
How do I convert from big endian to a D string?

October 29, 2012
UTF-8 isn't affected by endianness.
October 29, 2012
On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.

Ok, thanks!
October 29, 2012
On 29/10/12 16:17, denizzzka wrote:
> Hi!
> 
> How do I convert a D string to big endian?
> How do I convert from big endian to a D string?
> 
> 

UTF-8 is always big endian.
-- 
Jordi Sayol
October 29, 2012
On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>> 
>> How do I convert a D string to big endian?
>> How do I convert from big endian to a D string?
>> 
>> 
>
> UTF-8 is always big endian.

Yes.

(I thought the problem was here, but it turned out to be somewhere else.)
October 29, 2012
On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>> 
>> How do I convert a D string to big endian?
>> How do I convert from big endian to a D string?
>> 
>> 
>
> UTF-8 is always big endian.

oops, what?

Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian?

A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there is no endian problem as there is for encoding forms that use 16-bit or 32-bit code units. Where a BOM is used with UTF-8, it is only used as an encoding signature to distinguish UTF-8 from other encodings; it has nothing to do with byte order.
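
A small sketch of what the FAQ answer means in practice. It is written in Python rather than D only because the resulting bytes are the same in any language, and the sample text is an arbitrary choice. The same two characters are encoded as UTF-8 and as UTF-16 in both byte orders:

```python
# Arbitrary sample text: U+00E9 (e with acute) and U+20AC (euro sign).
text = "\u00e9\u20ac"

# UTF-8 is defined as a sequence of bytes, so there is only one form.
utf8 = text.encode("utf-8")

# UTF-16 uses 16-bit code units, so each unit can be stored in two byte orders.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # c3a9e282ac
print(utf16_be.hex())  # 00e920ac
print(utf16_le.hex())  # e900ac20
```

The UTF-8 bytes come out the same on any machine, so there is nothing to swap. Only the 16-bit (and 32-bit) forms differ between big and little endian.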
October 30, 2012
On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.

If this is true why does the BOM have marks for big and little endian?

http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
October 30, 2012
On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>> UTF-8 isn't affected by endianness.
>
> If this is true why does the BOM have marks for big and little endian?
>
> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

UTF8 has only one?
October 30, 2012
On 10/30/2012 5:17 PM, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>>
>> If this is true why does the BOM have marks for big and little endian?
>>
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
>>
>
> UTF8 has only one?

Even Wiki knows the simple truth:
> Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8
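
As a rough illustration of the quoted point (in Python, since the bytes do not depend on the language), the sketch below encodes the BOM code point U+FEFF in each encoding. The dictionary name `signatures` is just an illustrative choice:

```python
import codecs

# U+FEFF is the code point used as the byte order mark.
bom = "\ufeff"

signatures = {
    "utf-8": bom.encode("utf-8"),
    "utf-16-be": bom.encode("utf-16-be"),
    "utf-16-le": bom.encode("utf-16-le"),
}

for name, raw in signatures.items():
    print(name, raw.hex())
# utf-8 efbbbf
# utf-16-be feff
# utf-16-le fffe

# The standard library exposes the same byte sequences as constants.
assert signatures["utf-8"] == codecs.BOM_UTF8
assert signatures["utf-16-be"] == codecs.BOM_UTF16_BE
assert signatures["utf-16-le"] == codecs.BOM_UTF16_LE
```

UTF-16 has two possible marks because its two bytes can be stored in either order. UTF-8 has exactly one, which only identifies the encoding.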

-- 
Dmitry Olshansky
October 30, 2012
On Tuesday, 30 October 2012 at 17:17:36 UTC, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>>
>> If this is true why does the BOM have marks for big and little endian?
>>
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
>
> UTF8 has only one?

Oops, I got mixed up and thought he had just said "UTF isn't ..."