UTF-8 strings and endianness - D Programming Language Discussion Forum


October 29, 2012

UTF-8 strings and endianness

Posted by denizzzka

Hi!

How do I convert a D string to big endian?
How do I convert big-endian data back to a D string?

October 29, 2012

Re: UTF-8 strings and endianness

Posted by Adam D. Ruppe
in reply to denizzzka

UTF-8 isn't affected by endianness.
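To illustrate: a D `string` is an array of UTF-8 code units, each one byte wide, so there is nothing to swap. The following is only a minimal sketch; `utf8Bytes` is a hypothetical helper name, and it assumes `std.string.representation` from Phobos is available.

```d
import std.string : representation;

/// Hypothetical helper: returns the UTF-8 code units of a string as raw bytes.
immutable(ubyte)[] utf8Bytes(string s)
{
    // representation() reinterprets the char data as ubyte without copying.
    return s.representation;
}

unittest
{
    // 'h' is one byte; U+00E9 is encoded as the two-byte sequence C3 A9.
    // That byte sequence is the same on little-endian and big-endian machines.
    immutable ubyte[] expected = [0x68, 0xC3, 0xA9];
    assert(utf8Bytes("h\u00E9") == expected);
}
```

Compiling with `dmd -unittest -main` should run the check.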

October 29, 2012

Re: UTF-8 strings and endianness

Posted by denizzzka
in reply to Adam D. Ruppe

On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.

Ok, thanks!

October 29, 2012

Re: UTF-8 strings and endianness

Posted by Jordi Sayol
in reply to denizzzka

On 29/10/12 16:17, denizzzka wrote:
> Hi!
> 
> How to convert D's string to big endian?
> How to convert to D's string from big endian?
> 
> 

UTF-8 is always big endian.
-- 
Jordi Sayol

October 29, 2012

Re: UTF-8 strings and endianness

Posted by denizzzka
in reply to Jordi Sayol

On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>> 
>> How to convert D's string to big endian?
>> How to convert to D's string from big endian?
>> 
>> 
>
> UTF-8 is always big endian.

Yes.

(I thought the problem was here, but it turned out to be somewhere else.)

October 29, 2012

Re: UTF-8 strings and endianness

Posted by denizzzka
in reply to Jordi Sayol

On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>> 
>> How to convert D's string to big endian?
>> How to convert to D's string from big endian?
>> 
>> 
>
> UTF-8 is always big endian.

oops, what?

Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian?

A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there is no endian problem as there is for encoding forms that use 16-bit or 32-bit code units. Where a BOM is used with UTF-8, it is only used as an encoding signature to distinguish UTF-8 from other encodings; it has nothing to do with byte order.
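For contrast, byte order does matter once the text is converted to 16-bit code units. A rough sketch of what that could look like in D, assuming Phobos' `std.utf.toUTF16`/`toUTF8` and `std.bitmanip.nativeToBigEndian`/`bigEndianToNative`; the helper names `toUtf16BE` and `fromUtf16BE` are made up for illustration:

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.utf : toUTF16, toUTF8;

/// Hypothetical helper: encodes a D string as UTF-16 big-endian bytes.
ubyte[] toUtf16BE(string s)
{
    ubyte[] result;
    foreach (wchar unit; s.toUTF16)
    {
        // Each 16-bit code unit is written most significant byte first.
        ubyte[2] be = nativeToBigEndian(unit);
        result ~= be[];
    }
    return result;
}

/// Hypothetical helper: decodes UTF-16 big-endian bytes back to a D string.
string fromUtf16BE(const(ubyte)[] data)
{
    assert(data.length % 2 == 0, "UTF-16 data must have an even byte count");
    wchar[] units;
    for (size_t i = 0; i + 1 < data.length; i += 2)
    {
        ubyte[2] pair = [data[i], data[i + 1]];
        units ~= bigEndianToNative!wchar(pair);
    }
    return units.toUTF8;
}

unittest
{
    // U+0041 'A' becomes the byte pair 00 41 in big-endian order.
    assert(toUtf16BE("A") == cast(ubyte[]) [0x00, 0x41]);
    // A round trip gives back the original UTF-8 string.
    assert(fromUtf16BE(toUtf16BE("h\u00E9")) == "h\u00E9");
}
```

No such step is needed for the UTF-8 `string` itself, because its code units are single bytes.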

October 30, 2012

Re: UTF-8 strings and endianness

Posted by Jesse Phillips
in reply to Adam D. Ruppe

On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.

If this is true why does the BOM have marks for big and little endian?

http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

October 30, 2012

Re: UTF-8 strings and endianness

Posted by Tobias Pankrath
in reply to Jesse Phillips

On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>> UTF-8 isn't affected by endianness.
>
> If this is true why does the BOM have marks for big and little endian?
>
> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

UTF8 has only one?

October 30, 2012

Re: UTF-8 strings and endianness

Posted by Dmitry Olshansky
in reply to Tobias Pankrath

10/30/2012 5:17 PM, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>>
>> If this is true why does the BOM have marks for big and little endian?
>>
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
>>
>
> UTF8 has only one?

Even Wiki knows the simple truth:
> Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8
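
In practice that means the UTF-8 "BOM" is just the three bytes EF BB BF (U+FEFF encoded in UTF-8), which a reader can skip if present. A minimal sketch with no library dependencies; `stripUtf8Bom` is a hypothetical helper name, not a Phobos function:

```d
/// Hypothetical helper: drops a leading UTF-8 signature (EF BB BF) if present.
const(ubyte)[] stripUtf8Bom(const(ubyte)[] data)
{
    static immutable ubyte[3] bom = [0xEF, 0xBB, 0xBF];

    // The signature only identifies the encoding; it says nothing about byte order.
    if (data.length >= bom.length && data[0 .. bom.length] == bom[])
        return data[bom.length .. $];
    return data;
}

unittest
{
    auto withBom = cast(const(ubyte)[]) "\uFEFFhi";
    auto plain = cast(const(ubyte)[]) "hi";
    assert(stripUtf8Bom(withBom) == plain);
    // Input without the signature is returned unchanged.
    assert(stripUtf8Bom(plain) == plain);
}
```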

-- 
Dmitry Olshansky

October 30, 2012

Re: UTF-8 strings and endianness

Posted by Jesse Phillips
in reply to Tobias Pankrath

On Tuesday, 30 October 2012 at 17:17:36 UTC, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>>
>> If this is true why does the BOM have marks for big and little endian?
>>
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
>
> UTF8 has only one?

Oops, I got mixed up and thought he had just said "UTF isn't ..."

