Thread overview
[Issue 23186] wchar/dchar do not have their endianess defined
June 15, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

Dennis <dkorpel@live.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dkorpel@live.nl
                 OS|Windows                     |All

--- Comment #1 from Dennis <dkorpel@live.nl> ---
This is relevant when e.g. converting a `ubyte[]` to a `wchar[]` or `dchar[]`, but I don't think the language ever does that itself. A `wchar` and `dchar` are defined as "unsigned 16/32 bit" basic types, just like `ushort` or `uint`, and endianness in general is already specified to be target defined here:

https://dlang.org/spec/abi.html#endianness

Would it suffice to add char types to the table below it?

https://dlang.org/spec/abi.html#basic_types

--
June 15, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #2 from Richard Cattermole <alphaglosined@gmail.com> ---
No, this isn't an ABI thing, it's about encodings.

Ideally, wchar/dchar would have little and big endian versions so that we can represent both forms of the encoding in the type system.
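A rough sketch of what such endian-tagged code units might look like as a library type. The names `WCharOf`, `WCharBE` and `WCharLE` are hypothetical and exist neither in the language nor in Phobos; only the `std.bitmanip` conversion functions and `std.system.Endian` are real.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative,
    nativeToBigEndian, nativeToLittleEndian;
import std.system : Endian;

/// Hypothetical wrapper: one UTF-16 code unit stored in a fixed byte order.
struct WCharOf(Endian order)
{
    ubyte[2] bytes; // always laid out in `order`, regardless of the target

    this(wchar value)
    {
        static if (order == Endian.bigEndian)
            bytes = nativeToBigEndian(value);
        else
            bytes = nativeToLittleEndian(value);
    }

    /// Converts back to a plain `wchar` in the target's byte order.
    wchar toNative() const
    {
        ubyte[2] copy = bytes; // mutable copy for the conversion call
        static if (order == Endian.bigEndian)
            return bigEndianToNative!wchar(copy);
        else
            return littleEndianToNative!wchar(copy);
    }
}

alias WCharBE = WCharOf!(Endian.bigEndian);    // hypothetical name
alias WCharLE = WCharOf!(Endian.littleEndian); // hypothetical name

void main()
{
    wchar euro = '\u20AC';
    auto be = WCharBE(euro);
    auto le = WCharLE(euro);

    // Same code unit, two different byte layouts.
    assert(be.bytes == [0x20, 0xAC]);
    assert(le.bytes == [0xAC, 0x20]);
    assert(be.toNative() == euro && le.toNative() == euro);
}
```

With something like this, the byte order of stored text becomes part of the type, while a plain `wchar` stays a native integer.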

The definition has to go in: https://dlang.org/spec/type.html#basic-data-types

However, it can be kept pretty simple, something like `Unicode 8-bit code point with matching target endian`.

--
June 15, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #3 from Dennis <dkorpel@live.nl> ---
(In reply to Richard Cattermole from comment #2)
> No, this isn't an ABI thing, it's about encodings.

I don't follow, do you have a reference for me? I'm looking at:

https://en.wikipedia.org/wiki/UTF-16

"Each Unicode code point is encoded either as one or two 16-bit code units. How these 16-bit codes are stored as bytes then depends on the 'endianness' of the text file or communication protocol."

The `wchar` type is an integer, the 16-bit code. No integral operation on a `wchar` reveals the endianness; it only comes up once you reinterpret cast 'the text file' (a `ubyte[]`), and at that point I think it's no different from casting a `ubyte[]` to a `ushort[]`. We don't have BE and LE `short` types either.
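To illustrate the point, here is a minimal sketch, assuming a typical target where a misaligned two-byte read is not a problem. The helper `reinterpretNative` is a made-up name for illustration; `littleEndianToNative` and `bigEndianToNative` are the existing `std.bitmanip` functions.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;

/// Hypothetical helper: reads two raw bytes as one UTF-16 code unit in the
/// target's byte order, like a cast from `ubyte[]` to `wchar[]` does.
wchar reinterpretNative(ubyte[2] raw)
{
    return (cast(wchar[]) raw[])[0];
}

void main()
{
    wchar euro = '\u20AC';

    // Integral operations only see the 16-bit value, never its byte layout.
    assert((euro >> 8) == 0x20);
    assert((euro & 0xFF) == 0xAC);

    // The same code unit as it would appear in UTF-16LE and UTF-16BE data.
    ubyte[2] le = [0xAC, 0x20];
    ubyte[2] be = [0x20, 0xAC];

    // Decoding with an explicit byte order gives the same value on any target.
    assert(littleEndianToNative!wchar(le) == euro);
    assert(bigEndianToNative!wchar(be) == euro);

    // A reinterpret cast is only right when the data matches the target.
    version (LittleEndian)
    {
        assert(reinterpretNative(le) == euro);
        assert(reinterpretNative(be) == 0xAC20);
    }
    else
    {
        assert(reinterpretNative(be) == euro);
        assert(reinterpretNative(le) == 0xAC20);
    }
}
```

The same would hold with `ushort` in place of `wchar`.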

> However, it can be kept pretty simple something like `Unicode 8-bit code point with matching target endian`.

There's no endian difference for 8-bit code points, or are we talking about bit order instead of byte order?

--
June 15, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #4 from Richard Cattermole <alphaglosined@gmail.com> ---
(In reply to Dennis from comment #3)
> (In reply to Richard Cattermole from comment #2)
> > No, this isn't an ABI thing, it's about encodings.
> 
> I don't follow, do you have a reference for me? I'm looking at:
> 
> https://en.wikipedia.org/wiki/UTF-16
> 
> "Each Unicode code point is encoded either as one or two 16-bit code units. How these 16-bit codes are stored as bytes then depends on the 'endianness' of the text file or communication protocol."
> 
> The `wchar` type is an integer, the 16-bit code. No integral operation on a `wchar` reveals the endianness; it only comes up once you reinterpret cast 'the text file' (a `ubyte[]`), and at that point I think it's no different from casting a `ubyte[]` to a `ushort[]`. We don't have BE and LE `short` types either.

Indeed. With integers you kind of expect the byte order to match the CPU's endianness, but you cannot assume the same for UTF (hence we should document it).

> > However, it can be kept pretty simple something like `Unicode 8-bit code point with matching target endian`.
> 
> There's no endian difference for 8-bit code points, or are we talking about bit order instead of byte order?

That should have been UTF-16 or UTF-32, but it's the same.

--
June 16, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull

--- Comment #5 from Dlang Bot <dlang-bot@dlang.rocks> ---
@dkorpel created dlang/dlang.org pull request #3319 "Fix 23186 - wchar/dchar do not have their endianess defined" fixing this issue:

- Fix 23186 - wchar/dchar do not have their endianess defined

https://github.com/dlang/dlang.org/pull/3319

--
September 02, 2022
https://issues.dlang.org/show_bug.cgi?id=23186

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Dlang Bot <dlang-bot@dlang.rocks> ---
dlang/dlang.org pull request #3319 "Fix 23186 - wchar/dchar do not have their endianess defined" was merged into master:

- d3e822cf7d4acfd38fcf3dc3a632c3644741c6d3 by Dennis Korpel:
  Fix 23186 - wchar/dchar do not have their endianess defined

https://github.com/dlang/dlang.org/pull/3319

--