June 15, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

Dennis <dkorpel@live.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dkorpel@live.nl
                 OS|Windows                     |All

--- Comment #1 from Dennis <dkorpel@live.nl> ---
This is relevant when e.g. converting a `ubyte[]` to a `wchar[]` or `dchar[]`, but I don't think the language ever does that itself. A `wchar` and `dchar` are defined as "unsigned 16/32 bit" basic types, just like `ushort` or `uint`, and endianness in general is already specified to be target defined here:

https://dlang.org/spec/abi.html#endianness

Would it suffice to add char types to the table below it?

https://dlang.org/spec/abi.html#basic_types

--
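
For readers following along, the `ubyte[]` to `wchar[]` case from comment #1 can be sketched roughly as below. This is only an illustration, not code from the issue or the spec: the helper name `firstUnit` and the byte values are made up, and it relies on the compiler's predefined `LittleEndian` version identifier to pick the expected value.

```d
/// Hypothetical helper: reinterpret raw bytes as UTF-16 code units
/// without swapping anything, then return the first unit.
wchar firstUnit(ubyte[] raw)
{
    wchar[] units = cast(wchar[]) raw; // reinterpret cast, no byte swap
    return units[0];
}

void main()
{
    // U+0041 'A' as it would be stored in a UTF-16LE file.
    ubyte[] raw = [0x41, 0x00];
    wchar w = firstUnit(raw);

    version (LittleEndian)
        assert(w == 0x0041); // bytes match the target order: 'A'
    else
        assert(w == 0x4100); // same bytes read in big-endian order
}
```

The cast behaves the same way it would for `ushort[]`: the value you get depends on the target, which is the sense in which the byte order is "target defined".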
June 15, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #2 from Richard Cattermole <alphaglosined@gmail.com> ---
No, this isn't an ABI thing, it's about encodings.

Ideally, wchar/dchar would have little and big endian versions so that we can represent both forms of the encoding in the type system.

It has to be in: https://dlang.org/spec/type.html#basic-data-types

However, it can be kept pretty simple something like ``Unicode 8-bit code point with matching target endian``.

--
June 15, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #3 from Dennis <dkorpel@live.nl> ---
(In reply to Richard Cattermole from comment #2)
> No, this isn't an ABI thing, it's about encodings.

I don't follow, do you have a reference for me? I'm looking at:

https://en.wikipedia.org/wiki/UTF-16

"Each Unicode code point is encoded either as one or two 16-bit code units. How these 16-bit codes are stored as bytes then depends on the 'endianness' of the text file or communication protocol."

The `wchar` type is an integer, the 16-bit code. No integral operations on a `wchar` reveal the endianness, only once you reinterpret cast 'the text file' (a `ubyte[]`) will endianness come up, but at that point I think it's no different than casting a `ubyte[]` to a `ushort[]`. We don't have BE and LE `short` types either.

> However, it can be kept pretty simple something like `Unicode 8-bit code point with matching target endian`.

There's no endian difference for 8-bit code points, or are we talking about bit order instead of byte order?

--
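
A small sketch of the distinction drawn in comment #3, for illustration only: once the byte order of the external data is stated explicitly, the resulting `wchar` is the same 16-bit value on any target, and integral operations on it only see that value. The helper names `codeUnitFromLE` and `codeUnitFromBE` and the sample bytes are hypothetical; `littleEndianToNative` and `bigEndianToNative` are the existing `std.bitmanip` functions.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;

/// Hypothetical helper: one UTF-16 code unit from bytes stored little-endian.
wchar codeUnitFromLE(ubyte[2] bytes)
{
    return littleEndianToNative!wchar(bytes);
}

/// Hypothetical helper: one UTF-16 code unit from bytes stored big-endian.
wchar codeUnitFromBE(ubyte[2] bytes)
{
    return bigEndianToNative!wchar(bytes);
}

void main()
{
    // U+00E9 serialized in both byte orders.
    ubyte[2] le = [0xE9, 0x00];
    ubyte[2] be = [0x00, 0xE9];

    wchar a = codeUnitFromLE(le);
    wchar b = codeUnitFromBE(be);

    // Both decode to the same code unit, whatever the target's byte order.
    assert(a == 0x00E9 && b == 0x00E9);

    // Integral operations work on the value, not on the memory layout.
    assert((a >> 8) == 0x00);
    assert((a & 0xFF) == 0xE9);
}
```

In this sketch the byte order is a property of the conversion from `ubyte[]`, not of the `wchar` itself, which is the same situation as with `ushort`.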
June 15, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

--- Comment #4 from Richard Cattermole <alphaglosined@gmail.com> ---
(In reply to Dennis from comment #3)
> (In reply to Richard Cattermole from comment #2)
> > No, this isn't an ABI thing, it's about encodings.
>
> I don't follow, do you have a reference for me? I'm looking at:
>
> https://en.wikipedia.org/wiki/UTF-16
>
> "Each Unicode code point is encoded either as one or two 16-bit code units. How these 16-bit codes are stored as bytes then depends on the 'endianness' of the text file or communication protocol."
>
> The `wchar` type is an integer, the 16-bit code. No integral operations on a `wchar` reveal the endianness, only once you reinterpret cast 'the text file' (a `ubyte[]`) will endianness come up, but at that point I think it's no different than casting a `ubyte[]` to a `ushort[]`. We don't have BE and LE `short` types either.

Indeed. Integers you kinda expect that it is the same as cpu endian. But you cannot assume the same for UTF (hence we should document it).

> > However, it can be kept pretty simple something like `Unicode 8-bit code point with matching target endian`.
>
> There's no endian difference for 8-bit code points, or are we talking about bit order instead of byte order?

That should have been UTF-16 or UTF-32, but it's the same.

--
June 16, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull

--- Comment #5 from Dlang Bot <dlang-bot@dlang.rocks> ---
@dkorpel created dlang/dlang.org pull request #3319 "Fix 23186 - wchar/dchar do not have their endianess defined" fixing this issue:

- Fix 23186 - wchar/dchar do not have their endianess defined

https://github.com/dlang/dlang.org/pull/3319

--
September 02, 2022 [Issue 23186] wchar/dchar do not have their endianess defined

https://issues.dlang.org/show_bug.cgi?id=23186

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Dlang Bot <dlang-bot@dlang.rocks> ---
dlang/dlang.org pull request #3319 "Fix 23186 - wchar/dchar do not have their endianess defined" was merged into master:

- d3e822cf7d4acfd38fcf3dc3a632c3644741c6d3 by Dennis Korpel:
  Fix 23186 - wchar/dchar do not have their endianess defined

https://github.com/dlang/dlang.org/pull/3319

--