[Issue 7393] New: Which character code does wchar be, UTF-16BE or UTF-16LE?
January 28, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7393

           Summary: Which character code does wchar be, UTF-16BE or
                    UTF-16LE?
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: websites
        AssignedTo: nobody@puremagic.com
        ReportedBy: zan77137@nifty.com


--- Comment #0 from SHOO <zan77137@nifty.com> 2012-01-28 09:09:25 PST ---
It is not clear whether wchar is UTF-16LE or UTF-16BE or system-dependent. There is a similar problem with dchar.

The specification should state this clearly.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
January 28, 2012



--- Comment #1 from SHOO <zan77137@nifty.com> 2012-01-28 09:43:30 PST ---
The current implementation is system-dependent.

In addition, the page on interfacing to C makes it clear that wchar corresponds to wchar_t (when wchar_t is 2 bytes): http://www.d-programming-language.org/interfaceToC.html

I think this information should be easier to find in the specification.

January 28, 2012


Kenji Hara <k.hara.pg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic


--- Comment #2 from Kenji Hara <k.hara.pg@gmail.com> 2012-01-28 09:49:27 PST ---
IMO, the simplest fix is to add the following two rows to the 'Basic Types' table in http://www.d-programming-language.org/abi.html:

wchar     16 bit unsigned value (same as ushort)
dchar     32 bit unsigned value (same as uint)

January 28, 2012



--- Comment #3 from Walter Bright <bugzilla@digitalmars.com> 2012-01-28 10:51:04 PST ---
C does not specify the size of wchar_t. On Windows, wchar_t is 2 bytes, but on Linux it is 4.
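
As a rough illustration of that point (a sketch, not part of the original comment), the C snippet below reports the size of wchar_t on the host and narrows a wide character to a 16-bit code unit. The helper names wchar_t_bytes and narrow_to_code_unit are hypothetical, and uint16_t is assumed here as a stand-in for D's 16-bit wchar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the platform's wchar_t in bytes; C leaves this
   implementation-defined (commonly 2 on Windows, 4 on Linux). */
size_t wchar_t_bytes(void) {
    return sizeof(wchar_t);
}

/* Narrow a wide character to one 16-bit code unit.
   Assumption: the value fits in 16 bits; otherwise the assert fires. */
uint16_t narrow_to_code_unit(wchar_t c) {
    assert((unsigned long)c <= 0xFFFFul);
    return (uint16_t)c;
}
```

Code that passes text between D and C would presumably need a check like this wherever wchar_t is wider than 16 bits.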

January 28, 2012



--- Comment #4 from Kenji Hara <k.hara.pg@gmail.com> 2012-01-28 11:20:13 PST ---
(In reply to comment #3)
> C does not specify the size of wchar_t. On Windows, wchar_t is 2 bytes, but on Linux it is 4.

Yes, I know, and so does he.

The original question is: "Does the representation of wchar and dchar values depend on the system's endianness?"

The http://d-programming-language.org/abi.html page says
"The endianness (byte order) of the layout of the data will conform to the
endianness of the target machine.", but the "Basic Types" table that follows does not
mention the char, wchar, or dchar types.
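
To make the quoted ABI sentence concrete, here is a minimal C sketch (an illustration, not taken from the D documentation) that copies the in-memory bytes of one 16-bit code unit and checks which byte comes first on the host. The function names are hypothetical, and uint16_t is assumed as a stand-in for a 16-bit wchar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of one 16-bit code unit into out[0..1]. */
void code_unit_bytes(uint16_t unit, unsigned char out[2]) {
    assert(out != NULL);
    memcpy(out, &unit, sizeof unit);
}

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void) {
    unsigned char bytes[2];
    code_unit_bytes(0x0041, bytes);   /* 'A' as a 16-bit code unit */
    return bytes[0] == 0x41;
}
```

On a little-endian machine the code unit 0x3042 is laid out as the bytes 42 30, and on a big-endian machine as 30 42, which is the system-dependent behaviour the question is about.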

So I told him on Twitter, "D's wchar type is the same as C's wchar_t on 32-bit systems, and wchar_t depends on system endianness, so D's wchar also follows system endianness", but he was not convinced.

At the very least, I think the missing description of the character types on the ABI page should be fixed, as proposed in comment #2.

January 28, 2012



--- Comment #5 from Walter Bright <bugzilla@digitalmars.com> 2012-01-28 12:17:16 PST ---
(In reply to comment #2)
> IMO, The simple way is adding following two rows into 'Basic Types' table in
> http://www.d-programming-language.org/abi.html:
> wchar     16 bit unsigned value (same as ushort)
> dchar     32 bit unsigned value (same as uint)

They are already described in type.html.

I don't think it is necessary to say whether they are BE or LE, any more than it is necessary to say for a uint which order the bytes come in. For a wchar w, the least significant byte is obtained with w&0xFF and the most significant with w>>8.
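
A short C sketch of that idea, offered as an illustration under stated assumptions rather than as anything from the D sources: because the mask and shift operate on the value, not on its memory layout, they can be used to write a code unit out as UTF-16LE or UTF-16BE with the same result on any host. The names put_utf16le and put_utf16be are hypothetical, and uint16_t stands in for a 16-bit wchar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write w as UTF-16LE: least significant byte first. */
void put_utf16le(uint16_t w, unsigned char out[2]) {
    assert(out != NULL);
    out[0] = (unsigned char)(w & 0xFF);
    out[1] = (unsigned char)(w >> 8);
}

/* Write w as UTF-16BE: most significant byte first. */
void put_utf16be(uint16_t w, unsigned char out[2]) {
    assert(out != NULL);
    out[0] = (unsigned char)(w >> 8);
    out[1] = (unsigned char)(w & 0xFF);
}
```

Byte order therefore only becomes a choice when code units are serialized to a file or a network stream; in memory the value simply follows the target machine.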

January 28, 2012



--- Comment #6 from github-bugzilla@puremagic.com 2012-01-28 12:42:02 PST ---
Commit pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/a48c43c9e3b7dc57092c1a72c1e019c46178f11b fix Issue 7393 - Which character code does wchar be, UTF-16BE or UTF-16LE?

January 28, 2012



--- Comment #7 from github-bugzilla@puremagic.com 2012-01-28 12:42:12 PST ---
Commit pushed to dmd-1.x at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/999ef822efdca31c6983ea89805b178526c53c3d fix Issue 7393 - Which character code does wchar be, UTF-16BE or UTF-16LE?

January 28, 2012



--- Comment #8 from Walter Bright <bugzilla@digitalmars.com> 2012-01-28 12:43:42 PST ---
Ignore those commits; the fixes are meant for issue 4371.

November 02, 2013


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |monarchdodra@gmail.com
         Resolution|                            |INVALID

