April 28, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5904

           Summary: std.json parseString doesn't handle chars outside the
                    BMP
           Product: D
           Version: D2
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: sean@invisibleduck.org


--- Comment #0 from Sean Kelly <sean@invisibleduck.org> 2011-04-28 12:24:48 PDT ---
According to RFC 4627, characters outside the Basic Multilingual Plane (i.e. code points above U+FFFF, which do not fit in a single 16-bit code unit) are encoded as a UTF-16 surrogate pair in JSON strings, written as two consecutive "\uXXXX" escapes.  In effect, the parser has to test whether a "\uXXXX" value is >= 0xD800 and <= 0xDBFF.  If so, it is a high surrogate, and the next value should be another "\uXXXX" escape representing the low surrogate.  To verify this, the next value should be >= 0xDC00 and <= 0xDFFF.  If it isn't, skip the preceding "\uXXXX" value (the high surrogate) as invalid and decode the following "\uXXXX" value as a standalone Unicode code point (the RFC is unclear on this point, but this seems the most reasonable failure mode).  Given a valid high and low surrogate, put them into a wchar[2] and convert to UTF-8.
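
The handling described above could be sketched roughly as follows. This is only an illustration, not the std.json code: the names isHighSurrogate, isLowSurrogate and decodeEscapes are hypothetical, and the function assumes the numeric values of consecutive "\uXXXX" escapes have already been parsed into an array of 16-bit values. Dropping a stray low surrogate is a further assumption that the description above does not cover.

```d
import std.utf : toUTF8;

// Hypothetical helpers; these names are not part of std.json.
bool isHighSurrogate(uint v) { return v >= 0xD800 && v <= 0xDBFF; }
bool isLowSurrogate(uint v)  { return v >= 0xDC00 && v <= 0xDFFF; }

// Takes the already-parsed values of consecutive "\uXXXX" escapes
// (each assumed to fit in 16 bits) and returns the UTF-8 text.
string decodeEscapes(const(uint)[] values)
{
    string result;
    size_t i = 0;
    while (i < values.length)
    {
        immutable v = values[i];
        if (isHighSurrogate(v))
        {
            if (i + 1 < values.length && isLowSurrogate(values[i + 1]))
            {
                // Valid pair: put both halves in a wchar[2] and convert.
                wchar[2] pair = [cast(wchar) v, cast(wchar) values[i + 1]];
                result ~= toUTF8(pair[]);
                i += 2;
            }
            else
            {
                // Unpaired high surrogate: skip it as invalid. The next
                // value is examined on its own in the next iteration.
                ++i;
            }
        }
        else if (isLowSurrogate(v))
        {
            // Assumption: a low surrogate with no preceding high
            // surrogate is also dropped as invalid.
            ++i;
        }
        else
        {
            // Ordinary BMP value: convert it as a single UTF-16 unit.
            wchar[1] single = [cast(wchar) v];
            result ~= toUTF8(single[]);
            ++i;
        }
    }
    return result;
}
```

For example, the escapes "\uD83D\uDE00" form a valid pair for U+1F600, so decodeEscapes([0xD83D, 0xDE00]) yields that single character as UTF-8, while decodeEscapes([0xD83D, 0x0041]) drops the unpaired high surrogate and yields "A".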
