[Issue 17553] std.json should not do UTF decoding when encoding JSON
June 26, 2017
https://issues.dlang.org/show_bug.cgi?id=17553

Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |dlang-bugzilla@thecybershad
                   |                            |ow.net
         Resolution|---                         |INVALID

--- Comment #1 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
As far as Phobos (and some parts of the language itself) is concerned, D strings are expected to be UTF-encoded, i.e. to contain a valid sequence of UTF code units. Your program bypasses that assumption by using a cast. The normal way to read text data into a string is the readText function, which performs UTF validation: reading a file that does not contain valid UTF throws an exception.
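
To make the difference concrete, here is a minimal sketch contrasting the two ways of reading a file. The scratch file name and the isValidUtf8 helper are made up for illustration; only read, readText, write, remove and validate are Phobos functions.

```d
import std.exception : assertThrown;
import std.file : read, readText, remove, write;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `text` is a valid UTF-8 sequence.
bool isValidUtf8(string text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    enum path = "issue17553-demo.bin"; // hypothetical scratch file name
    ubyte[] bytes = [0x61, 0xFF, 0x62]; // 0xFF never occurs in valid UTF-8
    write(path, bytes);
    scope (exit) remove(path);

    // The cast skips validation, so the invalid bytes end up in a string.
    auto unchecked = cast(string) read(path);
    assert(unchecked.length == 3);
    assert(!isValidUtf8(unchecked));

    // readText validates its input and rejects the same file.
    assertThrown!UTFException(readText(path));
}
```

Data that is not known to be text can be kept as ubyte[] and validated explicitly before it is treated as a string.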

As for JSON encoding: although most JSON transformations concern themselves only with the ASCII range, the JSON standard does forbid encoding Unicode control characters, which may appear in a valid D string but must not appear in a JSON-encoded one. This includes the high control characters (code points 0x80 to 0x9F), so the encoding code must check for these code points when constructing the JSON string. Although they could in theory be special-cased, the most straightforward way to do it is to look at the input string as a range of Unicode code points (dchars), i.e. rely on auto-decoding, which is what the current implementation does.
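
A rough sketch of what escaping over decoded code points looks like, under the reading of the standard given above. The function name escapeDecoded is hypothetical and this is not the std.json source, only an illustration of the approach.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical helper, not the Phobos implementation: escapes `s` as the
/// body of a JSON string by walking decoded code points.
string escapeDecoded(string s)
{
    auto buf = appender!string();
    foreach (dchar c; s) // decodes UTF-8; invalid input raises an exception
    {
        switch (c)
        {
            case '"':
                buf.put(`\"`);
                break;
            case '\\':
                buf.put(`\\`);
                break;
            default:
                // Low controls (below 0x20) and high controls (0x80 to 0x9F).
                if (c < 0x20 || (c >= 0x80 && c <= 0x9F))
                    buf.formattedWrite(`\u%04X`, cast(uint) c);
                else
                    buf.put(c); // re-encoded as UTF-8 by the appender
        }
    }
    return buf.data;
}

void main()
{
    assert(escapeDecoded("a\u0085b") == `a\u0085b`); // high control escaped
    assert(escapeDecoded("é") == "é");               // other non-ASCII kept
}
```

Because every code point has to be decoded before it can be compared against the control ranges, a string holding invalid UTF-8 fails in the foreach loop before any output is produced.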

In any case, JSON strings are certainly not meant to store binary data. Even if the example "worked" (for a certain definition of "work"), the resulting JSON object would not be in any particular encoding. Even though the JSON syntax is restricted to ASCII characters, JSON itself is not: it is Unicode-aware, and the specification describes how to properly encode and decode Unicode characters, so it can't be used for storing arbitrary binary data.

If you have a specific use case in mind which is in line with the JSON spec and how D deals with Unicode and strings, please reopen; otherwise, there is no actionable defect presented in this issue.

--
June 26, 2017
https://issues.dlang.org/show_bug.cgi?id=17553

Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---
            Summary|std.json invalid utf8       |std.json should not do UTF
                   |sequence                    |decoding when encoding JSON

--- Comment #2 from Vladimir Panteleev <dlang-bugzilla@thecybershadow.net> ---
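RFC 7159 requires escaping only for the quotation mark, the reverse solidus and the control characters U+0000 through U+001F; other Unicode control characters don't need escaping, so we can actually avoid auto-decoding when encoding JSON.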
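
Since everything that must be escaped is in the ASCII range, the check can run over UTF-8 code units directly. The sketch below shows the idea; escapeBytes is a hypothetical name and the code is an illustration, not the change that was later committed to Phobos.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical helper: escapes `s` as the body of a JSON string without
/// decoding it. Only '"', '\' and code units below 0x20 are rewritten.
string escapeBytes(string s)
{
    auto buf = appender!string();
    foreach (char c; s) // iterates code units, no UTF decoding
    {
        switch (c)
        {
            case '"':
                buf.put(`\"`);
                break;
            case '\\':
                buf.put(`\\`);
                break;
            default:
                if (c < 0x20)
                    buf.formattedWrite(`\u%04X`, cast(uint) c);
                else
                    buf.put(c); // bytes >= 0x80 pass through unchanged
        }
    }
    return buf.data;
}

void main()
{
    assert(escapeBytes("tab\there") == `tab\u0009here`);
    assert(escapeBytes("a\u0085b") == "a\u0085b"); // high control left as-is

    // Invalid UTF-8 is copied through instead of raising an exception.
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    auto s = cast(string) raw;
    assert(escapeBytes(s) == s);
}
```

Bytes of multi-byte UTF-8 sequences are all 0x80 or above, so they can never be mistaken for a quote, a backslash or a low control character, which is why no decoding is needed for correctness here.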

--
July 03, 2017
https://issues.dlang.org/show_bug.cgi?id=17553

--- Comment #3 from github-bugzilla@puremagic.com ---
Commit pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5031ff1446f58a4a76e16d76aa80329d1981cb32
Fix Issue 17553 - std.json should not do UTF decoding when encoding JSON

--
July 03, 2017
https://issues.dlang.org/show_bug.cgi?id=17553

github-bugzilla@puremagic.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--
July 08, 2017
https://issues.dlang.org/show_bug.cgi?id=17553

--- Comment #4 from github-bugzilla@puremagic.com ---
Commit pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5031ff1446f58a4a76e16d76aa80329d1981cb32
Fix Issue 17553 - std.json should not do UTF decoding when encoding JSON

--
January 05, 2018
https://issues.dlang.org/show_bug.cgi?id=17553

--- Comment #5 from github-bugzilla@puremagic.com ---
Commit pushed to dmd-cxx at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/5031ff1446f58a4a76e16d76aa80329d1981cb32
Fix Issue 17553 - std.json should not do UTF decoding when encoding JSON

--