[Issue 15586] std.utf.toUTF8() segfaults when fed an invalid dchar
January 20, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #1 from thomas.bockman@gmail.com ---
Fix for the Phobos bug:
https://github.com/D-Programming-Language/phobos/pull/3943
Fix for the DMD bug: https://github.com/D-Programming-Language/dmd/pull/5229

--
January 20, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx

--- Comment #2 from hsteoh@quickfur.ath.cx ---
I don't understand something here. The in-contract of toUTF8 asserts that the dchar must be valid, so why does the assert not trigger at runtime, even when not compiling with -release? This doesn't seem like a Phobos bug; it seems to be a bug in the compiler for "optimizing" away the assert just because it thinks that dchar's cannot have invalid values.

--
January 20, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #3 from thomas.bockman@gmail.com ---
> This doesn't seem like a Phobos bug; it seems to be a bug in the compiler for "optimizing" away the assert just because it thinks that dchar's cannot have invalid values.

That's the compiler bug I linked to in the OP (15585).

--
January 20, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #4 from hsteoh@quickfur.ath.cx ---
Keep in mind, though, that one *could* argue that casting an arbitrary value to dchar already constitutes UB, if dchars are deemed to only contain valid Unicode codepoints. If you need to work with incoming character data of unknown validity, you're probably better off working with uint (or uint[], ubyte[], etc.) instead, and only convert to dchar (string, etc.) after explicit validation.

Generally, you probably shouldn't be casting stuff unless you know what you're doing and are ready to handle the consequences when things go wrong.
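
A minimal sketch of the "validate first, convert afterwards" approach described above. It assumes `std.utf.isValidDchar` is an acceptable validity check; `toDcharChecked` is a hypothetical helper name used only for illustration and is not part of Phobos.

```d
import std.stdio : writeln;
import std.typecons : Nullable;
import std.utf : isValidDchar;

// Hypothetical helper (not part of Phobos): convert a raw 32-bit value to
// dchar only if it is a valid Unicode code point; otherwise return null.
Nullable!dchar toDcharChecked(uint raw)
{
    // Range-check on the uint first, so an out-of-range dchar is never created.
    if (raw <= dchar.max && isValidDchar(cast(dchar) raw))
        return Nullable!dchar(cast(dchar) raw);
    return Nullable!dchar.init;
}

void main()
{
    writeln(toDcharChecked(0x41).isNull);     // 0x41 is 'A', a valid code point
    writeln(toDcharChecked(0xD800).isNull);   // surrogate: not a valid code point
    writeln(toDcharChecked(0x110000).isNull); // one past dchar.max
}
```

Keeping the untrusted value as a `uint` until the check passes means the rest of the program only ever sees a `dchar` that has been validated.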

--
January 20, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #5 from thomas.bockman@gmail.com ---
> Keep in mind, though, that one *could* argue that casting an arbitrary value to dchar already constitutes UB

This has already been settled. It's not:
    http://forum.dlang.org/post/oionrfexehapzicgpbrw@forum.dlang.org

> Generally, you probably shouldn't be casting stuff unless you know what you're doing and are ready to handle the consequences when things go wrong.

1) No casting is required to trigger this bug. I was just giving you a simplified test case. Here's a slightly less simple one, without any casts:

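import std.stdio : writeln;
import std.utf : toUTF8;

void main() {
    // dchar.max is 0x10FFFF, the largest valid code point.
    dchar a = dchar.max;
    a += 1; // a is now 0x110000: an invalid dchar, produced without any cast

    char[4] buf = void;
    auto b = toUTF8(buf, a); // the call that fails as reported in this issue

    writeln(b);
}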

2) "If you don't want segfaults, don't cast stuff" is an awful solution to a hard crash in @safe code. There is no good reason that casts of basic numeric or character types should cause this kind of failure.

--
January 21, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #6 from thomas.bockman@gmail.com ---
I may have reduced this one too far:

https://github.com/D-Programming-Language/phobos/pull/3943#issuecomment-173381348

Arguably, the real bug is that certain other functions in Phobos call `toUTF8()` without verifying that the input they are supplying satisfies the contract. This will be a bit more work to fix, though.
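
As a rough illustration of what "verifying the input before encoding" could look like on the caller side, here is a sketch. It is not the change made in the linked pull request; `encodeOrReplace` is a hypothetical name, and substituting U+FFFD for invalid input is just one possible policy (throwing would be another).

```d
import std.stdio : writeln;
import std.utf : encode, isValidDchar;

// Hypothetical caller-side wrapper (not part of Phobos): check the code point
// before encoding, and fall back to U+FFFD REPLACEMENT CHARACTER if it is
// invalid, so the encoder never sees input outside its contract.
string encodeOrReplace(dchar c)
{
    if (!isValidDchar(c))
        c = '\uFFFD';

    char[4] buf;
    immutable len = encode(buf, c); // number of UTF-8 code units written
    return buf[0 .. len].idup;
}

void main()
{
    dchar a = dchar.max;
    a += 1; // invalid: one past the largest code point

    writeln(encodeOrReplace('A'));
    writeln(encodeOrReplace(a));
}
```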

--
February 11, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #7 from github-bugzilla@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/2b09b8b59cc94ff23f52f3d18212c727d3e89d7b Fix Phobos issue #15586 - std.utf.toUTF8() halts on invalid dchar.

https://github.com/D-Programming-Language/phobos/commit/cadc8197614dac7313ffc44dd4e81c51c17ee8f3 Merge pull request #3943 from tsbockman/dchar-crash

Fix Phobos Issue 15586

--
February 11, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

thomas.bockman@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--
March 19, 2016
https://issues.dlang.org/show_bug.cgi?id=15586

--- Comment #8 from github-bugzilla@puremagic.com ---
Commits pushed to stable at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/2b09b8b59cc94ff23f52f3d18212c727d3e89d7b Fix Phobos issue #15586 - std.utf.toUTF8() halts on invalid dchar.

https://github.com/D-Programming-Language/phobos/commit/cadc8197614dac7313ffc44dd4e81c51c17ee8f3 Merge pull request #3943 from tsbockman/dchar-crash

--