June 15, 2014 Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
I'm using the following snippet to convert a UTF-8 string to HTML:

    import std.traits: isSomeChar; // needed by the template constraint

    /** Convert character $(D c) to HTML representation. */
    string toHTML(C)(C c) @safe pure
        if (isSomeChar!C)
    {
        import std.conv: to;
        if      (c == '&')  return "&amp;";  // ampersand
        else if (c == '<')  return "&lt;";   // less than
        else if (c == '>')  return "&gt;";   // greater than
        else if (c == '\"') return "&quot;"; // double quote
        else if (0 < c && c < 128)
            return to!string(cast(char)c);
        else
            return "&#" ~ to!string(cast(int)c) ~ ";";
    }

    static if (__VERSION__ >= 2066L)
    {
        /** Convert string $(D s) to HTML representation. */
        auto encodeHTML(string s) @safe pure
        {
            import std.utf: byDchar;
            import std.algorithm: joiner, map;
            return s.byDchar.map!toHTML.joiner("");
        }
    }

Note that it uses Walter's new std.utf.byDchar. But it triggers

    core.exception.RangeError@std/utf.d(2703): Range violation
    ----------------
    Stack trace:
    #1: ?? line (0)
    #2: ?? line (0)
    #3: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (2703)
    #4: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (3232)
    #5: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (510)
    #6: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3440)
    #7: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3540)
    #8: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/range.d line (1861)
    #9: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2172)
    #10: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2843)
    #11: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (3167)
    #12: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (526)
    #13: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/stdio.d line (1168)

for non-UTF-8 input. Is this intentional?

utf.d on line 2703 is inside byCodeUnit().

When I use byChar() it doesn't crash, but then I get incorrect conversions.

Could somebody explain the difference between byChar, byWchar and byDchar?
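
Editorial note: one way to keep malformed input out of a pipeline like encodeHTML is to check the string before decoding it. The following is only a minimal sketch, assuming that a pre-check with std.utf.validate is acceptable for the use case. The helper name isValidUTF8 is hypothetical and does not come from the thread.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool isValidUTF8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // The byte 0xFF never occurs in well-formed UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF];
    string bad = cast(string) raw;

    assert(isValidUTF8("a < b"));
    assert(!isValidUTF8(bad));
}
```

A caller could reject or repair the input when the check fails and only then hand it to the conversion routine. Whether rejecting is preferable to substituting a replacement character depends on the application.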
June 15, 2014 Re: Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
Posted in reply to Nordlöw

> But it triggers

See also: https://github.com/nordlow/justd/blob/master/test/t_err.d
June 16, 2014 Re: Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
Posted in reply to Nordlöw

On Sunday, 15 June 2014 at 23:09:24 UTC, Nordlöw wrote:
> Is this intentional?
>
> utf.d on line 2703 is inside byCodeUnit().

AFAIK, no. You hit an Error, and those shouldn't occur unless you go out of your way for them.

I'll look into it.

> When I use byChar() it doesn't crash, but then I get incorrect conversions.
>
> Could somebody explain the difference between byChar, byWchar and byDchar?

What's there to say? They all take a range of characters and return it as a range of the requested character type.

"byDchar" decodes the string, returning a "BadChar" for invalid encodings.

The others first decode using "byDchar", and then re-encode the individual dchars into the requested char type.
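
Editorial note: the difference described above shows up in how many elements each range yields for the same string. This is a minimal sketch, assuming a Phobos release that provides std.utf.byChar, byWchar and byDchar (2.066 or later, matching the version check in the original snippet). The sample string is chosen only for illustration.

```d
import std.range : walkLength;
import std.utf : byChar, byWchar, byDchar;

void main()
{
    // U+00E9 followed by U+1F600, a code point outside the BMP.
    string s = "\u00E9\U0001F600";

    assert(s.byChar.walkLength  == 6); // UTF-8:  2 + 4 code units
    assert(s.byWchar.walkLength == 3); // UTF-16: 1 + 2 code units (surrogate pair)
    assert(s.byDchar.walkLength == 2); // UTF-32: one element per code point
}
```

This also suggests why mapping a per-character function over byChar gives wrong results for non-ASCII text: the function sees individual UTF-8 code units, not whole code points.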
June 16, 2014 Re: Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
Posted in reply to monarch_dodra

On Monday, 16 June 2014 at 10:02:16 UTC, monarch_dodra wrote:
> I'll look into it.

Yeah, there's an issue in the implementation. I brought it up on the pull request page. If it doesn't get attention there, I'll file it.
June 16, 2014 Re: Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
Posted in reply to monarch_dodra

> AFAIK, no. You hit an Error, and those shouldn't occur unless you go out of your way for them.
>
> I'll look into it.

Superb!

> What's there to say? They all take a range of characters, and return it as a range of the corresponding requested type.

Excuse the somewhat dumb question; I was unsure about the details. Is there a bleeding-edge variant of the dlang.org docs (in sync with git master) that I can read instead of the source? If not, I build dmd, druntime and phobos daily for testing purposes, so I might as well build the docs too and read them from there.

> In the case of "byDchar", it decodes the string (while returning a "BadChar") for invalid encodings.

This is what I want/need :)

> The others first decode using "byDchar", and then re-encode the individual dchars into the corresponding requested char-type.

Ok. Got it! Thx a lot.