dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead (page 12)

November 15, 2021

Re: dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Posted by user1234
in reply to FeepingCreature

On Monday, 15 November 2021 at 11:20:04 UTC, FeepingCreature wrote:
> On Monday, 15 November 2021 at 08:22:13 UTC, user1234 wrote:
>> On Monday, 15 November 2021 at 08:20:57 UTC, user1234 wrote:
>>> On Friday, 12 November 2021 at 10:42:15 UTC, kdevel wrote:
>>>> This does not yet compile:
>>>>
>>>> [...]
>>>> R = ubyte[]
>>>> must satisfy one of the following constraints:
>>>> isSomeChar!(ElementType!R)
>>>> is(StringTypeOf!R)
>>>
>>> auto-decoding or not... you need to decode from whatever is the OS encoding (must be ancient ANSI I presume?) to UTF-8.
>>
>> I meant decode then re-enc to utf
>
> I don't see how that could work. readText would need to encode it to the OS codepage, but readText has no idea what encoding you intend. And the encoding of a filename isn't even always determined by the locale; consider trying to access filenames saved in a different locale, i.e. what iconv does. There's no way around readText taking ubyte[].

I think I was off-topic; my reply was about the filename, e.g.

fname.fromAnsi(cp).toUTF!char.readText()

You were apparently talking more about the file content? Sorry about that.
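For illustration, the "decode, then re-encode" step for a filename could look roughly like the sketch below. It assumes the legacy codepage is ISO-8859-1 (Latin-1), which is only one of many possibilities, and `latin1ToUtf8` is a hypothetical helper name, not a Phobos function. The `fromAnsi(cp)` call in the post reads as a placeholder for this kind of conversion with a selectable codepage.

```d
import std.array : appender;

/// Hypothetical helper: treats each byte as an ISO-8859-1 (Latin-1)
/// code unit and re-encodes the result as UTF-8.
/// Assumption: the bytes really are Latin-1; other codepages need a table.
string latin1ToUtf8(const(ubyte)[] raw)
{
    auto result = appender!string();
    foreach (b; raw)
    {
        // In Latin-1 every byte value equals its Unicode code point,
        // so widening to dchar is the whole decode step.
        // Appender encodes the dchar as UTF-8 when storing it.
        result.put(cast(dchar) b);
    }
    return result.data;
}

unittest
{
    // "für" in Latin-1: 'ü' is the single byte 0xFC.
    immutable ubyte[] raw = [0x66, 0xFC, 0x72];
    string name = latin1ToUtf8(raw);
    assert(name == "f\u00FCr");
    assert(name.length == 4); // 'ü' takes two UTF-8 code units
}
```

This only works when the caller knows the source encoding, which is the objection raised in the quoted reply: nothing in the bytes themselves says which codepage produced them.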

On Monday, 15 November 2021 at 08:22:13 UTC, user1234 wrote:
> On Monday, 15 November 2021 at 08:20:57 UTC, user1234 wrote:
>> On Friday, 12 November 2021 at 10:42:15 UTC, kdevel wrote:
>>> This does not yet compile:
>>>
>>> [...]
>>> R = ubyte[]
>>> must satisfy one of the following constraints:
>>> isSomeChar!(ElementType!R)
>>> is(StringTypeOf!R)
>>
>> auto-decoding or not... you need to decode from whatever is the OS encoding (must be ancient ANSI I presume?) to UTF-8.
>
> I meant decode then re-enc to utf

You can only decode what has been (or is meant to be) encoded. Except for '.', '\0', and '/', the character values (0 .. 255) have no meaning within a filename.

On Monday, 15 November 2021 at 19:59:40 UTC, kdevel wrote:
> You can only decode what has been (or is meant to be) encoded. Except for '.', '\0', and '/' the character values (0 .. 255) have no meaning within a filename.

It should probably be a system-specific string type that validates using the rules of the specific OS.
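A minimal sketch of what such a type might look like for POSIX-style systems follows. `PosixFileName` is a hypothetical name, not an existing Phobos type, and the only rules it checks are the ones mentioned above: a single path component is non-empty and contains neither '\0' nor '/'. The special names "." and ".." are accepted as-is, and other operating systems would need their own rules.

```d
import std.exception : enforce;

/// Hypothetical type: one POSIX path component, stored as raw bytes
/// so that no particular text encoding is assumed.
struct PosixFileName
{
    immutable(ubyte)[] bytes;

    /// Returns true if `raw` is acceptable as a single path component.
    static bool isValid(const(ubyte)[] raw)
    {
        if (raw.length == 0)
            return false;
        foreach (b; raw)
        {
            // NUL terminates C strings and '/' separates components.
            if (b == 0 || b == '/')
                return false;
        }
        return true;
    }

    /// Throws an Exception if `raw` fails validation.
    this(immutable(ubyte)[] raw)
    {
        enforce(isValid(raw), "invalid POSIX file name");
        bytes = raw;
    }
}

unittest
{
    import std.exception : assertThrown;
    import std.string : representation;

    // 0xFC alone is not valid UTF-8, yet it is a legal file name byte.
    immutable ubyte[] legacy = [0x66, 0xFC, 0x72];
    assert(PosixFileName(legacy).bytes.length == 3);

    // A path separator is rejected inside a single component.
    assertThrown(PosixFileName("a/b".representation));
}
```

Keeping the payload as `ubyte` rather than `char` means the type makes no claim that the name is valid UTF; any conversion for display would be a separate, explicit step.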
