[Issue 5173] New: std.process.shell cannot handle non-UTF8 output
Nov 05, 2010
Lars Holowko
November 05, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5173

           Summary: std.process.shell cannot handle non-UTF8 output
           Product: D
           Version: D2
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: lars.holowko@gmail.com


--- Comment #0 from Lars Holowko <lars.holowko@gmail.com> 2010-11-05 12:15:15 PDT ---
std.process.shell dies with an exception when the utility it runs outputs UTF-16.

For example:

import std.process, std.stdio, std.string;

int main(string[] args)
{
    auto output = shell("wmic NTDOMAIN GET DomainName /value");
    writefln("Output: %s", output);
    return 0;
}

produces this output:

dchar decode(in char[], ref size_t): Invalid UTF-8 sequence [255, 254, 13, 0, 10, 0, 13, 0, 10, 0, 68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97, 0, 109, 0, 101, 0, 61, 0, 13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0] around index 0


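wmic's output looks like UTF-16 (little-endian): it starts with the byte-order mark 0xFF 0xFE (the 255, 254 above), and every ASCII character is followed by a 0 byte.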
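The bytes in the exception message can be checked by hand. The D sketch below (the helper name utf16leToString is hypothetical, and it targets a current Phobos rather than 2.050) pairs the bytes low-byte-first into UTF-16 code units and re-encodes them as UTF-8. The leading 0xFF 0xFE comes out as U+FEFF and the rest reads as "\r\n\r\nDomainName=", which supports the UTF-16LE reading.

```d
import std.utf : toUTF8;

// Hypothetical helper: assemble little-endian UTF-16 code units from raw
// bytes and re-encode them as UTF-8. A trailing odd byte is ignored.
string utf16leToString(const(ubyte)[] bytes)
{
    wchar[] units;
    units.reserve(bytes.length / 2);
    for (size_t i = 0; i + 1 < bytes.length; i += 2)
        units ~= cast(wchar)(bytes[i] | (bytes[i + 1] << 8)); // low byte first
    return toUTF8(units);
}

unittest
{
    // Leading bytes of the wmic output, copied from the exception message.
    immutable ubyte[] raw = [255, 254, 13, 0, 10, 0, 13, 0, 10, 0,
        68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0,
        78, 0, 97, 0, 109, 0, 101, 0, 61, 0];
    // 0xFF 0xFE becomes U+FEFF (the byte-order mark); the rest is ASCII.
    assert(utf16leToString(raw) == "\uFEFF\r\n\r\nDomainName=");
}
```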

As a workaround, I modified std.process.shell slightly to return a wstring instead:

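import std.array, std.exception, std.file, std.format, std.process, std.random;

// Same approach as std.process.shell, except that the captured output is
// read back as UTF-16. readText!wstring reinterprets the bytes in native
// byte order, so this matches wmic's UTF-16LE output only on a
// little-endian machine such as x86.
wstring shell2(string cmd)
{
    // Build a random temporary file name from eight hex-formatted numbers.
    auto a = appender!string();
    foreach (i; 0 .. 8)
    {
        formattedWrite(a, "%x", rndGen.front);
        rndGen.popFront();
    }
    auto filename = a.data;
    scope(exit) if (exists(filename)) remove(filename);
    // Redirect the command's output into the temporary file.
    errnoEnforce(system(cmd ~ "> " ~ filename) == 0);
    return readText!wstring(filename);
}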

Things seem to work in this case. A proper fix, though, would be to make readText detect the encoding from the file's prefix (its byte-order mark) and do the necessary conversion before calling std.utf.validate.
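A minimal sketch of that detect-then-convert idea, assuming the encoding is signalled by a leading byte-order mark and that input without a mark is UTF-8. The names decodeWithBom and decodeUnits are hypothetical; this is neither the attached patch nor a Phobos API. It assembles code units in the byte order the mark indicates, validates them, and returns UTF-8 with the mark stripped.

```d
import std.utf : toUTF8, validate;

// Hypothetical helper: build code units of type C (wchar or dchar) from raw
// bytes in the given byte order, reject malformed sequences, and re-encode
// the result as UTF-8. Trailing bytes that do not fill a whole unit are ignored.
string decodeUnits(C)(const(ubyte)[] bytes, bool bigEndian)
{
    enum n = C.sizeof;
    C[] units;
    units.reserve(bytes.length / n);
    for (size_t i = 0; i + n <= bytes.length; i += n)
    {
        uint v = 0;
        foreach (k; 0 .. n)
        {
            immutable shift = 8 * (bigEndian ? n - 1 - k : k);
            v |= cast(uint) bytes[i + k] << shift;
        }
        units ~= cast(C) v;
    }
    validate(units); // throws UTFException on malformed input
    return toUTF8(units);
}

// Hypothetical readText-style front end: pick the encoding from a leading
// byte-order mark, strip the mark, and return UTF-8. Input without a mark
// is assumed to be UTF-8.
string decodeWithBom(const(ubyte)[] b)
{
    // Check the 4-byte UTF-32 marks first, because FF FE 00 00 also starts
    // with the UTF-16LE mark FF FE. (UTF-16LE text whose first character is
    // U+0000 is therefore ambiguous and would be read as UTF-32LE.)
    if (b.length >= 4 && b[0 .. 4] == [0xFF, 0xFE, 0x00, 0x00])
        return decodeUnits!dchar(b[4 .. $], false);
    if (b.length >= 4 && b[0 .. 4] == [0x00, 0x00, 0xFE, 0xFF])
        return decodeUnits!dchar(b[4 .. $], true);
    if (b.length >= 2 && b[0 .. 2] == [0xFF, 0xFE])
        return decodeUnits!wchar(b[2 .. $], false);
    if (b.length >= 2 && b[0 .. 2] == [0xFE, 0xFF])
        return decodeUnits!wchar(b[2 .. $], true);
    // Otherwise UTF-8, with or without the EF BB BF mark.
    auto rest = (b.length >= 3 && b[0 .. 3] == [0xEF, 0xBB, 0xBF]) ? b[3 .. $] : b;
    auto s = cast(string) rest.idup;
    validate(s); // the same check the current readText performs
    return s;
}

unittest
{
    // Leading bytes of the wmic output quoted in the exception message.
    immutable ubyte[] wmic = [255, 254, 13, 0, 10, 0, 13, 0, 10, 0, 68, 0, 111, 0];
    assert(decodeWithBom(wmic) == "\r\n\r\nDo");
}
```

To use it in place of readText, the file contents could be passed in as decodeWithBom(cast(const(ubyte)[]) std.file.read(name)). Assembling code units byte by byte, instead of casting the buffer to wstring as shell2 does, keeps the result independent of the machine's byte order.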

readText currently looks like this:

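S readText(S = string)(in char[] name)
{
    // Reinterpret the raw file bytes as S; no conversion takes place.
    auto result = cast(S) read(name);
    // Throws if the bytes are not well-formed in S's encoding.
    std.utf.validate(result);
    return result;
}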
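To see why this rejects wmic's output, the sketch below applies the same cast-then-validate steps to an in-memory buffer (castAndValidate is a hypothetical name; UTFException is the exception type in current Phobos, which may differ from 2.050). The bytes 0xFF and 0xFE never occur in well-formed UTF-8, so validation fails at index 0, consistent with the reported error.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper mirroring the readText body above, but working on a
// byte buffer instead of a file.
string castAndValidate(const(ubyte)[] bytes)
{
    auto result = cast(string) bytes.idup; // reinterpret, no conversion
    validate(result);                      // same check readText performs
    return result;
}

unittest
{
    // The UTF-16LE byte-order mark cannot start a valid UTF-8 sequence.
    immutable ubyte[] wmicPrefix = [255, 254, 13, 0, 10, 0];
    assertThrown!UTFException(castAndValidate(wmicPrefix));

    // Plain ASCII passes through unchanged.
    immutable ubyte[] ascii = [68, 111];
    assert(castAndValidate(ascii) == "Do");
}
```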

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
November 05, 2010

--- Comment #1 from Lars Holowko <lars.holowko@gmail.com> 2010-11-05 12:16:25 PDT ---
Forgot to mention: this is on 2.050.

November 05, 2010

--- Comment #2 from Lars Holowko <lars.holowko@gmail.com> 2010-11-05 16:47:38 PDT ---
Created an attachment (id=801)
replacement std.file.readText that would fix the issue

The attached std.file.readText function uses the UTF encoding detection "algorithm" described in TDPL (The D Programming Language) and does the necessary conversions to fix the described bug.
