Thread overview
Wrong output of quotes in Windows (encoding?)
Hugo Florentino, Dec 18, 2013
Ali Çehreli, Dec 18, 2013
Hugo Florentino, Dec 18, 2013
Ali Çehreli, Dec 18, 2013
Simon, Dec 19, 2013
Hugo Florentino, Dec 19, 2013
December 18, 2013
Hi,

A short while ago I had minor difficulties escaping quotes, and noticed (I don't remember where) a simple function by a D user, which I have now tried to enhance. The problem is that the output is incorrect on Windows (even with Unicode-supporting fonts). I tried to use transcode but could not get it to work.

Check the following code, and please advise me what to do in order to get the correct output:


import std.stdio;

@safe string quote(in string str, in char chr = 'd') pure {
  switch(chr) {
    case 'b': return '`' ~ str ~ '`'; // backtick
    case 'd': return `"` ~ str ~ `"`; // double
    case 'f': return `«` ~ str ~ `»`; // french
    case 's': return `'` ~ str ~ `'`; // single
    case 't': return `“` ~ str ~ `”`; // typographic
    default: return `"` ~ str ~ `"`; // double
  }
}

void main() {
  char[] a = ['b', 'd', 'f', 's', 't'];
  auto input = "just a test";
  foreach(char type; a)
    writefln("Quote type %s:\t%s", type, quote(input, type));
}

December 18, 2013
On 12/18/2013 05:32 AM, Hugo Florentino wrote:

> output is incorrect in Windows (even with unicode-supporting
> fonts).

Is the code page also set to UTF-8? I think you must issue the command 'chcp 65001'.

I have changed your program to print the code units individually in hex. I changed the test string to a single space character so that you can identify it easily in the output:

import std.stdio, std.string;

@safe string quote(in string str, in char chr = 'd') pure {
  switch(chr) {
    case 'b': return '`' ~ str ~ '`'; // backtick
    case 'd': return `"` ~ str ~ `"`; // double
    case 'f': return `«` ~ str ~ `»`; // french
    case 's': return `'` ~ str ~ `'`; // single
    case 't': return `“` ~ str ~ `”`; // typographic
    default: return `"` ~ str ~ `"`; // double
  }
}

void main() {
  char[] a = ['b', 'd', 'f', 's', 't'];
  auto input = " ";
  foreach(char type; a)
      // representation views the UTF-8 string as immutable(ubyte)[]
      writefln("Quote type %s:\t%(%02x %)", type,
               quote(input, type).representation);
}

Does the output of the program look correct according to UTF-8? Then your compiler has produced a correct program. :) Here is the output I get on SL6.1 compiled with dmd v2.065-devel-41ebb59:

Quote type b:	60 20 60
Quote type d:	22 20 22
Quote type f:	c2 ab 20 c2 bb
Quote type s:	27 20 27
Quote type t:	e2 80 9c 20 e2 80 9d

I trust the correctness of this feature of D so much that I am too lazy to check whether those code units correspond to the intended Unicode characters. :)
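The hex dump can also be checked mechanically instead of trusted. A minimal sketch, assuming a reasonably recent Phobos (`std.string.representation`, `std.algorithm.comparison.equal` and `std.utf.decode`; none of these appear elsewhere in the thread): it asserts that each non-ASCII quote character encodes to the bytes listed above and decodes back to the intended code point.

```d
import std.algorithm.comparison : equal;
import std.stdio : writeln;
import std.string : representation;
import std.utf : decode;

void main()
{
    // UTF-8 code units of each non-ASCII quote, as listed in the hex dump.
    assert("«".representation.equal([0xC2, 0xAB]));       // U+00AB
    assert("»".representation.equal([0xC2, 0xBB]));       // U+00BB
    assert("“".representation.equal([0xE2, 0x80, 0x9C])); // U+201C
    assert("”".representation.equal([0xE2, 0x80, 0x9D])); // U+201D

    // Decode the same characters one code point at a time.
    string s = "«»“”";
    size_t i = 0;
    dchar[] points;
    while (i < s.length)
        points ~= decode(s, i); // decode advances i past one code point

    assert(points.equal([0x00AB, 0x00BB, 0x201C, 0x201D]));
    writeln("all four quote characters encode as expected");
}
```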

Ali

December 18, 2013
On Wed, 18 Dec 2013 10:05:49 -0800, Ali Çehreli wrote:
> On 12/18/2013 05:32 AM, Hugo Florentino wrote:
>
>> output is incorrect in Windows (even with unicode-supporting
>> fonts).
>
> Is the code page also set to UTF-8? I think you must issue the
> command 'chcp 65001'.

Changing the codepage worked indeed. Thanks.
Now, how could I do that programmatically, so that if my application runs on a system with a different codepage, the output looks correct?
After all, not all users feel comfortable typing unknown commands.

December 18, 2013
On 12/18/2013 01:17 PM, Hugo Florentino wrote:

> Changing the codepage worked indeed. Thanks.
> Now, how could I do that programmatically, so that if my application
> runs on a system with a different codepage, the output looks correct?

It is not solvable in general, because stdout is nothing but a stream that accepts characters (well, UTF-8 code units when it comes to Unicode); the program cannot know what is on the other end.

The program can detect or assume that it is running in a console and change that environment if it is allowed to do so.
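Detecting whether the program is writing to a console can be sketched as follows. This is an illustration under stated assumptions, not something tried in this thread: on Windows it relies on `GetConsoleMode` failing for handles that are not consoles (for example when output is redirected to a file), and on POSIX systems it uses `isatty`. The kernel32 declarations are written by hand in case the D runtime in use does not provide them (newer druntime releases ship them under `core.sys.windows`).

```d
import std.stdio : writeln;

version (Windows)
{
    // Hand-written kernel32 declarations (HANDLE is void*, DWORD is uint,
    // BOOL is int).
    extern (Windows) nothrow @nogc
    {
        void* GetStdHandle(uint nStdHandle);
        int GetConsoleMode(void* hConsoleHandle, uint* lpMode);
    }
    enum uint STD_OUTPUT_HANDLE = cast(uint) -11;
}
else version (Posix)
{
    import core.sys.posix.unistd : isatty, STDOUT_FILENO;
}

/// Returns true if standard output looks like an interactive console
/// rather than a pipe or a file.
bool stdoutIsConsole()
{
    version (Windows)
    {
        uint mode;
        // GetConsoleMode returns 0 for handles that are not consoles.
        return GetConsoleMode(GetStdHandle(STD_OUTPUT_HANDLE), &mode) != 0;
    }
    else version (Posix)
    {
        return isatty(STDOUT_FILENO) != 0;
    }
    else
    {
        return false; // unknown platform: assume no console
    }
}

void main()
{
    writeln(stdoutIsConsole() ? "stdout is a console" : "stdout is redirected");
}
```

A program could call `stdoutIsConsole()` first and only touch the console settings when it returns `true`; redirected output is just a byte stream and receives the UTF-8 code units unchanged.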

Google searches like "change code page console programmatically windows" produce some answers, but I don't have any experience with this. :)

Ali

December 19, 2013
On 18/12/2013 22:11, Ali Çehreli wrote:
> On 12/18/2013 01:17 PM, Hugo Florentino wrote:
>
>  > Changing the codepage worked indeed. Thanks.
>  > Now, how could I do that programmatically, so that if my application
>  > runs on a system with a different codepage, the output looks correct?

Call:

  SetConsoleOutputCP(65001);

Works for me on Windows 7 64-bit. I'm not sure how far back it's supported, though.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686036(v=vs.85).aspx

You might need to write your own declaration of it; I don't know whether it's available in Phobos's Windows bindings.
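The suggestion can be turned into a small self-contained program. A minimal sketch, assuming a Windows build (kernel32 is normally linked into D programs on Windows) and declaring the functions by hand. It also uses `GetConsoleOutputCP`, which is not mentioned in the thread, to remember the user's code page and restore it when the program exits. On other platforms the `version (Windows)` parts compile away and the program simply prints.

```d
import std.stdio : writeln;

version (Windows)
{
    // Hand-written kernel32 declarations (UINT is uint, BOOL is int).
    extern (Windows) nothrow @nogc
    {
        int SetConsoleOutputCP(uint wCodePageID); // nonzero on success
        uint GetConsoleOutputCP();                // current output code page
    }
}

/// Code page identifier for UTF-8, the same number passed to `chcp 65001`.
enum uint utf8CodePage = 65001;

void main()
{
    version (Windows)
    {
        // `version` does not open a new scope, so these stay visible below.
        immutable uint previous = GetConsoleOutputCP();
        immutable bool switched = SetConsoleOutputCP(utf8CodePage) != 0;
    }

    // Put the console back the way it was, even if an exception escapes main.
    scope (exit)
    {
        version (Windows)
        {
            if (switched)
                SetConsoleOutputCP(previous);
        }
    }

    // D string literals are UTF-8, which now matches the console's code page.
    writeln("Quote type f:\t«just a test»");
    writeln("Quote type t:\t“just a test”");
}
```

Restoring the previous code page is a design choice: the setting belongs to the console window rather than to the process (which is why running `chcp`, itself a separate program, has a lasting effect), so without it the console would stay at 65001 after the program exits.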

December 19, 2013
On Thu, 19 Dec 2013 19:38:20 +0000, Simon wrote:
> Call:
>
>   SetConsoleOutputCP(65001);
>
> Works for me on win7 64bit. Not sure how far back it's supported though.

Interesting, thanks.