Thread overview
DMD 144 Windows command line not in Locale
Jan 25, 2006
Georg Wrede
Jan 25, 2006
Tom S
Jan 25, 2006
Georg Wrede
Jan 25, 2006
Sean Kelly
Jan 25, 2006
Chris Miller
January 25, 2006
Below, I pass some arguments to hello in a DOS window, and it gets the non-US-ASCII characters wrong.

My Windows 2000 is set to the Finland locale.

-----------

C:\dmd\samples\d>dmd hello.d
C:\dmd\bin\..\..\dm\bin\link.exe hello,,,user32+kernel32/noi;

C:\dmd\samples\d>hello itse senkin parjaava ämälämkä k,,,ölkjölkjölkj s
hello world
args.length = 7
args[0] = 'C:\dmd\samples\d\hello.exe'
args[1] = 'itse'
args[2] = 'senkin'
args[3] = 'parjaava'
args[4] = 'õmõlõmkõ'
args[5] = 'k,,,÷lkj÷lkj÷lkj'
args[6] = 's'

C:\dmd\samples\d>hello öööÖÖÖäÄåÅ
hello world
args.length = 1
args[0] = 'C:\dmd\samples\d\hello.exe'

C:\dmd\samples\d>hello öööÖÖÖäÄåÅ x
hello world
args.length = 2
args[0] = 'C:\dmd\samples\d\hello.exe'
args[1] = 'x'

C:\dmd\samples\d>hello aöööa x
hello world
args.length = 3
args[0] = 'C:\dmd\samples\d\hello.exe'
args[1] = 'a÷÷÷a'
args[2] = 'x'

C:\dmd\samples\d>hello öööö x
hello world
args.length = 2
args[0] = 'C:\dmd\samples\d\hello.exe'
args[1] = 'x'

C:\dmd\samples\d>dmd
Digital Mars D Compiler v0.144
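
The substituted characters in the transcript above (ä shown as õ, ö shown as ÷) match what happens when bytes stored in one Windows code page are displayed in another. The following is a minimal Python sketch of that reading, not D code and not anything taken from the DMD runtime. It assumes the arguments arrive as Windows-1252 bytes and the console displays code page 850; neither code page is stated in the thread, and the helper name `misrender` is made up for illustration. It does not account for the arguments that go missing entirely.

```python
# Hypothetical helper: encode text in one code page, then read the same
# bytes back as if they were in another code page.
# Assumption: cp1252 for the stored bytes and cp850 for the console.
def misrender(text, stored_as="cp1252", shown_as="cp850"):
    raw = text.encode(stored_as)   # bytes as the program would receive them
    return raw.decode(shown_as)    # glyphs the console would draw for them

if __name__ == "__main__":
    for arg in ["ämälämkä", "k,,,ölkjölkjölkj", "aöööa"]:
        print(repr(arg), "->", repr(misrender(arg)))
```

Under these assumptions the three arguments come out exactly as printed in the transcript.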
January 25, 2006
Georg Wrede wrote:
> Below, I pass some arguments to hello in a DOS window, and it gets the non-US-ASCII characters wrong.

The console is probably using an encoding other than UTF-8. Try executing "chcp 65001" before running the D program.


-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d-pu s+: a-->----- C+++$>++++ UL P+ L+ E--- W++ N++ o? K? w++ !O !M V? PS- PE- Y PGP t 5 X? R tv-- b DI- D+ G e>+++ h>++ !r !y
------END GEEK CODE BLOCK------

Tomasz Stachowiak  /+ a.k.a. h3r3tic +/
January 25, 2006
Tom S wrote:
> Georg Wrede wrote:
> 
>> Below, I pass some arguments to hello in a DOS window, and it gets the non-US-ASCII characters wrong.
> 
> The console is probably using an encoding other than UTF-8. Try executing "chcp 65001" before running the D program.

Ehh, that's not the point, is it?
January 25, 2006
Georg Wrede wrote:

>> The console is probably using an encoding other than UTF-8. Try executing "chcp 65001" before running the D program.
> 
> Ehh, that's not the point, is it?

Not sure where the "bug" is here, though...

Either D does not "support" any console encoding other than UTF-8, and using
anything else falls into undefined behaviour (i.e. what it does now).

Or it is *meant* to convert the local encoding into UTF-8 before
stuffing it into the args[][] table. Right now, it just copies whatever it gets.


It's pretty simple to get invalid Unicode in the arguments array...
Just as it's very simple to get undefined return codes* from programs?

But I haven't heard if the problem is in the spec or the implementation.

--anders

* void main() {}
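
To illustrate how easily invalid Unicode can end up in the arguments array when bytes are copied unconverted, here is a small Python sketch. It assumes the argument arrives as Windows-1252 bytes, which the thread does not confirm, and `is_valid_utf8` is a hypothetical helper name.

```python
# Hypothetical check: does a byte string decode cleanly as UTF-8?
def is_valid_utf8(raw):
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

if __name__ == "__main__":
    # Assumption: the raw argument is in Windows-1252, copied as-is.
    copied_as_is = "ämälämkä".encode("cp1252")
    converted = "ämälämkä".encode("utf-8")
    print(is_valid_utf8(copied_as_is))
    print(is_valid_utf8(converted))
```

In Windows-1252, ä is the single byte 0xE4, which UTF-8 treats as the start of a three-byte sequence, so the unconverted bytes fail validation while the converted ones pass.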
January 25, 2006
Anders F Björklund wrote:
> Georg Wrede wrote:
> 
>>> The console is probably using an encoding other than UTF-8. Try executing "chcp 65001" before running the D program.
>>
>> Ehh, that's not the point, is it?
> 
> Not sure where the "bug" is here, though...
> 
> Either D does not "support" any console encoding other than UTF-8, and using
> anything else falls into undefined behaviour (i.e. what it does now).
> 
> Or it is *meant* to convert the local encoding into UTF-8 before
> stuffing it into the args[][] table. Right now, it just copies whatever it gets.
> 
> 
> It's pretty simple to get invalid Unicode in the arguments array...
> Just as it's very simple to get undefined return codes* from programs?
> 
> But I haven't heard if the problem is in the spec or the implementation.

I think the data should be converted, if possible, by the compiler runtime before main() is called. This should simply be a matter of adding some code to dmain2.d in phobos/internal. How should this be handled? I.e., is there a specific known console encoding that it should convert from?


Sean
January 25, 2006
On Wed, 25 Jan 2006 17:49:44 -0500, Sean Kelly <sean@f4.ca> wrote:

> Anders F Björklund wrote:
>> Georg Wrede wrote:
>>
>>>> The console is probably using an encoding other than UTF-8. Try executing "chcp 65001" before running the D program.
>>>
>>> Ehh, that's not the point, is it?
>>
>> Not sure where the "bug" is here, though...
>>
>> Either D does not "support" any console encoding other than UTF-8, and using
>> anything else falls into undefined behaviour (i.e. what it does now).
>>
>> Or it is *meant* to convert the local encoding into UTF-8 before
>> stuffing it into the args[][] table. Right now, it just copies whatever it gets.
>>
>> It's pretty simple to get invalid Unicode in the arguments array...
>> Just as it's very simple to get undefined return codes* from programs?
>>
>> But I haven't heard if the problem is in the spec or the implementation.
>
> I think the data should be converted, if possible, by the compiler runtime before main() is called. This should simply be a matter of adding some code to dmain2.d in phobos/internal. How should this be handled? I.e., is there a specific known console encoding that it should convert from?
>
>
> Sean

It's simple to fix main's args[][]: just use GetCommandLineW() (which is supported on Win9x/ME) and run the result through std.utf.toUTF8(). I announced this simple fix a while ago.
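
The transcoding step this fix relies on can be sketched in a few lines. This is a Python stand-in, not the D patch itself: the UTF-16 buffer is simulated rather than obtained from GetCommandLineW(), the function name is hypothetical, and arguments are split on whitespace only, which ignores the quoting rules a real command-line parser has to handle.

```python
# Hypothetical sketch: turn a UTF-16LE command line (the form a wide-character
# API hands back) into a list of UTF-8 encoded arguments.
def utf16_command_line_to_utf8_args(wide_bytes):
    text = wide_bytes.decode("utf-16-le")               # wide chars -> text
    return [arg.encode("utf-8") for arg in text.split()]  # naive split, then UTF-8

if __name__ == "__main__":
    # Simulated buffer standing in for the wide-character command line.
    simulated = "hello aöööa x".encode("utf-16-le")
    for arg in utf16_command_line_to_utf8_args(simulated):
        print(arg)
```

Because the wide form is Unicode no matter which code page the console uses, every argument converted this way is valid UTF-8.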