Thread overview
Commandline arguments and UTF8 error
Feb 21, 2010
Nils Hensel
Feb 22, 2010
Daniel Keep
Feb 22, 2010
Nils Hensel
Feb 22, 2010
Nils Hensel
Feb 22, 2010
Don
Feb 22, 2010
Jacob Carlborg
Feb 23, 2010
Daniel Keep
Feb 23, 2010
Jacob Carlborg
February 21, 2010
Hello, group!

I have a problem writing a small console tool that needs to be given file names as commandline arguments. Not a difficult task, one might assume. But every time a filename contains an umlaut (ä, ö, ü etc.) I receive "Error: 4invalid UTF-8 sequence".

Here's the sample code:

import std.stdio;

int main(string[] argv)
{
   foreach (arg; argv)
   {
      writef(arg);
   }
   return 0;
}

I use dmd v1.046 by the way.

How do I make the argument valid? I need to be able to use std.path and std.file methods on the file names.

Any help would be greatly appreciated.

Regards,
Nils Hensel
February 22, 2010

Nils Hensel wrote:
> Hello, group!
> 
> I have a problem writing a small console tool that needs to be given file names as commandline arguments. Not a difficult task, one might assume. But every time a filename contains an umlaut (ä, ö, ü etc.) I receive "Error: 4invalid UTF-8 sequence".
> 
> Here's the sample code:
> 
> import std.stdio;
> 
> int main(string[] argv)
> {
>    foreach (arg; argv)
>    {
>       writef(arg);
>    }
>    return 0;
> }
> 
> I use dmd v1.046 by the way.
> 
> How do I make the argument valid? I need to be able to use std.path and std.file methods on the file names.
> 
> Any help would be greatly appreciated.
> 
> Regards,
> Nils Hensel

If you look at the real main function in src\phobos\internal\dmain2.d, you'll see this somewhere around line 109 (I'm using 1.051, but it's unlikely to be much different in an earlier version):

> for (size_t i = 0; i < argc; i++)
> {
>     auto len = strlen(argv[i]);
>     am[i] = argv[i][0 .. len];
> }
>
> args = am[0 .. argc];
>
> result = main(args);

In other words, Phobos never bothers to actually convert the arguments to UTF-8.

Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
trunk).
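
A minimal sketch of why the unconverted bytes trip the UTF-8 check. It assumes D2 and Phobos' std.utf.validate (the thread itself concerns D1), and it assumes the arguments arrive in a single-byte code page such as Windows-1252, where 'ä' is the lone byte 0xE4. The helper name isWellFormedUtf8 and the sample bytes are illustrative, not taken from Phobos or Tango.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Returns true if `s` is well-formed UTF-8 (hypothetical helper).
bool isWellFormedUtf8(const(char)[] s)
{
    try
    {
        validate(s);          // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // "Käse" as Windows-1252 bytes: 'ä' is the single byte 0xE4.
    // 0xE4 announces a three-byte UTF-8 sequence, but 0x73 ('s') is
    // not a continuation byte, so the sequence is malformed.
    immutable(ubyte)[] raw = [0x4B, 0xE4, 0x73, 0x65];
    writeln(isWellFormedUtf8(cast(string) raw)); // prints false

    // The same word encoded as UTF-8 passes the check.
    writeln(isWellFormedUtf8("K\u00E4se"));      // prints true
}
```

A check like this only detects the problem; the bytes still have to be transcoded from whatever code page the system actually uses.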
February 22, 2010
Daniel Keep wrote:
> If you look at the real main function in src\phobos\internal\dmain2.d, you'll see this somewhere around line 109 (I'm using 1.051, but it's unlikely to be much different in an earlier version):
> 
>> for (size_t i = 0; i < argc; i++)
>> {
>>     auto len = strlen(argv[i]);
>>     am[i] = argv[i][0 .. len];
>> }
>>
>> args = am[0 .. argc];
>>
>> result = main(args);
> 
> In other words, Phobos never bothers to actually convert the arguments to UTF-8.

Hmm, I really can't see any benefit. Did Walter ever comment on this matter? Surely, I can't be the only one who is unable to use D for something as mundane as a command line tool that takes file names for arguments?

> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
> trunk).

Actually I was trying to avoid Tango. For one, I'm not too fond of the interface [Stdout.format(...).newline just seems awkward and unnecessarily complicated compared to writef(...)]. Also, I use derelict which I don't believe supports Tango yet. And I liked the out-of-the-box feeling of Phobos, which is supposedly the standard.

Guess I have to make up my mind if all the extra hassle of installing and learning (and updating) another and utterly different "standard" library outweighs the benefits of developing in D.

Thanks a lot for your response!

Regards,
Nils
February 22, 2010
Nils Hensel wrote:
> Daniel Keep wrote:
>> If you look at the real main function in src\phobos\internal\dmain2.d,
>> you'll see this somewhere around line 109 (I'm using 1.051, but it's
>> unlikely to be much different in an earlier version):
>>
>>> for (size_t i = 0; i < argc; i++)
>>> {
>>>     auto len = strlen(argv[i]);
>>>     am[i] = argv[i][0 .. len];
>>> }
>>>
>>> args = am[0 .. argc];
>>>
>>> result = main(args);
>> In other words, Phobos never bothers to actually convert the arguments
>> to UTF-8.
> 
> Hmm, I really can't see any benefit. Did Walter ever comment on this
> matter? Surely, I can't be the only one who is unable to use D for
> something as mundane as a command line tool that takes file names for
> arguments?
> 
>> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
>> trunk).
> 
> Actually I was trying to avoid Tango. For one I'm not too fond of the
> interface [Stdout.format(...).newline just seems awkward and
> unnecessarily complicated compared to writef(...)]. Also, I use derelict
> which I don't believe supports Tango yet. And I liked the
> out-of-the-box-feeling of Phobos which is supposedly the standard.
> 
> Guess I have to make up my mind if all the extra hassle of installing
> and learning (and updating) another and utterly different "standard"
> library outweighs the benefits of developing in D.

My humble opinion is that instead of doing that, you should consider switching to D2.  Most D1 code should compile as D2 code (the most common change will be inout->ref), and Phobos2 has the same "feel" as Phobos1, just a lot better and more extensive.  Specifically, it has std.encoding, which may aid you in decoding filenames from your file system's character set.

If D2 is not an option, you can always look at the std.encoding source code and write your own YourEncoding->UTF-8 function:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/encoding.d

-Lars
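
A minimal sketch of such a hand-written conversion, assuming the input is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. The function name latin1ToUtf8 is hypothetical, and the code is written for D2. Windows-1252 differs from Latin-1 in the range 0x80 to 0x9F, so a real tool on Windows would need a lookup table for those bytes or the system's own conversion routines.

```d
import std.stdio : writeln;

/// Converts ISO-8859-1 (Latin-1) bytes to a UTF-8 string (hypothetical helper).
string latin1ToUtf8(const(ubyte)[] input)
{
    char[] result;
    result.reserve(input.length * 2);   // worst case: two bytes per input byte

    foreach (b; input)
    {
        if (b < 0x80)
        {
            // ASCII is identical in Latin-1 and UTF-8.
            result ~= cast(char) b;
        }
        else
        {
            // Code points U+0080..U+00FF take two bytes: 110xxxxx 10xxxxxx.
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result.idup;
}

void main()
{
    // "Käse" in Latin-1: 'ä' is 0xE4, which becomes 0xC3 0xA4 in UTF-8.
    const(ubyte)[] raw = [0x4B, 0xE4, 0x73, 0x65];
    assert(latin1ToUtf8(raw) == "K\u00E4se");
    writeln(latin1ToUtf8(raw));
}
```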

February 22, 2010
Lars T. Kyllingstad wrote:
> Nils Hensel wrote:
>> Daniel Keep wrote:
>>> If you look at the real main function in src\phobos\internal\dmain2.d,
>>> you'll see this somewhere around line 109 (I'm using 1.051, but it's
>>> unlikely to be much different in an earlier version):
>>>
>>>> for (size_t i = 0; i < argc; i++)
>>>> {
>>>>     auto len = strlen(argv[i]);
>>>>     am[i] = argv[i][0 .. len];
>>>> }
>>>>
>>>> args = am[0 .. argc];
>>>>
>>>> result = main(args);
>>> In other words, Phobos never bothers to actually convert the arguments
>>> to UTF-8.
>>
>> Hmm, I really can't see any benefit. Did Walter ever comment on this
>> matter? Surely, I can't be the only one who is unable to use D for
>> something as mundane as a command line tool that takes file names for
>> arguments?
>>
>>> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
>>> trunk).
>>
>> Actually I was trying to avoid Tango. For one I'm not too fond of the
>> interface [Stdout.format(...).newline just seems awkward and
>> unnecessarily complicated compared to writef(...)]. Also, I use derelict
>> which I don't believe supports Tango yet. And I liked the
>> out-of-the-box-feeling of Phobos which is supposedly the standard.
>>
>> Guess I have to make up my mind if all the extra hassle of installing
>> and learning (and updating) another and utterly different "standard"
>> library outweighs the benefits of developing in D.
> 
> My humble opinion is that instead of doing that, you should consider switching to D2.  Most D1 code should compile as D2 code (the most common change will be inout->ref), and Phobos2 has the same "feel" as Phobos1, just a lot better and more extensive.  Specifically, it has std.encoding, which may aid you in decoding filenames from your file system's character set.
> 
> If D2 is not an option, you can always look at the std.encoding source code and write your own YourEncoding->UTF-8 function:
> 
> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/encoding.d

I just realised that D2 also does what Daniel says Tango does.  I guess this is because D2's runtime, druntime, is based on Tango's runtime.  So most likely you don't need to use std.encoding after all.

-Lars
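
A minimal D2 sketch of the original tool, assuming the runtime hands main UTF-8 arguments as described above. The helper name describe is hypothetical; baseName, extension and exists are the current std.path and std.file names and may differ in older D2 releases. Passing the argument through a "%s" placeholder also avoids using a file name as the format string, which writef(arg) does.

```d
import std.file : exists;
import std.format : format;
import std.path : baseName, extension;
import std.stdio : writeln;

/// Summarises one file-name argument (hypothetical helper).
string describe(string path)
{
    return format("%s: base name %s, extension %s, %s",
                  path, baseName(path), extension(path),
                  exists(path) ? "found" : "missing");
}

void main(string[] args)
{
    // args[0] is the program name; the file names start at index 1.
    foreach (arg; args[1 .. $])
        writeln(describe(arg));
}
```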
February 22, 2010
Lars T. Kyllingstad wrote:
> I just realised that D2 also does what Daniel says Tango does.  I guess this is because D2's runtime, druntime, is based on Tango's runtime.  So most likely you don't need to use std.encoding after all.

Really? So all I'd need to do would be to switch to D2? I'd be fine with that if D2 were stable enough and derelict and dfl (and probably wxD) were available.

Thanks for the info, Lars!

Any opinions about D2? How much of a beta is it? Does one have to adjust code often because of language changes? What about debugging?

I've been using D since before 1.0 but I never made the transition over to D2.

Regards,
Nils
February 22, 2010
Nils Hensel wrote:
> Lars T. Kyllingstad wrote:
>> I just realised that D2 also does what Daniel says Tango does.  I guess
>> this is because D2's runtime, druntime, is based on Tango's runtime.  So
>> most likely you don't need to use std.encoding after all.
> 
> Really? So all I'd need to do would be to switch to D2? I'd be fine with
> that if D2 were stable enough and derelict and dfl (and probably wxD)
> were available.
> 
> Thanks for the info, Lars!
> 
> Any opinions about D2? How much of a beta is it? Does one have to adjust
> code often because of language changes? What about debugging?

It began the freezing process last week. There are major changes to operator overloading which are implemented but not yet officially released, but no further major changes will occur to the language. Some smaller semantic issues will be changed in the next couple of months, but after that it'll just be bug fixes.
So you WILL need to change a fair amount of code in two months' time, but after that, hardly at all.

Phobos will remain in a state of flux for some time, however.

If you're on Windows, the major D2 bug to be aware of is bugzilla bug 3342. It may be a blocker.

> 
> I've been using D since before 1.0 but I never made the transition over
> to D2.
> 
> Regards,
> Nils
February 22, 2010
On 2010-02-22 15.39, Nils Hensel wrote:
> Daniel Keep wrote:
>> If you look at the real main function in src\phobos\internal\dmain2.d,
>> you'll see this somewhere around line 109 (I'm using 1.051, but it's
>> unlikely to be much different in an earlier version):
>>
>>> for (size_t i = 0; i < argc; i++)
>>> {
>>>     auto len = strlen(argv[i]);
>>>     am[i] = argv[i][0 .. len];
>>> }
>>>
>>> args = am[0 .. argc];
>>>
>>> result = main(args);
>>
>> In other words, Phobos never bothers to actually convert the arguments
>> to UTF-8.
>
> Hmm, I really can't see any benefit. Did Walter ever comment on this
> matter? Surely, I can't be the only one who is unable to use D for
> something as mundane as a command line tool that takes file names for
> arguments?
>
>> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
>> trunk).
>
> Actually I was trying to avoid Tango. For one I'm not too fond of the
> interface [Stdout.format(...).newline just seems awkward and
> unnecessarily complicated compared to writef(...)]. Also, I use derelict
> which I don't believe supports Tango yet. And I liked the
> out-of-the-box-feeling of Phobos which is supposedly the standard.

You can use derelict with Tango. I agree with you about Stdout.format. You can create wrappers like this:

void writeln (ARGS...) (ARGS args)
{
	foreach (arg ; args)
		Stdout(arg);

	Stdout().newline;
}

void writefln (ARGS...) (char[] str, ARGS args)
{
	foreach (arg ; args)
		Stdout.format(str, arg);

	Stdout().newline;
}

> Guess I have to make up my mind if all the extra hassle of installing
> and learning (and updating) another and utterly different "standard"
> library outweighs the benefits of developing in D.

You can download dmd bundled with Tango from Tango's website.

> Thanks a lot for your response!
>
> Regards,
> Nils

February 23, 2010

Jacob Carlborg wrote:
> On 2010-02-22 15.39, Nils Hensel wrote:
>> Daniel Keep wrote:
>>> If you look at the real main function in src\phobos\internal\dmain2.d, you'll see this somewhere around line 109 (I'm using 1.051, but it's unlikely to be much different in an earlier version):
>>>
>>>> for (size_t i = 0; i < argc; i++)
>>>> {
>>>>     auto len = strlen(argv[i]);
>>>>     am[i] = argv[i][0 .. len];
>>>> }
>>>>
>>>> args = am[0 .. argc];
>>>>
>>>> result = main(args);
>>>
>>> In other words, Phobos never bothers to actually convert the arguments to UTF-8.
>>
>> Hmm, I really can't see any benefit. Did Walter ever comment on this matter? Surely, I can't be the only one who is unable to use D for something as mundane as a command line tool that takes file names for arguments?
>>
>>> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
>>> trunk).
>>
>> Actually I was trying to avoid Tango. For one I'm not too fond of the
>> interface [Stdout.format(...).newline just seems awkward and
>> unnecessarily complicated compared to writef(...)].

It *is* more verbose.  It's one of the few things I've never liked about Tango.

That said, the justification for it is that Stdout.format / Stdout.formatln is significantly clearer.  Plus, you also get an Stderr version as well.

>> Also, I use derelict
>> which I don't believe supports Tango yet.

I'm fairly certain it should.  I'm positive I've used them together in the past.

>> And I liked the
>> out-of-the-box-feeling of Phobos which is supposedly the standard.

That's a bit like not using Boost because there's the C standard library.

Whilst Tango is not a strict superset of Phobos, it generally does more and does it better.  For example, it actually makes the effort to decode command line arguments.  :P

> You can use derelict with Tango. I agree with you about Stdout.format. You can create wrappers like this:
> 
> void writeln (ARGS...) (ARGS args)
> {
>     foreach (arg ; args)
>         Stdout(arg);
> 
>     Stdout().newline;
> }
> 
> void writefln (ARGS...) (char[] str, ARGS args)
> {
>     foreach (arg ; args)
>         Stdout.format(str, arg);
> 
>     Stdout().newline;
> }

Shouldn't that be

void writefln(Args...)(char[] str, Args args)
{
    Stdout.formatln(str, args);
}

Incidentally, you don't need the `()`s before `.newline`.

>> Guess I have to make up my mind if all the extra hassle of installing and learning (and updating) another and utterly different "standard" library outweighs the benefits of developing in D.

Having written projects using both Phobos and Tango (not in the same project, mind you), I'd say Tango is very much worth the effort.

Just... just don't use the Zip module.  It's complete and utter crap.
February 23, 2010
On 2/23/10 01:35, Daniel Keep wrote:
>
>
> Jacob Carlborg wrote:
>> On 2010-02-22 15.39, Nils Hensel wrote:
>>> Daniel Keep wrote:
>>>> If you look at the real main function in src\phobos\internal\dmain2.d,
>>>> you'll see this somewhere around line 109 (I'm using 1.051, but it's
>>>> unlikely to be much different in an earlier version):
>>>>
>>>>> for (size_t i = 0; i < argc; i++)
>>>>> {
>>>>>     auto len = strlen(argv[i]);
>>>>>     am[i] = argv[i][0 .. len];
>>>>> }
>>>>>
>>>>> args = am[0 .. argc];
>>>>>
>>>>> result = main(args);
>>>>
>>>> In other words, Phobos never bothers to actually convert the arguments
>>>> to UTF-8.
>>>
>>> Hmm, I really can't see any benefit. Did Walter ever comment on this
>>> matter? Surely, I can't be the only one who is unable to use D for
>>> something as mundane as a command line tool that takes file names for
>>> arguments?
>>>
>>>> Tango does (tango\core\rt\compiler\dmd\rt\dmain2.d:238 for a recent-ish
>>>> trunk).
>>>
>>> Actually I was trying to avoid Tango. For one I'm not too fond of the
>>> interface [Stdout.format(...).newline just seems awkward and
>>> unnecessarily complicated compared to writef(...)].
>
> It *is* more verbose.  It's one of the few things I've never liked about
> Tango.
>
> That said, the justification for it is that Stdout.format /
> Stdout.formatln is significantly clearer.  Plus, you also get an Stderr
> version as well.
>
>>> Also, I use derelict
>>> which I don't believe supports Tango yet.
>
> I'm fairly certain it should.  I'm positive I've used them together in
> the past.
>
>>> And I liked the
>>> out-of-the-box-feeling of Phobos which is supposedly the standard.
>
> That's a bit like not using Boost because there's the C standard library.
>
> Whilst Tango is not a strict superset of Phobos, it generally does more
> and does it better.  For example, it actually makes the effort to decode
> command line arguments.  :P
>
>> You can use derelict with Tango. I agree with you about Stdout.format.
>> You can create wrappers like this:
>>
>> void writeln (ARGS...) (ARGS args)
>> {
>>      foreach (arg ; args)
>>          Stdout(arg);
>>
>>      Stdout().newline;
>> }
>>
>> void writefln (ARGS...) (char[] str, ARGS args)
>> {
>>      foreach (arg ; args)
>>          Stdout.format(str, arg);
>>
>>      Stdout().newline;
>> }
>
> Shouldn't that be
>
> void writefln(Args...)(char[] str, Args args)
> {
>      Stdout.formatln(str, args);
> }

Yes, of course, my mistake.

> Incidentally, you don't need the `()`s before `.newline`.
>
>>> Guess I have to make up my mind if all the extra hassle of installing
>>> and learning (and updating) another and utterly different "standard"
>>> library outweighs the benefits of developing in D.
>
> Having written projects using both Phobos and Tango (not in the same
> project, mind you), I'd say Tango is very much worth the effort.
>
> Just... just don't use the Zip module.  It's complete and utter crap.