February 25, 2013
On Monday, 25 February 2013 at 06:41:32 UTC, Lars T. Kyllingstad wrote:
> Sure, I can think of another example.  But I wouldn't read too much into this one; it was never meant as a demonstration of the "correct" way to open a web page.   It was just a simple example of spawnProcess() usage that uses a cross-platform application everyone's heard of.
>
> After all, you *could* argue this way about almost any kind of application which wasn't just invented for the sake of the example.  (In the last one, shouldn't we open the user's preferred word processor, etc?)

The question is, what is the intent? Is it to just open some URL, or to specifically start Firefox? The same applies to the word processor case - if the document is in a file format understood by several applications, is the intent to simply open the document, or to open the document in that specific application?

Now, the documentation clearly says that the example specifically launches Firefox. However, that doesn't mean that someone won't reach out for that example when hastily putting together an application that needs to open an URL. After all, it's at the top of the file, and they may not even know about the existence of the browse function which actually does what they intend.

How about using "lynx -dump http://dlang.org/"? Dumping a text representation of a webpage is a feature specific to lynx, so the intent is clearer.

>> 2. (Nitpick) The grep example uses a POSIX quoting syntax (single quotes). Would be better to use double quotes, or pass as array to avoid one more needless OS-specific element.
>
> Actually, the quotes can just be removed altogether.

OK, and now it's worse: your example uses syntax that's specific to std.process2. If you type that command in the shell, you'll get different behavior (the backslash will escape the . as a shell escape, not a RE escape).

> If anyone has a good idea for sample code which will be familiar to users of all platforms, please speak up.

If we restrict ourselves to programs that would already work for all users, there's not much to pick from: the standard Windows and POSIX command-line utilities barely overlap, although there's also the programs included with D. Maybe include dmd, rdmd or dman in some examples?

> Personally, I don't think they should be part of the public API.  They are inherently platform-specific, and we've tried to keep the module as platform-agnostic as possible.

Constructing scripts is bound to be platform-specific. The current module version allows constructing batch files on POSIX.

Here's a practical use case example for this feature: DMD uses the same syntax for response files on all platforms, and it follows the Windows command-line parsing rules. Currently, rdmd uses escapeWindowsArgument to build that response file on all platforms.

> Besides, they are not really usable with any of the other functions, and I am afraid it will be interpreted that way if we make them public.

This is actually a design problem in the new module, which I haven't discussed yet. Have a look at the very last example in the current std.process docs. How do you accomplish that correctly in the new version, without manually piping the inputs yourself? You can't.

>> 5. The spawnProcess versions which take a single string command simply call std.string.split on the command. I believe this approach to be fundamentally wrong, as passing an argument in quotes will not work as expected. Furthermore, on Windows (where process arguments are passed as a single string), splitting the string in spawnProcess and then putting it back together in spawnProcessWindows will result in a final command line that is different from the one supplied by the user.
>
> The whole point was to avoid any kind of arcane platform-specific quoting and escaping rules.  If you have spaces inside your command line arguments, use the functions that take an array, and spawnProcess() will properly quote them.  If you have any other funky characters in there, don't worry about it, spawnProcess() will properly escape them.
>
> The way it is now, the rules (if you can call it that) are exceedingly simple, and they are the same on all platforms.  This has the added benefit of discouraging platform-dependent client code.

OK, then picture the following situation.

A user of the new module starts using the module, and invokes a specific command using the spawnProcess overload that takes it as a single string. Convenient, right? Then, as the program evolves, the string becomes an enum, then a config variable, which the user can adjust.

Then, a end-user tries setting the config variable to a path that contains spaces, and everything breaks. Wrapping the path in quotes does not help either. Due to the way the function is designed, it is IMPOSSIBLE for the end-user to configure the application to launch a program located at a path containing spaces. To end-users, this comes off as a classical problem in badly written applications that don't handle command-line escaping properly.

This problem is as with any case of an interface which works in simple cases, but behaves unexpectedly in more complicated cases: it is bad design (convenience or not), and must be avoided.

I suggest that either the overloads which take a single string be removed, or that they spawn a shell instead, and let the shell do the command-line splitting. Together with my command and filename escaping functions, they should allow the user to achieve any combination of executing commands with arbitrary punctuation in the program path or arguments, as well as redirecting the output to files (again, with correctly-escaped filenames) or other programs using the existing shell syntax present on both platforms.
February 25, 2013
On 2/25/13, Lars T. Kyllingstad <public@kyllingen.net> wrote:
> Personally, I don't think they should be part of the public API.

They're extremely useful, especially when you have to deal with Optlink or other software on win32.

> How about we put them somewhere in the std.windows package?
> (std.windows.util, for example?)

You can, but they should be public and available to all system (that
means no version(Windows) shenanigans).

E.g. someone might want to build a cross-platform build tool and invoke the compiler/linker manually (even if it's just an app sending commands remotely but running on a Posix system), in that case these escaping functions are very useful to have.

> If you have spaces inside your
> command line arguments, use the functions that take an array, and
> spawnProcess() will properly quote them.

What if we get a command from somewhere else as a single string and don't know if it has spaces?
February 25, 2013
On Monday, 25 February 2013 at 15:07:47 UTC, Vladimir Panteleev wrote:
> I suggest that either the overloads which take a single string be removed, or that they spawn a shell instead, and let the shell do the command-line splitting. Together with my command and filename escaping functions, they should allow the user to achieve any combination of executing commands with arbitrary punctuation in the program path or arguments, as well as redirecting the output to files (again, with correctly-escaped filenames) or other programs using the existing shell syntax present on both platforms.

I concur that they should be removed. If the user wants the behaviour of split(), the user can use split() explicitly and the serious implications of that will be out in the open rather than buried in standard library source code and documentation.
February 25, 2013
On Sunday, 24 February 2013 at 07:58:40 UTC, Andrei Alexandrescu wrote:
> On 2/24/13 4:58 AM, H. S. Teoh wrote:
>> I find this rather frustrating... sometimes it feels like Phobos is
>> suffering from premature standardization - we have a module with a
>> design that isn't very good, but just because it somehow got put into
>> Phobos, now it has to stick, no matter what.
>
> It's a good sign - growing pains and acquiring users and all. Python broke even "hello, world" from one major release to another.
>
> Andrei

I don't think this is true at all.
With respect -- I think Walter has absolutely no clue about backwards compatibility and deprecation.

Here's how it should work:
1. You make promises  (about future compatibility).
2. You keep those promises.

Walter tries to do (2). without doing (1). The result is the insanity we've had for years. It means an unpredictable, unplanned set of often undesirable behaviour is preserved, that doesn't help stability anyway.

We need to do (1).

Can we please stop pretending this is acceptable?
It's not "growing pains" or anything like that. It's a basic misunderstanding of stability.
February 25, 2013
On Monday, 25 February 2013 at 00:15:21 UTC, Vladimir Panteleev wrote:
> On Sunday, 24 February 2013 at 14:43:50 UTC, Lars T. Kyllingstad wrote:
>> [snip]
>
> Hi Lars,
>
> First of all, about environment. I think the old behavior makes more sense.
>
> I think you had a good point about making it behave like an associative array. I would expect using opIndex with an inexisting key to throw. Subtle deviations of behavior for types that generally behave like well-known types can introduce latent bugs.
>
> The danger is even more potent in the case of environment variables, as those are often used for constructing command-lines and such. If attempting to get the value of an inexisting variable now returns null, which is used to build a command line, unexpected things can happen.
>
> For example, let's say that you're writing a program for analyzing malware, which expects $BINDUMP to be set to the path of some analysis tool. So it runs environment["BINDUMP"] ~ args[1] - however, if BINDUMP is unset, the program runs the malware executable itself.
>
> For another example, here's this classic catastrophic bug in shell scripts:
>
> rm -rf $FOO/$BAR
>
> What happens if $FOO and $BAR are unset?
>
> One thing that I think is missing from the environment object is opIn_r. Implementing opIn_r would allow users to more safely explicitly check if a variable is set or not, and is more readable than environment.get("FOO", null).
>
> And of course, there's the issue of people migrating code from the old module version to the new one: if they relied on the old behavior, the code can break in unexpected ways after the migration.
>
> What are your specific reasons for changing environment's behavior?

My reasons were what I said in my other post:  In the time I have been using the 'environment' API -- that is, for 2 1/2 years (I checked) -- I don't think there is a *single* time when I've chosen environment[var] over environment.get(var, null).

The thing about the process environment, as opposed to an associative array inside your own program, is that you can never be certain which variables are defined and which aren't.  This means that you will almost *always* have to check whether a variable exists before using it, thus rendering opIndex() pretty much useless for most cases.

Furthermore, I really don't think it is too much to expect that a user of a systems language such as D checks the return values of functions that may return a 'null' value.

However, I also think that quick'n dirty scripting is an extremely compelling use case for D, and in that case, your point is well taken.  (I also get your arguments about backwards compatibility and not deviating from the AA interface, but that was what did it for me.)

I am now on the fence about this.


> Speaking of shells, I noticed you hardcode cmd.exe in std.process2. That's another bug, it should look at the COMSPEC variable.

Thanks, I didn't know that.  On POSIX, the -c switch is standard, and works on most, if not all, shells.  Can we assume that /C is equally standardised on Windows shells?


> Also, about the shell function, I noticed this recent bug report:
> http://d.puremagic.com/issues/show_bug.cgi?id=9444
>
> Maybe it somehow makes the transition to the new function easier? :)

Hehe. :)


> If not, since you're adamant about not changing the name, can we overload the function (e.g. make the new one return some results in "out" parameters), and deprecate the original overload?
>
> Finally, I'd just like to sum up that we seem to have two decisions on the scales: somehow solving the API incompatibilities, or introducing the new version as an entirely new module. The latter is a mess we really don't want to get into, so it'd need to be justified, and IMHO the incompatibilities don't seem to be as severe and unresolvable to warrant that mess.

Let us see where the discussion about command quoting ends up.  It is going to have an impact on most of the API, shell() included.

Lars
February 25, 2013
On Monday, 25 February 2013 at 19:28:33 UTC, Lars T. Kyllingstad wrote:
> My reasons were what I said in my other post:  In the time I have been using the 'environment' API -- that is, for 2 1/2 years (I checked) -- I don't think there is a *single* time when I've chosen environment[var] over environment.get(var, null).
>
> The thing about the process environment, as opposed to an associative array inside your own program, is that you can never be certain which variables are defined and which aren't.

Indeed. If a program expects a variable to be set when it isn't, the program should fail. Throwing in opIndex is one way to implement that.

> This means that you will almost *always* have to check whether a variable exists before using it, thus rendering opIndex() pretty much useless for most cases.

Check... and do what? Print a nicer error message?

> Furthermore, I really don't think it is too much to expect that a user of a systems language such as D checks the return values of functions that may return a 'null' value.

Expecting any sorts of things from the library user is not the way to go. It is the same reason why returning integer error codes has gone by way of history in favor of exception handling: checking all return values is cumbersome, it requires writing more code, many programmers don't do it, and the result is bad programs. The simplest code should also be correct, this is one of D's principles.

>> Speaking of shells, I noticed you hardcode cmd.exe in std.process2. That's another bug, it should look at the COMSPEC variable.
>
> Thanks, I didn't know that.  On POSIX, the -c switch is standard, and works on most, if not all, shells.  Can we assume that /C is equally standardised on Windows shells?

I believe so.
February 25, 2013
On 2/23/2013 6:58 PM, H. S. Teoh wrote:
> On Sat, Feb 23, 2013 at 06:46:13PM -0800, Jonathan M Davis wrote:
>> Possibly, but Walter takes a very dim view on most any code breakage,
>> even if it means simply changing a makefile to make your code work
>> again,
>
> I find this rather frustrating...

Consider the common complaint from numerous people that "my code breaks with every new release".

Even if the fix is "simple".

Just today, rdmd doesn't compile anymore.

February 25, 2013
On Monday, 25 February 2013 at 15:07:47 UTC, Vladimir Panteleev wrote:
> On Monday, 25 February 2013 at 06:41:32 UTC, Lars T. Kyllingstad wrote:
>> Sure, I can think of another example.  But I wouldn't read too much into this one; it was never meant as a demonstration of the "correct" way to open a web page.   It was just a simple example of spawnProcess() usage that uses a cross-platform application everyone's heard of.
>>
>> After all, you *could* argue this way about almost any kind of application which wasn't just invented for the sake of the example.  (In the last one, shouldn't we open the user's preferred word processor, etc?)
>
> The question is, what is the intent? Is it to just open some URL, or to specifically start Firefox? The same applies to the word processor case - if the document is in a file format understood by several applications, is the intent to simply open the document, or to open the document in that specific application?
>
> Now, the documentation clearly says that the example specifically launches Firefox. However, that doesn't mean that someone won't reach out for that example when hastily putting together an application that needs to open an URL. After all, it's at the top of the file, and they may not even know about the existence of the browse function which actually does what they intend.
>
> How about using "lynx -dump http://dlang.org/"? Dumping a text representation of a webpage is a feature specific to lynx, so the intent is clearer.

That is also incredibly obscure.  I'd venture a guess that only ~10% of D's user base have even heard of Lynx.  Everyone knows firefox, and will understand what the example is supposed to illustrate.  (I admit that the ls/grep examples will also be rather incomprehensible to someone not familiar with the *NIX command line, and I will replace them with something else.  The D toolchain, as you suggest below, is a very good idea.)

BTW, browse() should never have been added to std.process, in my opinion.  Maybe to some other utility module, but then it should at least be done right, and be properly documented.  What does it actually do?  There is no way to tell unless you read the source.  (And then, it turns out that it spawns a new process for the browser and returns immediately, but it does not return a process ID that you can poll or wait for.  Bah.)


>>> 2. (Nitpick) The grep example uses a POSIX quoting syntax (single quotes). Would be better to use double quotes, or pass as array to avoid one more needless OS-specific element.
>>
>> Actually, the quotes can just be removed altogether.
>
> OK, and now it's worse: your example uses syntax that's specific to std.process2. If you type that command in the shell, you'll get different behavior (the backslash will escape the . as a shell escape, not a RE escape).

[I am going to let slip here that you almost have me convinced with many of your arguments below, but I am still going to play devil's advocate for a bit.]

It is not worse.  It is a lot simpler, because the programmer does not need to know anything about the underlying platform.  They only need to know one rule:  If your arguments contain spaces, use the array functions.  I don't think the generic process-spawning functions in std.process should be bound by, or tied to, the syntax of whatever shell the programmer (or the end user) prefers.


[...]

>> Personally, I don't think they should be part of the public API.  They are inherently platform-specific, and we've tried to keep the module as platform-agnostic as possible.
>
> Constructing scripts is bound to be platform-specific. The current module version allows constructing batch files on POSIX.
>
> Here's a practical use case example for this feature: DMD uses the same syntax for response files on all platforms, and it follows the Windows command-line parsing rules. Currently, rdmd uses escapeWindowsArgument to build that response file on all platforms.

Point taken.


>> Besides, they are not really usable with any of the other functions, and I am afraid it will be interpreted that way if we make them public.
>
> This is actually a design problem in the new module, which I haven't discussed yet. Have a look at the very last example in the current std.process docs. How do you accomplish that correctly in the new version, without manually piping the inputs yourself? You can't.

I grudgingly admit that this is true.


>> [...]
>>
>> The way it is now, the rules (if you can call it that) are exceedingly simple, and they are the same on all platforms.  This has the added benefit of discouraging platform-dependent client code.
>
> OK, then picture the following situation.
>
> A user of the new module starts using the module, and invokes a specific command using the spawnProcess overload that takes it as a single string. Convenient, right? Then, as the program evolves, the string becomes an enum, then a config variable, which the user can adjust.
>
> Then, a end-user tries setting the config variable to a path that contains spaces, and everything breaks. Wrapping the path in quotes does not help either. Due to the way the function is designed, it is IMPOSSIBLE for the end-user to configure the application to launch a program located at a path containing spaces. To end-users, this comes off as a classical problem in badly written applications that don't handle command-line escaping properly.

Exposing the specifics of whatever programming language you are using to the end user?  I would just call that bad application programming.  If anything, you should be using one of the 'shell' functions in this case, not spawnProcess.


> This problem is as with any case of an interface which works in simple cases, but behaves unexpectedly in more complicated cases: it is bad design (convenience or not), and must be avoided.
>
> I suggest that either the overloads which take a single string be removed, or that they spawn a shell instead, and let the shell do the command-line splitting. Together with my command and filename escaping functions, they should allow the user to achieve any combination of executing commands with arbitrary punctuation in the program path or arguments, as well as redirecting the output to files (again, with correctly-escaped filenames) or other programs using the existing shell syntax present on both platforms.

You almost have me convinced that the single-string non-shell functions must go.  In the case of pipeProcess() and execute(), pipeShell() and shell() do the same job, with (arguably, still) less surprises.  Maybe it would then be a good idea to add a spawnShell() function to go with spawnProcess().

The escape*() functions need to be better documented.  Am I correct that they quote according to 'cmd.exe' rules on Windows and 'sh' rules on POSIX?

Lars
February 25, 2013
On Monday, 25 February 2013 at 19:38:59 UTC, Vladimir Panteleev wrote:
> On Monday, 25 February 2013 at 19:28:33 UTC, Lars T. Kyllingstad wrote:
>> This means that you will almost *always* have to check whether a variable exists before using it, thus rendering opIndex() pretty much useless for most cases.
>
> Check... and do what? Print a nicer error message?

That would depend on the application.


>> Furthermore, I really don't think it is too much to expect that a user of a systems language such as D checks the return values of functions that may return a 'null' value.
>
> Expecting any sorts of things from the library user is not the way to go. It is the same reason why returning integer error codes has gone by way of history in favor of exception handling: checking all return values is cumbersome, it requires writing more code, many programmers don't do it, and the result is bad programs. The simplest code should also be correct, this is one of D's principles.

Exceptions are designed to handle exceptional cases.  A missing environment variable isn't exceptional, it is commonplace.

Lars
February 25, 2013
On Mon, 25 Feb 2013 15:09:14 -0500, Lars T. Kyllingstad <public@kyllingen.net> wrote:


> Exceptions are designed to handle exceptional cases.  A missing environment variable isn't exceptional, it is commonplace.

+1

-Steve
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18