Jump to page: 1 2 3
Thread overview
Am I reading this wrong, or is std.getopt *really* this stupid?
Mar 23, 2018
H. S. Teoh
Mar 24, 2018
Chris Katko
Mar 24, 2018
Seb
Mar 24, 2018
rumbu
Mar 24, 2018
H. S. Teoh
Mar 24, 2018
H. S. Teoh
Mar 24, 2018
H. S. Teoh
Mar 25, 2018
Johannes Pfau
Mar 26, 2018
Adam D. Ruppe
Mar 25, 2018
Abdulhaq
Mar 25, 2018
Abdulhaq
Mar 24, 2018
Jon Degenhardt
Mar 25, 2018
Jonathan M Davis
Mar 25, 2018
Seb
Mar 25, 2018
Vladimir Panteleev
Mar 26, 2018
Adam D. Ruppe
Mar 25, 2018
H. S. Teoh
Mar 25, 2018
H. S. Teoh
Mar 25, 2018
Walter Bright
Mar 25, 2018
Rubn
Mar 25, 2018
Seb
Mar 25, 2018
Walter Bright
Mar 24, 2018
Adam D. Ruppe
March 23, 2018
I just ran into this seemingly small problem:

	void main(string[] args) {
		string[] searchPaths;
		getopt(args,
			"l", (string opt, string arg) {
				// searches through searchPaths
				openFile(arg);
			},
			"I", (string opt, string arg) {
				searchPaths ~= arg;
			},
			...
		);
	}

Running the program with:

	program -I /path/to -l myfile

causes a runtime error that 'myfile' cannot be found, even though it actually exists in /path/to/*.  I thought it was odd, since obviously the -I is parsed before the -l, so searchPaths should already be set when -l is seen, right?

Well, looking at the implementation of std.getopt turned up the disturbing fact that the program's argument list is actually scanned *multiple times*, one for each possible option(!).  Besides the bogonity that whether or not searchPaths will be set prior to finding -l depends on the order of arguments passed to getopt(), this also represents an O(n*m) complexity in scanning program arguments, where n = number of arguments and m = number of possible options.

And this is not to mention the fact that getoptImpl is *recursive template*.  Why, oh why?

Am I the only one who thinks the current implementation of getopt() is really stupid??  Can somebody please talk some sense into me, or point out something really obvious that I'm missing?


T

-- 
Государство делает вид, что платит нам зарплату, а мы делаем вид, что работаем.
March 23, 2018
On 3/23/18 7:29 PM, H. S. Teoh wrote:
> Well, looking at the implementation of std.getopt turned up the
> disturbing fact that the program's argument list is actually scanned
> *multiple times*, one for each possible option(!).  Besides the bogonity
> that whether or not searchPaths will be set prior to finding -l depends
> on the order of arguments passed to getopt(), this also represents an
> O(n*m) complexity in scanning program arguments, where n = number of
> arguments and m = number of possible options.
> 
> And this is not to mention the fact that getoptImpl is *recursive
> template*.  Why, oh why?
> 
> Am I the only one who thinks the current implementation of getopt() is
> really stupid??  Can somebody please talk some sense into me, or point
> out something really obvious that I'm missing?

Affirmative. The implementation is quadratic (including a removal of the option from the string). This is intentional, i.e. understood and acknowledged while I was working on it. Given that the function is only called once per run and with a number of arguments at most in the dozens, by the time its complexity becomes an issue the function is long beyond its charter.

This isn't the only instance of quadratic algorithms in Phobos. Quicksort uses an insertion sort - a quadratic algorithm - for 25 elements or fewer. That algorithm may do 600 comparisons in the worst case, and it's potentially that many for each group of 25 elements in a large array.

Spending time on improving the speed of getopt is unrecommended. Such work would add no value.


Andrei
March 24, 2018
On Saturday, 24 March 2018 at 03:04:41 UTC, Andrei Alexandrescu wrote:
> On 3/23/18 7:29 PM, H. S. Teoh wrote:
>> Well, looking at the implementation of std.getopt turned up the
>> disturbing fact that the program's argument list is actually scanned
>> *multiple times*, one for each possible option(!).  Besides the bogonity
>> that whether or not searchPaths will be set prior to finding -l depends
>> on the order of arguments passed to getopt(), this also represents an
>> O(n*m) complexity in scanning program arguments, where n = number of
>> arguments and m = number of possible options.
>> 
>> And this is not to mention the fact that getoptImpl is *recursive
>> template*.  Why, oh why?
>> 
>> Am I the only one who thinks the current implementation of getopt() is
>> really stupid??  Can somebody please talk some sense into me, or point
>> out something really obvious that I'm missing?
>
> Affirmative. The implementation is quadratic (including a removal of the option from the string). This is intentional, i.e. understood and acknowledged while I was working on it. Given that the function is only called once per run and with a number of arguments at most in the dozens, by the time its complexity becomes an issue the function is long beyond its charter.
>
> This isn't the only instance of quadratic algorithms in Phobos. Quicksort uses an insertion sort - a quadratic algorithm - for 25 elements or fewer. That algorithm may do 600 comparisons in the worst case, and it's potentially that many for each group of 25 elements in a large array.
>
> Spending time on improving the speed of getopt is unrecommended. Such work would add no value.
>
>
> Andrei

Is there a possibility of improving this function?

 - While quadratic, for low N, quadratic isn't a big deal. So at what point does quadratic for this function become "a problem"?

 - If it is a problem, what's stopping someone from improving it?

Last question though, is there any kind of list of features, and minor features and fixes that can or need to be done? Perhaps it already exists, but it seems like it'd be great to have a wiki of contribution sites (like this function) that someone could just browse and go "Hey, I know how to do X, maybe I'll take a crack at it." That way, devs who don't have time to improve something "low on the list" could still outsource it in a clear list instead of people who just happen to see it on the forum at the right place right time.
March 24, 2018
On Saturday, 24 March 2018 at 05:55:53 UTC, Chris Katko wrote:
> On Saturday, 24 March 2018 at 03:04:41 UTC, Andrei Alexandrescu wrote:
>> On 3/23/18 7:29 PM, H. S. Teoh wrote:
>>> Well, looking at the implementation of std.getopt turned up the
>>> disturbing fact that the program's argument list is actually scanned
>>> *multiple times*, one for each possible option(!).  Besides the bogonity
>>> that whether or not searchPaths will be set prior to finding -l depends
>>> on the order of arguments passed to getopt(), this also represents an
>>> O(n*m) complexity in scanning program arguments, where n = number of
>>> arguments and m = number of possible options.
>>> 
>>> And this is not to mention the fact that getoptImpl is *recursive
>>> template*.  Why, oh why?
>>> 
>>> Am I the only one who thinks the current implementation of getopt() is
>>> really stupid??  Can somebody please talk some sense into me, or point
>>> out something really obvious that I'm missing?
>>
>> Affirmative. The implementation is quadratic (including a removal of the option from the string). This is intentional, i.e. understood and acknowledged while I was working on it. Given that the function is only called once per run and with a number of arguments at most in the dozens, by the time its complexity becomes an issue the function is long beyond its charter.
>>
>> This isn't the only instance of quadratic algorithms in Phobos. Quicksort uses an insertion sort - a quadratic algorithm - for 25 elements or fewer. That algorithm may do 600 comparisons in the worst case, and it's potentially that many for each group of 25 elements in a large array.
>>
>> Spending time on improving the speed of getopt is unrecommended. Such work would add no value.
>>
>>
>> Andrei
>
> Is there a possibility of improving this function?
>
>  - While quadratic, for low N, quadratic isn't a big deal. So at what point does quadratic for this function become "a problem"?
>
>  - If it is a problem, what's stopping someone from improving it?
>
> Last question though, is there any kind of list of features, and minor features and fixes that can or need to be done? Perhaps it already exists, but it seems like it'd be great to have a wiki of contribution sites (like this function) that someone could just browse and go "Hey, I know how to do X, maybe I'll take a crack at it." That way, devs who don't have time to improve something "low on the list" could still outsource it in a clear list instead of people who just happen to see it on the forum at the right place right time.

Yes, Bugzilla is full of excellent ideas:

https://issues.dlang.org/buglist.cgi?component=phobos&list_id=220544&product=D&resolution=---

There are even some tags like "bootcamp" for someone who is looking to get started:

https://issues.dlang.org/buglist.cgi?component=phobos&keywords=bootcamp%2C%20preapproved&keywords_type=anywords&list_id=220545&product=D&query_format=advanced&resolution=---

We have also recently started to experiment with GitHub's new project dashboards. Currently they are tracking projects like improving the documentation, @safe-ty, DIP1000 etc.:

https://github.com/dlang/phobos/projects

DMD has a similar set which is based on Walter's recent post [1]

https://github.com/dlang/dmd/projects

Last, but not least there's a "Get involved" guide at the wiki:

https://wiki.dlang.org/Get_involved

As you couldn't find any of these pages, please let us know where you looked first, so that maybe we can make it easier for future people to find this information ;-)

[1] https://forum.dlang.org/post/p6oibo$1lmi$1@digitalmars.com
March 24, 2018
On Saturday, 24 March 2018 at 06:04:23 UTC, Seb wrote:

> Yes, Bugzilla is full of excellent ideas:
>
> https://issues.dlang.org/buglist.cgi?component=phobos&list_id=220544&product=D&resolution=---
>
> There are even some tags like "bootcamp" for someone who is looking to get started:
>
> https://issues.dlang.org/buglist.cgi?component=phobos&keywords=bootcamp%2C%20preapproved&keywords_type=anywords&list_id=220545&product=D&query_format=advanced&resolution=---
>
> We have also recently started to experiment with GitHub's new project dashboards. Currently they are tracking projects like improving the documentation, @safe-ty, DIP1000 etc.:
>
> https://github.com/dlang/phobos/projects
>
> DMD has a similar set which is based on Walter's recent post [1]
>
> https://github.com/dlang/dmd/projects
>
> Last, but not least there's a "Get involved" guide at the wiki:
>
> https://wiki.dlang.org/Get_involved
>
> As you couldn't find any of these pages, please let us know where you looked first, so that maybe we can make it easier for future people to find this information ;-)
>
> [1] https://forum.dlang.org/post/p6oibo$1lmi$1@digitalmars.com

I saw this kind of "call to arms" post spreading around the forum in the last days. Very nice to have some kind of plan, but before being involved, I really would like to know how this can be done on Windows. And please don't redirect me to the wiki, the information there is clearly outdated and Linux oriented (at least the test stuff)

https://forum.dlang.org/post/gxxfmrnezfrlodlhpiwe@forum.dlang.org
https://forum.dlang.org/post/tulbhulbeqqzofdxevcg@forum.dlang.org

Thanks.


March 24, 2018
On 03/24/2018 01:55 AM, Chris Katko wrote:
> On Saturday, 24 March 2018 at 03:04:41 UTC, Andrei Alexandrescu wrote:
>> On 3/23/18 7:29 PM, H. S. Teoh wrote:
>>> Well, looking at the implementation of std.getopt turned up the
>>> disturbing fact that the program's argument list is actually scanned
>>> *multiple times*, one for each possible option(!).  Besides the bogonity
>>> that whether or not searchPaths will be set prior to finding -l depends
>>> on the order of arguments passed to getopt(), this also represents an
>>> O(n*m) complexity in scanning program arguments, where n = number of
>>> arguments and m = number of possible options.
>>>
>>> And this is not to mention the fact that getoptImpl is *recursive
>>> template*.  Why, oh why?
>>>
>>> Am I the only one who thinks the current implementation of getopt() is
>>> really stupid??  Can somebody please talk some sense into me, or point
>>> out something really obvious that I'm missing?
>>
>> Affirmative. The implementation is quadratic (including a removal of the option from the string). This is intentional, i.e. understood and acknowledged while I was working on it. Given that the function is only called once per run and with a number of arguments at most in the dozens, by the time its complexity becomes an issue the function is long beyond its charter.
>>
>> This isn't the only instance of quadratic algorithms in Phobos. Quicksort uses an insertion sort - a quadratic algorithm - for 25 elements or fewer. That algorithm may do 600 comparisons in the worst case, and it's potentially that many for each group of 25 elements in a large array.
>>
>> Spending time on improving the speed of getopt is unrecommended. Such work would add no value.
>>
>>
>> Andrei
> 
> Is there a possibility of improving this function?

Most likely it can be improved in terms of features.

>   - While quadratic, for low N, quadratic isn't a big deal. So at what point does quadratic for this function become "a problem"?

At a point where a realistic benchmarks shows a need. Without a motivating measurement, making getopt faster would be a waste of time.

I mentioned another function: sort. For that, YES, the are ways of improving it. In fact, right after posting my message, I couldn't sleep thinking of a number of ways to improve the short sort part. We have precise benchmarks measuring the number of comparisons and swaps performed by our implementation of sort. Improving its performance lifts a lot of boats - many high-level algorithms use sort as an essential building block. There's a world of difference in impact of the speed of sort vs. speed of getopt.

Here are a few ideas for improving the small array sort part (the last mile of sort):

* Currently we switch to short sort when the number of elements to sort is smaller than max(32, 1024 / Elem.sizeof). Probably a better choice can be found through experimentation.

* Insertion sort does linear search in the already-sorted portion. Probably galloping search would fare better.

* Insertion sort starts from the end and grows the sorted portion down. Starting from the middle of the array and growing left and right simultaneously would slash the number of comparisons and swaps in half.

* There could be other algorithms better for short arrays, such as specialized versions of heapsort or smoothsort.

>   - If it is a problem, what's stopping someone from improving it?

Hopefully very little.

> Last question though, is there any kind of list of features, and minor features and fixes that can or need to be done? Perhaps it already exists, but it seems like it'd be great to have a wiki of contribution sites (like this function) that someone could just browse and go "Hey, I know how to do X, maybe I'll take a crack at it." That way, devs who don't have time to improve something "low on the list" could still outsource it in a clear list instead of people who just happen to see it on the forum at the right place right time.

Seb gave a great answer - thanks!


Andrei
March 24, 2018
On Sat, Mar 24, 2018 at 08:27:48AM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> At a point where a realistic benchmarks shows a need. Without a motivating measurement, making getopt faster would be a waste of time.
[...]

Guys, for crying out loud, my original complaint was not *performance*, but that the (strange) choice of algorithm for getopt resulted in the very counterintuitive behaviour that the order options are processed depends on the order of option declarations rather than the order they were specified on the command-line. This makes it basically impossible to implement certain styles of option processing, such as that employed in the popular ImageMagick suite of tools, where it matters that options are processed in the order they are specified by the user, rather than some arbitrary (to a user who doesn't and shouldn't care to know the code) predetermined order.

My complaint about the quadratic algorithm was not in the fact that it's quadratic, but that it exhibited this strange (and annoying!) behaviour, especially since the saner (IMO) non-quadratic algorithm would have been the expected choice in the first place, that would *not* have had this problem. It felt almost like we went out of our way just to make things counterintuitive, with slowness added as a cherry on top.


T

-- 
"You are a very disagreeable person." "NO."
March 24, 2018
On Friday, 23 March 2018 at 23:29:48 UTC, H. S. Teoh wrote:
> I just ran into this seemingly small problem:

The way I'd do this is to only use getopt to build the lists, then actually process them externally. (lol adding another loop)

string[] searchPaths;
string[] files;

getopt(args,
  "l", &files,
  "I", &searchPaths
);

foreach(file; files)
  openFile(file);


then it is clear what order your operations are done in anyway, and you have a chance to perhaps report bad syntax before actually doing any real work.

Wouldn't it be weird for example if

$ cat foo.d --help


spat out the contents followed by the help?
March 24, 2018
On 03/24/2018 09:36 AM, H. S. Teoh wrote:
> On Sat, Mar 24, 2018 at 08:27:48AM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> [...]
>> At a point where a realistic benchmarks shows a need. Without a motivating
>> measurement, making getopt faster would be a waste of time.
> [...]
> 
> Guys, for crying out loud, my original complaint was not *performance*,
> but that the (strange) choice of algorithm for getopt resulted in the
> very counterintuitive behaviour that the order options are processed
> depends on the order of option declarations rather than the order they
> were specified on the command-line.

I'd have a difficult time interpreting the following as not performance-related:

> Well, looking at the implementation of std.getopt turned up the
> disturbing fact that the program's argument list is actually scanned
> *multiple times*, one for each possible option(!).  Besides the bogonity
> that whether or not searchPaths will be set prior to finding -l depends
> on the order of arguments passed to getopt(), this also represents an
> O(n*m) complexity in scanning program arguments, where n = number of
> arguments and m = number of possible options.

Anyhow. Right now the order of processing is the same as the lexical order in which flags are passed to getopt. There may be use cases for which that's the more desirable way to go about things, so if you author a PR to change the order you'd need to build an argument on why command-line order is better. FWIW the traditional POSIX doctrine makes behavior of flags independent of their order, which would imply the current choice is more natural.

> This makes it basically impossible
> to implement certain styles of option processing, such as that employed
> in the popular ImageMagick suite of tools, where it matters that options
> are processed in the order they are specified by the user, rather than
> some arbitrary (to a user who doesn't and shouldn't care to know the
> code) predetermined order.

This is an exaggeration. Yes you can't process with lambdas. You can always collect options first, process after. This is a well-supported use case.

> My complaint about the quadratic algorithm was not in the fact that it's
> quadratic, but that it exhibited this strange (and annoying!) behaviour,
> especially since the saner (IMO) non-quadratic algorithm would have been
> the expected choice in the first place, that would *not* have had this
> problem. It felt almost like we went out of our way just to make things
> counterintuitive, with slowness added as a cherry on top.

I want to be convinced. I think you'd need to build a better case on why you consider one behavior intuitive and the other counterintuitive.


Andrei
March 24, 2018
On Sat, Mar 24, 2018 at 12:11:18PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> Anyhow. Right now the order of processing is the same as the lexical order in which flags are passed to getopt. There may be use cases for which that's the more desirable way to go about things, so if you author a PR to change the order you'd need to build an argument on why command-line order is better. FWIW the traditional POSIX doctrine makes behavior of flags independent of their order, which would imply the current choice is more natural.

So what about making this configurable?

And documented?  Last time I checked, this was not clearly stated in the docs.


[...]
> > My complaint about the quadratic algorithm was not in the fact that it's quadratic, but that it exhibited this strange (and annoying!) behaviour, especially since the saner (IMO) non-quadratic algorithm would have been the expected choice in the first place, that would *not* have had this problem. It felt almost like we went out of our way just to make things counterintuitive, with slowness added as a cherry on top.
> 
> I want to be convinced. I think you'd need to build a better case on why you consider one behavior intuitive and the other counterintuitive.
[...]

Honestly, I've wasted far too much time writing about this on the forum already.  In the time it took to argue about this, I could have already written my own version of getopt that does what I want, instead of fighting with strange design decisions in Phobos.  I'm not going to waste any more time arguing about this, since, after all, it *is* "just" getopt().

This was not the only issue I struggled with, as std.getopt has other design differences incompatible with Posix getopt() that makes it hard to support the original semantics of a previous C++ project ported to D. Yes, I could have used Posix getopt() from D, but that requires some ugly shim code, tons of toStringz/fromStringz, doesn't take advantage of things like automatic enum conversions, etc., which sux given that we're in D, not C++.

And given the defensiveness surrounding std.getopt, my conclusion can only be: dump std.getopt, roll my own.  It's sad, since in general Phobos design tends to be superior to its C++ counterparts.  But we then have warts like std.getopt that people refuse to acknowledge is a problem.  So be it.


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi
« First   ‹ Prev
1 2 3