September 19, 2018
On 09/19/2018 02:26 AM, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 05:49:41 UTC, Nick Sabalausky (Abscissa) wrote:
>> [...]
> 
> Someone mentioned in this thread that .NET runtime does do the long-path workaround automatically. One thing we could do is copy EXACTLY what C# is doing.
> 

This is a complete textbook example of the "appeal to authority" fallacy.

If an approach is valid, then it stands on its own merits regardless of whether or not Microsoft implemented it.

If an approach is invalid, then it fails on its own demerits regardless of whether or not Microsoft implemented it.

What MS has or hasn't implemented and released is completely irrelevant WRT validity and correctness.

What it *might* be useful for is as a starting point for further exploration. But that is all.

> However, there are still drawbacks to this:
> 
> - There is still the matter of overhead (one OS API call (GetCurrentDirectory) and at least one GC allocation (for the current directory buffer)).

If one extra OS API call + allocation per std.file API call is unacceptable, then explain how it is unacceptable. I disagree that it is significant enough to be unacceptable.

If a user needs to optimize their already-working-for-all-accepted-inputs application, then they are free to do so. I argue that building this into the standard library's default behaviour amounts to mandatory premature optimization, prioritizing premature optimization over correctness. Prove me wrong.

> - Using paths longer than MAX_PATH is an exceptional situation. Putting the workaround in the main code path penalizes 99.9% of use cases.

I have many filepaths on my system right now which exceed MAX_PATH in total length. I submit that this "penalty" you speak of is nothing more than a trivial performance boost at the expense of correctness. Furthermore, I submit that long paths which need extra optimization are MORE exceptional than long paths which do NOT need extra optimization.

> - The registry switch in newer Windows versions removes the need for this workaround, so systems with it enabled are penalized as well.

Using the Phobos-based workaround on a system WITH the longpath setting supported and enabled results in slightly reduced performance (which can be overridden and optimized when necessary). Note that it is possible (and very simple) to detect this situation and handle it optimally by skipping the workaround.

OTOH, NOT using the Phobos-based workaround on a system where the longpath setting is NOT supported *OR* NOT enabled results in erroneous behavior. Not something as trivial as slightly-degraded performance on IO access.

The superior default is clear: Use the workaround except where the workaround is known to be safe to omit.

> - There is still the matter regarding special filenames,

If you're referring to NUL, COM1, COM2, etc, then this is completely orthogonal.

> as well as whether the expected behavior is really to succeed and create paths inaccessible to most software, instead of failing.

Ok, suppose we decide "Sure, we have reason to believe there may be a significant amount of software on Windows which fails to handle long paths and we want to ensure maximum compatibility with those admittedly broken programs." That's fine. I can get behind that. HOWEVER, that does NOT mean we should leave our APIs as they are, because currently, our APIs fail at that goal. Instead, what it really means is that our APIs should be designed to *REJECT* long paths with an appropriately meaningful error message - and a reasonable workaround - and NOT to blindly just pass them along as they currently do.
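A sketch of what rejecting up front could look like, assuming the limit can be treated as a plain length check. `enforceShortPath` and `maxPath` are hypothetical names, not existing Phobos symbols, and the message text is only an example:

```d
import std.conv : text;
import std.exception : enforce;

enum size_t maxPath = 260; // Windows MAX_PATH, including the terminating NUL

/// Hypothetical guard (not part of Phobos): reject over-long paths before
/// they reach the OS, with a message naming the limit and a workaround.
/// Simplification: counts UTF-8 code units rather than UTF-16 code units.
void enforceShortPath(const(char)[] path)
{
    enforce(path.length < maxPath, text(
        "Path is ", path.length, " characters long; Windows limits paths to ",
        maxPath - 1, " characters unless they carry the \\\\?\\ prefix: ", path));
}
```

With something like this in front of each std.file call, the failure would be well-defined and would happen in one place, instead of depending on whatever the underlying API decides to do.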

Either way, Phobos needs changed:

Do you believe D should prevent its own software from being broken on long paths? Then Phobos should be modified to detect and fix long paths.

Do you believe D should permit breakage on long paths and encourage its programs to play nicely with other non-D Windows software that is *also* broken on long paths? Then Phobos should be modified to detect and *reject* long paths.

Either way, the current Phobos behavior is clearly the worst of both worlds and needs modification.
September 19, 2018
> They are certainly going to be less expensive than actual filesystem operations that hit the physical disk, but it will still be an unwanted overhead in 99.9% of cases.
>
> In any case, the overhead is only one issue.

Seriously, checking whether the file path string's *length* is above 260 characters, to see if it needs to be fixed, is not what I call overhead.

And IF the path is indeed too long, IN THOSE CASES I'd personally prefer that the D standard library fix the path so that the disk/file operation succeeds, rather than have my application crash because I didn't know I had to put a "version ( Windows )" fix somewhere in my code.
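To make the cost concrete, here is a minimal sketch of such a check. The names `exceedsMaxPath` and `maxPath` are made up for illustration (this is not an existing Phobos API), and it assumes the path is already absolute and that the limit is counted in UTF-16 code units including the terminating NUL:

```d
import std.range : walkLength;
import std.utf : byWchar;

/// Windows MAX_PATH: 260 UTF-16 code units, including the terminating NUL.
enum size_t maxPath = 260;

/// Hypothetical helper (not part of Phobos): true if `path` is too long
/// for the classic Win32 file APIs. Assumes `path` is already absolute.
bool exceedsMaxPath(const(char)[] path)
{
    // Count UTF-16 code units, since that is what the wide-character
    // APIs receive. The +1 accounts for the terminating NUL.
    return path.byWchar.walkLength + 1 > maxPath;
}
```

That is one pass over the string, with no OS call and no allocation, before deciding whether anything else needs to happen.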

But hey, I may be wrong, software robustness and stability is often much overrated... ;)

September 19, 2018
On Wednesday, 19 September 2018 at 08:18:38 UTC, Nick Sabalausky (Abscissa) wrote:
>> Someone mentioned in this thread that .NET runtime does do the long-path workaround automatically. One thing we could do is copy EXACTLY what C# is doing.
>> 
>
> This is a complete textbook example of the "appeal to authority" fallacy.
>
> If an approach is valid, then it stands on its own merits regardless of whether or not Microsoft implemented it.
>
> If an approach is invalid, then it fails on its own demerits regardless of whether or not Microsoft implemented it.
>
> What MS has or hasn't implemented and released is completely irrelevant WRT validity and correctness.
>
> What it *might* be useful for is as a starting point for further exploration. But that is all.

No, absolutely not.

Microsoft is in charge of the implementation. We can't know that any deviation from Microsoft's algorithm will work in all situations, or all past/future implementations of the API. There is the considerable possibility that there are situations which we cannot foresee from our limited knowledge of the problem and Windows API implementation; on the other hand, Microsoft not only has complete knowledge of the implementation, but also controls its future. They have an incentive to keep the .NET algorithm working.

If we deviate from the .NET algorithm and D breaks (but not C#), it is our fault.

If we implement the .NET algorithm, then we are as good as C#. If it breaks, it's Microsoft's fault.

You cannot evaluate any intrinsic merit here because the result is beyond your control.

> If one extra OS API call + allocation per std.file API call is unacceptable, then explain how it is unacceptable. I disagree that it is significant enough to be unacceptable.

It is not unacceptable, but it is a drawback.

> If a user needs to optimize their already-working-for-all-accepted-inputs application, then they are free to do so. I argue that building this into the standard library's default behaviour amounts to mandatory premature optimization, prioritizing premature optimization over correctness. Prove me wrong.

You could extend this argument to any severity of workarounds. Where do you draw the line?

>> - Using paths longer than MAX_PATH is an exceptional situation. Putting the workaround in the main code path penalizes 99.9% of use cases.
>
> I have many filepaths on my system right now which exceed MAX_PATH in total length. I submit that this "penalty" you speak of is nothing more than a trivial performance boost at the expense of correctness. Furthermore, I submit that long paths which need extra optimization are MORE exceptional than long paths which do NOT need extra optimization.

Optimization is the least concern.

>> - The registry switch in newer Windows versions removes the need for this workaround, so systems with it enabled are penalized as well.
>
> Using the Phobos-based workaround on a system WITH the longpath setting supported and enabled results in slightly reduced performance (which can be overridden and optimized when necessary).

I'm more concerned about differences in behavior.

> OTOH, NOT using the Phobos-based workaround on a system where the longpath setting is NOT supported *OR* NOT enabled results in erroneous behavior.

I disagree that failure on paths exceeding MAX_PATH is necessarily erroneous behavior. The API reports an error given the user's path, so should Phobos.

> The superior default is clear: Use the workaround except where the workaround is known to be safe to omit.

We don't even have an algorithm for determining for sure when the workaround is needed.

>> - There is still the matter regarding special filenames,
>
> If you're referring to NUL, COM1, COM2, etc, then this is completely orthogonal.

Yes. How so? It is the same issue: paths with certain properties are valid on all platforms except on Windows. Phobos errors out when attempting to access/create them. A simple workaround is available: expand/normalize the path, prepend the UNC prefix, and use Unicode APIs.
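For reference, the prefixing step of that workaround can be sketched roughly as follows. `toExtendedLengthPath` is a made-up name, not a Phobos function. The sketch assumes the input is already absolute and normalized (on Windows, for instance via `std.path.absolutePath` and `std.path.buildNormalizedPath`), because the `\\?\` prefix switches off the usual Win32 handling of `.` and `..` components and of forward slashes:

```d
import std.algorithm.searching : startsWith;
import std.array : replace;

/// Hypothetical helper (not part of Phobos): add the extended-length
/// prefix to an absolute, already-normalized Windows path.
string toExtendedLengthPath(string absPath)
{
    // Already prefixed (`\\?\`) or a device path (`\\.\`): leave untouched.
    if (absPath.startsWith(`\\?\`) || absPath.startsWith(`\\.\`))
        return absPath;

    // Prefixed paths are passed through without normalization, so forward
    // slashes have to be converted here.
    string p = absPath.replace("/", `\`);

    // UNC path `\\server\share\...` becomes `\\?\UNC\server\share\...`.
    if (p.startsWith(`\\`))
        return `\\?\UNC\` ~ p[2 .. $];

    // Drive path `C:\...` becomes `\\?\C:\...`.
    return `\\?\` ~ p;
}
```

Note that the same transformation applied to a path ending in a reserved name or a trailing space changes how that component is interpreted, which is why I consider these cases part of the same issue.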

>> as well as whether the expected behavior is really to succeed and create paths inaccessible to most software, instead of failing.
>
> Ok, suppose we decide "Sure, we have reason to believe there may be a significant amount of software on Windows which fails to handle long paths and we want to ensure maximum compatibility with those admittedly broken programs." That's fine. I can get behind that. HOWEVER, that does NOT mean we should leave our APIs as they are, because currently, our APIs fail at that goal. Instead, what it really means is that our APIs should be designed to *REJECT* long paths with an appropriately meaningful error message - and a reasonable workaround - and NOT to blindly just pass them along as they currently do.

This is not possible, because you need to precisely know how the implementation will handle the path. Considering the implementation's behavior can be configured by the user, I don't think this is feasible.

> Either way, Phobos needs changed:
>
> Do you believe D should prevent its own software from being broken on long paths? Then Phobos should be modified to detect and fix long paths.
>
> Do you believe D should permit breakage on long paths and encourage its programs to play nicely with other non-D Windows software that is *also* broken on long paths? Then Phobos should be modified to detect and *reject* long paths.
>
> Either way, the current Phobos behavior is clearly the worst of both worlds and needs modification.

Sorry, I don't see how you're reaching that conclusion. Looks like a false dichotomy.

September 19, 2018
On 09/19/2018 02:55 AM, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 06:34:33 UTC, Nick Sabalausky (Abscissa) wrote:
>> - Does it actually, necessarily perform those additional OS calls?
> 
> We need to expand relative paths to absolute ones, for which we need to fetch the current directory.
> 

So in other words, NO, it does NOT necessarily perform additional OS calls. It ONLY performs an additional call if the given path is relative *AND* if we've decided to not simply reject too-long relative paths outright (which I'd be fine with as a compromise. At least it would be well-defined and enforced with a meaningful message.)
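Roughly what I have in mind, as a sketch only: `absolutize`, `isWindowsAbsolute` and the `getCwd` parameter are hypothetical names, and the absolute-path test is deliberately simplified (it recognizes only drive-letter and `\\`-rooted paths, and ignores forms like `\foo` and `C:foo`). The point is just that the current-directory lookup is `lazy`, so it is evaluated only on the relative-path branch:

```d
import std.ascii : isAlpha;

/// Simplified, hypothetical check: `C:\...`, `C:/...` or `\\server\...`.
bool isWindowsAbsolute(const(char)[] p)
{
    if (p.length >= 2 && p[0] == '\\' && p[1] == '\\')
        return true;
    return p.length >= 3 && isAlpha(p[0]) && p[1] == ':'
        && (p[2] == '\\' || p[2] == '/');
}

/// Hypothetical helper: `getCwd` is lazy, so the (potentially allocating)
/// current-directory lookup runs only when `path` is relative.
string absolutize(string path, lazy string getCwd)
{
    if (isWindowsAbsolute(path))
        return path;            // no extra OS call, no extra allocation
    return getCwd ~ `\` ~ path; // only relative paths pay the cost
}
```

A caller would pass something like `absolutize(name, std.file.getcwd())`; for absolute inputs the `getcwd` expression is never evaluated.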

>> - Is it really?
> 
> Is what really what? If you mean the memory allocation, we do need a buffer to store the current directory. We also need to canonicalize away things like \..\, though we may be able to get away with it without allocating.

So, in many cases, it's NOT really "a good deal of extra logic that performs additional OS calls and generates additional GC garbage".

>> - If it actually does, are those additional, necessarily OS calls prohibitively expensive?
> 
> They are certainly going to be less expensive than actual filesystem operations that hit the physical disk,

Sounds like QED to me. Especially if the alternative is silently incorrect behaviour on an entirely realistic subset of cases. (Realistic enough that both the thread's OP and the bug report's OP each ran into it purely by accident.)

> but it will still be an unwanted overhead in 99.9% of cases.
> 

That's an extremely exaggerated figure, and we've already established that the overhead is minor and often able to be elided. Weighted against the cost of incorrect behaviour on a subset of non-rejected inputs, I'd say that's a very clear "Yes, please!"

The extreme minority of currently-hypothetical cases which require minimal overhead for individual file I/O operations are free to low-level optimize themselves as-needed. Correct behaviour should never be sacrificed for minor performance tweaks when the minor performance tweak can still be obtained through other means if absolutely necessary.

> In any case, the overhead is only one issue.
> 

What's the other issue(s)?
September 19, 2018
On Wednesday, 19 September 2018 at 08:37:17 UTC, Nick Sabalausky (Abscissa) wrote:
> What's the other issue(s)?

Essentially they boil down to "it is impossible to prove the algorithm is correct" (for both detecting when the path fix is needed, and fixing the path). Forcing the path transformation can introduce regressions, or make the situation worse on systems where it's not needed.

September 19, 2018
On Wednesday, 19 September 2018 at 08:36:35 UTC, Vladimir Panteleev wrote:
>> If you're referring to NUL, COM1, COM2, etc, then this is completely orthogonal.
>
> Yes. How so? It is the same issue: paths with certain properties are valid on all platforms except on Windows. Phobos errors out when attempting to access/create them. A simple workaround is available: expand/normalize the path, prepend the UNC prefix, and use Unicode APIs.

I just remembered, there is a third class of paths with these properties: paths containing directory components that begin or end with spaces.

There are probably more... I think some special characters are also valid only in UNC paths.

September 19, 2018
On Wednesday, 19 September 2018 at 08:46:13 UTC, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 08:36:35 UTC, Vladimir Panteleev wrote:
>>> If you're referring to NUL, COM1, COM2, etc, then this is completely orthogonal.
>>
>> Yes. How so? It is the same issue: paths with certain properties are valid on all platforms except on Windows. Phobos errors out when attempting to access/create them. A simple workaround is available: expand/normalize the path, prepend the UNC prefix, and use Unicode APIs.
>
> I just remembered, there is a third class of paths with these properties: paths containing directory components that begin or end with spaces.
>
> There are probably more... I think some special characters are also valid only in UNC paths.

BTW, something follows from the above:

write(`C:\` ~ (short path) ~ `con`) will fail

but:

write(`C:\` ~ (long path) ~ `con`) will succeed.

This is just one issue I've noticed... there's probably more lurking. This is why I think the whole idea is bankrupt.

September 19, 2018
On 09/19/2018 04:41 AM, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 08:37:17 UTC, Nick Sabalausky (Abscissa) wrote:
>> What's the other issue(s)?
> 
> Essentially they boil down to "it is impossible to prove the algorithm is correct" (for both detecting when the path fix is needed, and fixing the path).

If you're referring to the inability to deterministically reason about just what in the h*ll MS's APIs actually do, then I agree. But the problem is, it's equally true of all Win APIs. Only way to fix that is to omit Win support entirely.

Otherwise, I disagree. I think it is not only provable, but also unnecessary to prove simply because such proof has never been necessary for Phobos, and there is nothing inherent to this problem which is inherently more complicated than anything already existing in Phobos (you can even omit the questionable modules like std.xml, it all still holds). Otherwise, present counterexamples demonstrating the inherent ambiguity/non-provability.

> Forcing the path transformation can introduce regressions,

All phobos/compiler changes have the potential for regressions, plus we have unittests. Unless you can demonstrate how this necessarily goes above and beyond the risk from any other change in a way that cannot be sufficiently mitigated by tests, then the concern is irrelevant.

> or make the situation worse on systems where it's not needed.
> 

Provide an example where the situation is made worse.
September 19, 2018
On Wednesday, 19 September 2018 at 09:16:30 UTC, Nick Sabalausky (Abscissa) wrote:
>> Essentially they boil down to "it is impossible to prove the algorithm is correct" (for both detecting when the path fix is needed, and fixing the path).
>
> If you're referring to the inability to deterministically reason about just what in the h*ll MS's APIs actually do, then I agree. But the problem is, it's equally true of all Win APIs. Only way to fix that is to omit Win support entirely.

It's not our job to fix it. Just provide a D interface to it, which we already do well.

> Otherwise, I disagree. I think it is not only provable, but also unnecessary to prove simply because such proof has never been necessary for Phobos, and there is nothing inherent to this problem which is inherently more complicated than anything already existing in Phobos (you can even omit the questionable modules like std.xml, it all still holds).

No, we are mucking with data on the way between the user's program and the OS, because we think we can fix it. Not only should we not be doing that in the first place, but even if we get it right, it might still not be what the user wants.

> Otherwise, present counterexamples demonstrating the inherent ambiguity/non-provability.

I don't understand what you mean here.

>> Forcing the path transformation can introduce regressions,
>
> All phobos/compiler changes have the potential for regressions, plus we have unittests. Unless you can demonstrate how this necessarily goes above and beyond the risk from any other change in a way that cannot be sufficiently mitigated by tests, then the concern is irrelevant.

This might be a change which we won't be able to back out of if it turns out to be a bad idea, because then we break other classes of programs that depend on this change. See https://forum.dlang.org/post/eepblrtjmqzbtopylfib@forum.dlang.org for an example.

>> or make the situation worse on systems where it's not needed.
>
> Provide an example where the situation is made worse.

1. A user is happily using D on a system where the workaround is not needed.
2. A new D version comes out, with the workaround forcibly enabled.
3. The user's program is now broken.

If you provide a specific implementation for the workaround you're envisioning, I could try to come up with more specific situations where it would fail. There have been lots of reasons mentioned in this thread why things can go wrong, and surely there will be more that we can't think of ahead of time.

September 19, 2018
On Wednesday, 19 September 2018 at 08:18:38 UTC, Nick Sabalausky (Abscissa) wrote:
> Instead, what it really means is that our APIs should be designed to *REJECT* long paths with an appropriately meaningful error message

On my Windows VM, I get:

C:\(long path here): The filename or extension is too long. (error 206)

This seems like a completely reasonable error message to me, so I think we're good there already.