phobo's std.file is completely broke! (page 8) - D Programming Language Discussion Forum

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Nick Sabalausky (Abscissa)
in reply to Patrick Schluter

On 09/17/2018 11:27 AM, Patrick Schluter wrote:
> On Monday, 17 September 2018 at 12:37:13 UTC, Temtaime wrote:
>>
>> It's a problem with Phobos.
>> It should be able to handle all paths, whatever their length, on all platforms, without bothering the user.
>>
>> Even with performance penalty, but it should.

This actually leads to an interesting point. Let's change gears for a moment to "API Design Theory"...

Suppose you implement API function X which takes Y as an argument. Question: Should X accept values (or types) of Y for which X fails? Answer: Ideally, no. At least to the extent realistically possible.

So what should a well-designed function X do when faced with known-to-be unsupported input Y?[1] Here are the possibilities[2]:

([1] "Unsupported" is defined here as "for the given input, the function does not correctly and successfully perform its intended goal".)

([2] Again, we're assuming "to the extent realistically possible" is still in play here.)

1. Add support for input Y to function X.
2. Reject input Y (via either: abort, throw or static compile error)
3. Accept input Y and allow one of the following to occur[3]: Silently fail to perform the expected task, silently do the wrong thing, or trigger an unclear error (assert/exception/compile) from deeper in the callstack. ([3] Note that which one of these three possibilities occurs is dependent on function X's exact implementation details.)

Of these three possibilities for a function X faced with unsupported input Y, the first two options are (in theory) acceptable[4]. The third possibility is absolutely not acceptable [to the extent realistically possible.]

([4] The library containing function X may impose additional restrictions on what is/isn't acceptable.)

So, what does this mean in our specific situation?

If good API design is followed, then any filesystem-based API, when running on Windows and faced with a path exceeding the local system's size limit, must do one of the following:

1. Modify Phobos to support long filepaths even on Windows versions below Win10 v1607.
2. Detect and reject any non-\\?\ path longer than MAX_PATH-12 bytes[5].
3. Provide a very good justification why possibilities #1 and #2 are sufficiently unrealistic/problematic.

([5] Note that the following still technically satisfy possibility #2 since they do not involve passing unsupported input to the given function, but they could still be optionally determined unacceptable if desired: A. Rejecting all paths longer than MAX_PATH-12 bytes, even if \\?\-based. B. Rejecting too-long paths even on Win10 v1607+ with LongPathsEnabled set in the registry.)
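A minimal sketch of possibility #2 above, written in Python only because it is compact and runs anywhere; it is not Phobos code. The names `check_path_length`, `PathTooLongError` and `LIMIT` are made up for illustration, `MAX_PATH` is taken as 260, and length is counted in UTF-16 code units (an assumption; the text above says "bytes"). The check applies to the path exactly as given, so a relative path would have to be resolved first.

```python
MAX_PATH = 260               # Win32 constant, assumed value
LIMIT = MAX_PATH - 12        # threshold named in the post above
EXT_PREFIX = "\\\\?\\"       # the literal prefix \\?\


class PathTooLongError(ValueError):
    """Hypothetical error type: raised up front instead of failing deep in the OS call."""


def utf16_length(path):
    # Win32 wide-character APIs count UTF-16 code units, not bytes or code points.
    return len(path.encode("utf-16-le")) // 2


def check_path_length(path):
    """Return path unchanged if acceptable, otherwise raise PathTooLongError."""
    if path.startswith(EXT_PREFIX):
        return path          # \\?\ paths are exempt from the MAX_PATH-based limit
    length = utf16_length(path)
    if length > LIMIT:
        raise PathTooLongError(
            f"path is {length} UTF-16 units long; limit without \\\\?\\ is {LIMIT}"
        )
    return path
```

A caller would run `check_path_length(p)` before handing `p` to the filesystem call, so the failure is reported at the API boundary with a clear message.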

> No, that's completely nuts!
> A library, especially a standard library, should not introduce new limitations, but papering over the limitations of the platform is not the right thing to do.

This is debatable. Why, exactly, is papering over pre-Win10-v1607's maximum non-\\?\ filepath length a bad thing? Exactly what problems does it cause?

> If the platform's API is a piling POS, there's nothing a sane library can do about it.

That is patently untrue. It might be true in specific circumstances, but it is not generally true. If you believe it to be true in this specific case, then please explain *how*/*why* there is nothing the library can do about it.

> If your app writes to a FAT12 formatted floppy disk you don't expect the library to implement code to alleviate its limitation, like 8+3 filenames or fixed number of files in the root directory.

The general rule of thumb is: "Typical situations should work as expected, atypical situations should be possible." Therefore, please explain *ANY* one of the following:

1. How writing to a FAT12 formatted floppy disk qualifies as a typical situation.

2. How abstracting over the MAX_PATH limitation makes writing to a FAT12 formatted floppy impossible.

3. How my claim of "Typical situations should work as expected, atypical situations should be possible" is fundamentally wrong for D.

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Vladimir Panteleev
in reply to Nick Sabalausky (Abscissa)

On Wednesday, 19 September 2018 at 05:49:41 UTC, Nick Sabalausky (Abscissa) wrote:
> This actually leads to an interesting point. Let's change gears for a moment to "API Design Theory"...
>
> Suppose you implement API function X which takes Y as an argument. Question: Should X accept values (or types) of Y for which X fails? Answer: Ideally, no. At least to the extent realistically possible.
>
> So what should a well-designed function X do when faced with known-to-be unsupported input Y?[1] Here are the possibilities[2]:
>
> ([1] "Unsupported" is defined here as "for the given input, the function does not correctly and successfully perform its intended goal".)

I don't think that accurately describes the situation. Considering how there is now a toggle which changes the API implementation's behavior, the limitation is no longer really a part of the API, but a part of the implementation.

Consider, also, other implementations of the Win32 API (Wine and ReactOS), as well as possible future versions of Windows (which may do away with this limitation entirely).

> 1. Add support for input Y to function X.
> 2. Reject input Y (via either: abort, throw or static compile error)
> 3. Accept input Y and allow one of the following to occur[3]: Silently fail to perform the expected task, silently do the wrong thing, or trigger an unclear error (assert/exception/compile) from deeper in the callstack. ([3] Note that which one of these three possibilities occurs is dependent on function X's exact implementation details.)
>
> Of these three possibilities for a function X faced with unsupported input Y, the first two options are (in theory) acceptable[4]. The third possibility is absolutely not acceptable [to the extent realistically possible.]

This is, again, flawed logic. It is unreasonable to rely on predictions of what the implementation of the underlying API will do, because it is outside of our control. Suppose this discussion had taken place just before Microsoft added the registry switch to allow long path names. The solution would have been "obvious"; the reality is that it is an implementation we do not control, and it can change in the future at a whim. At best, we can take the API at face value and not attempt to work around implementation quirks.

Consider also the case of special filenames like "con" or "prn". Creating these using the standard API is not allowed, but this can also be bypassed with UNC paths. All the arguments in favor of making Phobos support paths longer than MAX_PATH (using the UNC prefix) seem to also favor detecting and supporting these reserved file names. But is doing so really Phobos' burden?
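To give a sense of what "detecting these reserved file names" would involve, here is a rough sketch in Python (not Phobos code). The function name `is_reserved_name` is hypothetical, the list of device names is the commonly documented one (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9), and treating a name with an extension such as `con.txt` as reserved is an assumption that matches older Windows versions but may not hold on newer ones.

```python
import ntpath  # Windows path rules, usable on any OS

# Commonly documented DOS device names.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(path):
    """Return True if the last path component looks like a reserved device name."""
    name = ntpath.basename(path)
    # Assumption: the part before the first dot decides, and trailing spaces
    # are ignored, so "con.txt" and "PRN " both count as reserved.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_NAMES
```

Even this small check encodes version-specific behavior, which is the kind of detail a library would then have to keep tracking.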

We simply cannot define our API in terms of what we think the underlying implementation supports or doesn't.

>> No, that's completely nuts!
>> A library, especially a standard library, should not introduce new limitations, but papering over the limitations of the platform is not the right thing to do.
>
> This is debatable. Why, exactly, is papering over pre-Win10-v1607's maximum non-\\?\ filepath length a bad thing? Exactly what problems does it cause?

https://forum.dlang.org/post/bqsjebjxuljlqusaobst@forum.dlang.org

>> If the platform's API is a piling POS, there's nothing a sane library can do about it.
>
> That is patently untrue. It might be true in specific circumstances, but it is not generally true. If you believe it to be true in this specific case, then please explain *how*/*why* there is nothing the library can do about it.

It is not Phobos' job to work around quirks in implementations beyond our control which can change at any moment.

> The general rule of thumb is: "Typical situations should work as expected, atypical situations should be possible."

Operating on paths longer than MAX_PATH is not a typical situation.

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Vladimir Panteleev
in reply to Vladimir Panteleev

On Wednesday, 19 September 2018 at 06:05:38 UTC, Vladimir Panteleev wrote:
> [...]

One more thing:

There is the argument that the expected behavior of Phobos functions creating filesystem objects with long paths is to succeed and create those files. However, this results in filesystem objects that most software will fail to access (everyone needs to also use the long paths workaround).

One point of view is that the expected behavior is that the functions succeed. Another point of view is that Phobos should not allow programs to create files and directories with invalid paths. Consider, e.g. that a user writes a program that creates a large tree of deeply nested filesystem objects. When they are done and wish to delete them, their file manager fails and displays an error. The user's conclusion? D sucks because it corrupts the filesystem and creates objects they can't operate with.

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Paolo Invernizzi
in reply to Vladimir Panteleev

On Wednesday, 19 September 2018 at 06:05:38 UTC, Vladimir Panteleev wrote:

> Operating on paths longer than MAX_PATH is not a typical situation.

https://forum.rejectedsoftware.com/groups/rejectedsoftware.dub/thread/1499/

The worst situation was node.js on Windows, anyway...

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Vladimir Panteleev
in reply to Nick Sabalausky (Abscissa)

On Wednesday, 19 September 2018 at 05:49:41 UTC, Nick Sabalausky (Abscissa) wrote:
> [...]

Someone mentioned in this thread that the .NET runtime does the long-path workaround automatically. One thing we could do is copy EXACTLY what C# is doing.

The rationale being that:
- .NET is made by Microsoft
- The Windows API's filesystem implementation is made by Microsoft
- Given that these two are made by the same party, it's reasonable to assume that the .NET authors authoritatively "knew what they were doing" when implementing the workaround.
- The algorithm used by .NET is very likely to be supported by the API (even future implementations), as well as third-party implementations of the API.

However, there are still drawbacks to this:

- There is still the matter of overhead (one OS API call (GetCurrentDirectory) and at least one GC allocation (for the current directory buffer)).
- Using paths longer than MAX_PATH is an exceptional situation. Putting the workaround in the main code path penalizes 99.9% of use cases.
- The registry switch in newer Windows versions removes the need for this workaround, so systems with it enabled are penalized as well.
- There is still the matter regarding special filenames, as well as whether the expected behavior is really to succeed and create paths inaccessible to most software, instead of failing.

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Vladimir Panteleev
in reply to Paolo Invernizzi

On Wednesday, 19 September 2018 at 06:16:21 UTC, Paolo Invernizzi wrote:
> On Wednesday, 19 September 2018 at 06:05:38 UTC, Vladimir Panteleev wrote:
>
>> Operating on paths longer than MAX_PATH is not a typical situation.
>
> https://forum.rejectedsoftware.com/groups/rejectedsoftware.dub/thread/1499/
>
> The worst situation was node.js on Windows, anyway...

Not sure that's actually MAX_PATH related... cmd has a very low limit on command line length. Ran into this myself:

https://github.com/VerySleepy/tests/blob/721e52fcb14d8134394264586a4fe92e73574059/scripts/toolchains_download.cmd#L8

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Jonathan Marler
in reply to Vladimir Panteleev

On Wednesday, 19 September 2018 at 06:11:22 UTC, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 06:05:38 UTC, Vladimir Panteleev wrote:
>> [...]
>
> One more thing:
>
> There is the argument that the expected behavior of Phobos functions creating filesystem objects with long paths is to succeed and create those files. However, this results in filesystem objects that most software will fail to access (everyone needs to also use the long paths workaround).
>
> One point of view is that the expected behavior is that the functions succeed. Another point of view is that Phobos should not allow programs to create files and directories with invalid paths. Consider, e.g. that a user writes a program that creates a large tree of deeply nested filesystem objects. When they are done and wish to delete them, their file manager fails and displays an error. The user's conclusion? D sucks because it corrupts the filesystem and creates objects they can't operate with.

I was wanting to reply with something similar :)

My 2 cents, for whatever it's worth: Vladimir has expressed most, if not all, of the points I would have brought up. Abscissa did bring up a good idea to help users support long filenames, but I agree with Vladimir that this should be "opt-in". Provide a function in Phobos for it; that also lets them cache the result and, infinitely better, the developer knows what's going on.

What drives me mad is when library writers try to "protect" you from the underlying system by translating everything you do into what they "think" you're trying to do. This inevitably results in large, complex adaptation layers with an unwieldy maintenance burden, as both the underlying system and the front-facing API change over time. An opt-in solution doesn't have this problem because you've kept each solution orthogonal, rather than developing a translation layer that needs to be able to determine what the underlying system does or does not support.

This is a fundamental example of encapsulation: the filesystem library should be its own component, with the Windows filesystem workaround being an optional "add-on" that the filesystem library doesn't need to know about. This workaround could look like an extra function in Phobos... or you could even write a module that wraps std.file and does the translation on a per-call basis.
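One possible shape for such an opt-in layer, sketched in Python rather than D purely for illustration: a helper that wraps any function taking a path as its first argument and translates that path on each call. `long_path_aware` is a hypothetical name, the current directory is supplied once by the caller (so it is effectively cached), and UNC paths and drive-relative paths are deliberately not handled here.

```python
import functools
import ntpath  # Windows path rules, usable on any OS

EXT_PREFIX = "\\\\?\\"  # the literal prefix \\?\


def long_path_aware(func, cwd):
    """Wrap func(path, ...) so that path is made absolute and \\\\?\\-prefixed.

    The wrapped filesystem function knows nothing about the workaround;
    callers opt in by using the wrapper instead of the original.
    """
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        if not path.startswith(EXT_PREFIX):
            # Resolve against the cached directory and collapse "." and "..".
            path = EXT_PREFIX + ntpath.normpath(ntpath.join(cwd, path))
        return func(path, *args, **kwargs)
    return wrapper
```

Code that does not need long paths keeps calling the original function and pays nothing; code that does wraps the calls it cares about.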

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Nick Sabalausky (Abscissa)
in reply to Vladimir Panteleev

On 09/19/2018 12:04 AM, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 01:50:54 UTC, Nick Sabalausky (Abscissa) wrote:
>> And at least for me, moving from Windows to Linux would have been a LOT harder if it weren't for the OS abstractions that are already in Phobos.
> 
> It's one thing to call unlink on POSIX and DeleteFileW on Windows.

Granted.

> Another is adding a good deal of extra logic that performs additional OS calls and generates additional GC garbage to work around API problems even on systems that don't need it.

- Is it really?

- Does it actually, necessarily perform those additional OS calls?

- If it actually does, are those additional, necessary OS calls prohibitively expensive? (Note that this is being compared to the theoretical minimum of successfully performing the same desired operation on the same data via the WinAPI, and not compared to the software which fails to perform appropriate checks for invalid input.)

- How have you determined that the "additional GC garbage" required to "work around API problems" is still significant "even on systems that don't need it"?

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Vladimir Panteleev
in reply to Nick Sabalausky (Abscissa)

On Wednesday, 19 September 2018 at 06:34:33 UTC, Nick Sabalausky (Abscissa) wrote:
> - Does it actually, necessarily perform those additional OS calls?

We need to expand relative paths to absolute ones, for which we need to fetch the current directory.

> - Is it really?

Is what really what? If you mean the memory allocation, we do need a buffer to store the current directory. We also need to canonicalize away things like \..\, though we may be able to get away with it without allocating.

> - If it actually does, are those additional, necessary OS calls prohibitively expensive?

They are certainly going to be less expensive than actual filesystem operations that hit the physical disk, but it will still be an unwanted overhead in 99.9% of cases.

In any case, the overhead is only one issue.
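The steps described above (make the path absolute against the current directory, collapse `\..\` segments, then add the `\\?\` prefix) can be sketched as follows. This is an illustration in Python using the standard `ntpath` module, which applies Windows path rules on any OS; it is not what Phobos or .NET actually does. `to_extended_path` is a hypothetical name, the current directory is passed in by the caller rather than fetched from the OS, the threshold of 260 for `MAX_PATH` is an assumed value, and drive-relative paths such as `D:file` are not handled.

```python
import ntpath

MAX_PATH = 260                      # Win32 constant, assumed value
EXT_PREFIX = "\\\\?\\"              # the literal prefix \\?\
UNC_EXT_PREFIX = "\\\\?\\UNC\\"     # extended form for \\server\share paths


def to_extended_path(path, cwd):
    """Return a path usable beyond MAX_PATH; short paths come back untouched."""
    if path.startswith(EXT_PREFIX):
        return path                                  # already in extended form
    # Step 1: expand relative paths (this is where the current directory is needed).
    # Step 2: canonicalize, since \\?\ paths are not normalized by the OS.
    absolute = ntpath.normpath(ntpath.join(cwd, path))
    if len(absolute) < MAX_PATH:
        return path                                  # no workaround needed
    # Step 3: add the prefix; UNC paths use a different spelling.
    if absolute.startswith("\\\\"):
        return UNC_EXT_PREFIX + absolute[2:]
    return EXT_PREFIX + absolute
```

Note that even the "short path" case has to resolve against the current directory before it can decide, because the limit applies to the full path; that is the overhead being discussed.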

September 19, 2018

Re: phobo's std.file is completely broke!

Posted by Ecstatic Coder
in reply to Vladimir Panteleev

On Wednesday, 19 September 2018 at 05:32:47 UTC, Vladimir Panteleev wrote:
> On Wednesday, 19 September 2018 at 05:24:24 UTC, Ecstatic Coder wrote:
>> None would ever be, considering you obviously have decided to ignore such a simple solution to the 260 character limit...
>
> Add "ad hominem" to your pile of fallacies, I guess.

Now I will, thanks :)

Once again, this forum proves to be very effective at removing any motivation from D users to get involved and contribute to the D language.

That's probably one of the keys of its success...

