June 06, 2013
On Thursday, June 06, 2013 11:09:29 Walter Bright wrote:
> On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
> > Some modules have needed to be redone. Some still do. But we already _did_
> > rework std.path. We agreed that we liked the new API, and it's been working
> > great. It's one thing to revisit an API that's been around since before we
> > had ranges or a review process. It's an entirely different thing to be
> > constantly reworking entire modules. I think that we need _very_ strong
> > justification to redesign a module that we already put through the review
> > process. And I really don't think that we have it here.
> 
> I think we're in violent agreement.

Yes. I was replying in support of your argument rather than replying directly to Dylan.

> An example of a strong justification for a redo is, for example, conversion to use ranges. std.zip needs that treatment.

Agreed.

- Jonathan M Davis
June 06, 2013
On Thursday, June 06, 2013 13:53:51 Steven Schveighoffer wrote:
> On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright <newshound2@digitalmars.com> wrote:
> > For example, what about symlinks?
> 
> Path operations should not require a real filesystem. They are string manipulations, nothing more.
> 
> There is huge value in that.

Agreed, but symlinks highlight the fact that there is a difference between paths being equal and paths referring to the same file.
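A minimal Posix-only sketch of that distinction, assuming druntime's `core.sys.posix.sys.stat` bindings. As strings, `"."` and the absolute working directory differ; comparing device and inode numbers, which means asking the filesystem, shows they name the same directory. `sameFile` is a hypothetical helper, not a Phobos function.

```d
import std.file : getcwd;
import std.stdio : writeln;

version (Posix)
{
    import core.sys.posix.sys.stat : stat, stat_t;
    import std.string : toStringz;

    /// Hypothetical helper: true if both paths name the same file, judged by
    /// device and inode number. stat(2) follows symlinks, so a symlink and its
    /// target compare equal; a missing path makes the result false.
    bool sameFile(string a, string b)
    {
        stat_t sa, sb;
        if (stat(a.toStringz, &sa) != 0 || stat(b.toStringz, &sb) != 0)
            return false;
        return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
    }
}

void main()
{
    version (Posix)
    {
        string cwd = getcwd();
        writeln("equal as strings: ", "." == cwd);          // false
        writeln("same file:        ", sameFile(".", cwd));  // true
    }
}
```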

- Jonathan M Davis
June 06, 2013
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
>> An example of a strong justification for a redo is, for example, conversion
>> to use ranges. std.zip needs that treatment.
>
> Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

Andrei
June 06, 2013
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
> On 6/6/13 2:13 PM, Jonathan M Davis wrote:
> >> An example of a strong justification for a redo is, for example,
> >> conversion to use ranges. std.zip needs that treatment.
> > 
> > Agreed.
> 
> Key to success for Path: somehow get it on the ranges bandwagon :o).

LOL. Well, given that strings are _already_ ranges, that wouldn't help it anywhere near as much as it does with other cases of code breakage, since std.path is already quite range-ready.
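A small sketch of what "already range-ready" looks like in practice: a `string` is an input range, and `std.path.pathSplitter` hands back a lazy range of path segments, so it composes with `std.algorithm` without any wrapper type.

```d
import std.algorithm : equal, map;
import std.path : pathSplitter;
import std.range : isInputRange;
import std.stdio : writeln;

// A string is already an input range (its elements are decoded dchars)...
static assert(isInputRange!string);
// ...and pathSplitter returns another lazy range rather than an array.
static assert(isInputRange!(typeof(pathSplitter("usr/local/bin"))));

void main()
{
    auto segments = pathSplitter("usr/local/bin");
    writeln(segments);                      // ["usr", "local", "bin"]
    writeln(segments.map!(s => s.length));  // [3, 5, 3]
}
```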

- Jonathan M Davis
June 06, 2013
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2@digitalmars.com> said:

> That doesn't work for case sensitivity/insensitivity differences nor does it work for drive letters like "C:" (which don't exist on Apple systems, hence they can afford to dismiss them).

Have you never opened a local file in a Windows web browser and taken a look at the URL? The drive letter is there.

	file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.
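A rough sketch of how a Windows path maps onto that `file:` URL form. `toFileUrl` is a hypothetical helper that only handles plain drive-letter paths (no UNC shares, no `#` or `?` in names); it relies on `std.uri.encode`, which percent-encodes characters such as spaces but leaves reserved characters like `:` and `/` alone.

```d
import std.array : replace;
import std.stdio : writeln;
import std.uri : encode;

/// Hypothetical helper: converts a drive-letter Windows path into a file URL.
/// Backslashes become forward slashes, then the result is percent-encoded.
string toFileUrl(string windowsPath)
{
    return "file:///" ~ encode(windowsPath.replace(`\`, "/"));
}

void main()
{
    writeln(toFileUrl(`c:\path\to\the file.txt`));
    // file:///c:/path/to/the%20file.txt
}
```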


> But there's no getting around the fact that "File" and "file" are the same path under Windows, and are different paths under Linux.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive HFS+, etc. If you assume a specific case sensitivity setting by looking at the OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole issue of which locale to use for Unicode case-insensitive comparisons. I'd bet that different filesystems choose different approaches to this tricky problem.

So there's no way to normalize for case-sensitivity just by looking at a path or a URL, even if you know which OS you're on. If you want to know for sure whether two paths are the same, or what the normalized path is, you need to ask the filesystem at some point. Anything else is based on fragile assumptions.
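`std.path` does not try to guess here: its comparison functions take case sensitivity as an explicit parameter. A minimal sketch, assuming the programmer has learned from the filesystem (not the OS) which mode applies. `CaseSensitive.osDefault` also exists, but it is fixed from the target OS at compile time, which is exactly the assumption described above.

```d
import std.path : CaseSensitive, filenameCmp;
import std.stdio : writeln;

void main()
{
    // Case-sensitive comparison, as on ext4 or Case-sensitive HFS+.
    writeln(filenameCmp!(CaseSensitive.yes)("File", "file") == 0); // false
    // Case-insensitive comparison, as on NTFS or FAT in their usual configuration.
    writeln(filenameCmp!(CaseSensitive.no)("File", "file") == 0);  // true
}
```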


-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

June 06, 2013
On 6/6/2013 1:02 PM, Michel Fortin wrote:
> On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2@digitalmars.com> said:
>
>> That doesn't work for case sensitivity/insensitivity differences nor does it
>> work for drive letters like "C:" (which don't exist on Apple systems, hence
>> they can afford to dismiss them).
>
> Have you never opened a local file in a Windows web browser and taken a look at
> the URL? The drive letter is there.
>
>      file:///c:/path/to/the%20file.txt
>
> The drive letter is simply the first part of the path on Windows.

I didn't know that, but that doesn't make it a canonical path. It just combines the notion of url with a path.


>> But there's no getting around the fact that "File" and "file" are the same
>> path under Windows, and are different paths under Linux.
>
> Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
> filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
> HFS+, etc. If you assume a specific case sensitivity setting by looking at the
> OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
> Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
> issue about which locale to use for Unicode case-insensitive comparisons. I'd
> bet that different filesystems choose different approaches to this tricky problem.
>
> So there's no way to normalize for case-sensitivity just by looking at a path or
> a URL, even if you know which OS you're on. If you want to know for sure
> whether two paths are the same, or what is the normalized path, you need to ask
> the filesystem at some point. Anything else is based on fragile assumptions.

It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions.

BTW, Windows still has only erratic support for using / as path separators, even in the system commands. Not even the "DIR" command can deal with it.

June 06, 2013
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright <newshound2@digitalmars.com> wrote:

> BTW, Windows still has only erratic support for using / as path separators, even in the system commands. Not even the "DIR" command can deal with it.
>

We don't program using DIR.  That is irrelevant.  (not contesting that Windows doesn't work well with '/', just that DIR, or any other command line tool, is evidence)

-Steve
June 06, 2013
On 6/6/2013 1:54 PM, Steven Schveighoffer wrote:
> On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright <newshound2@digitalmars.com> wrote:
>
>> BTW, Windows still has only erratic support for using / as path separators,
>> even in the system commands. Not even the "DIR" command can deal with it.
>>
>
> We don't program using DIR.  That is irrelevant.  (not contesting that Windows
> doesn't work well with '/', just that DIR, or any other command line tool, is
> evidence)

The fact that DIR, probably the most widely used command in Windows, doesn't support it is indicative.

I've also noticed Windows file dialog boxes not supporting it, and those are supposed to be standard components.

DIR is used in .bat files and makefiles; it is certainly used in programming.

June 06, 2013
On 2013-06-06 20:25:58 +0000, Walter Bright <newshound2@digitalmars.com> said:

> On 6/6/2013 1:02 PM, Michel Fortin wrote:
>> Have you never opened a local file in a Windows web browser and taken a look at
>> the URL? The drive letter is there.
>> 
>>      file:///c:/path/to/the%20file.txt
>> 
>> The drive letter is simply the first part of the path on Windows.
> 
> I didn't know that, but that doesn't make it a canonical path. It just combines the notion of url with a path.

It's not a canonical path, but it's a platform-neutral representation of a path. You can perform the same operations on a URL (including regular expressions) irrespective of the underlying OS.

I was replying initially to your claim that there was no portable way to represent a path. I don't think the definition of a "portable path" needs to include any notion of canonical, because not even non-portable paths can be canonical these days.


>> Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
>> filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
>> HFS+, etc. If you assume a specific case sensitivity setting by looking at the
>> OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
>> Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
>> issue about which locale to use for Unicode case-insensitive comparisons. I'd
>> bet that different filesystems choose different approaches to this tricky problem.
>> 
>> So there's no way to normalize for case-sensitivity just by looking at a path or
>> a URL, even if you know which OS you're on. If you want to know for sure
>> whether two paths are the same, or what is the normalized path, you need to ask
>> the filesystem at some point. Anything else is based on fragile assumptions.
> 
> It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions.

That's a good way to deal with paths (don't assume anything). And I'd bet even case-sensitive filesystems differ in behaviour when presented with different normalization of Unicode (using pre-combined characters vs. combining ones).
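To make the Unicode point concrete: the same visible name can store "é" precomposed (U+00E9) or decomposed (`e` followed by U+0301), and the two code-unit sequences differ. A minimal sketch using `std.uni.normalize`; which form a given filesystem stores or compares is up to that filesystem (HFS+, for example, stores names in a variant of NFD).

```d
import std.stdio : writeln;
import std.uni : NFC, NFD, normalize;

void main()
{
    string precomposed = "caf\u00E9.txt";   // é as a single code point
    string decomposed  = "cafe\u0301.txt";  // e + COMBINING ACUTE ACCENT

    writeln(precomposed == decomposed);                 // false
    writeln(normalize!NFC(decomposed) == precomposed);  // true
    writeln(normalize!NFD(precomposed) == decomposed);  // true
}
```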

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

June 06, 2013
On Thu, Jun 06, 2013 at 02:38:41PM -0400, Andrei Alexandrescu wrote:
> On 6/6/13 2:13 PM, Jonathan M Davis wrote:
> >>An example of a strong justification for a redo is, for example, conversion to use ranges. std.zip needs that treatment.
> >
> >Agreed.
> 
> Key to success for Path: somehow get it on the ranges bandwagon :o).
[...]

Hmm. Let's see:

	import std.range : isInputRange;

	static assert(isInputRange!Path);
	version(Windows)
		auto p = Path(`..\blah\blah\..\bluh`);
	else version(Posix)
		auto p = Path(`../blah/blah/../bluh`);

	// I'm assuming auto normalization; if you don't like that,
	// pretend I also wrote this line:
	//	p.normalize();

	assert(p.equals([
		"..",
		"blah",
		"bluh"
	]));

What about that? ;-)
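For comparison, the same result is available today from plain strings with existing `std.path` functions: `buildNormalizedPath` resolves `.` and `..` lexically, and `pathSplitter` yields the components as a range. This sketch uses forward slashes, which `std.path` also accepts as separators on Windows.

```d
import std.algorithm : equal;
import std.path : buildNormalizedPath, pathSplitter;
import std.stdio : writeln;

void main()
{
    // Purely lexical: "blah/.." cancels out without touching the filesystem.
    string p = buildNormalizedPath("../blah/blah/../bluh");
    writeln(p);  // ../blah/bluh on Posix

    assert(equal(pathSplitter(p), ["..", "blah", "bluh"]));
}
```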

While the above may *look* attractive, it's actually a minefield full of pitfalls. Consider this directory tree in Posix:

	/home/user/test
	/home/user/test/symlink -> /home/user/test/real/1
	/home/user/test/real
	/home/user/test/real/1/myfile
	/home/user/test/real/2/anotherfile

Let's say the current working directory is /home/user. Now consider this:

	auto p = Path(`test/symlink/../2/anotherfile`);
	assert(std.file.exists(p));	// should this work?

The only way the above can actually work is if normalization queries the filesystem. That is to say, it is NOT mere string manipulations.
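A Posix-only sketch that recreates the tree above under a scratch directory (standing in for `/home/user`; its name is arbitrary) and checks both behaviours. The kernel resolves the symlink before applying `..`, so the path exists; `buildNormalizedPath` cancels `symlink/..` as text and produces a path that does not. `checkSymlinkExample` is a hypothetical helper.

```d
import std.stdio : writeln;

version (Posix)
{
    import std.conv : to;
    import std.file : exists, mkdirRecurse, remove, rmdirRecurse, symlink, tempDir, write;
    import std.path : buildNormalizedPath, buildPath;
    import std.process : thisProcessID;

    /// Hypothetical helper: rebuilds the example tree in a scratch directory and
    /// returns [exists as the OS resolves it, exists after lexical normalization].
    bool[2] checkSymlinkExample()
    {
        string base = buildPath(tempDir, "path_demo_" ~ thisProcessID.to!string);
        string real1 = buildPath(base, "test", "real", "1");
        string real2 = buildPath(base, "test", "real", "2");
        mkdirRecurse(real1);
        mkdirRecurse(real2);
        scope (exit) rmdirRecurse(base);

        write(buildPath(real1, "myfile"), "");
        write(buildPath(real2, "anotherfile"), "");

        // test/symlink -> test/real/1
        string link = buildPath(base, "test", "symlink");
        symlink(real1, link);
        scope (exit) remove(link); // scope guards run in reverse: this runs first

        string p = buildPath(base, "test/symlink/../2/anotherfile");
        // The kernel follows the symlink before applying "..", reaching test/real/2.
        bool viaFilesystem = exists(p);
        // Lexical normalization cancels "symlink/.." and yields test/2/anotherfile.
        bool viaLexical = exists(buildNormalizedPath(p));
        return [viaFilesystem, viaLexical];
    }
}

void main()
{
    version (Posix)
    {
        auto r = checkSymlinkExample();
        writeln("via filesystem: ", r[0], ", after lexical normalization: ", r[1]);
    }
}
```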

However, *should* normalization always check the filesystem? What if the program is constructing a list of paths that it's going to create, which don't exist in the filesystem yet? Then normalization will fail, even though the paths are valid.
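A Posix-only sketch of that failure mode, assuming druntime's binding of `realpath(3)`: resolution through the filesystem returns nothing for a path that has not been created yet, while lexical normalization still gives a usable answer. `resolveViaFilesystem` and the `build/...` path are hypothetical.

```d
import std.path : buildNormalizedPath;
import std.stdio : writeln;

version (Posix)
{
    import core.stdc.stdlib : free;
    import core.sys.posix.stdlib : realpath;
    import std.string : fromStringz, toStringz;

    /// Hypothetical helper: canonicalizes a path by asking the filesystem.
    /// Returns null when resolution fails, e.g. because the path does not exist.
    string resolveViaFilesystem(string path)
    {
        char* resolved = realpath(path.toStringz, null);
        if (resolved is null)
            return null;
        scope (exit) free(resolved);
        return resolved.fromStringz.idup;
    }
}

void main()
{
    // A path the program intends to create later.
    string planned = "build/gen/../obj/main.o";
    writeln(buildNormalizedPath(planned)); // build/obj/main.o on Posix
    version (Posix)
        writeln(resolveViaFilesystem(planned) is null); // true unless build/gen exists
}
```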

Conclusion: correct path normalization depends on intent, which only the programmer knows -- the library can't possibly figure this out without being told. (And I haven't even started getting into OS-dependent path manipulation yet... what should Path(`C:\Program Files\abc.def`) do on a Posix system?) IOW, the programmer *already* has to know about system-dependent details of paths, so I'm not sure what value Path is really adding. At least, I'm not finding it compelling enough to eschew plain old string manipulations.
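One data point on the `C:\Program Files\abc.def` question: `std.path` interprets paths by the conventions of the platform it is compiled for, and on Posix a backslash is an ordinary filename character. A minimal sketch of what the existing string functions make of it there:

```d
import std.path : baseName, dirName, driveName, extension;
import std.stdio : writeln;

void main()
{
    version (Posix)
    {
        string p = `C:\Program Files\abc.def`;
        writeln(driveName(p).length); // 0: no drive letters on Posix
        writeln(dirName(p));          // prints "." since there is no '/' at all
        writeln(baseName(p) == p);    // true: the whole string is one file name
        writeln(extension(p));        // .def
    }
}
```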

Besides, should glob patterns like "/home/user/prog/*/*.d" be Paths or strings? What about path regexes? Should Path export a whole suite of parallel methods for constructing such patterns? One can always interconvert to/from strings, of course, but if we'd started out with strings in the first place, we wouldn't need any conversions. The OS ultimately takes only strings anyway, so is there really a need to insert a conversion to/from Path in between?
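As things stand, glob patterns work directly on strings: `std.path.globMatch` matches a path string against a pattern string, with no conversion step. A minimal sketch; note that in `globMatch` a `*` is not confined to a single path component.

```d
import std.path : globMatch;
import std.stdio : writeln;

void main()
{
    enum pattern = "/home/user/prog/*/*.d";
    writeln(globMatch("/home/user/prog/app/main.d", pattern)); // true
    writeln(globMatch("/home/user/prog/app/main.c", pattern)); // false
    // '*' also matches '/', so deeper files match as well.
    writeln(globMatch("/home/user/prog/a/b/c.d", pattern));    // true
}
```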

I do see a lot of value in providing *functions* for manipulating path strings (normalization, parsing path components, splitting file extensions, etc.), but I have a hard time with encapsulating a path string in an opaque object when it doesn't really add that much value. If you *really* like the idea of Path, nothing stops you from writing one yourself and having it implicitly convert to string so that you can pass it directly to OS functions that take paths. I just don't see value in requiring Phobos functions to only take Path objects.
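A minimal sketch of that do-it-yourself route, assuming the wrapper normalizes lexically on construction: D's `alias this` lets a user-defined `Path` (hypothetical, not a Phobos type) convert implicitly to `string`, so it can be passed straight to anything that takes a plain path string. `countSegments` is likewise a hypothetical stand-in for such a function.

```d
import std.path : buildNormalizedPath;
import std.stdio : writeln;

/// Hypothetical user-side wrapper; not part of Phobos.
struct Path
{
    string value;
    alias value this; // implicit conversion to string

    this(string s)
    {
        value = buildNormalizedPath(s); // lexical normalization only
    }
}

/// Hypothetical stand-in for any API that accepts a plain string path.
size_t countSegments(string path)
{
    import std.algorithm : count;
    import std.path : pathSplitter;
    return pathSplitter(path).count;
}

void main()
{
    auto p = Path("docs/./guide/../readme.txt");
    string s = p;               // implicit conversion via alias this
    writeln(s);                 // docs/readme.txt on Posix
    writeln(countSegments(p));  // 2, passed without an explicit .value
}
```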


T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan