June 06, 2013
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
> On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
>> I should have said "makes it easier to be platform independent". Normalization is done automatically on comparison.
>
> Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form.  (Should it?  I dunno.)

It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;
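
For concreteness, here is a minimal sketch of the string version, using `foo/../bar` as the unnormalized spelling. The variable names are only illustrative; `buildNormalizedPath` is the existing std.path function, and it returns a new string for each operand.

```d
import std.path : buildNormalizedPath;

void main()
{
    string s1 = "foo/../bar";
    string s2 = "bar";

    // The raw strings differ even though they name the same location.
    assert(s1 != s2);

    // buildNormalizedPath resolves "." and ".." segments and returns a
    // new string, so each operand is normalized before the comparison.
    assert(buildNormalizedPath(s1) == buildNormalizedPath(s2));
}
```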
June 06, 2013
On 2013-06-06 15:36:15 +0000, Walter Bright <newshound2@digitalmars.com> said:

> 8. There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

Actually, there is one portable representation for paths: URLs. More specifically "file:" URLs if we're limiting ourselves to filesystem paths. Relative URLs should probably count too.

But otherwise, that's all true. To correctly normalize a path, you need to know which underlying filesystem is in use. Today's operating systems can mix and match case-sensitive, case-preserving, and case-insensitive filesystems, different restrictions on file names, and sometimes have obscure restrictions/normalization when using old APIs on newer filesystems. You can't really normalize a path without making a lot of assumptions.

Of course, that's not an argument for or against having a path object to encapsulate the differences. But I'd tend to say that what the path object can do is more limited than one might think at first glance.

As a side note, Apple is currently asking application developers to use URLs instead of raw paths to local files. Using URLs makes it possible, for instance, to attach "bookmark" keys to a path (in the query string) that can more or less automatically punch a hole in the sandbox when accessing a file (and which can expire or be revoked). Pretty much all recent Cocoa APIs take URL objects instead of path strings.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

June 06, 2013
On 6/6/2013 9:00 AM, Dylan Knutson wrote:
>> 1. Making a more 'palatable' interface is pretty much chasing rainbows. It
>> really isn't better, it is just different. In many ways, it is worse because
>> it cannot hope to duplicate the rich interface available for strings.
>
> .toString ?
>
>> 2. APIs that deal with filenames take strings and return strings, not Path
>> objects. Your code gets littered with path and filename components that are
>> sometimes Paths and sometimes strings and sometimes both.
>
> As for APIs that return strings, a `Path toPath(string)` function could be added
> in std.path? Another solution would be to migrate the parts of Phobos that use
> path strings to using actual paths. They could be overloaded with a counterpart
> that also takes a string, but the toPath function would be pretty useful here.

Yes, your code becomes littered with conversions. Ugh.


>> 3. Every time you deal with a filename or path, you have to decide whether to
>> use a Path or a string. This may seem like a small thing, but when writing a
>> lot of code to deal with paths, this becomes a fracking annoyance.
>
> If there should only be one API used, I'd suggest just use Path.

Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.


> the more I realize how little
> code would break, and how easy it'd be to fix that.

That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.


> It even takes less chars :-P and it only allocates on Path == Path and Path ==
> string comparison. Which would have been done manually anyways.

Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.


> Well, that's not so much a limitation of Path or path functions as much as it is
> with the operating systems themselves. You still run into that with strings. I'm
> not trying to do anything groundbreaking, just abstract away the concept of a
> path so it's easy to write larger applications.

But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.


> Good practice says don't worry about the implementation of what you can't see.

Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.


> If the programmer is worried about the speed of the abstraction, deal with that
> separately.

Yes, he goes back to using strings.

June 06, 2013
On Thursday, 6 June 2013 at 16:24:11 UTC, Walter Bright wrote:
>> As for APIs that return strings, a `Path toPath(string)` function could be added
>> in std.path? Another solution would be to migrate the parts of Phobos that use
>> path strings to using actual paths. They could be overloaded with a counterpart
>> that also takes a string, but the toPath function would be pretty useful here.
>
> Yes, your code becomes littered with conversions. Ugh.

As opposed to the rest of the conventions that Phobos uses?

>>
>> If there should only be one API used, I'd suggest just use Path.
>
> Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.

Hence subtyping.
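
A rough sketch of what subtyping could look like, assuming a hypothetical `Path` struct that wraps a string and forwards to it with `alias this`. Neither `Path` nor `hasExtension` exists in Phobos; both are made up for illustration.

```d
// Hypothetical Path type, not part of Phobos.
struct Path
{
    string value;
    alias value this; // lets a Path be used wherever a string is expected
}

// An ordinary string-based function, standing in for existing string APIs.
bool hasExtension(string path, string ext)
{
    return path.length >= ext.length && path[$ - ext.length .. $] == ext;
}

void main()
{
    auto p = Path("src/app/main.d");

    string s = p;                   // implicit conversion through alias this
    assert(s == "src/app/main.d");
    assert(p.length == s.length);   // string properties are forwarded
    assert(hasExtension(p, ".d"));  // string APIs accept a Path unchanged
}
```

The conversion only goes one way: a plain string still has to be wrapped explicitly to become a `Path`.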

>
>> the more I realize how little
>> code would break, and how easy it'd be to fix that.
>
> That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.

Well, it comes down to whether we are willing to marginally break code for the sake of a better API. D and Phobos aren't considered stable by any standard; I don't think we should treat them like they're set in stone. Also, deprecation gives developers plenty of time to update their code (if they have to at all).

>
>> It even takes less chars :-P and it only allocates on Path == Path and Path ==
>> string comparison. Which would have been done manually anyways.
>
> Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.

It only allocates if buildNormalizedPath allocates. And if you aren't using buildNormalizedPath in the first place before comparing strings, you're comparing paths wrong.

>> Well, that's not so much a limitation of Path or path functions as much as it is
>> with the operating systems themselves. You still run into that with strings. I'm
>> not trying to do anything groundbreaking, just abstract away the concept of a
>> path so it's easy to write larger applications.
>
> But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.

Projects such as Dub, Vibe, and to an extent Tango disagree.

>> Good practice says don't worry about the implementation of what you can't see.
>
> Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.

Well, they shouldn't. Profile code first, see where the hotspots are, and fix those. I'd be very surprised if path comparison and manipulation is so heavily used, it becomes a slow spot for programs. And if it does, that's not the fault of the Path struct itself, but rather of the underlying functions it uses.

>
>> If the programmer is worried about the speed of the abstraction, deal with that
>> separately.
>
> Yes, he goes back to using strings.

See above; I can't think of any use case for paths where they account for a considerable amount of run time.
June 06, 2013
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu wrote:
> [...]
>
>> 8. There really isn't any such thing as a portable path representation.
>> It's more than just \ vs /. There are the drive prefixes in Windows that
>> have no analog in Linux. Sometimes case matters in Linux, where it would
>> be ignored under Windows. There are 8.3 issues sometimes. The only thing
>> you can do is come up with a subset of what works across systems, and
>> then of course you have to go back to using strings when you need to
>> access D:\foo\abc.c
>
> That is actually an argument in favor of good encapsulation, not against.

The proposed API change does not introduce good encapsulation.  It introduces a super-thin wrapper around a built-in type, and replaces free functions with methods, for what gain?
June 06, 2013
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson@gmail.com> wrote:

> On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
>> On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
>>> I should have said "makes it easier to be platform independent". Normalization is done automatically on comparison.
>>
>> Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form.  (Should it?  I dunno.)
>
> It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
>
> buildNormalizedPath(s1) == buildNormalizedPath(s2);
>
> and
>
> p1 == p2;

This can be done without allocations.

-Steve
June 06, 2013
On Thursday, 6 June 2013 at 16:14:31 UTC, Dylan Knutson wrote:
> On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
> It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
>
> buildNormalizedPath(s1) == buildNormalizedPath(s2);
>
> and
>
> p1 == p2;

To me, at least, the first one practically screams "expensive operation", whereas the second one does the exact opposite.
June 06, 2013
On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer wrote:
> On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson@gmail.com> wrote:
>
>> It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
>>
>> buildNormalizedPath(s1) == buildNormalizedPath(s2);
>>
>> and
>>
>> p1 == p2;
>
> This can be done without allocations.

I know.  There are a few additions that I've been planning to make for std.path for the longest time, I just haven't found the time to do so yet.  Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings.

The first one is a lazy "path normaliser":

  assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]),
                ["foo", "baz"]));

With this, non-allocating path comparison is easy.  The verbose version of p1 == p2, which could be wrapped for convenience, is then:

  equal(pathNormalizer(pathSplitter(p1)),
        pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected.  Very general and composable, and easily wrappable.

The second thing I'd like to add is an overload of buildPath() that takes a range of path segments.  (Then buildNormalizedPath(p) can also be implemented as buildPath(pathNormalizer(p)).)

Maybe now is a good time to get this done. :)
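
The filenameCmp() suggestion above can already be sketched with what std.path has today. This is only an illustration: `sameSegments` is a made-up name, and since pathNormalizer does not exist yet, "." and ".." segments are compared literally rather than resolved.

```d
import std.algorithm.comparison : equal;
import std.path : filenameCmp, pathSplitter;

// Hypothetical helper: compares two paths segment by segment without
// building any intermediate strings. It does NOT normalize ".." or ".".
bool sameSegments(string p1, string p2)
{
    // filenameCmp returns an int (like a three-way compare), so the
    // predicate has to test for zero. Its default case policy follows
    // the platform convention.
    return equal!((a, b) => filenameCmp(a, b) == 0)(
        pathSplitter(p1), pathSplitter(p2));
}

void main()
{
    // pathSplitter collapses repeated separators, so these match.
    assert(sameSegments("foo///bar", "foo/bar"));
    assert(!sameSegments("foo/bar", "foo/baz"));

    // Without a normalizer, ".." is just another segment.
    assert(!sameSegments("foo/../bar", "bar"));
}
```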
June 06, 2013
On 6/6/2013 9:23 AM, Michel Fortin wrote:
> Actually, there is one portable representation for paths: URLs. More
> specifically "file:" URLs if we're limiting ourselves to filesystem paths.
> Relative URLs should probably count too.

That doesn't work for case sensitivity/insensitivity differences, nor does it work for drive letters like "C:" (which don't exist on Apple systems, hence they can afford to dismiss them).

In D source code, we deal with this with the convention that package and module names must be lower case. But there's no getting around the fact that "File" and "file" are the same path under Windows, and are different paths under Linux.

There is no generic abstraction to account for that - the programmer must be aware of it and adjust as appropriate for his application.
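
To illustrate, std.path's filenameCmp makes the programmer pick the policy. A small sketch follows; note that the default, `CaseSensitive.osDefault`, is chosen per operating system at compile time and knows nothing about the filesystem a given path actually lives on.

```d
import std.path : CaseSensitive, filenameCmp;

void main()
{
    // An explicit policy gives the same answer on every platform.
    assert(filenameCmp!(CaseSensitive.yes)("File", "file") != 0);
    assert(filenameCmp!(CaseSensitive.no)("File", "file") == 0);

    // The default policy follows the platform convention, so the same
    // call gives different results on Windows and on Linux.
    bool sameByOsDefault = filenameCmp("File", "file") == 0;
    version (Windows) assert(sameByOsDefault);
    version (linux) assert(!sameByOsDefault);
}
```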
June 06, 2013
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad <public@kyllingen.net> wrote:

> On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer wrote:
>> On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson@gmail.com> wrote:
>>
>>> It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
>>>
>>> buildNormalizedPath(s1) == buildNormalizedPath(s2);
>>>
>>> and
>>>
>>> p1 == p2;
>>
>> This can be done without allocations.
>
> I know.  There are a few additions that I've been planning to make for std.path for the longest time, I just haven't found the time to do so yet.  Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings.
>
> The first one is a lazy "path normaliser":
>
>    assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]),
>                  ["foo", "baz"]));
>
> With this, non-allocating path comparison is easy.  The verbose version of p1 == p2, which could be wrapped for convenience, is then:
>
>    equal(pathNormalizer(pathSplitter(p1)),
>          pathNormalizer(pathSplitter(p2)))
>
> You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected.  Very general and composable, and easily wrappable.

Great!  I'd highly suggest pathEqual which takes two ranges of dchar and does the composition and OS-specific comparison for you.

-Steve