June 06, 2013
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad wrote:
> On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
>> On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad <public@kyllingen.net> wrote:
>>
>>> On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
>>>>
>>>> [...]
>>>
>>> Let me add some more to this.  To justify the addition of such a type, it needs to pull its own weight.  For added value, it could do one or both of the following:
>>
>> Does System.IO.DirectoryInfo:
>> http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx
>>
>> Add sufficient value to justify its existence to your mind?
>>
>> vs just having System.IO.Directory:
>> http://msdn.microsoft.com/en-us/library/system.io.directory.aspx
>
> They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry.  The added value is mainly in the performance benefits; for example,
>
>     if (exists(f) && isFile(f) && timeLastModified(f) < d) ...
>
> requires three filesystem lookups (stat() calls), whereas
>
>     auto de = dirEntry(f);
>     if (de.exists && de.isFile && de.timeLastModified < d) ...
>
> is just one.
>
> I see no such benefit in the proposed Path type.

Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose function is querying filesystem information.
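
To illustrate the distinction, here is a minimal sketch (not code from the pull request). The helper names `nameOf` and `isNonEmptyFile` are made up for this example; `baseName`, `DirEntry` and `thisExePath` are existing Phobos symbols. The first helper only manipulates a string, so the file need not exist; the second has to ask the filesystem.

```d
import std.file : DirEntry, thisExePath;
import std.path : baseName;

// Pure string manipulation: no filesystem access, the path need not exist.
string nameOf(string path)
{
    return baseName(path);
}

// Filesystem query: DirEntry looks the file up and throws a
// FileException if the path does not exist.
bool isNonEmptyFile(string path)
{
    auto de = DirEntry(path);
    return de.isFile && de.size > 0;
}

void main()
{
    assert(nameOf("/no/such/dir/file.txt") == "file.txt");
    assert(isNonEmptyFile(thisExePath())); // the running executable exists
}
```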
June 06, 2013
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
> [...]
>
> I don't think that there'll be any performance improvements by making in place modification functions. Considering under the hood the path object is just a string, and that string's reference needs to be changed with each modification, I don't see how manipulation can be made faster.

Why does _path have to be an immutable string?  It could just as well be a char[], or it could be templated on the character type.
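
A rough sketch of what I mean, assuming a hypothetical `MutablePath` type (this is not the struct from the pull request, and the `/` separator is hard-coded only to keep the example short):

```d
// Hypothetical sketch: a path wrapper over a mutable buffer,
// templated on the character type.
struct MutablePath(C = char)
{
    C[] data; // mutable, unlike immutable(C)[]

    // Append a segment in place. If the buffer has spare capacity,
    // the append does not need to reallocate.
    void append(const(C)[] segment)
    {
        if (data.length != 0 && data[$ - 1] != '/')
            data ~= '/';
        data ~= segment;
    }
}

void main()
{
    MutablePath!char p;
    p.data.reserve(64); // request capacity up front
    p.append("usr");
    p.append("lib");
    assert(p.data == "usr/lib");
}
```

Whether this is actually faster than building new immutable strings would have to be measured.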

> [...]
>
> The more I think about it, the more partial I am to removing the existing string methods in std.path. At most, using a Path object increases the number of characters typed by 6 (`Path()`). And even then, chances are you'll be saving characters as method names can be simplified to remove `path` from them: buildNormalizedPath -> normalized, isValidPath -> isValid, etc. Even with user code breaking, 1) D isn't exactly considered a stable language quite yet; I'm sure that users expect code breakage with each new release, and 2) it's trivial to convert code that uses the string-based API to the object-based API.

I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days.  It was accepted with a unanimous vote after a comprehensive review by the D community.  And already you want another breaking redesign?  I am strongly opposed to this.
June 06, 2013
On Thursday, 6 June 2013 at 14:54:25 UTC, Dylan Knutson wrote:
> On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad wrote:
>> They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry.  [...]
>
> Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose function is querying filesystem information.

Which is why my first sentence said "that is a completely different discussion".
June 06, 2013
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
> I'd like to open up the idea of Path being an object in std.path. I've submitted
> a pull (https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
> Path struct to std.path, "which exposes a much more palatable interface to path
> string manipulation".


I've succumbed to the temptation to do this several times over the years.

I always wind up backing it out and going back to strings.

The objections have all been already mentioned by others in this thread. I understand the motivation for doing it, it seems like a great idea, but I am strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows. It really isn't better, it is just different. In many ways, it is worse because it cannot hope to duplicate the rich interface available for strings.

2. APIs that deal with filenames take strings and return strings, not Path objects. Your code gets littered with path and filename components that are sometimes Paths and sometimes strings and sometimes both.

3. Every time you deal with a filename or path, you have to decide whether to use a Path or a string. This may seem like a small thing, but when writing a lot of code to deal with paths, this becomes a fracking annoyance.

4. An awful lot of path manipulation is done using string functions. Ever do regexes on paths? I do. But regex deals with strings, not Path objects. Ditto for the rest of the universe of code that deals with strings.

5. You wind up with two parallel universes of functions to deal with paths - one dealing with strings, one with Paths, oh, and a third universe of crap that deals with mixed strings and Paths.

6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People will not stand for a Path constructor that winds up allocating memory so it can rewrite the string in a canonical path representation.

8. There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

9. People think about paths in terms of strings, not Path objects. Adding an abstraction layer always produces the feeling of "what is it doing, is it allocating memory, is it slow, is it doing something clever that I don't need/want?". This is cognitive baggage, and interferes with writing clear, correct code.


I've written a lot of cross-platform path code, I've tried the Path object thing multiple times, and I wrote the original std.path, and it uses strings because of my experience.
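
As an illustration of objection 4, here is a minimal sketch of string-based path handling with std.regex. The helper `homeUser` and the `/home/<user>/` layout are assumptions made up for this example; `regex` and `matchFirst` are existing std.regex functions.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper: extract the user name from a path under /home.
// Returns null when the path does not match.
string homeUser(string path)
{
    auto m = matchFirst(path, regex(`^/home/([^/]+)/`));
    return m.empty ? null : m[1];
}

void main()
{
    assert(homeUser("/home/alice/notes.txt") == "alice");
    assert(homeUser("/etc/hosts") is null);
}
```

The regex engine sees a plain string here; a Path type would have to be converted to one first.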
June 06, 2013
On Thursday, 6 June 2013 at 15:24:09 UTC, Lars T. Kyllingstad wrote:
> On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
>> [...]
>>
>> I don't think that there'll be any performance improvements by making in place modification functions. Considering under the hood the path object is just a string, and that string's reference needs to be changed with each modification, I don't see how manipulation can be made faster.
>
> Why does _path have to be an immutable string?  It could just as well be a char[], or it could be templated on the character type.
>
>> [...]
>>
>> The more I think about it, the more partial I am to removing the existing string methods in std.path. At most, using a Path object increases the number of characters typed by 6 (`Path()`). And even then, chances are you'll be saving characters as method names can be simplified to remove `path` from them: buildNormalizedPath -> normalized, isValidPath -> isValid, etc. Even with user code breaking, 1) D isn't exactly considered a stable language quite yet; I'm sure that users expect code breakage with each new release, and 2) it's trivial to convert code that uses the string-based API to the object-based API.
>
> I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days.  It was accepted with a unanimous vote after a comprehensive review by the D community.  And already you want another breaking redesign?  I am strongly opposed to this.

Well, keep in mind that D 2 years ago was a different beast. AFAIK, D only recently got `alias X this`, which solves 90% of breakage problems when passing around Paths.
FWIW, having Path be an object adds consistency with the rest of Phobos, where many entities that could be expressed as primitives are expressed as objects instead. To name a few, DateTime is an object, File is an object, and DirEntry is an object. Yes, they could be described as integers, or a pointer, or a string, but it's less cognitive load on the developer to recognize them as separate types.
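
A minimal sketch of the `alias this` idea, using a stripped-down stand-in for Path (not the struct from the pull request) and a made-up `countChars` function that represents any existing string-based API:

```d
// Stand-in for the proposed type: a thin wrapper around a string.
struct Path
{
    string value;
    alias value this; // lets a Path be used where a string is expected
}

// Stand-in for an existing API that only knows about strings.
size_t countChars(string s)
{
    return s.length;
}

void main()
{
    auto p = Path("/etc/hosts");
    assert(countChars(p) == 10); // implicit Path -> string
    string s = p;                // also works in assignments
    assert(s == "/etc/hosts");
}
```

This covers non-template functions that take a `string`. Template functions deduce their parameter type from the argument, so they may need `p.value` passed explicitly.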
June 06, 2013
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
> I've succumbed to the temptation to do this several times over the years.
>
> I always wind up backing it out and going back to strings.

As another data point (which may or may not be relevant for the discussion here), the LLVM system/support library was initially based on Path objects, but recently has been rewritten to use raw strings: http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html

David
June 06, 2013
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
> FWIW, having Path be an object adds consistency with the rest of Phobos, where many entities that could be expressed as primitives are expressed as objects instead. To name a few, DateTime is an object, File is an object, and DirEntry is an object. Yes, they could be described as integers, or a pointer, or a string, but it's less cognitive load on the developer to recognize them as separate types.

"Reducing cognitive load" is not the main reason these are objects.  DateTime lumps together no less than six integers.  File adds automatic resource management via reference counting.  DirEntry caches file information to avoid repeated filesystem lookups.  And so on.
June 06, 2013
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
> I've succumbed to the temptation to do this several times over the years.
>
> I always wind up backing it out and going back to strings.
>
> The objections have all been already mentioned by others in this thread. I understand the motivation for doing it, it seems like a great idea,
Yay!
> but I am strongly opposed to it.
Oh.
>
> To repeat the objections:
>
> 1. Making a more 'palatable' interface is pretty much chasing rainbows. It really isn't better, it is just different. In many ways, it is worse because it cannot hope to duplicate the rich interface available for strings.

.toString ?

> 2. APIs that deal with filenames take strings and return strings, not Path objects. Your code gets littered with path and filename components that are sometimes Paths and sometimes strings and sometimes both.

As for APIs that return strings, a `Path toPath(string)` function could be added in std.path? Another solution would be to migrate the parts of Phobos that use path strings to using actual paths. They could be overloaded with a counterpart that also takes a string, but the toPath function would be pretty useful here.

> 3. Every time you deal with a filename or path, you have to decide whether to use a Path or a string. This may seem like a small thing, but when writing a lot of code to deal with paths, this becomes a fracking annoyance.

If there should only be one API used, I'd suggest just use Path.

> 4. An awful lot of path manipulation is done using string functions. Ever do regexes on paths? I do. But regex deals with strings, not Path objects. Ditto for the rest of the universe of code that deals with strings.

Path implicitly converts to a string.

> 5. You wind up with two parallel universes of functions to deal with paths - one dealing with strings, one with Paths, oh, and a third universe of crap that deals with mixed strings and Paths.

Well, I didn't say this in my OP, but I did a few comments back: I'm more partial to deprecating the string API and moving to Path. I didn't think many would go for this, but the more I think about it, the more I realize how little code would break, and how easy it'd be to fix that.

> 6. If you try not to do (5), you break all existing code.

> 7. People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People will not stand for a Path constructor that winds up allocating memory so it can rewrite the string in a canonical path representation.

string s = "/etc/hosts"
Path s = "/etc/hosts"

It even takes less chars :-P and it only allocates on Path == Path and Path == string comparison. Which would have been done manually anyways.

> 8. There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

Well, that's not so much a limitation of Path or path functions as it is of the operating systems themselves. You still run into that with strings. I'm not trying to do anything groundbreaking, just abstract away the concept of a path so it's easy to write larger applications.

> 9. People think about paths in terms of strings, not Path objects. Adding an abstraction layer always produces the feeling of "what is it doing, is it allocating memory, is it slow, is it doing something clever that I don't need/want?". This is cognitive baggage, and interferes with writing clear, correct code.

It's easy to think about a path as a string for trivial code. Once the application uses paths in a nontrivial manner, people write wrappers around path functions anyways. Type safety is very useful.
Good practice says don't worry about the implementation of what you can't see. If the programmer is worried about the speed of the abstraction, deal with that separately. FWIW, the Path wrapper doesn't allocate unless it needs to :-)
June 06, 2013
On 6/6/13 11:36 AM, Walter Bright wrote:
> To repeat the objections:

Now with devil's advocate interjections:

> 1. Making a more 'palatable' interface is pretty much chasing rainbows.
> It really isn't better, it is just different. In many ways, it is worse
> because it cannot hope to duplicate the rich interface available for
> strings.

Subtyping (Path is a subtype of string by means of alias this) should make getting from paths to strings easy, and getting back from strings to paths one constructor call away (which adds correctness).

> 2. APIs that deal with filenames take strings and return strings, not
> Path objects. Your code gets littered with path and filename components
> that are sometimes Paths and sometimes strings and sometimes both.

Subtyping should make it easy to pass paths to APIs that expect strings.

> 3. Every time you deal with a filename or path, you have to decide
> whether to use a Path or a string. This may seem like a small thing, but
> when writing a lot of code to deal with paths, this becomes a fracking
> annoyance.

If there's a reward for using paths the annoyance factor may be reduced.

> 4. An awful lot of path manipulation is done using string functions.
> Ever do regexes on paths? I do. But regex deals with strings, not Path
> objects. Ditto for the rest of the universe of code that deals with
> strings.

Subtyping should take care of this.

> 5. You wind up with two parallel universes of functions to deal with
> paths - one dealing with strings, one with Paths, oh, and a third
> universe of crap that deals with mixed strings and Paths.

Subtyping makes one way easy and constructors make the other way affordable. Again, this comes back to perceived gains that compensate for the shortcomings.

> 6. If you try not to do (5), you break all existing code.

Only "half".

> 7. People like writing paths as "/etc/hosts", not Path("/etc/hosts").
> People will not stand for a Path constructor that winds up allocating
> memory so it can rewrite the string in a canonical path representation.

Lazy canonicalization may help.

> 8. There really isn't any such thing as a portable path representation.
> It's more than just \ vs /. There are the drive prefixes in Windows that
> have no analog in Linux. Sometimes case matters in Linux, where it would
> be ignored under Windows. There are 8.3 issues sometimes. The only thing
> you can do is come up with a subset of what works across systems, and
> then of course you have to go back to using strings when you need to
> access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.

> 9. People think about paths in terms of strings, not Path objects.
> Adding an abstraction layer always produces the feeling of "what is it
> doing, is it allocating memory, is it slow, is it doing something clever
> that I don't need/want?". This is cognitive baggage, and interferes with
> writing clear, correct code.

I'm not sure whether the generalization holds.


Andrei
June 06, 2013
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
> I should have said "makes it easier to be platform independent". Normalization is done automatically on comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form.  (Should it?  I dunno.)
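
For comparison, here is a sketch of what a normalizing comparison looks like when spelled out with the existing string API. `samePath` is a made-up helper, not what the Path struct in the pull request does internally; `buildNormalizedPath`, `filenameCmp` and `CaseSensitive` are existing std.path symbols.

```d
import std.path : buildNormalizedPath, CaseSensitive, filenameCmp;

// Hypothetical helper: compare two paths after normalization.
// Each buildNormalizedPath call builds a new string, so the cost is
// visible at the call site. filenameCmp defaults to the OS convention
// for case sensitivity (CaseSensitive.osDefault).
bool samePath(string a, string b)
{
    return filenameCmp(buildNormalizedPath(a), buildNormalizedPath(b)) == 0;
}

void main()
{
    assert(samePath("foo/./bar", "foo/baz/../bar"));
    assert(!samePath("foo", "bar"));

    // Case sensitivity can also be chosen explicitly.
    assert(filenameCmp!(CaseSensitive.no)("Foo", "foo") == 0);
    assert(filenameCmp!(CaseSensitive.yes)("Foo", "foo") != 0);
}
```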


> This isn't just conjecture either; there are D programs in the wild that abstract away path strings because it's easier to deal with them that way.
> I didn't want to force paths passed in to be valid, because the programmer might want an invalid path passed around for whatever reason.

As others have pointed out, there are examples of the opposite too.


> You came off as quite constructive; thank you :-)

:)