Thread overview
DirEntry on Windows - wstring variant?
Oct 24, 2014
dcrepid
Oct 24, 2014
Jonathan M Davis
Oct 25, 2014
dcrepid
Oct 25, 2014
Jonathan M Davis
October 24, 2014
As a Windows programmer using D, I find a number of questionable things with D's focus on using string everywhere. It's not a huge deal to add UTF-8 to UTF-16 mapping in certain areas, but when it comes to working with a lot of data and Windows API calls, the fewer needless conversions, the better.

I like the dirEntries (std.file) approach to traversing files and folders in a directory (almost as nice as C++14's <filesystem>), but I think it's a bit odd that native OS strings aren't used in D here. Sure, I get that having a fairly consistent programming interface may make using the language easier for certain programmers, but if you're using D with Windows, then you will be made well aware of the incompatibilities between D's strings and the Windows API (unless you always use ASCII, I suppose).

Anyway, I'm curious if proposing changes to those interfaces is worthwhile, or if I should just modify it for my own purposes and leave the standard library be.

P.S. It's a shame to keep running into Unicode issues with D and Windows, and sometimes it's a bit discouraging. Right before I peeked into DirEntry, I worked a bit on a workaround for stdio.File's Unicode problems (a documented bug that's 2+ years old). I remember trying D a while back and giving up because optlink was choking on paths. And just yesterday it choked on what the %PATH% environment variable was set to, so I had to clear that before running it.
October 24, 2014
On Friday, October 24, 2014 21:06:37 dcrepid via Digitalmars-d-learn wrote:
> As a Windows programmer using D, I find a number of questionable things with D's focus on using string everywhere. It's not a huge deal to add UTF-8 to UTF-16 mapping in certain areas, but when it comes to working with a lot of data and Windows API calls, the fewer needless conversions, the better.
>
> I like the dirEntries (std.file) approach to traversing files and folders in a directory (almost as nice as C++14's <filesystem>), but I think it's a bit odd that native OS strings aren't used in D here. Sure, I get that having a fairly consistent programming interface may make using the language easier for certain programmers, but if you're using D with Windows, then you will be made well aware of the incompatibilities between D's strings and the Windows API (unless you always use ASCII, I suppose).
>
> Anyway, I'm curious if proposing changes to those interfaces is worthwhile, or if I should just modify it for my own purposes and leave the standard library be.
>
> P.S. It's a shame to keep running into Unicode issues with D and Windows, and sometimes it's a bit discouraging. Right before I peeked into DirEntry, I worked a bit on a workaround for stdio.File's Unicode problems (a documented bug that's 2+ years old). I remember trying D a while back and giving up because optlink was choking on paths. And just yesterday it choked on what the %PATH% environment variable was set to, so I had to clear that before running it.

I don't know. The expectation is generally that programs will use string and that wstring will be used only in the rare cases that you have to interact directly with the Windows API. When it was suggested previously that the various functions in std.file be templatized on string type to support other string types, it was decided that that was unnecessary code bloat and not worth it.

Also, given how DirEntry works internally, I'd definitely be inclined to argue that it would be too much of a mess to support wstring unless it's by simply converting the name to a wstring when requested (which is kind of pointless, since you can just do to!wstring on the name if that's what you want). Making it support wstring directly would involve a lot of code duplication, and it would increase the memory footprint, because the structs involved would then have to hold the path and whatnot as both a string and wstring. So, I question that it's at all worth it to try and make dirEntries support wstring. And we definitely don't want to encourage the use of wstring. It's there for when you need it (which is great), but programs really should be using string if they don't actually need to use wstring or dstring.
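
For reference, the to!wstring conversion mentioned above can be sketched roughly as follows. This is only an illustration: wideNames is a hypothetical helper name, not something in std.file, and it assumes the caller just wants the UTF-16 form of each entry's name.

```d
import std.conv : to;
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): collects the names of the
// entries directly inside `dir`, converted from UTF-8 to UTF-16.
wstring[] wideNames(string dir)
{
    wstring[] result;
    foreach (entry; dirEntries(dir, SpanMode.shallow))
    {
        // DirEntry.name is a string; convert only where a wstring is needed.
        result ~= entry.name.to!wstring;
    }
    return result;
}

void main()
{
    foreach (w; wideNames("."))
        writeln(w, " (", w.length, " UTF-16 code units)");
}
```

Each call to to!wstring allocates a new array, which is the per-entry cost being debated in this thread.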

- Jonathan M Davis

October 25, 2014
On Friday, 24 October 2014 at 22:53:15 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
>
> Also, given how DirEntry works internally, I'd definitely be inclined to argue
> that it would be too much of a mess to support wstring unless it's by simply
> converting the name to a wstring when requested (which is kind of pointless,
> since you can just do to!wstring on the name if that's what you want). Making
> it support wstring directly would involve a lot of code duplication, and it
> would increase the memory footprint, because the structs involved would then
> have to hold the path and whatnot as both a string and wstring. So, I question
> that it's at all worth it to try and make dirEntries support wstring.

I would suggest that the string be kept as wstring inside the DirEntry structure, rather than converting twice as you suggest. Then a decision can be made as to whether .name() returns a string or wstring. If backwards compatibility is a concern, then it could be converted to a string on that call. It would break the nothrow promise that way, though. Adding something like .wname() would work here for getting the native wstring, I suppose.

Another alternative is to have a union of string and wstring, and a bool indicating how strings are handled internally. Of course, the .name and .wname properties would need to check it and convert depending on how it is stored. It's not pretty, but it's just another possibility.

The whole point is that there is a lot of wasted time doing the UTF16-UTF8 conversions when using these library functions.
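
A minimal sketch of the .name / .wname idea, assuming a made-up WideEntry type that is not the real DirEntry and only models the name storage:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical type for illustration only; std.file.DirEntry does not
// work this way. The name is stored once, in its native UTF-16 form.
struct WideEntry
{
    private wstring _wname;

    this(wstring nativeName) { _wname = nativeName; }

    // Returns the stored UTF-16 name; no conversion, so it can be nothrow.
    @property wstring wname() nothrow { return _wname; }

    // Converts to UTF-8 on every call. The conversion allocates and can
    // throw on invalid UTF-16, so this property cannot be nothrow.
    @property string name() { return _wname.to!string; }
}

void main()
{
    auto e = WideEntry("notes.txt"w);
    writeln(e.name);         // converted on demand
    writeln(e.wname.length); // UTF-16 code units, no conversion
}
```

In this layout the conversion cost moves from directory traversal to each .name call, so code that asks for .name repeatedly would pay for it repeatedly unless the result were cached.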

> And we
> definitely don't want to encourage the use of wstring. It's there for when you
> need it (which is great), but programs really should be using string if they
> don't actually need to use wstring or dstring.

I get that wstring on the whole is ugly, but it's the native Unicode string type in Windows. If someone is doing serious work on Windows, wstring will eventually need to be used. It'd be nice to keep the abstraction of string at every level of a program, but on Windows it's impossible. The standard library, even if it were comprehensive enough, will never cover every corner case where strings are needed. Whether using the Windows API, COM, or interfacing with other Windows libraries, wstring will still rear its ugly head.

But, idealism aside, there are good reasons for keeping the pathname in its native format on Windows:
- If a program is processing lots of files, there's going to be a lot of wasted cycles doing those wstring->string conversions.
- Doing anything more with the files, besides listing them, will probably result in a string->wstring conversion during a call to Windows for opening or querying information about the file, which means more wasted cycles.
- Additionally, Windows has a peculiar way of handling long pathnames that requires a "\\?\" prefix, and only works with the Unicode versions of its functions. This also makes the pathname uniquely OS-specific.
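
To make the last two points concrete, here is a rough sketch of the string->wstring step and the "\\?\" prefix. The names withLongPathPrefix and existsWide are hypothetical, and the helper assumes an absolute drive-letter path with backslashes (UNC paths use a different prefix form and are not handled here).

```d
import std.algorithm : startsWith;
import std.stdio : writeln;
import std.utf : toUTF16z;

// Extended-length path prefix understood by the wide (W) Windows functions.
enum longPrefix = `\\?\`;

// Hypothetical helper: prepends the prefix unless it is already present.
string withLongPathPrefix(string absPath)
{
    return absPath.startsWith(longPrefix) ? absPath : longPrefix ~ absPath;
}

version (Windows)
{
    import core.sys.windows.windows;

    // Hypothetical helper: every call converts UTF-8 to a NUL-terminated
    // UTF-16 buffer before handing the path to the Windows API.
    bool existsWide(string absPath)
    {
        const(wchar)* wpath = withLongPathPrefix(absPath).toUTF16z;
        return GetFileAttributesW(wpath) != INVALID_FILE_ATTRIBUTES;
    }
}

void main()
{
    writeln(withLongPathPrefix(`C:\data\file.txt`)); // \\?\C:\data\file.txt
}
```

The toUTF16z call is the conversion that would be unnecessary if the path were already held as UTF-16.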

Anyway, some things to think about.
October 25, 2014
On Saturday, October 25, 2014 01:11:26 dcrepid via Digitalmars-d-learn wrote:
> On Friday, 24 October 2014 at 22:53:15 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> Anyway, some things to think about.

DirEntry and all of the related functions and types would need quite a bit of rewriting to do what you're suggesting, and most folks aren't going to be using the Windows API enough for it to matter that std.file operates on string rather than wstring. And much as the Windows API unfortunately uses wchar all over the place, the functions that are being used internally in dirEntries and company are using static arrays of wchar, so it's not like using wstring instead of string would avoid any allocations. All it would do is avoid decoding and re-encoding the Unicode from UTF-16 to UTF-8. Additionally, the cost of the file operations would dwarf the cost of any allocations anyway. So, I'm not in the least bit convinced that altering anything in std.file to use wstring would be of much benefit, and all previous suggestions to support wstring anywhere in std.file have been shot down (and in at least one case, it was by Walter Bright, and he works on Windows primarily).
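
The point about static arrays of wchar can be illustrated with a small sketch. It is not the actual std.file code: nameFromBuffer is a made-up function, and the 260-element array only stands in for the fixed-size name field that the Windows find-file structure provides.

```d
import std.conv : to;

// Hypothetical stand-in for reading a name out of a fixed-size,
// NUL-terminated UTF-16 buffer such as WIN32_FIND_DATAW.cFileName.
string nameFromBuffer(ref const wchar[260] buf)
{
    // Find the terminating NUL (or use the whole buffer if there is none).
    size_t len = 0;
    while (len < buf.length && buf[len] != 0)
        ++len;

    // The name has to be copied out of the buffer either way; producing a
    // string additionally transcodes UTF-16 to UTF-8.
    return buf[0 .. len].to!string;
}

void main()
{
    wchar[260] buf = 0;        // zero-filled buffer
    buf[0 .. 5] = "a.txt"w;    // pretend the OS wrote a file name here
    assert(nameFromBuffer(buf) == "a.txt");
}
```

Returning a wstring instead would still need a copy out of the buffer (for example buf[0 .. len].idup), which is why only the transcoding step, not the allocation, would be saved.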

- Jonathan M Davis