Thread overview
[Issue 8020] New: std.stdio can't open UTF16 file names in Windows
May 03, 2012: Oleg Kuporosov
May 03, 2012: Walter Bright
May 03, 2012: Oleg Kuporosov
May 03, 2012: Dmitry Olshansky
May 04, 2012: Oleg Kuporosov
May 04, 2012: Dmitry Olshansky
Jul 06, 2012: Denis Shelomovskij
May 03, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8020

           Summary: std.stdio can't open UTF16 file names in Windows
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: Oleg.Kuporosov@gmail.com


--- Comment #0 from Oleg Kuporosov <Oleg.Kuporosov@gmail.com> 2012-05-03 00:02:15 PDT ---
File(), open() and popen() assume they receive only ASCII or UTF-8 file names.
Windows supports UTF-16 file names, so portability is limited to ASCII
names.

We could have these APIs also accept a wstring to satisfy this enhancement.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 03, 2012


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com


--- Comment #1 from Walter Bright <bugzilla@digitalmars.com> 2012-05-03 00:51:22 PDT ---
UTF8 supports the full unicode set, not just ASCII.

May 03, 2012



--- Comment #2 from Oleg Kuporosov <Oleg.Kuporosov@gmail.com> 2012-05-03 04:54:32 PDT ---
The problem is that Windows doesn't support UTF-8 file names. So a file created
by a third-party app with a UTF-16 name will not match the UTF-8 name used by
std.stdio. http://d.puremagic.com/issues/show_bug.cgi?id=7648 clearly shows
that, even though I think it is not a bug, just an OS limitation.

May 03, 2012


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-05-03 07:33:42 PDT ---
I assumed it just transcodes UTF-8 into UTF-16 before calling the OS on Win32. Apparently that's not the case.

May 04, 2012



--- Comment #4 from Oleg Kuporosov <Oleg.Kuporosov@gmail.com> 2012-05-04 06:05:24 PDT ---
Dmitry, we should not assume the name string is in UTF-8; it may also be in
another 8-bit code page supported by Windows, like 125x and so on.
Such encoding should be handled by the application itself.
What I propose is File/open/popen(wstring name, string mode), which would
handle UTF-16 names. Surprisingly, I found references in the DMC includes to
_wfopen taking wchar_t, which should help exactly here.

May 04, 2012



--- Comment #5 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-05-04 07:48:07 PDT ---
(In reply to comment #4)
> Dmitry, we should not assume the name string is in UTF8, it may be also some another 8-bit code page in being supported in Windows, like 125x and so on. Such encoding should be done by application itself.

Nope, char is a UTF-8 code unit, period. See TDPL, the language spec, etc.
Legacy one-byte encodings should be transferred as byte/ubyte arrays. BTW,
NTFS is UTF-16 (or a subset of it).

> What I think is to have File/open/popen( wstring, string mode ) which should care about UTF16 names. Surprisingly I found some links in DMC includes to _wfopen receiving wchar_t which should exacly help here.

Then someone just needs to rig the current std.file to call toUTF16/toUTFz (see std.utf) and forward the result to the right _wfopen on Win32. UTF-16 has been the de facto standard in Windows for a looong time. This is all just embarrassing.
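The transcoding step suggested above can be sketched in C, since the thread turns on the C runtime's narrow fopen versus the wide _wfopen. This is a minimal illustration under stated assumptions, not the fix Phobos adopted: utf8_to_utf16, fopen_utf8 and MAX_PATH_UNITS are hypothetical names, and the Windows-only helper assumes a Microsoft-compatible CRT where wchar_t is 16 bits and _wfopen is available. In D itself, std.utf.toUTF16z performs the same conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PATH_UNITS 1024 /* hypothetical buffer size, not a Windows limit */

/* Hypothetical helper: transcode a NUL-terminated UTF-8 string into UTF-16
   code units. Returns the number of units written (excluding the terminating
   0), or -1 on malformed input or if `out` (cap units) is too small. */
long utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;
        /* Lead byte determines how many continuation bytes follow. */
        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else return -1;
        s++;
        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80) return -1; /* also catches an early NUL */
            cp = (cp << 6) | (*s & 0x3F);
        }
        /* Reject overlong forms, surrogate code points and values > U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair (room for terminator kept). */
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            out[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return -1;
    out[n] = 0;
    return (long)n;
}

#ifdef _WIN32
#include <wchar.h>
/* Hypothetical helper: open a UTF-8 path through the wide-character CRT. */
FILE *fopen_utf8(const char *path, const wchar_t *mode)
{
    uint16_t units[MAX_PATH_UNITS];
    wchar_t wpath[MAX_PATH_UNITS];
    long n = utf8_to_utf16(path, units, MAX_PATH_UNITS);
    if (n < 0) return NULL;
    for (long i = 0; i <= n; i++) /* copies the terminating 0 as well */
        wpath[i] = (wchar_t)units[i];
    return _wfopen(wpath, mode);
}
#endif
```

Code points above U+FFFF become surrogate pairs, so the UTF-16 name generally has a different length than the UTF-8 one; the conversion has to decode code points rather than widen each char.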

July 06, 2012


Denis Shelomovskij <verylonglogin.reg@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |verylonglogin.reg@gmail.com
            Version|unspecified                 |D2
         Resolution|                            |DUPLICATE
           Severity|enhancement                 |major


--- Comment #6 from Denis Shelomovskij <verylonglogin.reg@gmail.com> 2012-07-06 11:54:47 MSD ---
(In reply to comment #5)
> Then someone just needs rig current std.file to call toUTF16/toUTFz...

std.file works well with non-ASCII strings. This is a std.stdio issue.

> ...and forward the result to the right _wfopen...

And std.file uses the plain WinAPI, not its buggy wrapper from the Digital Mars C runtime.

> ...This is all is just embarracing.

Yes, but std.stdio is even worse than you think (e.g. it can be 100x slower than direct C function calls, as bearophile noted about rawWrite).

*** This issue has been marked as a duplicate of issue 7648 ***
