Thread overview
A hat-trick of File bugs
Jun 26, 2004
Arcane Jill
Jun 26, 2004
Ben Hinkle
Jun 26, 2004
Sean Kelly
Jun 26, 2004
J C Calvarese
June 26, 2004
Hi,

I encountered three bugs yesterday. One of them crept in with DMD 0.93. For reference, my file opening procedure is:

#  Stream s = new BufferedStream(new File(filename, FileMode.In));

BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo". The function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea). I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C-functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The reason is that the path parameter to new File() is passed in in UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.

Please note that the REASON why the default value for uninitialized chars was changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded character (which is why we got 'ÿ' - its codepoint equals U+00FF).
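
To illustrate the point about 0xFF, here is a minimal sketch in present-day D (not the 2004-era Phobos this thread discusses). It validates a UTF-8 file name before converting it to UTF-16 for a wide Windows API such as CreateFileW. The helper name `fileNameToUTF16` is hypothetical. `std.utf.validate` and `std.utf.toUTF16` are Phobos functions, and `validate` throws a `UTFException` on malformed input such as a stray 0xFF byte.

```d
import std.utf : toUTF16, validate, UTFException;

/// Hypothetical helper: turn a UTF-8 file name into UTF-16 for a
/// wide Windows API (e.g. CreateFileW), rejecting malformed UTF-8 first.
wstring fileNameToUTF16(const(char)[] name)
{
    // 0xFF never appears in well-formed UTF-8, so a trailing
    // char.init makes validate() throw a UTFException here.
    validate(name);
    return toUTF16(name);
}

unittest
{
    // "café.txt": 9 UTF-8 code units become 8 UTF-16 code units.
    assert(fileNameToUTF16("caf\u00E9.txt") == "caf\u00E9.txt"w);

    // A name ending in char.init (0xFF), as in BUG 1, is rejected.
    bool rejected = false;
    try
    {
        fileNameToUTF16("foo".dup ~ char.init);
    }
    catch (UTFException)
    {
        rejected = true;
    }
    assert(rejected);
}
```

On Windows, `std.utf.toUTF16z` returns the NUL-terminated pointer form that a wide API expects.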

BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

Arcane Jill

PS. I don't have newsgroup access, so I can't report these anywhere else, unless the bug-reporting forum has a web interface too. But these would seem to be sufficiently pervasive that they are likely to affect a whole lot of other people too. Particularly given that much discussion of late has been about streams, it seemed relevant to mention it here.



June 26, 2004
Arcane Jill wrote:

> Hi,
> 
> I encountered three bugs yesterday. One of them crept in with DMD 0.93. For reference, my file opening procedure is:
> 
> #  Stream s = new BufferedStream(new File(filename, FileMode.In));
> 
> BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo". The function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea). I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C-functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

Now I see why the std.string unittests fail in toStringz. I had assumed my
Phobos was messed up, but the failures were exactly about tacking on a
trailing FF. Lines 227 and 228 in string.d allocate a new char[], copy
the string over, and assume char.init is 0. I had to add
 copy[string.length] = 0;
Even then, I think the string unittests still failed later on (I stopped
trying to fix std.string at that point, assuming something else was wrong).
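
A minimal sketch, in present-day D, of the fix described above: because `char.init` is 0xFF, a freshly allocated `char[]` is not zero-filled, so the terminator has to be written explicitly. `toCString` is a hypothetical stand-in for illustration, not the actual std.string source.

```d
// char.init is 0xFF in D, so a new char[] is filled with 0xFF, not zeros.
static assert(char.init == 0xFF);

/// Hypothetical helper: copy `s` into a new buffer and terminate it
/// explicitly so C functions see the intended string.
const(char)* toCString(const(char)[] s)
{
    auto copy = new char[s.length + 1]; // every element starts as char.init (0xFF)
    copy[0 .. s.length] = s[];          // copy the string payload
    copy[s.length] = '\0';              // the fix: write the terminator explicitly
    return copy.ptr;
}

unittest
{
    // A 16-byte name, matching the length pattern in the bug report.
    auto p = toCString("0123456789abcdef");
    assert(p[16] == '\0');
}
```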

> BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The reason is that the path parameter to new File() is passed in in UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.
> 
> Please note that the REASON why the default value for uninitialized chars was changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded character (which is why we got 'ÿ' - its codepoint equals U+00FF).

You are right. Line 1404 in std.stream should call something like file.toMBSz instead of toStringz.

> BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

Interesting. It looks like on Linux it errors. It seems reasonable to have Windows do the same.

> Arcane Jill
> 
> PS. I don't have newsgroup access, so I can't report these anywhere else, unless the bug-reporting forum has a web interface too. But these would seem to be sufficiently pervasive that they are likely to affect a whole lot of other people too. Particularly given that much discussion of late has been about streams, it seemed relevant to mention it here.

In general, bug reports should go to the bugs newsgroup.

June 26, 2004
In article <cbj8kn$1pvm$1@digitaldaemon.com>, Arcane Jill says...
>
>BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo". The function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea). I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C-functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

Already fixed in my updated version.

>BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The reason is that the path parameter to new File() is passed in in UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.

So far I'm only calling CreateFileA (I can't use CreateFile, because it's a macro and macros don't work in D). I'll add a wchar version that calls CreateFileW.

>BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. Like - you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

I haven't addressed the truncate (etc.) flags yet. But you're right: this operation should fail. To that end, would it make more sense to throw an exception, or to return a bit and add an isOpen method?
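
For BUG 3, a sketch of a mode-to-disposition mapping that would make a read-only open fail instead of creating a file. The `Mode` enum and `dispositionFor` are hypothetical (loosely modeled on FileMode.In and FileMode.Out); the numeric values are the Win32 `dwCreationDisposition` constants accepted by CreateFileA and CreateFileW.

```d
/// Win32 dwCreationDisposition values accepted by CreateFileA/CreateFileW.
enum CreationDisposition : uint
{
    CREATE_NEW        = 1,
    CREATE_ALWAYS     = 2,
    OPEN_EXISTING     = 3,
    OPEN_ALWAYS       = 4,
    TRUNCATE_EXISTING = 5,
}

/// Hypothetical mode flags, loosely modeled on FileMode.In and FileMode.Out.
enum Mode : uint
{
    In  = 1,
    Out = 2,
}

/// Hypothetical mapping: only a mode that includes Out may create the file.
CreationDisposition dispositionFor(uint mode)
{
    if (mode & Mode.Out)
        return CreationDisposition.OPEN_ALWAYS; // writing: create if missing
    return CreationDisposition.OPEN_EXISTING;   // reading only: fail if missing
}

unittest
{
    assert(dispositionFor(Mode.In) == CreationDisposition.OPEN_EXISTING);
    assert(dispositionFor(Mode.In | Mode.Out) == CreationDisposition.OPEN_ALWAYS);
}
```

With OPEN_EXISTING, CreateFile returns INVALID_HANDLE_VALUE for a missing file and GetLastError reports ERROR_FILE_NOT_FOUND, so either design raised above (an exception, or a false isOpen) can be built on that failure.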


Sean


June 26, 2004
"Arcane Jill" <Arcane_member@pathlink.com> wrote in message
news:cbj8kn$1pvm$1@digitaldaemon.com
|
| ...
|
| BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
| attempts to open the wrong file, but this time for a different reason. The
| reason is that the path parameter to new File() is passed in in UTF-8, but
| Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert
| from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename
| parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide
| characters in this case.
|
| Please note that the REASON why the default value for uninitialized chars was
| changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal
| in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing
| 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion
| exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded
| character (which is why we got 'ÿ' - its codepoint equals U+00FF).
|
| ...
|
| Arcane Jill
|

See my post "(fix) Re: unicode filenames: std.stream.File and std.path.listdir" in digitalmars.D.bugs on June 8th. There I attached a fixed stream.d which addresses that situation for Windows. I didn't know there could be such a problem on Linux, and I certainly don't know how to fix it. It's up to Walter now to fix std.stream in Phobos.

| PS. I don't have newsgroup access, so I can't report these anywhere else, unless
| the bug-reporting forum has a web interface too. But these would seem to be
| sufficiently pervasive that they are likely to affect a whole lot of other
| people too. Particularly given that much discussion of late has been about
| streams, it seemed relevant to mention it here.

Of course there's a web interface for the bugs newsgroup.

-----------------------
Carlos Santander Bernal


June 26, 2004
Arcane Jill wrote:
...
> 
> PS. I don't have newsgroup access, so I can't report these anywhere else, unless
> the bug-reporting forum has a web interface too. But these would seem to be
> sufficiently pervasive that they are likely to affect a whole lot of other
> people too. Particularly given that much discussion of late has been about
> streams, it seemed relevant to mention it here.

Try this:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs

Each Digital Mars newsgroup has a corresponding web interface. Here's a list:
http://www.digitalmars.com/drn-bin/wwwnews?*


-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/