Thread overview
June 26, 2004 A hat-trick of File bugs
Hi,

I encountered three bugs yesterday. One of them crept in with DMD 0.93. For reference, my file opening procedure is:

    Stream s = new BufferedStream(new File(filename, FileMode.In));

BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo", but the function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea.) I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The path parameter to new File() is passed in as UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.

Please note that the REASON the default value for uninitialized chars was changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded character (which is why we got 'ÿ': its code point is U+00FF).

BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. That is, you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

Arcane Jill

PS. I don't have newsgroup access, so I can't report these anywhere else, unless the bug-reporting forum has a web interface too. But these bugs seem sufficiently pervasive that they are likely to affect a whole lot of other people too. Particularly given that much discussion of late has been about streams, it seemed relevant to mention them here.
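To make the BUG 1 / BUG 2 interaction concrete, here is a minimal sketch in present-day D (not the DMD 0.93 Phobos code, which isn't shown in this thread). The helper names `copyWithoutTerminator` and `isWellFormedUtf8` are hypothetical. It assumes, as Jill suspects, that the terminator slot is allocated but never written: because `char.init` is 0xFF, the buffer ends in 0xFF rather than '\0', and `std.utf.validate` refuses it as UTF-8.

```d
import std.utf : validate;

/// Hypothetical helper reproducing the suspected pattern: the buffer is one
/// element longer than the input so there is room for a terminator, but the
/// terminator is never written.
char[] copyWithoutTerminator(const(char)[] s)
{
    auto copy = new char[s.length + 1]; // every element starts out as char.init, i.e. 0xFF
    copy[0 .. s.length] = s[];
    return copy; // the last element is still 0xFF, not '\0'
}

/// Returns true if s is well-formed UTF-8. std.utf.validate throws on
/// malformed input, such as a stray 0xFF byte.
bool isWellFormedUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
```

In this sketch `isWellFormedUtf8("fooÿ")` is true, while the unterminated copy of "foo" is rejected because of its trailing 0xFF, which is exactly the check a UTF-8 to UTF-16 conversion would have performed.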
June 26, 2004 Re: A hat-trick of File bugs
Posted in reply to Arcane Jill

Arcane Jill wrote:
> Hi,
>
> I encountered three bugs yesterday. One of them crept in with DMD 0.93. For reference, my file opening procedure is:
>
>     Stream s = new BufferedStream(new File(filename, FileMode.In));
>
> BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo", but the function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea.) I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

Now I see why the std.string unittests fail in toStringz. I had assumed my Phobos was messed up, but the failures were exactly about tacking on a trailing 0xFF. Lines 227 and 228 in string.d allocate a new char[], copy the string over, and assume char.init is 0. I had to add

    copy[string.length] = 0;

Even then, I think the string unittests still failed later on (I stopped trying to fix std.string at that point, assuming something else was wrong).

> BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The path parameter to new File() is passed in as UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.
>
> Please note that the REASON the default value for uninitialized chars was changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded character (which is why we got 'ÿ': its code point is U+00FF).

You are right. Line 1404 in std.stream should call something like file.toMBSz instead of toStringz.

> BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. That is, you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

Interesting. It looks like on Linux this fails with an error. It seems reasonable to have Windows do the same.

> Arcane Jill
>
> PS. I don't have newsgroup access, so I can't report these anywhere else, unless the bug-reporting forum has a web interface too. But these bugs seem sufficiently pervasive that they are likely to affect a whole lot of other people too. Particularly given that much discussion of late has been about streams, it seemed relevant to mention them here.

In general, bug reports should go to the bugs newsgroup.
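The fix Ben describes, writing the terminator explicitly instead of relying on `char.init`, might look like this in present-day D. `toCString` is a hypothetical stand-in for a toStringz-style helper; it is not the actual string.d source from DMD 0.93.

```d
/// Hypothetical toStringz-style helper with the fix applied: the terminator
/// is written explicitly, because a freshly allocated char[] is filled with
/// char.init (0xFF), not zero.
immutable(char)* toCString(const(char)[] s)
{
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = '\0'; // the line the buggy version was missing
    // No other reference to `copy` exists, so treating it as immutable is safe.
    return cast(immutable(char)*) copy.ptr;
}
```

Calling `toCString("foo")` yields a pointer whose first three characters are "foo", followed by the zero byte that C functions rely on to find the end of the string.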
June 26, 2004 Re: A hat-trick of File bugs
Posted in reply to Arcane Jill

In article <cbj8kn$1pvm$1@digitaldaemon.com>, Arcane Jill says...
>
>BUG 1 - Introduced in DMD 0.93. It appears that if the filename is an exact multiple of sixteen bytes, then a spurious 'ÿ' character is appended to the filename. That is, you think you're opening "foo", but the function actually tries to open "fooÿ". (You need a longer filename to see the effect, but you get the idea.) I believe this bug is caused by appending an uninitialized char to the end of the filename, in an attempt to make it null-terminated for the benefit of underlying C functions. Uninitialized chars now contain 0xFF. In combination with BUG 2, this would cause the observed symptoms.

Already fixed in my updated version.

>BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN attempts to open the wrong file, but this time for a different reason. The path parameter to new File() is passed in as UTF-8, but Windows filenames are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to UTF-16 (on Windows) before attempting the open. The filename parameter to CreateFile(...) is an LPCTSTR, which really should use 16-bit wide characters in this case.

So far I'm only calling CreateFileA (I can't use CreateFile because macros don't work in D). I'll add a wchar version that calls CreateFileW.

>BUG 3 - Even though I was attempting to open a file for READING, an empty file got CREATED. That is, you try to open "foo" for reading, and "fooÿ" gets created. Surely FileMode.In ought to mean "do not create"?

I haven't addressed the truncate (etc.) flags yet. But you're right, this operation should fail. To that end, would it make more sense to throw an exception, or to return a bit and add an isOpen method?

Sean
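Sean's two options for a failed read-only open can be sketched with present-day D's `std.stdio.File`, which is not the std.stream `File` class discussed in this thread; the function names `openForReading` and `tryOpenForReading` are hypothetical. Mode "rb" is a C `fopen` mode that never creates a file, so a missing file produces an error instead of a new empty one, which is the behaviour BUG 3 asks for.

```d
import std.exception : ErrnoException;
import std.stdio : File;

/// Option 1 (throw): open an existing file for reading only. The "rb" mode
/// never creates a file, and File's constructor throws ErrnoException when
/// the open fails.
File openForReading(string name)
{
    return File(name, "rb");
}

/// Option 2 (report through a flag): swallow the error and return an
/// unopened File, so the caller has to check isOpen.
File tryOpenForReading(string name)
{
    try
    {
        return File(name, "rb");
    }
    catch (ErrnoException)
    {
        return File.init; // a default-initialized File is not open
    }
}
```

Throwing makes the failure impossible to ignore; the flag version leaves the decision to the caller, at the cost of every caller having to remember to check `isOpen`.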
June 26, 2004 Re: A hat-trick of File bugs
Posted in reply to Arcane Jill

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cbj8kn$1pvm$1@digitaldaemon.com
|
| ...
|
| BUG 2 - Opening a file whose filename contains non-ASCII characters AGAIN
| attempts to open the wrong file, but this time for a different reason. The
| path parameter to new File() is passed in as UTF-8, but Windows filenames
| are in UTF-16. It is a bug for new File() /not/ to convert from UTF-8 to
| UTF-16 (on Windows) before attempting the open. The filename parameter to
| CreateFile(...) is an LPCTSTR, which really should use 16-bit wide
| characters in this case.
|
| Please note that the REASON the default value for uninitialized chars was
| changed to 0xFF was /precisely/ to catch situations like this. 0xFF is illegal
| in UTF-8. *IF* you had attempted to convert from UTF-8 to UTF-16, and a trailing
| 0xFF happened to be found, it WOULD rightly have caused a UTF-8 conversion
| exception. As it was, Windows interpreted the 0xFF as a WINDOWS-1252 encoded
| character (which is why we got 'ÿ': its code point is U+00FF).
|
| ...
|
| Arcane Jill
|

See my post "(fix) Re: unicode filenames: std.stream.File and std.path.listdir" in digitalmars.D.bugs on June 8th. There I attached a fixed stream.d which addresses that situation for Windows. I didn't know there could be such a problem on Linux, and I certainly don't know how to fix it there. It's up to Walter now to fix std.stream in Phobos.

| PS. I don't have newsgroup access, so I can't report these anywhere else, unless
| the bug-reporting forum has a web interface too. But these bugs seem
| sufficiently pervasive that they are likely to affect a whole lot of other
| people too. Particularly given that much discussion of late has been about
| streams, it seemed relevant to mention them here.

Of course there's a web interface for the bugs newsgroup.

-----------------------
Carlos Santander Bernal
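A minimal sketch of the conversion BUG 2 calls for, in present-day D: validate the UTF-8 path, then convert it to a NUL-terminated UTF-16 string with `std.utf.toUTF16z`, the form that wide Win32 calls such as `CreateFileW` take. `toWindowsPath` is a hypothetical name, and the sketch deliberately stops short of calling `CreateFileW` so that it runs on any platform; it is not the fixed stream.d mentioned above.

```d
import std.utf : toUTF16z, validate;

/// Hypothetical helper: convert the UTF-8 path handed to a File-like
/// constructor into the NUL-terminated UTF-16 string that wide Win32
/// functions such as CreateFileW expect.
const(wchar)* toWindowsPath(const(char)[] utf8Path)
{
    // Reject malformed UTF-8 (e.g. a stray 0xFF) up front: validate throws
    // instead of letting a wrong name reach the operating system.
    validate(utf8Path);
    return toUTF16z(utf8Path);
}
```

For "fooÿ" this produces the four UTF-16 code units f, o, o, U+00FF followed by a terminator, while a name ending in a stray 0xFF throws rather than silently opening (or creating) the wrong file.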
June 26, 2004 Re: A hat-trick of File bugs
Posted in reply to Arcane Jill

Arcane Jill wrote:
...
>
> PS. I don't have newsgroup access, so I can't report these anywhere else, unless
> the bug-reporting forum has a web interface too. But these bugs seem
> sufficiently pervasive that they are likely to affect a whole lot of other
> people too. Particularly given that much discussion of late has been about
> streams, it seemed relevant to mention them here.

Try this: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs

Each Digital Mars newsgroup has a corresponding web interface. Here's a list: http://www.digitalmars.com/drn-bin/wwwnews?*

--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Copyright © 1999-2021 by the D Language Foundation