phobos/tango on win32: please drop ANSI "support"
Feb 14, 2007
Lionello Lunesu
February 14, 2007
Both Phobos and Tango pretend utf8 is valid input for the ANSI functions of the Windows API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or to convert it to unicode (utf16).

I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode has been supported since Windows 95 OSR-2 (if I'm not mistaken), and converting utf8 to ANSI is more expensive than converting it to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.

There, I've said it.

L.
February 14, 2007
Lionello Lunesu wrote:
> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
> 
> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
> 
> There, I've said it.
> 
> L.

Do you mean ASCII?
February 14, 2007
"BCS" <BCS@pathlink.com> wrote in message news:eqvgkt$ubi$1@digitalmars.com...
> Lionello Lunesu wrote:
>> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
>>
>> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
>>
>> There, I've said it.
>>
>> L.
>
> Do you mean ASCII?

No, definitely not ASCII.. What does the A stand for in RegisterClassA, CreateWindowA, CreateFileA, etc.  in the Windows API?  W = Wide, 'wchar', but what's A?

From MSDN:

...with the specific "A" (ANSI) or "W" (wide, Unicode)...

L.


February 14, 2007
Lionello Lunesu wrote:
> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
> 
> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.

The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
February 14, 2007
Lionello Lunesu wrote:
> "BCS" <BCS@pathlink.com> wrote in message 
>>
>>Do you mean ASCII?
> 
> 
> No, definitely not ASCII.. What does the A stand for in RegisterClassA, CreateWindowA, CreateFileA, etc.  in the Windows API?  W = Wide, 'wchar', but what's A?
> 
> From MSDN:
> 
> ....with the specific "A" (ANSI) or "W" (wide, Unicode)...
> 
> L. 
> 
> 
Hm.. haven't heard of that before.
February 14, 2007
BCS wrote:
> Lionello Lunesu wrote:
> 
>> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
>>
>> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
>>
>> There, I've said it.
>>
>> L.
> 
> 
> Do you mean ASCII?

Perhaps these would be edifying:

http://en.wikipedia.org/wiki/Windows_code_page
http://en.wikipedia.org/wiki/Windows-1252

In short, when someone says "ANSI" in reference to Windows or the Windows API, they mean the stuff in the above articles (which isn't actually an ANSI standard at all). Those are flat 8-bit encodings, and storing them in a UTF-8 datatype will only cause grief. As Lionello points out, modern versions of Windows use UTF-16 internally. (Although originally it was just UCS-2, and most Windows fonts don't know about anything beyond those two bytes.)

I agree with Lionello: UTF-8 is a terrible thing to call the Windows API with. When dealing with the Windows API in D, it is best to stick with wchar[].
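
A minimal sketch of the grief in question, assuming the "ANSI" code page is Windows-1252 (on a real system it depends on the locale). It is written in Python only because the codecs are built in; no Windows API is called, and the sample strings are made up for illustration.

```python
# Illustration only: what an "A" function sees when it is handed UTF-8 bytes.
# Assumption: the active ANSI code page is Windows-1252.
text = "café"
utf8_bytes = text.encode("utf-8")       # 5 bytes: the 'é' takes two

# An ANSI function treats every byte as one code-page character,
# so the two-byte UTF-8 sequence for 'é' turns into two wrong characters.
misread = utf8_bytes.decode("cp1252")

# Pure ASCII text survives unchanged, which is why the bug is easy to miss.
ascii_ok = "cafe".encode("utf-8").decode("cp1252")

print(repr(misread), repr(ascii_ok))
```

Only the non-ASCII part of the string is damaged, so code like this tends to pass every test written with English file names.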

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org
February 15, 2007
Walter Bright wrote:
> Lionello Lunesu wrote:
>> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
>>
>> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
> 
> The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.

Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct.

You should either convert the utf8 to the correct code-page for passing to WhatEverA(..), or convert it to utf16 and pass it to WhatEverW(..). The last one is much easier: a fixed, straightforward conversion (no need to know about code-pages) that also happens to be efficient for Windows 2000 and up.
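
The difference between the two options can be sketched like this. The helper names `to_wide_z` and `to_ansi_z` are hypothetical, Windows-1252 is only an assumed code page, and Python's codecs stand in for the real conversion routines:

```python
def to_wide_z(s: str) -> bytes:
    """UTF-16LE bytes plus a two-byte NUL terminator (the shape a W function takes)."""
    return s.encode("utf-16-le") + b"\x00\x00"


def to_ansi_z(s: str, codepage: str = "cp1252") -> bytes:
    """Code-page bytes plus a NUL terminator; unrepresentable characters become '?'."""
    return s.encode(codepage, errors="replace") + b"\x00"


name = "Ω.txt"              # Greek capital omega is not in Windows-1252
wide = to_wide_z(name)      # fixed conversion, independent of any code page
ansi = to_ansi_z(name)      # depends on the code page and loses the omega

print(wide[:-2].decode("utf-16-le") == name)   # the wide form round-trips
print(ansi)
```

The wide path needs no knowledge of the user's locale and never loses characters, while the ANSI path can only represent what the chosen code page happens to contain.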

As for UseWFuncs: I don't like it because the check is done at run-time. It's all over the place, practically doubles all Win32 code, not to mention the imports / obj-size. More importantly, for the reasons mentioned above, I don't think it's necessary.

L.
February 15, 2007
Lionello Lunesu wrote:
> Walter Bright wrote:
> 
>> Lionello Lunesu wrote:
>>
>>> Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode.
>>>
>>> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
>>
>>
>> The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
> 
> 
> Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct.

Regarding Tango, it uses the Windows 'A' functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.

For code-page support, we currently suggest using a library such as ICU to do the appropriate conversions.
February 15, 2007
Lionello Lunesu wrote:
> Walter Bright wrote:
>> The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
> 
> Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct.
> 
> You should either convert the utf8 to the correct code-page for passing to WhatEverA(..),

It does convert to the correct code-page. See std.windows.charset.toMBSz().

> or convert it to utf16 and pass it to WhatEverW(..). The last one is much easier: a fixed, straightforward conversion (no need to know about code-pages)

This just does not work under Win9x, because most of the 'W' functions are not supported. (Also, Win9x internally converts the few 'W' functions it does support right back to 'A'.)

> that also happens to be efficient for Windows 2000 and up.

Under Windows NT, 2000, and up, the 'W' functions *are* called.

> As for UseWFuncs: I don't like it because the check is done at run-time.

It has to be done at runtime, because that's the only way to make it work between different Windows versions.

> It's all over the place, practically doubles all Win32 code, not to mention the imports / obj-size. More importantly, for the reasons mentioned above, I don't think it's necessary.

There's no hope for it unless all support for Win9x is dropped.
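
The run-time dispatch being argued about can be sketched roughly as follows. This is not the Phobos source: `make_create_file`, `fake_w` and `fake_a` are hypothetical stand-ins, the code page is assumed to be Windows-1252, and the OS entry points are replaced by functions that merely record their argument.

```python
def make_create_file(use_w_funcs, create_file_w, create_file_a, codepage="cp1252"):
    """Build a wrapper that chooses the W or A entry point from a flag set at startup."""
    def create_file(name: str):
        if use_w_funcs:
            # NT-family path: UTF-16 with a two-byte terminator.
            return create_file_w(name.encode("utf-16-le") + b"\x00\x00")
        # Win9x path: convert to the code page, not raw UTF-8.
        return create_file_a(name.encode(codepage, errors="replace") + b"\x00")
    return create_file


# Stand-ins for the real OS functions: they only log what they receive.
calls = []
fake_w = lambda buf: calls.append(("W", buf))
fake_a = lambda buf: calls.append(("A", buf))

make_create_file(True, fake_w, fake_a)("é")    # flag as it would be on NT/2000/XP
make_create_file(False, fake_w, fake_a)("é")   # flag as it would be on Win9x
print(calls)
```

Every wrapped function carries both branches, which is the duplication Lionello objects to; the branch cannot be resolved at compile time if one binary has to run on both Windows families.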
February 15, 2007
> I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.

Actually Microsoft are heading this way themselves.
See this blog post: http://blogs.msdn.com/michkap/archive/2005/10/02/476213.aspx
In short - Microsoft are not developing W/A APIs anymore.
Also, if you look at their latest software you'll notice that they are using
MSLU (the Microsoft Layer for Unicode) so that their Unicode programs can run on 9x.

My personal experience is that our customers don't even use Windows 2000.
Everyone is using XP for desktops and x64 for servers.
So what is your opinion? Do you need to support a 9x version of a program
for a living?
Todor