April 09, 2008
Multibyte support on Windows: Phobos vs Tango, which is right?

1  Phobos has a toMBSz function that converts the UTF-8 string s into a null-terminated string in a Windows
   8-bit character set.
   It looks like this:

    char* toMBSz(char[] s, uint codePage = 0)
    {
        // Only need to do this if any chars have the high bit set
        foreach (char c; s)
        {
            if (c >= 0x80)
            {
                //do convert
            }
        }
        return std.string.toStringz(s);
    }

   Tango does not have this function. Is it necessary?

2  Is toMBSz(char[]) the same as char[] ~ '\0'?

    For example, with CreateFileA:

    Phobos way:
    char[] name;
    CreateFileA(toMBSz(name) ...)

    Tango way:
    char[] name;
    CreateFileA(name ~ '\0' ...)

    Is toMBSz(char[]) always the same as char[] ~ '\0'?
    Is toMBSz("Chinese汉语"c) always the same as "Chinese汉语"c ~ '\0'?

    If Phobos is right, then Tango has many bugs: it uses char[] ~ '\0' everywhere when calling the A versions of the Windows API!
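
    To make the question concrete, here is a minimal sketch of when the two can differ. It assumes a current D2 compiler rather than the D1 used elsewhere in this thread, and hasHighBit is a hypothetical helper that only mirrors the high-bit test in the toMBSz excerpt above. It calls no Windows API; the code page 936 (GBK) bytes appear only in a comment.

```d
import std.stdio : writeln;
import std.string : representation;

/// The same test as the loop in toMBSz: does any UTF-8 code unit have the high bit set?
bool hasHighBit(const(char)[] s)
{
    foreach (char c; s)      // iterates code units, no decoding
        if (c >= 0x80)
            return true;
    return false;
}

void main()
{
    // Pure ASCII: the bytes are identical in UTF-8 and in Windows ANSI code pages.
    assert(!hasHighBit("test.zip"));

    string name = "Chinese汉语";
    assert(hasHighBit(name));

    // UTF-8 stores each of the two Chinese characters in three bytes.
    immutable ubyte[] expected = [0xE6, 0xB1, 0x89, 0xE8, 0xAF, 0xAD];
    assert(name[7 .. $].representation == expected);

    // Code page 936 (GBK) stores the same two characters as BA BA D3 EF,
    // so an A function that reads the UTF-8 bytes as GBK sees a different name.
    writeln("UTF-8 and code-page bytes differ for: ", name);
}
```

    For pure ASCII names the two approaches hand the A function the same bytes; they diverge as soon as any byte is 0x80 or above.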


3   Phobos zip vs Tango Zip

    I used the Phobos zip module and it works fine. The trick is that zip.ArchiveMember.name should be encoded in the local code page in a multibyte environment.

    Tango way:
    char[][] files = [r"D:\Chinese中文.txt"];
    createArchive(r"test.zip", Method.Deflate, files);

    This causes an exception:
    object.Exception: cannot encode character "20013" in codepage 437.

    Tango seems to lack multibyte support on Windows,
    and does not seem to run dedicated unit tests for multibyte environments on Windows before a new version is published.
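
    The number in the exception message is a decimal Unicode code point. As a small sketch (again assuming a D2 compiler; firstNonAscii is a hypothetical helper, not part of Phobos or Tango), the first non-ASCII character of the path above is exactly that code point. Code page 437 is the legacy OEM US code page historically used for ZIP entry names, and it has no CJK characters.

```d
import std.stdio : writeln;

/// Returns the first code point above U+007F in s, or 0 if s is pure ASCII.
dchar firstNonAscii(string s)
{
    foreach (dchar c; s)     // foreach with dchar decodes UTF-8 on the fly
        if (c > 0x7F)
            return c;
    return 0;
}

void main()
{
    dchar c = firstNonAscii(r"D:\Chinese中文.txt");

    assert(c == '中');
    assert(cast(uint) c == 20013);   // 0x4E2D, the value in the exception text
    assert(firstNonAscii("test.zip") == 0);

    writeln("first non-ASCII code point: ", cast(uint) c);
}
```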




-- 
yidabu <yidabu.nospam@gmail.com>
D语言 中文支持(D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/
April 10, 2008
On Wed, 9 Apr 2008 23:51:59 -0800
"Kris" <foo@bar.com> wrote:

> Yidabu:
> 
> Tango has a multi-platform API based around Unicode, so it is not biased toward Windows, Linux, or Darwin. All the items you mention appear to be reasonably specific to Win32, so keep that in mind when reading this reply:
> 
> 
> 1) You'll find something functionally similar in tango.sys.win32.CodePage
> 
> 
> 2) Like many operating systems, Tango expects file names to be Unicode. This helps make the library portable. On Win32 the blahW() functions are used, with UTF-8 to UTF-16 conversion applied internally, except when you explicitly stipulate the version=Win32SansUnicode compiler option. If you do that, Tango currently does no internal conversion for file names. In short, if you explicitly disable Unicode support within the library then you currently need to handle Win32 code-page conversion yourself (see #1). This might be a problem if you're running Tango on Win95 or an old Win32S hybrid.
> 
> 
> 3) You have a recent ticket open for this specific issue, and it is somewhat related to #2 above. By default, Tango should happily handle Unicode names in a portable manner between operating systems. Your ticket has identified a problem with the zip package, which does need to be fixed. Perhaps you'd like to try fixing the bug in the zip package yourself? Tango is open-source, and patches are always welcome. If you'd like to add some more multibyte test cases to the codebase, we'd certainly be happy to run them.
> 
> 
> Hope that helps

Kris,
    Thanks for your reply.

    1) I know about the CodePage module; the issue is that Tango does not use it to convert file names.

    2) Since passing (char[] ~ '\0') to the ANSI Win32 API is not the right way, why not do it the Phobos way instead?
    Does passing toMBSz(char[]) to the ANSI Win32 API affect the library's portability?
    Does the ANSI Win32 API itself affect the library's portability? (My code is Unicode; only the ANSI Win32 API needs local code-page encoding, not me :)

    Some Tango modules only have an ANSI Win32 API implementation. What can Tango users do? Copy the module somewhere and change (char[] ~ '\0') to toMBSz(char[]) before using it?

   3) Since Tango passes (char[] ~ '\0') to the ANSI Win32 API everywhere, it is sometimes difficult to debug the code.

   Thanks to the Tango team for the exciting library you have offered to all of us.
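
    For comparison, the other route mentioned in this thread is to convert UTF-8 to UTF-16 and call the W function, which does not depend on the local code page at all. A minimal sketch of just the conversion step, assuming a D2 compiler and Phobos' std.utf.toUTF16z; no Windows function is actually called here, so it runs on any platform:

```d
import std.stdio : writeln;
import std.utf : toUTF16z;

void main()
{
    string name = "Chinese汉语";

    // Null-terminated UTF-16 copy, the form a W function such as CreateFileW expects.
    auto wide = toUTF16z(name);

    // Count UTF-16 code units up to the terminator.
    size_t units = 0;
    while (wide[units] != 0)
        ++units;

    assert(name.length == 13);   // 7 ASCII bytes + 2 characters * 3 bytes in UTF-8
    assert(units == 9);          // 7 + 2 code units in UTF-16

    writeln("UTF-8 bytes: ", name.length, ", UTF-16 units: ", units);
}
```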









-- 

yidabu <yidabu.nospam@gmail.com>
DWin http://www.dsource.org/projects/dwin

April 10, 2008
On Thu, 10 Apr 2008 01:23:35 -0800
"Kris" <foo@bar.com> wrote:

> 
> "yidabu" wrote in message
> 
> > Some Tango modules only have Ansi Win32 API implementation
> 
> If this is true, then please write a ticket for it noting the module(s) in question

I've written a function to find the modules:

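import  dwin.text.pcre.RegExp;
import  tango.text.Util;        // containsPattern
import  tango.io.File;
import  tango.io.FilePath;
import  tango.io.FileScan;
import  tango.util.log.Trace;

// Scans the .d files under path and reports those that call a Win32
// FooA(...) function without mentioning the matching FooW anywhere.
FileScan findAnsiWinAPI(char[] path)
{
    // m[1] captures the name in front of the trailing 'A', e.g. "CreateFile"
    auto regex = RegExp(r"\b([A-Z][a-z][a-zA-Z]+?)A\b\s*\(");
    auto scan = new FileScan;
    scan
    (
        path,
        (FilePath fp, bool isDir)
        {
            // descend into every directory, skip anything that is not a D source file
            if(isDir) return true;
            if(fp.suffix != ".d") return false;
            auto content = cast(char[]) (new File(fp)).read;
            if(auto m = regex.execute(content))
            {
                // an A call was found; report it if the W variant never appears
                if(!content.containsPattern(m[1] ~ "W"))
                {
                    Trace.formatln("{} contains {}, but not contains {}", fp.toString, m[1] ~ "A", m[1] ~ "W");
                    return true;
                }
            }
            return false;
        }
    );

    return scan;
}

void main()
{
    char[] path = r"path\to\tango\tango\";
    auto fs = findAnsiWinAPI(path);

}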



The result is:

tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains GetLogicalDriveStringsW
tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW
tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains CreateFileMappingW
tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW
tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW


-- 

yidabu <yidabu.nospam@gmail.com>
DWin http://www.dsource.org/projects/dwin

April 10, 2008
yidabu wrote:
> I've written a function to find the modules:
> the result is :

You, sir (or ma'am), are hard core.  And I applaud that.

> tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains GetLogicalDriveStringsW

I don't think it's possible for a logical drive to have non-ASCII characters, is it?  So that should be ok.

> tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW

It only creates a few specially named files, which always have ASCII names.  ("CONIN$\0", "CONOUT$\0", "CONOUT$\0")


> tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains CreateFileMappingW

Passes null in for all string parameters, so shouldn't matter that it's just using the A version.

> tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
> tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains CreateSemaphoreW

Ditto for these.  They use null for the string params.

> tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW

*THIS* looks like it could be a genuine problem.  So someone more familiar with the code should take a closer look.

> tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW

This looks potentially problematic too.

--bb
April 11, 2008
On Fri, 11 Apr 2008 08:44:55 +0900
Bill Baxter <dnewsgroup@billbaxter.com> wrote:

> > tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW
> 
> This looks potentially problematic too.
> 
> --bb


I'll copy your words to the Tango ticket :)


-- 

yidabu <yidabu.nospam@gmail.com>
DWin http://www.dsource.org/projects/dwin

April 11, 2008
On Fri, 11 Apr 2008 19:57:48 +0800
yidabu <yidabu.nospam@gmail.com> wrote:

> I'll copy your words to the Tango ticket :)
> 

Here is the ticket for this:

http://www.dsource.org/projects/tango/ticket/1035

-- 

yidabu <yidabu.nospam@gmail.com>
DWin http://www.dsource.org/projects/dwin
