Thread overview
Fix bugs caused by encoding in the DMD compiler under Windows
May 08, 2023
mm
May 10, 2023
mm
May 10, 2023
mm
May 10, 2023
mm
May 08, 2023

This post should have gone to the DMD Compiler forum, but after struggling for three hours I couldn't manage to post there. I'm trying here to see whether it goes through.

Fixing encoding bugs in the DMD compiler on Windows

The following issues exist in dmd 2.103.1, 2.099.1 and 2.100.1.

On Linux, which normally uses UTF-8, this problem does not occur.

It only shows up on Windows, and even there it does not occur on Windows 10 and later when the system's ANSI code page is set to UTF-8.

Because the behavior is inconsistent with Linux, I consider it a bug.

Let's reproduce the bug and then fix it.

Assumptions:

The system's Windows ANSI code page is not UTF-8.

There are two source files, a.d and 你好.d.

a.d contains:

import 你好;


Now, in cmd.exe, run:

dmd a.d    // fails: cannot find 你好.d (the name is printed garbled)

This happens because dmd has to convert file names to UTF-16 when it accesses files, but it calls that conversion with the wrong parameter: it assumes the name is in the ANSI code page, while it is actually UTF-8.

Here is the fix:

1
1.1 Open ..\dmd\dmd\common\string.d

1.2 Search for toWStringz

1.3 Modify it as follows:

version(Windows) wchar[] toWStringz(const(char)[] narrow, ref SmallBuffer!wchar buffer) nothrow
{
    //import core.sys.windows.winnls : CP_ACP, MultiByteToWideChar;
    import core.sys.windows.winnls : CP_UTF8, MultiByteToWideChar;
    // filenames are now assumed to be UTF-8
    // (previously: system default Windows ANSI code page)
    //enum CodePage = CP_ACP;
    enum CodePage = CP_UTF8;
    // ... rest of the function unchanged ...
}

1.4 Save and rebuild dmd.


Now dmd a.d succeeds.

But dmd 你好.d still fails.

The reason is that cmd.exe passes the arguments in the ANSI code page, so converting them with toWStringz (which now expects UTF-8) also goes wrong; toWStringz can no longer be used on them directly.
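As an illustration of what step 2 has to do with each argument, the sketch below takes the bytes cmd.exe would pass for 你好.d on a system whose ANSI code page is 936 (GBK), which is an assumption for this demo, and turns them into UTF-8. The helper name ansiToUtf8 is hypothetical; it goes through UTF-16 with MultiByteToWideChar like the Encodingconversion function in step 2, but uses Phobos's std.utf.toUTF8 for the last step. On non-Windows systems the program does nothing.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.winnls : MultiByteToWideChar;
    import std.utf : toUTF8;

    // Hypothetical helper: decode bytes in the given ANSI code page and
    // return them as a UTF-8 D string, going through UTF-16.
    string ansiToUtf8(const(ubyte)[] bytes, uint codePage)
    {
        auto src = cast(const(char)*) bytes.ptr;
        const len = cast(int) bytes.length;

        // First call: ask how many UTF-16 code units are needed.
        const n = MultiByteToWideChar(codePage, 0, src, len, null, 0);
        if (n <= 0)
            return null; // conversion failed (e.g. code page not available)

        // Second call: perform the conversion into the buffer.
        auto wide = new wchar[n];
        MultiByteToWideChar(codePage, 0, src, len, wide.ptr, n);
        return toUTF8(wide);
    }
}

void main()
{
    version (Windows)
    {
        // "你好.d" encoded in code page 936 (GBK), as cmd.exe would pass it
        // when the ANSI code page is 936 (assumption for this demo).
        immutable ubyte[] gbk = [0xC4, 0xE3, 0xBA, 0xC3, 0x2E, 0x64];
        const utf8 = ansiToUtf8(gbk, 936);
        writeln(cast(const(ubyte)[]) utf8); // the resulting UTF-8 bytes
        assert(utf8 == "你好.d");
    }
}
```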

Next, fix that problem:

2
2.1 Open ..\dmd\dmd\common\string.d

2.2 Add the following function:

// Convert a NUL-terminated string from one Windows code page to another
// by way of UTF-16, using the Windows API.
version(Windows) char* Encodingconversion(const(char)* buffer, int CodePage, int toCodePage)
{
    import core.sys.windows.winnls : MultiByteToWideChar, WideCharToMultiByte;
    import core.stdc.string : strlen;

    const bufferlen = cast(int) strlen(buffer);

    // Source code page -> UTF-16: query the required length, then convert.
    int utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, null, 0);
    wchar[] utf16 = new wchar[utf16len];
    utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, utf16.ptr, utf16len);

    // UTF-16 -> target code page, same two-pass pattern.
    const len = WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, null, 0, null, null);

    // Allocate one extra char for the terminating NUL.
    char[] utfx = new char[len + 1];
    WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, utfx.ptr, len, null, null);
    utfx[len] = '\0';

    return utfx.ptr;
}
2.3 Save.

2.4 Open ..\dmd\dmd\mars.d

2.5 Search for main(int

2.6 Modify it as follows:

extern (C) int main(int argc, char** argv)
{
    bool lowmem = false;
    foreach (i; 1 .. argc)
    {
        if (strcmp(argv[i], "-lowmem") == 0)
        {
            lowmem = true;
            break;
        }
    }
    if (!lowmem)
    {
        __gshared string[] disable_options = [ "gcopt=disable:1" ];
        rt_options = disable_options;
        mem.disableGC();
    }
    version (Windows)
    {
        // Do not put this code inside the loop above:
        // it causes an error when lowmem == true.
        import dmd.common.string : Encodingconversion;
        import core.sys.windows.winnls : GetACP, CP_UTF8;

        // cmd.exe passes arguments in the ANSI code page; re-encode them
        // as UTF-8 unless the ANSI code page already is UTF-8.
        const codePage = GetACP();
        if (codePage != CP_UTF8)
        {
            foreach (i; 0 .. argc)
                argv[i] = Encodingconversion(argv[i], codePage, CP_UTF8);
        }
    }
    // initialize druntime and call _Dmain() below
    return _d_run_main(argc, argv, &_Dmain);
}

2.7 Save.


dmd 你好.d    // now fails at the link step

The reason is that the command dmd passes to the linker has the wrong encoding.

2.8 Open ..\dmd\dmd\link.d

2.9 Search for executecmd

Find:
    private int executecmd(const(char)* cmd, const(char)* args)

and rename it to:
    private int executecmd1(const(char)* cmd, const(char)* args)

2.10 Directly above the renamed function, add this wrapper:

private int executecmd(const(char)* cmd, const(char)* args)
{
    // When dmd calls the external linker, the UTF-8 command and arguments
    // must be converted to the Windows ANSI code page first.
    import dmd.common.string : Encodingconversion;
    import core.sys.windows.winnls : GetACP, CP_UTF8;

    const codePage = GetACP();
    if (codePage != CP_UTF8)
    {
        char* args1 = Encodingconversion(args, CP_UTF8, codePage);
        char* cmd1 = Encodingconversion(cmd, CP_UTF8, codePage);
        return executecmd1(cmd1, args1);
    }
    return executecmd1(cmd, args);
}

2.11 Save and rebuild dmd.


Now, in cmd.exe:

dmd a.d succeeds.

dmd 你好.d succeeds.

The bug is fixed.


One more issue, which I think is in the standard library:

It exists in Windows dmd 2.103.1:

extern (C) int main(int argc, char** argv)
{
    argv[i] // encoding == current system (ANSI) code page
}
extern (D) int main(string[] argv)
{
    argv[i] // encoding == UTF-8
}
extern (C++) int main(int argc, char** argv)
{
    argv[i] // no longer an encoding problem: the data is unusable
}
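A quick way to check the claim above on your own machine is a small diagnostic program (hypothetical, not part of dmd) that dumps the raw bytes of each argument twice: once from druntime's Runtime.cArgs, which holds the unprocessed C argc/argv, and once from the string[] passed to D main. Run it as `argdump 你好.d` from cmd.exe (the program name is just an example): if the description above is right, the two dumps should differ on Windows with a non-UTF-8 ANSI code page and match on Linux.

```d
import core.runtime : Runtime;
import core.stdc.string : strlen;
import std.stdio : writefln;

void main(string[] args)
{
    // Raw C-level arguments, exactly as the C runtime received them.
    auto c = Runtime.cArgs;
    foreach (i; 0 .. c.argc)
    {
        const(char)* raw = c.argv[i];
        auto bytes = cast(const(ubyte)[]) raw[0 .. strlen(raw)];
        writefln("C argv[%s]: %(%02X %)", i, bytes);
    }

    // The same arguments as druntime passes them to D main.
    foreach (i, a; args)
        writefln("D args[%s]: %(%02X %)", i, cast(const(ubyte)[]) a);
}
```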

May 09, 2023
Okay, let's go through all the proposed changes:

1.1
1.2
1.3

Reviewing where toWStringz is used: yes, toWStringz should be changed to use CP_UTF8 instead of CP_ACP. Everything that calls it works on either command-line input or D source, both of which will be UTF-8.

https://github.com/dlang/dmd/blob/be151e6d854c0df8af7ee88b6f380b6283ea824f/compiler/src/dmd/common/string.d#L136



2.1
2.2
2.3

Not needed.



2.4
2.5
2.6
2.7

These are unnecessary. The argument processing there only looks for the -lowmem switch.

Druntime retrieves the command-line arguments separately, as UTF-16, and converts them to UTF-8 before calling the user's D main function (dmd's included).

https://github.com/dlang/dmd/blob/master/druntime/src/rt/dmain2.d#L268

Everything there is ok.
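Roughly, the approach described here looks like the sketch below: read the command line as UTF-16 with GetCommandLineW, split it with CommandLineToArgvW, and convert each argument to UTF-8. This is an illustration, not druntime's actual code; the function name utf8Args is hypothetical, and since CommandLineToArgvW lives in shell32.lib you may need to add that library when linking. Printing both lists lets you compare them with what D main received.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.stdc.wchar_ : wcslen;
    import core.sys.windows.shellapi : CommandLineToArgvW;
    import core.sys.windows.winbase : GetCommandLineW, LocalFree;
    import std.utf : toUTF8;

    // Rebuild the argument list from the UTF-16 command line and convert
    // each argument to UTF-8, independent of the ANSI code page.
    string[] utf8Args()
    {
        int argc;
        wchar** wargv = CommandLineToArgvW(GetCommandLineW(), &argc);
        if (wargv is null)
            return null;
        // CommandLineToArgvW returns one allocation, released with LocalFree.
        scope (exit) LocalFree(wargv);

        auto result = new string[argc];
        foreach (i; 0 .. argc)
            result[i] = toUTF8(wargv[i][0 .. wcslen(wargv[i])]);
        return result;
    }
}

void main(string[] args)
{
    version (Windows)
        writeln("from CommandLineToArgvW: ", utf8Args());
    writeln("passed to D main:        ", args);
}
```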



2.8
2.9
2.10

CreateProcessA has not been updated to support UTF-8, so that call needs to be replaced with the UTF-16 variant (CreateProcessW).

https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L892

A simple call to toWStringz on the finished OutBuffer contents will handle the conversion without any problem.
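A minimal sketch of that direction: keep the command line in UTF-8 and convert it to UTF-16 only at the point of the call, then start the process with CreateProcessW. It uses Phobos's std.utf.toUTF16 in place of dmd's toWStringz (dmd's frontend does not use Phobos), and the function name runUtf8 and the example command are hypothetical. CreateProcessW may modify the command-line buffer it receives, which is why the UTF-16 copy is mutable.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;

    // Run a command line given in UTF-8 and return its exit code,
    // or -1 if the process could not be started.
    int runUtf8(string commandLine)
    {
        // Mutable, NUL-terminated UTF-16 copy of the command line.
        wchar[] cmd = toUTF16(commandLine ~ '\0').dup;

        STARTUPINFOW si;
        si.cb = cast(DWORD) STARTUPINFOW.sizeof;
        PROCESS_INFORMATION pi;

        if (!CreateProcessW(null, cmd.ptr, null, null, FALSE, 0, null, null, &si, &pi))
            return -1;
        scope (exit)
        {
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
        }

        WaitForSingleObject(pi.hProcess, INFINITE);
        DWORD exitCode;
        GetExitCodeProcess(pi.hProcess, &exitCode);
        return cast(int) exitCode;
    }
}

void main()
{
    version (Windows)
    {
        // The non-ASCII argument is never converted to the ANSI code page.
        const status = runUtf8("cmd /c echo 你好.d");
        writeln("exit code: ", status);
    }
}
```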



https://issues.dlang.org/show_bug.cgi?id=23906
May 10, 2023
But on my computer it does not work without the following changes:


In cmd.exe:
dmd 你好.d     // fails to compile



> 2.1
> 2.2
> 2.3
>
>
>
>
> 2.4
> 2.5
> 2.6
> 2.7
>
>
>
>
> 2.8
> 2.9
> 2.10
>


With only 1.1, 1.2 and 1.3 applied, only dmd a.d compiles.





May 10, 2023
I tried again: without 2.1 to 2.10 it does not work.


E:\Users\mm\Desktop\code\d\新建文件夹>dmd 我.d
Error: cannot find input file `我.d`
import path[0] = E:\cx\Programming\Complier_interpretr_Actuat\d\dmd.2\windows\bin64\..\..\src\phobos
import path[1] = E:\cx\Programming\Complier_interpretr_Actuat\d\dmd.2\windows\bin64\..\..\src\druntime\import

After applying 1.1 to 1.3, the error above occurs.
May 10, 2023
extern (C) int main(int argc, char** argv)
{
    argv[i] // encoding == current system code page
            // this style is not UTF-8
}


extern (D) int main(string[] argv)
{
    argv[i] // encoding == UTF-8
            // only this style is UTF-8
}


extern (C++) int main(int argc, char** argv)
{
    argv[i] // no longer an encoding problem: the data is unusable
}

I'm sure you didn't expect it to be like this.
dmd uses
extern (C) int main(int argc, char** argv)
argv    // encoding == current system code page

so it goes wrong.