[Issue 23179] Unicode in symbol names in DLLs breaks MSVC linker
June 12, 2022
https://issues.dlang.org/show_bug.cgi?id=23179

kinke <kinke@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kinke@gmx.net

--- Comment #1 from kinke <kinke@gmx.net> ---
To be clear, we're talking about linker directives (command-line option strings) embedded in COFF object files. LDC uses UTF-8 encoding for these (IIRC), and they work with the LLD linker but not with the MS linker. So I *guess* the MS linker expects some other encoding.
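To make the guess above concrete, here is a small Python sketch (not taken from LDC or the MS linker) of what happens when a symbol written out as UTF-8 is read back by a tool that assumes an ANSI code page. Windows-1252 (`cp1252`) is an assumption; the real ANSI code page depends on the system locale, and the thread does not confirm which encoding the MS linker expects.

```python
# Hypothetical illustration: the same symbol text encoded as UTF-8
# versus an assumed ANSI code page (Windows-1252).
symbol = "_\u00b5"  # "_µ", where µ is U+00B5 MICRO SIGN

utf8_bytes = symbol.encode("utf-8")    # 3 bytes: '_', 0xC2, 0xB5
ansi_bytes = symbol.encode("cp1252")   # 2 bytes: '_', 0xB5

# A reader that assumes cp1252 decodes the UTF-8 bytes as two separate characters.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes, ansi_bytes, misread)
assert utf8_bytes != ansi_bytes
assert misread != symbol
```

The UTF-8 form is one byte longer and, read as cp1252, turns into `_Âµ`, so a name written in one encoding and matched in another would not be found.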

--
June 12, 2022

--- Comment #2 from Richard Cattermole <alphaglosined@gmail.com> ---
After a bunch of hunting wrt. GetProcAddress, it seems Microsoft does not intend for exports to support anything other than ANSI. There are no A/W versions of this function, which, going by the usual WinAPI naming convention, means that it only takes ANSI.

This brings us back to the conclusion that we will probably need to sanitize mangled names so they don't include Unicode, at least on Windows.

--
June 12, 2022

Dlang Bot <dlang-bot@dlang.rocks> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull

--- Comment #3 from Dlang Bot <dlang-bot@dlang.rocks> ---
@rikkimax created dlang/dmd pull request #14207 "[DO NOT MERGE] Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker" fixing this issue:

- Fix Issue 23179 - Unicode in symbol names in DLLs breaks MSVC linker

https://github.com/dlang/dmd/pull/14207

--
June 13, 2022

--- Comment #4 from Richard Cattermole <alphaglosined@gmail.com> ---
Created attachment 1854
  --> https://issues.dlang.org/attachment.cgi?id=1854&action=edit
Attempted fix as patch

After talking with kinke, we have decided to wait for this to appear in the wild before fixing.

I've attached my proposed fix as a patch, in case something happens to my fork with the branch containing it.

If you experience this, please do reply!

--
January 26, 2023

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla@digitalmars.com
         Resolution|---                         |WONTFIX

--- Comment #5 from Walter Bright <bugzilla@digitalmars.com> ---
There are other limitations on names we accept on Windows, such as file names being case-insensitive. This has tripped up a handful of people, but people do accept it for what it is. It's not an onerous limitation.

If the Microsoft linker fails at Unicode characters, so be it. Turning them into hex makes the mangled names even uglier and longer. Demangling them also becomes another problem.

I suggest to just let Microsoft worry about this issue. They'll probably eventually fix their linker anyway. It's not worth us fixing it, then unfixing it when MS updates their linker.

So WONTFIX.

--
January 26, 2023

--- Comment #6 from Richard Cattermole <alphaglosined@gmail.com> ---
They won't eventually fix this.

It permeates the kernel and WinAPI as well.

It is an intentional limitation that occasionally becomes an issue on other platforms as well. Other languages, such as Rust, use Punycode to encode Unicode.
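For comparison, Python's standard library ships a `punycode` codec implementing RFC 3492. The sketch below shows only that raw encoding step; Rust's actual symbol mangling wraps Punycode in its own framing, which is not reproduced here, and the helper names `to_punycode`/`from_punycode` are hypothetical.

```python
def to_punycode(identifier: str) -> str:
    """Encode an identifier with the stdlib RFC 3492 codec; the result is pure ASCII."""
    return identifier.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Invert to_punycode."""
    return encoded.encode("ascii").decode("punycode")


for name in ["_\u00b5", "b\u00fccher", "counter"]:  # "_µ", "bücher", "counter"
    enc = to_punycode(name)
    assert enc.isascii()
    assert from_punycode(enc) == name  # lossless round trip
    print(f"{name!r} -> {enc!r}")
```

ASCII characters are copied through unchanged, followed by a `-` delimiter and an ASCII-only encoding of the non-ASCII code points (for example, `bücher` becomes `bcher-kva`).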

I picked hex for my implementation because it's easy to encode and also decode.
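A minimal sketch of a reversible hex escape for symbol names, in Python. This is a hypothetical scheme for illustration, not the encoding used in dmd pull request #14207 or the attached patch: every non-ASCII code point, and the `$` marker itself, becomes `$` followed by exactly six hex digits, so decoding only ever reads a fixed-width field. Using `$` as the marker is an assumption; a real scheme would need a character that the linker, the mangler, and the demangler all accept.

```python
ESCAPE = "$"  # hypothetical marker, chosen only for this sketch


def hex_escape(name: str) -> str:
    """Replace each non-ASCII code point (and the marker itself) with '$' + 6 hex digits."""
    out = []
    for ch in name:
        if ord(ch) > 0x7F or ch == ESCAPE:
            out.append(f"{ESCAPE}{ord(ch):06X}")
        else:
            out.append(ch)
    return "".join(out)


def hex_unescape(escaped: str) -> str:
    """Invert hex_escape: every '$' introduces a fixed-width 6-digit code point."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == ESCAPE:
            out.append(chr(int(escaped[i + 1:i + 7], 16)))
            i += 7
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = "_\u00b5"  # "_µ"
    escaped = hex_escape(original)
    print(escaped)  # _$0000B5
    assert escaped.isascii()
    assert hex_unescape(escaped) == original
```

Here `_µ` grows from 2 characters to 8, which is the kind of length and readability cost raised in comment #5.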

So making this WONTFIX not only prevents statically binding against C/C++ code, but also leaves people who have Unicode in their symbol names with no way to compile their existing codebases as DLLs.

--
January 28, 2023

--- Comment #7 from Richard Cattermole <alphaglosined@gmail.com> ---
Okay, I may end up eating my words on this one.

I can't reproduce on VS 2022.

But what I do get with dmd and ldc (rather than VC) is:

```
   Creating library test.lib and object test.exp
test.exp : error LNK2001: unresolved external symbol _µ
  Hint on symbols that are defined and could potentially match:
    _µ
test.exe : fatal error LNK1120: 1 unresolved externals
```

So something isn't right; I will need to review this at some later point and file a separate bug report if I can figure out what is going on there.
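The thread does not say why the unresolved symbol and the suggested match print identically. One generic way to check whether two names that look the same really are the same is to dump their code points. The sketch below uses MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) purely as an example of visually similar but distinct characters; it is not the diagnosed cause of the LNK2001 error, and `describe` is a hypothetical helper.

```python
import unicodedata


def describe(symbol: str) -> str:
    """List each code point with its Unicode name so look-alike names can be told apart."""
    return " ".join(f"U+{ord(c):04X}({unicodedata.name(c, '?')})" for c in symbol)


micro = "_\u00b5"  # MICRO SIGN
mu = "_\u03bc"     # GREEK SMALL LETTER MU

print(describe(micro))
print(describe(mu))

assert micro != mu  # different code points, so different symbol names
# NFKC normalization maps MICRO SIGN to GREEK SMALL LETTER MU.
assert unicodedata.normalize("NFKC", micro) == mu
```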

--
February 07

Richard Cattermole <alphaglosined@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://issues.dlang.org/show_bug.cgi?id=19418

--