reduce mangled name sizes via link-time symbol renaming

Jan 25, 2018

Timothee Cour

Jan 26, 2018

Johannes Pfau

Jan 26, 2018

Seb

Jan 27, 2018

timotheecour

Could a solution like the one proposed below be adapted to automatically reduce the size of long symbol names?

It would allow final object files to be smaller; e.g. see the problems that long names cause:

* String Switch Lowering: http://forum.dlang.org/thread/p4d777$1vij$1@digitalmars.com (caution: NSFW! contains a huge mangled symbol name!)
* http://lists.llvm.org/pipermail/lldb-dev/2018-January/013180.html ("[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols")

```
main.d:
void foo_test1(){ }
void main(){ foo_test1(); }

dmd -c libmain.a

ld -r libmain.a -o libmain2.a -alias _D4main9foo_test1FZv _foobar -unexported_symbol _D4main9foo_test1FZv
# or : via `-alias_list filename`

#NOTE: dummy.d only needed because somehow dmd needs at least one object file or source file, a static library is somehow not enough (dmd bug?)

dmd -of=main2 libmain2.a dummy.d

nm main2 | grep _foobar # ok

./main2 # ok
```

NOTE: to automate this process, a tool could find all symbol names longer than a threshold and apply a mapping from long mangled names to short aliases (e.g. object_file_name + incremented_counter). The file with all the mappings can then be supplied to a demangler (e.g. for lldb/gdb debugging).
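The automation described in the note could be sketched roughly as follows in Python. Everything named here is an assumption made for illustration: the function names, the default threshold of 64 characters, and the alias format `_<object_file_name>_<counter>`. The sketch only builds the mapping and the text of an alias file with one `long_name short_alias` pair per line; whether a particular linker accepts that layout for its `-alias_list` option should be checked against the linker's documentation.

```python
def build_alias_map(symbols, object_name, threshold=64):
    """Map every symbol longer than `threshold` to a short, unique alias.

    The alias format _<object_name>_<counter> is an assumption for this sketch.
    """
    existing = set(symbols)
    aliases = {}
    counter = 0
    for sym in sorted(existing):  # sorted so the output is reproducible
        if len(sym) <= threshold:
            continue  # short symbols keep their original name
        alias = f"_{object_name}_{counter}"
        # Skip any alias that would clash with a symbol already present.
        while alias in existing:
            counter += 1
            alias = f"_{object_name}_{counter}"
        aliases[sym] = alias
        counter += 1
    return aliases


def alias_list_text(aliases):
    """Render the mapping as one 'long_name short_alias' pair per line."""
    return "".join(f"{long} {short}\n" for long, short in aliases.items())


def demangle_alias(name, aliases):
    """Reverse lookup for a debugger or demangler: alias -> original name."""
    reverse = {short: long for long, short in aliases.items()}
    return reverse.get(name, name)
```

Because the aliases come from a counter rather than a hash, two different symbols can never receive the same alias within one mapping, and the same mapping file doubles as the lookup table a demangler would need.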

January 26, 2018

Re: reduce mangled name sizes via link-time symbol renaming

Posted by Johannes Pfau
in reply to Timothee Cour

On Thu, 25 Jan 2018 14:24:12 -0800, Timothee Cour <thelastmammoth@gmail.com> wrote:

> could a solution like proposed below be adapted to automatically reduce size of long symbol names?
> 
> It allows final object files to be smaller; eg see the problem this causes:
> 
> * String Switch Lowering:
> http://forum.dlang.org/thread/p4d777$1vij$1@digitalmars.com
> caution: NSFW! contains huge mangled symbol name!
> * http://lists.llvm.org/pipermail/lldb-dev/2018-January/013180.html
> "[lldb-dev] Huge mangled names are causing long delays when loading
> symbol table symbols"
> 
> 
> ```
> main.d:
> void foo_test1(){ }
> void main(){ foo_test1(); }
> 
> dmd -c libmain.a
> 
> ld -r libmain.a -o libmain2.a -alias _D4main9foo_test1FZv _foobar
> -unexported_symbol _D4main9foo_test1FZv
> # or : via `-alias_list filename`
> 
> #NOTE: dummy.d only needed because somehow dmd needs at least one object file or source file, a static library is somehow not enough (dmd bug?)
> 
> dmd -of=main2 libmain2.a dummy.d
> 
> nm main2 | grep _foobar # ok
> 
> ./main2 # ok
> ```
> 
> NOTE: to automate this process, a tool could find all symbol names longer than a threshold and apply a mapping from long mangled names to short aliases (e.g. object_file_name + incremented_counter). The file with all the mappings can then be supplied to a demangler (e.g. for lldb/gdb debugging).

What is the benefit of using link-time renaming (a linker-specific feature) instead of renaming the symbol directly in the compiler? We could be quite radical and hash all symbols longer than a certain threshold. As long as the hash function has strong enough collision resistance, there shouldn't be any problem.

AFAICS we only need the mapping hashed_name ==> full name for debugging. So maybe we can simply stuff the full mangled name into the DWARF debug information somehow? We can even keep DWARF debug information in external files, and support for this is just being added to GCC's libbacktrace, so even stack traces could work fine.

-- Johannes
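The hashing idea in the post above can be illustrated with a small Python sketch. It is not how any D compiler implements this: SHA-256, the prefix `_D_hashed_`, and the default threshold of 128 characters are all assumptions chosen for the example. It replaces names longer than the threshold with a fixed-length hashed name and keeps the hashed_name ==> full name mapping that a debugger would need.

```python
import hashlib

HEX_DIGEST_LEN = 64  # length of a SHA-256 hex digest


def hash_symbol(mangled, threshold=128, prefix="_D_hashed_"):
    """Return `mangled` unchanged if short enough, else a fixed-length hashed name.

    The prefix is hypothetical; it is not part of D's real mangling scheme.
    """
    if threshold < len(prefix) + HEX_DIGEST_LEN:
        # Otherwise hashing could make some names longer instead of shorter.
        raise ValueError("threshold is smaller than the hashed name length")
    if len(mangled) <= threshold:
        return mangled
    digest = hashlib.sha256(mangled.encode("utf-8")).hexdigest()
    return prefix + digest


def hash_symbols(symbols, threshold=128):
    """Return (renamed, debug_map); debug_map maps hashed name -> full name."""
    renamed = []
    debug_map = {}
    for sym in symbols:
        short = hash_symbol(sym, threshold)
        if short != sym:
            previous = debug_map.setdefault(short, sym)
            if previous != sym:
                # Two different symbols produced the same hashed name.
                raise ValueError(f"hash collision on {short}")
        renamed.append(short)
    return renamed, debug_map
```

In this sketch every hashed name is 74 characters long (10 for the prefix plus 64 hex digits), so no renamed symbol can be shorter than that, however low the threshold is set.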

On Friday, 26 January 2018 at 07:34:50 UTC, Johannes Pfau wrote:
> On Thu, 25 Jan 2018 14:24:12 -0800, Timothee Cour <thelastmammoth@gmail.com> wrote:
>
>> [...]
>
> What is the benefit of using link-time renaming (a linker-specific feature) instead of renaming the symbol directly in the compiler? We could be quite radical and hash all symbols longer than a certain threshold. As long as the hash function has strong enough collision resistance, there shouldn't be any problem.
>
> AFAICS we only need the mapping hashed_name ==> full name for debugging. So maybe we can simply stuff the full mangled name into the DWARF debug information somehow? We can even keep DWARF debug information in external files, and support for this is just being added to GCC's libbacktrace, so even stack traces could work fine.
>
> -- Johannes

I thought LDC is already doing this with -hashthres?

https://github.com/ldc-developers/ldc/pull/1445

On Friday, 26 January 2018 at 08:44:26 UTC, Seb wrote:
>> What is the benefit of using link-time renaming (a linker-specific feature) instead of renaming the symbol directly in the compiler? We could be quite radical and hash all symbols longer than a certain threshold. As long as the hash function has strong enough collision resistance, there shouldn't be any problem.
>> -- Johannes
>
> I thought LDC is already doing this with -hashthres?
>
> https://github.com/ldc-developers/ldc/pull/1445

* What I suggested doesn't require any hashing, so it can produce minimal symbol sizes with zero risk of collision; in fact, the symbol size could be optimally small if we wanted (using an incremented counter i to remap the i'th symbol).
* -hashthres is still experimental and doesn't work with Phobos, and it has a lower bound on symbol size since it uses a hash; it has other limitations, as you can see in https://github.com/ldc-developers/ldc/pull/1445#issue-149189001
* A potential extension of this proposal is to do the renaming not at link time but at compile time, where we'd maintain (in memory) the mapping long_mangle=>short_mangle and serialize it to a file in case we'd like to support separate compilation.
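The compile-time extension in the last bullet could be sketched like this in Python. The class name `SymbolTable`, the JSON file format, and the `_S<counter>` naming are hypothetical choices for illustration, not an existing compiler feature. The table keeps the long_mangle=>short_mangle mapping in memory, assigns the i'th new symbol the counter value i, and reloads the mapping from a file so that a later, separate compilation reuses the same short names.

```python
import json
import os


class SymbolTable:
    """Counter-based long_mangle -> short_mangle mapping persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.long_to_short = {}
        # Reload the mapping written by an earlier compilation, if there is one.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.long_to_short = json.load(f)

    def short_name(self, long_mangle):
        """Return the existing short name, or assign the next counter value."""
        short = self.long_to_short.get(long_mangle)
        if short is None:
            # Entries are never removed, so the current size is the next free counter.
            short = f"_S{len(self.long_to_short)}"
            self.long_to_short[long_mangle] = short
        return short

    def save(self):
        """Serialize the mapping so separately compiled modules can share it."""
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.long_to_short, f, indent=2, sort_keys=True)
```

The sketch assumes compilations run one after another. Parallel builds that share one mapping file would need locking or a merge step, which is not handled here.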
