This post is intended to shed some light on shared-library details and common pitfalls, and to compare Posix and Windows. It's LDC-centric. I'm trying not to go into too much detail, but that's hard. :)
Posix
I'm focusing on ELF here (Linux, BSDs, …), but Apple's Mach-O seems to work analogously (except that DMD doesn't support shared druntime/Phobos on macOS yet).
For symbols to be accessible from other binaries, they need to be 'dynamic symbols', which can be inspected via e.g. objdump -T <binary> (or readelf --dyn-syms <binary>). A symbol becomes a dynamic symbol if both of these requirements are met:
- object file: default (ELF) symbol visibility (STV_DEFAULT), not hidden (STV_HIDDEN)
- binary: the symbol is exported at link-time, either via --export-dynamic (all default-visibility symbols) or selectively via --export-dynamic-symbol[-list] (linker support varies)
  - --export-dynamic is the default setting when linking a shared library, but not for executables (DMD adds it implicitly to the linker command, LDC doesn't)
With LDC, the object-file symbol visibility is controlled by -fvisibility={public,hidden} (analogous to gcc/clang, controlling the default visibility for all symbol definitions), as well as by the explicit export D visibility and the @ldc.attributes.hidden UDA.
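The two requirements can be condensed into a small predicate. This is a simplified Python model, not LDC's or the linker's actual logic; in particular, the assumption that @hidden takes precedence over export is mine, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SymbolDef:
    name: str
    is_export: bool = False  # D `export` visibility
    is_hidden: bool = False  # @ldc.attributes.hidden UDA


def object_file_visibility(sym: SymbolDef, fvisibility: str) -> str:
    """ELF visibility of a symbol definition in the object file (simplified model)."""
    if sym.is_hidden:  # assumption: @hidden wins over everything else
        return "STV_HIDDEN"
    if sym.is_export:  # explicit `export` => default visibility
        return "STV_DEFAULT"
    # otherwise -fvisibility={public,hidden} decides
    return "STV_DEFAULT" if fvisibility == "public" else "STV_HIDDEN"


def is_dynamic_symbol(sym: SymbolDef, fvisibility: str, *, shared_lib: bool,
                      export_dynamic: bool = False,
                      export_list: frozenset = frozenset()) -> bool:
    """Both requirements: default visibility AND exported at link-time."""
    if object_file_visibility(sym, fvisibility) != "STV_DEFAULT":
        return False
    # --export-dynamic is implied for shared libraries, opt-in for executables
    if shared_lib or export_dynamic:
        return True
    # selective export via --export-dynamic-symbol[-list]
    return sym.name in export_list
```

E.g., a plain `foo` compiled with the default -fvisibility=public is a dynamic symbol in a shared library, but not in an executable linked without --export-dynamic.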
The compiler doesn't need to know whether an external symbol is going to be provided by some object file/static library or by a shared library at link-time (no 'import' complications at all; -dllimport is ignored on Posix). On Posix, static and shared libs are mostly interchangeable; everything 'just works'.
Unifying duplicate dynamic symbols across the whole process
One important aspect is that the dynamic loader 'unifies' dynamic symbols if multiple binaries define them (probably using the first encountered definition). So dynamic-symbol addresses are identical across these binaries, and the binaries operate on the same shared state (data symbols).
Say we have a D executable statically linked against the concurrency dub library, and a shared D library that contains its own copy of the concurrency library (linked statically into the shared library). If both binaries export their concurrency symbols as dynamic symbols, there's effectively a single shared concurrency state for the whole process. So you don't need to link both the executable and the shared library against a shared concurrency library to e.g. have a single globalStopSource instance for the whole process.
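A toy Python model of that unification, assuming the simplest lookup rule (first definition in load order wins, regardless of which binary the reference comes from); it ignores real-world details such as RTLD_LOCAL or -Bsymbolic, and the binary names are hypothetical:

```python
class Binary:
    """A loaded ELF binary and the dynamic data symbols it defines (model)."""

    def __init__(self, name, dynamic_symbols):
        self.name = name
        # every binary carries its own storage for each symbol it defines
        self.storage = {sym: {"owner": name} for sym in dynamic_symbols}


class Process:
    def __init__(self):
        self.load_order = []

    def load(self, binary):
        self.load_order.append(binary)

    def resolve(self, symbol):
        """Simplified global lookup: the first definition in load order wins,
        no matter which binary the reference comes from."""
        for binary in self.load_order:
            if symbol in binary.storage:
                return binary.storage[symbol]
        raise LookupError(symbol)
```

With the executable loaded first, references to globalStopSource from both binaries end up at the executable's copy; the shared library's own copy is simply never used.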
If there are multiple copies of the same library in the whole process (duplicate static libs), a potentially surprising pitfall is that module constructors, CRT constructors etc. are still invoked once per containing binary, i.e., multiple times (and operating on the same data). This can be even more surprising if the static libs are compiled differently, e.g., with extra version identifiers for the static concurrency lib linked into the shared library: loading the shared library may then end up invoking the executable's module constructors (if the ModuleInfo data symbol, or the module constructor function itself, is a dynamic symbol in both binaries). [We've had such a case at Symmetry, so I'm not pulling this out of thin air.]
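A simplified Python model of this pitfall, assuming both binaries contain the same module and its ModuleInfo gets unified (first definition wins); druntime's real module-constructor machinery is more involved, and the constructor names are hypothetical:

```python
def run_module_ctors(binaries):
    """binaries: (name, module_ctor) pairs in load order; returns the ctor log.

    Simplified model: both binaries statically contain the same module, and its
    ModuleInfo is a dynamic symbol in both, so the loader unifies it."""
    unified = {}  # dynamic symbol table: first definition wins
    log = []
    for name, module_ctor in binaries:
        unified.setdefault("ModuleInfo", module_ctor)
        # druntime runs the module ctors once per binary, but via the
        # ModuleInfo reference, which resolves to the unified definition
        unified["ModuleInfo"](log, loading=name)
    return log


def exe_ctor(log, loading):
    log.append(("executable's ctor", loading))


def lib_ctor_extra_version(log, loading):
    # compiled with an extra version identifier in the shared library
    log.append(("shared lib's ctor", loading))
```

The constructor runs twice (once per binary), and both times it's the executable's variant; the shared library's differently compiled constructor never runs.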
Common practices
AFAIK, one usually doesn't bother with selective exports via -fvisibility=hidden, and just compiles with the default -fvisibility=public, thus exporting ~everything. @hidden is handy for symbols that need to be DSO-local (to be resolved inside the same binary only, not 'imported' or unified/preempted), but that's an exceptional use case (LDC's druntime has a few of these).
For D in particular, the stack traces in druntime depend on dynamic symbols: function names are only resolved if the function is a dynamic symbol [while file+line infos are derived from the DWARF debuginfos]. So using -L--export-dynamic for linking executables isn't uncommon (default for DMD) to resolve function names from the executable too. The downside is that it prevents the linker from stripping unused symbols: dynamic symbols aren't stripped, and accordingly neither are any non-dynamic symbols they reference.
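Why exporting everything defeats stripping can be seen in a simplified Python model of linker garbage collection (in the spirit of --gc-sections, but treating each symbol as its own section): everything reachable from the roots survives, and every dynamic symbol is a root. The reference graph below is hypothetical.

```python
def kept_symbols(references, dynamic_symbols, entry="main"):
    """Symbols surviving linker garbage collection (simplified model).

    references: symbol -> set of symbols it references.
    Roots are the entry point plus all dynamic symbols."""
    kept = set()
    worklist = [entry, *dynamic_symbols]
    while worklist:
        sym = worklist.pop()
        if sym in kept:
            continue
        kept.add(sym)
        worklist.extend(references.get(sym, ()))
    return kept
```

Without dynamic symbols, an unused `unusedApi` and its `helper` are dropped; with --export-dynamic (all default-visibility symbols become roots), both are kept.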
Another D-specific aspect is that if a process consists of multiple D binaries, they must share a single shared druntime [compiled with -version=Shared, which enables some important differences between the static and shared druntime variants]. So if e.g. a D executable comes with plugin support (loading shared D libraries at runtime), the executable needs to be linked with -link-defaultlib-shared explicitly (-link-defaultlib-shared is the default when linking a shared library via -shared), to link against the shared druntime and Phobos libraries [separate for LDC, not a single merged libphobos2.so as for DMD].
Windows
On Windows, we are back in the stone age. Some limitations/differences:
- Binaries cannot export more than 64K symbols.
- When linking a DLL implicitly (i.e., not loading it manually at runtime and looking up the symbol address via GetProcAddress()), you don't link against the DLL directly, as you would against a .so/.dylib on Posix, but have to use a separate 'import library' generated by the linker (mylib.dll with import library mylib.lib).
- You can't link a DLL and have some symbols resolved at load-time (to be provided by the loading process). All symbols need to be resolved at link-time.
- The loader doesn't take care of resolving references to symbols exported from other binaries; the compiler needs to emit code that goes through an import indirection at runtime. Accordingly, there's no automatic 'unifying' of duplicate exported symbols.
With that ridiculous 64K-symbols limit, it's clear that we cannot default to -fvisibility=public on Windows; otherwise you wouldn't be able to link any binary with more than 64K symbol definitions. [At Symmetry, we have a fat shared library which on Linux has more than 600K dynamic symbols; on Windows, we explicitly export only a handful of symbols.] So one needs to either resort to selective exports (e.g., for plugins with only a small number of exported functions), or use a higher number of smaller shared libraries explicitly compiled with -fvisibility=public (such as the druntime and Phobos DLLs).
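The arithmetic behind that can be sketched in Python. The exact limit value is an assumption here (taken as 65,535, i.e., the 16-bit ordinal space); the symbol names are hypothetical and the 600K figure mirrors the Symmetry example.

```python
MAX_EXPORTS = 0xFFFF  # assumption: the ~64K limit, taken as 65,535 here


def export_count(symbols, fvisibility):
    """symbols: (name, is_export) pairs; counts the definitions that get /EXPORT."""
    return sum(1 for _, is_export in symbols if is_export or fvisibility == "public")


def check_export_limit(symbols, fvisibility):
    count = export_count(symbols, fvisibility)
    if count > MAX_EXPORTS:
        raise ValueError(f"{count} exports exceed the limit of {MAX_EXPORTS}")
    return count
```

A binary with 600K symbol definitions, 5 of them explicitly exported, is fine with -fvisibility=hidden but far over the limit with -fvisibility=public.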
Exports
There's no concept of object-file visibilities in COFF. Instead, the compiler embeds linker directives (/EXPORT:foo) in the object file if a symbol defined in that object file is to be exported. AFAIK, you can't override or tweak this later at link-time (as is possible on Posix via --export-dynamic…), so this is all controlled at compile-time already. If there are exported symbols/linker directives, the linker automatically generates an import library for the linked executable/DLL.
Imports
While on Posix there's no explicit importing, things are totally different on Windows: if you want to directly access a symbol defined in another binary, you need to use the import-symbol indirection (symbol foo needs to be resolved as *__imp_foo at runtime, as __imp_foo is set by the system at startup).
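A Python model of that indirection, assuming one __imp_ slot per imported symbol that stays empty until the system binds it at startup; the DLL contents and names are hypothetical, and a one-element list stands in for "the storage at some address".

```python
PREFIX = "__imp_"


class ImportTable:
    """The __imp_* slots of one Windows binary (model); filled by the loader."""

    def __init__(self, imported_names):
        self.slots = {PREFIX + name: None for name in imported_names}

    def bind(self, exporting_binary):
        # done by the system at startup: store the address of each import
        for slot in self.slots:
            self.slots[slot] = exporting_binary[slot[len(PREFIX):]]


def load_data(imports, name):
    """What the compiler emits for reading a dllimported `name`: *__imp_name."""
    address = imports.slots[PREFIX + name]
    if address is None:
        raise RuntimeError(f"{PREFIX}{name} hasn't been bound yet")
    return address[0]  # dereference
```

Every read goes through the slot, so the importing binary always sees the exporting DLL's current value.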
The export visibility on Windows serves two purposes:
- For the object file defining an exported symbol, it causes the symbol to be dllexported from every binary that object file is linked into.
- In other object files referencing that symbol, the symbol is dllimported, unless the object file has been compiled together (in the same compiler invocation) with the object file that exports it. The assumption here is that all object files produced in a single compiler invocation are linked together, not ending up in different binaries. E.g., if you compile a static library in a single compiler invocation and export a symbol explicitly, then all produced object files that don't define the symbol reference it directly without dllimport (so it's resolved inside the same binary at link-time). So you don't have to use a .di header to replace an export definition with a declaration: if the module defining the symbol isn't part of the current compilation (not a root module, only D-imported), it's dllimported automatically.
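These rules boil down to a three-way decision per object file. The following is a simplified Python model of it (not LDC's code); module names are hypothetical.

```python
def codegen_for(symbol_module, current_module, root_modules):
    """How an object file handles an `export`ed symbol (simplified model).

    symbol_module: module defining the symbol; current_module: module being
    emitted; root_modules: modules compiled in this compiler invocation."""
    if symbol_module == current_module:
        return "define + dllexport"
    if symbol_module in root_modules:
        return "direct reference"  # same invocation => assumed same binary
    return "dllimport via __imp_"  # only D-imported => assumed other binary
```

Compiling mylib.a and mylib.b together makes mylib.b reference mylib.a's exports directly, while anything from a separately compiled otherlib is dllimported.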
Functions
For functions, the import libraries fortunately contain trampolines (with the original function names). When calling some function foo exported by another binary, you can link that binary's associated import library, which provides a foo trampoline which (presumably) loads __imp_foo and jumps to that address. So calling/accessing some function in another binary doesn't require any extra handling from the compiler.
Note that, unlike on Posix, function addresses will diverge across binaries (as &foo might be a trampoline specific to the current binary). [For LDC, I've had to adapt a single druntime unittest where the function identity/address mattered.] And well, you're going through a trampoline instead of calling the function directly, so this might come with a tiny performance penalty.
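A Python model of why calls still work but identities diverge, assuming each importing binary gets its own trampoline that forwards through its own __imp_foo slot; names are hypothetical and closures stand in for machine-code stubs.

```python
def foo():
    return "foo's result"


def make_trampoline(imp_slot):
    """Stub from an import library: load __imp_foo, then jump to it (model)."""
    def trampoline(*args, **kwargs):
        return imp_slot[0](*args, **kwargs)
    return trampoline


# two binaries importing foo, each linked against the import library and thus
# getting its own trampoline; the loader fills each __imp_foo slot with foo
exe_foo = make_trampoline([foo])
plugin_foo = make_trampoline([foo])
```

All three callables return the same result, but `exe_foo`, `plugin_foo` and `foo` are three distinct objects, mirroring the diverging &foo addresses.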
Data
Data symbols, on the other hand, are a problem: trampolines aren't an option, because the indirection needs to be loaded at runtime (so we need to run code for that; we can't just access some __imp_foo directly). In essence, the compiler needs to know in advance if a data symbol will be imported from some other binary, and then replace foo by *__imp_foo. That's pretty simple in function bodies.
[References to such dllimported data symbols in static data initializers, on the other hand, are a pain. E.g., if an object file defines a TypeInfo for some struct defined in another DLL, its TypeInfo.initializer.ptr needs to be set to the dllimported init symbol. LDC keeps track of such references per object file and emits a CRT constructor which performs the required 'relocations' manually at runtime.]
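The idea behind those manual 'relocations' can be modeled in Python: static data is emitted with a placeholder, a per-object-file fixup list is recorded, and a startup function patches the data once the __imp_ slots are bound. This is an illustration under those assumptions, not what LDC literally emits; StructTypeInfo and S_init are hypothetical names.

```python
class StructTypeInfo:
    """Static data in this object file; `initializer` should point to the
    init symbol of a struct living in another DLL."""

    def __init__(self):
        self.initializer = None  # can't be filled statically: needs *__imp_...


typeinfo_S = StructTypeInfo()

# recorded per object file while emitting static data (hypothetical names)
pending_fixups = [(typeinfo_S, "initializer", "S_init")]


def crt_constructor(imports):
    """Runs at startup, after the loader bound the __imp_* slots."""
    for target, field, imported in pending_fixups:
        setattr(target, field, imports[f"__imp_{imported}"])
```

Until the constructor has run, the static TypeInfo still holds the placeholder.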
Note that there's no support for exporting/importing TLS symbols at all (not in C++ either). Again, something that just works on Posix. [IIRC, I've only had to adapt a single TLS variable in druntime so far, though, using a function returning a ref instead.]
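The workaround pattern, sketched in Python: the thread-local storage itself stays private to its binary, and only an accessor function is exported. Here threading.local stands in for a D TLS variable and a returned one-element list stands in for D's ref return; tls_counter is a hypothetical name.

```python
import threading

_tls = threading.local()  # stands in for a TLS variable that can't be exported


def tls_counter():
    """Exported accessor (stand-in for a D function returning `ref`)."""
    if not hasattr(_tls, "cell"):
        _tls.cell = [0]  # lazily created, one per thread
    return _tls.cell
```

Callers mutate the per-thread value through the accessor, e.g. `tls_counter()[0] += 1`; other threads keep seeing their own counters.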
Compared to C++, the situation is trickier for D, as we have a bunch of implicit data symbols, like ModuleInfos, init symbols and way more commonly used (and complicated!) TypeInfos.
Keeping things reasonably simple with -dllimport={none,all,defaultLibsOnly}
The main problem on Windows is that the compiler needs to know in advance if a data symbol will be imported from some other binary. While you could provide the compiler with a fine-grained list of modules/packages that are to be treated as external (ending up in another binary), I've decided to go with a simpler scheme for LDC, focusing on 2 use cases:
- Building every library as its own shared library. For a dub project, this would mean building every direct and indirect dependency as its own separate shared library (not really feasible with dub today), similar to a Linux distro package manager with a central set of shared libraries.
  - This is what LDC defaults to with -shared, for symmetry with Posix.
  - Similar to how it just works on Posix: export everything (-fvisibility=public), and import all (extern(D)) data symbols that aren't defined in a compiled root module (-dllimport=all). No need for a carefully hand-crafted export library interface. This works best when compiling each library in a single compiler invocation (all modules contained in the shared library), but that isn't a requirement [otherwise, data symbols exported from separately compiled object files may get dllimported, with a linker warning 'importing locally defined symbol', probably at a slight performance penalty].
  - Also similar to Posix, there's a single state per library, because each library is present only once in the whole process (no duplicate static libraries with their own separate states).
  - With many smaller DLLs, the 64K-symbols limit should be manageable.
- A process consisting of a few larger shared libraries, each with only a few selective/explicit exports (-fvisibility=hidden), but automatically importing all data symbols from druntime and Phobos (-dllimport=defaultLibsOnly, basically treating a module as binary-external if its name starts with std., core. or ldc.).
  - When linking a static library into such a binary, it must have been compiled with matching visibility options (-fvisibility=hidden -dllimport=defaultLibsOnly). This is somewhat similar to how you have to compile C(++) code ending up in a shared Posix library with -fPIC.
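The per-data-symbol decision of the three -dllimport modes can be modeled in a few lines of Python. This is a simplification (e.g., it ignores explicitly exported symbols and treats the std./core./ldc. prefix check literally); module names are hypothetical.

```python
DEFAULT_LIB_PREFIXES = ("std.", "core.", "ldc.")
MODES = ("none", "all", "defaultLibsOnly")


def is_dllimported(module, root_modules, mode):
    """Whether an extern(D) data symbol defined in `module` is accessed via
    *__imp_... (simplified model; ignores explicitly `export`ed symbols)."""
    if mode not in MODES:
        raise ValueError(f"unknown -dllimport mode: {mode}")
    if module in root_modules:
        return False  # defined in this compilation => same binary
    if mode == "all":
        return True
    if mode == "defaultLibsOnly":
        return module.startswith(DEFAULT_LIB_PREFIXES)
    return False  # mode == "none"
```

With defaultLibsOnly, data from std.stdio is dllimported while a statically linked concurrency module is accessed directly; with all, both would be dllimported.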
This makes it possible to use shared libraries on Windows quite painlessly, all controlled by the -fvisibility and -dllimport compiler options, and optionally the D export visibility + @hidden UDA.
What isn't supported is, for example, a dub project where some deps are built as shared libraries (without selective/explicit exports) and others as static libraries. Say, only using the concurrency dub dependency as a shared library exporting everything (to have a single process-global state for that library on Windows too), and linking everything else statically. That would require more fine-grained control over binary-external modules, with a corresponding combinatorial explosion (something like -dllimport=std.*,core.*,ldc.*,concurrency.*).
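To make that hypothetical syntax concrete, here's a Python sketch of what such a pattern list might mean. Nothing like this exists in LDC; presumably every binary in the process would need its own consistent list, which is where the combinatorial explosion would come from.

```python
from fnmatch import fnmatchcase


def is_binary_external(module, patterns):
    """Hypothetical fine-grained variant (NOT an existing LDC option): treat a
    module as binary-external if it matches any of the given patterns."""
    return any(fnmatchcase(module, pattern) for pattern in patterns)


# e.g. the hypothetical -dllimport=std.*,core.*,ldc.*,concurrency.*
patterns = "std.*,core.*,ldc.*,concurrency.*".split(",")
```

Note that the trailing dot in a pattern like std.* keeps unrelated packages such as stdx.allocator from matching.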
Templates
Similar to gcc/clang's -fvisibility-inlines-hidden, you can use LDC's -linkonce-templates to NOT export any instantiated symbols, so that each binary comes with its own instantiated state and functions.
On Windows, without -linkonce-templates, there's again the problem of importing instantiated data symbols. Such a symbol can be instantiated and defined (possibly exported) in multiple binaries, plus there's the template-codegen-culling mechanism in the frontend. For somewhat predictable behavior, I've chosen to apply a sort of 'lightweight' -linkonce-templates to instantiated data symbols if the template declaration is in a binary-external module. This means that there's one such instantiated data symbol for each Windows binary that references it. A simplified example: if Phobos declares a template with some counter global, and multiple binaries compiled with -dllimport=defaultLibsOnly instantiate it identically, they'll all have their own counter globals. Again, on Posix, the loader unifies the instantiated data symbol, and everything just works. [More info: https://github.com/ldc-developers/ldc/issues/3931]
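The counter example, modeled in Python under the assumptions above: every binary instantiating the (hypothetical) Phobos template defines its own copy of the global, and only the Posix loader collapses them into one. Binary names are hypothetical.

```python
def counter_instances(binaries, loader_unifies):
    """Storage each binary uses for `counter`, a global inside a hypothetical
    Phobos template that all binaries instantiate identically (model)."""
    copies = {b: [0] for b in binaries}  # one instantiated copy per binary
    if loader_unifies:  # Posix: the first definition wins process-wide
        first = copies[binaries[0]]
        copies = {b: first for b in binaries}
    return copies


def increment_from_each_binary(copies):
    for cell in copies.values():
        cell[0] += 1
```

After one increment from each binary, the unified Posix counter reads 2, while each Windows binary's own counter reads 1.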
Example: SIL
For a project at Symmetry, we currently have the following architecture, working on both Linux (DMD and LDC) and Windows (LDC only):
- a bunch of thin frontends (executables and shared libraries),
- the core as a single fat shared library with a handful of explicitly exported functions (and something akin to a .di header as shared-lib interface), which all frontends are implicitly linked against, and
- a bunch of plugins (shared libraries) which can be loaded dynamically at runtime, each with a dozen (or so) explicitly exported functions (resolved via GetProcAddress/dlsym).
On Windows, everything (except for the prebuilt druntime and Phobos DLLs) is compiled with -fvisibility=hidden -dllimport=defaultLibsOnly. All binaries share some base dub dependencies, which are all linked statically.
This is an evolution of a prior approach, where we had a smaller core with about 25 plugins, and linked that core statically into every frontend. The static-library duplication (base dub dependencies) was much worse back then, causing a much larger overall bundle size. So we extracted the core as a separate shared library and now link most former plugins statically into that core.
Handling non-unified separate states on Windows can be a pain: https://github.com/symmetryinvestments/concurrency/pull/88
The full bundle consists of about 200 dub libraries/executables, so the alternative of building every dub dependency as its own shared library with (on Windows) -fvisibility=public -dllimport=all doesn't seem too attractive and hasn't been tested yet; it would surely be a huge challenge. :)
My 2 cents
As is hopefully clear by now, it's archaic Windows that complicates matters enormously wrt. shared libraries. My strong opinion on this is that the D language itself shouldn't cater to its limitations. We try our best (with reasonable effort) to make things work on Windows too (Rainer Schütze has been working on porting the LDC scheme to DMD, and some of it has landed already), but the OS is just too primitive to handle all cases without too much Windows-only effort (like adding our own D-specific extra indirection for all symbols to implement a unified state, or wrapping TLS variables with functions: all stuff the compiler could do, but just for a crappy operating system?).