May 07, 2023
https://issues.dlang.org/show_bug.cgi?id=23903

          Issue ID: 23903
           Summary: Demangling produces exponentially long output
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Keywords: mangling
          Severity: major
          Priority: P1
         Component: druntime
          Assignee: nobody@puremagic.com
          Reporter: dlang-bugzilla@thecybershadow.net

Consider the following D program:

///////////////// test.d ////////////////
auto fun(T)(T t = T.init)
{
    struct S { T t; }
    return S(t);
}

import core.demangle;

alias f1 = fun!int;
pragma(msg, f1.mangleof.demangle.length);
alias f2 = fun!(typeof(f1()));
pragma(msg, f2.mangleof.demangle.length);
alias f3 = fun!(typeof(f2()));
pragma(msg, f3.mangleof.demangle.length);
alias f4 = fun!(typeof(f3()));
pragma(msg, f4.mangleof.demangle.length);
alias f5 = fun!(typeof(f4()));
pragma(msg, f5.mangleof.demangle.length);
/////////////////////////////////////////

Note that although the amount of code grows linearly, the length of the demangled string grows exponentially.

Some problems which results from this:

1. When the program crashes, it is very slow to print its stack trace (easily in the order of minutes). This makes debugging difficult.

2. Debuggers (at least, GDB) will run out of memory and crash when attempting to debug a program using such constructs. This also makes debugging difficult.

Chaining IFTI functions and voldemort types, which I think is idiomatic D code, is one way for this problem to be manifested. One practical use case where I have constructed such a program by accident is a functor-based text formatting package (which encapsulates values to be formatted into self-contained objects and allows building simple expression trees of them without memory allocations).

However, it is not difficult to run into it in other ways. The underlying problem is that the relationship between language constructs is a DAG; however, the demangled string version is essentially a walk of the DAG through every possible path, which is exponential in complexity. Here is a more explicit example which illustrates this:

//////////////// test.d ////////////////
struct T(alias A, alias B) {}

struct T0 {}

alias T1 = T!(T0, T0);
alias T2 = T!(T1, T1);
alias T3 = T!(T2, T2);
alias T4 = T!(T3, T3);
alias T5 = T!(T4, T4);
alias T6 = T!(T5, T5);
alias T7 = T!(T6, T6);
alias T8 = T!(T7, T7);
alias T9 = T!(T8, T8);

pragma(msg, T9.mangleof.length); // 174
pragma(msg, T9.stringof.length); // 4090
////////////////////////////////////////

The mangled version of the symbols does not have this problem, and grows linearly; it seems to be using backreferences to avoid this problem there.

Ignoring the DAG nature of the relationships of the mangled objects is a mis-design of the demangling algorithm, and is fundamentally incorrect! Not only does it cause issues such as the above, it makes debugging non-trivial template code difficult due to the verbosity of the output, and it allows the equivalent of ZIP-bombs (https://en.wikipedia.org/wiki/Zip_bomb).

To fix this problem, the demangling algorithm (in Druntime and in debuggers) should not expand long back-references, emitting the DAG as it is. For example, instead of:

pure nothrow @nogc @safe
test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S).S
test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S)

it could write e.g.:

pure nothrow @nogc @safe test.fun!(T1).fun(__1).S test.fun!(__1).fun(T1);
__1=test.fun!(int).fun(int).S

--