July 15, 2017
On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote:
> - What type information are being kept because of UFCS chains?
> Doesn't that mechanism simply apply overload resolution then
> choose between the prefix and .method forms as appropriate,
> rewriting the terms?
>      Then it's a problem of function invocation. I don't get
> what's happening here still. Does this tie to the Voldemort types
> problem? (=> are many of the functions in the chain returning
> custom types?) It doesn't make sense, especially if, from your
> indirection workaround, it looks like it would work around the
> same but without the bloat problem if we unlinked the chain into
> many intermediate temporary parts. So how is this a type
> information issue?

UFCS is irrelevant to all of this. That just so happens to be how he's calling the functions. You'd get the same issue if you called all of the functions in a chain with the traditional function call syntax rather than treating them as a bunch of member functions. The issue has to do with how each invocation of a range-based function tends to result in a new template instantiation, and it's common practice in D to chain a bunch of templated function calls together. For instance, if you have

    int[] a;
    auto b = a.map!(a => a / 2)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda1, int[])

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)();
    pragma(msg, typeof(b));

then it prints out

    MapResult!(__lambda2, MapResult!(__lambda1, int[]))

If you have

    int[] a;
    auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
    pragma(msg, typeof(b));

then it prints out

    FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1,
int[])))

The type is getting progressively longer and more complicated, because when the function template is instantiated, it's instantiated with the type from the previous function call, so it's wrapping the previous type, and you get a new type that has the name of the type of its argument embedded in it. It's like if you keep wrapping a container inside another container.

Array!int a;
Array!(Array!int) b;
Array!(Array!(Array!int)) c;

The first level or two may not be so bad, but pretty quickly, it gets quite ridiculous. The fact that we use auto and/or the name of the template parameter in range-based code completely hides the fact that the type we're operating on has a very long and hideous name. So, for the most part, you don't care, but you do get ridiculously long symbol names underneath the hood, and the compiler and linker have to deal with them, and it becomes a performance problem. And as bad as this example is, map doesn't actually use Voldemort types. MapResult is declared outside of map as a private struct. The extra information that gets encoded in the symbol name with a Voldemort type makes them _much_ worse, though interestingly enough, the compiler doesn't print all of that out. e.g.

    int[][] a;
    auto b = a.joiner([5]);
    pragma(msg, typeof(b));

just prints

Result

which doesn't look bad even though things are apparently much uglier underneath the hood.

But the fact that working with ranges results in a whole bunch of chained invocations of function templates tends to result in a much larger symbol sizes than you'd get with code that wasn't so generic. And the fact that Voldemort types have been pushed as a great way to encapsulate the range types that result from function calls and really shouldn't be used by name makes the situation that much worse. So, idiomatic D code currently results in very large symbol names, and while in many cases, you don't notice, some people (like H.S. Teoh) _are_ noticing, and it's becoming a problem. As I understand it, Weka has quite a few problems with this. Fortunately, a fix to condense symbol names should be along shortly, which will help the situation considerably, but in general, we really need to look at how we can make some of these common D idioms do better in terms of symbol size.

- Jonathan M Davis

July 15, 2017
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:
> On Saturday, July 15, 2017 11:10:32 Enamex via Digitalmars-d wrote:
>> [...]
> The issue has to do with how each invocation of a range-based function tends to result in a new template instantiation, and it's common practice in D to chain a bunch of templated function calls together. For instance, if you have
>
> [...]
>
> The type is getting progressively longer and more complicated, because when the function template is instantiated, it's instantiated with the type from the previous function call, so it's wrapping the previous type, and you get a new type that has the name of the type of its argument embedded in it. It's like if you keep wrapping a container inside another container.
>
> [...]
>
> - Jonathan M Davis

This was quite insightful, thank you.

All that time I'd assumed that 'symbols' as linkers used them were constant length :T

Just to be clear: 100% of that bloat resides in the symbol table? So the current proposed remedy is to hash symbols above a length threshold?

Besides the symbol problem though, does the template instantiation explosion problem imply as many duplicated function bodies corresponding to the new type symbols?
July 17, 2017
On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:
>
>     int[] a;
>     auto b = a.map!(a => a / 2)();
>     pragma(msg, typeof(b));
>
> then it prints out
>
>     MapResult!(__lambda1, int[])
>
> If you have
>
>     int[] a;
>     auto b = a.map!(a => a / 2)().map!(a => a)();
>     pragma(msg, typeof(b));
>
> then it prints out
>
>     MapResult!(__lambda2, MapResult!(__lambda1, int[]))
>
> If you have
>
>     int[] a;
>     auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
>     pragma(msg, typeof(b));
>
> then it prints out
>
>     FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1,
> int[])))
>

Is there any way - theoretically - to compute typeof(b) lazily, so that the information is only provided as needed?
July 17, 2017
On Monday, 17 July 2017 at 12:51:37 UTC, jmh530 wrote:
> On Saturday, 15 July 2017 at 15:58:12 UTC, Jonathan M Davis wrote:
>>
>>     int[] a;
>>     auto b = a.map!(a => a / 2)();
>>     pragma(msg, typeof(b));
>>
>> then it prints out
>>
>>     MapResult!(__lambda1, int[])
>>
>> If you have
>>
>>     int[] a;
>>     auto b = a.map!(a => a / 2)().map!(a => a)();
>>     pragma(msg, typeof(b));
>>
>> then it prints out
>>
>>     MapResult!(__lambda2, MapResult!(__lambda1, int[]))
>>
>> If you have
>>
>>     int[] a;
>>     auto b = a.map!(a => a / 2)().map!(a => a)().filter!(a => a < 7)();
>>     pragma(msg, typeof(b));
>>
>> then it prints out
>>
>>     FilterResult!(__lambda3, MapResult!(__lambda2, MapResult!(__lambda1,
>> int[])))
>>
>
> Is there any way - theoretically - to compute typeof(b) lazily, so that the information is only provided as needed?

typeof(b) is needed as soon as b is declared.
So no.
There is no point in making it lazy.
July 17, 2017
On 7/14/17 7:30 PM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
> [...]
>> Here's a further update to the saga of combating ridiculously large
>> symbol sizes.
> [...]
>> 	     .wrapWithInterface	// <--- type erasure happens here
> [...]
> 
> Some further thoughts about type erasure and UFCS chains.
> 
> Nobody says type erasure *requires* OO -- that's just an artifact of the
> way things are currently implemented.  Consider, for example, your
> generic UFCS chain:
> 
> 	auto r = source
> 		.ufcsFunc1
> 		.ufcsFunc2
> 		.ufcsFunc3
> 		...
> 		.ufcsFuncn;
> 
>>From the POV of outside code, *nobody cares* what the specific types of
> each stage of the UFCS pipeline are.  Only the code that implements
> ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the
> specific type of the range that's returned by ufcsFunc3, in the middle
> of the pipeline.  What are the chances this exact type is going to be
> reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias
> parameter that's instantiated with a lambda function, you can basically
> guarantee that this type X will *never* ever be repeated again, outside
> of this specific UFCS chain.

I think this is slightly flawed thinking.

Types can certainly be repeated. Even long chains of UFCS functions may be repeated elsewhere. Therefore, you need to have these characteristics:

1. The compiler can generate the same symbol given the same compile-time parameters. Separate compilation dictates this is a necessity.
2. This CANNOT depend on position inside the module (i.e. no line numbers or global counters). It would be too surprising for it to be a linker error to reorder functions in a module.
3. The mangled symbol should be reversable back to the original symbol name.
4. The symbol should make sense to the viewer.

Note that 3 and 4 are more "nice to have" than "essential", as you can certainly compile, link, and run without ever having to print a symbol name.

Thinking about this, I've had a couple interesting thoughts. First, when this really matters (i.e. you get really long symbol names) is symbols defined inside template functions. I don't need to rehash this, as you all know this.

But the definitions also depend on the runtime parameters, which ironically are usually a repeat of the template parameters. Hence the huge bloat.

But what if we forgo that requirement of using the runtime parameters? That is:

// prototypical UFCS wrapper
auto foo(Args...)(Args args)
{
  static struct Result { ... }
  return Result(args);
}

Instead of Result being typed as (e.g.):
  foo!(int, string, bool)(int, string, bool).Result

it really is
  foo!(int, string, bool).Result

What happens? In essence, this is the "horcrux" solution that I came up with. However, there is one case where it doesn't work. And that is, when you have foo overloaded only on runtime parameters:

auto foo(T)(T t) { struct Result { ... } ... }
auto foo(T)(T t, string s) { struct Result { ... } ... }

If we define Result the way I suggest, then both have the same name (foo!(int).Result) even though they are different structs.

If you use horcrux on this right now, it actually is a bug, as only the first Result definition is considered (https://issues.dlang.org/show_bug.cgi?id=17653)

But I would propose, that we just make this an error. That is, you just have to rename the second one Result2, or something else.

1. This would break some code. Not much, but some rare examples. Most UFCS code has one definition of a function, or multiple definitions, but with different template parameters.

2. It would solve the exponential problem, as now only the template definition is considered, so the growth of the symbol is linear.

3. The savings from Rainer's patch still can be additive.

Thoughts?

-Steve
July 17, 2017
On Sat, Jul 15, 2017 at 07:11:54PM +0000, Enamex via Digitalmars-d wrote: [...]
> All that time I'd assumed that 'symbols' as linkers used them were constant length :T

They may have been once upon a time, many decades ago. :-D  But modern linkers support symbols of arbitrary length (well, up to a limit :-D) because of the demands of modern languages that generate long symbols as part of implementing function overloading, etc..


> Just to be clear: 100% of that bloat resides in the symbol table?

Pretty much, yes.  Unless your program happens to want to print a string constructed from the type name (very unlikely unless it's purely for debugging purposes).


> So the current proposed remedy is to hash symbols above a length threshold?

Rainers' PR (which is now available as a dmd/druntime branch, btw) changes the mangling scheme so that when a symbol contains substrings that are repeated, the repeated instances are replaced with an encoded back-reference. When substrings are long and/or repeated frequently, this leads to a significant reduction in symbol size.

In fact, I just tested the `mangle` branch this morning on my code, and it shows major improvements: my executable sizes are now down to 4MB, about half the size of what I could achieve by type erasure via OO polymorphism.  The longest symbol lengths are now about 1000 characters or so, as opposed to 20KB (or, before I applied the horcrux trick, 800KB).  Compared to the original 600MB executable sizes before I started tackling this issue, this represents a 99% improvement(!).

Now Rainers' changes are just waiting for various dmd testsuite issues to be ironed out, and it should be available in git master soon. Hooray!


> Besides the symbol problem though, does the template instantiation explosion problem imply as many duplicated function bodies corresponding to the new type symbols?

That's a different, though somewhat related, issue.  I have some ideas on that front, but for now, Rainers' PR addresses only the problem of symbol length.  So the next task after his fix is merged is to tackle the larger problem of how to improve the implementation of templates.

One of the major issues with heavily-templated code is that you could potentially end up with many duplicated function bodies that may in fact be binary-identical, but are stored separately in the executable because from the language's POV, they are distinct functions.  (Well, there's also the problem of separate compilation, where you could end up with the *same* function instantiated multiple times in different object files, but for the most part, LTO (link-time optimization) takes care of that: the compiler writes out the duplicated instantiations as "weak symbols", so the linker discards all but the first copy.)

Often, this results from templates that contain helper code that's independent of the template parameters, or dependent only on a subset of template parameters. For example:

	struct S(T, size_t size) {
		T method1() {
			// N.B.: independent of `size`
			return T.init;
		}
		int[size] method2() {
			// N.B.: independent of `T`
			return int[size].init;
		}
		T[size] method3() {
			// N.B.: depends on both `T` and `size`
			return T[size].init;
		}
	}

	S!(int, 5) si5;
	S!(int, 6) si6;
	S!(string, 2) ss2;
	S!(string, 4) ss4;

Here, si5.method1 and si6.method1 are identical, because method1 is independent of `size`.  However, because they are part of the template S, they have distinct names: `S!(int, 5).method1` and `S!(int, 6).method1`, so they will be duplicated in the generated code.  Ideally, the compiler should detect that they are actually identical, and merge them accordingly.

A similar situation occurs with method2. For method3, since it depends on both template parameters, two versions of the function must be generated, obviously.

Of course, this example is contrived, and rather trivial.  A less trivial example shows that there's more to this problem than just detecting which template parameters a method depends on:

	struct List(T) {
		struct Node {
			Node* next;
			T t;
		}
		Node* removeNext(Node* prev) {
			Node* tmp = prev.next;
			prev.next = tmp.next;
			tmp.next = null;
			return tmp;
		}
		...
	}

Here, removeNext takes Node* as parameter, and since the definition of Node depends on T, it would appear that removeNext also depends on T. However, closer inspection shows that actually it never touches the portion of Node that is dependent on T.  Therefore, removeNext will actually be identical across *all* instantiations of List.  This case is not as easy for the compiler to detect, because the definition of Node depends on T, and it could be the case that Node is defined in a way that *will* require removeNext to also be dependent on T. For example, if we defined Node as:

	struct Node {
		T t;
		Node* next;
	}

then the offset of Node.next will change depending on the size of T, so it would not be possible to merge removeNext across different instantiations of List.

The usual workaround is to use OO to manually perform the merging of removeNext yourself:

	class NodeBase {
		NodeBase next;
	}

	// N.B.: module-global non-template function.
	NodeBase removeNext(NodeBase prev) {
		... // implementation here
	}

	struct List(T) {
		// N.B.: use inheritance to separate the T-dependent
		// part of Node from the T-independent part.
		class Node : NodeBase {
			T t;
		}

		// Ugly: need a wrapper function to forward calls to
		// List.removeNext to the module-global version
		Node removeNext(Node prev) {
			// Ugly: need to downcast to get Node back
			// rather than NodeBase
			return cast(Node) .removeNext(prev);
		}
	}

But this adds a lot of boilerplate and relies on the user to manually do what, arguably, the compiler ought to be able to automate. Not to mention the ugly points noted in the comments above. Of course, some of the ugliness can be alleviated by also refactoring List(T) to have a non-template base class that implements a removeNext method, or using a `this` template parameter, etc., but the point is that this is tedious boilerplate that would be nice for the compiler to automatically do on our behalf.


T

-- 
If lightning were to ever strike an orchestra, it'd always hit the conductor first.
July 18, 2017
On Monday, 17 July 2017 at 18:42:36 UTC, H. S. Teoh wrote:
>> Besides the symbol problem though, does the template instantiation explosion problem imply as many duplicated function bodies corresponding to the new type symbols?
>
> That's a different, though somewhat related, issue.  I have some ideas on that front, but for now, Rainers' PR addresses only the problem of symbol length.  So the next task after his fix is merged is to tackle the larger problem of how to improve the implementation of templates.
>
> [...]
>
> Often, this results from templates that contain helper code that's independent of the template parameters, or dependent only on a subset of template parameters. For example:

Yeah, I think this is the second best optimization for symbol names after using references for duplicates symbol subnames.

One thing you didn't mention is that this also creates a lot of useless duplicate for Voldemort types in template functions that don't depend on (all of) the functions parameters. For instance, looking at the code from the OP's issue:

    auto s(T)(T t)
    {
        struct Result
        {
            void foo(){ throw new Exception("2");}
        }
        return Result();
    }

    void main(string[] args)
    {
        auto x = 1.s.s.s.s.s;
        x.foo;
    }

foo mangles to a super long symbol, even though the actual symbol should be something like

    moduleName.s!(__any_type)(__any_type).Result.foo

Actually, it's a little more complicated than that, because s might have overloads the compiler is not aware of, which means it doesn't trivially know that T doesn't matter to result.

Still, pretty sure there are optimizations to be had there.

One possibility would be to allow the coder to give unique subnames to functions. For instance

    int add.withInts(int n1, int n2)
    float add.withFloats(float n1, float n2)

This would allow the compiler to assume that a function's mangling is unique, without having to worry about overloads and other templates, and therefore skip the unnecessary parameters the function's Voldemort type.
1 2
Next ›   Last »