Thread overview: [core.reflect] showcase fqn
Oct 07, 2021  Stefan Koch
Oct 07, 2021  bauss
Oct 07, 2021  Stefan Koch
Oct 07, 2021  bauss
Oct 07, 2021  Stefan Koch
Oct 07, 2021  russhy
Oct 07, 2021  Stefan Koch
October 07, 2021

TL;DR: a non-optimized fqn using core.reflect is roughly 4 times faster than the Phobos version.

I have had issues with the fullyQualifiedName template in Phobos (std.traits) for a while.
So I have implemented a version of it for core.reflect.utils, which uses core.reflect's transitive parent reflection.

For those who are not interested in the code, the little performance comparison comes first.

First, with an LDC -O3 optimized build of the compiler, which is what you should use in a commercial setting:

Benchmark #1: generated/linux/release/64/dmd -version=core_reflect -c fqn_reflect.d
  Time (mean ± σ):      19.6 ms ±   2.9 ms    [User: 14.5 ms, System: 5.3 ms]
  Range (min … max):    12.6 ms …  27.3 ms    142 runs

Benchmark #2: generated/linux/release/64/dmd -version=phobos_fqn -c fqn_reflect.d
  Time (mean ± σ):      73.9 ms ±   2.5 ms    [User: 57.1 ms, System: 16.9 ms]
  Range (min … max):    64.9 ms …  80.7 ms    38 runs

Summary
  'generated/linux/release/64/dmd -version=core_reflect -c fqn_reflect.d' ran
    3.78 ± 0.57 times faster than 'generated/linux/release/64/dmd -version=phobos_fqn -c fqn_reflect.d'

And now with a debug build of the compiler (built with DMD), which I use for faster iteration when working on compiler features.

Benchmark #1: generated/linux/release/64/dmd -version=core_reflect -c fqn_reflect.d
  Time (mean ± σ):      30.3 ms ±   2.4 ms    [User: 25.5 ms, System: 4.7 ms]
  Range (min … max):    22.7 ms …  40.9 ms    98 runs

Benchmark #2: generated/linux/release/64/dmd -version=phobos_fqn -c fqn_reflect.d
  Time (mean ± σ):     120.2 ms ±   2.4 ms    [User: 104.9 ms, System: 15.1 ms]
  Range (min … max):   116.9 ms … 125.9 ms    24 runs

Summary
  'generated/linux/release/64/dmd -version=core_reflect -c fqn_reflect.d' ran
    3.97 ± 0.32 times faster than 'generated/linux/release/64/dmd -version=phobos_fqn -c fqn_reflect.d'

Now comes the source of fqn_reflect.d, unedited this time to avoid typos.

module reflect.showcases.nicer.java.like.package_.structure.fqn_reflect;

struct U
{
    struct V
    {
        struct W{
            class C
            { int x;  }
        }
    }
}

version (core_reflect)
{
        import core.reflect.utils;
        static assert(fqn!(U.V.W.C) == "reflect.showcases.nicer.java.like.package_.structure.fqn_reflect.U.V.W.C");
}
version (phobos_fqn)
{
    import std.traits;
    static assert(fullyQualifiedName!(U.V.W.C) == "reflect.showcases.nicer.java.like.package_.structure.fqn_reflect.U.V.W.C");
}
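
For readers curious what the "transitive parent reflection" mentioned above boils down to: the fully qualified name is simply the declaration's own name with the names of all of its transitive parents prepended. The following is a minimal, self-contained sketch of that idea only; it is not the actual core.reflect.utils code, and the Node class merely stands in for whatever the real core.reflect declaration nodes expose:

    // Sketch only -- not the actual core.reflect.utils implementation.
    // Node stands in for a reflection node that knows its name and its parent.
    class Node
    {
        string name;
        Node parent;
        this(string name, Node parent = null) { this.name = name; this.parent = parent; }
    }

    // Build the fully qualified name by walking the transitive parent chain
    // and prepending each parent's name.
    string fqnOf(Node n)
    {
        string result = n.name;
        for (auto p = n.parent; p !is null; p = p.parent)
            result = p.name ~ "." ~ result;
        return result;
    }

    unittest
    {
        auto mod = new Node("fqn_reflect");
        auto u = new Node("U", mod);
        auto c = new Node("C", u);
        assert(fqnOf(c) == "fqn_reflect.U.C");
    }

The real fqn additionally has to deal with details such as parents that are template instances.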

What about memory usage?

I am glad you asked. Memory usage is around 3 times lower with core.reflect.

Cheers,

Stefan

October 07, 2021

On Thursday, 7 October 2021 at 10:36:42 UTC, Stefan Koch wrote:

>

TL;DR: a non-optimized fqn using core.reflect is roughly 4 times faster than the Phobos version.

[...]

Why does it have to be abbreviated like fqn though, instead of also just being named fullyQualifiedName? It's one of the things I dislike the most about C, and I don't want it to infect D.

Otherwise, after using a couple of abbreviated functions for a while, you mix them up and forget which is which.

D mostly uses non-abbreviated names for functions and the like, and I think it should stay that way. There are only some minor exceptions like "writeln", where "line" is abbreviated, but that's not such a big deal, since it's obvious what "ln" means.

Someone who has never seen the function "fqn" or even heard the word "fullyQualifiedName" will not know what it means and would have to look it up. It's not clear, even from context, what it actually does or returns.

October 07, 2021

On Thursday, 7 October 2021 at 10:43:00 UTC, bauss wrote:

>

Why does it have to be abbreviated like fqn though, instead of also just being named fullyQualifiedName? It's one of the things I dislike the most about C, and I don't want it to infect D.

It doesn't have to be.
It's just that I tend to mistype longer words and prefer shorter abbreviations.
There is no need for the name to be fqn; FullyQualifiedName would also work just fine.

October 07, 2021

On Thursday, 7 October 2021 at 10:54:13 UTC, Stefan Koch wrote:

>

It doesn't have to be.
It's just that I tend to mistype longer words and prefer shorter abbreviations.
There is no need for the name to be fqn; FullyQualifiedName would also work just fine.

Thanks! I really like the previews of core.reflect so far, though. It'll be so much better than using __traits.

October 07, 2021

On Thursday, 7 October 2021 at 10:36:42 UTC, Stefan Koch wrote:

>

TL;DR: a non-optimized fqn using core.reflect is roughly 4 times faster than the Phobos version.

I went ahead and did a test on a somewhat bigger (auto-generated) testcase, to see whether the initial overhead of the Phobos version would be amortized and make it perform better once it is run many times.

TL;DR: On bigger testcases, where the constant overhead is less of a factor, core.reflect is roughly 11.5 times faster.

Here are the results:

uplink@uplink-black:~/d/dmd(core_reflect)$ hyperfine "generated/linux/release/64/dmd nested_structs.di -o- -version=phobos_fqn" "generated/linux/release/64/dmd nested_structs.di -o- -version=core_reflect" "generated/linux/release/64/dmd nested_structs.di -o- -version=no_fqn"
Benchmark #1: generated/linux/release/64/dmd nested_structs.di -o- -version=phobos_fqn
  Time (mean ± σ):      6.307 s ±  0.155 s    [User: 5.596 s, System: 0.706 s]
  Range (min … max):    6.138 s …  6.637 s    10 runs

Benchmark #2: generated/linux/release/64/dmd nested_structs.di -o- -version=core_reflect
  Time (mean ± σ):     792.2 ms ±  14.7 ms    [User: 716.4 ms, System: 75.3 ms]
  Range (min … max):   777.0 ms … 827.3 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #3: generated/linux/release/64/dmd nested_structs.di -o- -version=no_fqn
  Time (mean ± σ):     316.2 ms ±   5.7 ms    [User: 257.9 ms, System: 58.2 ms]
  Range (min … max):   311.5 ms … 329.0 ms    10 runs


Summary
  'generated/linux/release/64/dmd nested_structs.di -o- -version=no_fqn' ran
    2.51 ± 0.07 times faster than 'generated/linux/release/64/dmd nested_structs.di -o- -version=core_reflect'
   19.95 ± 0.61 times faster than 'generated/linux/release/64/dmd nested_structs.di -o- -version=phobos_fqn'

Note that the -version=no_fqn build doesn't do any fqn computation and merely parses the nested structs.
It is there so that you can get an idea of the constant overhead which does not go away.

If we factor that in, we end up with the following numbers.
baseline: no_fqn_min = 311 ms -- essentially the time to parse and semantically analyze the file.

core_reflect_max = 827.3 ms
phobos_fqn_mean = 6307 ms

To adjust for the overhead we subtract 311 ms from both values and get

core_reflect_self = 516.3 ms
phobos_fqn_self = 5996 ms
real_speedup = 5996 / 516.3 ≈ 11.6

which shows that in reality core.reflect is close to 12 times faster here.
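
For completeness, the same correction written out as a tiny D helper (the numbers are the hyperfine results quoted above; the helper itself is just for illustration and not part of core.reflect):

    // Subtract the shared baseline (parse + semantic analysis) from both
    // measurements before comparing them.
    double adjustedSpeedup(double slowMs, double fastMs, double baselineMs)
    {
        return (slowMs - baselineMs) / (fastMs - baselineMs);
    }

    unittest
    {
        import std.math : isClose;
        // phobos_fqn mean, core_reflect max, no_fqn baseline (all in ms).
        assert(isClose(adjustedSpeedup(6307, 827.3, 311), 11.6, 0.01));
    }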

Cheers,
Stefan

P.S. Tests on even larger testcases suggest that the real speedup drops down to 11.5.

So that you can verify at least the phobos_fqn and the no_fqn timings, I have published my testcase in the following gist:
https://gist.github.com/UplinkCoder/5acd25168238cac179a5c4ffdf945187

Memory use is 10 times lower in the absolute measurement,
and 30 times lower if corrected against the no_fqn version as the baseline.

October 07, 2021

Impressive results!

I wasn't sold initially, but that was because I didn't know what it was exactly; this totally is a game changer.

__traits is nice for simple stuff, but as soon as you try to do more complex logic it starts to become unmanageable, and I never remember the exact syntax. Hopefully core.reflect will solve that, and so far it looks like it's already set to replace __traits fully.

October 07, 2021

On Thursday, 7 October 2021 at 22:42:06 UTC, russhy wrote:

>

Impressive results!

[...]

Thanks. It is good to hear that.

I just did a little work on the fqn utility.
This is the code that makes it print template instances, should a template instance happen to be a parent.


    // If the parent is a template instance, render it as "name!(args)".
    TemplateInstance ti = cast(TemplateInstance) lastParentDecl.getParent();
    if (ti)
    {
        string argString;
        // Render the template arguments we know how to print: integers and strings.
        foreach (arg; ti.arguments)
        {
            if (auto il = cast(IntegerLiteral) arg)
            {
                import std.conv : to;
                argString ~= to!string(il.value);
                argString ~= ", ";
            }
            else if (auto sl = cast(StringLiteral) arg)
            {
                argString ~= `"` ~ sl.value ~ `"`;
                argString ~= ", ";
            }
        }
        // Drop the trailing ", " (guarding against the case where no argument
        // could be printed) and splice "name!(args)" in place of the bare name.
        if (argString.length)
            argString = argString[0 .. $ - 2];
        result = ti.name ~ "!(" ~ argString ~ ")" ~ result[ti.name.length .. $];
        lastParentDecl = ti;
        d = cast(Declaration) ti.parent;
        if (d) goto Ldecl;
    }

So if a template argument is neither a string nor an integer, this fqn function (and it is an actual function) would not be able to print it correctly.
However, support for more argument kinds is easy to add, because it's just regular user-level code.
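
As a small illustration (a hypothetical addition, not part of the actual utility), a catch-all branch at the end of the foreach above would at least make unsupported arguments visible instead of silently dropping them:

            else // neither IntegerLiteral nor StringLiteral
            {
                // Placeholder until more core.reflect node kinds are handled here.
                argString ~= "/* unsupported template argument */";
                argString ~= ", ";
            }

Properly supporting a new argument kind would then just mean adding another else-if branch that casts to the corresponding core.reflect node class, in ordinary D code.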