newCTFE gets a 10x faster string concat

September 23, 2021

Posted by Stefan Koch

Hi there,

In preparation for my little talk/demo of newCTFE, I have worked on a few things to make it less embarrassing.
Consider the following code:

string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach(_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

pragma(msg, makeBigString(short.max / 4).length);

An hour ago this would have had this embarrassing outcome:

Benchmark #1: generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe
  Time (mean ± σ):     831.7 ms ±  29.3 ms    [User: 320.9 ms, System: 509.5 ms]
  Range (min … max):   805.9 ms … 880.9 ms    10 runs

Benchmark #2: generated/linux/release/64/dmd -c testStringConcat.d
  Time (mean ± σ):     378.2 ms ±  12.1 ms    [User: 102.6 ms, System: 274.9 ms]
  Range (min … max):   366.7 ms … 400.0 ms    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d' ran
    2.20 ± 0.10 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe'

With newCTFE being more than twice as slow.
And if you had written

pragma(msg, makeBigString(short.max).length);

you would have gotten something even more embarrassing:

core.exception.AssertError@src/dmd/ctfe/bc.d(3675): !!! HEAP OVERFLOW !!!

I have fixed that now, and as of a few moments ago the results look different.

For

pragma(msg, makeBigString(short.max/4).length);

you now get:

Benchmark #1: generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe
  Time (mean ± σ):      55.3 ms ±   2.7 ms    [User: 40.4 ms, System: 14.7 ms]
  Range (min … max):    48.2 ms …  63.8 ms    50 runs

Benchmark #2: generated/linux/release/64/dmd -c testStringConcat.d
  Time (mean ± σ):     387.6 ms ±  16.6 ms    [User: 112.0 ms, System: 274.6 ms]
  Range (min … max):   372.5 ms … 420.9 ms    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
    7.01 ± 0.45 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'

and for pragma(msg, makeBigString(short.max).length); you get:

Benchmark #1: generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe
  Time (mean ± σ):     498.6 ms ±  16.0 ms    [User: 209.3 ms, System: 287.7 ms]
  Range (min … max):   481.8 ms … 523.6 ms    10 runs

Benchmark #2: generated/linux/release/64/dmd -c testStringConcat.d
  Time (mean ± σ):      5.094 s ±  0.130 s    [User: 995.8 ms, System: 4086.8 ms]
  Range (min … max):    4.909 s …  5.270 s    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
   10.22 ± 0.42 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'

That is the 10x speedup I was talking about.

If you want to know how I was able to speed it up, attend my demonstration at beerconf on Saturday.

P.S.
In terms of memory use we are looking at 1.3 GB for newCTFE and
18.1 GB for "oldCTFE", which is roughly a 13x difference.
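
As a rough sanity check on the memory figures, here is a back-of-the-envelope model. It is only a sketch under stated assumptions, not a description of how either interpreter works: it assumes that every `result ~= x` copies the whole string into a fresh buffer and that no old buffer is ever freed, and that the figures above refer to the short.max case. The name bytesIfEveryAppendCopies is made up for this illustration.

```d
// Hypothetical model, NOT taken from the dmd sources:
// every append allocates a new buffer holding the whole result so far,
// and none of the earlier buffers are released.
enum ulong pieceLength = "this is the string I want to append\n".length;

ulong bytesIfEveryAppendCopies(ulong n)
{
    // Buffers of 36, 72, ..., 36 * n bytes add up to 36 * n * (n + 1) / 2.
    return pieceLength * n * (n + 1) / 2;
}

static assert(pieceLength == 36);
static assert(bytesIfEveryAppendCopies(short.max) == 19_326_763_008UL);

// Prints a value just under 18 (the total above expressed in GiB).
pragma(msg, bytesIfEveryAppendCopies(short.max) / (1024.0 * 1024 * 1024));
```

Compiled with dmd -c, this model lands close to the 18.1 GB measured for the old interpreter, which would be consistent with quadratic copying there, but that is an inference from the numbers and nothing more.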

Cheers,
Stefan

On Thursday, 23 September 2021 at 13:01:33 UTC, Stefan Koch wrote:

Hi there,
[ ... 10x difference bla bla ...]

Of course it is possible by varying the test-cases to get an almost arbitrary speedup.

Benchmark #1: generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe
  Time (mean ± σ):     160.3 ms ±   2.8 ms    [User: 121.6 ms, System: 38.4 ms]
  Range (min … max):   154.1 ms … 164.9 ms    18 runs

Benchmark #2: generated/linux/release/64/dmd -c testStringConcat.d
  Time (mean ± σ):      6.538 s ±  0.105 s    [User: 3.253 s, System: 3.276 s]
  Range (min … max):    6.450 s …  6.768 s    10 runs

Summary
  'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
   40.79 ± 0.96 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'

The highest I have been able to get is 50x; beyond that the old interpreter runs out of memory and freezes my computer.
The code for the benchmark above is:

string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach(_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// pragma(msg, makeBigString(cast(uint)(short.max * 1.91)).length);
// this is the max for newCTFE; we run out of 32-bit address space after this
// commented out because without newCTFE we just crash

int[] crappyIota(int N)
{
    int[] result = [];
    foreach(i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

pragma(msg, crappyIota(short.max).length + crappyIota(short.max)[$-1]);
pragma(msg, makeBigString(cast(uint)(short.max / 4)).length);
pragma(msg, makeBigString(cast(uint)(short.max / 2)).length);

As you can see, makeBigString(cast(uint)(short.max * 1.91)).length
is the most I can test at all, since the newCTFE VM uses a 31-bit heap address space:
half of the 32-bit address space is reserved for the stack.
I mean to change the 2 GB / 2 GB split to a 3.498 GB / 0.512 GB split,
but I haven't done that yet.
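
To put numbers on that limit, here is a small sketch of the arithmetic. The assumptions are mine: "GB" is read as GiB, the proposed split is rounded to 3.5 GiB of heap and 0.5 GiB of stack, and all the constant names are invented for the example.

```d
// Illustrative arithmetic only; the names below are hypothetical.
enum ulong addressSpace = 1UL << 32;          // 4 GiB addressable in total
enum ulong heapNow      = addressSpace / 2;   // current 2 GB / 2 GB split
static assert(heapNow == 1UL << 31);          // hence 31 bits of heap addresses

// Assumed reading of the planned split: a 512 MiB stack, the rest is heap.
enum ulong stackProposed = 512UL << 20;
enum ulong heapProposed  = addressSpace - stackProposed;
static assert(heapProposed == 3584UL << 20);  // 3.5 GiB

// Size of the final string in the largest case newCTFE can still run.
enum uint  maxN       = cast(uint)(short.max * 1.91);
enum ulong finalBytes = maxN * 36UL;          // 36 bytes per appended piece
static assert(maxN == 62_584);
static assert(finalBytes == 2_253_024);       // a bit over 2 MiB
```

The final string is only a little over 2 MiB, so the 2 GB heap is presumably exhausted by intermediate data built up during the appends rather than by the result itself.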

For the example above newCTFE uses 60 times less memory than the current interpreter.
