Good Morning,
I just saw the need to concatenate a long-ish string at ctfe.
TLDR; writing your own specialized chunking appender is way faster than naive CTFE
using the phobos provided appender seems to be a bad idea ;) at least at CTFE.
As you may know when you try to so this the naive way:
enum x = () { string result;
foreach(_; 0 ..n_iterations)
result ~= "whatever"
return result; } ()
it scales quite poorly as n_iterations
grows.
and eventually you'll run out of memory and time.
it turns out ctfe really doesn't like appending because it doesn't preallocate buffers or keep capacity around.
So what if we made our own string-builder that uses whole chunks to do the appending in?
in about 70 lines of code we can build our own string build that appends strings.
module stringBuilder;
struct StringBuilder
{
enum CHUNK_SIZE = 4096;
char[CHUNK_SIZE] fixedBuffer;
size_t fixedBufferUsed;
char[CHUNK_SIZE][] chunks;
size_t[] chunk_lengths;
ushort last_chunk_used;
void appendString(const(char)[] s)
{
// can we used our preallocated fixed buffer?
if (!last_chunk_used)
{
auto fixedBufferEnd = fixedBufferUsed + s.length;
if (fixedBufferEnd < CHUNK_SIZE)
{
fixedBuffer[fixedBufferUsed .. fixedBufferEnd] = s;
fixedBufferUsed = fixedBufferEnd;
}
}
else
{
LappendToLastChunk:
auto chunkBegin = chunk_lengths[last_chunk_used - 1];
auto chunkEnd = chunkBegin + s.length;
if (chunkEnd < CHUNK_SIZE)
{
chunks[last_chunk_used - 1][chunkBegin .. chunkEnd] = s;
chunk_lengths[last_chunk_used - 1] = chunkEnd;
}
else if (s.length <= CHUNK_SIZE)
{
chunks.length = last_chunk_used + 1;
last_chunk_used += 1;
goto LappendToLastChunk;
}
else
{
assert(0,
"string to be appended is longer than "
~ CHUNK_SIZE.stringof ~ " this is not supported currently");
}
}
}
string releaseBuffer()
{
size_t result_length = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used - 1)
{
result_length += chunk_lengths[i];
}
char[] result;
result.length = result_length;
result[0 .. fixedBufferUsed] = fixedBuffer[0 .. fixedBufferUsed];
size_t result_begin_pos = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used - 1)
{
const chunk_used = chunk_lengths[i];
result[result_begin_pos .. result_begin_pos + chunk_used] = chunks[i][0 .. chunk_used];
result_begin_pos += chunk_used;
}
return cast(string) result;
}
}
that wasn't too bad was it?
But will such a simple piece of code really be able to speed us up?
lets do a quick perf test.
import stringBuilder;
static if (is(typeof(import ("N_ITERATIONS"))))
{
enum n_iterations = mixin(import("N_ITERATIONS"));
}
else
{
enum n_iterations = 4096;
}
pragma(msg, "N: ", n_iterations);
enum big_string =
() {
version (StringBuilder)
{
StringBuilder b;
}
else version (Appender)
{
import std.array : Appender;
Appender!(string) result;
}
else
{
string result;
}
foreach(_; 0 ..n_iterations)
{
version(StringBuilder)
{
b.appendString("Let us see ... how meta can you go?\n");
}
else version (Appender)
{
result ~= ("Let us see ... how meta can you go?\n");
}
else
{
result ~= ("Let us see ... how meta can you go?\n");
}
}
version (StringBuilder)
{
return b.releaseBuffer();
}
else version (Appender)
{
return result.opSlice();
}
else
return result;
} ();
pragma(msg, big_string);
And here are the results for various values of N_ITERATIONS:
(times are in milliseconds and are the time) dmd -c takes
| N | stringBuilder | naive | Appender |
|-------|-----------------|---------|-----------------------------|
| 32 | 28.1 | 28 | 53 |
| 128 | 29.0 | 27.3 | 58 |
| 512 | 29.4 | 29.7 | 120.4 |
| 1024 | 30.6 | 33.7 | 304.4 |
| 2048 | 31.8 | 51.2 | 1055 |
| 4096 | 35.1 | 120.8 | 4338 |
| 8192 | 41.0 | 368.1 | 18146 |
| 16384 | 52.5 | 1353.1 | inf (it ran out of memeory) |