Thread overview
What are best practices around toString?
Sep 30, 2022  christian.koestlin
Oct 01, 2022  tsbockman
Oct 01, 2022  Salih Dincer
Oct 01, 2022  tsbockman
Oct 02, 2022  Salih Dincer
Oct 06, 2022  christian.koestlin

September 30, 2022

Dear Dlang experts,

up until now I was perfectly happy with implementing (override) string toString() const or something to get nicely formatted (mostly debug) output for my structs, classes and exceptions.

But recently I stumbled upon https://wiki.dlang.org/Defining_custom_print_format_specifiers and additionally https://github.com/dlang/dmd/blob/4ff1eec2ce7d990dcd58e5b641ef3d0a1676b9bb/druntime/src/object.d#L2637 which at first sight is great, because it provides the same customization of an object's representation with fewer memory allocations.

When grepping through phobos, there are a bunch of "different" signatures implemented for this, e.g.

...
phobos/std/typecons.d:        void toString(DG)(scope DG sink) const
...
phobos/std/typecons.d:        void toString(DG, Char)(scope DG sink,  scope const ref FormatSpec!Char fmt) const
...
phobos/std/typecons.d:        void toString()(scope void delegate(const(char)[]) sink, scope const ref FormatSpec!char fmt)
...
phobos/std/sumtype.d:        void toString(this This, Sink, Char)(ref Sink sink, const ref FormatSpec!Char fmt);
...

to just show a few.

Furthermore, when one works with instances of structs, objects or exceptions, an aInstance.toString() call does not "work" when one only implements the sink interface (which is to be expected), whereas std.conv.to!string or a formatted write with %s always works (no matter how toString was implemented).

So I wonder: what is best practice in the community, and would it make sense to add something to dscanner that "warns" on usages of aInstance.toString()?
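
A minimal sketch of the behavior described here, assuming a hypothetical struct `Point` that defines only a sink-based overload. The `unittest` block checks that `std.conv.to!string` and `format` both find the sink overload, while a direct zero-argument `toString()` call does not compile:

```d
import std.conv : to;
import std.format : format, formattedWrite;

// Hypothetical example type; it defines only the sink-based overload.
struct Point {
    int x, y;

    void toString(scope void delegate(const(char)[]) sink) const {
        sink("Point(");
        formattedWrite(sink, "%d, %d", x, y);
        sink(")");
    }
}

unittest {
    auto p = Point(1, 2);
    // Both of these go through std.format, which detects the sink overload.
    assert(p.to!string == "Point(1, 2)");
    assert(format("%s", p) == "Point(1, 2)");
    // A direct zero-argument call has no matching overload.
    static assert(!__traits(compiles, p.toString()));
}
```

It can be tried with something like `dmd -unittest -main -run point.d` (the file name is only an example).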

Kind regards,
Christian

October 01, 2022

On Friday, 30 September 2022 at 13:11:56 UTC, christian.koestlin wrote:

> Dear Dlang experts,
>
> up until now I was perfectly happy with implementing (override) string toString() const or something to get nicely formatted (mostly debug) output for my structs, classes and exceptions.

Human beings read extremely slowly compared to how quickly the GC can allocate and free strings as needed, so there is no need to complicate your code with more text formatting strategies unless you want to generate this debug output far faster than a human can actually read it.

> But recently I stumbled upon https://wiki.dlang.org/Defining_custom_print_format_specifiers and additionally https://github.com/dlang/dmd/blob/4ff1eec2ce7d990dcd58e5b641ef3d0a1676b9bb/druntime/src/object.d#L2637 which at first sight is great, because it provides the same customization of an object's representation with fewer memory allocations.
>
> When grepping through phobos, there are a bunch of "different" signatures implemented for this, e.g.
>
> ...
> phobos/std/typecons.d:        void toString(DG)(scope DG sink) const
> ...
> phobos/std/typecons.d:        void toString(DG, Char)(scope DG sink,  scope const ref FormatSpec!Char fmt) const
> ...
> phobos/std/typecons.d:        void toString()(scope void delegate(const(char)[]) sink, scope const ref FormatSpec!char fmt)
> ...
> phobos/std/sumtype.d:        void toString(this This, Sink, Char)(ref Sink sink, const ref FormatSpec!Char fmt);
> ...
>
> to just show a few.

The FormatSpec parameter only belongs there if you're actually going to do something useful with it in your toString implementation. Even if you are going to use it, you should probably still provide a convenience overload with a default specifier.
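
As a rough illustration of that advice, here is a sketch with a hypothetical type `Celsius` (not taken from Phobos). The first overload does something useful with the spec by forwarding it to the numeric payload; the second is the convenience overload, which simply supplies a default `%s` spec. The names and the unit suffix are assumptions made up for this example:

```d
import std.array : appender;
import std.format : format, formatValue, FormatSpec, singleSpec;
import std.range.primitives : put;

// Hypothetical example type, not taken from Phobos.
struct Celsius {
    double degrees;

    // Full overload: forwards the caller's spec (width, precision, ...)
    // to the numeric payload, then appends the unit.
    void toString(Writer, Char)(ref Writer sink,
            scope const ref FormatSpec!Char fmt) const {
        formatValue(sink, degrees, fmt);
        put(sink, " C");
    }

    // Convenience overload: behaves as if formatted with "%s".
    void toString(Writer)(ref Writer sink) const {
        auto spec = singleSpec("%s");
        toString(sink, spec);
    }
}

unittest {
    // The precision in the format string reaches the payload.
    assert(format("%.1f", Celsius(21.456)) == "21.5 C");

    // The convenience overload can be called directly with any output range.
    auto buf = appender!string();
    Celsius(21.5).toString(buf);
    assert(buf.data == "21.5 C");
}
```

If the type has no use for width, precision or alternative specifiers, the second overload alone is enough and the `FormatSpec` one can be dropped.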

> Furthermore, when one works with instances of structs, objects or exceptions, an aInstance.toString() call does not "work" when one only implements the sink interface (which is to be expected), whereas std.conv.to!string or a formatted write with %s always works (no matter how toString was implemented).

I generally do something like this:

struct A {
    string message;
    int enthusiasm;

    void toString(DG)(scope DG sink) scope const @safe
        if(is(DG : void delegate(scope const(char[])) @safe)
        || is(DG : void function(scope const(char[])) @safe))
    {
        import std.format : formattedWrite;
        sink(message);
        sink(" x ");
        formattedWrite!"%d"(sink, enthusiasm);
        sink("!");
    }
    string toString() scope const pure @safe {
        StringBuilder builder;
        toString(&(builder.opCall)); // Find the exact string length.
        builder.allocate();
        toString(&(builder.opCall)); // Actually write the chars.
        return builder.finish();
    }
}

So, the first toString overload defines how to format the value to text, while the second overload does memory management and forwards the formatting work to the first.

StringBuilder is a utility shared across the entire project:

struct StringBuilder {
private:
    char[] buffer;
    size_t next;

public:
    void opCall(scope const(char[]) str) scope pure @safe nothrow @nogc {
        const curr = next;
        next += str.length;
        if(buffer !is null)
            buffer[curr .. next] = str[];
    }
    void allocate() scope pure @safe nothrow {
        buffer = new char[next];
        next = 0;
    }
    void allocate(const(size_t) maxLength) scope pure @safe nothrow {
        buffer = new char[maxLength];
        next = 0;
    }
    string finish() pure @trusted nothrow @nogc {
        assert(buffer !is null);
        string ret = cast(immutable) buffer[0 .. next];
        buffer = null;
        next = 0;
        return ret;
    }
}

The first formatting pass to find the required buffer length can be skipped if you can somehow pre-calculate the maximum possible length, or if you prefer the common strategy of repeatedly re-allocating the buffer with exponentially increasing size used by the likes of std.array.Appender. Since the API for toString remains the same regardless, you are free to choose the best strategy for each type.
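
For comparison, a sketch of the `std.array.Appender` strategy mentioned above, using a hypothetical struct `B` that mirrors `A`. The formatting overload keeps the same shape; only the memory-management overload changes. The `reserve` estimate is an assumption chosen for this example and can be omitted:

```d
import std.array : appender;
import std.format : format, formattedWrite;

// Hypothetical variant of the struct `A` shown earlier in this post.
struct B {
    string message;
    int enthusiasm;

    // Formatting logic: write the pieces to whatever sink is supplied.
    void toString(scope void delegate(scope const(char)[]) sink) const {
        sink(message);
        sink(" x ");
        formattedWrite(sink, "%d", enthusiasm);
        sink("!");
    }

    // Memory management via Appender: a single formatting pass into a
    // buffer that grows on demand.
    string toString() const {
        auto app = appender!string();
        // Optional size estimate (an assumption: the text plus room for an int).
        app.reserve(message.length + 16);
        toString((scope const(char)[] s) { app.put(s); });
        return app.data;
    }
}

unittest {
    assert(B("hello", 3).toString() == "hello x 3!");
    assert(format("%s", B("hello", 3)) == "hello x 3!");
}
```

The trade-off is one formatting pass with a possibly oversized buffer, versus two passes with an exactly sized one.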

October 01, 2022

On Saturday, 1 October 2022 at 08:26:43 UTC, tsbockman wrote:

> So, the first toString overload defines how to format the value to text, while the second overload does memory management and forwards the formatting work to the first.
>
> StringBuilder is a utility shared across the entire project:

Is Appender not good enough, at least in terms of allocating memory and accumulating a string?

Thanks...

SDB@79

October 01, 2022

On Saturday, 1 October 2022 at 10:02:34 UTC, Salih Dincer wrote:

> On Saturday, 1 October 2022 at 08:26:43 UTC, tsbockman wrote:
>> StringBuilder is a utility shared across the entire project:
>
> Is Appender not good enough, at least in terms of allocating memory and accumulating a string?

Appender is a legitimate option, but unless it is provided with a good estimate of the final length at the beginning, it will allocate several times for a longer string, and the final buffer will be, on average, 50% larger than needed.

Neither of these things is a major problem, but StringBuilder is only a few lines of code to perfectly minimize allocation, so why not?
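
One way to observe this on a given compiler and runtime is to compare an Appender's `capacity` with the length actually written, with and without an up-front `reserve`. This is only a sketch: `BufferStats` and `fill` are hypothetical helpers, and the printed numbers depend on the runtime's growth policy, so the code asserts nothing beyond `capacity >= length`:

```d
import std.array : appender;

// Hypothetical helper type: buffer statistics after appending.
struct BufferStats {
    size_t length;   // characters actually written
    size_t capacity; // characters the buffer can hold without reallocating
}

// Appends `count` copies of `piece`, optionally reserving the final size first.
BufferStats fill(string piece, size_t count, bool reserveFirst) {
    auto app = appender!string();
    if (reserveFirst)
        app.reserve(piece.length * count); // one up-front size estimate
    foreach (_; 0 .. count)
        app.put(piece);
    return BufferStats(app.data.length, app.capacity);
}

unittest {
    import std.stdio : writefln;
    foreach (reserveFirst; [false, true]) {
        const s = fill("0123456789", 1000, reserveFirst);
        assert(s.length == 10_000 && s.capacity >= s.length);
        writefln("reserve=%s length=%s capacity=%s",
                 reserveFirst, s.length, s.capacity);
    }
}
```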

October 02, 2022
On Saturday, 1 October 2022 at 17:50:54 UTC, tsbockman wrote:
> but unless it is provided with a good estimate of the final
> length at the beginning, it will allocate several times for
> a longer string, and the final buffer will be, on average, 50% larger than needed.

I see, it's smart!

SDB@79


October 06, 2022

On Saturday, 1 October 2022 at 17:50:54 UTC, tsbockman wrote:

> On Saturday, 1 October 2022 at 10:02:34 UTC, Salih Dincer wrote:
>> On Saturday, 1 October 2022 at 08:26:43 UTC, tsbockman wrote:
>>> StringBuilder is a utility shared across the entire project:
>>
>> Is Appender not good enough, at least in terms of allocating memory and accumulating a string?
>
> Appender is a legitimate option, but unless it is provided with a good estimate of the final length at the beginning, it will allocate several times for a longer string, and the final buffer will be, on average, 50% larger than needed.
>
> Neither of these things is a major problem, but StringBuilder is only a few lines of code to perfectly minimize allocation, so why not?

Thanks a lot. One needs to go through the serialization twice, but perhaps that's better than reallocating memory.

Kind regards,
Christian