October 28, 2014
On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
>   28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> On 27.10.2014 11:07, Daniel Murphy wrote:
>>
>>> "Benjamin Thaut"  wrote in message news:m2kt16$2566$1@digitalmars.com...
>>>>
>>>> I'm planning on doing a pull request for druntime which rewrites every
>>>> toString function within druntime to use the new sink signature. That
>>>> way druntime would cause a lot fewer allocations which end up being
>>>> garbage right away. Are there any objections against doing so? Any
>>>> reasons why such a pull request would not get accepted?
>>>
>>>
>>> How ugly is it going to be, since druntime can't use std.format?
>>
>>
>> They wouldn't get any uglier than they already are, because the current
>> toString functions within druntime also can't use std.format.
>>
>> An example would be the toString function of TypeInfo_StaticArray:
>>
>> override string toString() const
>> {
>>          SizeStringBuff tmpBuff = void;
>>          return value.toString() ~ "[" ~
>> cast(string)len.sizeToTempString(tmpBuff) ~ "]";
>> }
>>
>> Would be replaced by:
>>
>> override void toString(void delegate(const(char)[]) sink) const
>> {
>>          SizeStringBuff tmpBuff = void;
>>          value.toString(sink);
>>          sink("[");
>>          sink(cast(string)len.sizeToTempString(tmpBuff));
>>          sink("]");
>> }
>
> The thing that really worries me about this sink API is that your code
> here produces (at least) 4 calls to a delegate. That's a lot of
> indirect function calling, which can be a severe performance hazard on
> some systems.
> We're trading out garbage for low-level performance hazards, which may
> imply a reduction in portability.

I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.

> But in any case, I think all sink code like this should aim to call
> the user-supplied sink delegate at most *once* per toString.
> I'd like to see code that used the stack to compose the string
> locally, then feed it through to the supplied sink delegate in fewer
> (or one) calls.

This is a good goal to have, regardless. The stack is always pretty high-performing. However, it doesn't scale well. If you look above, the function already uses the stack to output the number. It would be trivial to add two chars to put the "[]" there also, so that only one sink call occurs.
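Something like this rough sketch (untested; it assumes sizeToTempString fills the temp buffer and returns the digits, as in the snippet above):

override void toString(void delegate(const(char)[]) sink) const
{
    SizeStringBuff tmpBuff = void;
    auto digits = cast(string)len.sizeToTempString(tmpBuff);

    char[66] outBuff = void;                   // '[' + digits + ']', generously sized
    outBuff[0] = '[';
    outBuff[1 .. 1 + digits.length] = digits[];
    outBuff[1 + digits.length] = ']';

    value.toString(sink);                      // nested member still sinks itself
    sink(outBuff[0 .. digits.length + 2]);     // a single call for "[len]"
}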

But an aggregate which relies on members to output themselves is going to have a tough time following this model. Only at the lowest levels can we enforce such a rule.

Another thing to think about is that the inliner can potentially get rid of the cost of delegate calls.

> Ideally, I guess I'd prefer to see an overload which receives a slice
> to write to instead and do away with the delegate call. Particularly
> in druntime, where API and potential platform portability decisions
> should be *super*conservative.

This puts the burden on the caller to ensure enough space is allocated, or you have to reenter the function to finish up the output. Neither of these seems like an acceptable drawback.

What would you propose for such a mechanism? Maybe I'm not thinking of your ideal API.

-Steve
October 28, 2014
On Tue, 28 Oct 2014 08:37:43 -0400
Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com>
wrote:

> Meta has a cost with the current compiler. It would be nice if it didn't, but I have practical concerns.
i don't think that there will be a lot of calls to 'write[f]' anyway. i know that CTFE is not costless (i once wrote a simple Conway's Life sample and summoned the OOM-killer with a CTFE .lif parser! ;-), but this can be compensated for by adding CTFE writef use function by function.

> I think a few simple functions can suffice for druntime's purposes. We don't need a kitchen sink function (pun intended).
ah, those famous last words... ;-) from my observations it's enough to implement '%[+-]width[.maxlen]s' and the same for '%x'. i also added codes to skip arguments and to print all that's left ('%*'). i'm sure that it can be done in <10KB, and it will be very handy to have. druntime doesn't do a lot of printing and string conversions anyway. and phobos is already riddled with templates and CTFE.
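for scale, here is a rough sketch of such a formatter (not the actual code mentioned above; it only covers a plain '%s' for strings and '%x' for integers, with no width or '%*' handling, and all names are illustrative):

void miniWritef(Args...)(scope void delegate(in char[]) sink,
                         in char[] fmt, Args args)
{
    // render an unsigned value as hex into buf, return the slice used
    static const(char)[] toHex(ulong v, char[] buf)
    {
        size_t i = buf.length;
        do { buf[--i] = "0123456789abcdef"[v & 0xf]; v >>= 4; } while (v);
        return buf[i .. $];
    }

    size_t argIndex, start;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%' || i + 1 == fmt.length)
            continue;
        sink(fmt[start .. i]);          // literal text before the '%'
        immutable spec = fmt[i + 1];
        i++;
        start = i + 1;
        char[16] tmp = void;
        foreach (k, arg; args)          // compile-time unrolled over the args
        {
            if (k == argIndex)
            {
                static if (is(typeof(arg) : const(char)[]))
                {
                    if (spec == 's') sink(arg);
                }
                else static if (__traits(isIntegral, typeof(arg)))
                {
                    if (spec == 'x') sink(toHex(arg, tmp));
                }
            }
        }
        ++argIndex;
    }
    sink(fmt[start .. $]);              // trailing literal text
}

unittest
{
    char[] result;
    miniWritef((in char[] s) { result ~= s; },
               "len = %x bytes for %s", 64, "tmpBuff");
    assert(result == "len = 40 bytes for tmpBuff");
}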


October 28, 2014
On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
>>
>>   28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>>
>>> On 27.10.2014 11:07, Daniel Murphy wrote:
>>>
>>>> "Benjamin Thaut"  wrote in message news:m2kt16$2566$1@digitalmars.com...
>>>>>
>>>>>
>>>>> I'm planning on doing a pull request for druntime which rewrites every toString function within druntime to use the new sink signature. That way druntime would cause a lot fewer allocations which end up being garbage right away. Are there any objections against doing so? Any reasons why such a pull request would not get accepted?
>>>>
>>>>
>>>>
>>>> How ugly is it going to be, since druntime can't use std.format?
>>>
>>>
>>>
>>> They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.
>>>
>>> An example would be the toString function of TypeInfo_StaticArray:
>>>
>>> override string toString() const
>>> {
>>>          SizeStringBuff tmpBuff = void;
>>>          return value.toString() ~ "[" ~
>>> cast(string)len.sizeToTempString(tmpBuff) ~ "]";
>>> }
>>>
>>> Would be replaced by:
>>>
>>> override void toString(void delegate(const(char)[]) sink) const
>>> {
>>>          SizeStringBuff tmpBuff = void;
>>>          value.toString(sink);
>>>          sink("[");
>>>          sink(cast(string)len.sizeToTempString(tmpBuff));
>>>          sink("]");
>>> }
>>
>>
>> The thing that really worries me about this sink API is that your code
>> here produces (at least) 4 calls to a delegate. That's a lot of
>> indirect function calling, which can be a severe performance hazard on
>> some systems.
>> We're trading out garbage for low-level performance hazards, which may
>> imply a reduction in portability.
>
>
> I think given the circumstances, we are better off. But when we find a platform that does perform worse, we can try and implement alternatives. I don't want to destroy performance on the platforms we *do* support, for the worry that some future platform isn't as friendly to this method.

Video games consoles are very real, and very now.
I suspect they may even represent the largest body of native code in
the world today.

I don't know if 'alternatives' is the right phrase, since this approach isn't implemented yet, and I wonder if a slightly different API strategy exists which may not exhibit this problem.


>> But in any case, I think all sink code like this should aim to call
>> the user-supplied sink delegate at most *once* per toString.
>> I'd like to see code that used the stack to compose the string
>> locally, then feed it through to the supplied sink delegate in fewer
>> (or one) calls.
>
>
> This is a good goal to have, regardless. The stack is always pretty high performing. However, it doesn't scale well. If you look above, the function already uses the stack to output the number. It would be trivial to add 2 chars to put the "[]" there also so only one sink call occurs.

It would be trivial, and that's precisely what I'm suggesting! :)


> But an aggregate which relies on members to output themselves is going to have a tough time following this model. Only at the lowest levels can we enforce such a rule.

I understand this, which is the main reason I suggest exploring
something other than a delegate-based interface.

> Another thing to think about is that the inliner can potentially get rid of the cost of delegate calls.

druntime is a binary lib. The inliner has no effect on this equation.


>> Ideally, I guess I'd prefer to see an overload which receives a slice to write to instead and do away with the delegate call. Particularly in druntime, where API and potential platform portability decisions should be *super*conservative.
>
>
> This puts the burden on the caller to ensure enough space is allocated. Or you have to reenter the function to finish up the output. Neither of these seem like acceptable drawbacks.

Well, that's why I'm open to discussion. I'm sure there's room for

It doesn't actually seem that unreasonable to me to reenter the function; I'd prefer a second static call in the rare event that a buffer wasn't big enough over many indirect calls in every single case. There's no way that reentry would be slower. It may be more inconvenient, but I wonder if some API creativity could address that...?
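A rough sketch of the sort of thing I mean (the struct, field names and helper are made up for illustration, not a worked-out druntime API): toString fills the caller's slice and returns the length it needs, so the caller reenters at most once.

struct Name
{
    string first, last;

    // writes what fits into buf, returns the total length required
    size_t toString(char[] buf) const
    {
        size_t copy(size_t pos, in char[] s)
        {
            if (pos < buf.length)
            {
                auto n = s.length < buf.length - pos ? s.length : buf.length - pos;
                buf[pos .. pos + n] = s[0 .. n];
            }
            return pos + s.length;      // "virtual" position, may exceed buf
        }
        return copy(copy(copy(0, first), " "), last);
    }
}

unittest
{
    char[4] small = void;               // deliberately too small so the retry runs
    auto name = Name("Jane", "Doe");
    auto n = name.toString(small[]);
    const(char)[] text;
    if (n <= small.length)
        text = small[0 .. n];
    else
    {
        auto big = new char[](n);       // rare overflow path: reenter once
        name.toString(big);
        text = big;
    }
    assert(text == "Jane Doe");
}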


> What would you propose for such a mechanism? Maybe I'm not thinking of your ideal API.

I haven't thought of one I'm really happy with.
I can imagine some 'foolproof' solution at the API level which may
accept some sort of growable string object (which may represent a
stack allocation by default). This could lead to a virtual call if the
buffer needs to grow, but that's not really any worse than a delegate
call, and it's only in the rare case of overflow, rather than many
calls in all cases.
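As a very rough sketch of that kind of object (made-up names, not a worked-out design): appends go straight into a stack-backed buffer, and only overflow takes the virtual grow() path.

class StringSink
{
    char[] buf;     // initially points at caller-owned stack memory
    size_t used;

    this(char[] initial) { buf = initial; }

    // virtual: a subclass could grow into an arena, the GC heap, etc.
    void grow(size_t needed)
    {
        auto bigger = new char[](buf.length * 2 + needed);
        bigger[0 .. used] = buf[0 .. used];
        buf = bigger;
    }

    final void put(in char[] s)
    {
        if (used + s.length > buf.length)
            grow(used + s.length - buf.length);  // rare path, one virtual call
        buf[used .. used + s.length] = s[];
        used += s.length;
    }

    // usage: scope sink = new StringSink(stackBuf[]);  // object itself on the stack
}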
October 29, 2014
On Tuesday, 28 October 2014 at 23:06:32 UTC, Manu via Digitalmars-d wrote:
> I haven't thought of one I'm really happy with.
> I can imagine some 'foolproof' solution at the API level which may
> accept some sort of growable string object (which may represent a
> stack allocation by default). This could lead to a virtual call if the
> buffer needs to grow, but that's not really any worse than a delegate
> call, and it's only in the rare case of overflow, rather than many
> calls in all cases.

If you want to combine the two approaches to writing data, you will need to write the toString logic twice, with both variants intertwined for every byte written.
October 29, 2014
import std.algorithm : min; // druntime itself would use a trivial local helper instead

struct Sink
{
   char[] buff;
   void delegate(in char[]) sink;

   void write(in char[] s)
   {
      // copy as much as fits into the remaining buffer space
      auto len = min(s.length, buff.length);
      buff[0 .. len] = s[0 .. len];
      buff = buff[len .. $];
      // whatever didn't fit goes through the delegate
      const s1 = s[len .. $];
      if (s1.length) sink(s1);
   }
}

override void toString(ref Sink sink) const
{
   value.toString(sink);
   sink.write("[");
   len.toString(sink);
   sink.write("]");
}
October 30, 2014
On 10/28/14 7:06 PM, Manu via Digitalmars-d wrote:
> On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
>>>
>>>    28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
>>> <digitalmars-d@puremagic.com> wrote:
>>>>
>>>> On 27.10.2014 11:07, Daniel Murphy wrote:
>>>>
>>>>> "Benjamin Thaut"  wrote in message news:m2kt16$2566$1@digitalmars.com...
>>>>>>
>>>>>>
>>>>>> I'm planning on doing a pull request for druntime which rewrites every
>>>>>> toString function within druntime to use the new sink signature. That
>>>>>> way druntime would cause a lot fewer allocations which end up being
>>>>>> garbage right away. Are there any objections against doing so? Any
>>>>>> reasons why such a pull request would not get accepted?
>>>>>
>>>>>
>>>>>
>>>>> How ugly is it going to be, since druntime can't use std.format?
>>>>
>>>>
>>>>
>>>> They wouldn't get any uglier than they already are, because the current
>>>> toString functions within druntime also can't use std.format.
>>>>
>>>> An example would be the toString function of TypeInfo_StaticArray:
>>>>
>>>> override string toString() const
>>>> {
>>>>           SizeStringBuff tmpBuff = void;
>>>>           return value.toString() ~ "[" ~
>>>> cast(string)len.sizeToTempString(tmpBuff) ~ "]";
>>>> }
>>>>
>>>> Would be replaced by:
>>>>
>>>> override void toString(void delegate(const(char)[]) sink) const
>>>> {
>>>>           SizeStringBuff tmpBuff = void;
>>>>           value.toString(sink);
>>>>           sink("[");
>>>>           sink(cast(string)len.sizeToTempString(tmpBuff));
>>>>           sink("]");
>>>> }
>>>
>>>
>>> The thing that really worries me about this sink API is that your code
>>> here produces (at least) 4 calls to a delegate. That's a lot of
>>> indirect function calling, which can be a severe performance hazard on
>>> some systems.
>>> We're trading out garbage for low-level performance hazards, which may
>>> imply a reduction in portability.
>>
>>
>> I think given the circumstances, we are better off. But when we find a
>> platform that does perform worse, we can try and implement alternatives. I
>> don't want to destroy performance on the platforms we *do* support, for the
>> worry that some future platform isn't as friendly to this method.
>
> Video games consoles are very real, and very now.
> I suspect they may even represent the largest body of native code in
> the world today.

Sorry, I meant future *D-supported* platforms, not future not-yet-existing platforms.

> I don't know if 'alternatives' is the right phrase, since this
> approach isn't implemented yet, and I wonder if a slightly different
> API strategy exists which may not exhibit this problem.

Well, the API already exists and is supported. The idea is to migrate the existing toString calls to the new API.

>> But an aggregate which relies on members to output themselves is going to
>> have a tough time following this model. Only at the lowest levels can we
>> enforce such a rule.
>
> I understand this, which is the main reason I suggest to explore
> something other than a delegate based interface.

Before we start ripping apart our existing APIs, can we show that the performance is really going to be so bad? I know virtual calls have a bad reputation, but I hate to make these choices absent real data.

For instance, D's underlying I/O system uses FILE *, which is about as virtual as you can get. So are you avoiding a virtual call, only to use a buffer that then gets passed to a virtual call later?

>> Another thing to think about is that the inliner can potentially get rid of
>> the cost of delegate calls.
>
> druntime is a binary lib. The inliner has no effect on this equation.

It depends on the delegate and the item being output, whether the source is available to the compiler, and whether or not it's a virtual function. True, some cases will not be inlinable. But the "tweaks" we implement for a platform X that does not do well with delegate calls could be to make this more available.

>>> Ideally, I guess I'd prefer to see an overload which receives a slice
>>> to write to instead and do away with the delegate call. Particularly
>>> in druntime, where API and potential platform portability decisions
>>> should be *super*conservative.
>>
>>
>> This puts the burden on the caller to ensure enough space is allocated. Or
>> you have to reenter the function to finish up the output. Neither of these
>> seem like acceptable drawbacks.
>
> Well that's why I open for discussion. I'm sure there's room for
> creativity here.
>
> It doesn't seem that unreasonable to reenter the function to me
> actually, I'd prefer a second static call in the rare event that a
> buffer wasn't big enough, to many indirect calls in every single case.

A reentrant function has to track the state of what has been output, which is horrific in my opinion.

> There's no way that reentry would be slower. It may be more
> inconvenient, but I wonder if some API creativity could address
> that...?

The largest problem I see is that you may not know, before you start generating strings, whether they will fit in the buffer, and therefore you may still end up eventually calling the sink.

Note that you can always allocate a stack buffer, use an inner function as a delegate, and get the inliner to remove the indirect calls. Or use an alternative private mechanism to build the data.
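For instance (a sketch; it assumes obj has the sink-style toString overload being discussed here, and the names are illustrative):

void printViaStackBuffer(T)(T obj)
{
    import core.stdc.stdio : printf;

    char[256] buf = void;
    size_t used;

    // inner function; &append is the delegate passed as the sink
    void append(const(char)[] s)
    {
        // truncates on overflow; a real version would flush or grow
        auto n = s.length < buf.length - used ? s.length : buf.length - used;
        buf[used .. used + n] = s[0 .. n];
        used += n;
    }

    obj.toString(&append);
    printf("%.*s\n", cast(int)used, buf.ptr);   // one real output call at the end
}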

Would you say that *one* delegate call per object output is OK?

>> What would you propose for such a mechanism? Maybe I'm not thinking of your
>> ideal API.
>
> I haven't thought of one I'm really happy with.
> I can imagine some 'foolproof' solution at the API level which may
> accept some sort of growable string object (which may represent a
> stack allocation by default). This could lead to a virtual call if the
> buffer needs to grow, but that's not really any worse than a delegate
> call, and it's only in the rare case of overflow, rather than many
> calls in all cases.
>

This is a typical mechanism that Tango used -- pass in a ref to a dynamic array referencing a stack buffer. If it needed to grow, just update the length, and it moves to the heap. In most cases, the stack buffer is enough. But the idea is to try and minimize the GC allocations, which are performance killers on the current platforms.
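Roughly (illustrative names, not Tango's actual API):

void put(ref char[] buf, ref size_t used, in char[] s)
{
    if (used + s.length > buf.length)
        buf.length = used + s.length;   // rare path: relocates to the GC heap
    buf[used .. used + s.length] = s[];
    used += s.length;
}

unittest
{
    char[64] stack = void;
    char[] buf = stack[];               // starts out referencing the stack
    size_t used;
    put(buf, used, "TypeInfo_StaticArray");
    put(buf, used, "[16]");
    assert(buf[0 .. used] == "TypeInfo_StaticArray[16]");
    assert(buf.ptr is stack.ptr);       // small output never left the stack
}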

I think adding the option of using a delegate is not limiting -- you can always, on a platform that needs it, implement an alternative protocol that is internal to druntime. We are not preventing such protocols by adding the delegate version.

But on our currently supported platforms, the delegate call is so much better than a GC allocation. I can't see any reason to prefer the latter.

-Steve
October 30, 2014
On 10/29/14 6:03 AM, Kagamin wrote:
> struct Sink
> {
>     char[] buff;
>     void delegate(in char[]) sink;
>
>     void write(in char[] s)
>     {
>        auto len=min(s.length,buff.length);
>        buff[0..len]=s[0..len];
>        buff=buff[len..$];
>        const s1=s[len..$];
>        if(s1.length)sink(s1);
>     }
> }
>
> override void toString(ref Sink sink) const
> {
>     value.toString(sink);
>     sink.write("[");
>     len.toString(sink);
>     sink.write("]");
> }

This would require the sink to write out the buffer before its first call, since you don't track that.

Wouldn't it be better to track the "used" length in buff directly so write can handle that?

Not a bad idea, BTW.

-Steve
October 30, 2014
On Thursday, 30 October 2014 at 15:32:30 UTC, Steven Schveighoffer wrote:
> This would require the sink to write out the buffer before its first call, since you don't track that.

That's delegated to the sink delegate.

> Wouldn't it be better to track the "used" length in buff directly so write can handle that?

The used length is tracked by shrinking the buff. This only shows the callee's perspective.
October 30, 2014
On Thursday, 30 October 2014 at 15:32:30 UTC, Steven Schveighoffer wrote:
> This would require the sink to write out the buffer before its first call, since you don't track that.

Also not always true; it depends on the sink implementation. If you construct the string in memory, you can concatenate the buffered result with the sunk result any time later.
October 30, 2014
On 10/30/14 11:54 AM, Kagamin wrote:
> On Thursday, 30 October 2014 at 15:32:30 UTC, Steven Schveighoffer wrote:
>> This would require the sink to write out the buffer before its first call,
>> since you don't track that.
>
> That's delegated to the sink delegate.

Keep in mind, the sink delegate is not implemented in a single place; it's implemented wherever the output is done. So it's a lot of boilerplate to copy around.

>
>> Wouldn't it be better to track the "used" length in buff directly so
>> write can handle that?
>
> The used length is tracked by shrinking the buff. This only shows the
> callee's perspective.

No, what I mean is:

struct Sink
{
   char[] buff;
   uint used; // buff[used..$] is what your buff used to be.
   void delegate(in char[]) sink;
   ...
}

This way, when write finds it has run out of space, the first thing it does is sink the buff, then it starts sinking the rest. In fact, you can just keep reusing buff once you've sunk it, to avoid future "extra calls" to sink.
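A sketch of what write could look like (the flush helper at the end is just one way to push out whatever data is left over; it isn't part of the struct above):

struct Sink
{
   char[] buff;                        // caller-provided stack buffer
   size_t used;                        // buff[0 .. used] holds pending data
   void delegate(in char[]) sink;

   void write(in char[] s)
   {
      if (buff.length == 0) { sink(s); return; }   // degenerate case
      const(char)[] rest = s;
      while (rest.length)
      {
         if (used == buff.length)      // full: one sink call, then reuse buff
         {
            sink(buff[0 .. used]);
            used = 0;
         }
         auto n = rest.length < buff.length - used
                ? rest.length : buff.length - used;
         buff[used .. used + n] = rest[0 .. n];
         used += n;
         rest = rest[n .. $];
      }
   }

   void flush()                        // call once at the very end
   {
      if (used) sink(buff[0 .. used]);
      used = 0;
   }
}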

Note that this can be implemented right on top of the existing sink mechanism.

-Steve