November 01, 2014
On Saturday, 1 November 2014 at 05:27:16 UTC, Jonathan Marler wrote:
> No need for the extra function, just call:
>
> x.toString(&(outputRange.put));

That doesn't work for a wide variety of possible cases, notably when `put` is a function template or when the code depends on std.range.put or some other UFCS `put` function. As such, it should be avoided in generic code, and then you might as well avoid it in general, lest your algorithm unnecessarily ends up breaking with output ranges you didn't test for after refactoring.

(Note that parentheses are not required in your example)
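For the curious, a minimal sketch of the failure mode (the type and names here are illustrative): `std.array.Appender.put` is a function template, so taking its address doesn't compile, while a lambda forwarding through `std.range.put` works with any output range:

```d
import std.array : appender;
import std.range : put; // the free-function/UFCS put

struct S
{
    // the sink-style toString under discussion
    void toString(void delegate(const(char)[]) sink) const
    {
        sink("hello");
    }
}

void main()
{
    auto app = appender!string();
    S s;
    // s.toString(&app.put);  // error: Appender.put is a template, it has no address
    s.toString(str => put(app, str)); // lambda dispatches through std.range.put
    assert(app.data == "hello");
}
```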
November 01, 2014
On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
> This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
> referencing a stack buffer. If it needed to grow, just update the length, and it
> moves to the heap. In most cases, the stack buffer is enough. But the idea is to
> try and minimize the GC allocations, which are performance killers on the
> current platforms.

We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
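For reference, typical usage looks something like this (a sketch based on the std.internal.scopebuffer API; exact names may differ across druntime/Phobos versions):

```d
import std.internal.scopebuffer;

void main()
{
    char[64] tmp = void;              // stack storage, no initialization needed
    auto buf = ScopeBuffer!char(tmp); // spills to malloc only if 64 chars overflow
    scope(exit) buf.free();           // releases the malloc'd block, if any

    buf.put("int[");                  // ScopeBuffer is an output range
    buf.put('3');
    buf.put(']');
    assert(buf[] == "int[3]");        // no GC allocation occurred
}
```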

November 01, 2014
On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright wrote:
> On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
>> This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
>> referencing a stack buffer. If it needed to grow, just update the length, and it
>> moves to the heap. In most cases, the stack buffer is enough. But the idea is to
>> try and minimize the GC allocations, which are performance killers on the
>> current platforms.
>
> We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.

It is not the same thing as a ref/out buffer argument. We have been running ping-pong comments about it several times now. All std.internal.scopebuffer does is reduce the heap allocation count at the cost of stack consumption (and switching to raw malloc for the heap) - it does not change the big-O estimate of heap allocations unless it is used as a buffer argument - at which point it is no better than a plain array.
November 01, 2014
On Saturday, 1 November 2014 at 06:04:56 UTC, Jakob Ovrum wrote:
> On Saturday, 1 November 2014 at 05:27:16 UTC, Jonathan Marler wrote:
>> No need for the extra function, just call:
>>
>> x.toString(&(outputRange.put));
>
> That doesn't work for a wide variety of possible cases, notably when `put` is a function template or when the code depends on std.range.put or some other UFCS `put` function. As such, it should be avoided in generic code, and then you might as well avoid it in general, lest your algorithm unnecessarily ends up breaking with output ranges you didn't test for after refactoring.
>
> (Note that parentheses are not required in your example)

Ah yes, you are right that this wouldn't work in generic code.  Meaning, if the code calling toString were itself a template accepting output ranges, then `&outputRange.put` often wouldn't work.  In that case I think the anonymous function is a good way to go.  I was thinking more of the case where the code calling toString is user code where the outputRange is a known type.  Thanks for catching my silly assumption.
November 01, 2014
On Saturday, 1 November 2014 at 12:31:15 UTC, Dicebot wrote:
> On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright wrote:
>> On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
>>> This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
>>> referencing a stack buffer. If it needed to grow, just update the length, and it
>>> moves to the heap. In most cases, the stack buffer is enough. But the idea is to
>>> try and minimize the GC allocations, which are performance killers on the
>>> current platforms.
>>
>> We keep solving the same problem over and over. std.internal.scopebuffer does this handily. It's what it was designed for - it works, it's fast, and it virtually eliminates the need for heap allocations. Best of all, it's an Output Range, meaning it fits in with the range design of Phobos.
>
> It is not the same thing as a ref/out buffer argument. We have been running ping-pong comments about it several times now. All std.internal.scopebuffer does is reduce the heap allocation count at the cost of stack consumption (and switching to raw malloc for the heap) - it does not change the big-O estimate of heap allocations unless it is used as a buffer argument - at which point it is no better than a plain array.

Sorry if this is a stupid question, but what's being discussed here? Are we talking about passing a scope buffer to toString, or about the implementation of the toString function allocating its own scope buffer?  An API change, implementation notes, or something else?  Thanks.
November 01, 2014
On 31 October 2014 01:30, Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 10/28/14 7:06 PM, Manu via Digitalmars-d wrote:
>>
>> On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
>>>>
>>>>
>>>> On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
>>>> <digitalmars-d@puremagic.com> wrote:
>>>>>
>>>>>
>>>>> Am 27.10.2014 11:07, schrieb Daniel Murphy:
>>>>>
>>>>>> "Benjamin Thaut"  wrote in message news:m2kt16$2566$1@digitalmars.com...
>>>>>>>
>>>>>>> I'm planning on doing a pull request for druntime which rewrites
>>>>>>> every
>>>>>>> toString function within druntime to use the new sink signature. That
>>>>>>> way druntime would cause a lot less allocations which end up being
>>>>>>> garbage right away. Are there any objections against doing so? Any
>>>>>>> reasons why such a pull request would not get accepted?
>>>>>>
>>>>>> How ugly is it going to be, since druntime can't use std.format?
>>>>>
>>>>> They wouldn't get any uglier than they already are, because the current toString functions within druntime also can't use std.format.
>>>>>
>>>>> An example would be the toString function of TypeInfo_StaticArray:
>>>>>
>>>>> override string toString() const
>>>>> {
>>>>>           SizeStringBuff tmpBuff = void;
>>>>>           return value.toString() ~ "[" ~ cast(string)len.sizeToTempString(tmpBuff) ~ "]";
>>>>> }
>>>>>
>>>>> Would be replaced by:
>>>>>
>>>>> override void toString(void delegate(const(char)[]) sink) const
>>>>> {
>>>>>           SizeStringBuff tmpBuff = void;
>>>>>           value.toString(sink);
>>>>>           sink("[");
>>>>>           sink(cast(string)len.sizeToTempString(tmpBuff));
>>>>>           sink("]");
>>>>> }
>>>>
>>>>
>>>>
>>>> The thing that really worries me about this sink API is that your code
>>>> here produces (at least) 4 calls to a delegate. That's a lot of
>>>> indirect function calling, which can be a severe performance hazard on
>>>> some systems.
>>>> We're trading out garbage for low-level performance hazards, which may
>>>> imply a reduction in portability.
>>>
>>>
>>>
>>> I think given the circumstances, we are better off. But when we find a
>>> platform that does perform worse, we can try and implement alternatives. I
>>> don't want to destroy performance on the platforms we *do* support, for the
>>> worry that some future platform isn't as friendly to this method.
>>
>>
>> Video games consoles are very real, and very now.
>> I suspect they may even represent the largest body of native code in
>> the world today.
>
>
> Sorry, I meant future *D supported* platforms, not future not-yet-existing platforms.

I'm not sure what you mean. I've used D on current and existing games consoles. I personally think it's one of D's most promising markets... if not for just a couple of remaining details.

Also, my suggestion will certainly perform better on all platforms. No platform benefits from the existing proposal's indirect function call per write compared with an approach that avoids one.

>> I don't know if 'alternatives' is the right phrase, since this approach isn't implemented yet, and I wonder if a slightly different API strategy exists which may not exhibit this problem.
>
>
> Well, the API already exists and is supported. The idea is to migrate the existing toString calls to the new API.

Really? Bummer... I haven't seen this API anywhere yet.
Seems a shame to make such a mistake with a brand new API. Too many
competing API patterns :/

>>> But an aggregate which relies on members to output themselves is going to have a tough time following this model. Only at the lowest levels can we enforce such a rule.
>>
>>
>> I understand this, which is the main reason I suggest to explore something other than a delegate based interface.
>
>
> Before we start ripping apart our existing APIs, can we show that the performance is really going to be so bad? I know virtual calls have a bad reputation, but I hate to make these choices absent real data.

My career for a decade always seems to find its way back to fighting
virtual calls. (in proprietary codebases so I can't easily present
case studies)
But it's too late now I guess. I should have gotten in when someone
came up with the idea... I thought it was new.

> For instance, D's underlying i/o system uses FILE *, which is about as virtual as you can get. So are you avoiding a virtual call to use a buffer to then pass to a virtual call later?

I do a lot of string processing, but it never finds its way to a FILE*. I don't write console based software.

>>> Another thing to think about is that the inliner can potentially get rid
>>> of
>>> the cost of delegate calls.
>>
>>
>> druntime is a binary lib. The inliner has no effect on this equation.
>
>
> It depends on the delegate and the item being output, whether the source is available to the compiler, and whether or not it's a virtual function. True, some cases will not be inlinable. But the "tweaks" we implement for platform X which does not do well with delegate calls, could be to make this more available.

I suspect the cases where the inliner can do something useful would be in quite a significant minority (with respect to phobos and druntime in particular). I haven't tried it, but I have a lifetime of disassembling code of this sort, and I'm very familiar with the optimisation patterns.

>>>> Ideally, I guess I'd prefer to see an overload which receives a slice to write to instead and do away with the delegate call. Particularly in druntime, where API and potential platform portability decisions should be *super*conservative.
>>>
>>>
>>>
>>> This puts the burden on the caller to ensure enough space is allocated. Or
>>> you have to reenter the function to finish up the output. Neither of these
>>> seem like acceptable drawbacks.
>>
>>
>> Well, that's why I opened this for discussion. I'm sure there's room for creativity here.
>>
>> It doesn't seem that unreasonable to reenter the function to me actually, I'd prefer a second static call in the rare event that a buffer wasn't big enough, to many indirect calls in every single case.
>
>
> A reentrant function has to track the state of what has been output, which is horrific in my opinion.

How so? It doesn't seem that bad to me. We're talking about druntime here, the single most used library in the whole ecosystem... that shit should be tuned to the max. It doesn't matter how pretty the code is.

>> There's no way that reentry would be slower. It may be more inconvenient, but I wonder if some API creativity could address that...?
>
>
> The largest problem I see is, you may not know before you start generating strings whether it will fit in the buffer, and therefore, you may still end up eventually calling the sink.

Right. The API should be structured to make a virtual call _only_ in
the rare instance the buffer overflows. That is my suggestion.
You can be certain to supply a buffer that will not overflow in many/most cases.
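One shape such an API could take - entirely hypothetical, not an existing druntime interface - is a sink that writes straight into a caller-supplied slice and only makes an indirect call through a grow callback when that slice runs out:

```d
// Hypothetical sketch of an overflow-only-indirection sink.
struct SliceSink
{
    char[] buf;                     // caller-supplied storage, e.g. on the stack
    size_t len;
    void delegate(ref char[]) grow; // invoked only on overflow

    void write(const(char)[] s)
    {
        if (len + s.length > buf.length)
            grow(buf);              // the rare indirect call
        buf[len .. len + s.length] = s[];
        len += s.length;
    }
}

void main()
{
    char[32] storage;
    auto sink = SliceSink(storage[], 0, (ref char[] b) {
        // overflow policy is the caller's choice; here, relocate to a GC array
        auto bigger = new char[b.length * 2];
        bigger[0 .. b.length] = b[];
        b = bigger;
    });
    sink.write("fits in the stack slice"); // direct call only, no indirection
    assert(sink.buf[0 .. sink.len] == "fits in the stack slice");
}
```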

> Note, you can always allocate a stack buffer, use an inner function as a delegate, and get the inliner to remove the indirect calls. Or use an alternative private mechanism to build the data.

We're talking about druntime specifically. It is a binary lib. The inliner won't save you.

> Would you say that *one* delegate call per object output is OK?

I would say that an uncontrollable virtual call is NEVER okay,
especially in otherwise trivial and such core functions like toString
in druntime. But one is certainly better than many.
Remember I was arguing for final-by-default for years (because it's
really important)... and I'm still extremely bitter about that
outcome.

>>> What would you propose for such a mechanism? Maybe I'm not thinking of your
>>> ideal API.
>>
>>
>> I haven't thought of one I'm really happy with.
>> I can imagine some 'foolproof' solution at the API level which may
>> accept some sort of growable string object (which may represent a
>> stack allocation by default). This could lead to a virtual call if the
>> buffer needs to grow, but that's not really any worse than a delegate
>> call, and it's only in the rare case of overflow, rather than many
>> calls in all cases.
>>
>
> This is a typical mechanism that Tango used -- pass in a ref to a dynamic array referencing a stack buffer. If it needed to grow, just update the length, and it moves to the heap. In most cases, the stack buffer is enough. But the idea is to try and minimize the GC allocations, which are performance killers on the current platforms.

I wouldn't hard-code to overflow to the GC heap specifically. It should be an API that the user may overflow to wherever they like.
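The Tango-style mechanism being discussed can be sketched like so (illustrative code, not Tango's actual API): the slice starts out referencing stack memory, and setting `.length` past what the runtime can extend in place relocates it to the GC heap:

```d
// Appends s to buf, growing only when the current storage is too small.
void append(ref char[] buf, ref size_t used, const(char)[] s)
{
    if (used + s.length > buf.length)
        buf.length = (used + s.length) * 2; // a stack-backed slice moves to the GC heap here
    buf[used .. used + s.length] = s[];
    used += s.length;
}

void main()
{
    char[16] tmp;
    char[] buf = tmp[];         // initially references stack memory
    size_t used;

    append(buf, used, "short"); // fits: no allocation at all
    assert(buf.ptr is tmp.ptr);

    append(buf, used, " but this part is much longer"); // overflow: relocated
    assert(buf.ptr !is tmp.ptr);
    assert(buf[0 .. used] == "short but this part is much longer");
}
```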

> I think adding the option of using a delegate is not limiting -- you can always, on a platform that needs it, implement a alternative protocol that is internal to druntime. We are not preventing such protocols by adding the delegate version.

You're saying that some platform may need to implement a further completely different API? Then no existing code will compile for that platform. This is madness. We already have more than enough APIs.

> But on our currently supported platforms, the delegate vs. GC call is soo much better. I can't see any reason to avoid the latter.

The latter? (the GC?) .. Sorry, I'm confused.
November 01, 2014
On 1 November 2014 05:06, via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Friday, 31 October 2014 at 19:04:29 UTC, Walter Bright wrote:
>>
>> On 10/27/2014 12:42 AM, Benjamin Thaut wrote:
>>>
>>> I'm planning on doing a pull request for druntime which rewrites every
>>> toString function within druntime to use the new sink signature. That way
>>> druntime would cause a lot less allocations which end up being garbage
>>> right away. Are there any objections against doing so? Any reasons why
>>> such a pull request would not get accepted?
>>
>>
>> Why a sink version instead of an Output Range?
>
>
> I guess because it's for druntime, and we don't want to pull in std.range?

I'd say that I'd be nervous to see druntime chockers full of templates...?
November 01, 2014
On 31 October 2014 06:15, Steven Schveighoffer via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 10/30/14 2:53 PM, Jonathan Marler wrote:
>>>
>>>
>>> Before we start ripping apart our existing APIs, can we show that the performance is really going to be so bad? I know virtual calls have a bad reputation, but I hate to make these choices absent real data.
>>>
>>> For instance, D's underlying i/o system uses FILE *, which is about as virtual as you can get. So are you avoiding a virtual call to use a buffer to then pass to a virtual call later?
>>>
>> I think it's debatable how useful this information would be, but I've written a small D program to try to explore the different performance statistics for various methods.  I've uploaded the code to my server; feel free to download/modify/use.
>>
>> Here's the various methods I've tested.
>> /**
>>     Method 1: ReturnString
>>               string toString();
>>     Method 2: SinkDelegate
>>               void toString(void delegate(const(char)[]) sink);
>>     Method 3: SinkDelegateWithStaticHelperBuffer
>>               struct SinkStatic { char[64] buffer; void delegate(const(char)[]) sink; }
>>               void toString(ref SinkStatic sink);
>>     Method 4: SinkDelegateWithDynamicHelperBuffer
>>               struct SinkDynamic { char[] buffer; void delegate(const(char)[]) sink; }
>>               void toString(ref SinkDynamic sink);
>>               void toString(SinkDynamic sink);
>>   */
>>
>> Dmd/No Optimization (dmd dtostring.d):
>>
>> RuntimeString run 1 (loopcount 10000000)
>>    Method 1     : 76 ms
>>    Method 2     : 153 ms
>
>
> I think the above result is deceptive, and the test isn't very useful. The RuntimeString toString isn't a very interesting data point -- it's simply a single string. Not many cases are like that. Most types have multiple members, and it's the need to *construct* a string from that data which is usually the issue.
>
> But I would caution, the whole point of my query was about data on the platforms of which Manu speaks. That is, platforms that have issues dealing with virtual calls. x86 doesn't seem to be one of them.
>
> -Steve

I want to get back to this (and other topics), but I'm still about 30
posts behind, and I have to go...
I really can't keep up with this forum these days :/

I'll be back on this topic soon...
November 01, 2014
On Saturday, 1 November 2014 at 12:31:15 UTC, Dicebot wrote:
> It is not the same thing as a ref/out buffer argument. We have been running ping-pong comments about it several times now. All std.internal.scopebuffer does is reduce the heap allocation count at the cost of stack consumption (and switching to raw malloc for the heap) - it does not change the big-O estimate of heap allocations unless it is used as a buffer argument - at which point it is no better than a plain array.

Agreed. Its API also has severe safety problems.

David
November 01, 2014
On 11/1/2014 5:31 AM, Dicebot wrote:
> On Saturday, 1 November 2014 at 07:02:03 UTC, Walter Bright wrote:
>> On 10/30/2014 8:30 AM, Steven Schveighoffer wrote:
>>> This is a typical mechanism that Tango used -- pass in a ref to a dynamic array
>>> referencing a stack buffer. If it needed to grow, just update the length, and it
>>> moves to the heap. In most cases, the stack buffer is enough. But the idea is to
>>> try and minimize the GC allocations, which are performance killers on the
>>> current platforms.
>>
>> We keep solving the same problem over and over. std.internal.scopebuffer does
>> this handily. It's what it was designed for - it works, it's fast, and it
>> virtually eliminates the need for heap allocations. Best of all, it's an
>> Output Range, meaning it fits in with the range design of Phobos.
>
> It is not the same thing as a ref/out buffer argument.

Don't understand your comment.


> We have been running ping-pong comments about it several times now. All
> std.internal.scopebuffer does is reduce the heap allocation count at the cost
> of stack consumption (and switching to raw malloc for the heap) - it does not
> change the big-O estimate of heap allocations unless it is used as a buffer
> argument - at which point it is no better than a plain array.

1. stack allocation is not much of an issue here because these routines in druntime are not recursive, and there's plenty of stack available for what druntime is using toString for.

2. "All it does" is an inadequate summary. Most of the time it completely eliminates the need for allocations. This is a BIG deal, and the source of much of the speed of Warp. The cases where it does fall back to heap allocation do not leave GC memory around.

3. There is the big-O of the worst case, which is very unlikely, and the big-O of the 99% case, which for scopebuffer is O(1).

4. You're discounting the API of scopebuffer which is usable as an output range, fitting it right in with Phobos' pipeline design.

Furthermore, again, I know it works and is fast.