February 07, 2014
"Lars T. Kyllingstad" <public@kyllingen.net> writes:

> On Friday, 7 February 2014 at 10:29:07 UTC, Walter Bright wrote:
> Ah, now I understand.  I misunderstood you -- I thought you meant that using
> ScopeBuffer to build the return array inside buildPath(), while retaining the
> function's API, would somehow improve its performance greatly.  But what
> you're saying is that you would change the signature as well, to something
> like this:
>
>   void buildPath(IR, OR)(IR segments, OR result)
>       if (isInputRange!IR && isOutputRange!(OR, char));
>
> I fully agree, then, and I see how ScopeBuffer would be extremely useful if more of Phobos' functions were written like this.  In the specific case of buildPath(), I've actually considered letting the user supply their own buffer in the form of an array, but this is of course a more general solution.

I'd suggest reversing the arguments:

  void buildPath(IR, OR)(OR result, IR segments)
      if (isInputRange!IR && isOutputRange!(OR, char));

That way you can use it as:

buffer.buildPath(p1, p2, ...);

It at least opens up chaining possibilities.
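Here is a minimal D sketch of the result-first signature. The name `buildPathTo` is hypothetical and is not Phobos' `std.path.buildPath`. The body is deliberately simplified: it joins non-empty segments with '/' and ignores rooted segments and Windows separators. The output range is taken by `ref` so that stateful sinks keep their updates. An `Appender` stands in for a ScopeBuffer-style buffer, and `only` wraps the individual segments in a range in place of the variadic `p1, p2, ...` form.

```d
import std.array : appender;
import std.range : only;
import std.range.primitives : ElementType, isInputRange, isOutputRange, put;
import std.stdio : writeln;
import std.traits : isSomeString;

// Hypothetical sink-first variant of buildPath (not the Phobos API).
// Simplified: joins non-empty segments with '/'; no rooted-path or
// Windows-separator handling as std.path.buildPath does.
void buildPathTo(OR, IR)(ref OR result, IR segments)
    if (isInputRange!IR && isSomeString!(ElementType!IR)
        && isOutputRange!(OR, char))
{
    bool first = true;
    foreach (seg; segments)
    {
        if (seg.length == 0)
            continue;          // skip empty segments
        if (!first)
            put(result, '/');  // separator between segments
        put(result, seg);
        first = false;
    }
}

void main()
{
    auto buffer = appender!string();
    // UFCS: the output range reads as the receiver of the call.
    buffer.buildPathTo(only("usr", "local", "bin"));
    writeln(buffer.data);
}
```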

February 07, 2014
On Friday, 7 February 2014 at 21:05:43 UTC, Jerry wrote:
> I'd suggest reversing the arguments:
>
>   void buildPath(IR, OR)(OR result, IR segments)
>       if (isInputRange!IR && isOutputRange!(OR, char));
>
> That way you can use it as:
>
> buffer.buildPath(p1, p2, ...);
>
> It at least opens up chaining possibilities.

On the other hand the output buffer last allows stuff like this contrived example:

"some/foo/path"
  .splitter("/")
  .buildPath(buffer);

I'm not sure what would be more common and useful.
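The segments-first order can be sketched the same way. `buildPathTo` is again a hypothetical name with the same simplified joining logic. The only point here is that the input range comes first, so the call can close a UFCS pipeline like the `splitter` chain above.

```d
import std.algorithm.iteration : splitter;
import std.array : appender;
import std.range.primitives : ElementType, isInputRange, isOutputRange, put;
import std.stdio : writeln;
import std.traits : isSomeString;

// Hypothetical segments-first variant (not the Phobos API): the range comes
// first, so it can sit at the end of a UFCS chain.
void buildPathTo(IR, OR)(IR segments, ref OR result)
    if (isInputRange!IR && isSomeString!(ElementType!IR)
        && isOutputRange!(OR, char))
{
    bool first = true;
    foreach (seg; segments)
    {
        if (seg.length == 0)
            continue;          // skip empty segments
        if (!first)
            put(result, '/');  // separator between segments
        put(result, seg);
        first = false;
    }
}

void main()
{
    auto buffer = appender!string();
    "some/foo/path"
        .splitter("/")
        .buildPathTo(buffer);
    writeln(buffer.data);
}
```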
February 07, 2014
On 2/7/2014 1:13 AM, Walter Bright wrote:
> On 2/7/2014 12:46 AM, Brad Anderson wrote:
>> Why not just stick the stack array in ScopeBuffer itself (the
>> size could be a template parameter)?
>
> 1. It's set up to fit into two registers on 64 bit code. This means it can be
> passed/returned from functions in registers. When I used this in my own code,
> high speed was the top priority, and this made a difference.
>
> 2. It needs to avoid the default initialization of the array, because it's both
> unnecessary and it kills the speed. Currently, this
> cannot be avoided for part of a struct.
>
> 3. I wanted it to be usable for any block of memory the user wished to dedicate
> to be a buffer.
>
> 4. Having an internal reference like that would mean a postblit is required for
> copying/moving the range, another source of speed degradation.

5. Every instantiation that only differs by the buffer size would generate a separate set of code. Having the buffer size passed into the constructor means only one instance is generated per type.
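The design points above can be illustrated with a stripped-down buffer type. This is a hypothetical sketch, not the actual ScopeBuffer code. `TinyScopeBuffer`, its `release()` method, and the trick of storing a "came from malloc" flag in the top bit of the capacity are assumptions made for illustration. The sketch shows a pointer plus two 32-bit counters (16 bytes on 64-bit targets) and a caller-supplied block declared with `= void` so it skips default initialization. The buffer size is a constructor argument rather than a template parameter, so a single instantiation serves every size. There is no destructor or postblit, so cleanup is an explicit call.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc, realloc;
import core.stdc.string : memcpy;
import std.stdio : writeln;

// Hypothetical, simplified ScopeBuffer-like struct (not Phobos' ScopeBuffer).
// A pointer plus two 32-bit counters: 16 bytes on 64-bit targets. No
// destructor or postblit, so the caller must call release() exactly once.
struct TinyScopeBuffer
{
    private char* buf;
    private uint used;
    private uint cap;                  // top bit set => buf was malloc'ed
    private enum uint onHeap = 1u << 31;

    // The caller dedicates any block of memory; its size is a runtime value,
    // so one instantiation handles every buffer size.
    this(char[] initial)
    {
        assert(initial.length < onHeap);
        buf = initial.ptr;
        cap = cast(uint) initial.length;
    }

    void put(char c)
    {
        if (used == (cap & ~onHeap))
            grow();
        buf[used++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s)
            put(c);
    }

    const(char)[] opSlice() { return buf[0 .. used]; }

    // Frees heap memory if the data outgrew the caller's block.
    void release()
    {
        if (cap & onHeap)
            free(buf);
        buf = null;
        used = cap = 0;
    }

    private void grow()
    {
        const uint oldCap = cap & ~onHeap;
        const uint newCap = oldCap ? oldCap * 2 : 16;
        assert(newCap < onHeap, "buffer too large for this sketch");
        char* p;
        if (cap & onHeap)
            p = cast(char*) realloc(buf, newCap);
        else
        {
            p = cast(char*) malloc(newCap);
            if (p !is null)
                memcpy(p, buf, used);  // copy out of the caller's block
        }
        if (p is null)
            onOutOfMemoryError();
        buf = p;
        cap = newCap | onHeap;
    }
}

version (D_LP64)
    static assert(TinyScopeBuffer.sizeof == 16);

void main()
{
    char[8] tmp = void;                // skip default initialization
    auto sb = TinyScopeBuffer(tmp[]);
    scope (exit) sb.release();
    sb.put("hello, ");
    sb.put("world");                   // exceeds 8 bytes, moves to malloc
    writeln(sb[]);
}
```

Because there is no destructor, copying the struct and releasing both copies would free the heap block twice. The sketch leaves that discipline to the caller, trading safety for the speed-first layout described in the post.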
February 07, 2014
On Friday, 7 February 2014 at 22:12:29 UTC, Walter Bright wrote:
>
> 5. Every instantiation that only differs by the buffer size would generate a separate set of code. Having the buffer size passed into the constructor means only one instance is generated per type.

There's always alloca :)
February 07, 2014
On Friday, 7 February 2014 at 21:24:16 UTC, Brad Anderson wrote:
> On Friday, 7 February 2014 at 21:05:43 UTC, Jerry wrote:
>> I'd suggest reversing the arguments:
>>
>>  void buildPath(IR, OR)(OR result, IR segments)
>>      if (isInputRange!IR && isOutputRange!(OR, char));
>>
>> That way you can use it as:
>>
>> buffer.buildPath(p1, p2, ...);
>>
>> It at least opens up chaining possibilities.
>
> On the other hand the output buffer last allows stuff like this contrived example:
>
> "some/foo/path"
>   .splitter("/")
>   .buildPath(buffer);
>
> I'm not sure what would be more common and useful.

Those are two different overloads, so I think we could do both.
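Here is a sketch of both overloads living side by side. It assumes the same hypothetical `buildPathTo` name and simplified joining logic. The template constraints disambiguate the calls shown here. In each order, one argument is an input range of strings and the other is an output range of `char`, so only one overload's constraint is satisfied for a given call.

```d
import std.algorithm.iteration : splitter;
import std.array : appender;
import std.range : only;
import std.range.primitives : ElementType, isInputRange, isOutputRange, put;
import std.stdio : writeln;
import std.traits : isSomeString;

// Hypothetical helper trait: an input range whose elements are strings.
enum isSegments(IR) = isInputRange!IR && isSomeString!(ElementType!IR);

// Shared, simplified joining logic: non-empty segments separated by '/'.
private void joinSegments(IR, OR)(IR segments, ref OR result)
{
    bool first = true;
    foreach (seg; segments)
    {
        if (seg.length == 0)
            continue;
        if (!first)
            put(result, '/');
        put(result, seg);
        first = false;
    }
}

// Order 1 (segments first): can end a UFCS pipeline.
void buildPathTo(IR, OR)(IR segments, ref OR result)
    if (isSegments!IR && isOutputRange!(OR, char))
{
    joinSegments(segments, result);
}

// Order 2 (sink first): reads as buffer.buildPathTo(...).
void buildPathTo(OR, IR)(ref OR result, IR segments)
    if (isSegments!IR && isOutputRange!(OR, char))
{
    joinSegments(segments, result);
}

void main()
{
    auto a = appender!string();
    a.buildPathTo(only("usr", "local", "bin"));

    auto b = appender!string();
    "some/foo/path".splitter("/").buildPathTo(b);

    writeln(a.data);
    writeln(b.data);
}
```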
February 07, 2014
On 2/7/2014 2:14 PM, Brad Anderson wrote:
> On Friday, 7 February 2014 at 22:12:29 UTC, Walter Bright wrote:
>>
>> 5. Every instantiation that only differs by the buffer size would generate a
>> separate set of code. Having the buffer size passed into the constructor means
>> only one instance is generated per type.
>
> There's always alloca :)

alloca() cannot be used to allocate stack data in a function enclosing the current one.
February 07, 2014
On Friday, 7 February 2014 at 23:10:50 UTC, Walter Bright wrote:
> On 2/7/2014 2:14 PM, Brad Anderson wrote:
>>
>> There's always alloca :)
>
> alloca() cannot be used to allocate stack data in a function enclosing the current one.

Oh, right. Forgot about that.
February 08, 2014
On 2/7/14, 3:11 PM, Brad Anderson wrote:
> On Friday, 7 February 2014 at 23:10:50 UTC, Walter Bright wrote:
>> On 2/7/2014 2:14 PM, Brad Anderson wrote:
>>>
>>> There's always alloca :)
>>
>> alloca() cannot be used to allocate stack data in a function enclosing
>> the current one.
>
> Oh, right. Forgot about that.

You can with a default parameter...

void fun(void* sneaky = alloca(42));

will allocate memory on the frame of fun's caller and make it available to fun.

I've known this for a while and am not sure whether it's an awesome idiom or the spawn of the devil.


Andrei
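Here is a small D sketch of the idiom from this post. It relies on D evaluating default arguments at each call site, so `alloca(42)` runs in the caller's frame. The memory stays valid for the whole call to `fun` and is reclaimed only when the caller returns. The function body is an arbitrary illustration. That every compiler backend accepts `alloca` inside a default argument is an assumption here, not something the post verifies.

```d
import core.stdc.stdlib : alloca;
import std.stdio : writeln;

// Sketch of the default-argument alloca idiom. Because D evaluates default
// arguments at the call site, alloca(42) runs in the caller's frame, so the
// block remains valid for the whole call to fun.
size_t fun(void* sneaky = alloca(42))
{
    auto bytes = (cast(ubyte*) sneaky)[0 .. 42];  // assumes at least 42 bytes
    bytes[] = 0xAB;                               // use the scratch space
    size_t sum;
    foreach (b; bytes)
        sum += b;
    return sum;
}

void main()
{
    writeln(fun());          // scratch comes from main's stack via alloca
    ubyte[42] mine = void;
    writeln(fun(mine.ptr));  // or the caller supplies its own block
}
```

One reason for caution: a call that relies on the default inside a loop allocates another 42 bytes of the caller's stack on every iteration, and none of it is released until the caller returns.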

February 08, 2014
On Friday, 7 February 2014 at 23:11:42 UTC, Brad Anderson wrote:
> On Friday, 7 February 2014 at 23:10:50 UTC, Walter Bright wrote:
>> On 2/7/2014 2:14 PM, Brad Anderson wrote:
>>>
>>> There's always alloca :)
>>
>> alloca() cannot be used to allocate stack data in a function enclosing the current one.
>
> Oh, right. Forgot about that.

(Of course you can. Modify the parent's stack frame and return by a JMP. ;-)
February 08, 2014
On Friday, February 07, 2014 11:23:30 Lars T. Kyllingstad wrote:
> On Friday, 7 February 2014 at 10:29:07 UTC, Walter Bright wrote:
> > On 2/7/2014 2:02 AM, Lars T. Kyllingstad wrote:
> >> I don't understand. Even if your workspace is stack-based or
> >> malloc-ed, you
> >> still need that one GC allocation for the return value, no?
> > 
> > If you have a path with 20 entries in it, and don't find the file, you do zero allocations instead of 20. If you find the file, you do one allocation instead of 20/2 (on average), and you might be able to avoid that by passing it upwards, too :-)
> 
> Ah, now I understand. I misunderstood you -- I thought you meant that using ScopeBuffer to build the return array inside buildPath(), while retaining the function's API, would somehow improve its performance greatly. But what you're saying is that you would change the signature as well, to something like this:
> 
> void buildPath(IR, OR)(IR segments, OR result)
> if (isInputRange!IR && isOutputRange!(OR, char));
> 
> I fully agree, then, and I see how ScopeBuffer would be extremely useful if more of Phobos' functions were written like this. In the specific case of buildPath(), I've actually considered letting the user supply their own buffer in the form of an array, but this is of course a more general solution.

We really should be moving to a model where any function that returns a new array has an overload that takes an output range and writes to it instead of returning a newly allocated array. That way, code that doesn't need the performance boost can just use the return value like we do now, but more performance-critical code can use output ranges and avoid GC allocations and the like.

- Jonathan M Davis
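Here is a minimal sketch of that model. It uses a hypothetical function `shout`, which upper-cases ASCII letters and appends '!'. It is not an existing Phobos function. The output-range overload does the work, and the array-returning overload is a thin wrapper that supplies an `Appender`.

```d
import std.array : appender;
import std.ascii : toUpper;
import std.range.primitives : isOutputRange, put;
import std.stdio : writeln;

// Hypothetical function following the suggested pattern. Core overload:
// writes into any output range of char and allocates nothing itself.
void shout(OR)(const(char)[] s, ref OR sink)
    if (isOutputRange!(OR, char))
{
    foreach (char c; s)
        put(sink, cast(char) toUpper(c));  // ASCII-only upper-casing
    put(sink, '!');
}

// Convenience overload: same behavior, returns a newly GC-allocated string.
string shout(const(char)[] s)
{
    auto app = appender!string();
    shout(s, app);
    return app.data;
}

void main()
{
    writeln(shout("hello"));         // simple call style, allocates

    auto app = appender!(char[])();  // performance path: reuse one buffer
    foreach (word; ["a", "b", "c"])
    {
        app.clear();
        shout(word, app);
        writeln(app.data);
    }
}
```

The wrapper keeps today's call style. A hot loop can reuse one `Appender!(char[])` via `clear()`, or pass any caller-owned sink, and skip the per-call allocation.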