October 20, 2013
On 10/20/2013 10:46 AM, bearophile wrote:
> That's 7 lines of bug-prone code that uses a deprecated functionality and
> sometimes over-allocates on the stack. And I think you have to compare just the
> .ptr of those arrays at the end. And if you return one of such arrays you will
> produce nothing good. And what if you need 2D arrays? The code becomes even more
> complex. (You can of course create a matrix struct for that).
>
> Dynamically sized stack allocated arrays are meant to solve all those problems:
> to offer a nice, compact, clean, easy to remember and safe syntax. To be usable
> for 2D arrays too; and when you pass or return one of them the data is copied by
> the compiler on the heap (sometimes this doesn't happen if the optimizing
> compiler allocates the array in the stack frame of the caller, as sometimes done
> for structs).

If your optimizing compiler is that good, it can optimize "new T[n]" to be on the stack as well.

I'm not particularly enamored with the compiler inserting silent copying to the heap - D programmers tend to not like such things.

> D dynamic array usage should decrease and D should encourage much more the usage
> of small stack-allocated arrays. This is what languages as Ada and Rust teach
> us. Heap allocation of arrays should be much less common, almost a special case.

Rust is barely used at all, and constantly changes. I saw a Rust presentation recently by one of its developers, and he said his own slides showing pointer stuff were obsolete. I don't think there's enough experience with Rust to say it teaches us how to do things.


> This over-allocates on the stack,

I use this technique frequently. Allocating a few extra bytes on the stack generally costs nothing unless you're in a recursive function. Of course, if you're in a recursive function, stack allocated dynamic arrays can have unpredictable stack overflow issues.


> and sometimes needlessly allocates on the heap
> or in an arena. Dynamic stack arrays avoid those downsides.

The technique I showed is also generally faster than dynamic stack allocation.
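
For reference, here is roughly that pattern (it appears in full in the quoted post further down), sketched without the deprecated delete and comparing the arrays by .ptr as suggested above; the eager GC.free is optional:

import core.memory : GC;

void example(size_t n)
{
    int[16] tmp;                  // fixed-size buffer: costs no instructions
    int[] a = n <= tmp.length
        ? tmp[0 .. n]             // small case: slice the stack buffer
        : new int[n];             // large case: fall back to the GC heap
    scope (exit)
        if (a.ptr !is tmp.ptr)    // compare pointers, not contents
            GC.free(a.ptr);       // optionally release the heap block eagerly
    // ... use a ...
}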
October 20, 2013
On 20/10/13 18:57, Andrei Alexandrescu wrote:
> Fallback allocators will make it easy to define an allocator on top of a fixed
> array, backed by another allocator when capacity is exceeded. BTW I'm scrambling
> to make std.allocator available for people to look at and experiment with.

Great to hear, I'm looking forward to seeing that. :-)

October 20, 2013
On Sunday, 20 October 2013 at 18:42:06 UTC, Walter Bright wrote:
> If your optimizing compiler is that good, it can optimize "new T[n]" to be on the stack as well.

Just a side note: LDC actually does this if it can prove statically that the size is bounded. Unfortunately, the range detection is rather conservative (unless your allocation size turns out to be a constant due to inlining, LLVM is unlikely to get it).

One idea that might be interesting to think about is to insert a run-time check of the size when an allocation is known not to escape but its size isn't statically known. As a GC allocation is very expensive anyway, this probably wouldn't even be much of a pessimization in the general case.
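
To make that concrete, here is a hand-written sketch of what such a lowering might look like (not actual compiler output; the threshold, names, and the GC fallback are just illustrative):

import core.stdc.stdlib : alloca;

void process(size_t n)
{
    // What "auto a = new int[n];" could become once the optimizer has
    // proven that the slice never leaves this function:
    enum sizeLimit = 1024;
    int[] a = n <= sizeLimit
        ? (cast(int*) alloca(n * int.sizeof))[0 .. n]  // stack allocation
        : new int[n];                                  // GC fallback for large n
    a[] = 0;  // match the zero-initialization "new int[n]" would have done
    // ... use a locally only ...
}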

> I'm not particularly enamored with the compiler inserting silent copying to the heap - D programmers tend to not like such things.

Well, this is exactly what happens with closures, so one could argue that there is precedent. In general, I agree with you, though.

> I use this technique frequently. Allocating a few extra bytes on the stack generally costs nothing unless you're in a recursive function. Of course, if you're in a recursive function, stack allocated dynamic arrays can have unpredictable stack overflow issues.

I also find this pattern to be very useful. The LLVM support libraries even package it up into a nice llvm::SmallVector<T, n> template that allocates space for n elements inside the object, falling back to heap allocation only if that threshold has been exceeded (a tunable small string optimization, if you want).

David
October 20, 2013
Walter Bright:

> If your optimizing compiler is that good, it can optimize "new T[n]" to be on the stack as well.

That's escape analysis, and it gives up as soon as you return the array, unless you also analyze the caller and allocate in the caller's stack frame; but that can't be done if the length of the array is computed in the middle of the called function.

From what I've seen, escape analysis doesn't bring Java close to D performance when you use 3D vectors implemented as small class instances. We need something that guarantees stack allocation whenever there's enough space on the stack.


> I'm not particularly enamored with the compiler inserting silent copying to the heap - D programmers tend to not like such things.

An alternative solution is to statically require a ".dup" if you want to return such an array (so it becomes a normal dynamic array). This makes the heap allocation visible.

Another alternative is to copy the data to the stack frame of the caller. If you do this there are some places where such an array can't be used (as in the C++ proposals), but that's not too bad.


> Rust is barely used at all,

Right, so far little is written in it beyond most of its own experimental compiler.

On the other hand, Ada has been around for a long time. The Ada 2005 standard library even added bounded containers, so you can allocate max-sized associative arrays on the stack :-) This shows how much they care about avoiding the heap. I think Ada code generally allocates on the heap much less often than D code does.


> Allocating a few extra bytes on the stack generally costs nothing unless you're in a recursive function.

If you over-allocate you use more stack space than necessary, which moves you away from the cache-warm parts of the stack toward parts that fall outside the L1 or L2 cache. That costs time, so saving stack space saves some run time.

Another problem is that D newbies, and everyday D usage in general, tend to stick to the simplest coding patterns. Your coding pattern is bug-prone even for you, and it's not what programmers will use in casual D code. Stack allocation of (variable-sized) arrays should become much simpler; otherwise most people will use heap allocation in most cases. Such allocation is not silent, but it's no better than the "silent heap allocations" discussed above.


> Of course, if you're in a recursive function, stack allocated dynamic arrays can have unpredictable stack overflow issues.

Unless you are using a segmented stack, as Go and Rust do.


> The technique I showed is also generally faster than dynamic stack allocation.

Do you have links to benchmarks?

Bye,
bearophile
October 20, 2013
On 10/20/2013 12:15 PM, David Nadlinger wrote:
>> I'm not particularly enamored with the compiler inserting silent copying to
>> the heap - D programmers tend to not like such things.
>
> Well, this is exactly what happens with closures, so one could argue that there
> is precedent.

Not at all. The closure code does not *copy* the data to the heap. It is allocated on the heap to start with.


> I also find this pattern to be very useful. The LLVM support libraries even
> package it up into a nice llvm::SmallVector<T, n> template that allocates space
> for n elements inside the object, falling back to heap allocation only if that
> threshold has been exceeded (a tunable small string optimization, if you want).

Nice!

October 20, 2013
On 10/20/2013 12:23 PM, bearophile wrote:
> Walter Bright:
>
>> If your optimizing compiler is that good, it can optimize "new T[n]" to be on
>> the stack as well.
>
> That's escape analysis,

Yes, I know :-)


> and it gives up as soon as you return the array, unless you also analyze the
> caller and allocate in the caller's stack frame; but that can't be done if the
> length of the array is computed in the middle of the called function.

Yes. I know you don't believe me :-) but I am familiar with data flow analysis and what it can achieve.


> Another problem is that D newbies, and everyday D usage in general, tend to
> stick to the simplest coding patterns. Your coding pattern is bug-prone even for you

I haven't had bugs with my usage of it.


> and it's not what programmers will use in casual D code. Stack allocation of
> (variable-sized) arrays should become much simpler; otherwise most people will
> use heap allocation in most cases. Such allocation is not silent, but it's no
> better than the "silent heap allocations" discussed above.
>
>
>> Of course, if you're in a recursive function, stack allocated dynamic arrays
>> can have unpredictable stack overflow issues.
> Unless you are using a segmented stack, as Go and Rust do.

Segmented stacks have performance problems and do not interface easily with C functions. Go is not known for high performance execution, and we'll see about Rust.


>> The technique I showed is also generally faster than dynamic stack allocation.
> Do you have links to benchmarks?

No. But I do know that alloca() causes pessimizations in the code generation, and it costs many instructions to execute. Allocating fixed size things on the stack executes zero instructions.
October 20, 2013
On Sunday, 20 October 2013 at 19:42:29 UTC, Walter Bright wrote:
> On 10/20/2013 12:23 PM, bearophile wrote:
>> Walter Bright:
>>
>
> No. But I do know that alloca() causes pessimizations in the code generation, and it costs many instructions to execute. Allocating fixed size things on the stack executes zero instructions.

1) alloca allows allocating in the caller's context, which is guaranteed to elide copying without relying on a "sufficiently smart compiler". For example:

import core.stdc.stdlib : alloca;

// The default argument is evaluated at the call site, so the alloca reserves
// space in the caller's stack frame, not in stalloc's own frame.
ref E stalloc(E)(ref E mem = *(cast(E*) alloca(E.sizeof)))
{
    return mem;
}
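
A hypothetical call site, for illustration; if, as claimed above, the default argument (alloca included) is evaluated in the caller, then the storage lives in the caller's frame:

void caller()
{
    int* p = &stalloc!int();  // the alloca runs here, in caller's frame
    *p = 42;
}                             // the storage is released with caller's frame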

2) If a default argument were allowed to access a previous function parameter (the current restriction is arbitrary), a helper function of this kind would be enough to implement VLAs.

3) Your "fixed size stack allocation" could be combined with alloca also, in which case it likely would be faster still.
October 21, 2013
On Sunday, October 20, 2013 09:33:36 Walter Bright wrote:
> On 10/20/2013 7:25 AM, bearophile wrote:
> > More discussions about variable-sized stack-allocated arrays in C++, it seems there is not yet a consensus:
> > 
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3810.pdf
> > 
> > I'd like variable-sized stack-allocated arrays in D.
> 
> They're far more trouble than they're worth.
> 
> Just use:
> 
>      auto a = new T[n];
> 
> Stack allocated arrays are far more trouble than they're worth. But what about efficiency? Here's something along the lines of what I often do:
> 
>      T[10] tmp;
>      T[] a;
>      if (n <= 10)
> 	a = tmp[0..n];
>      else
> 	a = new T[n];
>      scope (exit) if (a != tmp) delete a;
> 
> The size of the static array is selected so the dynamic allocation is almost never necessary.

If that paradigm is frequent enough, it might be worth wrapping it in a struct. Then, you'd probably get something like

auto tmp = StaticArray!(int, 10)(n);
int[] a = tmp[];

which used T[10] if n was 10 or less and allocated T[] otherwise. The destructor could then deal with freeing the memory.
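
A minimal sketch of what such a wrapper might look like (the name matches the example above; the eager GC.free and the disabled copying are just one possible design):

struct StaticArray(T, size_t fixedSize)
{
    private T[fixedSize] buffer;  // in-place storage for the small case
    private T[] slice;            // aliases buffer, or a heap allocation

    this(size_t n)
    {
        slice = n <= fixedSize ? buffer[0 .. n] : new T[n];
    }

    ~this()
    {
        import core.memory : GC;
        if (slice.ptr !is null && slice.ptr !is buffer.ptr)
            GC.free(slice.ptr);   // optionally free the heap block eagerly
    }

    T[] opSlice() { return slice; }  // enables: int[] a = tmp[];

    @disable this(this);  // copying would leave slice pointing into another buffer
}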

- Jonathan M Davis
October 21, 2013
On 10/20/2013 5:59 PM, Jonathan M Davis wrote:
> If that paradigm is frequent enough, it might be worth wrapping it in a
> struct. Then, you'd probably get something like
>
> auto tmp = StaticArray!(int, 10)(n);
> int[] a = tmp[];
>
> which used T[10] if n was 10 or less and allocated T[] otherwise. The
> destructor could then deal with freeing the memory.

Sounds like a good idea - and it should fit in with Andrei's nascent allocator design.

October 21, 2013
On 20.10.2013 18:25, bearophile wrote:
> More discussions about variable-sized stack-allocated arrays in C++, it
> seems there is not yet a consensus:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3810.pdf
>
> I'd like variable-sized stack-allocated arrays in D.

I'd say the most common case where one needs a stack-allocated array is a temporary allocation that isn't going to survive the end of the scope. Moreover, in such cases, when the data is too large for the stack one wants to allocate from a thread-local heap instead of the shared one, to prevent needless locking. `unstd.memory.allocation.tempAlloc` [1] will do the job. And since one of the most common sub-cases is creating a temporary C string,
`unstd.c.string.tempCString` [2] will help there.

[1] http://denis-sh.bitbucket.org/unstandard/unstd.memory.allocation.html#tempAlloc
[2] http://denis-sh.bitbucket.org/unstandard/unstd.c.string.html#tempCString

-- 
Денис В. Шеломовский
Denis V. Shelomovskij