I mentioned in the monthly meeting how I would like to see a more convenient way to return variable-sized data to the stack in D. Walter mentioned that he wouldn't like to break the C ABI, which is understandable, but you can certainly make this work without a different ABI. In fact, you can even return variable-sized data to the stack in C:
    #include <alloca.h>
    #include <stddef.h>
    #include <stdio.h>

    struct A{ int a,b,c,d,e,f,g; };
    struct B{ int a,b,c; };

    //step 1: report how many bytes the result for `n` needs
    size_t myFnRetSize(int n){ return n == 0 ? sizeof(struct A) : sizeof(struct B); }

    //step 2: write the result into memory supplied by the caller
    void myFn(void* mem, int n){
        if(n == 0){
            struct A a = {1,2,3,4,5,6,7};
            *((struct A*)mem) = a;
        }else{
            struct B b = {1,2,7};
            *((struct B*)mem) = b;
        }
    }

    int main(){
        int n = 1; //<-- can be any number
        size_t size = myFnRetSize(n);
        void* mem = alloca(size); //allocated in main's own stack frame
        myFn(mem, n);
        //write out the result:
        for(size_t i=0; i<size/sizeof(int); i++){
            printf("%d ", ((int*)mem)[i]);
        }
        printf("\n");
        return 0;
    }
Sorry if the code is terrible, but hopefully it demonstrates my point adequately. You might say that this is technically returning by reference, but at the machine-code level all stack access is done via pointers.
You might be wondering what the point of having this feature would even be.
Well, unions always take as much space as their largest member. If a union contains a struct that's (for example) 512 bytes large, it will always take 512 bytes, when really we might only need to store a 4–8 byte number most of the time. With sumtypes, variable-sized stack returns could greatly optimise their stack consumption in cases where they have vastly different type sizes, with the smaller types being used most frequently.
Some might say reference types should be used for such a purpose; but when you're programming a largely data-driven system that uses masses of structs, the sheer number of heap allocations could become a huge performance bottleneck, whereas stack allocation is practically instant. Of course, you can always pre-allocate a huge amount of stack space, but then a lot of it will go to waste and your code will be more vulnerable to stack overflows.
A way of making variable-sized stack returns less cumbersome in D would be to have some syntactic sugar that works something like this:
    struct A{ int a,b,c,d,e,f,g; }
    struct B{ int a,b,c; }

    @stackArrayReturn myFn(int n){
        auto nSqr = n * n; //demonstrate how variables can affect the return value
        auto condition = n * n;
        if(condition == 0){
            return A(nSqr+1,2,3,4,5,6,7);
        }else{
            return B(nSqr,2,7);
        }
    }

    void main(){
        int n = 1; //<-- can be any number
        void[] myMem = myFn(n);
    }
Which gets lowered to this:
    import std.typecons;

    struct A{ int a,b,c,d,e,f,g; }
    struct B{ int a,b,c; }

    size_t myFn(out return scope void function(void[] memory, Tuple!(int,"nSqr") context) callback, out Tuple!(int,"nSqr") context, int n){
        auto nSqr = n * n;
        auto condition = n * n;
        //the callback is a plain function pointer and captures nothing,
        //so everything it needs has to travel in `context`
        if(condition == 0){
            context = Tuple!(int,"nSqr")(nSqr);
            callback = (void[] m, Tuple!(int,"nSqr") ctx){
                *cast(A*)m.ptr = A(ctx.nSqr+1,2,3,4,5,6,7);
            };
            return A.sizeof;
        }else{
            context = Tuple!(int,"nSqr")(nSqr);
            callback = (void[] m, Tuple!(int,"nSqr") ctx){
                *cast(B*)m.ptr = B(ctx.nSqr,2,7);
            };
            return B.sizeof;
        }
    }

    void main(){
        int n = 1; //<-- can be any number
        void[] myMem;
        {
            scope void function(void[] memory, Tuple!(int,"nSqr") context) __returnCallback;
            Tuple!(int,"nSqr") __returnContext;
            size_t __returnSize = myFn(__returnCallback, __returnContext, n);
            import core.stdc.stdlib: alloca;
            //alloca memory is only released when main returns, so myMem stays valid after this block
            myMem = alloca(__returnSize)[0..__returnSize];
            __returnCallback(myMem, __returnContext);
        }
    }
This is an example of returning one of two different struct types, but this could also be used to return slices of any type (e.g. int[]).
Ideally there would be a nice way to do this with scoped delegates, but alloca will re-allocate the data that their context pointer points to. Another less stack-wasteful (albeit potentially significantly more CPU-wasteful) method would be to have two completely separate functions: one that determines the return size, and another that does all the other logic. However, this method would either limit the function's structure significantly, or generate wasteful code.
I would love to hear feedback and suggestions for improvements, or about other possible implementations for variable-sized stack returns.