On Friday, 6 August 2021 at 12:27:55 UTC, wjoe wrote:
>So in theory...
>if we were pre-allocating a chunk of memory on the heap at program startup and used that to store our data (like on a stack), the cost of allocation would have been paid, once we need to use it, and would only have to be done once. "Deallocation" cost for data using this "stack" should be getting pretty close to that of the stack (which is basically a subtraction). Deallocation cost for the block of heap memory on program termination doesn't matter.
>In practice the "stack" would probably be closer to a pool and memory management a bit more involved than an addition/subtraction.
>A cache line is usually much smaller than L1 at just a few data words. So once the pre-fetcher is set up, and the memory in question is residing in L1, there shouldn't be a difference anymore.
>Therefore I would reason that utilizing cache line bandwidth efficiently is important and whether the data resides on the heap or stack is secondary (i.e. what a struct (doesn't) contain is more important than where it's stored).
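For concreteness, the scheme described in the quote could look roughly like the C sketch below: one heap block allocated up front, with allocation and "deallocation" reduced to moving an offset. The `Arena` type and all the `arena_*` names are made up for illustration, and it assumes C11 for `_Alignof` and `max_align_t`. A real pool would need more bookkeeping than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stack-like arena over a single preallocated heap block. */
typedef struct {
    unsigned char *base; /* start of the block obtained once from malloc */
    size_t cap;          /* total size of the block in bytes */
    size_t top;          /* offset of the first free byte */
} Arena;

/* Pay the heap allocation cost once, e.g. at program startup.
   Returns nonzero on success. */
static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->top = 0;
    return a->base != NULL;
}

/* "Allocate" by rounding the offset up for alignment and bumping it.
   Returns NULL when the block is exhausted. */
static void *arena_push(Arena *a, size_t size) {
    size_t align = _Alignof(max_align_t);
    size_t start = (a->top + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) {
        return NULL;
    }
    a->top = start + size;
    return a->base + start;
}

/* Remember the current offset so it can be restored later. */
static size_t arena_mark(const Arena *a) {
    return a->top;
}

/* "Deallocate" everything pushed since the mark: a single store. */
static void arena_release(Arena *a, size_t mark) {
    assert(mark <= a->top);
    a->top = mark;
}

/* Give the block back; on program termination this could be skipped. */
static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->top = 0;
}
```

Usage would be to take an `arena_mark` on entry to a scope, `arena_push` whatever is needed, and `arena_release` back to the mark on exit, which mirrors what a stack frame does.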
The thing is, you are already forced to use the "real" stack if you want to take advantage of the CPU's `call` and `ret` instructions. Since you're going to be accessing that region of memory anyway, you might as well store your other "hot" data nearby.
That said, there are languages that use separate stacks for return addresses and parameter passing, like Forth. Doing so can be useful on embedded systems with highly restrictive stack-size limits, since it allows you to keep the size of the "real" stack small.
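To illustrate the separate-stacks idea, here is a rough C sketch of a Forth-style data stack kept apart from the hardware stack. Everything in it (`data_stack`, `dsp`, the `w_*` "words", the capacity of 64 cells) is a name or value I picked for the example, not anything from an actual Forth implementation. The assumption is that, since the words take no C parameters, each call puts little more than a return address on the "real" stack, though a compiler is still free to spill locals there.

```c
#include <assert.h>
#include <stddef.h>

enum { DATA_STACK_CELLS = 64 }; /* hypothetical capacity */

/* Explicit data stack, separate from the hardware call stack. */
static long data_stack[DATA_STACK_CELLS];
static size_t dsp = 0; /* index of the next free cell */

static void push(long v) {
    assert(dsp < DATA_STACK_CELLS);
    data_stack[dsp++] = v;
}

static long pop(void) {
    assert(dsp > 0);
    return data_stack[--dsp];
}

/* "Words" pass all arguments and results through the data stack. */
static void w_add(void) {
    long b = pop();
    long a = pop();
    push(a + b);
}

static void w_dup(void) {
    long a = pop();
    push(a);
    push(a);
}

static void w_mul(void) {
    long b = pop();
    long a = pop();
    push(a * b);
}

/* A word defined in terms of other words: nested calls use the
   hardware stack for return addresses, the data stack for values. */
static void w_square(void) {
    w_dup();
    w_mul();
}
```

For example, `push(3); push(4); w_add(); w_square();` leaves 49 on the data stack.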