December 01, 2014 Re: general questions on reference types versus value types...
Posted in reply to Suliman

On Mon, Dec 01, 2014 at 08:22:59PM +0000, Suliman via Digitalmars-d-learn wrote:

> Could anybody explain why there is opinion that stack is fast and the heap is slow. All of them are located in the same memory. So the access time should be equal.

That may have been true 15 years ago; it's not true today with multilevel CPU caches. The stack is basically always "hot" in the CPU cache because it's very frequently accessed (function calls, function returns, temporaries, local variables, etc.), so accessing stuff on the stack almost always hits the cache. Accessing stuff on the heap is likely to incur a cache miss, so the CPU has to go and fetch it from RAM (sloooow).

Not to mention, allocating stuff on the heap incurs a lot of overhead to keep track of which parts of memory are in use or free, whereas allocating stuff on the stack is just bumping a pointer (and compilers usually combine several stack allocations into a single instruction that allocates all of them at once -- optimizers will even elide pointer bumps entirely if the total required size for local variables is known in advance and the end of the allocated region doesn't need to be used, e.g. when the function never calls another function).

Also, allocating on the heap generates garbage for the GC to collect. And heap-allocated objects tend to require pointer dereferences to access, which means a higher chance of a CPU cache miss (the pointer and the target data may be in two different RAM pages, so the CPU may have to make the RAM round trip *twice* -- this is especially true for virtual function calls).

T

--
Живёшь только однажды. (You only live once.)
December 01, 2014 Re: general questions on reference types versus value types...
Posted in reply to Suliman

On Monday, 1 December 2014 at 20:23:00 UTC, Suliman wrote:

> Could anybody explain why there is opinion that stack is fast and the heap is slow. All of them are located in the same memory. So the access time should be equal.
Yes, the problem is that if you load from a memory area (a 64-byte cache line) that has not been visited in a while, the CPU will have to wait for approximately the time it takes to execute 100 instructions. So if every 20 instructions you touch a 64-byte block that needs to be loaded from RAM, you end up with the CPU roughly 80% underutilized (about 20 instructions of useful work per ~120 instruction-times)…
The throughput is actually quite nice, but the latency is the problem.
It is possible to tell the CPU to prefetch new areas of memory ahead of time, but then you have to issue the prefetch roughly 100 instructions before you actually access the data… which is hard to get right, and might even be slower if the memory was already in the caches anyway. On some CPUs the prefetch instructions can be counterproductive…
Copyright © 1999-2021 by the D Language Foundation