Memory allocation

David, February 23
Not sure if `learn` is the right topic to post this in..

I've been going through Bob Nystrom's "Crafting Interpreters" for a bit of fun and over the weekend put together a toy allocator in D - `free` and GC not yet done. It's single-threaded and unsurprisingly faster than malloc for small objects up to 512 bytes, and about the same for larger objects (simulated with a linear distribution of size requests).

But I'd like to be able to benchmark it to see how I'm doing.

A colleague suggested I ask here for resources and pointers - I was trying to find stats on malloc performance (internal and external fragmentation, long-running reuse, distributions of size requests for testing and comparison, how to reason about trade-offs of cache misses, branch misprediction, metadata overhead, etc.).

So academic papers, books, blogs, and experiences are all welcome - especially with regard to optimising for immutability / copy-on-write (COW) - or even just some good ol' conversation.



Imperatorn, February 24
On Tuesday, 23 February 2021 at 19:44:39 UTC, David wrote:
> Not sure if `learn` is the right topic or not to post this..
>
> I've been going through Bob Nystrom's "Crafting Interpreters" for a bit of fun and over the weekend put together a toy allocator in D - free and gc not yet done. It's single threaded and unsurprisingly faster than malloc for small objects up to 512 bytes, and about the same for larger objects (simulated with a linear distribution of size requests).
>
> [...]

Have you looked at the "experimental" allocator(s) in D?

std.experimental.allocator
David, February 24
On Wednesday, 24 February 2021 at 06:14:58 UTC, Imperatorn wrote:
> On Tuesday, 23 February 2021 at 19:44:39 UTC, David wrote:
>> Not sure if `learn` is the right topic or not to post this..
>>
>> I've been going through Bob Nystrom's "Crafting Interpreters" for a bit of fun and over the weekend put together a toy allocator in D - free and gc not yet done. It's single threaded and unsurprisingly faster than malloc for small objects up to 512 bytes, and about the same for larger objects (simulated with a linear distribution of size requests).
>>
>> [...]
>
> Have you looked at the "experimental" allocator(s) in D?
>
> std.experimental.allocator

Hi - good question, and yes I did. My first cut using the mmap_allocator was slower - I presume because I should really allocate a good chunk of virtual memory up front and manage the pages myself.

So some questions I'm asking myself -

What are people's heuristics for switching methods, say from some sort of pool to another approach - e.g. a linked list of pages? And at the page level, does a bitmap allocator make more sense?
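To make the bitmap option concrete: the core of a page-level bitmap allocator is just a find-first-zero scan over one bit per page (a sketch with made-up sizes; `__builtin_ctzll` assumes GCC/Clang):

```c
#include <stdint.h>

/* One bit per 4 KiB page: 0 = free, 1 = in use. 256 pages = 32 bytes
 * of metadata, which is part of the appeal at this granularity. */
enum { NPAGES = 256 };
static uint64_t page_bits[NPAGES / 64];

/* returns a free page index, or -1 if all pages are taken */
static int page_alloc(void) {
    for (int w = 0; w < NPAGES / 64; w++) {
        if (page_bits[w] != UINT64_MAX) {
            int b = __builtin_ctzll(~page_bits[w]);  /* first clear bit */
            page_bits[w] |= 1ULL << b;
            return w * 64 + b;
        }
    }
    return -1;
}

static void page_free(int idx) {
    page_bits[idx / 64] &= ~(1ULL << (idx % 64));
}
```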

Knuth (it seems) was under the impression that about a third of memory is wasted to fragmentation - the so-called fifty-percent rule. That doesn't seem likely to me; does anyone actually measure these things?

How does one go about choosing a set of pool sizes?
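The simplest heuristic I know of is geometric classes - e.g. powers of two from 16 bytes up - which caps per-allocation internal fragmentation at just under 50% (real allocators usually insert intermediate steps like 48, 80, 96 to tighten that). A sketch of the rounding:

```c
#include <stddef.h>

/* Round a request up to its power-of-two size class, starting at 16.
 * Worst case waste: a (2^k + 1)-byte request lands in the 2^(k+1)
 * class, losing just under half the block. */
static size_t size_class(size_t n) {
    size_t c = 16;
    while (c < n) c <<= 1;
    return c;
}
```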

Given that virtual pages are not necessarily physically contiguous, are some of the considerations that inspired the buddy system no longer relevant?
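For reference, the property the buddy system buys you: with power-of-two block sizes, a block's buddy is found by XORing its offset with its size, making coalescing O(1) - the question is whether that still matters when the MMU already hides physical discontiguity. A tiny sketch:

```c
#include <stddef.h>

/* Buddy of the block at `offset` (relative to the arena base) with
 * power-of-two `size`: flip the bit corresponding to the size. */
static size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}
```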

I should also say I'm aiming for a single-threaded, longish-running process (one or more working days) that I anticipate could occasionally use a good chunk of memory - say 100 GB or so - but I'd like to be more efficient than Java in terms of memory reuse. Past experience was that I had to reboot my workstation a few times a day as it did something really horrible (I don't know what).


Are there any GC frameworks that someone has shared in D?