September 24, 2013
On Tuesday, 24 September 2013 at 17:38:34 UTC, Dan Schatzberg wrote:
> What is your objective though? Aren't you trying to define a hierarchy of allocators where more specific allocators can be composed from general ones? In which case what is the concern with including locality at the base level? It seems to be one characteristic of memory that programmers might be concerned with and rather trivially you can compose a non-locality aware allocator from a locality aware allocator.

I'll elaborate on this point a bit just to make sure I'm clear. I understand your initial post here to be asking if the allocator interface you designed is comprehensive enough that all other allocators (within reason) could be constructed from it.

Whether or not it is inefficient to pass alignment, or locality domain arguments to an allocate call seems orthogonal to whether or not one can construct an allocator which does not take such arguments from one that does. As I'm sure your original design is intended for, one can imagine allocators which have a fixed size (constructed from the original allocator you proposed). The allocate path can then very efficiently hand off memory from a free-list or some other efficient structure. One can make analogous allocators that do not allow for dynamic alignment or NUMA locality.

The only reasonable argument against seems to be whether or not the flexibility is worth the semantic overhead of a larger hierarchy which is a completely reasonable and subjective argument to be had.
September 24, 2013

On 24.09.2013 10:46, Dmitry Olshansky wrote:
>
>> * expand(b, minDelta, maxDelta) grows b's length by at least minDelta
>> (and on a best-effort basis by at least maxDelta) and returns true, or
>> does nothing and returns false. In most allocators this should be @safe.
>> (One key insight is that expand() can be made @safe whereas shrink() or
>> realloc() are most often not; such mini-epiphanies are very exciting
>> because they put the design on a beam guide with few principles and many
>> consequences.) The precondition is that b is null or has been previously
>> returned by allocate(). This method is optional.
>
> Great to see this primitive. Classic malloc-ators are so lame...
> (+ WinAPI Heaps fits here)

expand is nice, but I don't like minDelta and maxDelta as arguments. On a shared allocator this can lead to undesired side-effects, i.e. when this function is called concurrently on the same block by two threads, the net result will be the sum of the deltas of the two threads, while later synchronization on the memory block might only allow one thread to actually use the extended memory area. This can happen with gc_extend in the current GC when appending to a shared array.

I'd prefer the function to actually pass the desired sizes:

void[] expand(void[] b, size_t minSize, size_t maxSize);
September 24, 2013
Hi,

I mostly just lurk around here, but occasionally I just can't resist putting in my two cents.  I really want to see D replace C++ for AAA games (my industry) and allocators are really critical to that.  I think there's an elephant here that most of the posts have been dancing around.

In C++ when you roll your own allocators (whether STL allocators or some other interface) there is basically, only one set of *semantics* to worry about - the classic C++/C pattern of new/delete or malloc/free.  This is actually much more complicated in D where you have at least two different semantics:

1) C++/C style
2) Tracing/Garbage Collection
3) Region allocators and other oddballs

Consequences:
1) These semantics aren't really interchangeable.  If, for instance, a library is implemented with one of them in mind, it can only be replaced by an allocator with the same semantics.

2) Tracing/Garbage Collection are required for whatever the default allocator is and replacement allocator must implement those semantics.

3) Its only possible to change (2) by hacking the runtime so that any language features that rely on the GC cause errors.

The design Andrei has presents seems to *allow* the creation of allocators with all of these semantics, but it doesn't seem to answer the following questions:

1) Can semantics be enforced?
2) Can allocators with different semantics be composed?
3) Can different semantics for the default allocator be implemented?
4) If so, how can I reconcile that with libraries that expect different default semantics?  A use case that I foresee for something like this would be banning the use of GC in low level engine code, but allowing it in higher level gameplay code.

5) Are any of these questions even relevant for this part of the design or will we have to wait for the rest of it to know the answers?

Thanks.



September 24, 2013
24-Sep-2013 22:28, Rainer Schuetze пишет:

> On 24.09.2013 10:46, Dmitry Olshansky wrote:
>>
>>> * expand(b, minDelta, maxDelta) grows b's length by at least minDelta
>>> (and on a best-effort basis by at least maxDelta) and returns true, or
>>> does nothing and returns false. In most allocators this should be @safe.
>>> (One key insight is that expand() can be made @safe whereas shrink() or
>>> realloc() are most often not; such mini-epiphanies are very exciting
>>> because they put the design on a beam guide with few principles and many
>>> consequences.) The precondition is that b is null or has been previously
>>> returned by allocate(). This method is optional.
>>
>> Great to see this primitive. Classic malloc-ators are so lame...
>> (+ WinAPI Heaps fits here)
>
> expand is nice, but I don't like minDelta and maxDelta as arguments. On
> a shared allocator this can lead to undesired side-effects, i.e. when
> this function is called concurrently on the same block by two threads,
> the net result will be the sum of the deltas of the two threads, while
> later synchronization on the memory block might only allow one thread to
> actually use the extended memory area. This can happen with gc_extend in
> the current GC when appending to a shared array.
>
> I'd prefer the function to actually pass the desired sizes:
>
> void[] expand(void[] b, size_t minSize, size_t maxSize);

+1


-- 
Dmitry Olshansky
September 24, 2013
On 9/24/13 11:02 AM, Peter Alexander wrote:
> On Tuesday, 24 September 2013 at 17:02:18 UTC, Andrei Alexandrescu wrote:
>> On 9/24/13 9:58 AM, Peter Alexander wrote:
>>> On Tuesday, 24 September 2013 at 15:25:11 UTC, Andrei Alexandrescu
>>> wrote:
>>>>> What are they paying exactly? An extra arg to allocate that can
>>>>> probably
>>>>> be defaulted?
>>>>>  void[] allocate(size_t bytes, size_t align = this.alignment) shared;
>>>>
>>>> For allocating relatively small objects (say up to 32K), we're looking
>>>> at tens of cycles, no more. An extra argument needs to be passed
>>>> around and more importantly looked at and acted upon. At this level
>>>> it's a serious dent in the time budget.
>>>
>>> The cost of a few cycles really doesn't matter for memory allocation...
>>> If you are really allocating memory so frequently that those few extra
>>> cycles matter then you are probably going to be memory bound anyway.
>>
>> It does. I'm not even going to argue this.
>
> Sorry but I find this insulting.

Apologies, you're right. I was bummed straight after having sent that all-too-glib message.

> Myself and Manu, both professional and
> senior game developers with a lot of experience in performance are both
> arguing against you. I'm not saying this makes us automatically right,
> but I think it's rude to dismiss our concerns as not even worthy of
> discussion.

This is not an argument "against me" - I'm looking at ways to address alignment concerns.

There's a larger issue at work: certain special allocator APIs are sensible but are unneeded for composition. The focus here is to provide enough primitives that allow composing larger allocators out of smaller components. A top-level specialized allocator may implement some or all of the discussed API, plus a bunch of other specialized functions.

>>> I think this is a situation where you need to justify yourself with
>>> something concrete. Can you provide an example of some code whose
>>> performance is significantly impacted by the addition of an alignment
>>> parameter? It has to be "real code" that does something useful, not just
>>> a loop the continually calls allocate.
>>
>> Strings.
>
> Strings what? Just allocating lots of small strings?
>
> Ok, I've put together a benchmark of the simplest allocator I can think
> of (pointer bump) doing *nothing* but allocating 12 bytes at a time and
> copying a pre-defined string into the allocated memory:
> http://dpaste.dzfl.pl/59636d82
>
> On my machine, the difference between the version with alignment and the
> version without 1%. I tried changing the allocator to a class so that
> the allocation was virtual and not inlined, and the difference was still
> only ~2% (Yes, I verified in the generated code that nothing was being
> omitted).
>
> In a real scenario, much more will be going on outside the allocator,
> making the overhead much less than 1%.
>
> Please let me know if you take issue with the benchmark. I wrote this
> quickly so hopefully I have not made any mistakes.

There is really no need for a benchmark, for at least two reasons. First, people _will_ do cycle counting, which _will_ influence the bottom line. I work with a dozen of such. I understand it doesn't make a difference for you, Manu, a lot of game developers, and a bunch of others, but I know for a fact it _does_ make a difference for an important category of program(mer)?s.

Second, there's no need for a defaulted argument; the aligned allocation can be an optional overload of the one-argument function. I'm looking into ways to compose with that overload.


Andrei

September 24, 2013
On 9/24/13 11:28 AM, Rainer Schuetze wrote:
>
>
> On 24.09.2013 10:46, Dmitry Olshansky wrote:
>>
>>> * expand(b, minDelta, maxDelta) grows b's length by at least minDelta
>>> (and on a best-effort basis by at least maxDelta) and returns true, or
>>> does nothing and returns false. In most allocators this should be @safe.
>>> (One key insight is that expand() can be made @safe whereas shrink() or
>>> realloc() are most often not; such mini-epiphanies are very exciting
>>> because they put the design on a beam guide with few principles and many
>>> consequences.) The precondition is that b is null or has been previously
>>> returned by allocate(). This method is optional.
>>
>> Great to see this primitive. Classic malloc-ators are so lame...
>> (+ WinAPI Heaps fits here)
>
> expand is nice, but I don't like minDelta and maxDelta as arguments. On
> a shared allocator this can lead to undesired side-effects, i.e. when
> this function is called concurrently on the same block by two threads,
> the net result will be the sum of the deltas of the two threads, while
> later synchronization on the memory block might only allow one thread to
> actually use the extended memory area. This can happen with gc_extend in
> the current GC when appending to a shared array.
>
> I'd prefer the function to actually pass the desired sizes:
>
> void[] expand(void[] b, size_t minSize, size_t maxSize);

I don't understand this argument. These parameters can be trivially computed from the others, after which locking etc. proceeds just the same.

FWIW I chose the "delta"-based signature because with it the precondition is:

assert(minDelta <= maxDelta);

In fact behavior can be easily defined at no cost even if this precondition is not satisfied. With the "absolute"-based signature the precondition is:

assert(minSize <= maxSize && b.length <= minSize);

which causes confusion - people will pass small sizes to expand() expecting it to contract, something that expand() can't support as a matter of principle (safety!!!).


Andrei

September 24, 2013
24-Sep-2013 20:48, Andrei Alexandrescu пишет:
> On 9/24/13 9:39 AM, Brad Anderson wrote:
>> Somewhat related: http://probablydance.com/2013/05/13/4gb-per-vector/
>
> Great insight, long scroll. Please trim quoting appropriately.
>
> Andrei
>

In fact one may have both deque (just start in the middle) and array that never needs to reallocate as long as paying the cost of reserving a large address space chunk is an option.

More then that the resulting deque has insanely fast index - it's exactly the same as in array (it is an array with plenty of space to grow).

-- 
Dmitry Olshansky
September 24, 2013
24-Sep-2013 23:36, Andrei Alexandrescu пишет:
> On 9/24/13 11:28 AM, Rainer Schuetze wrote:
>>
[snip]

> which causes confusion - people will pass small sizes to expand()
> expecting it to contract, something that expand() can't support as a
> matter of principle (safety!!!).
>

Showstopper. It has to support contraction.

>
> Andrei
>


-- 
Dmitry Olshansky
September 24, 2013
On 9/24/13 12:47 PM, Dmitry Olshansky wrote:
> 24-Sep-2013 23:36, Andrei Alexandrescu пишет:
>> On 9/24/13 11:28 AM, Rainer Schuetze wrote:
>>>
> [snip]
>
>> which causes confusion - people will pass small sizes to expand()
>> expecting it to contract, something that expand() can't support as a
>> matter of principle (safety!!!).
>>
>
> Showstopper. It has to support contraction.

reallocate() is allowed contract. Show restarted :o).

Andrei


September 24, 2013
On 9/24/13 12:49 PM, Andrei Alexandrescu wrote:
> On 9/24/13 12:47 PM, Dmitry Olshansky wrote:
>> 24-Sep-2013 23:36, Andrei Alexandrescu пишет:
>>> On 9/24/13 11:28 AM, Rainer Schuetze wrote:
>>>>
>> [snip]
>>
>>> which causes confusion - people will pass small sizes to expand()
>>> expecting it to contract, something that expand() can't support as a
>>> matter of principle (safety!!!).
>>>
>>
>> Showstopper. It has to support contraction.
>
> reallocate() is allowed contract. Show restarted :o).

s/contract/to contract/

Andrei