September 23, 2013
On Monday, 23 September 2013 at 15:45:25 UTC, Andrei Alexandrescu wrote:
> On 9/23/13 7:22 AM, ponce wrote:
>> Great news! It looks like a big improvement on awkward C++ allocators.
>>
>> (For what it's worth I have a working implementation of aligned
>> malloc/free/realloc here
>> https://github.com/p0nce/gfm/blob/master/gfm/core/memory.d, which can be
>> the basis for an allocator layered upon another)
>
> I gave this a read, nice work.
>
> One question: what circumstances require run-time alignment values, and what values would those be? I'm currently under the assumption that alignments are known during compilation.
>
>
> Thanks,
>
> Andrei

I don't know of a use case for run-time alignment values.
September 23, 2013
On 23 September 2013 23:58, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 9/22/13 9:03 PM, Manu wrote:
>
>> On 23 September 2013 12:28, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>     My design makes it very easy to experiment by allowing one to define
>>     complex allocators out of a few simple building blocks. It is not a
>>     general-purpose allocator, but it allows one to define any number of
>>     such.
>>
>> Oh okay, so this isn't really intended as a system then, so much a suggested API?
>>
>
> For some definition of "system" and "API", yes :o).
>
>
>> That makes almost all my questions redundant. I'm interested in the
>> system, not the API of a single allocator (although your API looks fine
>> to me).
>> I already have allocators I use in my own code. Naturally, they don't
>> inter-operate with anything, and that's what I thought std.allocator was
>> meant to address.
>>
>
> Great. Do you have a couple of nontrivial allocators (heap, buddy system etc) that could be adapted to the described API?
>

Err, not really actually. When I use custom allocators, it's for performance, which basically implies that it IS a trivial allocator :) The common ones I use are: stack-based mark&release, circular buffers, pools, pool groups (collection of different sized pools)... that might be it actually. Very simple tools for different purposes.
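
A minimal sketch of the stack-based mark&release idea mentioned above, for readers unfamiliar with it. The names (StackAllocator, mark, release) and the 16-byte offset rounding are assumptions made for illustration; they are not taken from the std.allocator proposal or from Manu's code.

```d
// Hypothetical mark & release stack allocator over a fixed inline buffer.
struct StackAllocator(size_t capacity)
{
    private ubyte[capacity] buffer;
    private size_t top;              // offset of the first free byte

    // Bump-allocate n bytes; returns null when the buffer is exhausted.
    void[] allocate(size_t n)
    {
        enum size_t alignment = 16;
        // Round the offset (not the address) up to a multiple of 16.
        immutable start = (top + alignment - 1) & ~(alignment - 1);
        if (start > capacity || n > capacity - start) return null;
        top = start + n;
        return buffer[start .. top];
    }

    // Remember the current top of the stack.
    size_t mark() const { return top; }

    // Free everything allocated since the matching mark() in one step.
    void release(size_t m)
    {
        assert(m <= top);
        top = m;
    }
}

unittest
{
    StackAllocator!1024 stack;
    auto m = stack.mark();
    auto a = stack.allocate(100);
    auto b = stack.allocate(200);
    assert(a.length == 100 && b.length == 200);
    stack.release(m);                // both blocks are gone at once
    assert(stack.mark() == m);
}
```

Individual blocks are never freed on their own, which is what keeps this kind of allocator trivial and fast.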

>>     The proposed design makes it easy to create allocator objects. How
>>     they are used and combined is left to the application.
>>
>> Is that the intended limit of std.allocator's responsibility, or will patterns come later?
>>
>
> Some higher level design will come later. I'm not sure whether or not you'll find it satisfying, for reasons I'll expand on below.
>
>
>> Leaving the usage up to the application means we've gained nothing.
>> I already have more than enough allocators which I use throughout my
>> code. The problem is that they don't inter-operate, and certainly not
>> with foreign code/libraries.
>> This is what I hoped std.allocator would address.
>>
>
> Again, if you already have many allocators, please let me know if you can share some.
>
> std.allocator will prescribe a standard for defining allocators, with which the rest of std will work, same as std.range prescribes a standard for defining ranges, with which std.algorithm, std.format, and other modules work. Clearly one could come back with "but I already have my own ranges that use first/done/next instead of front/empty/popFront, so I'm not sure what we're gaining here".
>

No, it's just that I'm saying std.allocator needs to do a lot more than define a contract before I can start to consider if it solves my problems. This is a good first step though, I'm happy to discuss this, but I think discussion about the practical application may also reveal design details at this level.

It's like you say, I can rename my allocator's methods to suit an agreed standard, that'll take me 2 minutes, but it's how the rest of the universe interacts with that API that matters, and if it effectively solves my problems.

>>     An allocator instance is a variable like any other. So you use the
>>     classic techniques (shared globals, thread-local globals, passing
>>     around as parameter) for using the same allocator object from
>>     multiple places.
>>
>>
>> Okay, that's fine... but this sort of manual management implies that I'm using it explicitly. That's where it all falls down for me.
>>
>
> I think a disconnect here is that you think "it" where I think "them". It's natural for an application to use one allocator that's not provided by the standard library, and it's often the case that an application defines and uses _several_ allocators for different parts of it. Then the natural question arises, how to deal with these allocators, pass them around, etc. etc.


No, I certainly understand you mean 'them', but you lead to what I'm
asking, how do these things get carried/passed around. Are they discreet,
or will they invade argument lists everywhere? Are they free to flow in/out
of libraries in a natural way?
These patterns are what will define the system as I see it.
Perhaps more importantly, where do these allocators get their memory
themselves (if they're not a bottom-level allocator)? Global override
perhaps, or should a memory source always be explicitly provided to a
non-bottom-level allocator?

>> Eg, I want to use a library, its allocation patterns are incompatible
>> with my application; I need to provide it with an allocator.
>> What now? Is every library responsible for presenting the user with a
>> mechanism for providing allocators? What if the author forgets? (a
>> problem I've frequently had to chase up in the past when dealing with
>> 3rd party libraries)
>>
>
> If the author forgets and hardcodes a library to use malloc(), I have no
> way around that.


Sure, but the common case is that the author will almost certainly use
keyword 'new'. How can I affect that as a 3rd party?
This would require me overriding the global allocator somehow... which you
touched on earlier.

>> Once a library is designed to expect a user to supply an allocator, what
>> happens if the user doesn't? Fall-back logic/boilerplate exists in every library I guess...
>>
>
> The library wouldn't need to worry as there would be the notion of a default allocator (probably backed by the existing GC).


Right. So it's looking like the ability to override the global allocator is a critical requirement.

>> And does that mean that applications+libraries are required to ALWAYS
>> allocate through given allocator objects?
>>
>
> Yes, they should.


Then we make keyword 'new' redundant?

>> That effectively makes the new keyword redundant.
>>
>
> new will still be used to tap into the global shared GC. std.allocator will provide other means of allocating memory.


I think the system will fail here. People will use 'new', simply because
it's a keyword. Once that's boxed in a library, I will no longer be able to
affect that inconsiderate behaviour from my application.
Again, I think this signals that a global override is necessary.

>> And what about the GC?
>>
>
> The current global GC is unaffected for the time being.
>
>
>> I can't really consider std.allocator until it presents some usage
>> patterns.
>>
>
> Then you'd need to wait a little bit.
>

Okay.

>>         It wasn't clear to me from your demonstration, but 'collect()' implies
>>         that GC becomes allocator-aware; how does that work?
>>
>>
>>     No, each allocator has its own means of dealing with memory. One
>>     could define a tracing allocator independent of the global GC.
>>
>>
>> I'm not sure what this means. Other than I gather that the GC and allocators are fundamentally separate?
>>
>
> Yes, they'd be distinct. Imagine an allocator that requests 4 MB from the GC as NO_SCAN memory, and then does its own management inside that block. User-level code allocates and frees e.g. strings or whatever from that block, without the global GC intervening.


Yup, that's fine. But what if the GC isn't the bottom level? There's just
another allocator underneath.
What I'm saying is, the GC should *be* an allocator, not be a separate
entity.
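
The arrangement described in the quoted reply (one large NO_SCAN block requested from the GC, then managed privately) can be sketched roughly as follows. GCRegion and its methods are hypothetical names; only GC.malloc and GC.BlkAttr.NO_SCAN are real druntime calls. Because the block is not scanned, it must not hold the only reference to a GC-allocated object.

```d
import core.memory : GC;

// Hypothetical region that sub-allocates from one GC block.
struct GCRegion
{
    private void[] block;   // keeps the GC block alive while the region lives
    private size_t used;

    this(size_t bytes)
    {
        // NO_SCAN: the collector will not look for pointers in this block.
        auto p = GC.malloc(bytes, GC.BlkAttr.NO_SCAN);
        block = p[0 .. bytes];
    }

    // Hand out the next n bytes, or null when the region is full.
    void[] allocate(size_t n)
    {
        if (n > block.length - used) return null;
        auto result = block[used .. used + n];
        used += n;
        return result;
    }

    // Recycle the whole region without involving the global GC.
    void deallocateAll() { used = 0; }
}

unittest
{
    auto region = GCRegion(4 * 1024 * 1024);
    auto s = cast(char[]) region.allocate(5);
    s[] = "hello";
    assert(s == "hello");
    region.deallocateAll();
}
```

In this sketch the memory source is hardwired to the GC; making that source a parameter is the "allocator underneath" question raised above.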

I want to eliminate the GC from my application. Ideally, in the future, it can be replaced with an ARC, which I have become convinced is the right choice for my work.

>> Is it possible to create a tracing allocator without language support?
>>
>
> I think it is possible.
>
>
>> Does the current language insert any runtime calls to support the GC?
>>
>
> Aside from operator new, I don't think so.


Okay, so a flexible lowering of 'new' is all we need for now?
It will certainly need substantially more language support for ARC.

>> I want a ref-counting GC for instance to replace the existing GC, but
>> it's impossible to implement one of them nicely without support from the language, to insert implicit inc/dec ref calls all over the place, and to optimise away redundant inc/dec sequences.
>>
>
> Unfortunately that's a chimera I had to abandon, at least at this level.


And there's the part you said I'm not going to like? ;)

> The problem is that installing an allocator does not get to define what a
> pointer is and what a reference is.


Why not? A pointer has a type, like anything else. An ARC pointer can
theoretically have the compiler insert ARC magic.
That does imply though that the allocator affects the type, which I don't
like... I'll think on it.

> These are notions hardwired into the language, so the notion of turning a
> switch and replacing the global GC with a reference counting scheme is impossible at the level of a library API.
>

Indeed it is. So is this API being built upon an incomplete foundation? Is there something missing, and can it be added later, or will this design cement some details that might need changing in the future? (we all know potentially breaking changes like that will never actually happen)

> (As an aside, you still need tracing for collecting cycles in a transparent
> reference counting scheme, so it's not all roses.)
>

It's true, but it's possible to explicitly control all those factors. It remains deterministic.

> What I do hope to get to is to have allocators define their own pointers
> and reference types. User code that uses those will be guaranteed certain allocation behaviors.


Interesting, will this mangle the pointer type, or the object type being pointed to? The latter is obviously not desirable. Does the former actually work in theory?

>> I can easily define an allocator to use in my own code if it's entirely
>> up to me how I use it, but that completely defeats the purpose of this exercise.
>>
>
> It doesn't. As long as the standard prescribes ONE specific API for defining untyped allocators, if you define your own to satisfy that API, then you'll be able to use your allocator with e.g. std.container, just the same as defining your own range as std.range requires allows you to tap into std.algorithm.


I realise this. That's all fine.
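
For reference, the range analogy in the quoted reply works like this in practice: any type that provides empty, front and popFront is accepted by std.algorithm. Countdown is a made-up example type; isInputRange, filter, map and equal are real Phobos symbols.

```d
import std.algorithm : equal, filter, map;
import std.range : isInputRange;

// A user-defined range: counts down from `current` to 1.
struct Countdown
{
    int current;

    @property bool empty() const { return current <= 0; }
    @property int front() const { return current; }
    void popFront() { --current; }
}

// Satisfying the std.range contract is all that is required.
static assert(isInputRange!Countdown);

unittest
{
    // 6, 5, 4, 3, 2, 1 -> keep the even values -> square them
    auto evensSquared = Countdown(6)
        .filter!(x => x % 2 == 0)
        .map!(x => x * x);
    assert(evensSquared.equal([36, 16, 4]));
}
```

The claim in the quoted reply is that an allocator satisfying one agreed interface would plug into std.container in the same way.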

>> Until there are standard usage patterns, practices, and conventions that
>> ALL code follows, we have nothing. I was hoping to hear your thoughts about those details.
>>
>
>
>
>>         It's quite an additional burden of resources and management to manage
>>         the individual allocations with a range allocator above what is supposed
>>         to be a performance critical allocator to begin with.
>>
>>
>>     I don't understand this.
>>
>>
>> It's irrelevant here.
>> But fwiw, in relation to the prior point about block-freeing a range
>> allocation;
>>
>
> What is a "range allocation"?
>
>
>> there will be many *typed* allocations within these ranges,
>> but a typical range allocator doesn't keep track of the allocations within.
>>
>
> Do you mean s/range/region/?


Yes.

>> This seems like a common problem that may or may not want to be
>> addressed in std.allocator.
>> If the answer is simply "your range allocator should keep track of the
>> offsets of allocations, and their types", then fine. But that seems like
>> boilerplate that could be automated, or maybe there is a
>> different/separate system for such tracking?
>>
>
> If you meant region, then yes that's boilerplate that hopefully will be reasonably automated by std.allocator. (What I discussed so far predates that stage of the design.)
>
>>         C++'s design seems reasonable in some ways, but history has demonstrated
>>         that it's a total failure, which is almost never actually used (I've
>>         certainly never seen anyone use it).
>>
>>
>>     Agreed. I've seen some uses of it that quite fall within the notion
>>     of the proverbial exception that proves the rule.
>>
>>
>> I think the main fail of C++'s design is that it mangles the type.
>> I don't think a type should be defined by the way its memory is
>> allocated, especially since that could change from application to
>> application, or even call to call. For my money, that's the fundamental
>> flaw in C++'s design.
>>
>
> This is not a flaw as much as an engineering choice with advantages and disadvantages on the relative merits of which reasonable people may disagree.
>
> There are two _fundamental_ flaws of the C++ allocator design, in the sense that they are very difficult to argue in favor of and relatively easy to argue against:
>
> 1. Allocators are parameterized by type; instead, individual allocations should be parameterized by type.
>
> 2. There is no appropriate handling for allocators with state.
>
> The proposed std.allocator design deals with (2) with care, and will deal
> with (1) when it gets to typed allocators.


Fair enough. These are certainly more critical mistakes than the one I
raised.
I'm trying to remember the details of the practical failures I ran into
trying to use C++ allocators years ago.
Eventually, experience proved to us (myself and colleagues) that it wasn't
worth the mess, and we simply pursued a more direct solution. I've heard
similar stories from friends in other companies...
I need to try and recall the specific scenarios though, they might be
interesting :/ .. (going back the better part of a decade >_<)

>> Well as an atom, as you say, it seems like a good first step.
>> I can't see any obvious issues, although I don't think I quite understand the collect() function if it has no relation to the GC. What is its purpose?
>>
>
> At this point collect() is only implemented by the global GC. It is possible I'll drop it from the final design. However, it's also possible that collect() will be properly defined as "collect all objects allocated within this particular allocator that are not referred from any objects also allocated within this allocator". I think that's a useful definition.


Perhaps. I'm not sure how this situation arises though. Unless you've managed to implement your own GC inside an allocator.

>> If the idea is that you might implement some sort of tracking heap which
>> is able to perform a collect, how is that actually practical without language support?
>>
>
> Language support would be needed for things like scanning the stack and the globals. But one can gainfully use a heap with semantics as described just above, which requires no language support.
>
>
>> I had imagined going into this that, like the range interface which the
>> _language_ understands and interacts with, the allocator interface would be the same, ie, the language would understand this API and integrate it with 'new', and the GC... somehow.
>>
>
> The D language has no idea what a range is. The notion is completely defined in std.range.
>
>
>> If allocators are just an object like in C++ that people may or may not
>> use, I don't think it'll succeed as a system. I reckon it needs deep language integration to be truly useful.
>>
>
> I guess that's to be seen.


I think a critical detail to keep in mind, is that (I suspect) people
simply won't use it if it doesn't interface with keyword 'new'.
It also complicates generic code, and makes it more difficult to retrofit
an allocator where 'new' is already in use.

>> The key problem to solve is the friction between different libraries,
>> and different moments within a single application itself.
>> I feel almost like the 'current' allocator needs to be managed as some
>> sort of state-machine. Passing them manually down the callstack is no
>> good. And 'hard' binding objects to their allocators like C++ is no good
>> either.
>>
>
> I think it's understood that if a library chooses its own ways to allocate memory, there's no way around that.


Are we talking here about explicit choice for sourcing memory, or just that the library allocates through the default/GC?

This is the case where I like to distinguish a bottom-level allocator from
a high-level allocator.
A library probably wants to use some patterns for allocation of its
objects; these are high-level allocators, but where it sources its memory
from still needs to be overridable.
It's extremely common that I want to enforce that a library exist entirely
within a designated heap. It can't fall back to the global GC.

I work on platforms where memory is not unified. Different resources need
to go into different heaps.
It has happened on numerous occasions that we have been denied a useful
library simply because the author did not provide allocation hooks, and the
author was not responsive to requests... leading to my favourite scenario
of re-inventing yet another wheel (the story of my career).
It shouldn't be the case that the author has to manually account for the
possibility that someone might want to provide a heap for the library's
resources.

This is equally true for the filesystem (a gripe I haven't really raised in
D yet).

> The point of std.allocator is that it defines a common interface that user
> code can work with.
>
>
> Andrei
>
>


September 23, 2013
Am Tue, 24 Sep 2013 00:50:02 +1000
schrieb Manu <turkeyman@gmail.com>:

> [...] It also screws with generic code; X should be allocated with
> 'new', but Y should be allocated with yAllocator.alloc()?
> What if you decide that Z, which allocates with 'new', becomes a
> problem and you want to switch it into a pool? You now need to track
> down every instance of 'new Z', and change it.

That's not a problem with the proposed GC allocator. We should add
proper overloads to emplace and a "create" function and then all
code should use create!(Allocator, MyClass)(args) or
create!MyClass(allocatorInstance, args).

New should then be considered as deprecated and replaced by
create!(GCAllocator, Class)().
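
A rough sketch of what such a create function could look like for classes, following the create!MyClass(allocatorInstance, args) form. Everything here except std.conv.emplace and GC.malloc is an assumption: the GCAllocator stand-in, its allocate method and its "it" member are hypothetical, and destruction/finalization is not handled at all.

```d
import core.memory : GC;
import std.conv : emplace;

// Hypothetical stand-in for the proposed GC-backed allocator.
struct GCAllocator
{
    static GCAllocator it;

    void[] allocate(size_t n)
    {
        return GC.malloc(n)[0 .. n];
    }
}

// Construct a class instance of type T in memory obtained from `alloc`.
T create(T, Allocator, Args...)(ref Allocator alloc, auto ref Args args)
    if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void[] memory = alloc.allocate(size);
    assert(memory.length >= size, "allocation failed");
    // emplace runs T's constructor inside the supplied memory.
    return emplace!T(memory, args);
}

class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

unittest
{
    auto p = create!Point(GCAllocator.it, 3, 4);
    assert(p.x == 3 && p.y == 4);
}
```

Switching Point to a pool would then mean passing a different allocator object rather than hunting down every 'new Point'.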
September 23, 2013
On 9/23/13, Johannes Pfau <nospam@example.com> wrote:
> New should then be considered as deprecated and replaced by
> create!(GCAllocator, Class)().

What? That's never gonna happen. For one thing, it looks ugly as hell. And for another, it's going to break everything written in D.
September 23, 2013
Am 23.09.2013 16:16, schrieb Andrei Alexandrescu:
> On 9/23/13 7:07 AM, Manu wrote:
>> On 24 September 2013 00:04, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org <mailto:SeeWebsiteForEmail@erdani.org>>
>> wrote:
>>
>>     On 9/22/13 10:20 PM, Benjamin Thaut wrote:
>>
>>         Am 23.09.2013 01:49, schrieb Andrei Alexandrescu:
>>
>>             Hello,
>>
>>
>>             2. Untyped allocator - traffics exclusively in ubyte[].
>>
>>
>>         Why "ubyte[]" and not "void[]"?
>>
>>
>>     It's the logical choice at this level.
>>
>>     ubyte[] == "these are octets"
>>
>>
>> Isn't that what void[] also means?
>> Except it says "these are un-typed octets, ie, not a sequence of typed
>> integers in the range 0-255".
>
> I think void[] means "objects of unknown type".
>
> Andrei
>

I always understood void[] as a block of unknown data, which an allocator should return in my opinion. What's the point of "void" having a size in D if we still do it the C way? In my opinion ubyte[] is an array of values in the range of 0-255 like Manu says. Also if you get a ubyte[] you might get the impression that it is initialized to all zeros or something, which might not be true for all allocators (performance ...)
If you get a void[] you know all bets are off, and you have to check if the allocator preinitialized it or not.

Kind Regards
Benjamin Thaut
September 23, 2013
Am Mon, 23 Sep 2013 18:36:57 +0200
schrieb Andrej Mitrovic <andrej.mitrovich@gmail.com>:

> On 9/23/13, Johannes Pfau <nospam@example.com> wrote:
> > New should then be considered as deprecated and replaced by
> > create!(GCAllocator, Class)().
> 
> What? That's never gonna happen. For one thing, it looks ugly as hell. And for another, it's going to break everything written in D.

That's why I say "considered" and not really deprecated. But if you want to / have to write allocator aware code which can use the GC it's a nice solution:

auto list(Payload, Allocator = GCAllocator)()
{
    create!(Allocator) ...
}

Of course the API should be improved.
For example create could default to the GC allocator. Then it's
"auto a = new Class(...)" vs "auto a = create!Class(...)".

But IIRC when emplace was first implemented and class allocators were removed it was quite clear that new could be easily replaced by a template function and has no real place as a builtin anymore. It's just too late to remove it.
September 23, 2013
On Sunday, 22 September 2013 at 23:49:56 UTC, Andrei Alexandrescu wrote:
> Hello,
>

First of all, awesome !

Now the meeeeh part.

I really think the "it" thing is not good. I don't think it is desirable or necessary. We should get rid of it.

You can't deal with ubyte[] like that, that is incorrect in regard to - unimplemented - aliasing rules. Allocators should deal with void[].

What you call safe really isn't. Allocate something on the GC, store a pointer in a custom-allocated location, collect, enjoy the memory corruption. All operations are safe according to your proposal. Allocation can only be safe if the GRAND MASTER GC is aware of it.

Your proposal allocates shared memory. This is common in the C/C++ world, as memory is shared by default, but it shouldn't be in D. It is highly desirable to allocate with different methods for different type qualifiers. How does your design adapt to that?

Finally, we have to decide how these basic blocks are used to form typed allocators and interact with language constructs.

Sorry if this has been mentioned before; it is really late here and I can't read the whole thread, especially since Manu is on steroids :D
September 23, 2013
Am 23.09.2013 17:02, schrieb Adam D. Ruppe:
> We should really deprecate the new keyword. It'd break like all code
> ever, but with D's templates, there's no need for it, and when it is
> there, it is going to spark problems about replacing global allocators
> or the library allocators being second class citizens.
>

It's not possible to replace new with a library function in all cases. There are many examples where it breaks (I tried it, believe me). Just let me give a few:


class A
{
  class B
  {
  }

  B m_b;

  this()
  {
    // uses the surrounding context
    // not possible with library function
    m_b = new B();
  }
}

Also there is the problem that D does not have perfect forwarding. That means if "new" is a library function, it will cause additional copies / moves of structs, which is not the case with the builtin new. Next there are implicit conversions of literals. Those work just fine with the builtin new, but will break with a templated library new.

E.g.

class C
{
  this(ubyte a, ubyte b)
  {
  }
}

new C(5,3); // fine
New!C(5,3); // Error: can not implicitly convert int to ubyte

Unless of course you want to write a template metaprogram that does those implicit conversions. Although the problem would remain that you can't know if the value you get comes from a literal or from a function call, variable, etc.
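
The literal problem can be checked at compile time with a small sketch. New here is a hypothetical one-line wrapper, not the implementation linked further down; the point is only that the template deduces its arguments as int, so the "this literal fits in a ubyte" knowledge is lost before the constructor is reached.

```d
class C
{
    this(ubyte a, ubyte b) {}
}

// Hypothetical library replacement for the builtin new.
T New(T, Args...)(Args args)
{
    return new T(args);
}

// The builtin sees the literals 5 and 3 and knows they fit in a ubyte.
static assert( __traits(compiles, new C(5, 3)));

// The template deduces Args as (int, int); int does not convert to ubyte.
static assert(!__traits(compiles, New!C(5, 3)));

// Explicitly typed arguments go through, at the cost of noisier call sites.
static assert( __traits(compiles, New!C(cast(ubyte) 5, cast(ubyte) 3)));
```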

The next thing is arrays. While you get the nice array syntax with the builtin new, a library new just looks ugly.

new int[5];
vs.
NewArray!int(5); // Can't use New! here because it would conflict with the New! for creating objects / structs

I have been using a library-implemented new / delete for over a year, and it's annoying; it makes the language look ugly and feel strange to use. See my allocator and New / Delete implementation here:

https://github.com/Ingrater/druntime/blob/master/src/core/allocator.d

I would rather want new to be overloadable and have 2 sets of parameters

new (allocator)(arg1, arg2)

Where "allocator" would go to the overloaded version of new and "arg1" and "arg2" will be passed to the constructor.

I think it's a really bad idea to deprecate new.

Replacing Delete works just fine with a library function though.

Kind Regards
Benjamin Thaut
September 23, 2013
On 9/23/13 9:47 AM, Benjamin Thaut wrote:
> I always understood void[] as a block of unknown data, which an allocator
> should return in my opinion. What's the point of "void" having a size in
> D if we still do it the C way? In my opinion ubyte[] is an array of
> values in the range of 0-255 like Manu says. Also if you get a ubyte[]
> you might get the impression that it is initialized to all zeros or
> something, which might not be true for all allocators (performance ...)
> If you get a void[] you know all bets are off, and you have to check if
> the allocator preinitialized it or not.

You might be right. For example, ubyte[] allows arithmetic on its elements, which is something one shouldn't ever care to do in an allocation library.

I'm unclear on what void[] means starting from its semantics. That said, I replaced ubyte[] with void[] throughout my existing code and it works.
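
A few properties of void[] that bear on this choice can be checked directly. This is only an illustration of existing language behaviour as I understand it, not part of the proposed API.

```d
unittest
{
    int[] ints = [1, 2, 3];

    // Any T[] converts implicitly to void[]; the length is then in bytes.
    void[] raw = ints;
    assert(raw.length == 3 * int.sizeof);

    // ubyte[] elements take part in arithmetic, void[] elements do not.
    static assert( __traits(compiles, (ubyte[] b) => b[0] + 1));
    static assert(!__traits(compiles, (void[] v) => v[0] + 1));

    // The caller has to cast explicitly to get a typed view back.
    auto back = cast(int[]) raw;
    assert(back == [1, 2, 3]);
}
```

So a void[] result forces the caller to state what the bytes are before doing anything with them, which matches the "all bets are off" reading above.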


Andrei

September 23, 2013
On 9/23/13 10:01 AM, deadalnix wrote:
> On Sunday, 22 September 2013 at 23:49:56 UTC, Andrei Alexandrescu wrote:
>> Hello,
>>
>
> First of all, awesome !
>
> Now the meeeeh part.
>
> I really think the "it" thing is not good. I don't think it is desirable
> or necessary. We should get rid of it.

The singleton allocator "it" is instrumental for supporting stateful and stateless allocators with ease. It took me a few iterations to get to that, and I'm very pleased with the results. I think it would be difficult to improve on it without making other parts more difficult.
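
One way client code might exploit such a member, as a sketch only: MallocLike, FixedBuffer and Holder are hypothetical names, and the detection idiom is an assumption about how "it" could be used, not a description of the actual std.allocator code. A container stores a stateful allocator as a field, but refers to the shared "it" instance when the allocator is stateless.

```d
import core.stdc.stdlib : free, malloc;

// Stateless: every instance behaves the same, so one singleton suffices.
struct MallocLike
{
    static MallocLike it;

    void[] allocate(size_t n)
    {
        auto p = malloc(n);
        return p is null ? null : p[0 .. n];
    }

    void deallocate(void[] b) { free(b.ptr); }
}

// Stateful: each instance owns its own buffer, so there is no `it`.
struct FixedBuffer
{
    ubyte[256] store;
    size_t used;

    void[] allocate(size_t n)
    {
        if (n > store.length - used) return null;
        auto r = store[used .. used + n];
        used += n;
        return r;
    }
}

struct Holder(Allocator)
{
    static if (is(typeof(Allocator.it) == Allocator))
        alias alloc = Allocator.it;   // stateless: no per-object storage
    else
        Allocator alloc;              // stateful: embed an instance

    void[] grab(size_t n) { return alloc.allocate(n); }
}

unittest
{
    // The stateless variant carries no allocator state of its own.
    static assert(Holder!MallocLike.sizeof < Holder!FixedBuffer.sizeof);

    Holder!MallocLike h;
    auto b = h.grab(32);
    assert(b.length == 32);
    MallocLike.it.deallocate(b);
}
```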

> You can't deal with ubyte[] like that, that is incorrect in regard to -
> unimplemented - aliasing rules. Allocators should deal with void[].

I think I'll do that.

> What you call safe really isn't. Allocate something on the GC, store a
> pointer in a custom-allocated location, collect, enjoy the memory
> corruption.

I don't understand this. There are no pointers at this level, only untyped memory. The main chance of corruption is to access something after it's been freed.

> All operations are safe according to your proposal.

No, for most allocators freeing and reallocating are unsafe.

> Allocation can only be safe if the GRAND MASTER GC is aware of it.

I don't think so.

> Your proposal allocates shared memory.

No. It allocates unshared memory off of a shared or unshared allocator. The memory just allocated is as of yet unshared for the simple reason that only one thread has as of yet access to it.

> This is common in the C/C++ world, as
> memory is shared by default, but it shouldn't be in D. It is highly
> desirable to allocate with different methods for different type
> qualifiers. How does your design adapt to that?

The typed allocators will have distinct methods for shared and unshared allocation. Even at this level it's possible to design an allocator that allocates in different ways with a shared vs. an unshared allocator object (overload on shared). So far I've only designed non-shared allocators, or wrapped those that are already shared (malloc and new).

> Finally, we have to decide how these basic blocks are used to form typed
> allocators and interact with language constructs.

Sure. Again: I describe the Otto cycle and you discuss how to paint the road.


Andrei