September 23, 2013
On Monday, 23 September 2013 at 13:58:42 UTC, Andrei Alexandrescu wrote:
> If the author forgets and hardcodes a library to use malloc(), I have no way around that.
>

We should consider the lib writer use new and/or automatic allocation like for closures. In practice, this is what will happen.

> new will still be used to tap into the global shared GC. std.allocator will provide other means of allocating memory.
>

That uterly suck and we can do better using type informations.

> At this point collect() is only implemented by the global GC. It is possible I'll drop it from the final design. However, it's also possible that collect() will be properly defined as "collect all objects allocated within this particular allocator that are not referred from any objects also allocated within this allocator". I think that's a useful definition.
>

I don't think collect is usefull with no type information (not even if the data may contain pointers).

> The D language has no idea what a range is. The notion is completely defined in std.range.
>

foreach has some idea of what a range is.
September 23, 2013
On 9/23/13 8:32 AM, Manu wrote:
> On 23 September 2013 23:58, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org <mailto:SeeWebsiteForEmail@erdani.org>>
> wrote:
> This is a good first step though, I'm happy to discuss this, but I think
> discussion about the practical application may also reveal design
> details at this level.

Absolutely. This is refreshing since I've gone through the cycle of "types must be integrated into allocators and with the global GC" ... -> ... "untyped allocators can actually be extricated" once. It is now great to revisit the assumptions again.

One matter I'm rather surprised didn't come up (I assumed everyone would be quite curious about it) is that the allocator does not store the allocated size, or at least does not allow it to be discovered easily. This is a definitely appropriate topic for the design at hand.

The malloc-style APIs go like:

void* malloc(size_t size);
void free(void*);

Note that the user doesn't pass the size to free(), which means the allocator is burdened with inferring the size of the block from the pointer efficiently. Given that most allocators make crucial strategic choices depending on the size requested, this API shortcoming is a bane of allocator writers everywhere, and a source of inefficiency in virtually all contemporary malloc-style allocators.

This is most ironic since there is next to nothing that user code can do with a pointer without knowing the extent of memory addressed by it. (A notable exception is zero-terminated strings, another questionable design decision.) It all harkens back to Walter's claim of "C's biggest mistake" (with which I agree) of not defining a type for a bounded memory region.

Upon studying a few extant allocators and D's lore, I decided that in D things have evolved sufficiently to have the user pass the size upon deallocation as well:

void[] allocate(size_t size);
void deallocate(void[] buffer);

This is because the size of D objects is naturally known: classes have it in the classinfo, slices store it, and the few cases of using bald pointers for allocation are irrelevant and unrecommended.

This all makes memory allocation in D a problem fundamentally simpler than in C, which is quite an interesting turn :o).

> No, I certainly understand you mean 'them', but you lead to what I'm
> asking, how do these things get carried/passed around. Are they
> discreet, or will they invade argument lists everywhere?

Since there's a notion of "the default" allocator, there will be ways to push/pop user-defined allocators that temporarily (or permanently) replace the default allocator. This stage of the design is concerned with allowing users to define such user-defined allocators without having to implement them from scratch.

> Are they free
> to flow in/out of libraries in a natural way?
> These patterns are what will define the system as I see it.
> Perhaps more importantly, where do these allocators get their memory
> themselves (if they're not a bottom-level allocator)? Global override
> perhaps, or should a memory source always be explicitly provided to a
> non-bottom-level allocator?

There are a few obvious bottom allocators, such as malloc, new, sbrk, or Windows heap. All other allocators will compose by parameterizing on their parent allocator.

> Sure, but the common case is that the author will almost certainly use
> keyword 'new'. How can I affect that as a 3rd party?
> This would require me overriding the global allocator somehow... which
> you touched on earlier.

The way I'm thinking of this is to install/uninstall user-defined allocators that will satisfy calls to new. Since not all allocators support tracing, some would require the user to deallocate stuff manually.

>     The library wouldn't need to worry as there would be the notion of a
>     default allocator (probably backed by the existing GC).
>
> Right. So it's looking like like the ability to override the global
> allocator is a critical requirement.

Again, this is difficult to handle in the same conversation as "should we pass size upon deallocation". Yes, we need road laws, but it's hard to talk about those and engine lubrication in the same breath. For me,
"critical" right now is to assess whether the untyped API misses an important category of allocators, what safety level it has (that's why "expand" is so different from "realloc"!) etc.

>         And does that mean that applications+libraries are required to
>         ALWAYS
>         allocate through given allocator objects?
>
>
>     Yes, they should.
>
>
> Then we make keyword 'new' redundant?

Probably not. Typed allocators will need to integrate with (and occasionally replace) the global shared GC.

>     The problem is that installing an allocator does not get to define
>     what a pointer is and what a reference is.
>
> Why not?

Because that requires a language change. I'm not sure you realize but you move the goalposts all the time. We were talking within the context of libraries and installing allocators dynamically and all of a sudden you get to change what the compiler thinks a pointer is.

> A pointer has a type, like anything else. An ARC pointer can
> theoretically have the compiler insert ARC magic.
> That does imply though that the allocator affects the type, which I
> don't like... I'll think on it.

I talked Walter's ear off a while ago at an ACCU conference about the notion that reference counting could be a switch passed to the compiler. Recently he's authored a DIP about the compiler inserting refcounting primitives in the generated code. Unfortunately I think the DIP is awfully incomplete and superficial (to the extent it would bring more damage to the language if any implementation effort would be invested in it at this point).

Can it be done? Probably. But it would be an absolutely major effort.

>     These are notions hardwired into the language, so the notion of
>     turning a switch and replacing the global GC with a reference
>     counting scheme is impossible at the level of a library API.
>
>
> Indeed it is. So is this API being built upon an incomplete foundation?

The API is quite good at what it puports to do.

>     What I do hope to get to is to have allocators define their own
>     pointers and reference types. User code that uses those will be
>     guaranteed certain allocation behaviors.
>
>
> Interesting, will this mangle the pointer type, or the object type being
> pointed to? The latter is obviously not desirable. Does the former
> actually work in theory?

I don't think I understand what you mean. Honest, it seems to me you're confused about what you want, how it can be done, and what moving pieces are involved.

One example is Rust, which defines several pointer types to implement its scoping and borrowing semantics. I think it's not easy to achieve its purported semantics with fewer types, and time will tell whether programmers will put up with the complexity for the sake of the corresponding benefits.

>     At this point collect() is only implemented by the global GC. It is
>     possible I'll drop it from the final design. However, it's also
>     possible that collect() will be properly defined as "collect all
>     objects allocated within this particular allocator that are not
>     referred from any objects also allocated within this allocator". I
>     think that's a useful definition.
>
> Perhaps. I'm not sure how this situation arises though. Unless you've
> managed to implement your own GC inside an allocator.

That should be doable within D today.

> I think a critical detail to keep in mind, is that (I suspect) people
> simply won't use it if it doesn't interface with keyword 'new'.

I think this is debatable. For one, languages such as Java and C++ still have built-in "new" but quite ubiquitously unrecommend their usage in user code. Far as I can tell that's been a successful campaign.

For two, there are allocations that don't even entail calls to new, such as array concatenation.

> It's extremely common that I want to enforce that a library exist
> entirely within a designated heap. It can't fall back to the global GC.
>
> I work on platforms where memory is not unified. Different resources
> need to go into different heaps.

The proposed API would make that entirely possible. You seem to care only with the appendix "and mind you I only want to use new for all of those!" which is a bit out there. But nevertheless good feedback.


Andrei
September 23, 2013
On Monday, 23 September 2013 at 17:01:47 UTC, deadalnix wrote:
> What you call safe really isn't. Allocate something on the GC, store a pointer on a custom allocated location, collect, enjoy the memory corruption. All operation are safe according to your proposal. Allocation can only be safe if the GRAND MASTER GC is aware of it.

I don't see why the GC can't be allocator-agnostic. In principle, it should be possible to register arbitrary allocators with the GC (possibly by storing a pointer to the allocator in each GC-managed block of memory). The GC would have a default allocator which would be used with the "new" operator, and something like the aforementioned create!() syntax could be used for custom allocators:

SomeClass c1 = new SomeClass("arg");
//Managed by the GC
SomeClass c2 = gcCreate!SomeClass(someAllocator, "arg");
//Another possible syntax (much prettier!):
SomeClass c3 = someAllocator.new SomeClass("arg");

This could also integrate with DIP46; instead of pushing and popping a GC object, one would push and pop the GC's default allocator (for the current thread, of course).

There would be a bit of overhead, but that seems to be unavoidable in a sufficiently generic design, and I doubt that it would cut performance to an unacceptable level, especially because really performance-critical code would avoid the GC anyway.

> You proposal allocate shared memory. This is common in C/C++ world as memory is shared by default, but shouldn't be in D. It is highly desirable to allocate with different methods for different type qualifier. How does your design adapt to that ?

If the aforementioned thread-local GC reference is implemented, it should be possible to provide decent thread-local allocation. I'm not sure exactly how to design that, but it seems like thread-local GC allocation would require some work at the GC level; a global GC won't necessarily play nicely with thread-local allocators.

For non-GC thread-local allocations, I'd imagine that you could just create a thread-local allocator.


September 23, 2013
On 9/23/2013 3:17 AM, Manu wrote:
> Sure, true. But little things like separate it into its own lib, so you can
> easily link a different one, or not link one at all.

To link in a different one, put the different one on the link command before libphobos2.a. The different one will override the one in the library.

> Also, what about language support?

How it's done is deliberately not part of the language.

> How much language support is there currently?

Zero.

> I don't think I could currently write the ref-counting GC that I'd like without
> extensive language support.
> inc/dec ref calls would need to be inserted all over the place by he compiler,
> and optimiser would need to know how to remove redundant inc/dec sequences.

A ref counting system is something entirely different than just plugging in another GC.
September 23, 2013
On 2013-09-23 17:47, Manu wrote:

> I like operator new. It goes blue, and the statement isn't riddled with
> exclamation marks.

For Ruby, many editors and IDE's highlight standard built in functions as keywords. Like "new", "require", "attr_accessor" (and friends) to mention a few.

> Imagine if new was a template (it would have to be a template).
> Allocating a template would require nested template syntax every time,
> which is pretty ugly.
>
>    X x = new!(X!arg(args...)); // ewoo, paren spam...

Why not:

X x = new!X(args);

-- 
/Jacob Carlborg
September 23, 2013
On 09/23/2013 07:16 PM, Andrei Alexandrescu wrote:
> On 9/23/13 10:01 AM, deadalnix wrote:
>
>> Finally, we got to decide how these basics block are used to form typed allocators, and interact with language constructs.
>
> Sure. Again: I describe the Otto cycle and you discuss how to paint the road.
IMO there is a problem with this metaphor. If the road is painted metallic the Otto motor (cycle) needs cooling, otherwise ... boom. So I think the overall picture has to be kept in sight.

September 23, 2013
On 2013-09-23 16:05, Manu wrote:

> Another situation that I've encountered on a few occasions...
> Imagine I declare a particular allocator for my application, when
> dealing with 3rd party libs, I expect it to allocate within my
> application's heap. Seems like a situation where I might set a global
> allocator...
> Mr 3rd party library also has its own allocators though, things like
> pools or groups which it uses explicitly within it's code.
> I don't actually want to override these allocators, since they're
> effectively a higher-level construct, but I *do* want to override these
> allocator's memory source. Ie, they should allocate their pools from my
> application's heap, rather than the default heap.
> So in this way, it's important to be able to set overrides at multiple
> levels.

You might be able to combine two different allocators. Basically create a new allocator that somehow extends the third party allocator but allocates in your own heap instead.

-- 
/Jacob Carlborg
September 23, 2013
On 2013-09-23 17:03, Andrei Alexandrescu wrote:

> I'd think new already is translated into a library call. Walter?

Yes, "new" is lowered to a couple different runtime functions. Here's a list:

http://wiki.dlang.org/Runtime_Hooks

-- 
/Jacob Carlborg
September 23, 2013
On 2013-09-23 20:59, Jacob Carlborg wrote:

> http://wiki.dlang.org/Runtime_Hooks

Search for "new".

-- 
/Jacob Carlborg
September 23, 2013
On 9/23/13 11:24 AM, Walter Bright wrote:
> On 9/23/2013 3:17 AM, Manu wrote:
>> Sure, true. But little things like separate it into its own lib, so
>> you can
>> easily link a different one, or not link one at all.
>
> To link in a different one, put the different one on the link command
> before libphobos2.a. The different one will override the one in the
> library.
>
>> Also, what about language support?
>
> How it's done is deliberately not part of the language.
>
>> How much language support is there currently?
>
> Zero.

I think there's at least indirect support in e.g. generating typeinfo objects appropriately. Many of the components of those typeinfos are expressly defined for helping a tracing collector.

Andrei