May 23, 2012
On 23-05-2012 17:29, Don Clugston wrote:
> On 23/05/12 15:56, Alex Rønne Petersen wrote:
>> On 23-05-2012 15:17, Don Clugston wrote:
>>> On 23/05/12 05:22, Steven Schveighoffer wrote:
>>>> I have come across a dilemma.
>>>>
>>>> Alex Rønne Petersen has a pull request changing some things in the
>>>> GC to
>>>> pure. I think gc_collect() should be weak-pure, because it could
>>>> technically run on any memory allocation (which is already allowed in
>>>> pure functions), and it runs in a context that doesn't really affect
>>>> execution of the pure function.
>>>>
>>>> So I think it should be able to be run inside a strong pure function.
>>>
>>> I am almost certain it should not.
>>>
>>> And I think this is quite important. A strongly pure function should be
>>> considered to have its own gc, and should not be able to collect any
>>> memory it did not allocate itself.
>>>
>>> Memory allocation from a pure function might trigger a gc cycle, but it
>>> would ONLY look at the memory allocated inside that pure function.
>>
>> Implementing this on a per-function basis is not very realistic. Some
>> programs have hundreds (if not thousands) of pure functions.
>
> No, it's not realistic for every function. But it's extremely easy for
> others. In particular, if you have a pure function which has no
> reference parameters, you just need a pointer to the last point a
> strongly pure function was entered. This partitions the heap into two
> parts. Each can be gc'd independently.
>
> And, in the non-pure part, nothing is happening. Once you've done a GC
> there, you NEVER need to do it again.
>
>> Not to mention, we'd need some mechanism akin to critical regions to
>> figure out when a thread is in a pure function during stop-the-world.
>> Further, data allocated in a pure function f() in thread A must not be
>> touched by a collection triggered by an allocation inside f() in thread
>> B. It'd be a huge mess.
>
> Not so. It's impossible for anything outside of a strongly pure function
> to hold a pointer to memory allocated by the pure function.

Not sure I follow:

immutable(int)* foo() pure
{
        return new int;
}

void main()
{
        auto ptr = foo();
        // we now have a pointer to memory allocated by a pure function?
}

Unless, of course, you consider this weakly pure. But at that point, strongly pure functions are starting to get very, very useless.
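
For readers following along, here is a minimal sketch of the weak/strong distinction as the terms are used in this thread. The names `bump` and `addOne` are made up for illustration; the assumption is that "weakly pure" means a pure function with mutable reference parameters, and "strongly pure" means one whose parameters carry no mutable indirections.

```d
// Weakly pure: takes a mutable pointer, so it may modify what the caller passes in.
void bump(int* p) pure
{
    *p += 1;
}

// Strongly pure: only a value parameter, so the result depends on the argument alone.
int addOne(int x) pure
{
    int local = x;
    bump(&local);   // a strongly pure function may call a weakly pure one on its own data
    return local;
}

void main()
{
    assert(addOne(41) == 42);
}
```

The example from my post, foo() above, has no parameters at all, which is why it would normally be classified as strongly pure.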

> In my view, this is the single most interesting feature of purity.
>
>> And, frankly, if my program dies from an OOME due to pure functions
>> being unable to do full collection cycles, I'd just stop using pure
>> permanently. It's not a very realistic approach to automatic memory
>> management; at that point, manual memory management would work better.
>
> Of course. But I don't see how that's relevant. How the pure function
> actually obtains its memory is an implementation detail.
>
> There's a huge difference between "a global collection *may* be
> performed from a pure function" vs "it *must* be possible to force a
> global collection from a pure function".
>
> The difficulty in expressing the latter is a simple consequence of the
> fact that it is intrinsically impure.


-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
May 23, 2012
On Wed, 23 May 2012 11:41:00 -0400, Alex Rønne Petersen <alex@lycus.org> wrote:

> On 23-05-2012 17:29, Don Clugston wrote:

>> Not so. It's impossible for anything outside of a strongly pure function
>> to hold a pointer to memory allocated by the pure function.
>
> Not sure I follow:
>
> immutable(int)* foo() pure
> {
>          return new int;
> }
>
> void main()
> {
>          auto ptr = foo();
>          // we now have a pointer to memory allocated by a pure function?
> }

I think what Don means is this:

1. upon entry into a strong-pure function, record a GC context that remembers what point in the stack it entered (no need to search above that stack), and uses its parameters as "context roots".
2. Any collection performed while *in* the strong-pure function explicitly will simply deal with the contexted GC data.  It does not need to look at the main heap, except for those original roots.
3. upon exiting, you can remove the original roots, and add the return value as a root, and run one final GC collection within the context.  This should deterministically clean up any memory that was temporary while inside the pure function.
4. Anything that is left in the contexted GC is assimilated into the main GC.

Everything can be done without using a GC lock *except* the final assimilation (which may not need to lock because there is nothing to add).
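
The four steps above can be sketched as a toy model. This is not druntime code: `PureGcContext`, `allocate`, `collect` and `leave` are hypothetical names, blocks are plain integer ids rather than real memory, and the `live` argument stands in for whatever a real mark phase would find reachable.

```d
import std.algorithm : canFind, filter;
import std.array : array;

// Toy model only: blocks are integer ids, not real memory.
struct PureGcContext
{
    int[] roots;      // step 1: the function's parameters act as context roots
    int[] allocated;  // blocks allocated since the strong-pure function was entered

    void allocate(int block) { allocated ~= block; }

    // Step 2: a collection inside the function only looks at context-local blocks.
    // `live` simulates the result of a mark phase starting from the roots.
    void collect(const(int)[] live)
    {
        allocated = allocated.filter!(b => live.canFind(b)).array;
    }

    // Steps 3 and 4: drop the entry roots, keep only what the return value
    // reaches, then hand the survivors over to the main heap.
    int[] leave(const(int)[] reachableFromReturn, ref int[] mainHeap)
    {
        roots = null;
        collect(reachableFromReturn);
        mainHeap ~= allocated;
        auto survivors = allocated;
        allocated = null;
        return survivors;
    }
}

void main()
{
    int[] mainHeap = [1, 2];        // blocks that existed before entry
    auto ctx = PureGcContext([1]);  // block 1 was passed in as a parameter
    ctx.allocate(10);               // a temporary
    ctx.allocate(11);               // ends up being the return value
    ctx.collect([10, 11]);          // both still referenced mid-function
    auto kept = ctx.leave([11], mainHeap);
    assert(kept == [11]);
    assert(mainHeap == [1, 2, 11]);
}
```

In this model only `leave` touches `mainHeap`, which corresponds to the assimilation step being the only one that might need the GC lock.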

Don, why can't gc_collect do the right thing based on whether it's in a contexted GC or not?  The compiler is going to have to initialize the context when first entering a strong-pure function, so we should be able to have a hook recording that the thread is using a pure-function context, no?

Also, let's assume it's a separate call to do pure function gc_collect, i.e. we have a pure gc_pureCollect function.

What if a weak-pure function calls this?  What happens when it is not called from within a strong-pure function?

I still think gc_collect can be marked pure and do the right thing.

-Steve
May 23, 2012
On 23-05-2012 17:56, Steven Schveighoffer wrote:
> On Wed, 23 May 2012 11:41:00 -0400, Alex Rønne Petersen
> <alex@lycus.org> wrote:
>
>> On 23-05-2012 17:29, Don Clugston wrote:
>
>>> Not so. It's impossible for anything outside of a strongly pure function
>>> to hold a pointer to memory allocated by the pure function.
>>
>> Not sure I follow:
>>
>> immutable(int)* foo() pure
>> {
>> return new int;
>> }
>>
>> void main()
>> {
>> auto ptr = foo();
>> // we now have a pointer to memory allocated by a pure function?
>> }
>
> I think what Don means is this:
>
> 1. upon entry into a strong-pure function, record a GC context that
> remembers what point in the stack it entered (no need to search above
> that stack), and uses its parameters as "context roots".
> 2. Any collection performed while *in* the strong-pure function
> explicitly will simply deal with the contexted GC data. It does not need
> to look at the main heap, except for those original roots.
> 3. upon exiting, you can remove the original roots, and add the return
> value as a root, and run one final GC collection within the context.
> This should deterministically clean up any memory that was temporary
> while inside the pure function.
> 4. Anything that is left in the contexted GC is assimilated into the
> main GC.
>
> Everything can be done without using a GC lock *except* the final
> assimilation (which may not need to lock because there is nothing to add).
>
> Don, why can't gc_collect do the right thing based on whether it's in a
> contexted GC or not? The compiler is going to have to initialize the
> context when first entering a strong-pure function, so we should be able
> to have a hook recording that the thread is using a pure-function
> context, no?
>
> Also, let's assume it's a separate call to do pure function gc_collect,
> i.e. we have a pure gc_pureCollect function.
>
> What if a weak-pure function calls this? What happens when it is not
> called from within a strong-pure function?
>
> I still think gc_collect can be marked pure and do the right thing.
>
> -Steve

I still don't think my concern about pure functions allocating tons of memory (and thus requiring global GC to happen) has been addressed, though.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
May 23, 2012
On Wed, 23 May 2012 12:22:39 -0400, Alex Rønne Petersen <alex@lycus.org> wrote:

> On 23-05-2012 17:56, Steven Schveighoffer wrote:
>> On Wed, 23 May 2012 11:41:00 -0400, Alex Rønne Petersen
>> <alex@lycus.org> wrote:
>>
>>> On 23-05-2012 17:29, Don Clugston wrote:
>>
>>>> Not so. It's impossible for anything outside of a strongly pure function
>>>> to hold a pointer to memory allocated by the pure function.
>>>
>>> Not sure I follow:
>>>
>>> immutable(int)* foo() pure
>>> {
>>> return new int;
>>> }
>>>
>>> void main()
>>> {
>>> auto ptr = foo();
>>> // we now have a pointer to memory allocated by a pure function?
>>> }
>>
>> I think what Don means is this:
>>
>> 1. upon entry into a strong-pure function, record a GC context that
>> remembers what point in the stack it entered (no need to search above
>> that stack), and uses its parameters as "context roots".
>> 2. Any collection performed while *in* the strong-pure function
>> explicitly will simply deal with the contexted GC data. It does not need
>> to look at the main heap, except for those original roots.
>> 3. upon exiting, you can remove the original roots, and add the return
>> value as a root, and run one final GC collection within the context.
>> This should deterministically clean up any memory that was temporary
>> while inside the pure function.
>> 4. Anything that is left in the contexted GC is assimilated into the
>> main GC.
>>
>> Everything can be done without using a GC lock *except* the final
>> assimilation (which may not need to lock because there is nothing to add).
>>
>> Don, why can't gc_collect do the right thing based on whether it's in a
>> contexted GC or not? The compiler is going to have to initialize the
>> context when first entering a strong-pure function, so we should be able
>> to have a hook recording that the thread is using a pure-function
>> context, no?
>>
>> Also, let's assume it's a separate call to do pure function gc_collect,
>> i.e. we have a pure gc_pureCollect function.
>>
>> What if a weak-pure function calls this? What happens when it is not
>> called from within a strong-pure function?
>>
>> I still think gc_collect can be marked pure and do the right thing.
>>
>> -Steve
>
> I still don't think my concern about pure functions allocating tons of memory (and thus requiring global GC to happen) has been addressed, though.

What I forgot is:

2a. Any collection performed *implicitly* during strong-pure function may run a full collection cycle.

-Steve
May 23, 2012
On 23/05/2012 17:29, Don Clugston wrote:
> There's a huge difference between "a global collection *may* be
> performed from a pure function" vs "it *must* be possible to force a
> global collection from a pure function".
>

Thank you!
May 23, 2012
On 23/05/2012 18:22, Alex Rønne Petersen wrote:
> On 23-05-2012 17:56, Steven Schveighoffer wrote:
>> On Wed, 23 May 2012 11:41:00 -0400, Alex Rønne Petersen
>> <alex@lycus.org> wrote:
>>
>>> On 23-05-2012 17:29, Don Clugston wrote:
>>
>>>> Not so. It's impossible for anything outside of a strongly pure
>>>> function
>>>> to hold a pointer to memory allocated by the pure function.
>>>
>>> Not sure I follow:
>>>
>>> immutable(int)* foo() pure
>>> {
>>> return new int;
>>> }
>>>
>>> void main()
>>> {
>>> auto ptr = foo();
>>> // we now have a pointer to memory allocated by a pure function?
>>> }
>>
>> I think what Don means is this:
>>
>> 1. upon entry into a strong-pure function, record a GC context that
>> remembers what point in the stack it entered (no need to search above
>> that stack), and uses its parameters as "context roots".
>> 2. Any collection performed while *in* the strong-pure function
>> explicitly will simply deal with the contexted GC data. It does not need
>> to look at the main heap, except for those original roots.
>> 3. upon exiting, you can remove the original roots, and add the return
>> value as a root, and run one final GC collection within the context.
>> This should deterministically clean up any memory that was temporary
>> while inside the pure function.
>> 4. Anything that is left in the contexted GC is assimilated into the
>> main GC.
>>
>> Everything can be done without using a GC lock *except* the final
>> assimilation (which may not need to lock because there is nothing to
>> add).
>>
>> Don, why can't gc_collect do the right thing based on whether it's in a
>> contexted GC or not? The compiler is going to have to initialize the
>> context when first entering a strong-pure function, so we should be able
>> to have a hook recording that the thread is using a pure-function
>> context, no?
>>
>> Also, let's assume it's a separate call to do pure function gc_collect,
>> i.e. we have a pure gc_pureCollect function.
>>
>> What if a weak-pure function calls this? What happens when it is not
>> called from within a strong-pure function?
>>
>> I still think gc_collect can be marked pure and do the right thing.
>>
>> -Steve
>
> I still don't think my concern about pure functions allocating tons of
> memory (and thus requiring global GC to happen) has been addressed, though.
>

If the pure function allocates a ton of memory, one of these allocations will trigger the collection. Why is that an issue?
May 24, 2012
On 23-05-2012 19:19, deadalnix wrote:
> On 23/05/2012 18:22, Alex Rønne Petersen wrote:
>> On 23-05-2012 17:56, Steven Schveighoffer wrote:
>>> On Wed, 23 May 2012 11:41:00 -0400, Alex Rønne Petersen
>>> <alex@lycus.org> wrote:
>>>
>>>> On 23-05-2012 17:29, Don Clugston wrote:
>>>
>>>>> Not so. It's impossible for anything outside of a strongly pure
>>>>> function
>>>>> to hold a pointer to memory allocated by the pure function.
>>>>
>>>> Not sure I follow:
>>>>
>>>> immutable(int)* foo() pure
>>>> {
>>>> return new int;
>>>> }
>>>>
>>>> void main()
>>>> {
>>>> auto ptr = foo();
>>>> // we now have a pointer to memory allocated by a pure function?
>>>> }
>>>
>>> I think what Don means is this:
>>>
>>> 1. upon entry into a strong-pure function, record a GC context that
>>> remembers what point in the stack it entered (no need to search above
>>> that stack), and uses its parameters as "context roots".
>>> 2. Any collection performed while *in* the strong-pure function
>>> explicitly will simply deal with the contexted GC data. It does not need
>>> to look at the main heap, except for those original roots.
>>> 3. upon exiting, you can remove the original roots, and add the return
>>> value as a root, and run one final GC collection within the context.
>>> This should deterministically clean up any memory that was temporary
>>> while inside the pure function.
>>> 4. Anything that is left in the contexted GC is assimilated into the
>>> main GC.
>>>
>>> Everything can be done without using a GC lock *except* the final
>>> assimilation (which may not need to lock because there is nothing to
>>> add).
>>>
>>> Don, why can't gc_collect do the right thing based on whether it's in a
>>> contexted GC or not? The compiler is going to have to initialize the
>>> context when first entering a strong-pure function, so we should be able
>>> to have a hook recording that the thread is using a pure-function
>>> context, no?
>>>
>>> Also, let's assume it's a separate call to do pure function gc_collect,
>>> i.e. we have a pure gc_pureCollect function.
>>>
>>> What if a weak-pure function calls this? What happens when it is not
>>> called from within a strong-pure function?
>>>
>>> I still think gc_collect can be marked pure and do the right thing.
>>>
>>> -Steve
>>
>> I still don't think my concern about pure functions allocating tons of
>> memory (and thus requiring global GC to happen) has been addressed,
>> though.
>>
>
> If the pure function allocates a ton of memory, one of these allocations
> will trigger the collection. Why is that an issue?

Not with the scheme Steven first proposed, but it will with his clarification, so all is good.

That said, I'm really not convinced it's worth the effort, but as long as a "pure GC" doesn't start breaking my programs, I won't complain. ;)

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
May 24, 2012
On 23-05-2012 19:16, deadalnix wrote:
> On 23/05/2012 17:29, Don Clugston wrote:
>> There's a huge difference between "a global collection *may* be
>> performed from a pure function" vs "it *must* be possible to force a
>> global collection from a pure function".
>>
>
> Thank you!

I personally disagree that this should be a rationale to not allow the latter. D is a systems language and we really should stop trying to pretend that it isn't. There's a reason we have a core.memory module that lets us control the GC.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
May 24, 2012
On 24/05/12 02:26, Alex Rønne Petersen wrote:
> On 23-05-2012 19:16, deadalnix wrote:
>> On 23/05/2012 17:29, Don Clugston wrote:
>>> There's a huge difference between "a global collection *may* be
>>> performed from a pure function" vs "it *must* be possible to force a
>>> global collection from a pure function".
>>>
>>
>> Thank you!
>
> I personally disagree that this should be a rationale to not allow the
> latter. D is a systems language and we really should stop trying to
> pretend that it isn't. There's a reason we have a core.memory module
> that lets us control the GC.
>

This is all about not exposing quirks of the current implementation.

As things currently stand, you would end up performing a GC before you enter the first pure function.
After that, the only possible garbage to collect would have been generated from inside the pure function. And that should be very cheap to collect.
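
A minimal sketch of that usage pattern, assuming the collection is requested from ordinary impure code through `GC.collect` in core.memory. `sumOfSquares` is a made-up example of a strongly pure function whose only garbage is created inside the call.

```d
import core.memory : GC;

// Strongly pure: no reference parameters, so any garbage made here is local to the call.
int sumOfSquares(int n) pure
{
    auto squares = new int[](n);   // temporary allocation inside the pure function
    foreach (i, ref s; squares)
        s = cast(int)(i * i);
    int total = 0;
    foreach (s; squares)
        total += s;
    return total;
}

void main()
{
    GC.collect();                  // explicit full collection, done outside any pure function
    assert(sumOfSquares(4) == 14); // 0 + 1 + 4 + 9
}
```

Whether the temporary array is then reclaimed cheaply depends on the collector; the sketch only shows where the explicit collection would sit.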
May 24, 2012
On Thu, 24 May 2012 04:58:56 -0400, Don Clugston <dac@nospam.com> wrote:

> On 24/05/12 02:26, Alex Rønne Petersen wrote:
>> On 23-05-2012 19:16, deadalnix wrote:
>>> On 23/05/2012 17:29, Don Clugston wrote:
>>>> There's a huge difference between "a global collection *may* be
>>>> performed from a pure function" vs "it *must* be possible to force a
>>>> global collection from a pure function".
>>>>
>>>
>>> Thank you!
>>
>> I personally disagree that this should be a rationale to not allow the
>> latter. D is a systems language and we really should stop trying to
>> pretend that it isn't. There's a reason we have a core.memory module
>> that lets us control the GC.
>>
>
> This is all about not exposing quirks of the current implementation.
>
> As things currently stand, you would end up performing a GC before you enter the first pure function.
> After that, the only possible garbage to collect would have been generated from inside the pure function. And that should be very cheap to collect.

The more I think about it, the more I believe that what gc_collect does (i.e. run a full collect, or run a pure-function specific collect) is an implementation detail.

I don't think exposing gc_collect is a quirk of the current implementation, and it should be marked pure (weak purity).

This whole thread has kind of flown way off topic.  Regardless of whether gc_collect should be callable from a pure function, there are other use cases for considering logically pure functions as pure (for the same reasons you can cast away const).  Should we be able to force weak purity or not?  If so, how to do it?
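
One possible answer to "how", sketched under assumptions: cast a function pointer so that its type gains the pure attribute. The `assumePure` helper below follows the idiom shown in the std.traits documentation for `SetFunctionAttributes`; `legacySquare` and `square` are hypothetical names. This bypasses the compiler's checks, so the caller is vouching that the function really is logically pure.

```d
import std.traits : FunctionAttribute, SetFunctionAttributes,
                    functionAttributes, functionLinkage,
                    isDelegate, isFunctionPointer;

// Re-type a function pointer or delegate as pure. Nothing is verified here.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Logically pure, but not annotated, so the type system treats it as impure.
int legacySquare(int x)
{
    return x * x;
}

// A pure wrapper that forces the call through.
int square(int x) pure
{
    return assumePure(&legacySquare)(x);
}

void main()
{
    assert(square(7) == 49);
}
```

This is the same kind of escape hatch as casting away const: legal to write, and entirely the programmer's responsibility.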

-Steve