Thread overview
[D-runtime] Extending pages in GC (July 14, 2010)
July 14, 2010 (Steve Schveighoffer)
Currently, in the GC, there is this routine:

size_t extend(void* p, size_t minsize, size_t maxsize)

The parameters are the pointer to extend, the minimum number of bytes to add to the block, and the maximum number of bytes to add.

This only works if the block is a page in size or greater.  It essentially queries the pool for committed, free pages that follow the currently allocated block, and adds them to the block.
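
Roughly, the scan could look like this (a simplified sketch; Pool, pagetable, and B_FREE are modeled loosely on the real structures in gcx.d):

    enum Bins : ubyte { B_PAGE, B_FREE }

    struct Pool
    {
        Bins[] pagetable;   // one entry per page in the pool
        size_t ncommitted;  // pages committed so far
    }

    // Count committed, free pages immediately after a block that
    // starts at page pagenum and spans blockPages pages.
    size_t freePagesAfter(ref Pool pool, size_t pagenum, size_t blockPages)
    {
        size_t n = pagenum + blockPages;
        size_t count;
        while (n < pool.ncommitted && pool.pagetable[n] == Bins.B_FREE)
        {
            ++count;
            ++n;
        }
        return count;
    }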

If the minimum number of pages required to satisfy extend are not free and committed, then it uses a Pool method:

extendPages(uint n)

which will "commit" n more pages if it has at least that many uncommitted pages, or return an error otherwise.

The issue is, what argument do you pass to extendPages?  The only sane value to pass is minsize, because if you pass maxsize and it fails, then you didn't get anything.

But why shouldn't extendPages commit as many pages as possible?  If that were the case, you could get closer to your desired maxsize.

Here is the syndrome I'm trying to fix.  When appending to an array one element at a time, the array append code uses a function newCapacity to add a certain percentage of memory to a block (the percentage varies, from something like 300% of the current size for small arrays down to 20% for very large arrays).  If it can use extend, it will, but it would only consult newCapacity when completely reallocating.  I recently changed it to try extend with the minimum size needed for the append as minsize, and the result of newCapacity as maxsize.
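
To illustrate the kind of growth curve I mean (the thresholds and middle step here are made up; only the 300%-to-20% range comes from the description above):

    // Hypothetical proportional growth policy in the spirit of
    // newCapacity; the real formula in druntime is different.
    size_t newCapacitySketch(size_t curSize)
    {
        size_t pct;
        if (curSize < 4096)           pct = 300; // small arrays: grow fast
        else if (curSize < (1 << 20)) pct = 100; // made-up midpoint
        else                          pct = 20;  // huge arrays: grow slowly
        return curSize + curSize * pct / 100;
    }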

What I found happening was that even though the maxsize requested was hundreds, even thousands, of pages, I would get a single page, then 15 pages, then a single page, then 15 pages, and so on.  So clearly the pages are available, but the routine doesn't add all it can.  What happens is that the "commit" size must be a multiple of 16 pages.  So when extend cannot get any free pages, it calls extendPages to get enough pages to cover minsize (usually 1).  extendPages must commit a multiple of 16 pages, so it commits 16.  extend gets its one page and returns.  There are now 15 free pages, so the next append adds 15 more.  But this defeats the algorithm, which tries to pre-allocate a proportional amount of data while appending.
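
The pattern falls out of the rounding alone.  A tiny sketch of the arithmetic:

    enum commitGranularity = 16; // pages per commit, per the current GC

    size_t pagesActuallyCommitted(size_t pagesRequested)
    {
        // Round the request up to a multiple of the granularity.
        return (pagesRequested + commitGranularity - 1)
               / commitGranularity * commitGranularity;
    }

    // pagesActuallyCommitted(1) == 16: extend takes 1 page, leaving 15
    // free; the next appends consume those 15, and the cycle repeats.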

What I want is a function that tries its best to get to maxsize, using any means possible, including committing pages.  I plan to add such functionality.  What I'm not sure of is: should I modify the existing functions, or add new ones?

My proposed change would be to add a new extendPagesUpTo(n) (name to be worked on) which may extend *up to* n pages, and then modify extend to use this function instead.  I don't want to change extendPages(n) because some code may rely on the current behavior of returning an error if it cannot extend to *at least* n pages.  However, I think extend can be changed because it is already expected to return some value between minsize and maxsize, or 0 if it couldn't do it.
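
Something like this, as a rough sketch (the parameters and the OPFAIL return are stand-ins for the real Pool internals):

    enum OPFAIL = uint.max;

    // Proposed: commit as many pages as available, up to n rounded up
    // to the 16-page commit granularity; returns the pages committed.
    uint extendPagesUpTo(ref size_t ncommitted, size_t npages, uint n)
    {
        size_t avail = npages - ncommitted;  // assumed multiple of 16
        size_t want  = (n + 15) / 16 * 16;   // round up to granularity
        size_t got   = want <= avail ? want : avail;
        if (got == 0)
            return OPFAIL;
        ncommitted += got;  // the real code also commits via the OS here
        return cast(uint) got;
    }

    // The existing all-or-nothing extendPages keeps its contract:
    uint extendPages(ref size_t ncommitted, size_t npages, uint n)
    {
        if (npages - ncommitted < n)
            return OPFAIL;
        return extendPagesUpTo(ncommitted, npages, n);
    }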

What do you think?

-Steve




July 14, 2010 (Fawzi Mohamed)
Ehm, it seems it was too long ago that I worked on that code.
Indeed I flip-flopped one more time, it seems, and I also decided to
grow only based on memory usage.
The element size is used just to round the result to a multiple of the
element size...
(this can have a large impact on arrays with few large elements; I
think it is correct)

Fawzi
July 14, 2010 (Steve Schveighoffer)
Hm... did you mean to reply to the newCapacity message?

Even if that is the case, I don't really understand what you are saying here...

-Steve






July 14, 2010 (Fawzi Mohamed)
On 14-Jul-10, at 14:34, Steve Schveighoffer wrote:

> Hm... did you mean to reply to the newCapacity message?
>
> Even if that is the case, I don't really understand what you are saying here...
>
> -Steve
>
Sorry, I had written a first reply to the newCapacity message, and it seems that first reply, for unknown reasons, did not go through to the list... so my second reply was indeed without context...

As it was not really correct, I will just put an excerpt here for completeness' sake:
------
Again I am not too sure; I flip-flopped a couple of times implementing http://github.com/fawzi/blip/blob/master/blip/util/Grow.d , a version of which I had also used in tango.

Basically, if you append to an array, do you want the appending to be
determined by the number of elements of the array, or by its size?
Code will probably work depending on the number of elements of the
array, not on its memory usage.
In the end it is just a factor, but if you have many small elements I
feel it is "reasonable" to allocate a smaller amount of extra space.
On the other hand, as you pointed out, if you think strictly from the
memory point of view, one can argue that the percentage of extra space
should depend only on the space used, not on how it is used internally.

Finally I kept the "per element" view, but that was done without any real testing.

(In reality I grow strictly on memory and then use the element size to round up, guaranteeing nice behavior for arrays with few large elements.)
--------
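
A sketch of that rounding, assuming a hypothetical growth function that works purely in bytes (the actual formula is in Grow.d, linked above):

    // Grow by memory usage, then round up to whole elements.
    size_t grownSize(size_t curBytes, size_t elemSize)
    {
        size_t target = curBytes + curBytes / 2; // e.g. 50% extra space
        // Round up so arrays of few large elements get whole slots.
        return (target + elemSize - 1) / elemSize * elemSize;
    }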

About the sub-page growth, I am not sure I agree: choosing those special sizes matches the sizes of the bins in the GC, so it makes sense to have special sizes.
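
A sub-page request only ever lands on one of those bin sizes anyway, so something like this rounding is already implied (bin sizes as I remember them from gcx.d, powers of two from 16 bytes up):

    // Round a sub-page request up to the next GC bin size.
    size_t roundToBin(size_t nbytes)
    {
        size_t bin = 16;
        while (bin < nbytes)
            bin *= 2;
        return bin; // larger requests go to whole pages instead
    }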

July 14, 2010 (Fawzi Mohamed)
On 14-Jul-10, at 13:40, Steve Schveighoffer wrote:

> [...]
> What I want is a function that tries its best to get to maxsize, using any means possible, including committing pages.  I plan to add such functionality.  What I'm not sure of is: should I modify the existing functions, or add new ones?
>
> My proposed change would be to add a new extendPagesUpTo(n) (name to be worked on) which may extend *up to* n pages, and then modify extend to use this function instead.  I don't want to change extendPages(n) because some code may rely on the current behavior of returning an error if it cannot extend to *at least* n pages.  However, I think extend can be changed because it is already expected to return some value between minsize and maxsize, or 0 if it couldn't do it.
>
> What do you think?

Now I am really replying to this topic :).

Why not add a function that receives both the min and the max (I would call it the optimal size, because I can imagine cases in which it is overshot)?

The old function could then simply call the new one with the same argument twice.
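
A minimal sketch of that shape (extendExact is just a hypothetical name for the old behavior):

    // Guarantee at least minSize extra bytes, try for optimalSize.
    size_t extend(void* p, size_t minSize, size_t optimalSize);

    // The old exact-size behavior then falls out for free:
    size_t extendExact(void* p, size_t size)
    {
        return extend(p, size, size); // same argument twice
    }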

Fawzi

July 14, 2010 (Steve Schveighoffer)
OK, I agree about the sub-page sizes (I do not use newCapacity in those cases, and newCapacity actually has a guard against that).

I like the up-rounding; it's probably a better method than what currently happens.  I'll try to incorporate it.

Your reply did go to the list; it was just delayed until after this one came through :)  Not sure why that happened...

-Steve


