December 05, 2005
Sean Kelly wrote:

[...]
> However, since D objects are accessed via handles,
> the only issue should be OutOfMemory exceptions, and it's
> generally not too hard to preallocate needed memory
> before modifying data.

I fear that your answer passes the real problem without touching it.

I reported that the runtime collapsed under winXP when the problem size approached the limits of real memory: winXP reported 2.3 GB needed in total, whereas only 2.0 GB of real memory were available.

That led to very long stretches, about 20 seconds each, in which winXP was accessing the swapfile, shoveling around 600 MB each time, interspersed with bursts of CPU activity.

No OutOfMemory exception was thrown. Maybe because the thread's memory consumption did not reach the 2.0 GB real-memory limit. But maybe winXP reports OutOfMemory only when the virtual address space of 1 TB is exhausted?

In fact I do not remember ever having had an OutOfMemory exception under winXP, and I filed a bug report on that issue as well.

-manfred
December 05, 2005
Manfred Nowak wrote:
> 
> No OutOfMemory exception was thrown. Maybe because the thread's memory consumption did not reach the 2.0 GB real-memory limit. But maybe winXP reports OutOfMemory only when the virtual address space of 1 TB is exhausted?
> 
> In fact I do not remember ever having had an OutOfMemory exception under winXP, and I filed a bug report on that issue as well.

I've noticed this behavior also.  In fact, I've gone close to the limits of a fixed-size swapfile and the OS told me it was going to try to create more virtual memory.  I didn't track things exactly, though, so I'm not entirely certain whether I ever actually exceeded my virtual memory limits or just passed the 1 GB barrier (my laptop has 512 MB RAM and a 768 MB swapfile, IIRC).


Sean
December 05, 2005
>I do not understand this remark. Are you talking about management of the real memory whilst I was asking for the management of the virtual memory?
>
>Hmmmm... AA's are the core of management of the virtual memory, aren't they?
>
>-manfred

After reading:

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30905

I'm assuming that you are using the GC for the 'internals' of your implementation, and now I have a better understanding of what you were getting at.

IMHO, it is best to let the GC handle these issues (like the rest of the D runtime library implementation) and let a future 'better' allocator/GC system account for exceptional memory conditions. That way the AA implementation will stay consistent with the rest, along with being simpler to develop, debug and maintain. The current GC might not address the situation you describe very well right now, but that can always be addressed in the future if it becomes a serious problem.


December 05, 2005
Dave wrote:

[...]
> The current GC might not address the
> situation you describe very well right now, but that can always
> be addressed in the future if it becomes a serious problem.

After delivering a chunk of memory to the application, the GC isn't aware of that memory anymore, except when allocating more memory.

Therefore a part of the time-consuming swapping might stem from the activities of the GC, but that is the minor part.

Every time the application accesses a segment that is currently swapped out, the OS must swap it in again. That is the major cause of swapping, and it has nothing to do with the GC.

The GC does not know anything about the swapping mechanism of the underlying OS and therefore cannot be adapted to it.

But your argument, together with the necessary coupling of the GC with virtual memory management, suggests that every language with a GC must also implement a manager for the virtual memory.
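To make the point concrete: the access pattern alone determines how often the OS must page memory in, GC or no GC. A minimal sketch in (modern) D, assuming nothing about the actual AA implementation:

```d
import std.random : uniform;

// The OS pages a segment in on first access, regardless of the GC.
void touchSequential(ubyte[] data)
{
    // One page-in per page, then long runs of cheap in-memory accesses.
    foreach (i; 0 .. data.length)
        data[i]++;
}

void touchRandom(ubyte[] data, size_t touches)
{
    // Worst case: with a working set larger than real memory, nearly
    // every touch lands on a cold page and forces a swap-in.
    foreach (_; 0 .. touches)
        data[uniform(0, data.length)]++;
}
```

Run against a buffer larger than physical RAM, the random version thrashes the pager while the sequential version mostly streams, even though both do the same number of memory accesses.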

-manfred
December 05, 2005
Manfred Nowak wrote:
> 
> The GC does not know anything about the swapping mechanism of the underlying OSa nd therefore cannot be adapted to it.

Not strictly true.  Some of the fancier GCs use data from the virtual memory manager to avoid paging.  Granted, this locks a GC implementation to a specific platform, but it's not unheard of.
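For the curious, here is one way such coupling can look on a POSIX system. This is a sketch of mine, not code from any D GC: mincore(2) reports which pages of a range are currently resident, and a pager-aware collector could use that to avoid scanning ranges that are swapped out. The extern(C) declaration is written by hand because the binding is platform-specific.

```d
// Assumes a POSIX system that provides mincore(2) (e.g. Linux).
extern (C) int mincore(void* addr, size_t length, ubyte* vec);

// Returns one bool per page: true if the page is resident in RAM,
// or null if the query failed.
bool[] pageResidency(const(void)* addr, size_t len, size_t pageSize = 4096)
{
    // mincore requires a page-aligned start address.
    auto start = cast(size_t) addr & ~(pageSize - 1);
    auto end = cast(size_t) addr + len;
    auto pages = (end - start + pageSize - 1) / pageSize;
    auto vec = new ubyte[](pages);
    if (mincore(cast(void*) start, end - start, vec.ptr) != 0)
        return null;
    auto result = new bool[](pages);
    foreach (i, b; vec)
        result[i] = (b & 1) != 0;   // low bit set => page resident
    return result;
}
```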


Sean
December 06, 2005
In article <Xns97235303CDD71svv1999hotmailcom@63.105.9.61>, Manfred Nowak says...
>
>Dave wrote:
>
>[...]
>> The current GC might not address the
>> situation you describe very well right now, but that can always
>> be addressed in the future if it becomes a serious problem.
>
>After delivering a chunk of memory to the application, the GC isn't aware of that memory anymore, except when allocating more memory.
>
>Therefore a part of the time-consuming swapping might stem from the activities of the GC, but that is the minor part.
>
>Every time the application accesses a segment that is currently swapped out, the OS must swap it in again. That is the major cause of swapping, and it has nothing to do with the GC.
>
>The GC does not know anything about the swapping mechanism of the underlying OS and therefore cannot be adapted to it.
>
>But your argument, together with the necessary coupling of the GC with virtual memory management, suggests that every language with a GC must also implement a manager for the virtual memory.
>
>-manfred

Ok - I gotcha now.. The earlier term 'error' threw me a little bit, but basically you're talking about optimizing with regard to memory swapping because of its impact on performance, right? I hope so, because no one should have to think this hard right after a cold beer <g>

Temporal and spatial locality issues have been a big area of GC research for years now, from what I gather. Of course, one of the big advantages of generational, compacting collectors is that they help to minimize swapping during collection and (generally) give better locality for the program's current working set. And even if those may not be the most efficient in terms of CPU instructions, they seem to be proving the best for CPU *cycle* efficiency.

From what I've read (and seen a bit of, actually), a non-generational, non-compacting GC like the current one for D would be the biggest 'offender' for swapping and VM 'abuse' during collection, as it scans over what could be the entire memory space, depending on the working set.

An AA implementation that pre-allocates a really large amount of memory may actually be defeating the purpose - making the job of the GC harder and increasing swapping, both during AA accesses and then when the collector kicks in.

It's also very possible that designing a virtual memory manager tuned to something specific like an AA implementation could actually be self-defeating in the context of a GC, and increase swapping as the VMM and GC compete for locality of the AA memory and live memory for other data in the program.

All this is to say that I don't think a separate virtual memory management system tuned for one part of the runtime system would really be worth the effort, and it might be self-defeating. The GC implementation is really the key, IMHO. The AA implementation should really play in the same sandbox as the rest <g>

One idea is to make the memory pre-allocation for your AA implementation reverse-logarithmic - as the AA set gets larger, the ratio of pre-allocated memory decreases. That should give really good performance for the more common, smaller sets and help to decrease swapping as the sets get larger. Plus it would help out the GC as it deals with other live data in the program.
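The reverse-logarithmic idea can be sketched as a growth policy. The function name and all the constants below are illustrative guesses of mine, not tuned values from any implementation:

```d
import core.bitop : bsr;  // index of highest set bit, i.e. floor(log2)

// Capacity to pre-allocate for a table currently holding `used` entries.
// The over-allocation factor shrinks as the table grows.
size_t nextCapacity(size_t used)
{
    if (used < 16)
        return 64;                  // flat minimum for tiny tables
    auto lg = bsr(used);            // floor(log2(used)), >= 4 here
    // Fixed-point factor (divide by 16): 4.0x at 2^4 entries,
    // 2.0x at 2^12, and approaching 1x for huge tables.
    auto factor16 = 16 + 192 / lg;
    return used * factor16 / 16;
}
```

So small sets get generous headroom (fast growth, few rehashes), while a multi-gigabyte set is only over-allocated by a sliver, which keeps it from hitting the physical memory limit so early.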

- Dave


December 07, 2005
Manfred Nowak wrote:

[...]
> I will do shootout benchmarks next.
[...]

First results on hash.d

Size    t.curr  t.me    t.cal   Gain
4       14,6    16      14,6
7       14,7    16      14,7
10      15,3    17,4    15,2
11      16,7    18,4    15,9    212,50%
12      18,9    21,5    17,2    152,94%
14      32,5    40      26,5    125,00%
15      61      62      40      4,76%
16      108     106     65      -4,65%
17      291     214     123     -45,83%
20      2610    2015    909     -34,98%
22      25094   17466   1917    -32,91%
23      96496   69447   3950    -29,23%
24      >180000 371049


Time for the current implementation is not measurable for parameter values less than 1000.

As already shown, my implementation is faster for larger amounts of stored data, but its increased memory allocation reaches the limits of physical memory earlier, and then performance collapses.

Some constants of the implementation can be adjusted so that the losses for fewer elements are smaller, but then the gains for more elements are also lower.

How to decide for the general case?

-manfred
December 08, 2005
In article <Xns9725F3778AE4Dsvv1999hotmailcom@63.105.9.61>, Manfred Nowak says...
>
>Manfred Nowak wrote:
>
>[...]
>> I will do shootout benchmarks next.
>[...]
>
>First results on hash.d
>
>Size    t.curr  t.me    t.cal   Gain
>4       14,6    16      14,6
>7       14,7    16      14,7
>10      15,3    17,4    15,2
>11      16,7    18,4    15,9    212,50%
>12      18,9    21,5    17,2    152,94%
>14      32,5    40      26,5    125,00%
>15      61      62      40      4,76%
>16      108     106     65      -4,65%
>17      291     214     123     -45,83%
>20      2610    2015    909     -34,98%
>22      25094   17466   1917    -32,91%
>23      96496   69447   3950    -29,23%
>24      >180000 371049
>
>
>Time for the current implementation is not measurable for parameter values less than 1000.
>
>As already shown, my implementation is faster for larger amounts of
>stored data, but its increased memory allocation reaches the limits
>of physical memory earlier, and then performance collapses.
>How to decide for the general case?

Can't you just make the algorithm change at a certain size?

/Oskar


December 08, 2005
Oskar Linde wrote:

[...]
> Can't you just make the algorithm change at a certain size?

Maybe, but how would one decide that point for the general case? Which measure should be used?

-manfred
December 08, 2005
Hi,

In article <Xns9726A6F56EEE0svv1999hotmailcom@63.105.9.61>, Manfred Nowak says...
>
>Oskar Linde wrote:
>
>[...]
>> Can't you just make the algorithm change at a certain size?
>
>Maybe, but how to decide that point for the general case? Which measure should be taken?

It's extremely difficult to define a general case. I'd guess that in most cases AAs are used for convenience, where neither speed nor size needs to be optimized. Then there are the specific uses with specific needs. It is impossible to make an optimal choice without having the case that should be optimized for. The goal for a general implementation should probably be to perform decently in many cases rather than excellently in a few.

Performance-tuning parameters are easy to change in the future. I'd say, make your best guess at a general usage case and tune the parameters to this case. Then verify your parameters against a number of different test cases (different hash functions: good/poor, slow/fast; different usage patterns: ratio of insertions, deletes and lookups; different sizes of key/value data; etc.). Make sure that there are no places where you are considerably worse than a reference implementation.

I think you will find that the cut-off points between different algorithms are not very sensitive. They would probably be placed in the middle of a range where the algorithms perform similarly.
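A size cut-off between two strategies could be sketched like this. HybridTable and the 1 << 15 default are inventions of mine, not proposals from the thread; the cut-off merely echoes the break-even region in Manfred's benchmark, where the two implementations performed similarly:

```d
// Wrapper that starts with a compact table and migrates, once, to a
// second strategy past a cut-off. Both sides are stand-ins here
// (builtin AAs), since the real implementations aren't shown.
struct HybridTable(K, V, size_t cutoff = 1 << 15)
{
    V[K] small;      // frugal strategy, fine while the set is small
    V[K] big;        // stand-in for a preallocating implementation
    bool switched;

    void opIndexAssign(V value, K key)
    {
        if (!switched && small.length >= cutoff)
        {
            // One-time migration at the cut-off point.
            foreach (k, v; small)
                big[k] = v;
            small = null;
            switched = true;
        }
        if (switched) big[key] = value;
        else          small[key] = value;
    }

    // Supports `key in table`, dispatching to the active side.
    V* opBinaryRight(string op : "in")(K key)
    {
        return switched ? (key in big) : (key in small);
    }
}
```

Because the cut-off sits in a region where both strategies perform about the same, the one-time migration cost is the only penalty, and mis-tuning the constant by a factor of two should barely show up in benchmarks.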

If by measure you mean time vs. space, I'd say that since a hash table's primary feature is its time complexity, speed should be more important than memory usage, but there should also be a weighing between the two. A 10% speed increase at the cost of 100% more memory usage is probably not worth it for the general case.

(I feel like I've only written vague and obvious things. Apologies for the waste of bandwidth :)

Regards,

Oskar

