December 05, 2005
Re: Refining AA's
Sean Kelly wrote:

[...]
> However, since D objects are accessed via handles,
> the only issue should be OutOfMemory exceptions, and it's
> generally not too hard to preallocate needed memory 
> before modifying data.
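
For illustration, preallocating up front might look roughly like the sketch
below. It uses GC.reserve from today's core.memory module - I am not sure the
2005 runtime exposed an equivalent - and the per-entry size estimate is
invented, so treat the whole thing as a sketch of the idea rather than a recipe.

import core.memory : GC;

// Reserve a rough upper bound of what the update will allocate, so the
// modification itself is unlikely to fail half-way through.
// GC.reserve returns the number of bytes actually reserved, or zero on error.
bool tryInsertAll(ref int[string] aa, string[] keys)
{
    auto needed = keys.length * 64;    // crude, made-up per-entry estimate
    if (GC.reserve(needed) == 0)
        return false;                  // report failure before touching the AA

    foreach (i, k; keys)
        aa[k] = cast(int) i;           // inserts now draw from the reserved pool
    return true;
}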

I fear that your answer passes the real problem without touching it.

I reported that the runtime collapsed under winXP when the problem 
size approached the limits of real memory: winXP reported 2.3 GB needed 
in total, whereas only 2.0 GB of real memory were available.

That led to very long pauses of about 20 seconds during which winXP was 
accessing the swapfile, shoveling around roughly 600 MB each time, 
interspersed with bursts of CPU activity.

No OutOfMemory exception was thrown. Maybe that is because the thread's 
memory consumption did not reach the 2.0 GB real memory limit. But maybe 
winXP will report OutOfMemory only when the virtual address space of 
1 TB is exhausted?

In fact I do not remember ever having had an OutOfMemory exception under 
winXP, and I have filed a bug report on that issue as well.
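
A crude way to check whether and when the runtime actually throws is to
allocate until it refuses. The sketch below uses today's
core.exception.OutOfMemoryError name; the 2005 runtime spelled the exception
differently, so this is only an approximation of the experiment.

import core.exception : OutOfMemoryError;
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    enum blockMB = 16;
    size_t total;            // MB handed out so far
    ubyte[][] blocks;        // keep references so nothing can be collected

    try
    {
        while (true)
        {
            blocks ~= new ubyte[blockMB * 1024 * 1024];
            total += blockMB;
        }
    }
    catch (OutOfMemoryError)
    {
        blocks = null;       // drop the references ...
        GC.collect();        // ... and reclaim them, so the print below can allocate
        writefln("allocator gave up after about %s MB", total);
    }
}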

-manfred
December 05, 2005
Re: Refining AA's
Manfred Nowak wrote:
> 
> No OutOfMemory exception was thrown. Maybe that is because the thread's 
> memory consumption did not reach the 2.0 GB real memory limit. But maybe 
> winXP will report OutOfMemory only when the virtual address space of 
> 1 TB is exhausted?
> 
> In fact I do not remember ever having had an OutOfMemory exception under 
> winXP, and I have filed a bug report on that issue as well.

I've noticed this behavior also.  In fact, I've gone close to the limits 
of a fixed-size swapfile and the OS told me it was going to try to 
create more virtual memory.  I didn't track things exactly, though, so I'm 
not entirely certain whether I ever actually exceeded my virtual memory 
limits or just passed the 1 GB barrier (my laptop has 512 MB RAM and a 
768 MB swapfile IIRC).


Sean
December 05, 2005
Re: Refining AA's
>I do not understand this remark. Are you talking about the management of 
>the real memory, whilst I was asking about the management of the virtual 
>memory?
>
>Hmmmm... AA's are the core of management of the virtual memory, 
>aren't they?
>
>-manfred 

After reading:

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/30905

I'm assuming that you are using the GC for the 'internals' of your
implementation, and now I have a better understanding of what you were getting
at.

IMHO, it is best to let the GC handle these issues (like the rest of the D
runtime library implementation), and let a future 'better' allocator/GC system
account for exceptional memory conditions. That way the AA implementation will
stay consistent with the rest, along with being simpler to develop, debug and
maintain. The current GC might not address the situation you describe very well
right now, but that can always be addressed in the future if it becomes a
serious problem.
December 05, 2005
Re: Refining AA's
Dave wrote:

[...]
> The current GC might not address the
> situation you describe very well right now, but that can always
> be addressed in the future if it becomes a serious problem.

After delivering a chunk of memory to the application, the GC isn't 
aware of that memory anymore, except when more memory is allocated.

Therefore part of the time-consuming swapping might stem from the 
activities of the GC, but that is the minor part.

Every time the application accesses a segment that is currently swapped 
out, the OS will need to swap it in again. That is the major reason for 
swapping and has nothing to do with the GC.

The GC does not know anything about the swapping mechanism of the 
underlying OS and therefore cannot be adapted to it.

But your argument, together with the necessary coupling of the GC with 
the virtual memory management, implies that every language with a GC 
must also implement a manager for the virtual memory.
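
To make that concrete: a toy program like the one below allocates once and then
only reads. Every page fault after the allocation is handled by the OS, and the
GC never runs again unless something else allocates. The sizes are made up and
would have to exceed physical RAM to show the effect.

import std.random : Random, uniform;
import std.stdio : writeln;

void main()
{
    enum size_t N = 300_000_000;   // ~2.4 GB of doubles; pick something larger than RAM
    auto data = new double[N];     // the only GC allocation in the program

    auto rnd = Random(42);
    double sum = 0;
    foreach (i; 0 .. 10_000_000)
    {
        // each random access may touch a page the OS has swapped out;
        // the resulting swap-in is pure OS work, the GC is not involved
        sum += data[uniform(0, N, rnd)];
    }
    writeln(sum);
}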

-manfred
December 05, 2005
Re: Refining AA's
Manfred Nowak wrote:
> 
> The GC does not know anything about the swapping mechanism of the 
> underlying OS and therefore cannot be adapted to it.

Not strictly true.  Some of the fancier GCs use data from the virtual 
memory manager to avoid paging.  Granted, this locks a GC implementation 
to a specific platform, but it's not unheard of.
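
For example, on Linux a collector can ask the kernel which pages of a block are
currently resident and postpone scanning the rest. A rough sketch of that kind
of query, with the page size hard-coded at 4 KiB for simplicity (mincore as
documented in the Linux man page):

version (linux)
{
    extern (C) int mincore(void* addr, size_t length, ubyte* vec);

    enum PAGE = 4096;               // assume 4 KiB pages for simplicity

    // One flag per page of 'block': true if the page is currently in RAM.
    bool[] residency(const(void)[] block)
    {
        auto start = cast(size_t) block.ptr & ~(cast(size_t) PAGE - 1);  // page-align downwards
        auto len = (cast(size_t) block.ptr + block.length) - start;
        auto pages = (len + PAGE - 1) / PAGE;

        auto vec = new ubyte[pages];
        if (mincore(cast(void*) start, len, vec.ptr) != 0)
            return null;            // query failed; caller should scan everything

        auto result = new bool[pages];
        foreach (i, b; vec)
            result[i] = (b & 1) != 0;   // low bit set: page is resident
        return result;
    }
}

A collector could consult such a map and defer the regions that would have to
be swapped in, which is exactly the platform coupling mentioned above.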


Sean
December 06, 2005
Re: Refining AA's
In article <Xns97235303CDD71svv1999hotmailcom@63.105.9.61>, Manfred Nowak
says...
>
>Dave wrote:
>
>[...]
>> The current GC might not address the
>> situation you describe very well right now, but that can always
>> be addressed in the future if it becomes a serious problem.
>
>After delivering a chunk of memory to the application, the GC isn't 
>aware of that memory anymore, except when more memory is allocated.
>
>Therefore part of the time-consuming swapping might stem from the 
>activities of the GC, but that is the minor part.
>
>Every time the application accesses a segment that is currently swapped 
>out, the OS will need to swap it in again. That is the major reason for 
>swapping and has nothing to do with the GC.
>
>The GC does not know anything about the swapping mechanism of the 
>underlying OS and therefore cannot be adapted to it.
>
>But your argument, together with the necessary coupling of the GC with 
>the virtual memory management, implies that every language with a GC 
>must also implement a manager for the virtual memory.
>
>-manfred

Ok - I gotcha now. The earlier term 'error' threw me a little bit, but
basically you're talking about optimizing with regard to memory swapping because
of its impact on performance, right? I hope so, because no one should have to
think this hard right after a cold beer <g>

Temporal and spatial locality issues have been a big area of GC research for
years now, from what I gather. Of course, one of the big advantages of
generational, compacting collectors is that they help to minimize swapping
during collection and (generally) give better locality for the current working
set in the program itself. And even if those may not be the most efficient in
terms of CPU instructions, they seem to be proving the best for CPU *cycle*
efficiency.

From what I've read (and seen a bit of, actually), a non-generational,
non-compacting GC like the current one for D would be the biggest 'offender' for
swapping and VM 'abuse' during collection as it scans over what could be the
entire memory space depending on the working set.

An AA implementation that pre-allocates a really large amount of memory may
actually be defeating the purpose - making the job of the GC harder and
increasing swapping both during AA accesses and then again when the collector
kicks in.

It's also very possible that designing a virtual memory manager tuned to
something specific like an AA implementation could actually be self-defeating in
the context of a GC, and increase swapping as the VMM and GC compete for
locality of the AA memory and live memory for other data in the program.

All this to say that I don't think a separate virtual memory management system
tuned for one part of the runtime system would really be worth the effort and
might be self-defeating. The GC implementation is really the key, IMHO. The AA
implementation should really play in the same sandbox as the rest <g>

One idea is to make the memory pre-allocation for your AA implementation
reverse logarithmic - as the AA set gets larger, the ratio of pre-allocated
memory decreases. That should give really good performance for the more common,
smaller sets and help to decrease swapping as the sets get larger. Plus it would
help out the GC as it deals with other live data in the program.
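
Something along those lines might be the following, with all the constants
pulled out of thin air:

import std.algorithm : max;
import std.math : log2;

// Toy growth policy: small tables get generous headroom (around 3x),
// huge tables only about 25 %, so the pre-allocation ratio shrinks
// roughly with the logarithm of the current size.
size_t nextCapacity(size_t used)
{
    if (used < 16)
        return 64;                  // flat floor for tiny tables
    auto factor = max(1.25, 4.0 - 0.25 * log2(cast(double) used));
    return cast(size_t)(used * factor);
}

// e.g. nextCapacity(256) gives about 2x headroom, nextCapacity(1_000_000) only 1.25x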

- Dave
December 07, 2005
Re: Refining AA's
Manfred Nowak wrote:

[...]
> I will do shootout benchmarks next.
[...]

First results on hash.d

Size    t.curr  t.me    t.cal   Gain
4       14,6    16      14,6    
7       14,7    16      14,7    
10      15,3    17,4    15,2    
11      16,7    18,4    15,9    212,50%
12      18,9    21,5    17,2    152,94%
14      32,5    40      26,5    125,00%
15      61      62      40      4,76%
16      108     106     65      -4,65%
17      291     214     123     -45,83%
20      2610    2015    909     -34,98%
22      25094   17466   1917    -32,91%
23      96496   69447   3950    -29,23%
24      >180000 371049          
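
For what it's worth, the Gain column appears to be the overhead relative to the
calibration column; in code, this is my reading of the numbers, not a stated
definition:

// gain = (t_me - t_cal) / (t_curr - t_cal) - 1
// e.g. size 17: (214 - 123) / (291 - 123) - 1 = -45.83 %
double gain(double tCurr, double tMe, double tCal)
{
    return (tMe - tCal) / (tCurr - tCal) - 1;
}

So negative values mean the new implementation spends less time above the
calibration baseline than the current one does.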


The time for the current implementation is not measurable for parameter 
values less than 1000.

As already shown, my implementation is faster for larger amounts of 
stored data, but because of its increased memory allocation it reaches 
the limits of physical memory earlier and then collapses.

Some constants of the implementation can be adjusted so that the losses 
for fewer elements are smaller, but then the gains for more elements 
are also lower.

How to decide for the general case?

-manfred
December 08, 2005
Re: Refining AA's
In article <Xns9725F3778AE4Dsvv1999hotmailcom@63.105.9.61>, Manfred Nowak
says...
>
>Manfred Nowak wrote:
>
>[...]
>> I will do shootout benchmarks next.
>[...]
>
>First results on hash.d
>
>Size    t.curr  t.me    t.cal   Gain
>4       14,6    16      14,6    
>7       14,7    16      14,7    
>10      15,3    17,4    15,2    
>11      16,7    18,4    15,9    212,50%
>12      18,9    21,5    17,2    152,94%
>14      32,5    40      26,5    125,00%
>15      61      62      40      4,76%
>16      108     106     65      -4,65%
>17      291     214     123     -45,83%
>20      2610    2015    909     -34,98%
>22      25094   17466   1917    -32,91%
>23      96496   69447   3950    -29,23%
>24      >180000 371049          
>
>
>The time for the current implementation is not measurable for parameter 
>values less than 1000.
>
>As already shown, my implementation is faster for larger amounts of 
>stored data, but because of its increased memory allocation it reaches 
>the limits of physical memory earlier and then collapses.
>How to decide for the general case?

Can't you just make the algorithm change at a certain size?
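
A minimal sketch of that kind of switch-over, with the threshold and both
back-ends invented purely for illustration:

// Toy hybrid map: linear scan over a small array until THRESHOLD entries,
// then migrate everything into a hash-based back-end (here simply the
// built-in AA). The threshold and the representations are placeholders.
struct HybridMap(K, V)
{
    enum THRESHOLD = 32;

    private K[] smallKeys;
    private V[] smallVals;
    private V[K] large;
    private bool switched;

    void opIndexAssign(V value, K key)
    {
        if (switched)
        {
            large[key] = value;
            return;
        }
        foreach (i, k; smallKeys)
            if (k == key) { smallVals[i] = value; return; }

        smallKeys ~= key;
        smallVals ~= value;
        if (smallKeys.length > THRESHOLD)
        {
            // migrate once, then stay on the hash-based back-end
            foreach (i, k; smallKeys)
                large[k] = smallVals[i];
            smallKeys = null;
            smallVals = null;
            switched = true;
        }
    }

    // support "key in map", mirroring the built-in AA's interface
    V* opBinaryRight(string op : "in")(K key)
    {
        if (switched)
            return key in large;
        foreach (i, k; smallKeys)
            if (k == key) return &smallVals[i];
        return null;
    }
}

Usage stays the same as with a normal AA: map["x"] = 1; if (auto p = "x" in map) ...
The interesting question is still where to put THRESHOLD.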

/Oskar
December 08, 2005
Re: Refining AA's
Oskar Linde wrote:

[...]
> Can't you just make the algorithm change at a certain size?

Maybe, but how to decide that point for the general case? Which 
measure should be taken?

-manfred
December 08, 2005
Re: Refining AA's
Hi,

In article <Xns9726A6F56EEE0svv1999hotmailcom@63.105.9.61>, Manfred Nowak
says...
>
>Oskar Linde wrote:
>
>[...]
>> Can't you just make the algorithm change at a certain size?
>
>Maybe, but how to decide that point for the general case? Which 
>measure should be taken?

It's extremely difficult to define a general case. I'd guess that in most
cases, AAs are used for convenience, and neither speed nor size needs to be
optimized. Then there are the specific uses with specific needs. It is
impossible to make an optimal choice without knowing the case that should be
optimized for. The goal for a general implementation should probably be to
perform decently in many cases rather than excellently in a few.

Performance tuning parameters are easy to change in the future. I'd say, make
your best guess at a general usage case, and tune the parameters to this case.
Then verify your parameters against a number of different test cases (different
hash functions: good/poor, slow/fast; different usage patterns: ratios of
insertions, deletes and lookups; different sizes of key/value data; etc.). Make
sure that there are no places where you are considerably worse than a reference
implementation.
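
For instance, a throwaway harness along these lines (module names are today's
Phobos; the 2005 library timed things differently) makes it easy to sweep a few
of those dimensions:

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;
import std.stdio : writefln;

// Toy sweep: time N inserts followed by N lookups for a range of sizes.
// A real test matrix would also vary key types, hash quality and the
// ratio of insertions, deletes and lookups.
void main()
{
    foreach (size; [1_000, 10_000, 100_000, 1_000_000])
    {
        auto rnd = Random(1);
        int[int] aa;

        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. size)
            aa[uniform(0, size * 4, rnd)] = i;              // inserts, some key reuse
        foreach (i; 0 .. size)
            cast(void)(uniform(0, size * 4, rnd) in aa);    // lookups, hits and misses
        sw.stop();

        writefln("%8s entries: %s ms", size, sw.peek.total!"msecs");
    }
}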

I think you will find that cut-off points between different algorithms are not
very sensitive. They would probably be placed in the middle of a range where the
algorithms perform similarly.

If by measure you mean time vs. space, I'd say that since a hash table's
primary feature is its time complexity, speed should be more important than
memory usage, but there should also be a weighting between the two. A 10 %
speed increase at the cost of 100 % more memory usage is probably not worth it
for the general case.

(I feel like I've only written vague and obvious things. Apologies for the waste
of bandwidth :)

Regards,

Oskar