Thread overview
General Problems for GC'ed Applications?
  Karen Lanrap (Jul 23, 2006)
  Andrew Fedoniouk (Jul 23, 2006)
  Walter Bright (Jul 23, 2006)
  Andrew Fedoniouk (Jul 23, 2006)
  Karen Lanrap (Jul 24, 2006)
  Dave (Jul 24, 2006)
  Karen Lanrap (Jul 25, 2006)
  Dave (Jul 25, 2006)
  Karen Lanrap (Jul 25, 2006)
  Karen Lanrap (Jul 25, 2006)
  Walter Bright (Jul 27, 2006)
  Sean Kelly (Jul 27, 2006)
  Frits van Bommel (Jul 27, 2006)
  Sean Kelly (Jul 27, 2006)
  Frits van Bommel (Jul 27, 2006)
  Walter Bright (Jul 27, 2006)
  Sean Kelly (Jul 27, 2006)
  Karen Lanrap (Jul 28, 2006)
  Sean Kelly (Jul 28, 2006)
  BCS (Jul 28, 2006)
  Walter Bright (Jul 28, 2006)
  Karen Lanrap (Jul 28, 2006)
  Walter Bright (Jul 29, 2006)
  Karen Lanrap (Jul 29, 2006)
  Walter Bright (Jul 29, 2006)
  Karen Lanrap (Jul 29, 2006)
  Walter Bright (Jul 29, 2006)
  Sean Kelly (Jul 29, 2006)
  Lionello Lunesu (Jul 31, 2006)
  Karen Lanrap (Aug 02, 2006)
  Dave (Jul 24, 2006)
  renox (Jul 24, 2006)
  Tommie Gannert (Jul 26, 2006)
  Karen Lanrap (Jul 27, 2006)
  Tommie Gannert (Jul 27, 2006)
  Walter Bright (Jul 27, 2006)
July 23, 2006
I see three problems:

1) The typical behaviour of a GC'ed application is to claim more and more main memory without actually needing it. Hence every GC'ed application forces the OS to shrink the system cache held in main memory until the GC of the application kicks in.

2) If the available main memory is insufficient for the true memory requirements of the application, and the OS provides virtual memory by swapping out to secondary storage, then every run of the GC forces the OS to slowly swap all of this application's data back in from secondary storage, and runs of the GC occur frequently because main memory is tight.

3) If there is more than one GC'ed application running, those applications compete for the available main memory.


I see four risks:

a) from 1: The overall reaction of the system gets slower in favor of the GC'ed application.

b) from 2: Projects decomposed into several subtasks may face severe runtime problems when integrating the independently and successfully tested modules.

c) from 2 and b: The developer time saved in the development and maintenance phases by not being forced to avoid memory leaks may be more than offset by an increase of machine time by a factor of 50 or more.

d) from 1 and 3: A more complicated version of the dining philosophers problem is introduced. In this version every philosopher is allowed to rush around the table, grab all unused forks and declare them used before he starts to eat, and nobody can force him to put them back on the table.


Conclusion:

I know that solving the original dining philosophers problem took
several years, and I do not see any awareness of the more
complicated version that arises from using a GC.
Risks c) and d) are true killers.
Therefore GC'ed applications currently seem to be suitable only if
they run as a single instance on a machine well equipped with main
memory and no other GC'ed applications are in use.
To ensure that these conditions hold, the GC should maintain
statistics on the duration of its runs and the frequency of its
calls. This would allow the GC to throw an "almost out of memory"
error.
July 23, 2006
Personally, I see it as good coding practice in D to use delete on those things you know you need or want to delete immediately.  Just don't go nuts about those things you don't.

Typically, in a well-programmed piece of software, the number of allocations you can't be entirely sure about deleting will be much smaller than the number you are sure about.  Without a GC, this small percentage causes a huge amount of headache.  With it, you can ignore these and worry only about the easily-determined cases.

Throwing caution to the wind and using the GC for everything isn't, in my opinion, a good idea.  If anyone disagrees with me, I'd like to hear it, but I know of no reason why you'd want to do that.  I realize many people do throw caution to the wind, but so do many people drive drunk.  It doesn't mean anyone suggests it, necessarily.
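
For instance, a minimal sketch of what I mean (the class and sizes are made up; "registry" just stands in for anything with a longer lifetime):

class Buffer
{
    ubyte[] data;
    this(size_t n) { data = new ubyte[n]; }
}

Buffer[] registry;   // hypothetical: something that outlives the function

void workUnit()
{
    // Lifetime is obvious and local: free it deterministically.
    Buffer scratch = new Buffer(1024 * 1024);
    // ... use scratch ...
    delete scratch;          // we are sure nobody else references it

    // Lifetime is unclear (the object escapes): leave it to the GC.
    Buffer result = new Buffer(64);
    registry ~= result;      // no delete; the collector reclaims it eventually
}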

#1 does not seem to be a severe problem to me.  Memory usage will be higher, but that's why there's flex for cache and buffers.

#2 would be a problem for any application.  I do not estimate that my application would use even 50% more memory if I did not track every allocation down (but only deleted when I knew for sure).

As such, I think it would be difficult to show a piece of well-written software that runs into this problem using the GC, but does not otherwise.  IMHO, if a program requires so much memory that swapping starts happening frequently, it's a bug, a design flaw, or a fact of life. It's not the garbage collector.

Yes, a collect will cause swapping - if you have that much memory used.  Ideally, collects won't happen often (they can't just happen whenever anyway; they happen when you cross memory-allocation milestones), and you can disable/enable the GC and run collects manually when it makes the most sense for your software.
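
For example (a sketch using the GC control calls in std.gc; check your Phobos version for the exact names):

import std.gc;

void latencySensitivePart()
{
    std.gc.disable();            // no collections while we're busy
    try
    {
        // ... allocate and work without collection pauses ...
    }
    finally
    {
        std.gc.enable();
    }
}

void quietMoment()
{
    std.gc.fullCollect();        // run a collection when a pause is acceptable
}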

Failing that, software which is known to use a large amount of memory may need to use manual memory management.  Likely said software will perform poorly anyway.

#3 depends on the above cases being true.  Some competition may happen indeed, but I think it would take more analysis to see how strongly this would affect things.

Neither is this a new problem; programs will always compete for main memory.  #3 is only a problem when taking into account the higher memory usage, and is negligible if the memory usage is not much higher.

I would say these problems are heavily dependent on application design, code quality, and purpose.  In other words, these are more problems for the programmer than for the garbage collector.  But this is only my opinion.

As for the risks... a I see as the OS's problem where #1 is an issue; #2 I see as a problem regardless of garbage collection; c I agree with, as mentioned above, these should be balanced - only when it is not of benefit should the "leak" be left for the GC.

I'm afraid I'm not terribly familiar with the dining philosopher's problem, but again I think this is a problem only somewhat aggravated by garbage collection.

Most of your post seems to be wholly concerned with applications that use at least the exact figure of Too Much Memory (tm).  While I realize there are several special cases where such usage is necessary or acceptable, I am at a loss to think of any general or practical reasons, aside from poor code quality... or database systems.

The software on my home computer which uses the most memory uses nothing even close to the amount of system memory I have.  Indeed, the sum of memory use on my machine with several programs running is still less than that typically shipped with machines even a few years ago (I'm putting that number at 512 megabytes.)

The servers I manage have fairly standard amounts of RAM (average, let's say, 2 gigabytes).  Typically, even in periods of high traffic, they do not take much swap.  In fact, Apache is garbage collected (pools/subpools, that is) and doesn't seem to be a problem at all.  PHP, which is used on many of these servers, is not garbage collected (traditional memory management) and tends to hog memory just a bit.

MySQL and other database systems obviously take the largest chunk.  For such systems, you don't want any of your data paged, ever.  You typically have large, static cache areas which you don't even want garbage collected, and you never realloc/free until the end of the process.  These areas would not be garbage collected and the data in them would not be scanned by the garbage collector.
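
Keeping such an area out of the collector's reach entirely is easy enough; a minimal sketch (sizes made up, using the C runtime allocator so the GC neither scans nor frees the block):

import std.c.stdlib;

void* cacheBase;                     // one big, long-lived cache block

void initCache()
{
    cacheBase = std.c.stdlib.malloc(256 * 1024 * 1024);
}

void shutdownCache()
{
    std.c.stdlib.free(cacheBase);    // only at the end of the process
}

(If the cache held pointers to GC-allocated objects you would additionally have to tell the collector about it, e.g. via the GC's add-range facility; the sketch assumes it holds plain data.)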

In fact, you'd normally want your database on a dedicated machine with extra helpings of memory for these reasons.  Whether or not it was garbage collected wouldn't affect whether it was suitable only as a single instance on a single machine.  As above, this is a matter of the software, not of the GC.

So my suggestion is that you look at the limitations of your software, design, and development team and make your decisions from there.  A sweeping statement that garbage collection causes a dining philosopher's problem just doesn't seem correct to me.

Thanks,
-[Unknown]


> I see three problems:
> 
> 1) The typical behaviour of a GC'ed application is to claim more and more main memory without actually needing it. Hence every GC'ed application forces the OS to shrink the system cache held in main memory until the GC of the application kicks in.
> 
> 2) If the available main memory is insufficient for the true memory requirements of the application, and the OS provides virtual memory by swapping out to secondary storage, then every run of the GC forces the OS to slowly swap all of this application's data back in from secondary storage, and runs of the GC occur frequently because main memory is tight.
> 
> 3) If there is more than one GC'ed application running, those applications compete for the available main memory.
> 
> 
> I see four risks:
> 
> a) from 1: The overall reaction of the system gets slower in favor of the GC'ed application.
> 
> b) from 2: Projects decomposed into several subtasks may face severe runtime problems when integrating the independently and successfully tested modules.
> 
> c) from 2 and b: The developer time saved in the development and maintenance phases by not being forced to avoid memory leaks may be more than offset by an increase of machine time by a factor of 50 or more.
> 
> d) from 1 and 3: A more complicated version of the dining philosophers problem is introduced. In this version every philosopher is allowed to rush around the table, grab all unused forks and declare them used before he starts to eat, and nobody can force him to put them back on the table.
> 
> 
> Conclusion:
> 
> I know that solving the original dining philosophers problem took several years, and I do not see any awareness of the more complicated version that arises from using a GC.
> Risks c) and d) are true killers.
> Therefore GC'ed applications currently seem to be suitable only if they run as a single instance on a machine well equipped with main memory and no other GC'ed applications are in use.
> To ensure that these conditions hold, the GC should maintain statistics on the duration of its runs and the frequency of its calls. This would allow the GC to throw an "almost out of memory" error.
July 23, 2006
I agree 100%.

I would only add this:
GC is one of several possible memory management techniques.
Explicit new/delete (heap) is another, and reference counting is
yet another.
Each memory management technique is optimal in its
own area and use cases.
None of these is a mega-super-silver-bullet for all problems.

As D allows you to use both worlds, it is potentially suitable for engineering optimal or near-optimal applications (with respect to memory consumption).

Yes, D does not allow you to use smart pointers/reference counting (as a memory management technique), but this is not a fundamental problem and I believe it will be implemented in D one day.

Andrew Fedoniouk.
http://terrainformatica.com



July 23, 2006
Andrew Fedoniouk wrote:
> Yes, D does not allow you to use smart pointers/reference counting
> (as a memory management technique), but this is not a fundamental
> problem and I believe it will be implemented in D one day.

You can do reference counting in D programming, you just have to do it manually (I've been doing it manually for years, it isn't that bad).
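
For anyone wondering what "manually" looks like, a bare-bones sketch (names made up; the point is simply that the caller has to pair the calls itself):

class Resource
{
    private int refs = 1;        // the creator holds the first reference

    void addRef() { ++refs; }

    void release()
    {
        if (--refs == 0)
        {
            // free any non-GC resources here, then drop the object itself
            delete this;
        }
    }
}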
July 23, 2006
Oops.... I meant, of course:

As for the risks... (a) I see as the OS's problem, where #1 is an issue; (b) I see as a problem regardless of garbage collection; (c) I agree with, as I mentioned above, and these should be balanced - only when it is not of more use to track down the memory should it be left to the GC.

That's what I get for typing such a response at that time of night :/. "a", "#2", "c"... man...

-[Unknown]


> As for the risks... a I see as the OS's problem where #1 is an issue; #2 I see as a problem regardless of garbage collection; c I agree with, as mentioned above, these should be balanced - only when it is not of benefit should the "leak" be left for the GC.
July 23, 2006
"Walter Bright" <newshound@digitalmars.com> wrote in message news:ea0e80$bkg$2@digitaldaemon.com...
> Andrew Fedoniouk wrote:
>> Yes, D does not allow you to use smart pointers/reference counting (as a memory management technique), but this is not a fundamental problem and I believe it will be implemented in D one day.
>
> You can do reference counting in D programming, you just have to do it manually (I've been doing it manually for years, it isn't that bad).

It is about smart pointers - automatic reference counting.

I mean the ideal thing should allow automatic GC *and* automatic RC, both as first-class citizens.

Currently in D you can do automatic GC but only manual RC, and
in C++ automatic RC but only manual GC.
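
To illustrate what I mean by "first class": something like the hypothetical Rc struct below, where the compiler itself would bump and drop the count on every copy and at end of scope (D has no such hooks for structs as I write this, which is exactly the point):

struct Rc(T)
{
    private T* payload;
    private int* count;

    // hypothetical hook: the compiler would call this on every copy
    void onCopy() { ++*count; }

    // hypothetical hook: the compiler would call this when a copy dies
    void onDestroy()
    {
        if (--*count == 0)
        {
            // last owner gone: release the payload deterministically here
        }
    }
}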

This situation does not make either of them significantly better, at least for tasks in my area of competence.

I am personally using both GC and RC, so
I am making decisions (C++ or D) based on different criteria
(the next step in the decision-making graph).

The problem is that the market is dichotomous now: either do
everything unmanaged (RC) or everything managed (GC),
but not both. (That ugly attempt named MC++ is just
a design disaster, IMO.)

Having a tool that allows using the best of both worlds would
put that tool on the next step of the ladder.

Andrew.




July 24, 2006
Unknown W. Brackets wrote:

> Yes, a collect will cause swapping - if you have that much memory
> used.  Ideally, collects won't happen often (they can't just happen
> whenever anyway; they happen when you cross memory-allocation
> milestones), and you can disable/enable the GC and run collects
> manually when it makes the most sense for your software.
> 
> Failing that, software which is known to use a large amount of memory may need to use manual memory management.  Likely said software will perform poorly anyway.

I disagree. Assume a non-GC'ed program that allocates 1.5 GB to 1.7 GB of memory, of which 0.7 GB to 0.9 GB are vital data. If you run this program on a machine equipped with 1 GB, the OS will swap out the roughly 0.8 GB of data that is accessed infrequently. Therefore this program causes swapping only if it accesses data from the swapped-out part, and the amount swapped is roughly bounded by twice the size of the data that needs to be swapped back in.

This changes dramatically if you GC it, because on every allocation that exhausts the available main memory, the GC requires the OS to swap all 0.8 GB back in, doesn't it?
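
The mechanism, as I understand it: the mark phase has to follow pointers through every live object, so every page that holds live data gets touched and therefore faulted back in. Purely illustrative pseudo-D (the pointersIn helper is made up):

void*[] pointersIn(void* block)   // made-up helper: scan a block for pointers
{
    return null;
}

void mark(void*[] roots, bool[void*] marked)
{
    void*[] stack = roots.dup;
    while (stack.length > 0)
    {
        void* p = stack[stack.length - 1];
        stack = stack[0 .. stack.length - 1];
        if (p is null)
            continue;
        if (p in marked)
            continue;
        marked[p] = true;              // reading the object = touching its page
        foreach (q; pointersIn(p))     // follow its pointers to more objects
            stack ~= q;
    }
}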


> I'm afraid I'm not terribly familiar with the dining philosopher's problem, but again I think this is a problem only somewhat aggravated by garbage collection.
> 
> Most of your post seems to be wholly concerned with applications that use at least the exact figure of Too Much Memory (tm).

It is not only somewhat aggravated. Assume the example given above is doubled to two instances of that program, and the main memory is not only doubled to 2 GB but increased to 4 GB or even more.

Again, both non-GC'ed versions of the program run without any performance problems, but the GC'ed versions do not, although the memory size is increased by a factor that enables the OS to avoid swapping out any allocated data in the case of the non-GC'ed versions.

This is because both programs at least slowly increase their allocations of main memory.

This goes without performance problems until the available main memory is exhausted. The first program that hits the limit starts GC'ing its allocated memory, and forces the OS to swap it all back in. Hence this first program runs the risk that all memory freed by its GC is immediately eaten up by the other instance, which continues running unaffected because its thirst for main memory is satisfied by the GC of the other instance, if that GC frees memory as soon as it recognizes it as garbage.

At the time when this GC run ends there are at least two cases
distinguishable:
a) the main memory at the end of the run is still insufficient,
because the other application ate it all up. Then this instance
stops with "out of memory".
b) the main memory at the end of the run by chance is sufficient,
because the other application was not that hungry. Then this
instance will start being performant again. But only for the short
time until the limit is reached again.

This is a simple example with only one processor and two competing applications---and I believe that case a) can happen.

So I feel unable to prove that on multi-core machines running several GC'ed applications the case a) will never happen.

And even if case a) never happens, there might always be at least one application that is running its GC. Hence swapping is always going on.


> A sweeping statement that garbage collection causes
> a dining philosopher's problem just doesn't seem correct to me.

Then prove me wrong.
July 24, 2006
Karen Lanrap wrote:
> I see three problems:
> 
> 1) The typical behaviour of a GC'ed application is to claim more and more main memory without actually needing it. Hence every GC'ed application forces the OS to shrink the system cache held in main memory until the GC of the application kicks in.
> 

That's a pretty big assertion - have you some evidence, particularly w.r.t. the mark/sweep collector the current D implementation uses? Have you looked at the current implementation to see if your assertion jibes with it? Have you written (well-written) D applications that show this to be the case?

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&calc=Calculate&xfullcpu=1&xmem=1

Yes, these are trivial benchmarks, but they were not written to be especially memory efficient (they are more concerned with wall-clock performance, although of course the two are often one and the same):

D does pretty well there and all of the tests use the GC.

Plus, with some form of well-implemented moving/compacting collector, D would do even better with regard to both speed and space.

The reason the implementation is important is that when a D program requests more memory from the GC, the GC will first try to satisfy that request by collecting. It's not a really "greedy" allocator. So, unless a program holds onto a lot of memory for its entire lifetime (e.g., through a lot of global references), what you describe should not happen much more often than it does when using the CRT to manage memory.
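
In outline, the policy looks like this (just the shape of it, not the actual Phobos internals; the helper names are made up):

// Made-up helpers standing in for the collector's internals:
void* allocFromFreeLists(size_t size) { return null; }
void  fullCollect() { }
void  growHeapFromOS(size_t size) { }

void* gcAllocate(size_t size)
{
    void* p = allocFromFreeLists(size);
    if (p)
        return p;

    fullCollect();                 // first try to satisfy the request by collecting
    p = allocFromFreeLists(size);
    if (p)
        return p;

    growHeapFromOS(size);          // only then ask the OS for more memory
    return allocFromFreeLists(size);
}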

> 2) If the available main memory is insufficient for the true memory requirements of the application, and the OS provides virtual memory by swapping out to secondary storage, then every run of the GC forces the OS to slowly swap all of this application's data back in from secondary storage, and runs of the GC occur frequently because main memory is tight.
> 

This will and does happen, but it's no different than if you cache a series of large files (or whatever) in memory using space allocated by malloc and then repeatedly access widely different offsets within those chunks of memory. Actually, in that case, if you'd used the GC you might be _less_ likely to swap repeatedly. In other words, if you're judicious with your memory usage, you are about as likely to run into the problem with either the GC or the CRT.

> 3) If there is more than one GC'ed application running, those applications compete for the available main memory.
> 

Yes, but if assertion #1 is not necessarily true, then malloc/free is no magic bullet over a GC here either (obviously).

> 
> I see four risks:
> 
> a) from 1: The overall reaction of the system gets slower in favor of the GC'ed application.
> 
> b) from 2: Projects decomposed into several subtasks may face severe runtime problems when integrating the independently and successfully tested modules.
> 
> c) from 2 and b: The developer time saved in the development and maintenance phases by not being forced to avoid memory leaks may be more than offset by an increase of machine time by a factor of 50 or more.
> 
> d) from 1 and 3: A more complicated version of the dining philosophers problem is introduced. In this version every philosopher is allowed to rush around the table, grab all unused forks and declare them used before he starts to eat, and nobody can force him to put them back on the table.
> 
> 
> Conclusion:
> 
> I know that solving the original dining philosophers problem took several years, and I do not see any awareness of the more complicated version that arises from using a GC.
> Risks c) and d) are true killers.
> Therefore GC'ed applications currently seem to be suitable only if they run as a single instance on a machine well equipped with main memory and no other GC'ed applications are in use.
> To ensure that these conditions hold, the GC should maintain statistics on the duration of its runs and the frequency of its calls. This would allow the GC to throw an "almost out of memory" error.
July 24, 2006
Karen Lanrap wrote:
> Unknown W. Brackets wrote:
> 
>> Yes, a collect will cause swapping - if you have that much memory
>> used.  Ideally, collects won't happen often (they can't just happen
>> whenever anyway; they happen when you cross memory-allocation
>> milestones), and you can disable/enable the GC and run collects
>> manually when it makes the most sense for your software.
>>
>> Failing that, software which is known to use a large amount of
>> memory may need to use manual memory management.  Likely said
>> software will perform poorly anyway.
> 
> I disagree. Assume a non-GC'ed program that allocates 1.5 GB to 1.7 GB of memory, of which 0.7 GB to 0.9 GB are vital data. If you run this program on a machine equipped with 1 GB, the OS will swap out the roughly 0.8 GB of data that is accessed infrequently. Therefore this program causes swapping only if it accesses data from the swapped-out part, and the amount swapped is roughly bounded by twice the size of the data that needs to be swapped back in.
> 
> This changes dramatically if you GC it, because on every allocation that exhausts the available main memory, the GC requires the OS to swap all 0.8 GB back in, doesn't it?
> 
> 
>> I'm afraid I'm not terribly familiar with the dining
>> philosopher's problem, but again I think this is a problem only
>> somewhat aggravated by garbage collection.
>>
>> Most of your post seems to be wholly concerned with applications
>> that use at least the exact figure of Too Much Memory (tm). 
> 
> It is not only somewhat aggravated. Assume the example given above is doubled to two instances of that program, and the main memory is not only doubled to 2 GB but increased to 4 GB or even more.
> 
> Again, both non-GC'ed versions of the program run without any performance problems, but the GC'ed versions do not, although the memory size is increased by a factor that enables the OS to avoid swapping out any allocated data in the case of the non-GC'ed versions.
> 
> This is because both programs at least slowly increase their allocations of main memory.
> 
> This goes without performance problems until the available main memory is exhausted. The first program that hits the limit starts GC'ing its allocated memory, and forces the OS to swap it all back in. Hence this first program runs the risk that all memory freed by its GC is immediately eaten up by the other instance, which continues running unaffected because its thirst for main memory is satisfied by the GC of the other instance, if that GC frees memory as soon as it recognizes it as garbage.
> 
> At the time when this GC run ends there are at least two cases distinguishable:
> a) the main memory at the end of the run is still insufficient, because the other application ate it all up. Then this instance stops with "out of memory".
> b) the main memory at the end of the run by chance is sufficient, because the other application was not that hungry. Then this instance will start being performant again. But only for the short time until the limit is reached again.
> 
> This is a simple example with only one processor and two competing applications---and I believe that case a) can happen.
> 
> So I feel unable to prove that on multi-core machines running several GC'ed applications the case a) will never happen.
> 
> And even if case a) never happens, there might always be at least one application that is running its GC. Hence swapping is always going on. 
> 

Someone else pointed out earlier that "stupid is as stupid does" with regard to memory management, whether or not you're using a GC. This has historically been a big problem with how Java programs are written. D, OTOH, allows manual deletion; plus, the combination of the GC with things like easy array slicing should allow for memory-efficient design patterns that are not only feasible but also efficient to develop and maintain.
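
For example, slicing lets you hand out views of one big allocation instead of making copies (sizes made up):

void sliceExample()
{
    ubyte[] buffer = new ubyte[1024 * 1024];
    ubyte[] header = buffer[0 .. 64];                // no copy, just a view
    ubyte[] rest   = buffer[64 .. buffer.length];    // ditto
}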

Plus, of course, D allows CRT memory management should you really need it.

>  
>> A sweeping statement that garbage collection causes
>> a dining philosopher's problem just doesn't seem correct to me.
> 
> Then prove me wrong.

You're making the original assertion that it's a problem - I believe the onus is on you to prove that it would apply to efficient design patterns using D <g>
July 24, 2006
Karen Lanrap wrote:

> I see three problems:
> 
> 1) The typical behaviour of a GC'ed application is to claim more and more main memory without actually needing it.

I think you're going overboard here, but it's nonetheless true that there are conflicting needs: the fewer GC passes there are, the more efficiently your application runs (for non-interactive applications, of course; otherwise 'pause time' is a problem)... until you run out of memory and either swapping starts to be a big problem or disk-access efficiency is reduced.

But there is research on making the OS and the GC cooperate, for example:
http://www.cs.umass.edu/~emery/pubs/04-16.pdf

Now the problem is obviously that the OS won't implement this until a GC needs it, and vice versa. With Linux this could probably be done (that's what the researchers used); for Windows, forget it (unless .NET implements this, of course, and even in that case it's not obvious that it would be an 'open' API).

Regards,
RenoX