Thread overview: Big Downside of Mark & Sweep
- Russ Lewis (May 15, 2002)
- Walter (May 15, 2002)
- Matthew Wilson (May 16, 2002)
- Walter (May 16, 2002)

May 15, 2002 (Russ Lewis)
As I understand OS architecture, some operating systems will allocate memory for you without actually allocating the pages.  I know AIX does this, and I think Linux does as well.  Basically, the OS allows a malloc() to succeed and records that x pages are now legally usable at a given address, but it does not modify the page table (or allocate physical or virtual memory for you) until you actually touch a page (and take a page fault on it).

This works very well for situations where you might allocate big blocks of memory and, in some situations, never use all of them.  However, a mark & sweep collector ruins this.  The mark & sweep collector, as I understand it, must scan all allocated memory for references to other memory blocks; thus, even if you allocate 100 MB of memory and never touch it, the next time the GC runs every byte in that block will be examined, and thousands of useless page faults will occur.  The OS may even run out of memory trying to allocate pages for those useless hits.  (At least on AIX, the OS will allow you to allocate more memory than its virtual paging can actually handle... but if you use all of it (and nobody else deallocates in the meantime), "low memory" signals start rampaging through the system.)

This problem gets worse if the GC runs periodically in the background.  Imagine a large program that keeps these large blocks of data around for a long time.  The GC decides to run, and suddenly the whole computer bogs down as the mark and sweep causes a storm of page swapping in the virtual memory system.  When it's done, all of the pages it touched have to be paged back out to swap, and the useful pages brought back into physical memory.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]


May 15, 2002 (Walter)
Those are good points, and they are why GC is not a panacea for all memory problems.  There are several workarounds:

1) Mark the large allocation as "atomic", which means it doesn't get scanned.
2) Allocate & manage the large object separately from the GC.  Nothing in D requires use of the GC, unlike Java.  The GC itself uses memory-mapped files for its pools.
3) Internal improvements can be made to the GC itself, once D supports reflection, to avoid unnecessary scanning.

"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3CE29544.1AAC2863@deming-os.org...


May 16, 2002 (Matthew Wilson)
Walter

Can you point me to a resource describing the details of D's GC architecture?  I had not realised it was using mark and sweep, and would be very interested in reading about your current design (and any previous incarnations).

Matthew

"Walter" <walter@digitalmars.com> wrote in message news:abudjd$evt$1@digitaldaemon.com...


May 16, 2002 (Walter)
Even better, the complete source comes with the D compiler.

"Matthew Wilson" <mwilson@nextgengaming.com> wrote in message news:abvdkj$19d3$1@digitaldaemon.com...