Thread overview
Odd behaviour with large arrays
Feb 08, 2005
Niko Korhonen
Feb 08, 2005
Dave
Feb 08, 2005
pragma
Feb 08, 2005
John Demme
Feb 09, 2005
Niko Korhonen
Feb 09, 2005
Ilya Minkov
February 08, 2005
Consider the following D program:

int main()
{
	const int ARRAY_SIZE = 1024 * 1024 * 128;
	
	ubyte[] array = new ubyte[ARRAY_SIZE];
	
	printf("Walking through the array...\n");
	
	for (int i = 0; i < ARRAY_SIZE; i++)
		array[i] = cast(ubyte)i;
		
	printf("Walk complete\n");	
	return 0;
}

As you can see, we allocate a large array and walk through it, setting the values to something.

When compiled with DMD 0.112 this program executes strangely on my machine. The message "Walk complete" gets printed pretty quickly, but *after* that (between printing the message and returning from the program) the program hangs with 100% CPU usage for a very long time.

What is happening here? Can someone reproduce this? I reckon it has something to do with GC because if I replace new byte[...] with malloc, it works properly.
February 08, 2005
Niko Korhonen wrote:

> When compiled with DMD 0.112 this program executes strangely on my machine. The message "Walk complete" gets printed pretty quickly, but *after* that (between printing the message and returning from the program) the program hangs with 100% CPU usage for a very long time.
> 
> What is happening here? Can someone reproduce this? I reckon it has something to do with GC because if I replace new byte[...] with malloc, it works properly.

Same behaviour with GDC. A small pause before it starts "walking",
and an enormous CPU-eating hang at the very end of the program...
(not sure if it completes, since I terminated it when it got boring)

Replacing GC with malloc/free shows no pauses, and returns quickly.
Porting same program to Java it's slow to start, but finishes quickly...
Seems like the garbage collection still has a long way to go in D ?

--anders
February 08, 2005
In article <cuaalt$1lqi$1@digitaldaemon.com>, =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= says...
>
>Niko Korhonen wrote:
>
>> When compiled with DMD 0.112 this program executes strangely on my machine. The message "Walk complete" gets printed pretty quickly, but *after* that (between printing the message and returning from the program) the program hangs with 100% CPU usage for a very long time.
>> 
>> What is happening here? Can someone reproduce this? I reckon it has something to do with GC because if I replace new byte[...] with malloc, it works properly.
>
>Same behaviour with GDC. A small pause before it starts "walking", and an enormous CPU-eating hang at the very end of the program... (not sure if it completes, since I terminated it when it got boring)
>
>Replacing GC with malloc/free shows no pauses, and returns quickly. Porting same program to Java it's slow to start, but finishes quickly... Seems like the garbage collection still has a long way to go in D ?
>
>--anders

On Linux it finishes immediately with DMD v0.112, but thrashes with GDC v0.10. On Windows DMD v0.112 thrashes as you mention.

Here's what I get on Linux:

# time test_dmd
Walking through the array...
Walk complete

real    0m0.744s
user    0m0.254s
sys     0m0.350s

# time java -client -Xmx256M Test
Walking through the array...
Walk complete

real    0m1.279s
user    0m0.742s
sys     0m0.329s

# time java -server -Xmx256M Test
Walking through the array...
Walk complete

real    0m1.015s
user    0m0.489s
sys     0m0.328s

# time test_gdc
Walking through the array...
Walk complete

real    0m13.854s
user    0m12.692s
sys     0m0.355s

- Dave


February 08, 2005
Dave wrote:

>>Same behaviour with GDC. A small pause before it starts "walking",
>>and an enormous CPU-eating hang at the very end of the program...
>>(not sure if it completes, since I terminated it when it got boring)
> 
> On Linux it finishes immediately with DMD v0.112, but thrashes with GDC v0.10.
> On Windows DMD v0.112 thrashes as you mention.

I should have mentioned that I tested on Mac OS X, with GDC 0.10.
After trashing around for 30-40 seconds, it eventually completed.

--anders
February 08, 2005
In article <cua7va$1fcg$1@digitaldaemon.com>, Niko Korhonen says...
>
>Consider the following D program:
>
>int main()
>{
>	const int ARRAY_SIZE = 1024 * 1024 * 128;
>
>	ubyte[] array = new ubyte[ARRAY_SIZE];
>
>	printf("Walking through the array...\n");
>
>	for (int i = 0; i < ARRAY_SIZE; i++)
>		array[i] = cast(ubyte)i;
>
>	printf("Walk complete\n");
>	return 0;
>}
>
>As you can see, we allocate a large array and walk through it, setting the values to something.
>
>When compiled with DMD 0.112 this program executes strangely on my machine. The message "Walk complete" gets printed pretty quickly, but *after* that (between printing the message and returning from the program) the program hangs with 100% CPU usage for a very long time.
>
>What is happening here? Can someone reproduce this? I reckon it has something to do with GC because if I replace new byte[...] with malloc, it works properly.

Just a thought: I think the GC is trying to scan your 128MB of array data for additional references.

I haven't tried this, but perhaps use "std.gc.removeRange(array)" to tell the GC to "keep out".  Seeing as how it's raw data (all ubytes), the GC shouldn't need to scan it anyway.  Leaving it scannable just increases the likelyhood of false positives for object references while scanning... so this is really a best practice with large blobs of data.

Also, you can test the theory further by using "std.gc.disable()" and
"std.gc.enable()" to see *where* the GC is slowing things down.

- EricAnderton at yahoo
February 08, 2005
pragma wrote:

> I haven't tried this, but perhaps use "std.gc.removeRange(array)" to tell the GC
> to "keep out".  Seeing as how it's raw data (all ubytes), the GC shouldn't need
> to scan it anyway.  Leaving it scannable just increases the likelyhood of false
> positives for object references while scanning... so this is really a best
> practice with large blobs of data.

Adding std.gc.removeRange(array) did not make a difference, on Mac OS X.

default:
> 62.82s user 2.95s system 55% cpu 1:58.41 total

removeRange:
> 70.60s user 2.22s system 58% cpu 2:03.72 total

malloc/free:
> 3.09s user 1.00s system 67% cpu 6.096 total

Still takes several minutes to complete the program, when using GC...

--anders
February 08, 2005
pragma wrote:
> Also, you can test the theory further by using "std.gc.disable()" and
> "std.gc.enable()" to see *where* the GC is slowing things down.
> 
> - EricAnderton at yahoo

Unless it's been implemented without me noticing, these don't do anything. :)
February 09, 2005
pragma wrote:
> Just a thought: I think the GC is trying to scan your 128MB of array data for
> additional references.

That's what I though but then it should trash on Linux too. Aynway, since the GC knows it's a 128MB array of POD, why it would scan through it? And why it doesn't do so with DMD/Linux?

I tested it with a more realistic use case:

int main()
{
	const int ARRAY_SIZE = 1024 * 1024 * 32;
	
	ubyte[] array1 = new ubyte[ARRAY_SIZE];
	ubyte[] array2 = new ubyte[ARRAY_SIZE];
	ubyte[] array3 = new ubyte[ARRAY_SIZE];
	ubyte[] array4 = new ubyte[ARRAY_SIZE];
	
	printf("Walking through the arrays...\n");
	
	for (int i = 0; i < ARRAY_SIZE; i++)
		array1[i] = cast(ubyte)i;
		
	for (int i = 0; i < ARRAY_SIZE; i++)
		array2[i] = cast(ubyte)i;

	for (int i = 0; i < ARRAY_SIZE; i++)
		array3[i] = cast(ubyte)i;

	for (int i = 0; i < ARRAY_SIZE; i++)
		array4[i] = cast(ubyte)i;

	printf("Walk complete\n");
	
	return 0;
}

I'm still allocating 128 MB, but in smaller chunks. It works nice and smooth, no trashing.

Using smaller arrays definitely cures the problem, but I still would like to know whether it is normal/expected behaviour? And why it doesn't trash with DMD/Linux?
February 09, 2005
The following happens: when the program finishes, the heap needs to be walked and analyzed for pointers to something that is registered with the garbage collector as an object that needs finilization. The walking itself is not the problem, but the check for everything which looks like it might be a reference into GC memory requieres walking fairly large structures and perhaps one or another function call. Thus it is slow.

For the future, the gabage collector is likely to be made more intelligent to avoid scanning blocks of non-pointer data, but now it doesn't have this sophistication. Besides, better algorithms for discarding references which are definately not into GC heap might help - but i'm not sure whether there is any room for improvement left in this regard. One clue would be to compare different implementations - DMD on Windows uses it's own GC, GDC uses Boehm as far as i remember. I don't know what GC DMD on Linux uses.

For now i would just allocate such massive non-pointer data with malloc and encapsulate it in an object which has a destructor to de-allocate the memory.

-i.

Niko Korhonen wrote:
> Consider the following D program:
> 
> int main()
> {
>     const int ARRAY_SIZE = 1024 * 1024 * 128;
>         ubyte[] array = new ubyte[ARRAY_SIZE];
>         printf("Walking through the array...\n");
>         for (int i = 0; i < ARRAY_SIZE; i++)
>         array[i] = cast(ubyte)i;
>            printf("Walk complete\n");       return 0;
> }
> 
> As you can see, we allocate a large array and walk through it, setting the values to something.
> 
> When compiled with DMD 0.112 this program executes strangely on my machine. The message "Walk complete" gets printed pretty quickly, but *after* that (between printing the message and returning from the program) the program hangs with 100% CPU usage for a very long time.
> 
> What is happening here? Can someone reproduce this? I reckon it has something to do with GC because if I replace new byte[...] with malloc, it works properly.