Thread overview
Static arrays problem
Dec 17, 2008
bearophile
Dec 17, 2008
Sean Kelly
Dec 17, 2008
bearophile
Dec 17, 2008
BCS
Dec 17, 2008
bearophile
Dec 17, 2008
BCS
Dec 17, 2008
bearophile
Dec 18, 2008
BCS
Dec 17, 2008
Sean Kelly
Dec 18, 2008
Denis Koroskin
December 17, 2008
At the bottom of this post there are two little test programs that may show a problem with static arrays.

D code compiled with DMD 1.038:
dmd test_d.d
Compile time: 8.29 s
Resulting exe and obj:
17.756.188 test_d.exe
     2.390 test_d.map
17.734.261 test_d.obj


C code compiled with DMC 8.42n:
dmc test_c.c
Compile time: 0.04 s
Resulting exe:
 51.740 test_c.exe


C code compiled with GCC (4.2.1):
gcc test_c.c -o test_c
Compile time: 0.16 s
Resulting exe:
    16.781 test_c.exe


C code compiled with LLVM-GCC (2.5):
llvm-gcc test_c.c -o test_c
Compile time: 0.11 s
Resulting exe:
    13.462 test_c.exe


(The following code is just reduced toy, but the full program runs up to about 13 times faster with static arrays, so I'd like to use them instead of the dynamic ones.)

Using obj2asm on the test_d.obj it's made mostly of:

_D6test_d1aG3G65G65G130f:
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
...

-------------------------

// C code:

#include "stdio.h"

#define N 65
#define M (N+N)

float a[3][N][N][M];
float b[4][N][N][M];
float c[N][N][M];

int main() {
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++) {
                a[0][i][j][k] = 0.0;
                a[1][i][j][k] = 0.0;
                a[2][i][j][k] = 0.0;
                b[0][i][j][k] = 0.0;
                b[1][i][j][k] = 0.0;
                b[2][i][j][k] = 0.0;
                b[3][i][j][k] = 0.0;
                c[i][j][k] = 0.0;
            }

    printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
    return 0;
}

-------------------------

// D code, it's the same,
// I have just changed the first 3 lines.

import std.c.stdio: printf;

const int N = 65;
const int M = (N+N);

float a[3][N][N][M];
float b[4][N][N][M];
float c[N][N][M];

int main() {
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++) {
                a[0][i][j][k] = 0.0;
                a[1][i][j][k] = 0.0;
                a[2][i][j][k] = 0.0;
                b[0][i][j][k] = 0.0;
                b[1][i][j][k] = 0.0;
                b[2][i][j][k] = 0.0;
                b[3][i][j][k] = 0.0;
                c[i][j][k] = 0.0;
            }

    printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
    return 0;
}

Bye,
bearophile
December 17, 2008
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> At the bottom of this post there are two little test programs that may show a problem with static
arrays.

Could you describe what the problem is?

> -------------------------
> // D code, it's the same,
> // I have just changed the first 3 lines.
> import std.c.stdio: printf;
> const int N = 65;
> const int M = (N+N);
> float a[3][N][N][M];
> float b[4][N][N][M];
> float c[N][N][M];

Consider changing this to:

float a[3][N][N][M] = void;
float b[4][N][N][M] = void;
float c[N][N][M] = void;

You don't need default initialization if you're just going to manually initialize to 0.0 anyway, so eliminate it.


Sean
December 17, 2008
Sean Kelly:
> Could you describe what the problem is?

The problem is that the D code produces an huge executable.


> Consider changing this to:
> float a[3][N][N][M] = void;
> float b[4][N][N][M] = void;
> float c[N][N][M] = void;
> You don't need default initialization if you're just going to
> manually initialize to 0.0 anyway, so eliminate it.

Thank you, that fixes the problem, the exe size is now a more normal 180.252 bytes :-)

I think the D compiler has to take a look at the size of the static arrays, if they are too much large, then their initialization has to be done with a loop, avoiding the creation of such huge executables.

Bye,
bearophile
December 17, 2008
Reply to bearophile,

> the exe size is now a more normal 180.252 bytes :-)
> 
> I think the D compiler has to take a look at the size of the static
> arrays, if they are too much large, then their initialization has to
> be done with a loop, avoiding the creation of such huge executables.
> 

OTOH that could end up large boot times.


December 17, 2008
BCS:
> OTOH that could end up large boot times.

Why? Is a "cleaning loop" with memset (done with modern CPU instructions) slower than the loading of a 17 MB executable?

Bye,
bearophile
December 17, 2008
Reply to bearophile,

> BCS:
> 
>> OTOH that could end up large boot times.
>> 
> Why? Is a "cleaning loop" with memset (done with modern CPU
> instructions) slower than the loading of a 17 MB executable?
> 
> Bye,
> bearophile

It depends. If the array is not accessed all at once, the OS needs only to set up the virtual memory maps and then page fault in the pages as needed. If the array is larger than the amount of memory that the OS would like to allow the program to keep in memory it could get even worse as it will start paging stuff out before the app even gets to main() (OTOH IIRC OPTLINK won't allow that big a static array).


December 17, 2008
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> I think the D compiler has to take a look at the size of the static arrays, if they are too much large, then their initialization has to be
done with a loop, avoiding the creation of such huge executables.

TypeInfo should probably contain an init(void[]) routine instead of a void[]
member which is byte copied onto the new block.  Then, initializers that
are easy to decompose could at least be turned into a series of memset
operations.  For built-in types, this should be very easy to do, since the
functions could be hand-coded as opposed to requiring code generation
from the compiler.


Sean
December 17, 2008
BCS:
> It depends. If the array is not accessed all at once, the OS needs only to set up the virtual memory maps and then page fault in the pages as needed. If [...]

I have timed tree versions of that little code:
D without = void: first run 1.85 s, successive ones: 0.11 s
D with = void: first run 0.24 s, successive ones: 0.10 s
C compiled with DMC: first run: 0.10 s, successive ones: 0.06 s
C compiled with GCC: first run: 0.07 s, successive ones: 0.05 s

(Compilations with no arguments. GCC compiled with -s -Os). So that huge executable is bad for the speed too.

Bye,
bearophile
December 18, 2008
Reply to bearophile,

> BCS:
> 
>> It depends. If the array is not accessed all at once, the OS needs
>> only to set up the virtual memory maps and then page fault in the
>> pages as needed. If [...]
>> 
> I have timed tree versions of that little code:
> D without = void: first run 1.85 s, successive ones: 0.11 s
> D with = void: first run 0.24 s, successive ones: 0.10 s
> C compiled with DMC: first run: 0.10 s, successive ones: 0.06 s
> C compiled with GCC: first run: 0.07 s, successive ones: 0.05 s
> (Compilations with no arguments. GCC compiled with -s -Os). So that
> huge executable is bad for the speed too.
> 
> Bye,
> bearophile

That dosn't test what I was looking at. Try it without the init loop, I'll bet you won't see any difference between the cases at all.


December 18, 2008
On Thu, 18 Dec 2008 00:52:43 +0300, bearophile <bearophileHUGS@lycos.com> wrote:

> At the bottom of this post there are two little test programs that may show a problem with static arrays.
>
> D code compiled with DMD 1.038:
> dmd test_d.d
> Compile time: 8.29 s
> Resulting exe and obj:
> 17.756.188 test_d.exe
>      2.390 test_d.map
> 17.734.261 test_d.obj
>
>
> C code compiled with DMC 8.42n:
> dmc test_c.c
> Compile time: 0.04 s
> Resulting exe:
>  51.740 test_c.exe
>
>
> C code compiled with GCC (4.2.1):
> gcc test_c.c -o test_c
> Compile time: 0.16 s
> Resulting exe:
>     16.781 test_c.exe
>
>
> C code compiled with LLVM-GCC (2.5):
> llvm-gcc test_c.c -o test_c
> Compile time: 0.11 s
> Resulting exe:
>     13.462 test_c.exe
>
>
> (The following code is just reduced toy, but the full program runs up to about 13 times faster with static arrays, so I'd like to use them instead of the dynamic ones.)
>
> Using obj2asm on the test_d.obj it's made mostly of:
>
> _D6test_d1aG3G65G65G130f:
> 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
> 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
> 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
> 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
> ...
>
> -------------------------
>
> // C code:
>
> #include "stdio.h"
>
> #define N 65
> #define M (N+N)
>
> float a[3][N][N][M];
> float b[4][N][N][M];
> float c[N][N][M];
>
> int main() {
>     int i, j, k;
>
>     for (i = 0; i < N; i++)
>         for (j = 0; j < N; j++)
>             for (k = 0; k < M; k++) {
>                 a[0][i][j][k] = 0.0;
>                 a[1][i][j][k] = 0.0;
>                 a[2][i][j][k] = 0.0;
>                 b[0][i][j][k] = 0.0;
>                 b[1][i][j][k] = 0.0;
>                 b[2][i][j][k] = 0.0;
>                 b[3][i][j][k] = 0.0;
>                 c[i][j][k] = 0.0;
>             }
>
>     printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
>     return 0;
> }
>
> -------------------------
>
> // D code, it's the same,
> // I have just changed the first 3 lines.
>
> import std.c.stdio: printf;
>
> const int N = 65;
> const int M = (N+N);
>
> float a[3][N][N][M];
> float b[4][N][N][M];
> float c[N][N][M];
>
> int main() {
>     int i, j, k;
>
>     for (i = 0; i < N; i++)
>         for (j = 0; j < N; j++)
>             for (k = 0; k < M; k++) {
>                 a[0][i][j][k] = 0.0;
>                 a[1][i][j][k] = 0.0;
>                 a[2][i][j][k] = 0.0;
>                 b[0][i][j][k] = 0.0;
>                 b[1][i][j][k] = 0.0;
>                 b[2][i][j][k] = 0.0;
>                 b[3][i][j][k] = 0.0;
>                 c[i][j][k] = 0.0;
>             }
>
>     printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
>     return 0;
> }
>
> Bye,
> bearophile


This is a linker issue. DMD stores a, b,c AND a.init, b.init, c.init (doubling data size).
Intelligent linker could strip a.init, b.init and c.init if they are never accessed.