Thread overview
D vs C code generation
Jul 17, 2004
Wayne Scott
Jul 19, 2004
Stephen Waits
Jul 20, 2004
Wayne Scott
Jul 20, 2004
Stephen Waits
July 17, 2004
I was doing some tests using the gdc compiler and comparing it to gcc.

First I created a C version of the example wc program:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

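/* Read a whole file into a NUL-terminated buffer; exit on any error. */
char *
readfile(char *file)
{
	char	*ret;
	int	fd;
	ssize_t	n;
	struct stat sb;

	/* Open first, then fstat the descriptor, so both refer to one file. */
	if ((fd = open(file, O_RDONLY)) < 0 || fstat(fd, &sb) < 0) {
		perror(file);
		exit(1);
	}
	ret = malloc(sb.st_size + 1);
	if (ret == NULL) {
		perror("malloc");
		exit(1);
	}
	n = read(fd, ret, sb.st_size);
	if (n < 0) {
		perror(file);
		exit(1);
	}
	ret[n] = 0;	/* terminate after the bytes actually read */
	close(fd);
	return (ret);
}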

int
main (int ac, char **av)
{
	int	w_total = 0;
	int	l_total = 0;
	int	c_total = 0;
	int	i;

	printf ("   lines   words   bytes file\n");
	for (i = 1; i < ac; i++) {
		char	*input;
		int	w_cnt = 0, l_cnt = 0, c_cnt = 0;
		int	inword = 0;
		char	*p;

		input = readfile(av[i]);

		p = input;
		while (*p) {
			if (*p == '\n') ++l_cnt;
			if (*p != ' ') {
				if (!inword) {
					inword = 1;
					++w_cnt;
				}
			} else {
				inword = 0;
			}
			++c_cnt;
			++p;
		}
		free(input);
		printf ("%8d%8d%8d %s\n", l_cnt, w_cnt, c_cnt, av[i]);
		l_total += l_cnt;
		w_total += w_cnt;
		c_total += c_cnt;
	}
	if (ac > 2) {
		printf ("--------------------------------------\n"
		    "%8d%8d%8d total\n",
		    l_total, w_total, c_total);
	}
	return (0);
}

Then I compiled both versions with -O2 and used cachegrind to find out exactly how many instructions each one needed to run.  Here are the results, with the C version first.

(This is over 2 megs of C source)
$ valgrind --tool=cachegrind ./wc_c ~/bk/bk-3.3.x/src/*.c > /dev/null
==3349== I   refs:      22,529,481
==3349== I1  misses:           784
==3349== L2i misses:           778
==3349== I1  miss rate:        0.0%
==3349== L2i miss rate:        0.0%
==3349==
==3349== D   refs:       2,393,366  (2,262,770 rd + 130,596 wr)
==3349== D1  misses:        10,159  (    9,671 rd +     488 wr)
==3349== L2d misses:         9,680  (    9,315 rd +     365 wr)
==3349== D1  miss rate:        0.4% (      0.4%   +     0.3%  )
==3349== L2d miss rate:        0.4% (      0.4%   +     0.2%  )
==3349==
==3349== L2 refs:           10,943  (   10,455 rd +     488 wr)
==3349== L2 misses:         10,458  (   10,093 rd +     365 wr)
==3349== L2 miss rate:         0.0% (      0.0%   +     0.2%  )
farm Dlang $ valgrind --tool=cachegrind ./wc_d ~/bk/bk-3.3.x/src/*.c > /dev/null
==3351== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==3351== I   refs:      29,081,497
==3351== I1  misses:         1,216
==3351== L2i misses:         1,199
==3351== I1  miss rate:        0.0%
==3351== L2i miss rate:        0.0%
==3351==
==3351== D   refs:       4,891,118  (3,663,754 rd + 1,227,364 wr)
==3351== D1  misses:        61,871  (   24,677 rd +    37,194 wr)
==3351== L2d misses:        60,880  (   23,757 rd +    37,123 wr)
==3351== D1  miss rate:        1.2% (      0.6%   +       3.0%  )
==3351== L2d miss rate:        1.2% (      0.6%   +       3.0%  )
==3351==
==3351== L2 refs:           63,087  (   25,893 rd +    37,194 wr)
==3351== L2 misses:         62,079  (   24,956 rd +    37,123 wr)
==3351== L2 miss rate:         0.1% (      0.0%   +       3.0%  )

As you can see, the D version of the code used about 30% more instructions
and about 100% more data accesses.
(BTW the system wc program was a lot slower than both of these...)

That is not too bad given the benefits, but I was hoping they would
be closer.  Originally I was seeing MUCH different results, but I was
using smaller input sets.  D has a much higher startup overhead.

Next I tried making the D code look like my C version, without the dynamic arrays and just using pointers.  It didn't really change the numbers at all.  Adding -fno-bounds-check didn't help either.  That is a good sign, because it means that the array code generates the same code you would write using pointers.

Anyway I thought the result was interesting...

-Wayne
July 19, 2004
Wayne Scott wrote:
> Anyway I thought the result was interesting...

Interesting indeed.  Can you please say which versions of dmd and gcc you used?

Thanks,
Steve
July 20, 2004
In article <cdhcbc$no3$2@digitaldaemon.com>,
Stephen Waits  <steve@waits.net> wrote:
>Wayne Scott wrote:
>> Anyway I thought the result was interesting...
>
>Interesting indeed.  Can you please say which versions of dmd and gcc you used?
>
>Thanks,
>Steve


Ahh yes, I did leave out that information.

I used release 1f of the D gcc frontend from here:
	http://home.earthlink.net/~dvdfrdmn/d/

built on top of GCC 3.3.4, and compared it to that same gcc.

I also tried rebuilding with the Linux version of the official compiler, and I got this result with -O -release:

==22599== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
==22599== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==22599== Using valgrind-2.1.1, a program supervision framework for x86-linux.
==22599== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==22599== For more details, rerun with: -v
==22599==
==22599==
==22599== I   refs:      23,711,446
==22599== I1  misses:         1,066
==22599== L2i misses:         1,055
==22599== I1  miss rate:        0.0%
==22599== L2i miss rate:        0.0%
==22599==
==22599== D   refs:       7,230,055  (6,404,429 rd + 825,626 wr)
==22599== D1  misses:        48,292  (   10,964 rd +  37,328 wr)
==22599== L2d misses:        46,787  (    9,685 rd +  37,102 wr)
==22599== D1  miss rate:        0.6% (      0.1%   +     4.5%  )
==22599== L2d miss rate:        0.6% (      0.1%   +     4.4%  )
==22599==
==22599== L2 refs:           49,358  (   12,030 rd +  37,328 wr)
==22599== L2 misses:         47,842  (   10,740 rd +  37,102 wr)
==22599== L2 miss rate:         0.1% (      0.0%   +     4.4%  )

That is close to the number of instructions in the C version (about 5% more), but over 3X the number of D refs.  The number of D1 misses was no higher than with gdc (48,292 vs. 61,871), so the extra loads and stores were probably all on the stack.

-Wayne

PS: Does anyone read this newsgroup, or should I have posted this stuff
    to the digitalmars.D newsgroup?
July 20, 2004
Wayne Scott wrote:

> PS: Does anyone read this newsgroup, or should I have posted this stuff
>     to the digitalmars.D newsgroup?

Probably wouldn't hurt to cross post it over there.  It does involve DMD in addition to gcc, so it's on-topic in both groups.

You'll definitely get more response in the main group.

--Steve