January 28, 2003
Russ Lewis wrote:
> Burton Radons wrote:
> 
> 
>>Maybe we should get rid of function pointer types altogether.  They're
>>in the same fix as wchar; a language interface type that is badly
>>supported and atrophying.  You'll still be able to get the address of a
>>function, but it'll be as a void pointer.
> 
> You could do that, except that we would need some syntax for interfacing with
> C code that requires function pointers.

Yeah, I think removing them wouldn't work; any method I can think of for calling them would be too asstastic, and unlike bitfields, there are going to be more function pointers in the future.  Once delegates can be taken from a function, the pressure to have parallel APIs for both delegates and function pointers will be removed.

You could even cast a function pointer to a delegate and vice versa with a little dynamic machine code generation.  Hm.
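To make the representation concrete, here's a rough sketch in C, assuming a delegate is stored as a context pointer plus a function pointer; the names and the hand-written forwarder are just for illustration, not D's actual ABI.  Generating the equivalent of the forwarder on the fly is what the machine code idea above amounts to.

    #include <stdio.h>

    /* Hypothetical layout: a delegate as a (context, function) pair. */
    typedef struct {
        void *context;                  /* "this"; null when wrapping a plain function */
        int (*func)(void *self, int);   /* entry point that expects a context argument */
    } Delegate;

    /* A plain function with no "this" parameter. */
    int plain(int x) { return x * 2; }

    /* A hand-written forwarder that simply drops the unused context; a dynamic
     * thunk would play the same role without being written by hand. */
    int plain_forwarder(void *unused_self, int x) { (void)unused_self; return plain(x); }

    Delegate wrap_plain(void) {
        Delegate d = { 0, plain_forwarder };
        return d;
    }

    int main(void) {
        Delegate d = wrap_plain();
        printf("%d\n", d.func(d.context, 21));   /* prints 42 */
        return 0;
    }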

January 28, 2003
Burton Radons <loth@users.sourceforge.net> wrote:
> The delegate of a function would be a minifunction that wraps the call properly:
> 
>      popl %eax // Put the return EIP in EAX
>      movl %eax, (%esp) // Cover the null "this" pointer
>      call function // Execute the real function
>      jmp (%esp) // Jump to the caller
> 
> Because the caller cleans up the stack in extern (D), we can't just substitute a "jmp function" in there and skip the last instruction; if we could, this could just be a couple bytes right before the real function and not have a jmp at all.

What about modifying the ABI so that "this" is the last argument, rather than the first?  Then, if it's not needed, it sits harmlessly on the stack like any other local variable of the caller.

For ISAs with more registers, it may be better to dedicate an argument register to holding "this", so that the change doesn't incur a performance penalty by moving "this" from a register to the stack when the supply of argument registers is exhausted.  Of course, this assumes that non-static methods are more frequent than static methods and plain functions, which would be left with one less argument register...
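To illustrate the idea at the ABI level: this is not portable C (calling through a mismatched function pointer type is undefined behaviour in the language), but under a 32-bit cdecl-style caller-cleans-up convention it shows why the trailing, unused "this" is harmless.  All the names are made up for the example.

    #include <stdio.h>

    int add(int a, int b) { return a + b; }          /* plain function: two args */

    typedef int (*with_this_fn)(int, int, void *);   /* delegate-style: trailing "this" */

    int main(void) {
        /* The caller pushes a, b and a null "this"; add() only ever reads a and b,
         * and the caller pops all three words afterwards, so the extra word is ignored. */
        with_this_fn f = (with_this_fn)add;
        printf("%d\n", f(1, 2, NULL));
        return 0;
    }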

-Scott
January 28, 2003
Burton Radons <loth@users.sourceforge.net> wrote:
> If we put the this pointer at the end of the arguments this wouldn't be a problem.

Oops, I missed this... :-(

>  But that makes COM interfacing impossible.

Though, the spec says that COM functions already need to use
extern (Windows), so a change specifically to the D ABI shouldn't
affect them.  Of course, that raises the issue of what to do when
extern (Windows) functions (or functions using any other ABI that
may be supported) are placed in a delegate, but it's better than
having to apply the workaround to all functions.

-Scott
January 30, 2003
Ilya Minkov wrote:
...
> The run-time performance of Dino's parser is about 30_000 lines of C code per second on a 500 MHz P6, which I consider usually enough. And it requires very little time to read in the grammar.  It seems to me that parsing speed is not that important, since GCC uses a very fast parser and is yet slow as hell; in fact, it is the absolutely slowest compiler I've ever experienced. General design is of major importance.

GCC isn't poorly designed as far as I can tell; it is slow as hell though. Assuming my cursory profiling of gcc is right, the single most expensive thing gcc does is garbage collection.  The next most expensive is parsing.  Together they make up so much of the compilation time that the back end seems irrelevant.  I've been playing with 3.3 and 3.4 via CVS lately and following some of the lists, and it looks like with a few gc tunables you can instantly squeeze 25-40% more performance out of it.
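For reference, the tunables I mean are gcc's garbage-collector --param knobs; the names below are from memory of the 3.3/3.4 tree, and the right values depend on how much RAM you have, so treat this as a sketch rather than a recipe:

    gcc --param ggc-min-expand=100 --param ggc-min-heapsize=131072 -O2 -c foo.c

Raising ggc-min-heapsize (in kB) and ggc-min-expand makes the collector run less often, at the cost of a larger peak memory footprint.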

Why they seem to have been neglected I have no idea.  Maybe the gcc hackers are using super beefy hardware and hadn't started seriously looking at the problem until lately, when gcc 3.2.x became widely available and provided them with lots of complaints.

General "design" as I read here and usually see tends to be restricted to considering long-term dominant characteristics.  Have you ever seen how fast tinycc compiles C code?  Lots of other compilers use the same kind of "algorithm" but so far as I can tell, the reason why its so fast is because it parses everything in one pass.



January 30, 2003
Have I missed something here?  Who cares how fast or slow the compiler is;
the important question is: does it generate fast code?
On x86 the order is (from some table I saw online last year):
lcc (very poor)
bc++, dmc (sorry Walter, just going from figures I've seen), and some gcc's,
all about the same
egcs and newer gcc a bit better
VC++ generated code also twice the speed of lcc and 10 to 25% faster than
bcc
Intel's plugin for VS at the top

The next point of concern to me is: can I write code that's readable and know the compiler will optimise it fully, or do I have to write optimised C to get the performance?

gcc suffers slightly from having such a range of backends.  Unlike tinycc,
gcc generates an intermediate form of the code and passes that to the
backend, and I would expect that its optimiser uses up a few cycles.  I
believe it performs at least two optimiser phases: first on the intermediate
code (looking for loop invariants etc.), then a simple peephole optimiser on
the generated code.
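As a toy example of the loop-invariant case (the function and names are made up for illustration), the multiplication by values that never change inside the loop can be hoisted out of it:

    /* before: scale * offset is recomputed every iteration (it is loop-invariant) */
    void scale_all(int *a, int n, int scale, int offset) {
        for (int i = 0; i < n; i++)
            a[i] = a[i] * (scale * offset);
    }

    /* roughly what the optimiser turns it into: the invariant is computed once */
    void scale_all_hoisted(int *a, int n, int scale, int offset) {
        int k = scale * offset;
        for (int i = 0; i < n; i++)
            a[i] = a[i] * k;
    }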
It also makes temporary files for transferring info between the front end and
backend, afaik, and it is this write/read file access that will kill
performance, especially with files bigger than the file cache.  If you are
playing with gcc, you might want to try using named pipes for the connection
between front end and backend on win32.

I just timed some gcc compiles.  I have a 6Mb, 75 file GBA project;
compiling it 4 times (4 different configs)
takes 1min 30sec at -O3, 1min 25 at -O1 and 1min 20 with no optimisation.
The longest file is a 4Mb lookup table (int lut[16][0x8000]) which takes 12
seconds to compile.
It is only built into one version, and that version takes about 35 to 40
seconds to build.
This is on a Duron 800, 512Mb RAM, UDMA66 disks, Win2K (and all manner of junk
running in the bg).

I remember having a 2Mb Turbo Vision project that used to take over 30
minutes to compile on a P90.
That I call slow; but gcc, on current hardware (800 not exactly fast these
days), I don't consider slow.
I'll have to try on a really slow machine (celeron 400).

gc can be a killer to performance if you have HUGE numbers of objects to walk. I made the mistake once of setting the Java heap size bigger than my physical memory before running javadoc over some source (this was at 6pm). I went out, stayed at a friend's overnight, got home and it was still parsing the files; every gc cycle was causing swapping. I think it took about 46 hours in the end. Later I set the heap size just below the available mem, and it took 4 hours instead :)

like oo and templating, gc is just another double edged sword the programmer has to learn to work with.

Mike.

"Garen Parham" <garen_nospam_@wsu.edu> wrote in message news:b1b7na$65g$1@digitaldaemon.com...
> Ilya Minkov wrote:
> ...
> > The run-time performance of a Dino's parser is about 30_000 C code lines per second on a 500Mhz P6, which i consider usually enough. And it requieres very little time to read in the grammar.  It seems to me that parsing speed is not that important, since GCC uses a very fast parser, and is yet slow as hell. In fact the absolutely slowest compiler I've ever experienced. General design is of major importance.
>
> GCC isn't poorly designed as far as I can tell; it is slow as hell though. Assuming my cursory profiling of gcc is right, the single most expensive thing gcc does is garbage collection.  The next most is parsing.  Both of them make up the so much of compilation time that the backend seems irrelevent.  I've been playing with 3.3 and 3.4 via CVS lately and follow some of the lists, and it looks like with a few gc tunables you can instantly squeeze out 25-40% more performance from it.
>
> Why they seem to have been neglected I have no idea.  Maybe the gcc
hackers
> are using super beefy hardware and haven't started seriously looking at
the
> problem until lately when gcc 3.2.x was widely available to provide them with lots of complaints.
>
> General "design" as I read here and usually see tends to be restricted to considering long-term dominant characteristics.  Have you ever seen how fast tinycc compiles C code?  Lots of other compilers use the same kind of "algorithm" but so far as I can tell, the reason why its so fast is
because
> it parses everything in one pass.
>
>
>


January 30, 2003
Mike Wynn wrote:

> Have I missed something here?  Who cares how fast or slow the compiler is;
> the important question is: does it generate fast code?
> On x86 the order is (from some table I saw online last year):
> lcc (very poor)
> bc++, dmc (sorry Walter, just going from figures I've seen), and some gcc's,
> all about the same
> egcs and newer gcc a bit better
> VC++ generated code also twice the speed of lcc and 10 to 25% faster than
> bcc
> Intel's plugin for VS at the top

Code generation is more important, but compile-time performance is very important too.  When testing huge source trees it can mean a difference of days of lost time.  All the waiting while developing adds up real fast as well.

I use icc 7.0 regularly; it has -O2 on by default and is 100-200% faster than gcc/g++ with no optimization, so I don't think it's slow at all. It also uses the best C++ front end IMO and generates superior error messages.

> The next point of concern to me is: can I write code that's readable and know the compiler will optimise it fully, or do I have to write optimised C to get the performance?
> 
> gcc suffers slightly from having such a range of backends.  Unlike tinycc, gcc generates an intermediate form of the code and passes that to the backend, and I would expect that its optimiser uses up a few cycles.  I believe it performs at least two optimiser phases: first on the intermediate code (looking for loop invariants etc.), then a simple peephole optimiser on the generated code.

There are lots of optimization passes that can be enabled, but the total time they take is minuscule IME.

> It also makes temporary files for transferring info between the front end and backend, afaik, and it is this write/read file access that will kill performance, especially with files bigger than the file cache.  If you are playing with gcc, you might want to try using named pipes for the connection between front end and backend on win32.

Using the -pipe flag won't generate temporaries.
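For example, assuming a typical standalone compile, something like:

    gcc -pipe -O2 -c foo.c -o foo.o

makes the driver connect the preprocessor, compiler proper and assembler with pipes instead of temporary files.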

> 
> I remember having a 2Mb Turbo Vision project that used to take over 30
> minutes to compile on a P90
> that, I call slow, but gcc, on current hardware (800 not exactly fast these
> days) I don't consider slow.
> I'll have to try on a really slow machine (celeron 400)
> 

When I first set up my environment to use tcc instead, I hit F8 to compile.
It was so fast I just sat there wondering why nothing had happened.
I didn't realize it had compiled already!


> 
> like oo and templating, gc is just another double edged sword the programmer has to learn to work with.
> 

I don't follow that one.

January 30, 2003

> >
> > like oo and templating, gc is just another double edged sword the programmer
> > has to learn to work with.
> >
>
> I don't follow that one.
>

People were complaining that gcc is slow because it has gc (garbage
collection).
If used properly, gc can be faster (you avoid all those copy constructors,
and the code to keep track of live objects).
You can also code complex meshes of objects without worrying about who 'owns'
what.
On the down side, the memory footprint can be bigger (you have to wait for the
gc to run before you get your memory back).
It all depends on the type of app you are writing.
What can I say: it can be great if used in the right place, and it can be a pain
if used where it should not be.
Just like void*, OO, templates, inner classes, nested classes, closures,
inline asm etc. etc.
They all have their uses, and the more you use them, the more you know when it's
right to do X and when X is going to bite back when you're not looking.




January 30, 2003
Garen Parham wrote:
> GCC isn't poorly designed as far as I can tell; it is slow as hell though.
> Assuming my cursory profiling of gcc is right, the single most expensive
> thing gcc does is garbage collection.  The next most expensive is parsing.
> Together they make up so much of the compilation time that the back end
> seems irrelevant.  I've been playing with 3.3 and 3.4 via CVS lately and
> following some of the lists, and it looks like with a few gc tunables you
> can instantly squeeze 25-40% more performance out of it.

Does it do GC? Then why does it swap like hell on my 64mb notebook running lightweight win98? I've seen my allocated virtual memory constantly grow. It looks like it's plugged in where it doesn't do much? Perhaps non-GC-friendly data organisation?

Besides, if it's boehm GC, it shouldn't be a significant performance loss. At least if it doesn't run the whole time. But yes, it would run the whole time if a system is forced to swap. :/

> Why they seem to have been neglected I have no idea.  Maybe the gcc hackers
> are using super beefy hardware and haven't started seriously looking at the
> problem until lately when gcc 3.2.x was widely available to provide them
> with lots of complaints.

:/

> General "design" as I read here and usually see tends to be restricted to
> considering long-term dominant characteristics.  Have you ever seen how
> fast tinycc compiles C code?  Lots of other compilers use the same kind of
> "algorithm" but so far as I can tell, the reason why its so fast is because
> it parses everything in one pass.

"It uses multiple simple short passes", it says in tinycc docs. And it also says that the only optimisations made are constant wrapping and replacements within single instructions (MUL to shift, ADD to INC, and so on), i.e. at generation. Unlike GCC or even LCC, it doesn't have means to edit generated code in any manner. It doesn't store an IR, i guess. And of course, it uses no intermediate assembly language file.

I have a huge number of documents on my HD describing different back-end generators for LCC. The major topic is rewriting the IR tree, selecting optimal choices guided by system-specific instruction costs. I haven't had time to read them though, and I won't for a short while.

And yet, LCC is reasonably fast. LCC-Win32 gains additional performance because it doesn't save ASM files like the original LCC and GCC do, but feeds the assembly to an internal assembler. But the assembly is still text, which is IMO simply stupid. It could be some uniform-sized binary data, which is easy to analyse and can be converted to real machine code with only a few shifts. LCC-Win32 adds a peephole optimizer to LCC, which tags each text assembly instruction with a simple binary "description", and then does a simple pattern search-and-replace between labels, using the tags as the primary guidance and parsing single instructions only when needed. Thanks to the tags, the optimisation phase is very fast, and seems to add about 1/5 to compilation time. Simply imagine what it would cost if it had to parse ASM over and over. And GCC's performance is very low as well with optimizations turned off. The LCC-Win32 author claims that a small number of simple optimisations gets to about 90% of GCC 2.95 code performance on P6-class machines, so it doesn't seem much like a speed-quality tradeoff, rather some deficiency in GCC.
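As a rough sketch of that tagging idea (my own toy code, not LCC-Win32's actual implementation): keep the textual instruction, attach a small binary tag word to it, and let the peephole pass look only at the tags until it actually has to rewrite something.

    #include <stdio.h>

    enum {
        TAG_MOV   = 1 << 0,   /* instruction is a mov             */
        TAG_IMM0  = 1 << 1,   /* source operand is the constant 0 */
        TAG_REG   = 1 << 2,   /* destination is a register        */
        TAG_LABEL = 1 << 3    /* pseudo-instruction: a label      */
    };

    typedef struct {
        char text[32];        /* the assembly text as emitted */
        unsigned tags;        /* cheap binary description     */
    } Insn;

    /* Replace "mov reg, 0" with the shorter "xor reg, reg".  Candidates are
     * found by looking at the tag word alone; the text is parsed only when a
     * candidate is actually rewritten. */
    static void peephole(Insn *code, int n) {
        for (int i = 0; i < n; i++) {
            unsigned want = TAG_MOV | TAG_IMM0 | TAG_REG;
            if ((code[i].tags & want) == want) {
                char reg[8];
                if (sscanf(code[i].text, "mov %7[^,],", reg) == 1)
                    snprintf(code[i].text, sizeof code[i].text, "xor %s, %s", reg, reg);
            }
        }
    }

    int main(void) {
        Insn code[] = {
            { "L1:",        TAG_LABEL },
            { "mov eax, 0", TAG_MOV | TAG_IMM0 | TAG_REG },
            { "mov ebx, 5", TAG_MOV | TAG_REG }
        };
        peephole(code, 3);
        for (int i = 0; i < 3; i++)
            puts(code[i].text);
        return 0;
    }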

Avoiding the assembly phase is actually very simple. VCODE solves it in the following ANSI C-compliant manner: a number of preprocessor macros are written, one for each opcode, each of which generates the corresponding binary instruction using a couple of ANDs, ORs and shifts and pushes it onto a kind of software stack. These macros then only need to be invoked in place of generating assembly text. Well, they obviously cannot be stored in an IR like text can, but some intermediate solution is imaginable. VCODE's IR storage is ICODE, except that these are made for runtime code generation and represent a generalized RISC command set. But "back-ends" generating all kinds of machine code out of them exist.
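Something in the spirit of those macros (a made-up toy encoding, not VCODE's real interface): each emit macro packs the fields with shifts and ORs and appends the finished instruction word to a code buffer, so no assembly text ever exists.

    #include <stdint.h>
    #include <stdio.h>

    #define OP_ADD 0x01u
    #define OP_SUB 0x02u

    /* Toy 32-bit format: opcode in bits 24..31, rd in 16..23, rs1 in 8..15, rs2 in 0..7. */
    #define ENC3(op, rd, rs1, rs2) \
        (((uint32_t)(op) << 24) | ((uint32_t)(rd) << 16) | \
         ((uint32_t)(rs1) << 8) | (uint32_t)(rs2))

    #define EMIT_ADD(buf, rd, rs1, rs2) (*(buf)++ = ENC3(OP_ADD, rd, rs1, rs2))
    #define EMIT_SUB(buf, rd, rs1, rs2) (*(buf)++ = ENC3(OP_SUB, rd, rs1, rs2))

    int main(void) {
        uint32_t code[16];
        uint32_t *pc = code;

        EMIT_ADD(pc, 1, 2, 3);   /* r1 = r2 + r3 */
        EMIT_SUB(pc, 4, 1, 2);   /* r4 = r1 - r2 */

        for (uint32_t *p = code; p < pc; p++)
            printf("%08x\n", (unsigned)*p);
        return 0;
    }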

-i.

January 30, 2003
Mike Wynn wrote:
> I remember having a 2Mb Turbo Vision project that used to take over 30
> minutes to compile on a P90
> that, I call slow, but gcc, on current hardware (800 not exactly fast these
> days) I don't consider slow.
> I'll have to try on a really slow machine (celeron 400)

My main development computer is a notebook, Pentium MMX 233 MHz, 64 MB, Win98, which is lightweight enough that I often carry it with me to the university. Surprisingly enough, my math professor has a similar notebook, although he's not on a limited budget like I am.

> gc can be a killer to performance if you have HUGE amounts of object to
> walk, I made the mistake once of setting the java heap size bigger than my
> physical memory before running javadoc over some source, (this was at 6pm) I
> went out, stayed at a friends over night, got home and it was still parsing
> the files, every gc cycle was causing swapping, I think it took about 46
> hour in the end, later I set the heap size just below the available mem, it
> took 4 hours instead :)

Probably because it did a re-scan every time it hit the memory limit, and/or went into swapping. As long as it doesn't scan often, there should be no significant performance loss.

> like oo and templating, gc is just another double edged sword the programmer
> has to learn to work with.

Sure. But just as OO and templating are very useful (even if not for all tasks), GC is as well.

-i.

February 01, 2003
Ilya Minkov wrote:

> 
> Does it do GC? Then why does it swap like hell on my 64mb notebook running lightweight win98? I've seen my allocated virtual memory constantly grow. It looks like it's plugged in where it doesn't do much? Perhaps non-GC-friendly data organisation?
> 
> Besides, if it's boehm GC, it shouldn't be a significant performance loss. At least if it doesn't run the whole time. But yes, it would run the whole time if a system is forced to swap. :/
> 

Yeah, GCC uses the Boehm GC.  C and C++ supposedly aren't very amenable to being GC'd, but I could hardly think it would amount to as much slowness as GCC shows.

> "It uses multiple simple short passes", it says in tinycc docs. And it also says that the only optimisations made are constant wrapping and replacements within single instructions (MUL to shift, ADD to INC, and so on), i.e. at generation. Unlike GCC or even LCC, it doesn't have means to edit generated code in any manner. It doesn't store an IR, i guess. And of course, it uses no intermediate assembly language file.

Yeah, it does hardly anything at all for optimization.  But with other compilers not getting even close without any optimization settings turned on, it seems they could do way better.

> I have a huge number of documents on my HD, describing different back-end generators for LCC. The major topic is rewriting the IR tree, selecting optimal choices guided by system-specific instruction-costs. I haven't had time to read them though and i won't for a short while.
> 
...

I've heard LCC is a good compiler to study, but I haven't read or used it.  I've done some cursory browsing, and it and the Zephyr/NCI projects seem pretty cool, but they look like they're dead.