December 14, 2011
No need to reference Wikipedia articles; I am well aware of the implementation specifics.

Still, Singularity was just an example. I can point to several other operating systems written in GC-enabled systems languages where, besides the given language, only assembly was used; Native Oberon is one such OS.

C and C++ often get used in such operating systems because of the available tools that speed up development, rather than time being spent rewriting them.

I think we will eventually get there, as the trend nowadays is that everything is slowly being done in GC-enabled languages anyway. Even Microsoft and Apple are extending their systems languages (Objective-C and C++) to offer some form of automatic memory management.

--
Paulo


Robert Jacques Wrote:

> On Tue, 13 Dec 2011 02:22:00 -0500, Paulo Pinto <pjmlp@progtools.org> wrote:
> 
> > Robert Jacques Wrote:
> >> >>>> Second, being a systems language means that D can not implement a lot of
> >> >>> GC algorithms including copying, generational and the good concurrent collectors.
> >
> > I disagree, as there are other systems languages with better GC algorithms than D, because they are safer than D; the lack of inline assembler is one example.
> >
> > And regardless of what you may think about these languages' suitability for systems programming, there are research operating systems written in them, with lots of papers to read. Something I have yet to see from D.
> >
> > Yet when reading how Singularity was implemented, there are lots of parallels between what Sing# offers and what D does. So I really think there are still quite some possibilities to improve D's GC.
> >
> 
>  From the Singularity Wikipedia article:
> The lowest-level x86 interrupt dispatch code is written in assembly language and C. Once this code has done its job, it invokes the kernel, whose runtime and garbage collector are written in Sing# (an extended version of Spec#, itself an extension of C#) and runs in unprotected mode. The hardware abstraction layer is written in C++ and runs in protected mode. There is also some C code to handle debugging. The computer's BIOS is invoked during the 16-bit real mode bootstrap stage; once in 32-bit mode, Singularity never invokes the BIOS again, but invokes device drivers written in Sing#. During installation, Common Intermediate Language (CIL) opcodes are compiled into x86 opcodes using the Bartok compiler.
> 
>  From BitC website
> A less obvious issue is the absence of first-class union value types in the managed subset of the Common Language Runtime (CLR) or the corresponding parts of the Java Virtual Machine (JVM). These are absolutely necessary for low-level systems programming, so one must either abandon Java/C#/Spec# to implement these low-level objects (thereby abandoning the foundation for checking), or one must find a more appropriate language.
> 
> In addition to the problems of expressiveness, neither Java nor C# was designed with formal property checking in mind. Spec# [3], a language developed by Microsoft Research to retrofit formal property checking to C#, has been forced to introduce some fairly severe language warts to support precondition and postcondition checking, but the language does not attempt to address the underlying performance issues of C#.
> 
> So, no, Singularity isn't written purely in Sing#; all its low-level systems access is written in ASM/C/C++, like pretty much every single other operating system. (It's still an impressive microkernel)
> 
> Now BitC and Coyotos are another interesting language/OS pair, though they currently use a C/C++ conservative garbage collector.
> 
> At the end of the day, I'm really excited about the growth in the systems programming arena and I'd love to see the combination of the ideals in C4, L4, Singularity and Coyotos into some new OS and/or language. But that doesn't really change the limitations of running on top of Windows or iOS.

December 15, 2011
On Wed, 14 Dec 2011 02:38:17 -0500, Paulo Pinto <pjmlp@progtools.org> wrote:

> No need to reference Wikipedia articles; I am well aware of the implementation specifics.
>
> Still, Singularity was just an example. I can point to several other operating systems written in GC-enabled systems languages where, besides the given language, only assembly was used; Native Oberon is one such OS.
>
> C and C++ often get used in such operating systems because of the available tools that speed up development, rather than time being spent rewriting them.
>
> I think we will eventually get there, as the trend nowadays is that everything is slowly being done in GC-enabled languages anyway. Even Microsoft and Apple are extending their systems languages (Objective-C and C++) to offer some form of automatic memory management.
>
> --
> Paulo

From Oberon's FAQ:
I know that the Native Oberon garbage collector is mark-and-sweep, not copying, but does it ever move objects, for instance to compact memory if it becomes excessively fragmented?
A: No, objects are never moved by the Mark/Sweep collector. To avoid fragmentation with a program that continuously allocates memory, call the Garbage collector (Oberon.Collect) from time to time to free unused memory. Avoid calling it too much though, because it does take some milliseconds to run, depending on the number of reachable objects on the heap.
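
The same advice maps to D almost directly; a minimal sketch, assuming core.memory's GC.collect as the counterpart of Oberon.Collect (my illustration, not code from the thread):

import core.memory : GC;

void main()
{
    // Continuous allocation, as in the FAQ's scenario.
    foreach (i; 0 .. 1_000_000)
    {
        auto buf = new ubyte[](64);
        // Collect occasionally to limit fragmentation, but not too
        // often, since each cycle takes some milliseconds.
        if (i % 100_000 == 0)
            GC.collect();
    }
}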


I feel like we are having a bit of a semantic issue. My position is not that GC can't or shouldn't be used in systems programming (indeed it should); it's that, generally speaking, the types of garbage collectors available to high-performance systems languages are not the same as those available to more restrictive languages such as Java and C#.
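
To make that concrete, here is a minimal sketch of one reason moving (copying or compacting) collectors are hard to offer in D; the interior pointer below is legal D, and inline assembly can hide pointers from the collector entirely (my example, not anything from the thread):

void main()
{
    auto arr = new int[](100);
    int* p = &arr[42]; // an interior pointer into a GC-managed block
    // A moving collector would have to find and patch p (and any
    // pointer hidden in a union or smuggled through inline asm);
    // D's conservative mark-and-sweep collector never moves objects.
    *p = 7;
    assert(arr[42] == 7);
}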

P.S.
It seems my GC education is a bit out of date, as there has been work on precise garbage collection for C in recent years (e.g. Magpie). I didn't see any references to more advanced garbage-collection algorithms, but give graduate students hard problems and solutions generally fall out.
December 18, 2011
Am 11.12.2011, 14:48 Uhr, schrieb bearophile <bearophileHUGS@lycos.com>:

> This program used here as a benchmark is a bit reduced from a rosettacode task, it finds how many ways are there to make change for 100_000 euro (the argument 'amount' represents cents of euro) using the common coins.
>
> The result is:
> 992198221207406412424859964272600001
>
> The timings, best of 3, seconds:
>   DMD:    22.5
>   Python:  9.3
>   Java:    2.9
>
> DMD 2.057beta, -O -release -inline
> Java SE 1.7.0_01-b08 (used without -server)

Is -server still doing anything? I thought that behavior was default now.

> Python 2.6.6
> 32 bit Windows system.

Since I'm on a 64-bit machine, I changed all int to ptrdiff_t first, for compatibility. And I am using DMD 2.056.
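
For reference, the benchmark boils down to a coin-change count along these lines (a sketch in the spirit of the rosettacode task; bearophile's actual code may differ):

import std.bigint, std.stdio;

BigInt countChange(int amount, in int[] coins)
{
    // ways[i] = number of ways to pay i cents with the coins seen so far.
    auto ways = new BigInt[](amount + 1);
    ways[0] = BigInt(1);
    foreach (coin; coins)
        foreach (i; coin .. amount + 1)
            ways[i] += ways[i - coin];
    return ways[amount];
}

void main()
{
    // The common euro coins, in cents; the amount is 100_000 euro in cents.
    immutable coins = [1, 2, 5, 10, 20, 50, 100, 200];
    writeln(countChange(100_000 * 100, coins));
}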

dmd -inline -O -release
gives me 21.680s (pretty similar)

dmd -L-O1 -L-znodlopen -L-znorelro -L--no-copy-dt-needed-entries -L--relax -L--sort-common -L--gc-sections -L-lrt -L--as-needed -L--strip-all -inline -O -release -noboundscheck
gives me 18.674s (black magic or something, but noteworthy; shaves off 3 seconds for me)

gdc -Wall -frelease -fno-assert -fno-bounds-check -ffunction-sections -fdata-sections -flto -march=native -O3 -Wl,--strip-all -Wl,-O1 -Wl,-znodlopen -Wl,-znorelro -Wl,--no-copy-dt-needed-entries -Wl,--relax -Wl,--sort-common -Wl,--gc-sections -Wl,-lrt -Wl,--as-needed
gives me 14.846s

-> The choice of compiler and parameters can have an unexpected impact on the runtime performance. :)
But let's take a look at the non-inlined dmd version to do some (o)profiling (attached file).

Looking at the call graphs, it looks to me like a total of ~63 % of the time is spent in memory management routines while the rest goes to BigInt. I don't have a setup here to quickly check out 2.057. The numbers may differ significantly there, but that alone won't close the gap to Java (which I actually find impressive here; how do they do this?).

December 18, 2011
Marco Leise:

> Looking at the call graphs, it looks to me like a total of ~63 % of the time is spent in memory management routines while the rest goes to BigInt.

But dsimcha said:

> My optimizations make very little difference on this benchmark, but for good reason:  It's not a very good GC benchmark.  I ran it with my GC profiling code enabled and it only spends around 10% of its execution time in GC.  We need to figure out why else this benchmark may be so slow.

How is this possible?

Bye,
bearophile
December 19, 2011
Am 18.12.2011, 23:15 Uhr, schrieb bearophile <bearophileHUGS@lycos.com>:

> Marco Leise:
>
>> Looking at the call graphs, it looks to me like a total of ~63 % of the time is spent in memory management routines while the rest goes to BigInt.
>
> But dsimcha said:
>
>> My optimizations make very little difference on this benchmark, but for
>> good reason:  It's not a very good GC benchmark.  I ran it with my GC
>> profiling code enabled and it only spends around 10% of its execution
>> time in GC.  We need to figure out why else this benchmark may be so slow.
>
> How is this possible?
>
> Bye,
> bearophile

I could imagine these differences:

- The D version: I tested the stock 2.056 - but I'll check 2.057 when I write the Gentoo ebuild.
- The profiling method: oprofile is a sampling profiler, while dsimcha probably instrumented the code.
- The scope of profiled functions: dsimcha talks about the GC, while I talk about memory management functions in general. Allocating an array or setting its size is functionality I counted as memory management. I know too little about the GC to tell which functions perform the collection cycles, and I won't just add up all functions with GC in their name, so a conservative guess could confirm what dsimcha said.

If you want, you can take a look at the report file. The unindented lines contain the percentage for that function excluding sub-calls. The binary was compiled with dmd -O -release -noboundscheck -g, in 64-bit mode.
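
As a rough way to tell the two apart in a measurement, one could disable collections around the workload, so that whatever time remains in the memory routines is allocation bookkeeping rather than collection. A sketch, assuming core.memory's GC.disable/GC.enable/GC.collect (my illustration, not the benchmark itself):

import core.memory : GC;
import std.bigint;

void workload()
{
    // Allocation-heavy BigInt arithmetic, similar in spirit to the benchmark.
    auto acc = BigInt(0);
    foreach (i; 0 .. 100_000)
        acc += BigInt(i);
}

void main()
{
    GC.disable();  // time spent now is allocation, not collection
    workload();
    GC.enable();
    GC.collect();  // one explicit collection cycle, measurable on its own
}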