Thread overview
A benchmark, mostly GC
Dec 11, 2011  bearophile
Dec 11, 2011  dsimcha
Dec 11, 2011  Dejan Lekic
Dec 11, 2011  deadalnix
Dec 11, 2011  Timon Gehr
Dec 11, 2011  deadalnix
Dec 11, 2011  Timon Gehr
Dec 11, 2011  deadalnix
Dec 11, 2011  Timon Gehr
Dec 11, 2011  deadalnix
Dec 12, 2011  Timon Gehr
Dec 11, 2011  dsimcha
Dec 12, 2011  Timon Gehr
Dec 12, 2011  dsimcha
Dec 12, 2011  Timon Gehr
Dec 12, 2011  Robert Jacques
Dec 12, 2011  Brad Anderson
Dec 12, 2011  Robert Jacques
Dec 12, 2011  Timon Gehr
Dec 12, 2011  Manu
Dec 12, 2011  Jakob Ovrum
Dec 12, 2011  Jonathan M Davis
Dec 13, 2011  Robert Jacques
Dec 13, 2011  Paulo Pinto
Dec 14, 2011  Robert Jacques
Dec 14, 2011  Paulo Pinto
Dec 15, 2011  Robert Jacques
Dec 13, 2011  Don
Dec 13, 2011  bearophile
Dec 13, 2011  bearophile
Dec 12, 2011  bearophile
Dec 18, 2011  Marco Leise
Dec 18, 2011  bearophile
Dec 19, 2011  Marco Leise
December 11, 2011
The program used here as a benchmark is slightly reduced from a Rosetta Code task: it counts how many ways there are to make change for 100_000 euros (the 'amount' argument represents euro cents) using the common coins.
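
For readers who don't know the Rosetta Code task, here is a tiny reference formulation of the same count (a sketch for clarity only, far too slow for the 100_000 euro case, and not the code being benchmarked):

// Naive reference version of the count (sketch only, not benchmarked).
import std.bigint, std.stdio;

BigInt ways(int amount, const(int)[] coins) {
    if (amount == 0) return BigInt(1);
    if (amount < 0 || coins.length == 0) return BigInt(0);
    // either skip the first coin entirely, or use one of it and recurse
    return ways(amount, coins[1 .. $]) + ways(amount - coins[0], coins);
}

void main() {
    writeln(ways(200, [200, 100, 50, 20, 10, 5, 2, 1])); // 73682 ways for 2 euros
}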

The result is:
992198221207406412424859964272600001

The timings, best of 3, seconds:
  DMD:    22.5
  Python:  9.3
  Java:    2.9

DMD 2.057beta, -O -release -inline
Java SE 1.7.0_01-b08 (used without -server)
Python 2.6.6
32 bit Windows system.

The D version is about 7.7 times slower than the Java client version. I have seen that most of the running time of the D code is spent in this line, which performs the BigInt sum:

table[pos] += table[pos - 1];

From some tests I think most of the run time of the D version is spent in the GC, so this is mostly a GC benchmark (the BigInt sum routines themselves are probably good enough). I don't think the poor D GC performance in this benchmark is caused by the GC's imprecise nature (memory use stays constant at about 7.7 MB while the program runs), but rather by the amount of garbage produced each second. So maybe it's the young generation of the Java GC that is helping a lot here. But even Python's reference-counting GC is more efficient on this benchmark.

I have not timed this benchmark with the recent experiments dsimcha has done on the D GC, discussed here; a timing with those changes would be welcome:
https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2
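
A quick way to support the "mostly GC" claim is to run the same program with collections disabled (core.memory.GC.disable) and compare wall-clock times; memory use grows, but if the run gets much faster, the collector rather than the BigInt arithmetic is the bottleneck. A minimal variant of the benchmark's main (sketch only):

// Same benchmark, collector switched off; countChanges is the function below.
import core.memory : GC;
import std.stdio;

void main() {
    GC.disable();   // allocations still happen, but no collection cycles run
    const int[] coins = [200, 100, 50, 20, 10, 5, 2, 1];
    writeln(countChanges(10000000, coins));
}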

--------------------------------

// D2 version
import std.stdio, std.bigint;

BigInt countChanges(in int amount, in int[] coins) {
    immutable int n = coins.length;
    int cycle = 0;
    foreach (int c; coins)
        if (c <= amount && c >= cycle)
            cycle = c + 1;
    cycle *= n;
    auto table = new BigInt[cycle];
    // table[0 .. n] = 1;
    table[0 .. n] = BigInt(1);

    int pos = n;
    for (int s = 1; s <= amount; s++) {
        for (int i = 0; i < n; i++) {
            if (i == 0 && pos >= cycle)
                pos = 0;
            if (coins[i] <= s) {
                immutable int q = pos - (coins[i] * n);
                table[pos] = (q >= 0) ? table[q] : table[q + cycle];
            }
            if (i != 0)
                table[pos] += table[pos - 1];//most time spent here
            pos++;
        }
    }

    return table[pos - 1];
}

void main() {
    const int[] coins = [200, 100, 50, 20, 10, 5, 2, 1];
    writeln(countChanges(10000000, coins));
}

--------------------------------

// Java version
import java.util.Arrays;
import java.math.BigInteger;

final class Coins {
    private static BigInteger countChanges(int amount, int[] coins){
        final int n = coins.length;
        int cycle = 0;
        for (int c : coins)
            if (c <= amount && c >= cycle)
                cycle = c + 1;
        cycle *= n;
        BigInteger[] table = new BigInteger[cycle];
        Arrays.fill(table, 0, n, BigInteger.ONE);
        Arrays.fill(table, n, cycle, BigInteger.ZERO);

        int pos = n;
        for (int s = 1; s <= amount; s++) {
            for (int i = 0; i < n; i++) {
                if (i == 0 && pos >= cycle)
                    pos = 0;
                if (coins[i] <= s) {
                    final int q = pos - (coins[i] * n);
                    table[pos] = (q >= 0) ? table[q] : table[q + cycle];
                }
                if (i != 0)
                    table[pos] = table[pos].add(table[pos - 1]);
                pos++;
            }
        }

        return table[pos - 1];
    }

    public static void main(String[] args) {
        final int[] coins = {200, 100, 50, 20, 10, 5, 2, 1};
        System.out.println(countChanges(10000000, coins));
    }
}

--------------------------------

# Python2.6 + Psyco version
try:
    import psyco
    psyco.full()
except ImportError:
    pass

def count_changes(amount_cents, coins):
    n = len(coins)
    cycle = max([c+1 for c in coins if c <= amount_cents]) * n
    table = [0] * cycle
    for i in xrange(n):
        table[i] = 1

    pos = n
    for s in xrange(1, amount_cents + 1):
        for i in xrange(n):
            if i == 0 and pos >= cycle:
                pos = 0
            if coins[i] <= s:
                q = pos - coins[i] * n
                table[pos] = table[q] if (q >= 0) else table[q + cycle]
            if i:
                table[pos] += table[pos - 1]
            pos += 1
    return table[pos - 1]

def main():
    coins = [200, 100, 50, 20, 10, 5, 2, 1]
    print count_changes(10000000, coins)

main()

--------------------------------

Bye,
bearophile
December 11, 2011
On 12/11/2011 8:48 AM, bearophile wrote:
> [...]
>
> From some tests I think most of the run time of the D version is spent in the GC, so this is mostly a GC benchmark (the BigInt sum routines themselves are probably good enough). I don't think the poor D GC performance in this benchmark is caused by the GC's imprecise nature (memory use stays constant at about 7.7 MB while the program runs), but rather by the amount of garbage produced each second. So maybe it's the young generation of the Java GC that is helping a lot here. But even Python's reference-counting GC is more efficient on this benchmark.
>
> I have not timed this benchmark with the recent experiments dsimcha has done on the D GC, discussed here; a timing with those changes would be welcome:
> https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2

My optimizations make very little difference on this benchmark, but for good reason:  It's not a very good GC benchmark.  I ran it with my GC profiling code enabled and it only spends around 10% of its execution time in GC.  We need to figure out why else this benchmark may be so slow.

Furthermore, because this benchmark uses so little GC time, my improvements are mostly buried in noise.  Here are some sample runs on 2.057 beta with and without my GC improvements.  However, I make no claim that these results are statistically significant, as when I ran the benchmark a few times the variance was quite high and I'm too lazy to run it a zillion times and get a mean and variance for each stage, etc.

Without my improvements:
992198221207406412424859964272600001
        Total GC prep time:  62 milliseconds
        Total mark time:  456 milliseconds
        Total sweep time:  1357 milliseconds
        Total page recovery time:  439 milliseconds
        Grand total GC time:  2314 milliseconds

Process returned 0 (0x0)   execution time : 20.776 s

With my improvements:
992198221207406412424859964272600001
        Total GC prep time:  16 milliseconds
        Total mark time:  393 milliseconds
        Total sweep time:  1049 milliseconds
        Total page recovery time:  297 milliseconds
        Grand total GC time:  1755 milliseconds

Process returned 0 (0x0)   execution time : 19.287 s
December 11, 2011
It would be really great to implement a generational GC similar to the C4 collector mentioned in this presentation: http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection . There is a research paper about C4 as well: http://dl.acm.org/citation.cfm?id=1993491 .
December 11, 2011
On 12/11/11 9:23 AM, dsimcha wrote:
> My optimizations make very little difference on this benchmark, but for
> good reason: It's not a very good GC benchmark. I ran it with my GC
> profiling code enabled and it only spends around 10% of its execution
> time in GC. We need to figure out why else this benchmark may be so slow.

I'll venture an opinion (without having profiled the code).

The code uses BigInt, which performs eager allocations and dedicated work for each operation. But a good portion of the loop deals with small numbers, so using a small-integer optimization and native operations would help a lot.

I think we need to add reference counting and small int optimization to BigInt to make it competitive.
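
To illustrate the idea (a rough sketch only, not Phobos code and not necessarily the design meant above): a value type that stays a plain long while the number fits and only falls back to a heap-allocated BigInt on overflow, so the hot additions in the benchmark's inner loop never touch the GC heap.

// Hypothetical small-int-optimized wrapper, only to sketch the idea.
import std.bigint;

struct SmallInt {
    long small;     // used while isBig == false
    BigInt big;     // used once the value no longer fits in a long
    bool isBig;

    this(long v) { small = v; }

    void opOpAssign(string op : "+")(SmallInt rhs) {
        if (!isBig && !rhs.isBig) {
            immutable long sum = small + rhs.small;
            // signed overflow iff the sum's sign differs from both operands
            if (((small ^ sum) & (rhs.small ^ sum)) >= 0) {
                small = sum;    // fast path: no allocation at all
                return;
            }
        }
        if (!isBig) { big = BigInt(small); isBig = true; }   // promote
        big += rhs.isBig ? rhs.big : BigInt(rhs.small);
    }
}

Python 2 effectively gets this for free (int silently promotes to long only on overflow), which may be part of why its version does comparatively well here.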


Andrei
December 11, 2011
On 11/12/2011 16:43, Dejan Lekic wrote:
>
> It would be really great to implement a generational GC similar to the
> C4 mentioned in this presentation:
> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
> . There is a research-paper about C4 as well:
> http://dl.acm.org/citation.cfm?id=1993491 .

This GC (like all compacting GCs) requires that the GC knows what is a reference/pointer and what isn't.

This is possible in D by using compile-time reflection to build runtime reflection. But the standard library is far from that, and it would also bloat the executable with reflection data, which some people don't want.

By the way, this shows that, depending on their needs, people will require different GCs, and that the library should let us choose the one we want, for example with a compiler switch (gc-module=xxx).
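
To sketch the library-side approach (hypothetical helper, nothing like this exists in the standard lib): compile-time reflection over a type's fields can emit the pointer layout that a precise or compacting GC would need, for example as a list of byte offsets of the reference fields.

// Sketch: compute at compile time which field offsets a precise GC must trace.
// Real support would also have to cover arrays, closures, the stack and
// registers; this only shows the basic mechanism.
import std.stdio;
import std.traits : isPointer;

size_t[] referenceOffsets(T)() {
    size_t[] offsets;
    foreach (i, Field; typeof(T.tupleof)) {
        static if (isPointer!Field || is(Field == class))
            offsets ~= T.tupleof[i].offsetof;
    }
    return offsets;
}

struct Node {
    int value;
    Node* next;     // a reference the GC must treat precisely
}

void main() {
    writeln(referenceOffsets!Node());   // e.g. [8] on a 64-bit target
}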
December 11, 2011
On 12/11/2011 07:02 PM, deadalnix wrote:
> [...]
>
> This GC (like all compacting GCs) requires that the GC knows what is a
> reference/pointer and what isn't.
>
> This is possible in D by using compile-time reflection to build runtime
> reflection. But the standard library is far from that, and it would also
> bloat the executable with reflection data, which some people don't want.

The information has to be generated by the compiler, not by the library.

December 11, 2011
On 11/12/2011 21:55, Timon Gehr wrote:
> [...]
> The information has to be generated by the compiler, not by the library.
>

This is not the way chosen by D. And the way chosen by D is, IMO, superior.

D chose to have compile-time reflection but no runtime reflection (the compiler won't generate it). Given that, and D's great capability for generic programming, the best solution is to use compile-time reflection in a library to generate the information needed for runtime reflection.

Many applications do not want the bloat created by runtime reflection. It is up to the user to compile with such a library or not.
December 11, 2011
On 12/11/2011 10:13 PM, deadalnix wrote:
> [...]
>
> This is not the way chosen by D. And the way chosen by D is, IMO, superior.
>
> D chose to have compile-time reflection but no runtime reflection (the
> compiler won't generate it). Given that, and D's great capability for
> generic programming, the best solution is to use compile-time reflection
> in a library to generate the information needed for runtime reflection.
>
> Many applications do not want the bloat created by runtime reflection.
> It is up to the user to compile with such a library or not.

We are talking about supporting precise GC, not about custom runtime reflection. There is no way to get precise GC right without compiler support.
December 11, 2011
On 11/12/2011 22:26, Timon Gehr wrote:
> [...]
>
> We are talking about supporting precise GC, not about custom runtime
> reflection. There is no way to get precise GC right without compiler
> support.

It is clearly possible by marking as not movable anything that is possibly a pointer on the stack or in registers, plus some runtime reflection used by the GC.
December 11, 2011
On 12/11/2011 11:27 PM, deadalnix wrote:
> [...]
>
> It is clearly possible by marking as not movable anything that is possibly
> a pointer on the stack or in registers, plus some runtime reflection used
> by the GC.

In *precise GC* speak, 'possibly a pointer' is a banned term. You understand that? Also, compile-time reflection is IMPOSSIBLE if you don't have the source code. It cannot work in a meaningful way.

GC support must be built-in. Every attempt to make such a fundamental feature depend on the standard library is comical.