Thread overview | |||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 11, 2011 A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
This program used here as a benchmark is a bit reduced from a rosettacode task, it finds how many ways are there to make change for 100_000 euro (the argument 'amount' represents cents of euro) using the common coins. The result is: 992198221207406412424859964272600001 The timings, best of 3, seconds: DMD: 22.5 Python: 9.3 Java: 2.9 DMD 2.057beta, -O -release -inline Java SE 1.7.0_01-b08 (used without -server) Python 2.6.6 32 bit Windows system. The D version is about 7.7 times slower than the Java client version. I have seen that most running time in the D code is spent in this line that performs the BigInt sum: table[pos] += table[pos - 1]; With some tests I think most of the run time of the D version is used by the GC, so this is mostly a GC benchmark (so here the sum routines of D BigInts are probably good enough). I think in this benchmark the low D GC performance is not caused by the not precise nature of the D GC (because during the running the amount of memory used is constantly 7.7 MB), but mostly by the amount of garbage produced each second. So maybe it's the Young Generation of the Java GC that is helping a lot here. But even the reference count GC of Python is more efficient here. I have not timed the GC performance with the recent experiments done by dsimcha on the D GC discussed here, a timing value with those changes is welcome: https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2 -------------------------------- // D2 version import std.stdio, std.bigint; BigInt countChanges(in int amount, in int[] coins) { immutable int n = coins.length; int cycle = 0; foreach (int c; coins) if (c <= amount && c >= cycle) cycle = c + 1; cycle *= n; auto table = new BigInt[cycle]; // table[0 .. n] = 1; table[0 .. n] = BigInt(1); int pos = n; for (int s = 1; s <= amount; s++) { for (int i = 0; i < n; i++) { if (i == 0 && pos >= cycle) pos = 0; if (coins[i] <= s) { immutable int q = pos - (coins[i] * n); table[pos] = (q >= 0) ? table[q] : table[q + cycle]; } if (i != 0) table[pos] += table[pos - 1];//most time spent here pos++; } } return table[pos - 1]; } void main() { const int[] coins = [200, 100, 50, 20, 10, 5, 2, 1]; writeln(countChanges(10000000, coins)); } -------------------------------- // Java version import java.util.Arrays; import java.math.BigInteger; final class Coins { private static BigInteger countChanges(int amount, int[] coins){ final int n = coins.length; int cycle = 0; for (int c : coins) if (c <= amount && c >= cycle) cycle = c + 1; cycle *= n; BigInteger[] table = new BigInteger[cycle]; Arrays.fill(table, 0, n, BigInteger.ONE); Arrays.fill(table, n, cycle, BigInteger.ZERO); int pos = n; for (int s = 1; s <= amount; s++) { for (int i = 0; i < n; i++) { if (i == 0 && pos >= cycle) pos = 0; if (coins[i] <= s) { final int q = pos - (coins[i] * n); table[pos] = (q >= 0) ? table[q] : table[q + cycle]; } if (i != 0) table[pos] = table[pos].add(table[pos - 1]); pos++; } } return table[pos - 1]; } public static void main(String[] args) { final int[] coins = {200, 100, 50, 20, 10, 5, 2, 1}; System.out.println(countChanges(10000000, coins)); } } -------------------------------- # Python2.6 + Psyco version try: import psyco psyco.full() except ImportError: pass def count_changes(amount_cents, coins): n = len(coins) cycle = max([c+1 for c in coins if c <= amount_cents]) * n table = [0] * cycle for i in xrange(n): table[i] = 1 pos = n for s in xrange(1, amount_cents + 1): for i in xrange(n): if i == 0 and pos >= cycle: pos = 0 if coins[i] <= s: q = pos - coins[i] * n table[pos] = table[q] if (q >= 0) else table[q + cycle] if i: table[pos] += table[pos - 1] pos += 1 return table[pos - 1] def main(): coins = [200, 100, 50, 20, 10, 5, 2, 1] print count_changes(10000000, coins) main() -------------------------------- Bye, bearophile |
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On 12/11/2011 8:48 AM, bearophile wrote: > This program used here as a benchmark is a bit reduced from a rosettacode task, it finds how many ways are there to make change for 100_000 euro (the argument 'amount' represents cents of euro) using the common coins. > > The result is: > 992198221207406412424859964272600001 > > The timings, best of 3, seconds: > DMD: 22.5 > Python: 9.3 > Java: 2.9 > > DMD 2.057beta, -O -release -inline > Java SE 1.7.0_01-b08 (used without -server) > Python 2.6.6 > 32 bit Windows system. > > The D version is about 7.7 times slower than the Java client version. I have seen that most running time in the D code is spent in this line that performs the BigInt sum: > > table[pos] += table[pos - 1]; > > With some tests I think most of the run time of the D version is used by the GC, so this is mostly a GC benchmark (so here the sum routines of D BigInts are probably good enough). I think in this benchmark the low D GC performance is not caused by the not precise nature of the D GC (because during the running the amount of memory used is constantly 7.7 MB), but mostly by the amount of garbage produced each second. So maybe it's the Young Generation of the Java GC that is helping a lot here. But even the reference count GC of Python is more efficient here. > > I have not timed the GC performance with the recent experiments done by dsimcha on the D GC discussed here, a timing value with those changes is welcome: > https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2 My optimizations make very little difference on this benchmark, but for good reason: It's not a very good GC benchmark. I ran it with my GC profiling code enabled and it only spends around 10% of its execution time in GC. We need to figure out why else this benchmark may be so slow. Furthermore, because this benchmark uses so little GC time, my improvements are mostly buried in noise. Here are some sample runs on 2.057 beta with and without my GC improvements. However, I make no claim that these results are statistically significant, as when I ran the benchmark a few times the variance was quite high and I'm too lazy to run it a zillion times and get a mean and variance for each stage, etc. Without my improvements: 992198221207406412424859964272600001 Total GC prep time: 62 milliseconds Total mark time: 456 milliseconds Total sweep time: 1357 milliseconds Total page recovery time: 439 milliseconds Grand total GC time: 2314 milliseconds Process returned 0 (0x0) execution time : 20.776 s With my improvements: 992198221207406412424859964272600001 Total GC prep time: 16 milliseconds Total mark time: 393 milliseconds Total sweep time: 1049 milliseconds Total page recovery time: 297 milliseconds Grand total GC time: 1755 milliseconds Process returned 0 (0x0) execution time : 19.287 s |
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | It would be really great to implement a generational GC similar to the C4 mentioned in this presentation: http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection . There is a research-paper about C4 as well: http://dl.acm.org/citation.cfm?id=1993491 . |
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to dsimcha | On 12/11/11 9:23 AM, dsimcha wrote:
> My optimizations make very little difference on this benchmark, but for
> good reason: It's not a very good GC benchmark. I ran it with my GC
> profiling code enabled and it only spends around 10% of its execution
> time in GC. We need to figure out why else this benchmark may be so slow.
I'll venture an opinion (without having profiled the code).
The code uses BigInt, which makes eager allocations and custom work for each operation. But a good portion of the loop is spent using small numbers, so using the small integer optimization and native operations helps a lot.
I think we need to add reference counting and small int optimization to BigInt to make it competitive.
Andrei
|
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dejan Lekic | Le 11/12/2011 16:43, Dejan Lekic a écrit :
>
> It would be really great to implement a generational GC similar to the
> C4 mentioned in this presentation:
> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
> . There is a research-paper about C4 as well:
> http://dl.acm.org/citation.cfm?id=1993491 .
This GC (like all compacting GC) require that the GC knows what is a reference/pointer and what isn't.
This is possible in D using the compile time reflexion to build a runtime reflexion. But the standard lib is far away from that. This will also bloat the executable with reflexion's data, thing that some don't want.
By the way, this show up the fact that, depending on needs, people will require different GC and that the lib should allow us to choose the one we want. For exemple with a compiler switch (gc-module=xxx for example).
|
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On 12/11/2011 07:02 PM, deadalnix wrote: > Le 11/12/2011 16:43, Dejan Lekic a écrit : >> >> It would be really great to implement a generational GC similar to the >> C4 mentioned in this presentation: >> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection >> . There is a research-paper about C4 as well: >> http://dl.acm.org/citation.cfm?id=1993491 . > > This GC (like all compacting GC) require that the GC knows what is a > reference/pointer and what isn't. > > This is possible in D using the compile time reflexion to build a > runtime reflexion. But the standard lib is far away from that. This will > also bloat the executable with reflexion's data, thing that some don't > want. The information has to be generated by the compiler, not by the library. > > By the way, this show up the fact that, depending on needs, people will > require different GC and that the lib should allow us to choose the one > we want. For exemple with a compiler switch (gc-module=xxx for example). |
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | Le 11/12/2011 21:55, Timon Gehr a écrit :
> On 12/11/2011 07:02 PM, deadalnix wrote:
>> Le 11/12/2011 16:43, Dejan Lekic a écrit :
>>>
>>> It would be really great to implement a generational GC similar to the
>>> C4 mentioned in this presentation:
>>> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
>>> . There is a research-paper about C4 as well:
>>> http://dl.acm.org/citation.cfm?id=1993491 .
>>
>> This GC (like all compacting GC) require that the GC knows what is a
>> reference/pointer and what isn't.
>>
>> This is possible in D using the compile time reflexion to build a
>> runtime reflexion. But the standard lib is far away from that. This will
>> also bloat the executable with reflexion's data, thing that some don't
>> want.
>
> The information has to be generated by the compiler, not by the library.
>
This is not the way choosen by D. And the way choosen by D is, IMO, supperior.
D choosed to have compiler time reflexion, but no runtime reflexion (compiler won't generate it). Given that, and the great capability of D for generic programming, the best solution is to generate information at using compile time reflexion using a lib, to get runtime reflexion.
Many application do not want the bloat created by runtime reflexion. This is up to the user to compile with such a lib or not.
|
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On 12/11/2011 10:13 PM, deadalnix wrote:
> Le 11/12/2011 21:55, Timon Gehr a écrit :
>> On 12/11/2011 07:02 PM, deadalnix wrote:
>>> Le 11/12/2011 16:43, Dejan Lekic a écrit :
>>>>
>>>> It would be really great to implement a generational GC similar to the
>>>> C4 mentioned in this presentation:
>>>> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
>>>>
>>>> . There is a research-paper about C4 as well:
>>>> http://dl.acm.org/citation.cfm?id=1993491 .
>>>
>>> This GC (like all compacting GC) require that the GC knows what is a
>>> reference/pointer and what isn't.
>>>
>>> This is possible in D using the compile time reflexion to build a
>>> runtime reflexion. But the standard lib is far away from that. This will
>>> also bloat the executable with reflexion's data, thing that some don't
>>> want.
>>
>> The information has to be generated by the compiler, not by the library.
>>
>
> This is not the way choosen by D. And the way choosen by D is, IMO,
> supperior.
>
> D choosed to have compiler time reflexion, but no runtime reflexion
> (compiler won't generate it). Given that, and the great capability of D
> for generic programming, the best solution is to generate information at
> using compile time reflexion using a lib, to get runtime reflexion.
>
> Many application do not want the bloat created by runtime reflexion.
> This is up to the user to compile with such a lib or not.
We are talking about supporting precise GC, not about custom runtime reflection. There is no way to get precise GC right without compiler support.
|
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | Le 11/12/2011 22:26, Timon Gehr a écrit :
> On 12/11/2011 10:13 PM, deadalnix wrote:
>> Le 11/12/2011 21:55, Timon Gehr a écrit :
>>> On 12/11/2011 07:02 PM, deadalnix wrote:
>>>> Le 11/12/2011 16:43, Dejan Lekic a écrit :
>>>>>
>>>>> It would be really great to implement a generational GC similar to the
>>>>> C4 mentioned in this presentation:
>>>>> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
>>>>>
>>>>>
>>>>> . There is a research-paper about C4 as well:
>>>>> http://dl.acm.org/citation.cfm?id=1993491 .
>>>>
>>>> This GC (like all compacting GC) require that the GC knows what is a
>>>> reference/pointer and what isn't.
>>>>
>>>> This is possible in D using the compile time reflexion to build a
>>>> runtime reflexion. But the standard lib is far away from that. This
>>>> will
>>>> also bloat the executable with reflexion's data, thing that some don't
>>>> want.
>>>
>>> The information has to be generated by the compiler, not by the library.
>>>
>>
>> This is not the way choosen by D. And the way choosen by D is, IMO,
>> supperior.
>>
>> D choosed to have compiler time reflexion, but no runtime reflexion
>> (compiler won't generate it). Given that, and the great capability of D
>> for generic programming, the best solution is to generate information at
>> using compile time reflexion using a lib, to get runtime reflexion.
>>
>> Many application do not want the bloat created by runtime reflexion.
>> This is up to the user to compile with such a lib or not.
>
> We are talking about supporting precise GC, not about custom runtime
> reflection. There is no way to get precise GC right without compiler
> support.
It is clearly possible by marking not movable what is possibly pointer in the stack and registers and some runtime reflexion used by the GC.
|
December 11, 2011 Re: A benchmark, mostly GC | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On 12/11/2011 11:27 PM, deadalnix wrote:
> Le 11/12/2011 22:26, Timon Gehr a écrit :
>> On 12/11/2011 10:13 PM, deadalnix wrote:
>>> Le 11/12/2011 21:55, Timon Gehr a écrit :
>>>> On 12/11/2011 07:02 PM, deadalnix wrote:
>>>>> Le 11/12/2011 16:43, Dejan Lekic a écrit :
>>>>>>
>>>>>> It would be really great to implement a generational GC similar to
>>>>>> the
>>>>>> C4 mentioned in this presentation:
>>>>>> http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection
>>>>>>
>>>>>>
>>>>>>
>>>>>> . There is a research-paper about C4 as well:
>>>>>> http://dl.acm.org/citation.cfm?id=1993491 .
>>>>>
>>>>> This GC (like all compacting GC) require that the GC knows what is a
>>>>> reference/pointer and what isn't.
>>>>>
>>>>> This is possible in D using the compile time reflexion to build a
>>>>> runtime reflexion. But the standard lib is far away from that. This
>>>>> will
>>>>> also bloat the executable with reflexion's data, thing that some don't
>>>>> want.
>>>>
>>>> The information has to be generated by the compiler, not by the
>>>> library.
>>>>
>>>
>>> This is not the way choosen by D. And the way choosen by D is, IMO,
>>> supperior.
>>>
>>> D choosed to have compiler time reflexion, but no runtime reflexion
>>> (compiler won't generate it). Given that, and the great capability of D
>>> for generic programming, the best solution is to generate information at
>>> using compile time reflexion using a lib, to get runtime reflexion.
>>>
>>> Many application do not want the bloat created by runtime reflexion.
>>> This is up to the user to compile with such a lib or not.
>>
>> We are talking about supporting precise GC, not about custom runtime
>> reflection. There is no way to get precise GC right without compiler
>> support.
>
> It is clearly possible by marking not movable what is possibly pointer
> in the stack and registers and some runtime reflexion used by the GC.
In *precise GC* speak, 'possibly pointer' is a banned term. You understand that? Also compile-time reflection is IMPOSSIBLE if you don't have the source code. It cannot work in a meaningful way.
GC support must be built-in. Every attempt to make such a fundamental feature depend on the standard library is comical.
|
Copyright © 1999-2021 by the D Language Foundation