| Thread overview | |||||||||
|---|---|---|---|---|---|---|---|---|---|
|
May 22, 2009 why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
$ g++ alloc.cpp -o alloc $ time ./alloc real 0m1.946s user 0m1.688s sys 0m0.256s $ dmd -O -release allocd.d $ time ./allocd real 0m22.734s user 0m22.353s sys 0m0.360s $ cat alloc.cpp #include <vector> typedef std::vector<int> intvec; typedef intvec* intvecp; int main() { int i, n = 20000000; intvecp* iva; iva = new intvecp[n]; for (i = n; i-- > 0; ) { iva[i] = new intvec(); } return 0; } $ cat allocd.d int main() { int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } return 0; } | ||||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to nobody | nobody, el 22 de mayo a las 00:08 me escribiste: > $ g++ alloc.cpp -o alloc > $ time ./alloc > real 0m1.946s > user 0m1.688s > sys 0m0.256s > > $ dmd -O -release allocd.d > $ time ./allocd > real 0m22.734s > user 0m22.353s > sys 0m0.360s > > $ cat alloc.cpp > #include <vector> > > typedef std::vector<int> intvec; > typedef intvec* intvecp; > > int main() { > int i, n = 20000000; > intvecp* iva; > iva = new intvecp[n]; > for (i = n; i-- > 0; ) { > iva[i] = new intvec(); > } > > return 0; > } > > $ cat allocd.d > int main() { > int i, n = 20000000; > Object[] oa; > oa = new Object[n]; > for (i = n; i-- > 0; ) { > oa[i] = new Object(); > } > > return 0; > } You mean why GCC C++ compiler is faster than DM D compiler. You are comparing 2 different compiler implementations for 2 different languages. It's really hard to get any conclusion from that. It would be much more meaningfull if you compare g++ vs gdc or dmd vs dmc or ldc vs llvm-g++ (they at least use the same backend). -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- Oiganmen ñatos de corazón, es más posible que un potus florezca en primavera a que un ángel pase con una remera. -- Peperino Pómoro | |||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to nobody | == Quote from nobody (no@where.com)'s article
> $ g++ alloc.cpp -o alloc
> $ time ./alloc
> real 0m1.946s
> user 0m1.688s
> sys 0m0.256s
> $ dmd -O -release allocd.d
> $ time ./allocd
> real 0m22.734s
> user 0m22.353s
> sys 0m0.360s
> $ cat alloc.cpp
> #include <vector>
> typedef std::vector<int> intvec;
> typedef intvec* intvecp;
> int main() {
> int i, n = 20000000;
> intvecp* iva;
> iva = new intvecp[n];
> for (i = n; i-- > 0; ) {
> iva[i] = new intvec();
> }
> return 0;
> }
> $ cat allocd.d
> int main() {
> int i, n = 20000000;
> Object[] oa;
> oa = new Object[n];
> for (i = n; i-- > 0; ) {
> oa[i] = new Object();
> }
> return 0;
> }
You've probably hit a corner case for the garbage collector. When you allocate 20,000,000 Object instances and hold onto the references, the garbage collector probably runs about a zillion times and never finds anything to delete. If this is a bottleneck, you should temporarily disable the garbage collector in these situations, until you are done with all these allocations, then re-enable it.
| |||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to dsimcha | > You've probably hit a corner case for the garbage collector. When you allocate 20,000,000 Object instances and hold onto the references, the garbage collector probably runs about a zillion times and never finds anything to delete. If this is a bottleneck, you should temporarily disable the garbage collector in these situations, until you are done with all these allocations, then re-enable it.
Thanks. Disble gc improve the speed by 5x:
$ dmd -O -release allocd.d
$ time ./allocd
real 0m4.080s
user 0m3.644s
sys 0m0.420s
$ cat allocd.d
import core.memory;
int main() {
core.memory.GC.disable();
int i, n = 20000000;
Object[] oa;
oa = new Object[n];
for (i = n; i-- > 0; ) {
oa[i] = new Object();
}
core.memory.GC.enable();
return 0;
}
| |||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to nobody | nobody wrote: > $ g++ alloc.cpp -o alloc > $ time ./alloc > real 0m1.946s > user 0m1.688s > sys 0m0.256s > > $ dmd -O -release allocd.d > $ time ./allocd > real 0m22.734s > user 0m22.353s > sys 0m0.360s > > $ cat alloc.cpp > #include <vector> > > typedef std::vector<int> intvec; > typedef intvec* intvecp; > > int main() { > int i, n = 20000000; > intvecp* iva; > iva = new intvecp[n]; > for (i = n; i-- > 0; ) { > iva[i] = new intvec(); > } > > return 0; > } > > $ cat allocd.d > int main() { > int i, n = 20000000; > Object[] oa; > oa = new Object[n]; > for (i = n; i-- > 0; ) { > oa[i] = new Object(); > } > > return 0; > } I use this a structure for arena-based memory allocation (attached). Example of use: import candy.util.MemPool MemStack!() stack; class MyObject { mixin MemPoolNew!(stack); } int main() { stack.push(); int i, n = 20000000; MyObject[] oa; oa = new MyObject[n]; for (i = n; i-- > 0; ) { oa[i] = new MyObject(); } stack.pop(); return 0; } The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually. -------------------- /** * Provides a pool of GCed memory to allocate things from a block. * This maintains cache coherency for related types (i.e. tree nodes). * It doesn't garuntee any ordering, though, the array struct should be * used for that. Also, everything has to be freed at once, freeing one * portion of this has no effect. * * Based on a similar concept posted by bearophile at: * http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=88227 */ public struct MemPool(size_t BLOCK_SIZE = 1 << 14) { private void* next; // Next available block private void* end; // End of the current block private void*[] blocks; public void* alloc(size_t sz) { sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a multiple of 8 if (this.next + sz >= this.end) { void* blk = GC.calloc(BLOCK_SIZE); this.blocks.length = this.blocks.length + 1; this.blocks[$ - 1] = blk; this.next = blk; this.end = blk + BLOCK_SIZE; } void* ret = this.next; this.next += sz; return ret; } public void free() { foreach(blk; this.blocks) GC.free(blk); this.blocks = null; this.blocks.length = 0; this.next = null; this.end = null; } } /** * Wrapper for MemPool that allocates the given struct */ public struct StructPool(T) { private MemPool!() pool; public T* alloc() { return cast(T*) pool.alloc(T.sizeof); } } public struct MemStack(size_t BLOCK_SIZE = 1 << 14) { private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack; public static const size_t MAX_ALLOC = BLOCK_SIZE; public void* alloc(size_t sz) { return stack.peek().alloc(sz); } public void push() { stack.push(new MemPool!(BLOCK_SIZE)); } public void pop() { stack.pop().free(); } } /** * Placement new mixin for allocating from a memory pool. Benchmarks show this * as faster than the D new in real usage (i.e. the parser runs about 1.2x * faster using this). */ public template MemPoolNew(alias Pool) { version(NoMemPool) { } else { public final new(uint sz) { return Pool.alloc(sz); } public final delete(void *p) { } } } | |||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Robert Fraser | Robert Fraser wrote:
> nobody wrote:
>> $ g++ alloc.cpp -o alloc
>> $ time ./alloc
>> real 0m1.946s
>> user 0m1.688s
>> sys 0m0.256s
>>
>> $ dmd -O -release allocd.d
>> $ time ./allocd
>> real 0m22.734s
>> user 0m22.353s
>> sys 0m0.360s
>>
>> $ cat alloc.cpp
>> #include <vector>
>>
>> typedef std::vector<int> intvec;
>> typedef intvec* intvecp;
>>
>> int main() {
>> int i, n = 20000000;
>> intvecp* iva;
>> iva = new intvecp[n];
>> for (i = n; i-- > 0; ) {
>> iva[i] = new intvec();
>> }
>>
>> return 0;
>> }
>>
>> $ cat allocd.d
>> int main() {
>> int i, n = 20000000;
>> Object[] oa;
>> oa = new Object[n];
>> for (i = n; i-- > 0; ) {
>> oa[i] = new Object();
>> }
>>
>> return 0;
>> }
>
> I use this a structure for arena-based memory allocation (attached).
>
> Example of use:
>
> import candy.util.MemPool
>
> MemStack!() stack;
>
> class MyObject
> {
> mixin MemPoolNew!(stack);
> }
>
> int main()
> {
> stack.push();
> int i, n = 20000000;
> MyObject[] oa;
> oa = new MyObject[n];
> for (i = n; i-- > 0; )
> {
> oa[i] = new MyObject();
> }
> stack.pop();
> return 0;
> }
>
> The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually.
>
> --------------------
>
> /**
> * Provides a pool of GCed memory to allocate things from a block.
> * This maintains cache coherency for related types (i.e. tree nodes).
> * It doesn't garuntee any ordering, though, the array struct should be
> * used for that. Also, everything has to be freed at once, freeing one
> * portion of this has no effect.
> *
> * Based on a similar concept posted by bearophile at:
> *
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=88227
>
> */
> public struct MemPool(size_t BLOCK_SIZE = 1 << 14)
> {
> private void* next; // Next available block
> private void* end; // End of the current block
> private void*[] blocks;
>
> public void* alloc(size_t sz)
> {
> sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a
> multiple of 8
> if (this.next + sz >= this.end)
> {
> void* blk = GC.calloc(BLOCK_SIZE);
> this.blocks.length = this.blocks.length + 1;
> this.blocks[$ - 1] = blk;
> this.next = blk;
> this.end = blk + BLOCK_SIZE;
> }
>
> void* ret = this.next;
> this.next += sz;
> return ret;
> }
>
> public void free()
> {
> foreach(blk; this.blocks)
> GC.free(blk);
> this.blocks = null;
> this.blocks.length = 0;
> this.next = null;
> this.end = null;
> }
> }
>
> /**
> * Wrapper for MemPool that allocates the given struct
> */
> public struct StructPool(T)
> {
> private MemPool!() pool;
> public T* alloc() { return cast(T*) pool.alloc(T.sizeof); }
> }
>
> public struct MemStack(size_t BLOCK_SIZE = 1 << 14)
> {
> private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack;
> public static const size_t MAX_ALLOC = BLOCK_SIZE;
>
> public void* alloc(size_t sz) { return
> stack.peek().alloc(sz); }
> public void push() { stack.push(new
> MemPool!(BLOCK_SIZE)); }
> public void pop() {
> stack.pop().free(); }
> }
>
> /**
> * Placement new mixin for allocating from a memory pool. Benchmarks
> show this
> * as faster than the D new in real usage (i.e. the parser runs about 1.2x
> * faster using this).
> */
> public template MemPoolNew(alias Pool)
> {
> version(NoMemPool) { } else
> {
> public final new(uint sz) { return Pool.alloc(sz); }
> public final delete(void *p) { }
> }
> }
Oops, that needs another module. Okay, both are attached in a compile-able form.
| |||
May 22, 2009 Re: why allocation of large amount of small objects so slow (x10) in D? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to nobody | nobody wrote: >> You've probably hit a corner case for the garbage collector. When you allocate >> 20,000,000 Object instances and hold onto the references, the garbage collector >> probably runs about a zillion times and never finds anything to delete. If this >> is a bottleneck, you should temporarily disable the garbage collector in these >> situations, until you are done with all these allocations, then re-enable it. > > Thanks. Disble gc improve the speed by 5x: > > $ dmd -O -release allocd.d > $ time ./allocd > real 0m4.080s > user 0m3.644s > sys 0m0.420s > > > $ cat allocd.d > import core.memory; > > int main() { > core.memory.GC.disable(); > int i, n = 20000000; > Object[] oa; > oa = new Object[n]; > for (i = n; i-- > 0; ) { > oa[i] = new Object(); > } > core.memory.GC.enable(); > > return 0; > } Try: import core.memory; int main() { core.memory.GC.reserve(80000000); core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; } If the call to reserve fails you could try breaking it up into two calls for half the space each. | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply