July 21, 2010
emplace, scope, enforce [Was: Re: Manual...]
Andrei Alexandrescu:

> emplace(), defined in std.conv, is relatively new. I haven't yet added
> emplace() for class objects, and this is as good an opportunity as any:
> http://www.dsource.org/projects/phobos/changeset/1752

Thank you, I have used this, and later I have done a few tests too.

The "scope" for class instantiations can be deprecated once there is an acceptable alternative. You can't deprecate features before you have found a good enough alternative.

---------------------

A first problem is the syntax: to allocate an object on the stack you need something like:

// is testbuf correctly aligned?
ubyte[__traits(classInstanceSize, Test)] testbuf = void;
Test t = emplace!(Test)(cast(void[])testbuf, arg1, arg2);


That is much worse looking, hairier and more error-prone than:
scope Test t = new Test(arg1, arg2);


I have tried to build a helper to improve the situation, something that looks like:
Test t = StackAlloc!(Test, arg1, arg2);

But failing that, my second try was this, not good enough:
mixin(stackAlloc!(Test, Test)("t", "arg1, arg2"));
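
For comparison, later Phobos releases ship std.typecons.scoped, which comes close to the helper wished for here. The following is only a minimal sketch, assuming a compiler recent enough to provide it (it is not part of the Phobos this post was written against); the class is the two-field Test used elsewhere in this post.

```d
import std.typecons : scoped;

final class Test {
    int x, y;
    this(int xx, int yy) {
        this.x = xx;
        this.y = yy;
    }
}

void main() {
    // The object's storage lives inside the wrapper returned by scoped,
    // so it sits in main's stack frame and no heap allocation happens.
    auto t = scoped!Test(1, 2);
    assert(t.x == 1 && t.y == 2);
    // The wrapper destroys the object when t goes out of scope.
}
```

Note that the result has to be declared with auto: writing "Test t = scoped!Test(1, 2);" would keep only a class reference to a wrapper that is destroyed at once.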

---------------------

A second problem is that this program compiles with no errors:

import std.conv: emplace;

final class Test {
   int x, y;
   this(int xx, int yy) {
       this.x = xx;
       this.y = yy;
   }
}

Test foo(int x, int y) {
   ubyte[__traits(classInstanceSize, Test)] testbuf = void;
   Test t = emplace!(Test)(cast(void[])testbuf, x, y);
   return t;
}

void main() {
   foo(1, 2);
}



While the following one gives:
test.d(13): Error: escaping reference to scope local t


import std.conv: emplace;

final class Test {
   int x, y;
   this(int xx, int yy) {
       this.x = xx;
       this.y = yy;
   }
}

Test foo(int x, int y) {
   scope t = new Test(x, y);
   return t;
}

void main() {
   foo(1, 2);
}


So the compiler is aware that the scoped object can't escape, while with emplace things become more bug-prone. "scope" can cause other bugs (some time ago I filed a bug report about one such problem), but it avoids the most common bug. (I am not sure emplace solves that problem with scope; I think it shares the same problem, plus adds new ones.)

---------------------

A third problem is that the ctor doesn't get called:


import std.conv: emplace;
import std.c.stdio: puts;

final class Test {
   this() {
   }
   ~this() { puts("killed"); }
}

void main() {
   ubyte[__traits(classInstanceSize, Test)] testbuf = void;
   Test t = emplace!(Test)(cast(void[])testbuf);
}


That prints nothing. Using scope it gets called (even if it's not present!).

---------------------

This is not a problem of emplace(), it's a problem of the dmd optimizer.
I have done a few performance tests too. I have used this basic pseudocode:

while (i < Max)
{
  create testObject(i, i, i, i, i, i)
  testObject.doSomething(i, i, i, i, i, i)
  testObject.doSomething(i, i, i, i, i, i)
  testObject.doSomething(i, i, i, i, i, i)
  testObject.doSomething(i, i, i, i, i, i)
  destroy testObject
  i++
}


Coming from here:
http://www.drdobbs.com/java/184401976
And its old timings:
http://www.ddj.com/java/184401976?pgno=9


The Java version of the code is simple:

final class Obj {
   int i1, i2, i3, i4, i5, i6;

   Obj(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
       this.i1 = ii1;
       this.i2 = ii2;
       this.i3 = ii3;
       this.i4 = ii4;
       this.i5 = ii5;
       this.i6 = ii6;
   }

   void doSomething(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
   }
}

class Test {
   public static void main(String args[]) {
       final int N = 100_000_000;
       int i = 0;
       while (i < N) {
           Obj testObject = new Obj(i, i, i, i, i, i);
           testObject.doSomething(i, i, i, i, i, i);
           testObject.doSomething(i, i, i, i, i, i);
           testObject.doSomething(i, i, i, i, i, i);
           testObject.doSomething(i, i, i, i, i, i);
           // testObject = null; // makes no difference
           i++;
       }
   }
}



This is a D version that uses emplace() (if you don't use emplace here the performance of the D code is very bad compared to the Java one):

// program #1
import std.conv: emplace;

final class Test { // 32 bytes each instance
   int i1, i2, i3, i4, i5, i6;
   this(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
       this.i1 = ii1;
       this.i2 = ii2;
       this.i3 = ii3;
       this.i4 = ii4;
       this.i5 = ii5;
       this.i6 = ii6;
   }
   void doSomething(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
   }
}

void main() {
   enum int N = 100_000_000;

   int i;
   while (i < N) {
       ubyte[__traits(classInstanceSize, Test)] buf = void;
       Test testObject = emplace!(Test)(cast(void[])buf, i, i, i, i, i, i);
       // Test testObject = new Test(i, i, i, i, i, i);
       // scope Test testObject = new Test(i, i, i, i, i, i);        
       testObject.doSomething(i, i, i, i, i, i);
       testObject.doSomething(i, i, i, i, i, i);
       testObject.doSomething(i, i, i, i, i, i);
       testObject.doSomething(i, i, i, i, i, i);
       testObject = null;
       i++;
   }
}


The Java code (server) runs in about 0.25 seconds here.
The D code (that doesn't do heap allocations at all) runs in about 3.60 seconds.
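
Timings like these can also be taken inside the program instead of with an external timer. The following is a minimal sketch, assuming a Phobos recent enough to have std.datetime.stopwatch (it did not exist in 2010); timeLoop, Obj and the reduced iteration count are made up for illustration and are not the benchmark above.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

final class Obj {
    int a;
    this(int a) { this.a = a; }
    void doSomething(int x) {}
}

// Hypothetical helper: times n iterations of allocate-and-call, in milliseconds.
long timeLoop(int n) {
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. n) {
        auto o = new Obj(i); // plain heap allocation, the slowest variant discussed here
        o.doSomething(i);
    }
    sw.stop();
    return sw.peek.total!"msecs";
}

void main() {
    enum int N = 1_000_000; // far fewer than the 100_000_000 used above, for a quick run
    writefln("%s iterations: %s ms", N, timeLoop(N));
}
```

An optimizing build may remove part of such a loop, so the numbers are only meaningful after checking the generated code.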

With a few experiments I have seen that emplace() doesn't get inlined, and the cause is that it contains enforce(). enforce() contains a throw, and it seems dmd doesn't inline functions that can throw. You can test it with a little program like this:


import std.c.stdlib: atoi;
void foo(int b) {
   if (b)
       throw new Throwable(null);
}
void main() {
   int b = atoi("0");
   foo(b);
}


So if you comment out the two enforce() inside emplace() dmd inlines emplace() and the running time becomes about 2.30 seconds, less than ten times slower than Java.
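
A less drastic alternative to removing the checks is to move the throw into a separate function, so that the checking function itself contains no throw statement. This is only a sketch under that assumption: throwMisaligned and checkAligned are hypothetical names, not Phobos functions, and whether a given dmd version then inlines the caller has to be verified in the disassembly.

```d
// Cold path: the only function that contains a throw statement.
void throwMisaligned() {
    throw new Exception("chunk is misaligned");
}

// Hot path: a comparison and a call, with no throw of its own.
void checkAligned(const(void)* p) {
    if ((cast(size_t) p) % size_t.sizeof != 0)
        throwMisaligned();
}

void main() {
    size_t[4] buf;
    checkAligned(buf.ptr); // pointer-aligned: returns normally

    bool thrown = false;
    try {
        // Deliberately off by one byte, so the check must fail.
        checkAligned(cast(ubyte*) buf.ptr + 1);
    } catch (Exception e) {
        thrown = true;
    }
    assert(thrown);
}
```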

If emplace() doesn't contain calls to enforce() then the loop in main() becomes (dmd 2.047, optimized build):


L1A:		push	dword ptr 02Ch[ESP]
		mov	EDX,_D10test6_good4Test7__ClassZ[0Ch]
		mov	EAX,_D10test6_good4Test7__ClassZ[08h]
		push	EDX
		push	ESI
		call	near ptr _memcpy
		mov	ECX,03Ch[ESP]
		mov	8[ECX],EBX
		mov	0Ch[ECX],EBX
		mov	010h[ECX],EBX
		mov	014h[ECX],EBX
		mov	018h[ECX],EBX
		mov	01Ch[ECX],EBX
		inc	EBX
		add	ESP,0Ch
		cmp	EBX,05F5E100h
		jb	L1A


(The memcpy is done by emplace to initialize the object before calling its ctor. You must perform the initialization because it needs the pointer to the virtual table and monitor. The monitor here was null. I think a future LDC2 can optimize away more stuff in that loop, so it's not so bad).


If you use this in program #1:
scope Test testObject = new Test(i, i, i, i, i, i);
It runs in about 6 seconds (also because the ctor is called even if it's missing).

If in program #1 you use just new, without scope, the runtime is about 27.2 seconds, about 110 times slower than Java.

Bye,
bearophile
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
On Wed, 21 Jul 2010 03:58:33 +0200, bearophile <bearophileHUGS@lycos.com>  
wrote:

> If in program #1 you use just new, without scope, the runtime is about  
> 27.2 seconds, about 110 times slower than Java.
>
> Bye,
> bearophile

Takes 18m27.720s in PHP :)
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
Rory McGuire wrote:

> 
> Takes 18m27.720s in PHP :)

Takes 5m26.776s in Python.
Takes 0m1.008s in Java.

I can't test the D version: I don't have emplace, and dsource is ignoring me.
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
Rory McGuire:
> Takes 18m27.720s in PHP :)

You have a lot of patience :-)


> can't test D version I don't have emplace and dsource is ignoring me.

This was Andrei's code before dsource went down (ddoc and unittest removed):

T emplace(T, Args...)(void[] chunk, Args args) if (is(T == class)) {
   enforce(chunk.length >= __traits(classInstanceSize, T));
   auto a = cast(size_t) chunk.ptr;
   enforce(a % real.alignof == 0);
   auto result = cast(typeof(return)) chunk.ptr;

   // Initialize the object in its pre-ctor state
   (cast(byte[]) chunk)[] = typeid(T).init[];

   // Call the ctor if any
   static if (is(typeof(result.__ctor(args))))
   {
       // T defines a genuine constructor accepting args
       // Go the classic route: write .init first, then call ctor
       result.__ctor(args);
   }
   else
   {
       static assert(args.length == 0 && !is(typeof(&T.__ctor)),
               "Don't know how to initialize an object of type "
               ~ T.stringof ~ " with arguments " ~ Args.stringof);
   }
   return result;
}

Bye,
bearophile
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
On 21.07.2010 5:58, bearophile wrote:
> I have tried to build a helper to improve the situation, like something that looks:
> Test t = StackAlloc!(Test, arg1, arg2);
>    
Well, I'm using this for structs, very straightforward:

T* create(T, Args...)(Args args)
if ( !is(T == class) ){
    return emplace!T(malloc(T.sizeof)[0..T.sizeof], args);
}

void destroy(T)(T* ptr) if ( !is(T == class) ){
    assert(ptr);
    clear(ptr);
    free(ptr);
}
//then
auto a =  create!T(params);

I guess one could easily patch it for classes.
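
A class version might look roughly like the sketch below. It assumes a current Phobos, where clear() has been renamed destroy(); create follows the struct helper above, while dispose and Point are hypothetical names chosen here (dispose avoids clashing with the built-in destroy).

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Allocates a class instance on the C heap and constructs it in place.
// The GC does not scan this memory, so T should not hold the only
// reference to GC-allocated data.
T create(T, Args...)(Args args) if (is(T == class)) {
    enum size = __traits(classInstanceSize, T);
    void* p = malloc(size);
    assert(p !is null, "malloc failed");
    return emplace!T(p[0 .. size], args);
}

// Runs the destructor, then returns the memory to the C heap.
void dispose(T)(T obj) if (is(T == class)) {
    assert(obj !is null);
    destroy(obj);
    free(cast(void*) obj);
}

final class Point {
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

void main() {
    auto p = create!Point(3, 4);
    scope(exit) dispose(p);
    assert(p.x == 3 && p.y == 4);
}
```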
> A second problem is that this program compiles with no errors:
>
> import std.conv: emplace;
>
> final class Test {
>      int x, y;
>      this(int xx, int yy) {
>          this.x = xx;
>          this.y = yy;
>      }
> }
>
> Test foo(int x, int y) {
>      ubyte[__traits(classInstanceSize, Test)] testbuf = void;
>      Test t = emplace!(Test)(cast(void[])testbuf, x, y);
>      return t;
> }
>
> void main() {
>      foo(1, 2);
> }
>    
This is just a pitfall of any stack allocation, and emplace is, in fact, 
about custom allocation, not scoped variables.
> A third problem is that the ctor doesn't get called:
>
>
> import std.conv: emplace;
> import std.c.stdio: puts;
>
> final class Test {
>      this() {
>      }
>      ~this() { puts("killed"); }
> }
>
> void main() {
>      ubyte[__traits(classInstanceSize, Test)] testbuf = void;
>      Test t = emplace!(Test)(cast(void[])testbuf);
> }
>    
What doesn't get called here is the dtor, and that's because emplace is a library
replacement for placement new (no pun intended).
Sure enough, with manual memory management you need to call clear(t) at exit.
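
A minimal sketch of that, assuming a current Phobos where clear() is spelled destroy(). The size_t buffer is my own choice to get pointer alignment for the chunk, and useOnStack is a hypothetical name.

```d
import core.stdc.stdio : puts;
import std.conv : emplace;

final class Test {
    this() {}
    ~this() { puts("killed"); }
}

void useOnStack() {
    enum size = __traits(classInstanceSize, Test);
    // size_t elements give the buffer pointer alignment, which is
    // enough for this class; the length is rounded up to cover size bytes.
    size_t[(size + size_t.sizeof - 1) / size_t.sizeof] buf = void;

    Test t = emplace!Test(buf[]); // size_t[] converts implicitly to void[]
    scope(exit) destroy(t);       // runs the dtor when the scope is left
}

void main() {
    useOnStack();
}
```

scope(exit) keeps the cleanup next to the allocation, so the dtor also runs when the scope is left through an exception.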



-- 
Dmitry Olshansky
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
Rory Mcguire:
> Takes 5m26.776s in Python.
> Takes 0m1.008s in Java.

(I suggest you round away the milliseconds, they are never significant in such benchmarks.)
Python 2.7 uses its GC a bit better, so it can be a bit faster.
Your Java code has run four times slower than on my slow PC, that's a lot. In Java, have you used the -server switch?

Bye,
bearophile
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
Dmitry Olshansky:
> Well, I'm using this for structs, very straightforward:
> 
> T* create(T, Args...)(Args args)
> if ( !is(T == class) ){
>      return emplace!T(malloc(T.sizeof)[0..T.sizeof], args);
> }

That's not good enough: you are allocating on the (C) heap. If you use that in the D benchmark I have shown, you will probably get bad timing results.


> The dtor doesn't get called, and that's because emplace is a library
> replacement for placement new (no pun intended).
> Sure enough, with manual memory management you need to call clear(t) at exit.

If the class is allocated on the stack, it's much better if the destructor is called when the instance goes out of scope. Otherwise it's like C programming.
(I suggest editing your post to remove the unneeded parts of the original post.)
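
To make the point concrete, here is a minimal sketch of destructor-at-scope-exit for a stack-allocated class. It assumes a current Phobos, where std.typecons.scoped exists; that helper is not discussed in this thread, and the counter `destroyed` and the function `sumOnStack` are made-up names used only for illustration.

```d
import std.typecons : scoped;

int destroyed; // counts destructor calls, for illustration only

final class Test {
    int x, y;
    this(int xx, int yy) { x = xx; y = yy; }
    ~this() { destroyed++; }
}

int sumOnStack(int a, int b) {
    // The instance is stored inside the wrapper returned by scoped,
    // so it lives in this function's stack frame, not on the GC heap.
    auto t = scoped!Test(a, b);
    return t.x + t.y;
    // The wrapper's destructor runs on exit and calls ~Test().
}

void main() {
    assert(sumOnStack(1, 2) == 3);
    assert(destroyed == 1); // destructor ran exactly once, at scope exit
}
```

Like the raw emplace version, this does nothing to stop a reference from escaping the function; it only restores the automatic destructor call.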

Bye,
bearophile
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
bearophile wrote:

> Rory Mcguire:
>> Takes 5m26.776s in Python.
>> Takes 0m1.008s in Java.
> 
> (I suggest you round away the milliseconds, they are never significant in
> such benchmarks.) Python 2.7 uses its GC a bit better, so it can be a bit
> faster. Your Java code has run four times slower than on my slow PC, that's
> a lot. In Java, have you used the -server switch?
> 
> Bye,
> bearophile

On Ubuntu 10.04 (64-bit) I'm using `time` to get the timings.
I wasn't using -server; with it I get 0m1.047s.

The D version gets 0m8.162s using a 32-bit chroot environment.

The processor is a Core i7 @ (1.6GHz * 8), with 6GB of RAM.

An interesting thing about the Python one is that it used 3GB of RAM most of the
time.
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
Rory Mcguire wrote:

> bearophile wrote:
> 
>> Rory Mcguire:
>>> Takes 5m26.776s in Python.
>>> Takes 0m1.008s in Java.
>> 
>> (I suggest you round away the milliseconds, they are never significant in
>> such benchmarks.) Python 2.7 uses its GC a bit better, so it can be a bit
>> faster. Your Java code has run four times slower than on my slow PC, that's
>> a lot. In Java, have you used the -server switch?
>> 
>> Bye,
>> bearophile
> 
> On Ubuntu 10.04 (64-bit) I'm using `time` to get the timings.
> I wasn't using -server; with it I get 0m1.047s.
> 
> The D version gets 0m8.162s using a 32-bit chroot environment.
> 
> The processor is a Core i7 @ (1.6GHz * 8), with 6GB of RAM.
> 
> An interesting thing about the Python one is that it used 3GB of RAM most of the
> time.

Perhaps the slow times are because I'm reporting the real (wall-clock) timings, not the 
user/sys times.
July 21, 2010
Re: emplace, scope, enforce [Was: Re: Manual...]
On 21.07.2010 14:20, bearophile wrote:
> Dmitry Olshansky:
>    
>> Well, I'm using this for structs, very straightforward:
>>
>> T* create(T, Args...)(Args args)
>> if ( !is(T == class) ){
>>       return emplace!T(malloc(T.sizeof)[0..T.sizeof], args);
>> }
>>      
> That's not good enough: you are allocating on the (C) heap. If you use that in the D benchmark I have shown, you will probably get bad timing results.
>
>    
Uh, yes, I guess I should have read your post to the end :). Stack 
allocation is risky business, to say the least. Some kind of memory pool 
would be handy.
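
Something along these lines, as a rough sketch only: a fixed-capacity pool that hands out slots from one preallocated array and constructs into them with emplace. `Pool`, `Point`, `create` and `reset` are hypothetical names, and the import assumes a current druntime (core.lifetime); with the compiler discussed here it would be std.conv.

```d
import core.lifetime : emplace; // std.conv in 2010-era Phobos

struct Point {
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical fixed-capacity pool for structs: no malloc, no GC allocation.
struct Pool(T, size_t capacity) if (!is(T == class)) {
    private T[capacity] slots; // storage lives wherever the pool itself lives
    private size_t used;

    // Construct a T in the next free slot and return a pointer to it.
    T* create(Args...)(Args args) {
        assert(used < capacity, "pool exhausted");
        return emplace(&slots[used++], args);
    }

    // Hand all slots back at once. Destructors are NOT run here,
    // so this is only safe for types that don't need cleanup.
    void reset() { used = 0; }
}

void main() {
    Pool!(Point, 4) pool;
    Point* a = pool.create(1, 2);
    Point* b = pool.create(3, 4);
    assert(a.x == 1 && a.y == 2);
    assert(b.x == 3 && b.y == 4);
    pool.reset();
}
```

The pointers are only valid while the pool is alive and not reset, so the escaping problem just moves from the stack frame to the pool's lifetime.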
>> The dtor doesn't get called, and that's because emplace is a library
>> replacement for placement new (no pun intended).
>> Sure enough, with manual memory management you need to call clear(t) at exit.
>>      
> If the class is allocated on the stack, it's much better if the destructor is called when the instance goes out of scope. Otherwise it's like C programming.
> (I suggest editing your post to remove the unneeded parts of the original post.)
>
>    
To that end one should prefer plain structs with a destructor over final 
classes and scope. That, of course, means losing inheritance and such.
The problem is that designing such classes and then documenting "you should 
always use it as 'scope'" is awkward.
Moreover, the functions to which you pass the stack-allocated class instance 
are unaware of that clever trick.
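
A minimal sketch of the struct alternative, with made-up names (`Resource`, `live`, `useResource`): the value sits on the stack and its destructor runs at scope exit without any annotation at the use site.

```d
int live; // number of live Resource instances, for illustration only

struct Resource {
    int id;
    this(int id) { this.id = id; live++; }
    ~this() { live--; }
    @disable this(this); // forbid copies so the count stays accurate
}

int useResource(int id) {
    auto r = Resource(id); // on the stack, no heap allocation
    assert(live == 1);
    return r.id * 2;
    // r's destructor runs here, when the scope ends
}

void main() {
    assert(useResource(21) == 42);
    assert(live == 0); // cleaned up deterministically
}
```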

Slight modification of your benchmark:

import std.conv: emplace;
import std.contracts;

import std.stdio;
final class Test { // 32 bytes each instance
    int i1, i2, i3, i4, i5, i6;
    this(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
        this.i1 = ii1;
        this.i2 = ii2;
        this.i3 = ii3;
        this.i4 = ii4;
        this.i5 = ii5;
        this.i6 = ii6;
    }
    void doSomething(int ii1, int ii2, int ii3, int ii4, int ii5, int ii6) {
    }
}

Test hidden;

void fun(Test t){
    hidden = t;
}

void bench(){
    enum int N = 10_000_000;
    int i;

    while (i < N) {
        scope Test testObject = new Test(i, i, i, i, i, i);
        fun(testObject);
        testObject.doSomething(i, i, i, i, i, i);
        testObject.doSomething(i, i, i, i, i, i);
        testObject.doSomething(i, i, i, i, i, i);
        testObject.doSomething(i, i, i, i, i, i);
        i++;
    }

}

void main() {
    int a,b,c;//
    bench();
//what's the hidden now?
    writefln("%d %d", hidden.i1,hidden.i2);
    writefln("%d %d", hidden.i1,hidden.i2);
}

The second writefln prints garbage. I guess that's because hidden points into 
the long-gone stack frame, which is overwritten by the first writefln.

-- 
Dmitry Olshansky