November 26, 2008 Re: Scope storage class [Was: DMD 1.037 and 2.020 releases] | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert Fraser | On Wed, 26 Nov 2008 12:18:27 -0800, Robert Fraser wrote:
> bearophile wrote:
>> To test the new scoping features of D 2.021 I have used a small stressing program, coming from this page, originally by Knuth:
>>
>> http://www.rosettacode.org/wiki/Man_or_boy_test
>>
>> More info:
>> http://en.wikipedia.org/wiki/Man_or_boy_test
>>
>> My purpose is to see the D2 compiler being able to compute results up to N=25 on my PC (that has 2 GB RAM and a 32 bit operating system).
>>
>> ---------------------------
>>
>> This is the code I have used for DMD 1.037:
>>
>> import std.c.stdio: printf;
>>
>> int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy
>> int x5) {
>> int delegate() b;
>> b = {
>> k -= 1;
>> return a(k, b(), x1, x2, x3, x4);
>> };
>> return k <= 0 ? x4 + x5 : b();
>> }
>>
>> void main() {
>> printf("%d\n", a(15, 1, -1, -1, 1, 0)); // N is 15
>> }
>>
>> ---------------------------
>>
>> This is the code I have used for DMD 2.021. I was not sure how to use the scope, so if you have improvements please tell me:
>>
>> import std.c.stdio: printf;
>>
>> int a(scope int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4,
>> lazy int x5) {
>> scope int delegate() b;
>> b = {
>> k -= 1;
>> return a(k, b(), x1, x2, x3, x4);
>> };
>> return k <= 0 ? x4 + x5 : b();
>> }
>>
>> void main() {
>> printf("%d\n", a(15, 1, -1, -1, 1, 0)); // N is 15
>> }
>>
>> ---------------------------
>>
>> The results for higher values of N are: k 21 22 23 24 25 A -389695 -865609 -1922362 -4268854 -9479595
>>
>> In both cases I have compiled the code with: dmd -O -release -inline -L/STACK:1700000000 man_or_boy.d
>>
>> The results:
>>
>> DMD V.1.037 (exe: 168_476: bytes):
>> N=24: 788 MB RAM, 3 seconds
>> N=25: 1.57 GB RAM, 6.42 seconds
>>
>> DMD V.2.021 (exe: 99_356 bytes):
>>
>> The code was too much slow in D2, so I have stopped it. Do you know if the D2 code can be improved to have performance similar to D1 ones?
>>
>> Bye,
>> bearophile
>
> Try marking all the "lazy" parameters as "scope" ("lazy" creates
> delegates).
From change log: "The lazy storage class now implies scope so that lazy arguments won't trigger a heap allocated closure."
|
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley wrote:
> The reason I wonder is because I would expect that the compiler is
> still allocating the delegate on the heap if you use the first syntax.
> (the second is also shorter and clearer.)
There's no reason to suspect. Just obj2asm the output and see. Furthermore, better results will come from:
int b() {
k -= 1;
return a(k, b(), x1, x2, x3, x4);
};
instead of using the delegate.
|
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On Wed, Nov 26, 2008 at 4:13 PM, Walter Bright <newshound1@digitalmars.com> wrote: > Jarrett Billingsley wrote: >> >> The reason I wonder is because I would expect that the compiler is >> still allocating the delegate on the heap if you use the first syntax. >> (the second is also shorter and clearer.) > > There's no reason to suspect. Just obj2asm the output and see. That'd be great if I had it. I don't want to get into that, though. > Furthermore, > better results will come from: > > int b() { > k -= 1; > return a(k, b(), x1, x2, x3, x4); > }; > > instead of using the delegate. So my suspicion is correct, then? That is: scope int delegate() b; b = { ... }; Will allocate on the heap? |
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright: > There's no reason to suspect. Just obj2asm the output and see. > Furthermore, better results will come from: > int b() { > k -= 1; > return a(k, b(), x1, x2, x3, x4); > }; > instead of using the delegate. I don't fully understand what's happening, maybe I am doing something wrong, so I just report what I have seen. -------------------- With DMD 1.036 this code works up to N=25, usign almost 1.6 GB of RAM (and I think a running time of just 6.4 seconds is very little, try to do this with any dynamic language, so I was impressed by D1): // code #1 import std.c.stdio: printf; int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) { int delegate() b; b = { k -= 1; return a(k, b(), x1, x2, x3, x4); }; return k <= 0 ? x4 + x5 : b(); } void main() { printf("%d\n", a(24, 1, -1, -1, 1, 0)); } -------------------- While on DMD 1.036 the following code: // code #2 import std.c.stdio: printf; int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) { int b() { k -= 1; return a(k, b(), x1, x2, x3, x4); } return k <= 0 ? x4 + x5 : b(); } void main() { printf("%d\n", a(24, 1, -1, -1, 1, 0)); } Outputs at compile time: man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral1 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral2 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral3 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral4 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral5 is a nested function and cannot be accessed from a But this code works on the codepad site, that uses DMD v.1.026 (I think with Tangobos), up to N=21 (then the site goes into timeout for safety): http://codepad.org/8GtKswm8 ------------------------------ DMD v2.021 outputs the same errors with code #2. ------------------------------ DMD is able to run code #1 and the following #3: // code #3 import std.c.stdio: printf; int a(scope int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) { scope int delegate() b; b = { k -= 1; return a(k, b(), x1, x2, x3, x4); }; return k <= 0 ? x4 + x5 : b(); } void main() { printf("%d\n", a(22, 1, -1, -1, 1, 0)); } All the following compilations are done with and N=20 and: dmd -O -release -inline -L/STACK:1700000000 man_or_boy.d dmd -O -release -inline -L/STACK:1700000000 man_or_boy2.d Note: in D2 the code #1 and code #3 seem to use the same amount of RAM and time to run, 2.46 s and about 94MB RAM (N=20). The same code #1 in DMD 1.036 runs in about 0.16 s and I am not able to know how much RAM it uses (N=20). ------------------------------ ASM of code #1 compiled with DMD 1.036: _D10man_or_boy1aFiLiLiLiLiLiZi comdat assume CS:_D10man_or_boy1aFiLiLiLiLiLiZi sub ESP,0Ch push EBX mov dword ptr 8[ESP],0 mov dword ptr 0Ch[ESP],0 lea EAX,0Ch[ESP] mov ECX,offset FLAT:_D10man_or_boy1aFiLiLiLiLiLiZi12__dgliteral1MFZi mov 8[ESP],EAX mov 0Ch[ESP],ECX cmp dword ptr 03Ch[ESP],0 jg L56 mov EAX,01Ch[ESP] mov EDX,020h[ESP] mov EBX,01Ch[ESP] call EDX push EAX sub ESP,4 mov EAX,01Ch[ESP] mov EDX,020h[ESP] mov EBX,01Ch[ESP] call EDX mov ECX,EAX add ESP,4 pop EAX add EAX,ECX jmp short L62 L56: mov EAX,8[ESP] mov EDX,ECX mov EBX,8[ESP] call EDX L62: pop EBX add ESP,0Ch ret 02Ch ------------------------------ ASM of code #1 compiled with DMD 2.021: assume CS:_D10man_or_boy1aFiLiLiLiLiLiZi L0: sub ESP,0Ch push EBX push ESI push EDI push 030h call near ptr __d_allocmemory mov ESI,EAX mov dword ptr [ESI],0 mov EAX,048h[ESP] mov 4[ESI],EAX mov EDX,044h[ESP] mov EAX,040h[ESP] mov 010h[ESI],EAX mov 014h[ESI],EDX mov EDX,03Ch[ESP] mov EAX,038h[ESP] mov 018h[ESI],EAX mov 01Ch[ESI],EDX mov EDX,034h[ESP] mov EAX,030h[ESP] mov 020h[ESI],EAX mov 024h[ESI],EDX mov EDX,02Ch[ESP] mov EAX,028h[ESP] mov 028h[ESI],EAX mov 02Ch[ESI],EDX mov dword ptr 8[ESI],0 mov dword ptr 0Ch[ESI],0 mov EBX,ESI mov ECX,offset FLAT:_D10man_or_boy1aFiLiLiLiLiLiZi12__dgliteral1MFZi mov 8[ESI],EBX mov 0Ch[ESI],ECX add ESP,4 cmp dword ptr 4[ESI],0 jg L9F mov EAX,028h[ESI] mov EDX,02Ch[ESI] mov EDI,028h[ESI] call EDX push EAX sub ESP,4 mov EAX,024h[ESP] mov EDX,028h[ESP] mov EDI,024h[ESP] call EDX mov ECX,EAX add ESP,4 pop EAX add EAX,ECX jmp short LA4 L9F: mov EAX,8[ESI] call ECX LA4: pop EDI pop ESI pop EBX add ESP,0Ch ret 02Ch _D10man_or_boy1aFiLiLiLiLiLiZi ends ------------------------------ ASM of code #3 compiled with DMD 2.021: _D11man_or_boy31aFMiLiLiLiLiLiZi comdat assume CS:_D11man_or_boy31aFMiLiLiLiLiLiZi L0: sub ESP,0Ch push EBX push ESI push EDI push 030h call near ptr __d_allocmemory mov ESI,EAX mov dword ptr [ESI],0 mov EAX,048h[ESP] mov 4[ESI],EAX mov EDX,044h[ESP] mov EAX,040h[ESP] mov 010h[ESI],EAX mov 014h[ESI],EDX mov EDX,03Ch[ESP] mov EAX,038h[ESP] mov 018h[ESI],EAX mov 01Ch[ESI],EDX mov EDX,034h[ESP] mov EAX,030h[ESP] mov 020h[ESI],EAX mov 024h[ESI],EDX mov EDX,02Ch[ESP] mov EAX,028h[ESP] mov 028h[ESI],EAX mov 02Ch[ESI],EDX mov dword ptr 8[ESI],0 mov dword ptr 0Ch[ESI],0 mov EBX,ESI mov ECX,offset FLAT:_D11man_or_boy31aFMiLiLiLiLiLiZi12__dgliteral1MFZi mov 8[ESI],EBX mov 0Ch[ESI],ECX add ESP,4 cmp dword ptr 4[ESI],0 jg L9F mov EAX,028h[ESI] mov EDX,02Ch[ESI] mov EDI,028h[ESI] call EDX push EAX sub ESP,4 mov EAX,024h[ESP] mov EDX,028h[ESP] mov EDI,024h[ESP] call EDX mov ECX,EAX add ESP,4 pop EAX add EAX,ECX jmp short LA4 L9F: mov EAX,8[ESI] call ECX LA4: pop EDI pop ESI pop EBX add ESP,0Ch ret 02Ch _D11man_or_boy31aFMiLiLiLiLiLiZi ends ------------------------------ Bye, bearophile |
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | On Wed, 26 Nov 2008 13:24:57 -0500, Jarrett Billingsley <jarrett.billingsley@gmail.com> wrote: > scope int b() { .. } > > The reason I wonder is because I would expect that the compiler is > still allocating the delegate on the heap if you use the first syntax. > (the second is also shorter and clearer.) Just as a point of reference (in D1) scope Object a = new Object(); // Stack Allocated scope Object b; b = new Object(); // Heap Allocated So there may be some merit to scope int b() { .. } vs scope int b(); b = {...} |
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | Wed, 26 Nov 2008 17:02:54 -0500, bearophile wrote: > While on DMD 1.036 the following code: > > // code #2 > import std.c.stdio: printf; > > int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) { > int b() { > k -= 1; > return a(k, b(), x1, x2, x3, x4); > } > return k <= 0 ? x4 + x5 : b(); > } > > void main() { > printf("%d\n", a(24, 1, -1, -1, 1, 0)); > } > > Outputs at compile time: > > man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral1 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral2 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral3 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral4 is a nested function and cannot be accessed from a man_or_boy2.d(6): delegate man_or_boy2.a.b.__dgliteral5 is a nested function and cannot be accessed from a > > But this code works on the codepad site, that uses DMD v.1.026 (I think with Tangobos), up to N=21 (then the site goes into timeout for safety): > http://codepad.org/8GtKswm8 [snip] > All the following compilations are done with and N=20 and: dmd -O -release -inline -L/STACK:1700000000 man_or_boy.d dmd -O -release -inline -L/STACK:1700000000 man_or_boy2.d Remove -inline from your compiler options, and #2 compiles and runs faster in both D1 and D2 than #1. lazy seems to do something funny when -inline is in effect. |
November 26, 2008 Re: Scope storage class [Was: DMD 1.037 and 2.020 releases] | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jesse Phillips | Wed, 26 Nov 2008 20:33:35 +0000 (UTC), Jesse Phillips wrote:
> On Wed, 26 Nov 2008 12:18:27 -0800, Robert Fraser wrote:
>> Try marking all the "lazy" parameters as "scope" ("lazy" creates
>> delegates).
>
> From change log: "The lazy storage class now implies scope so that lazy arguments won't trigger a heap allocated closure."
Also, the whole point of this test is delegate propagation. That's what makes it interesting.
|
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov: > Remove -inline from your compiler options, and #2 compiles and runs > faster in both D1 and D2 than #1. > lazy seems to do something funny when -inline is in effect. You are right, I have tested it on D1. I think the codepad doesn't use -inline, that's why #2 works there. #2 also uses less RAM on D1, for example N=24 requires about 707 MB instead of 788 MB, and about 2.8 s instead of about 3 s. Sergey Gromov: > Remove -inline from your compiler options, and #2 compiles and runs > faster in both D1 and D2 than #1. > lazy seems to do something funny when -inline is in effect. You are right, I have tested it on D1. I think the codepad doesn't use -inline, that's why #2 works there. #2 also uses less RAM on D1, for example N=24 requires about 710 MB instead of 788 MB, and about 2.8 s instead of about 3 s. This is the Asm of the #2 compiled with D1 with -O -release, it's shorter still (but note there are some other parts that I don't show here): _D11man_or_boy21aFiLiLiLiLiLiZi comdat assume CS:_D11man_or_boy21aFiLiLiLiLiLiZi L0: push EAX push EBX cmp dword ptr 034h[ESP],0 jg L33 mov EAX,014h[ESP] mov EDX,018h[ESP] mov EBX,014h[ESP] call EDX push EAX sub ESP,4 mov EAX,014h[ESP] mov EDX,018h[ESP] mov EBX,014h[ESP] call EDX mov ECX,EAX add ESP,4 pop EAX add EAX,ECX jmp short L3C L33: lea EAX,4[ESP] call near ptr _D11man_or_boy21aFiLiLiLiLiLiZi1bMFZi L3C: pop EBX pop ECX ret 02Ch _D11man_or_boy21aFiLiLiLiLiLiZi ends -------------------------- I have then tested #1 and #2 without -inline on D2, and the results are very different from each other: #1 is very slow and uses lot of memory, while #2 (that contains no scope) acts as D1, using "only" 707 MB with N=24 and working with n=25 too. The asm code is similar the one I have just shown here. So compiling #2 witout -inline in D2 fulfulls my original desire of computing up to N=25 with D2 :-) I presume the -inline uncovers a small bug of DMD, that will be fixed. But what interests me more now is to understand how to write such fast code in general in D2. Bye, bearophile |
November 26, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | Wed, 26 Nov 2008 18:19:38 -0500, bearophile wrote: > So compiling #2 witout -inline in D2 fulfulls my original desire of computing up to N=25 with D2 :-) > > I presume the -inline uncovers a small bug of DMD, that will be fixed. I think so, too. > But what interests me more now is to understand how to write such fast code in general in D2. Seems like right now delegates are stack allocated only if they're passed directly as scope parameters. Assigning a delegate to a variable is always considered escaping, even if it's a local scope variable. This is probably because scope for declarations is something completely different than scope for arguments. |
November 27, 2008 Re: Scope storage class | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert Jacques | 27.11.08 в 01:12 Robert Jacques в своём письме писал(а):
> On Wed, 26 Nov 2008 13:24:57 -0500, Jarrett Billingsley <jarrett.billingsley@gmail.com> wrote:
>> scope int b() { .. }
>>
>> The reason I wonder is because I would expect that the compiler is
>> still allocating the delegate on the heap if you use the first syntax.
>> (the second is also shorter and clearer.)
>
> Just as a point of reference (in D1)
> scope Object a = new Object(); // Stack Allocated
> scope Object b;
> b = new Object(); // Heap Allocated
>
> So there may be some merit to scope int b() { .. } vs scope int b(); b = {...}
If so, then why all the three usages
1) scope Object a;
2) scope Object a = b;
3) scope Object a = new Object();
are allowed when only 3rd one stack-allocates? I believe only third one should be allowed unless scope analisys is implemented (in its very basic form, at least):
scope Object a1 = new Object(); // ok
Object a2 = a1; // not ok
scope Object a3 = a1; // ok;
return a1; // not ok
return a2; // ok
|
Copyright © 1999-2021 by the D Language Foundation