August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | On Wed, 11 Aug 2004 15:21:50 -0700, van eeshan <vanee@hotmail.net> wrote: > "Sean Kelly s" <sean@f4.ca> wrote in message > news:cfe4hu$19cf$1@digitaldaemon.com... >> In article <cfdoer$13pt$1@digitaldaemon.com>, van eeshan says... >> > >> >1) making claims about D's perfomance superiority over Java due to the >> >former's ability to use the stack (versus always on the heap) is based >> on >> >speculation; not fact. Testing shows this to be entirely the opposite >> for >> >arrays beyond 64 bytes in length. Similarly, any kind of performance > testing >> >against C/C++ within this realm will be viewed as a negative mark >> against > D. >> >> Regan, if I remember correctly, reported a bug where executable size is >> proportional to the size of static arrays the application contains. This >> performance issue sounds like it may be related. > > AFAIK this is not related in the slightest, I'm afraid. This topic is purely > about CPU cycles, and the stack. Nothing to do with static data or > executable size. I'm not sure it was me who reported it initially, Arcane Jill was discussing it more recently.. I think what Sean was referring to by mentioning the size of executables is an indication of how static arrays are currently implemented, the array seems to be compiled into the executable, whether this is to increase performance or decrease complexity, we don't know. >> >2) Implicit initialization is great. However, it can clearly and easily > be >> >demonstrated to be a serious impediment. Given that D is supposed to >> be a >> >'systems' language, I humbly suggest that a mechanism be provided to > bypass >> >such nicities when one is prepared to assume full responsibility. For >> >example, there might be a qualifier for variables that disables the >> >automatic initialization (along the lines of the 'auto' qualifier). >> >> An impediment how? Just in terms of performance? In general, I would say > that >> default initialization is worth the performance cost for its ability to > prevent >> bugs. > > I don't think anything negative was said about default initialization as a > feature, Sean. If it was then I apologize. Default/implicit initialization > is great (as was noted; so we agree), but there are times when you need to > take full control over what's going on. It may not matter to you > specifically, but there are many situations whereby one needs to take full > responsibility over a select set of variables (such as temporary stack-based > buffers). Providing the mechanism as an /option/ is not detrimental to the > language. It's like applying "const", or one of the other optional > qualifiers. > > I'm just illustrating that the language could really use a means to disable > default-initialization for those cases where it is deemed both completely > unecessary and too costly. Surely that's not too much to consider? After seeing the cost of default initialization, I can see when someone might not want it, it should be optional IMO. Perhaps it's a good default behaviour, I am un-convinced, AFAIKS un-initialised data generally causes a spectaclar and obvious failure, in fact shouldn't good DBC catch them, so, if I have to choose between default initialization and efficient arrays I choose the latter. default initialization also seems to be responsible for hiding my un-used variables? or is dmd just not looking for them, I guess an 'error' is overkill for an un-used variable, and there might possibly be a reason to use one, can anyone think of one? I am a tidy programmer and these bug me. Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/ |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | van eeshan wrote:
> "Ben Hinkle" <bhinkle4@juno.com> wrote in message news:cfe87l$1akg$1@digitaldaemon.com...
>> Have you tried alloca? I haven't tried running it but I'm thinking of
>>
>> void stack2()
>> {
>> void* x = alloca(16*1024);
>> }
>
> Tried that, Ben; alloca() also clears everything to zero.
Hmm. I just tried on Linux and got the times:
heap: .17sec
stack: 14
stack2: .06
looks like on Linux alloca is the fastest.
|
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ben Hinkle | Well, blow me down with a feather. I must have really screwed that test up; thank you Ben ~ that gets me out of a hole for now. So, in light of this, where are we now? Currently, if you want to quickly allocate an array on the stack you must do something like the following: char[] array = (cast(char *) alloca(Size))[0..Size]; I feel it would be beneficial to both programmers and to the language reputation to provide something rather more "accessible" than this ... thing. Perhaps a globally available template? Something, where taking advantage of the stack is not hidden away in the darkest recesses. This is why I figured something like an optional qualifier on the variable (to inhibit default initialization) would be ideal: it's easy to apply, easy to understand, easy to document, and very likely easy to implement. If this sort of thing remains poorly documented, or is not easily accessible, people will be left wondering why their supposed fast stack-arrays are actually eating serious chunks of CPU time. After all, porting some C code is what led to this foray (temporary stack arrays are commonplace in C). Finally, the alloca() approach is still ~700% slower than the C/C++ standard mechanism, yet it is faster than simply declaring a "char[20]" D local variable (that's kinda' hoky when you think about it ... a char[20] declaration is ~750% slower in D than in C). What to do? Can we please have a simple, clean, optional mechanism for inhibiting default initialization? Is it really so much trouble? "Ben Hinkle" <bhinkle4@juno.com> wrote in message news:cfe87l$1akg$1@digitaldaemon.com... > Have you tried alloca? I haven't tried running it but I'm thinking of > > void stack2() > { > void* x = alloca(16*1024); > } > > > van eeshan wrote: > > > "Walter" <newshound@digitalmars.com> wrote in message news:cfbip7$ora$1@digitaldaemon.com... > >> > >> "van eeshan" <vanee@hotmail.net> wrote in message news:cfb72j$9l7$1@digitaldaemon.com... > >> > "Walter" <newshound@digitalmars.com> wrote in message news:cfb5v9$76u$1@digitaldaemon.com... > >> > > The problem, however, is that since Java does not allow any objects or arrays on the stack (or in static data), whenever you need one it > > needs > >> to > >> > > be allocated on the heap. This is a lot more expensive than stack allocation. Since one winds up allocating a lot more heap objects in > >> Java > >> > > than one does in C/C++, the end result is slower. Java offers no alternatives to gc allocation. > >> > > > >> > > This is why D allows stack based objects and value objects. > >> > > >> > Are you saying that D allows one to create Objects entirely upon the > >> stack? > >> > >> For auto objects, it is allowed for the compiler to put them there. But I > >> was primarilly referring to structs and arrays. > >> > >> > And entirely within static data? > >> > >> Structs and arrays, yes. Java does not allow structs, nor does it allow arrays on the stack. > > > > > > Thanks for clarifying that Walter. Would it be prudent to note that D allocates auto Objects on the heap? Stack allocation may be allowed for in > > the spec, but that's not how the compiler operates. > > > > Also, there is that point about heap allocations being a lot more expensive than stack allocations. Naturally, one would lean towards wholeheartedly agreeing; but this is just not true when it comes to D. Here's a little (contrived) test jig to illustrate: > > > > private import std.c.time; > > private import std.c.stdlib; > > private import std.c.windows.windows; > > > > void heap() > > { > > void *x = malloc (16 * 1024); > > free (x); > > } > > > > void stack() > > { > > ubyte[16 * 1024] x; > > } > > > > void trace (uint startTime) > > { > > FILETIME ft; > > GetSystemTimeAsFileTime(&ft); > > printf ("%u\n", (ft.dwLowDateTime - startTime) / 10000); > > } > > > > void main() > > { > > FILETIME start; > > > > GetSystemTimeAsFileTime(&start); > > trace (start.dwLowDateTime); > > > > sleep (3); > > trace (start.dwLowDateTime); > > > > for (int i=1_000_000; --i;) > > heap(); > > > > trace (start.dwLowDateTime); > > > > for (int i=1_000_000; --i;) > > stack(); > > > > trace (start.dwLowDateTime); > > } > > > > The results? On this machine it takes the heap() routine ~500ms to > > execute, and the stack() version ~3500ms. Yep, that's seven times longer. > > Why? What wasn't stated is that D ensures all stack variables are initialized ... including arrays. The innocuous stack allocation actually > > clears the 16KB to zero each time, which is why it takes so long. For a 32KB 'buffer', the stack() execution time goes up to ~8600ms versus the ~500ms for heap(); you can see there's some rather non-linear behavior for > > the stack version <g> > > > > The point here is not about the benefits/detriments of deterministic initialization. It's about clarifying a point of performance-tuning that is far from obvious. Without wishing to put too fine a point on it, the quoted post was somewhat misleading. That is; allocating an array on the D > > stack can consume far more cycles than the equivalent heap allocation. It > > depends upon the size of the array, the bus speed, the size of the cache, > > heap fragmentation and many other things. > > > > D stack allocations are simply not like C/C++ at all. Far from it. > > > > > > For the curious, here's the disassembly for both routines. It's worth noting that the array initialization could be sped up. Also worth noting is that the loop/call overhead was 10ms. > > > > 10: { > > 11: void *x = malloc (16 * 1024); > > 12: free (x); > > 00402011 push 4000h > > 00402016 call _malloc (0040574c) > > 0040201B add esp,4 > > 0040201E push eax > > 0040201F call _free (004057a4) > > 00402024 add esp,4 > > 13: } > > 00402027 pop eax > > 00402028 ret > > 14: > > 15: > > 16: void stack() > > 0040202C push ebp > > 0040202D mov ebp,esp > > 0040202F mov edx,4 > > 00402034 sub esp,1000h > > 0040203A test dword ptr [esp],esp > > 0040203D dec edx > > 0040203E jne _D5hello5stackFZv+8 (00402034) > > 00402040 sub esp,edx > > 00402042 mov ecx,4000h > > 00402047 xor eax,eax > > 17: { > > 18: ubyte[16 * 1024] x; > > 00402049 push edi > > 0040204A lea edi,[x] > > 00402050 rep stos byte ptr [edi] > > 19: } > > 00402052 pop edi > > 00402053 mov esp,ebp > > 00402055 pop ebp > > 00402056 ret > |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | "van eeshan" <vanee@hotmail.net> wrote in message news:cfdoer$13pt$1@digitaldaemon.com... > There's an interesting quirk at the 4096-byte mark: the codegen changes strategy slightly at this point, and I'd guess it is doing something akin to > paging (ensuring the stack page-set is available), though can't be sure. Take a look at the previously posted assembler for reference. That's because of how the underlying OS allocates space to the stack. This shift in strategy happens with C/C++, too. > If local (stack) arrays were to use the C/C++ approach, the time consumed would be ~10ms across the board, rather than the ~500ms for heap allocations. That is, for a 32KB local array, the initialization cost is ~865 times /more/ than it would be for the equivalent C/C++ approach (~10ms > versus ~8650ms); almost three orders of magnatude. To do an apples/apples comparison, benchmark against calloc/free, not malloc/free. I normally use calloc in C++, not malloc, because the uninitialization bugs aren't worth it. > This indicates two things: > > 1) making claims about D's perfomance superiority over Java due to the former's ability to use the stack (versus always on the heap) is based on speculation; not fact. Testing shows this to be entirely the opposite for No, it does not show that at all: 1) Your test was with C++, not Java. 2) You cannot allocate uninitialized arrays (or any uninitialized data) in Java. 3) I've written a Java compiler and have done a lot of performance tuning of Java apps trying to make them go faster. I'm very familiar with where the cycles got eaten. 4) I presume that someone allocating an array in C++ would presumably store something in it - just allocating it, doing nothing with it, and freeing it is about as representative as benchmarking the sequence of code: x = x + 1; x = x - 1; > arrays beyond 64 bytes in length. Most lightweight objects are less than 64 bytes in length. Allocating 1000 byte objects on the stack is very rare, not the usual case. > Similarly, any kind of performance testing > against C/C++ within this realm will be viewed as a negative mark against D. I've had problems with inappropriate use of benchmarks since my compiler executed the following loop: void foo() { int i; double d; for (i = 0; i < 1000; i++) d += 1; } in orders of magnitude faster than the competition. The journalist disassembled the code, found that the compiler had removed the floating point instructions, and so declared the compiler to be "buggy" in his writeup. He never bothered to contact us to find out why. What actually happened was that it was the ONLY compiler reviewed that did data flow analysis, and concluded that the benchmark routine did nothing and so deleted it. Today, nearly all compilers do that. > 2) Implicit initialization is great. However, it can clearly and easily be demonstrated to be a serious impediment. Given that D is supposed to be a 'systems' language, I humbly suggest that a mechanism be provided to bypass > such nicities when one is prepared to assume full responsibility. For example, there might be a qualifier for variables that disables the automatic initialization (along the lines of the 'auto' qualifier). The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays. |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | "van eeshan" <vanee@hotmail.net> wrote in message news:cfe60s$1a2i$1@digitaldaemon.com... > I don't think anything negative was said about default initialization as a feature, Sean. If it was then I apologize. Default/implicit initialization is great (as was noted; so we agree), but there are times when you need to take full control over what's going on. It may not matter to you specifically, but there are many situations whereby one needs to take full responsibility over a select set of variables (such as temporary stack-based > buffers). Providing the mechanism as an /option/ is not detrimental to the language. It's like applying "const", or one of the other optional qualifiers. > > I'm just illustrating that the language could really use a means to disable > default-initialization for those cases where it is deemed both completely unecessary and too costly. Surely that's not too much to consider? You can take full control in D by doing: byte* p = malloc(size); and p will point to an uninitialized buffer. |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | "van eeshan" <vanee@hotmail.net> wrote in message news:cfe8ob$1b85$1@digitaldaemon.com... > "Ben Hinkle" <bhinkle4@juno.com> wrote in message news:cfe87l$1akg$1@digitaldaemon.com... > > Have you tried alloca? I haven't tried running it but I'm thinking of > > > > void stack2() > > { > > void* x = alloca(16*1024); > > } > > Tried that, Ben; alloca() also clears everything to zero. No, it does not. It simply calls C's alloca(), and gets whatever C's alloca() does. Note, however, that under linux the OS seems to clear pages that are faulted into the stack, so it could *appear* to be 0 initialized when it really is just fresh snow <g>. |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | "Walter" <newshound@digitalmars.com> wrote > > If local (stack) arrays were to use the C/C++ approach, the time consumed > > would be ~10ms across the board, rather than the ~500ms for heap allocations. That is, for a 32KB local array, the initialization cost is ~865 times /more/ than it would be for the equivalent C/C++ approach > (~10ms > > versus ~8650ms); almost three orders of magnatude. > > To do an apples/apples comparison, benchmark against calloc/free, not malloc/free. I normally use calloc in C++, not malloc, because the uninitialization bugs aren't worth it. The whole point of that post was to selectively and optionally /inhibit/ initialization, where that eats far too many cycles. That's why malloc() was used rather than calloc(). There are plenty times where you need to avoid the (totally superfluous) overhead with workspace buffers. You were stating that heap allocation is a lot slower than stack allocation. Well, that is absolutely not true when you don't want the initialization; heap allocation is (bizarrely) faster within D since you /cannot/ inhibit the darned array initialization when you really need to. C/C++ stack allocations are blindingly fast by comparision; just a couple of instructions in many cases. > 1) Your test was with C++, not Java. That was D source-code. > 4) I presume that someone allocating an array in C++ would presumably store > something in it - just allocating it, doing nothing with it, and freeing it > is about as representative as benchmarking the sequence of code: > > x = x + 1; > x = x - 1; Thanks Walter ... you have never heard of allocating a stack buffer, and then passing it off to another function for population? Yes, something else is done with the buffer, but often there's only a very /small/ part of it used. The overhead of initialization totally swamps the typical processing time. Do that often enough (in, say, something like text processing) and it adds up rapidly. You've never done this kind of thing before? Fine. This was a test to see where the unknown overhead was coming from during a C port; naturally, the test was whittled down to isolate the culprit. Isn't that how you like them? > Most lightweight objects are less than 64 bytes in length. Allocating 1000 byte objects on the stack is very rare, not the usual case. What usual case? Do you mean in all software? Please be a little more specific, as "usual case" is terribly vague. Allocating double[1024], char[8192], and often much bigger, on the stack is not at all rare; far from it. How about just one example? Microsoft does it /all/ the time in their Operating Systems, IE, and probably all their other mainstream software. Why do you think AMD and Intel have the NX instruction in their CPUs now? What's nice about D is that you always know the length of the array, and debug options catch out-of-bounds conditions. Rare? I'd claim it's perhaps a rather usual case in existing C and C++ code. May I ask where you get your use-cases from, please? > I've had problems with inappropriate use of benchmarks since my compiler executed the following loop: > > void foo() > { int i; > double d; > for (i = 0; i < 1000; i++) > d += 1; > } > > in orders of magnitude faster than the competition. The journalist disassembled the code, found that the compiler had removed the floating point instructions, and so declared the compiler to be "buggy" in his writeup. He never bothered to contact us to find out why. What actually happened was that it was the ONLY compiler reviewed that did data flow analysis, and concluded that the benchmark routine did nothing and so deleted it. Today, nearly all compilers do that. You didn't bother to ask about the code-paths that prompted this topic, Walter; yet seem to go out of your way to belittle any and all points made; and then vehemently deny any possible existence of the problem. I really don't understand why this post has apparently inflamed you so, but do offer sincere and profound apologize. One would have thought knowing where the codegen was seriously non-optimal for given scenarios would be helpful to you. Would you rather not know? > > > > 2) Implicit initialization is great. However, it can clearly and easily be > > demonstrated to be a serious impediment. Given that D is supposed to be a > > 'systems' language, I humbly suggest that a mechanism be provided to > bypass > > such nicities when one is prepared to assume full responsibility. For example, there might be a qualifier for variables that disables the automatic initialization (along the lines of the 'auto' qualifier). > > The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays. Great! Will it eliminate initialization for this type of (trivail) usage: void A (void delegate (char[]) formatter) { char [1024 * 32] workspace; formatter (workspace); } Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer: { uninitialized char[1024 * 32] workspace; formatter (workspace); } That's perhaps not the best name for it, but you get the idea. It's very clear what's going on, it's entirely optional, and the compiler will do exactly what the programmer intends it to do. No under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's just a suggestion. |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | In article <cfete2$1p7g$1@digitaldaemon.com>, van eeshan says... >Currently, if you want to quickly allocate an array on the stack you must do something like the following: > > char[] array = (cast(char *) alloca(Size))[0..Size]; My experiments seem to suggest that the following actually works rather well. # void f() # { # ubyte[] a; # asm # { # sub ESP,1000; // Reserve 1000 bytes # mov a,1000; // Set a.length = 1000 # mov a+4,ESP; // Set &a[0] to reserved bytes # } # } |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to van eeshan | On Thu, 12 Aug 2004 00:31:38 -0700, van eeshan wrote: > "Walter" <newshound@digitalmars.com> wrote [snip] >> The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays. > > Great! Will it eliminate initialization for this type of (trivail) usage: > > void A (void delegate (char[]) formatter) > { > char [1024 * 32] workspace; > > formatter (workspace); > } > > Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer: > > { > uninitialized char[1024 * 32] workspace; > > formatter (workspace); > } > > That's perhaps not the best name for it, but you get the idea. It's very clear what's going on, it's entirely optional, and the compiler will do exactly what the programmer intends it to do. No under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's just a suggestion. Sounds like a good idea. It allows a coder to take some responsibility for their own work. There is also nothing wrong with D's auto-initialized arrays *when* the coder accepts them too. I feel that D ought to support both ideas. A new keyword is probably the way to go. -- Derek Melbourne, Australia 12/Aug/04 5:54:15 PM |
August 12, 2004 Re: D stack allocation | ||||
---|---|---|---|---|
| ||||
Posted in reply to Arcane Jill | Now that is exactly what the compiler ought to (optionally) provide for, at the high level :-) 16 bytes allocations (versus relative execution time): heap: 269 stack[]: 50 alloca(): 57 asm: 6 2K allocations: heap: 504 stack[]: 5681 alloca(): 57 asm: 10 32K allocations: heap: 520 stack[]: 8812 alloca(): 118 asm: 10 There's that 32KB stack-array initialization being ~88000% slower than the traditional [asm] C/C++ uninitialized style. Barring further issue, I will use that ~ thanks very much Jill. Perhaps one of the Wiki owners will take an interest in noting these workarounds. "Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cff6a9$1ugb$1@digitaldaemon.com... > In article <cfete2$1p7g$1@digitaldaemon.com>, van eeshan says... > > >Currently, if you want to quickly allocate an array on the stack you must do something like the following: > > > > char[] array = (cast(char *) alloca(Size))[0..Size]; > > My experiments seem to suggest that the following actually works rather well. > > # void f() > # { > # ubyte[] a; > # asm > # { > # sub ESP,1000; // Reserve 1000 bytes > # mov a,1000; // Set a.length = 1000 > # mov a+4,ESP; // Set &a[0] to reserved bytes > # } > # } > > |
Copyright © 1999-2021 by the D Language Foundation