August 13, 2004
In article <cfhhbe$3jf$1@digitaldaemon.com>, Walter says...

>Allocating large, uninitialized arrays on the stack has its own problems:
>
>1) The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since the uninitialized data consists of old D stack frames, it is highly likely that some of that garbage will look like references into the gc heap, and the gc memory will not get freed. This problem really does happen, and can be pretty frustrating to track down. Some of my C code that uses GC gets a memset() on the stack frame for that reason.

That is a very good point. In fact, it's an excellent point, and one which I had not previously considered. Although, I suppose you could get round it with something like this:

#   auto class GCDisabler
#   {
#       this() { std.gc.disable(); }
#       ~this() { std.gc.enable(); }
#   }


>2) It's possible for a function to pass out of it a reference to data on that function's stack frame. By then allocating a new stack frame over the old data, and not initializing, the reference to the old data may still appear to be valid. The program will then behave erratically. Initializing all data on the stack frame will greatly increase the probability of forcing that bug into the open in a repeatable manner.

That's another very good point.


>3) Large arrays on the stack bring up the specter of stack exhaustion with even modest amounts of recursion.

I'm not sure about this one. I thought that in the 80386+ architecture, the stack(s) can grow indefinitely. Or is it just that Windows doesn't take advantage of this possibility.



>That said, there are things I can do to generate faster code for the initialization.

I suspect that if you succeed in making this lightening-fast, everyone will stop complaining. You only have to look at the figures which have been posted recently to see what everybody's complaining about.



August 13, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:cfhhbe$3jf$1@digitaldaemon.com...
>
> "van eeshan" <vanee@hotmail.net> wrote in message news:cfh14k$2ouk$1@digitaldaemon.com...
> > I misread your code somewhat, Water, but what you suggest has a fatal
flaw
> > in it:  you are still initializing the original stack-based array, and you're making the assumption that the typical size is very small.
>
> I'm confused - you wrote: "The allocated (temporary) workspace is
typically
> used in only a small amounts, but sporadically will get used more fully."
My
> suggestion was tuned to that.

This topic had become attuned somewhat to 16KB and 32KB buffers, so the typical population of 2 to 3KB is relatively small; I can understand how that might have been misinterpreted to become the < 5 bytes you had tuned for.

> > b) the initial size needs to be around the 2 to 3KB mark. Given that,
your
> > example is hardly faster than the original code. In fact, the
> initialization
> > cost is ~57000% slower than the equivalent C/C++ array allocation.
That's
> > better than ~88000%, but who's to quibble at these levels?
>
> If your app typically is going to fill 2 to 3 Kb, I suggest using malloc()
> (or alloca()) for it. The handful of extra instructions to allocate via
> malloc as opposed to the stack will be lost in the thousands of
instructions
> needed to fill 2 to 3 Kb.

Good that you should note the thousands of instructions, because that's effectively what the default initialization does (fills the entire array with default data).  This particular app is actually very efficient in filling the 'typical' space used; you might think of it as effectively a quick lookup followed by a memcpy(). If you consider that as the core element of the algorithm; then the default initialization will slow that core down by almost a factor of two (clear the entire buffer, then fill the whole thing again). That slows down the overall application quite noticeably.  One is rather likely to notice that level of redundant overhead in any CPU-bound application that doesn't completely toss the notion of efficiency out of the proverbial window.

> Allocating large, uninitialized arrays on the stack has its own problems:
>
> 1) The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since
the
> uninitialized data consists of old D stack frames, it is highly likely
that
> some of that garbage will look like references into the gc heap, and the
gc
> memory will not get freed. This problem really does happen, and can be pretty frustrating to track down. Some of my C code that uses GC gets a memset() on the stack frame for that reason.

So what you are saying is that the current GC is incompatible with stack allocation in general? What you describe is not restricted to uninitialized data, Walter ~ what you appear to be saying is that /any/ data on the stack could potentially look like a reference into the gc heap (as you describe it). If this is the case, the use of alloca() will have exactly the same issues, yet you recommend that as a solution a few paragraphs above. Where does this really stand? Are you really saying that the D GC is incompatible with long-term stack content? That is, if I place some arbitrary values in a stack array during main(), will those values cause the GC confusion throughout the remainder of program execution if they happen to look like GC references?

> 2) It's possible for a function to pass out of it a reference to data on that function's stack frame. By then allocating a new stack frame over the old data, and not initializing, the reference to the old data may still appear to be valid. The program will then behave erratically. Initializing all data on the stack frame will greatly increase the probability of
forcing
> that bug into the open in a repeatable manner.

Oh ... I thought we were already well past the general case here, and into the realm of where the programmer had alreay taken full responsibility for their actions. Specifically, not initializing the stack content. Let's face it: it's quite feasible for any function to barf all over the heap, stack, and everywhere else until the hardware catches it.

And what about the cases whereby a stack buffer is consistently reused within the same, or subordinate, function? It's not suddenly cleared all over again before each subsequent usage (e.g. between foreach loops), is it? This appears to be a completely moot (and potentially misleading) point, but please educate me if I'm wrong. Seriously ...

>
> 3) Large arrays on the stack bring up the specter of stack exhaustion with even modest amounts of recursion.

Anyone who has read this thread is well aware that we're not talking recursion. Regardless; your sudden concern didn't stop this being a common and successful approach in C/C++. I mean, did you outlaw large(ish) stack-arrays in the DM C++ compiler? That is not a rhetorical question ...

Besides, as noted previously, it's best we not confuse this topic with design-methodologies; as even the best education cannot reconcile such issues. I hope we can agree on that.

>
> That said, there are things I can do to generate faster code for the initialization.

Yes, I'm aware of that. You could, for example, initialize in chunks of four rather than one. However, that will still not even get close to a C/C++ style array-allocation on the stack. But the thought is both appreciated and noted.



To summarize thus far:

a) we have a D language facility to initialize all variables with default values. This is arguably a wonderful facility, and can help catch certain bugs in the case of general programming.

b) unfortunately there's no easy "accessible" means to inhibit that initialization for stack-based arrays; leaving the programmer to eat a potentially (and repeated) vast overhead of CPU consumption, or resort to some non-obvious guru-tricks (none of which will easily work on anything other than a single dimensional array) ... Not to mention that this CPU cost is completely hidden from source-level view in the first place.

c) you indicate, Walter, that the GC has problems with uninitialized data on the stack. Well, that will happen regardless of whether or not the language supports non-initialization. That is; a valid stack array can contain arbitrary data patterns ~ said content may look exactly like gc references. Please correct me if I'm wrong.

d) The language explicitly exposes a built-in assembler, to do whatever the heck anyone might want to (barring the usual platform issues). Yet, there seems to be a massive pushback against providing an easy to use, symmetrical, high performance, and thoroughly useful counter to the default initialization ~ the alternative to which is to drop into assembly (or similar guru-grok), to simulate what C/C++ provides as its default, highly efficient, behavior. There's not even the vaguest counter-argument relating to implementation complexity.

Please forgive the unsavory mess while this latter, wildly contradictory concept, blows my tiny non-academic mind all over the inside of your screen ...

Let me finish with the sincere hope that my (obviously) thoroughly weak grasp on reality will soon be at an end. Can you please help me wrangle that into shape Walter?



August 13, 2004
Just out of curiosity, how does alloca() work? I guess it just manipulates the Stack and Base Pointers?

My apologies for going a bit off topic, I've just never seen alloca() before.


August 13, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:cfhk9n$5jh$1@digitaldaemon.com...
>
> > plus error-message
> > for equivalent pathological conditions (out of memory).
>
> This is not necessary, because very few programs can proceed if they run
out
> of memory, so they just issue an error message and abort. In D, this is handled automatically - running out of memory throws an exception with a message in it, if you don't catch the exception, the message is printed
and
> the program aborted. For nearly all programs, this is the right behavior.

Perhaps you haven't had the opportunity to write resiliant software? For
example, halting an entire web-site, because some file is not in the correct
format, is just not a solution. Frankly, I'm really surprised you'd
encourage
something so obviously naiive in this day and age.

With a server, you do whatever is necessary to track and release allocations
to ensure the server stays alive as long as it can. Even better; you  focus
on temporarily /stack/ allocations (surprise!), and thread-local reusable
buffers, so there's never any issue about running out of memory (as a bonus,
it's all thread-safe). Please don't waste your efforts on me in this
respect: Been
there. Done that. Seen the movie a gazillion times. Optimize it while I
sleep.


> > 2) what you are now suggesting is that the program should be redesigned instead. That is simply not realistic for working, optimal code. The
code
> > compiled and executed without too much effort, but it exhibits
performance
> > issues. That's what drove the foray into this topic. I will probably fix
> it
> > with the assembly-code thoughtfully provided by AJ (and thanks to Ben
also
> > for both his suggestions) but that's hardly an accessible, or portable, resolution in the general sense.
>
> I suggest the alloca() approach, and do some timing tests on the real app with it vs AJ's approach. I'm willing to bet you a beer that you won't be able to measure the difference, especially if the typical use of the array will fill it 2 or 3 Kb worth.

You are well aware that this is not about alloca() versus assembly.


> > 3) the solution offered (by both yourself and by Ben) is not allocated
on
> > the stack.
>
> The alloca() version is.

Was referring to the example Ben gave that looked rather like your own one. The one he noted in the post you  recently replied to.


> > The whole point here is to utilize the stack because it's a much
> > faster means of providing a temporary workspace. Going to the heap means
a
> > partial redesign, and the program will still run slower in D than it did originally. Forcing stack arrays to initialize will /always/ be
> dramatically
> > slower than the equivalent C/C++ code, whereas a simple embellishment to
> the
> > existing initialization mechanism would resolve that completely; and in
a
> > jiffy. This is about choice, Walter, and supporting the programmer in
what
> > they are doing.
> >
> > 4) the approach to addressing this seems to be one of "guilty until
proven
> > innocent". That is, there doesn't seem to be any notion on your part
that
> > this might, actually, be a valid observation. With solid number to back
it
> > up. You're basically saying, " ... the approach never had, and never
will,
> > have any validity whatsoever. Rip out the code and do it just like I
> would".
> > I do sincerely hope that I'm misunderstanding you, but that's kinda' how
> it
> > has been coming across.
> >
> > May I ask why some easy manual control over default-initialization is
such
> a
> > terrible thing? Is it really /so/ impossible that you'd rather force
> people
> > into re-implementing and retesting parts of their existing codebase
> instead
> > (as you suggest above) ?
> >
> > Or is there perhaps some fundamental ideology at stake, that I just
don't
> > see?
>
> Uninitialized data can be a source of bugs and trouble, as I pointed out
in
> my other posting, even when used correctly. One design goal of D is to improve reliability and portability by eliminating sources of undefined behavior, and uninitialized data is one huge source of undefined, unportable, erratic and unpredictable behavior. I'm trying to squeeze that out of the normal constructs in D - but access to C functions and inline assembler is there for folks who really need it.

Most admirable, Walter. And agreed, as I've stated in virtually every single post on this topic.

But here's where that position truly falls apart ... what about those multitude of cases where buffers are reused over and over again? What about thread-local workspaces? What about pooled buffers?  what about all those X, Y, and Z variations? They do /not/ get automagically re-initialized.

Your argument, as you state it, is one-shot only. That is, the whole basis of said argument hold validity only as long as the "uninitialized data" is used once, and once only. Anyone who produces high-performance designs and/or code will chortle with glee at this glaring logic hole (where such buffer are deliberately reused thousands, sometimes millions, of times per second). Such systems do not have the time to spend clearing out each little byte of data ... they simply ensure that it's never a problem. Can we please get past this level of discussion?

You know; in the time you've taken over these postings, you could probably
have implemented a keyword qualifier to optionally inhibit the default
initialization. Again, we're talking about a deliberate action by the
programmer to say " I don't want that frickin CPU gobblin initialization!".
From the developers position, your noble ideology has already  been placed
to one side at that point. The D language should be able to help there,
rather
than just get in the way. Remember? It's a "Practical Language for Practical
Programmers".

Please consider.




August 13, 2004
In article <cfhqhu$9jj$1@digitaldaemon.com>, van eeshan says...

>>Since
>the
>> uninitialized data consists of old D stack frames, it is highly likely
>that
>> some of that garbage will look like references into the gc heap, and the
>gc
>> memory will not get freed.

>Are you really saying that the D GC is incompatible
>with long-term stack content? That is, if I place some arbitrary values in a
>stack array during main(), will those values cause the GC confusion
>throughout the remainder of program execution if they happen to look like GC
>references?

I think you may have misunderstood Walter. Arbitrary random data can indeed give the GC false positives, but this usually doesn't matter, since it is rare. What Walter said is that uninitialized stack data is /not random/ - in that it is almost certain to contain fragments of some previous stack frame, and therefore also /real, valid/ (but out-of-date) references to GC-controlled memory. So the number of false positives will shoot up from "almost none" to "very many".

(And my previous suggestion of temporarily disabling the GC won't help either, so I withdraw the suggestion).

Jill


August 13, 2004
Fair point Jill.

But at that stage, one might as well state that the stack is entirely off limits for general array usage. You cannot predict the frequency of what might or might not look like GC references for any given application. It would require dynamic sampling to find a trend and try to offset that against GC behavior. The real problem, in this particular case, is the GC ~ it's simply incompatible with arbitrary (yet valid) stack-based array content. How much of a problem is that? It may actually be a red-herring. On the other hand, I may be waning considerably at this very moment (eight hours behind you).



"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfhvbq$bra$1@digitaldaemon.com...
> In article <cfhqhu$9jj$1@digitaldaemon.com>, van eeshan says...
>
> >>Since
> >the
> >> uninitialized data consists of old D stack frames, it is highly likely
> >that
> >> some of that garbage will look like references into the gc heap, and
the
> >gc
> >> memory will not get freed.
>
> >Are you really saying that the D GC is incompatible
> >with long-term stack content? That is, if I place some arbitrary values
in a
> >stack array during main(), will those values cause the GC confusion throughout the remainder of program execution if they happen to look like
GC
> >references?
>
> I think you may have misunderstood Walter. Arbitrary random data can
indeed give
> the GC false positives, but this usually doesn't matter, since it is rare.
What
> Walter said is that uninitialized stack data is /not random/ - in that it
is
> almost certain to contain fragments of some previous stack frame, and
therefore
> also /real, valid/ (but out-of-date) references to GC-controlled memory.
So the
> number of false positives will shoot up from "almost none" to "very many".
>
> (And my previous suggestion of temporarily disabling the GC won't help
either,
> so I withdraw the suggestion).
>
> Jill
>
>


August 13, 2004
> Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer:
> 
> {
>     uninitialized char[1024 * 32] workspace;
> 
>     formatter (workspace);
> }
> 
> That's perhaps not the best name for it, but you get the idea.

another option is to look for "void" as an initialization value:
{
  char[1024*32] workspace = void;
  formatter (workspace);
}
that way we don't need another keyword. If the initializer is void the
variable isn't initialized.
August 13, 2004
In article <cfiam0$hhc$1@digitaldaemon.com>, Ben Hinkle says...

>another option is to look for "void" as an initialization value: {
>  char[1024*32] workspace = void;
>  formatter (workspace);
>}
>that way we don't need another keyword. If the initializer is void the
>variable isn't initialized.

/another/ option is for the compiler, on seeing the code:

#    char[1024*32] workspace;

would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty large number of bytes to be putting on the stack. I think I'll allocate it from the heap instead. The programmer won't care. Hell, the programmer won't even /notice/ unless they start looking at the generated assembler!"

Arcane Jill


August 13, 2004
"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cfidel$j0u$1@digitaldaemon.com...
> In article <cfiam0$hhc$1@digitaldaemon.com>, Ben Hinkle says...
>
> >another option is to look for "void" as an initialization value: {
> >  char[1024*32] workspace = void;
> >  formatter (workspace);
> >}
> >that way we don't need another keyword. If the initializer is void the
> >variable isn't initialized.
>
> /another/ option is for the compiler, on seeing the code:
>
> #    char[1024*32] workspace;
>
> would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty
large
> number of bytes to be putting on the stack. I think I'll allocate it from
the
> heap instead.

That would be very unusual for a compiler to do - and it raises various questions like what "large" is. Plus unless it uses malloc to allocate it still won't be as fast as C due to the initialization overhead.

> The programmer won't care. Hell, the programmer won't even /notice/ unless they start looking at the generated assembler!"

uhh - that's obviously not completely true. Some programmers will care and will definitely notice. The question is how many (and how bad the problem is). Walter thinks the number is small (ie-not large enough to change things). The OP thinks the number is significant. I'm in the camp of nice-to-have but not a high priority problem.

> Arcane Jill



August 13, 2004
Arcane Jill wrote:

> /another/ option is for the compiler, on seeing the code:
> 
> #    char[1024*32] workspace;
> 
> would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty large
> number of bytes to be putting on the stack. I think I'll allocate it from the
> heap instead. The programmer won't care. Hell, the programmer won't even
> /notice/ unless they start looking at the generated assembler!"

I like this approach.  The programmer didn't say where he wants the memory to come from, so it's only logical to assume that he isn't thinking about it and is therefore relying on the compiler to do the right thing for him.

If the programmer wants to force the storage to one place or another, he can say so explicitly. (using calloc, malloc, gc.malloc, etc)

 -- andy