June 18, 2002
"Walter" <walter@digitalmars.com> ha scritto nel messaggio news:aem052$p8b$2@digitaldaemon.com...
>
> "anderson" <anderson@firestar.com.au> wrote in message news:ael937$2qm$1@digitaldaemon.com...
> > That reminds me of what we used to do in C to speed up malloc/realloc.
> > Block allocation is the key.
> > Perhaps .length++ could increase by n (ie 10) values or n% (ie 2%) and
> > then it'd only need to update when it gets past that number again. Even
> > C++'s vector did that.
>
> That was the original plan, but due to the way array slicing works,
> there's no way for the GC to tell that there's "unused" space off the end
> of the array. Hence, it must reallocate and copy every time. The result
> is terrible performance - the cure is to preallocate the length you'll
> need, or use OutBuffer which will do block reallocation in suitable
> increments.

So this is something to underline very well in the spec: "reallocation of
a vector through the length property may be a CPU hog. Instead of writing:

    for (i = 0; i < n; i++)
    {
        array.length = array.length + 1;
        array[i] = foo();
    }

Write:

    array.length = array.length + n;
    for (i = 0; i < n; i++)
    {
        array[i] = foo();
    }

This way you reduce the cost from quadratic to linear: since each
reallocation copies the whole array, growing one element at a time copies
roughly n*n/2 elements in total."

Obviously this should be written in correct English :-)

Ciao


June 18, 2002
It's not, however, the best balance of wasted memory (the memory that gets allocated but never used, i.e. "wasted").

I was thinking this over the other day, and by the time you get to 5 or 6 it should start giving you one extra... at about 12 it should give you 2 extra.

Think about it this way.  You're working with modules that are so big (or memory that is so tight) that if one extra gets allocated, the game won't run (it will run out of memory and crash).  How many of those are you likely to be allocating?

If it's so big that allocating only 1 is possible, you obviously want no extras.

If you can fit 2, good, but you still wouldn't want to bet on having room for one more.

But say they've allocated 3 already.  Will they allocate more or not?  It's the gamble.  It may be that they only had enough memory for 3 of them.  Or it could be the start of a very big pattern.

I think that by about 4 you can be pretty sure they are small enough that if you allocate one extra it might not be such a big problem.

Or perhaps the allocator could be based on the size of the object.  Really large objects don't allocate as many in advance; the smaller the object, the more likely you are to allocate more in advance (down to the optimal 1.5x approach).  For byte sizes I'd have it start out allocating 4 or 8 more just in case - the heap structure usually needs this much alignment anyway.  At size 4 bytes I'd start allocating exactly how many they asked for times 2: 2.0x growth (round down).  For a 64K object you probably wouldn't want any extra preallocation (but you *might*!).  And somewhere in between is the 150%, probably at about the OS page level or cache line level.

This could be info that the profiler could provide.  In fact, a standard allocator could perhaps be instructed to analyze usage patterns and come up with a recommended allocation strategy percentage as a result (modify its "prediction" over time).  If you pluck these values out from the debugger after letting the program run for a while, you can drop that number in during initialization to give the allocator a "hint" as to how you're going to be using it ("I'm going to be allocating a bunch of these" == 10.0x or more, and "this is a singleton" == 0.0x).

In any case the allocator would be able to modify the number over time if it wished, so it could be self-correcting.  For that the allocator would need access to some kind of timer (to tell the difference between 1 allocation per frame and 10000 allocations per frame).  Perhaps it could be a method:

allocator.time_elapsed(1000); // 1 second in milliseconds

During which it could update its prediction value.  Default would be about 1.25.
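As a rough sketch of that self-correcting idea (the GrowPredictor class and its thresholds below are invented for illustration, not part of any existing allocator):

    // Hypothetical self-correcting predictor: tracks how many grow
    // requests arrive per interval and nudges its factor accordingly.
    class GrowPredictor
    {
        double factor = 1.25;   // current overallocation prediction
        uint grows = 0;         // grow requests seen this interval

        // how many elements to actually allocate for a request
        uint predict(uint requested)
        {
            grows++;
            return cast(uint)(requested * factor) + 1;
        }

        // called periodically, e.g. predictor.time_elapsed(1000)
        void time_elapsed(uint ms)
        {
            double perSecond = grows * 1000.0 / ms;
            if (perSecond > 100)        // hot: predict more aggressively
                factor = factor * 1.1;
            else if (perSecond < 1)     // cold: trim the prediction back
                factor = factor * 0.9;
            if (factor < 1.0) factor = 1.0;     // never shrink a request
            if (factor > 10.0) factor = 10.0;   // cap the gamble
            grows = 0;
        }
    }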

I tried some functions with log, and you *way* don't want to go there. Exponentiation is not the thing we're after here.

This would need to be a compiler switch probably (memory allocation prediction factor).  Or a configurable parameter of a standard allocator.

Sean

"Matthew Wilson" <dmd@synesis.com.au> wrote in message news:aellgq$fgj$2@digitaldaemon.com...
> Koenig showed in (can't remember reference) that reallocation by 150% is
> the best balance of speed/size. I would be quite happy with that.
>
> I enhanced that in a string class to act only from the second allocation
> onwards, i.e. the first allocation is exact, in order to deal with
> constant strings. The same could be applied to any array type.



June 18, 2002
You sound to be agreeing violently with me, or is it too late in the day?

"Sean L. Palmer" <seanpalmer@earthlink.net> wrote in message news:aemmca$1fpa$1@digitaldaemon.com...
> It's not, however, the best balance of wasted memory (the memory that
> gets allocated but never used, i.e. "wasted").
> [...]


June 18, 2002
You're right. Thanks for catching this egregious error.

"Nic Tiger" <nictiger@pt.comcor.ru> wrote in message news:aemia1$1ap7$1@digitaldaemon.com...
> Didn't anyone notice that the following example is not only invalid for
> the current DMD (as long as Walter doesn't allow ++ on properties) but
> simply *buggy*?
> >                int array[];
> >                array[array.length++] = x;
>
> AFAIK, the sequence of events will be:
>     array[array.length] = x;
>     array.length=array.length+1;
>
> And that is simply not valid array expanding. It's a bug which in the
> best case will give a GPF (in the worst it silently corrupts memory). In
> fact, the sequence should be:
>     array.length = array.length + 1;
>     array[array.length - 1] = x;
>
> which corresponds to
>     array[(++array.length)-1] = x;
>
> And after all, I really think that form is not very good. I agree that
> properties should have only getters and setters, and that ++, --, etc.
> should not be used with them.
>
> Nic Tiger.



June 18, 2002
You're right. I obviously need to make this point clear in the spec.


June 18, 2002
This is the reason realloc is not encouraged in C++.  With value semantics something can just be copied and used; C++ lets you override that, but ideally everything involved would inherently know how to do that safely, or could prohibit copying.  Things that are being pointed to (with pointers) cannot be moved without updating all the pointers, and pointers by definition go one direction.  So if you want to use a pointer to point at something, use the heap; otherwise make an array object and use indices as pointers.  If the array gets moved or copied, the index will still work.  It does allow range errors to creep in, though, and that's where the concept of an iterator comes in (an iterator is like a "safe" pointer: it knows the managed area and can tell whether it's within the valid range or not).  And if your class is overly sensitive to being moved (it allows pointers to itself to be made and doesn't watch them), your class should come from the heap and be known by reference.

// indices as "safe" pointers: a typedef'd index type used as an iterator
typedef uint intcontaineriterator;
int intcontainer[100];
//typedef typeof(intcontainer.containedtype) intcontaineriterator;
intcontaineriterator begin = cast(intcontaineriterator) 0;
intcontaineriterator end = cast(intcontaineriterator) intcontainer.length;
for (intcontaineriterator i = begin; i != end; ++i)
    printf("%d,", intcontainer[i]);

Value objects (int, struct) should go on the stack.  Reference objects
(class) should come from a heap (somewhere), but who decides where they
come from?  The class itself doesn't know, because derived classes might
need to come from somewhere else.  It's inconvenient for the user of the
class to know, but that's how it's mostly handled now.  Perhaps you could
always make objects from the main heap, but alternatively could replace
said allocation with one of your own and a matching delete - which at the
same time prevents the builtin garbage collector from working on it and
prevents values of that type from being constructed on the stack (turns it
into a reference object).

There must be some way to prove at compile time that each and every object ever allocated is at some point destroyed - that the flow path always leads to a destructor or deallocator call - especially if the compiler were allowed to insert arbitrary cleanup calls unless otherwise instructed.  Telling it "don't worry about this... I'm going to take care of this garbage" would then allow the address-of operator to be used and allow the value to be passed by reference.

In fact, taking the address of a value or passing it by reference both implicitly convert a value type into a reference type.  If this ever happens in a scope, the value is no longer an auto.  The more explicit we make this the better, so that leaks are avoided - but not too explicit, since we don't want to make normal heap class allocation involve too much typing.

Maybe this whole heap vs. stack question could be settled by having user-defined allocation operators, where you can choose for a class which allocator to use.  For builtins and structs it defaults to the stack allocator (calloc); for class objects it defaults to the heap allocator (gc / malloc).  You could override for a class to make instances come from some other memory, or be accessed via a "forwarding" interface of some kind (if its data is not accessible from outside, the only way to communicate is via methods; an object not accessible by the main CPU couldn't have public data members? But that's up to the user.)
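For what it's worth, a sketch of how per-class allocation could look, assuming a class allocator/deallocator syntax along the lines D has been heading (the Node class and the malloc backing are just illustration):

    import std.c.stdlib;

    class Node
    {
        int value;

        // class-specific allocator: instances come from malloc
        // instead of the GC (and are therefore invisible to it)
        new(uint size)
        {
            void* p = malloc(size);
            assert(p != null);
            return p;
        }

        // matching deallocator, invoked by delete
        delete(void* p)
        {
            if (p)
                free(p);
        }
    }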

Realloc is great for performance but it's just unsafe when the objects have pointers (possibly to each other or possibly held from outside).
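A minimal illustration of the hazard, assuming the array grows by reallocating as Walter describes above:

    int[] a;
    a.length = 4;
    int* p = &a[0];       // pointer into the array's current storage
    a.length = 100000;    // may reallocate and move the elements
    *p = 1;               // p may now point into the old, freed block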

Has anyone ever thought of building a stack that grows into the heap? Surely someone has thought of that.

void* growstack(uint size)
{
    if (stack.room < size)
    {
        // malloc some more memory for the stack, 1.5x the current total
        // size; change the stack pointer to the top of the new memory;
        // push the old stack pointer on the new stack and keep track of
        // how to get back to the old stack
    }
    else
        stackpointer -= size;
    return stackpointer;
}

This would assume stack cleanups always come in the reverse order of allocations; setjmp/longjmp could screw this up.  But it makes the stack infinitely growable up to available memory (fixed-size stacks suck).

This kind of logic could be inserted for each stack frame where recursion is possible (otherwise it should be mostly determinable at link time how much stack memory is needed by an application).

Sean

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:3D0E80EC.9090708@users.sourceforge.net...
> anderson wrote:
> > That reminds me of what we used to do in C to speed up malloc/realloc.
> > Block allocation is the key.
> >
> > Perhaps .length++ could increase by n (ie 10) values or n% (ie 2%) and
> > then it'd only need to update when it gets past that number again.
>
> Python uses a specific function for overallocation that they claim gives linear reallocation speed on poorly performing reallocs, such as, uh, Windows'.  I stuffed that in my realloc wrapper function so now I just reallocate everything up by one, secure in the knowledge that it'll be fast.  The function is:
>
>      static int
>      __roundupsize (int n)
>      {
>          unsigned int nbits = 0;
>          unsigned int n2 = (unsigned int) n >> 5;
>
>          do
>          {
>              n2 >>= 3;
>              nbits += 3;
>          }
>          while (n2);
>
>          return ((n >> nbits) + 1) << nbits;
>      }
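>      /* For illustration, some values the function above computes
>         (worked examples, not from Python's source):
>             __roundupsize (10)    == 16
>             __roundupsize (100)   == 104
>             __roundupsize (1000)  == 1024
>             __roundupsize (10000) == 10240
>         proportionally more headroom for small sizes, a few percent
>         for large ones. */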
>
> Python's source goes into its behaviour and results and I can dig that up if it's wanted.
>
> Arrays could be given a third integer for maximum that's set to length in slices and static arrays but is overallocated otherwise.  Once it tops it, allocate the new array and copy it over.  Resizing to lower lengths causes maximum to equal length, as it's essentially a slice operator.
>
> One real problem that comes up is in multiple array references:
>
>      a = foo; /* foo is a dynamic array */
>      b = foo;
>
>      a ~= 4;
>      b ~= 8;
>
>      a [a.length - 1] == 8 or == 4 depending upon whether (maximum !=
> length) before the append.
>
> One solution would be that when making a reference copy of an array the maximum field is set to length, so that a and b always get their own copy when appending.
>
> I can't think of any other problems, but Walter would certainly know what could go wrong.
>


June 18, 2002
So don't put them in.

Sean

"Walter" <walter@digitalmars.com> wrote in message news:aeluv6$o6r$1@digitaldaemon.com...
>
> "Sean L. Palmer" <seanpalmer@earthlink.net> wrote in message news:aek6k2$1d57$1@digitaldaemon.com...
> > Isn't that gonna make a lot of other writeable properties hard to use?
> > The syntax is shared for all properties (including user-defined ones,
> > if I'm right), so it seems to make sense to make it flexible for use
> > by user properties, which should work just like normal variables,
> > correct?
>
> I think the .length property is going to wind up being a special case.
> (I hate special cases.)



June 18, 2002
You can't predict exactly how much memory you need a lot of the time.  But
you can give a rough estimate, and the compiler can try its best to help
if you're being a moron and allocating hundreds of things one-by-one.
I'd rather the language be kinda smart about guesstimating the array
reallocation anyway, because sometimes you just can't know how often
something will be used and don't have access to rewrite the algorithm that
calls your allocation function.  If you write a library that makes foo's,
someone will write a loop:  foo[] a;  for (int i = 0; i < 1000; i++)
a[a.length++] = new foo;  There are just those kinds of people in the
world (they're called newbies, and they grow out of it eventually).

I don't want to reduce language functionality (or have a special case for length) just because somebody might misuse it.  Just make array preallocation smarter.  Add more info so that the GC has enough info to know how to do it properly if you must.  Behind-the-scenes baggage is ok so long as it makes your job as a programmer easier.  We all like code that works and runs fast.
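By "more info" I mean something like a capacity carried next to the length, so appends only reallocate when the capacity runs out.  A rough sketch in D-ish code (the IntArray struct and its 1.5x policy are invented here; the built-in arrays today carry only a pointer and a length):

    import std.c.stdlib;

    struct IntArray
    {
        int* ptr;
        uint length;
        uint capacity;    // allocated slots, always >= length
    }

    void append(inout IntArray a, int value)
    {
        if (a.length == a.capacity)
        {
            // grow by 1.5x (minimum 4) so appends are amortized O(1)
            uint newcap = (a.capacity < 4) ? 4 : (a.capacity * 3) / 2;
            int* p = cast(int*) realloc(a.ptr, newcap * int.sizeof);
            assert(p != null);
            a.ptr = p;
            a.capacity = newcap;
        }
        a.ptr[a.length] = value;
        a.length = a.length + 1;
    }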

Walter, you have absolutely the worst job ever, which is the job of making everybody else's jobs easier.  Maybe by sometimes telling them to give it up and not use ++ on the array.length property.  ;)

Sean

"Walter" <walter@digitalmars.com> wrote in message news:aelvi3$onf$1@digitaldaemon.com...
>
> "anderson" <anderson@firestar.com.au> wrote in message news:aeks0b$24p8$1@digitaldaemon.com...
> > I don't agree. Any decent programmer with half a brain would beable to notice that one.
>
> I see that kind of code all the time, over and over. I saw it again
> recently in a popular benchmark program. I've shown many people how they
> can improve performance a thousand+ fold by preallocating the array.
>
> I believe that supporting ++ on the .length would implicitly bless that
> wrong technique.
>
>
> > Besides, what's going to stop them from doing ...
> >
> >     for (i = 0; i < n; i++)
> >     {
> >         array.length = array.length + 1;
> >         array[i] = foo();
> >     }
> >
> > or
> >
> >     for (i = 0; i < n; i++)
> >     {
> >         array.length = n + 1;
> >         array[i] = foo();
> >     }
>
> Absolutely nothing will prevent them from writing the loop that way.



June 18, 2002
On Mon, 17 Jun 2002 17:43:25 -0700 "Walter" <walter@digitalmars.com> wrote:

> I see that kind of code all the time, over and over. I saw it again recently in a popular benchmark program. I've shown many people how they can improve performance a thousand+ fold by preallocating the array.
> 
> I believe that by supporting ++ on the .length, that would implicitly bless that wrong technique.

But still... you do support operator ~= (add an element to the end of
the array), don't you? ++ is essentially the same: it adds the element
and initializes it with the default value for its type, so what's the
difference? We're talking of dynamic arrays, after all. I understand
that preallocation is a useful technique, but it is not always usable.
Since ~= is there, ++ should also be, just for consistency! =)

Also, I wonder why there is no -- ? It would be used even more widely, since it is practically a stack-pop operator...
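For illustration, here is the stack idiom meant, spelled out the long way since neither ++ nor -- works on .length (just a sketch of the pattern):

    int[] stack;

    stack.length = stack.length + 1;    // push: what stack.length++
    stack[stack.length - 1] = 42;       // would abbreviate

    int top = stack[stack.length - 1];  // pop: what stack.length--
    stack.length = stack.length - 1;    // would abbreviate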


June 18, 2002
On Mon, 17 Jun 2002 17:56:36 -0700 "Walter" <walter@digitalmars.com> wrote:

> That was the original plan, but due to the way array slicing works, there's no way for the GC to tell that there's "unused" space off the end of the array. Hence, it must reallocate and copy every time. The result is terrible performance - the cure is to preallocate the length you'll need, or use OutBuffer which will do block reallocation in suitable increments.

Does this mean that ~= also performs a reallocation each time? That would be unacceptable...