September 23, 2003
"Helmut Leitner" <helmut.leitner@chello.at> ha scritto nel messaggio news:3F6BF6C5.AB02DBDD@chello.at...

> Therefore, please consider the suggestion to add a property
>    .reserve
> to the array type in this way:
>    -  reserve can be read and set like length
>    -  reallocations happen according to max(reserve,length)
> advantage:
>    -  if length changes below or equal reserve,
>       no reallocations need to happen.

You have my vote! BTW, what if length grows above reserve? Should there also be a property to specify subsequent allocation "steps"? Say, reserve 100 elements at first, then grow by 10 at a time?

Ric


September 23, 2003
"Walter" <walter@digitalmars.com> wrote in message news:bko2u7$is2$1@digitaldaemon.com...
> Actually, the implementation of D arrays does have a bit of a 'reserve', since the garbage collector allocates by using power of 2 buckets. This is an implementation detail, though. It still makes for much more efficient code to set the .length to some reasonably expected reserve value, then
fill
> up the array, then reset the .length to the final size.
>

What about this:

char[] s = new char[100];
s = s[0 .. 5];

would .length changes create a new buffer or use that 100 up first?



September 23, 2003
"J C Calvarese" <jcc7@cox.net> wrote in message news:bko3oe$l8q$1@digitaldaemon.com...
> Walter wrote:
> > Actually, the implementation of D arrays does have a bit of a 'reserve', since the garbage collector allocates by using power of 2 buckets. This
is
> > an implementation detail, though. It still makes for much more efficient code to set the .length to some reasonably expected reserve value, then
fill
> > up the array, then reset the .length to the final size.
> But how do you keep up with the currently-filled length using this method?  I'm guessing you have an extra variable whose use may not be obvious.  I'd rather use .length for the current length and use .reserve to set a reasonable expected length.  I think Helmut's idea makes a lot of sense.  Would it be difficult to implement?

After setting .length to the max reasonable size, then set .length back to zero. You now have a reserve that is something like the max reasonable size rounded up to the next bucket size.


September 23, 2003
"Helmut Leitner" <leitner@hls.via.at> wrote in message news:3F700109.7EDDD804@hls.via.at...
> Walter wrote:
> > Actually, the implementation of D arrays does have a bit of a 'reserve', since the garbage collector allocates by using power of 2 buckets.
> This will help little. Lets assume, someone decides to read a file line by line (or read it in one sweep but ends with a line in a string instead of a slice). The line will be reallocated hundredes of times.

No. For each line, maybe 2 to 4 times.

>   char line[];
>   line.reserve=1024;
> would set aside enough space so that reallocations would happen almost
never.

You can do the same thing with:
    char line[];
    line.length = 1024;
    line.length = 0;

Now fill the array.

> How does your dmd compiler avoid reallocation of the "current source
buffer"?

The compiler doesn't. The garbage collector runtime does. It does it by figuring out by the address which bucket it is in, and how much room is left in the bucket.

> This reserve thing is not theoretical. After 20 years of C-programming
I've
> based all my C dynamic array handling on a "generic" Buffer structure,
that
> looks like (translated to fit):

You're right. I use something almost exactly the same in my C programming. But with the garbage collector, I don't need to carry around the extra 'reserve' size, as it is implicit in the address of the buffer.

> An example of this conflict of interest is the wc official sample program,
> which will perform  but not scale, because
>         if (inword)
>         {   char[] word = input[wstart .. input.length];
>             dictionary[word]++;
>         }
> will keep the basic input buffer (the whole file) in memory. So you would
> be surprised about this code, if you would extend it to count the symbols
> in a larger set of files. It will eat all your memory. And it would be a
> good "test for experienced D programming capability" to change this to a
> performing and scaling example.

To process hundreds of megabytes of source with wc, just change the line to:
    char[] word = input[wstart .. input.length].dup;

> I also want to restate, that "basic IO" is still missing in D.
>
> This means, you can't write a tutorial that makes sense, because no sensible IO is available (you have to go back to printf and its friends)
that
> you want to show anyone new to the language.
>
> Why is this so? Because "char []" can't be used for IO in a performing
way!

I think the reason is more my sloth than that!

> So this decision and implementation has been pushed and pushed ....
> And you can't work with "char buffer[N]" because
>   - you can never be sure about the buffer size
>   - you will need conversions all the time
> The obvious solution, the final, powerful and performing String class
> is a pure vision. I suppose it will never come.
>
> .reserve would be a pragmatic solution to this.


September 23, 2003
"Vathix" <vathix@dprogramming.com> wrote in message news:bkp8j2$2pmi$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:bko2u7$is2$1@digitaldaemon.com...
> > Actually, the implementation of D arrays does have a bit of a 'reserve', since the garbage collector allocates by using power of 2 buckets. This
is
> > an implementation detail, though. It still makes for much more efficient code to set the .length to some reasonably expected reserve value, then
> fill
> > up the array, then reset the .length to the final size.
> What about this:
>
> char[] s = new char[100];
> s = s[0 .. 5];
>
> would .length changes create a new buffer or use that 100 up first?

The latter. It would not reallocate it.


September 23, 2003
Walter wrote:
> After setting .length to the max reasonable size, then set .length back to
> zero. You now have a reserve that is something like the max reasonable size
> rounded up to the next bucket size.

Hmmm. Does that mean that the amount of memory used up by arrays can never decrease?

I think you should at least leave the door open for later optimization of the array memory handling so that some of the memory is freed if the used length is below a certain threshold.

That would mean, though, that the kind of reserving memory that you proposed may not work in the future. Increasing the length and then decreasing it again also looks a bit like a dirty hack - the semantics depend on the internal compiler architecture. Nothing I would want to use and certainly nothing that is easy to understand for an uninitiated programmer reading someone else's code.

Is there a specific reason why you don't want to expose the reserve field? It looks like a much cleaner solution to me.

Hauke




September 23, 2003
In article <bkq1lf$r5v$1@digitaldaemon.com>, Hauke Duden wrote:
> Walter wrote:
>> After setting .length to the max reasonable size, then set .length back to zero. You now have a reserve that is something like the max reasonable size rounded up to the next bucket size.
> 
> That would mean, though, that the kind of reserving memory that you proposed may not work in the future. Increasing the length and then decreasing it again also looks a bit like a dirty hack - the semantics depend on the internal compiler architecture.

> Is there a specific reason why you don't want to expose the reserve field? It looks like a much cleaner solution to me.

reserve(int i)
{
    int old_length = length;
    length = i;
    length = old_length;
}

Voilà! A self-documenting, reusable solution.

-Antti
September 23, 2003

Walter wrote:
> > An example of this conflict of interest is the wc official sample program,
> > which will perform  but not scale, because
> >         if (inword)
> >         {   char[] word = input[wstart .. input.length];
> >             dictionary[word]++;
> >         }
> > will keep the basic input buffer (the whole file) in memory. So you would
> > be surprised about this code, if you would extend it to count the symbols
> > in a larger set of files. It will eat all your memory. And it would be a
> > good "test for experienced D programming capability" to change this to a
> > performing and scaling example.
> 
> To process hundreds of megabytes of source with wc, just change the line to:
>     char[] word = input[wstart .. input.length].dup;

It may not be that simple, because I would hope that

      char[] word = input[wstart .. input.length];
      if(dictionary[word]!=NULL) {
          dictionary[word]++;
      } else {
          dictionary[word.dup]++;
      }

performs better, because only those words are created as objects that are actually needed.

But it may also be, that

      char[] word = input[wstart .. input.length];
      int *p=dictionary[word];
      if(p) {
          dictionary[word]++;
      } else {
          dictionary[word.dup]++;
      }

performs even better, because it avoids an hash access that we can't be sure about whether the compiler optimizes it away or not.

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com
September 23, 2003
Sorry for:

Helmut Leitner wrote:
> But it may also be, that
> 
>       char[] word = input[wstart .. input.length];
>       int *p=dictionary[word];
>       if(p) {
>           dictionary[word]++;

This should read:

            *p++;

>       } else {
>           dictionary[word.dup]++;
>       }
> 
> performs even better, because it avoids an hash access that we can't be sure about whether the compiler optimizes it away or not.


-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com
September 23, 2003
> You can do the same thing with:
>     char line[];
>     line.length = 1024;
>     line.length = 0;
>
> Now fill the array.
>

A question: Does the GC respects any number of buckets allocated like this?

char line1[];
line1.length = 4096;
line1.length = 0;

char line2[];
line2.length = 4096;
line2.length = 0;

I mean, if I reserve more than 1 bucket (of 1024) will still be available after allocating another array?