June 19, 2002
Sean L. Palmer wrote:
> Those are our 3 choices?

As far as I know.  Array slices that copy aren't a real option, as:

   b [0 .. n] = c;

will no longer work as expected... it'll make a copy of that part of the array and then set the copy to a single value, leaving b itself untouched, so it fails many assumptions of the standard.  But it would allow realloc to be used.

Uh, the high bit of length could be used to indicate that the array doesn't own its memory - that it's static data, a slice, or was assigned from elsewhere.  When resizing to a larger size, it's reallocated if the bit is set or if the length will cross a roundup boundary.  That's a fourth choice.

anderson described a fifth (and the growth factor would only need five bits of the length field even for a four-gig length, so arrays of up to 134217728 elements could still be represented).  I personally think exponential reallocation wastes too much space - on average it'll make the array 50% too large - and it only works with one overallocation method.

> I'll take the maximum field.  Useful for doing a sort of "reserve"
> operation.  After that, ~= and array.length++ aren't nearly so much of a
> problem.  The only problem is that dynamic arrays then need 4 more bytes of
> storage.  Well ram is cheap, and getting cheaper.  ;)

It'll be much cheaper for equivalent code as there'll be fewer temporary copies running around dancing in tights.

printf ("%.*s", (char []) "foo"); won't work any more either.  That idiom should be avoided until fmt is in and working; at that point printf should probably be removed from Phobos until all code can be fixed up.
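For context, the idiom only works because a D array happens to be passed to varargs as a length/pointer pair that %.* lines up with; an extra capacity field would change that layout. Passing the pieces explicitly doesn't depend on the layout at all - a sketch, in current syntax:

import core.stdc.stdio : printf;

void main()
{
    char[] s = "foo".dup;
    // Explicit length and pointer: immune to how the array struct grows.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}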

Here's the roundup function I posted earlier with comments:

/* Round up:
 * If n <         8, to a multiple of        1. (My addition)
 * If n <       256, to a multiple of        8.
 * If n <      2048, to a multiple of       64.
 * If n <     16384, to a multiple of      512.
 * If n <    131072, to a multiple of     4096.
 * If n <   1048576, to a multiple of    32768.
 * If n <   8388608, to a multiple of   262144.
 * If n <  67108864, to a multiple of  2097152.
 * If n < 536870912, to a multiple of 16777216.
 * ...
 * If n < 2**(5+3*i), to a multiple of 2**(3*i).
 *
 * This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc() (which is a reality, e.g., across all flavors
 * of Windows, with Win9x behavior being particularly bad -- and
 * we've still got address space fragmentation problems on Win9x
 * even with this scheme, although it requires much longer lists to
 * provoke them than it used to).
 * (From Python (http://www.python.org) source code)
 */

int roundupsize(int n)
{
    if (n < 8)
        return n;

    uint nbits = 0;
    uint n2 = cast(uint) n >> 5;

    // Each loop step moves n up one size band (a factor of 8) and
    // coarsens the rounding granularity to match the table above.
    do {
        n2 >>= 3;
        nbits += 3;
    } while (n2);
    return ((n >> nbits) + 1) << nbits;
}
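For what it's worth, here's a sketch of how an append could drive it - illustrative only, not the runtime's actual code, and a real implementation would go through the GC rather than C's realloc:

import core.stdc.stdlib : realloc;

// Grow-on-append using roundupsize: reallocate only when the
// rounded-up size changes, i.e. when a boundary is crossed.
T[] appendOne(T)(T[] arr, T value)
{
    int len = cast(int) arr.length;
    if (roundupsize(len + 1) != roundupsize(len))
    {
        auto p = cast(T*) realloc(arr.ptr, roundupsize(len + 1) * T.sizeof);
        arr = p[0 .. len];
    }
    arr = arr.ptr[0 .. len + 1];
    arr[len] = value;
    return arr;
}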

June 19, 2002
"Matthew Wilson" <matthew@thedjournal.com> wrote in message news:aeorj3$lh8$1@digitaldaemon.com...
> All sounds a bit inefficient from a performance point of view. :/

I'd agree with that, but as a programmer I always like to look at the alternatives.

It's a small performance hit; there's often a toss-up between efficiency and memory. Note that I was able to do the aforementioned in C with a few ORs, a shift and an add, though that may well be optimised away on today's processors. One performance gain is that it would mean less memory for the CPU to process. It's often faster to uncompress something in the CPU/RAM than to load twice the size from RAM/hard drive, but that's another story.

I'd probably go for the extra 32 bits, because it'll make little difference compared to the magnitude of the arrays, and compared to the amount of memory you're already wasting to make things more efficient anyway. For example, with a max of 10 32-bit values and only 8 used, what's an extra 32 bits? Nothing. And the aim is performance anyway.



June 19, 2002
"Burton Radons" <loth@users.sourceforge.net> wrote in message news:3D100B21.1040604@users.sourceforge.net...
> Sean L. Palmer wrote:
> > Those are our 3 choices?
>

> anderson described a fifth (and the growth factor would only need five bits of the length field even for a four-gig length, so arrays of up to 134217728 elements could still be represented).  I personally think exponential reallocation wastes too much space - on average it'll make the array 50% too large - and it only works with one overallocation method.

I also think 50% is too large. It just needs to be tweaked until a good set of values falls out.  The extra value (offset) only needs to hold where the cutoff point is, so it can simply be added. If the value is exceeded, then it's simply wrapped around (offset - 1). I don't think this is coming out very clearly. Or perhaps it is?

Alternatively, you can get away with no max number at all. Simply look at the length and decide from that whether it needs to be increased. If the growth is linear this can be extremely easy.

Pseudocode:

length++;

// The actual array size is length rounded up to a multiple of Increment
actualSize = length + (Increment - length mod Increment) mod Increment;

// actualSize only changes when length crosses a boundary, so realloc then
if ((length mod Increment) == 1)
    realloc(actualSize);

But divides can be expensive... though there are plenty of ways around them.
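One of those ways, as a sketch (the name and constant are mine): if Increment is a power of two, the mod collapses to a bitmask and no divide is ever issued.

enum size_t increment = 8;   // must be a power of two for the mask trick

// Round length up to the next multiple of increment without dividing.
size_t actualSize(size_t length)
{
    return (length + increment - 1) & ~(increment - 1);
}

unittest
{
    assert(actualSize(8) == 8);
    assert(actualSize(9) == 16);
}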


June 19, 2002
On Wed, 19 Jun 2002 09:02:43 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:

> Seems like if we give the arrays a notion of capacity, all these sophistications (including the previously unliked ++) could be accommodated efficiently.
> 
> Any reasons against?

None! An additional int field to store the "actual" size of the array is worth it. STL uses such an approach, after all, and it has proved its efficiency already; why not use it in D as well? Walter?
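As a sketch of what the extra field buys (layout and names are illustrative only, not D's actual ABI), a reserve-style call could grow storage once, after which ~= and length++ stop reallocating:

import core.stdc.stdlib : realloc;

// Illustrative layout - not how D really represents arrays.
struct DynArray(T)
{
    T*     ptr;
    size_t length;    // elements in use
    size_t capacity;  // elements allocated (the "actual" size)

    // Grow storage without touching length, so later appends are cheap.
    void reserve(size_t n)
    {
        if (n > capacity)
        {
            ptr = cast(T*) realloc(ptr, n * T.sizeof);
            capacity = n;
        }
    }
}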
June 19, 2002
"anderson" <anderson@firestar.com.au> wrote in message news:aened5$27vo$1@digitaldaemon.com...
>
> *sigh*
>
> But of course I was discussing the exceptions to this rule, ie when you are collecting or grouping data from an array and you have no idea what size the array needs to be expanded to. If you increased it by n every time, it could be inefficient if n = 10000 and the resize only needed to be 10. And vice versa.
>
> Sometimes things like this can be solved in two array passes, but when the results directly relate to the movement of data in the arrays it is impossible to use that method.

So use any other algorithm of your choice. I often double the array when exhausted; it's fast and easy.
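In user code that pattern is only a few lines - a sketch, with names of my own choosing:

int[] buf;
size_t used;

// Double-when-exhausted growth: .length is touched (and a
// reallocation risked) only when the buffer is actually full.
void push(int x)
{
    if (used == buf.length)
        buf.length = buf.length ? buf.length * 2 : 16;
    buf[used++] = x;
}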

Ciao


June 19, 2002
There are always reasons against, but probably none that outweigh the advantages this would bring.

Disadvantages

1) Additional overhead (speed/ram) - code that enlarges memory outside the loop will also have to check capacity, and possibly resize to a larger-than-needed array. In these cases the slight extra time needed would be barely noticeable. Although probably not worth the effort, if capacity uses an entire 32-bit (or 16-bit offset) representation (ie not exponentially based), then perhaps the compiler could treat

//any array.length =
array.length = array.length + 10

as it currently does, and set array.capacity = array.length

//and any of these things increase capacity offsets by a factor
++
+=
-=
--

Also it'd be nice to have access to .capacity (read and write). If capacity < length, then length would be trimmed.

2) Extra work for Walter.


PS - Has anyone thought of having bit flags with arrays? Yes, I know this is extra storage; perhaps it could be combined with the capacity's 16-bit offset to make a 32-bit pack (16-bit capacity offset and 16 bits of array flags). The flags could be used for dirty bits and things like reversed arrays (though that would have problems with length changes).

Of course this type of stuff could be done by the programmer, and it's just a dumb thought I had.


"Pavel Minayev" <evilone@omen.ru> wrote in message news:CFN37426561534456@news.digitalmars.com...
> On Wed, 19 Jun 2002 09:02:43 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:
>
> > Seems like if we give the arrays a notion of capacity, all these sophistications (including the previously unliked ++) could be accommodated efficiently.
> >
> > Any reasons against?
>
> None! An additional int field to store the "actual" size of the array is worth it. STL uses such an approach, after all, and it has proved its efficiency already; why not use it in D as well? Walter?


June 20, 2002
Pavel Minayev <evilone@omen.ru> wrote in news:CFN37426561534456@news.digitalmars.com:

> On Wed, 19 Jun 2002 09:02:43 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:
> 
> >> Seems like if we give the arrays a notion of capacity, all these sophistications (including the previously unliked ++) could be accommodated efficiently.
>> 
>> Any reasons against?
> 
> None! An additional int field to store the "actual" size of the array is worth it. STL uses such an approach, after all, and it has proved its efficiency already; why not use it in D as well? Walter?

I agree.

Another idea might be to store the capacity as part of the memory block header and leave the array as is.

[-1] = n
[0] <- array(ptr,length)
[1]
[2]
[3]
[...]
[length]
[...]
[n]

Unfortunately this doesn't work for slices.  Perhaps
the high order bit of length could be used as
a flag to indicate an array is a slice.
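A sketch of that layout, hand-rolled with malloc for illustration (D's real allocator blocks differ): the capacity n lives one word before element [0], so it travels with the block rather than with the array reference.

import core.stdc.stdlib : malloc;

// Allocate `capacity` elements plus one hidden size_t header in front.
T* allocWithHeader(T)(size_t capacity)
{
    auto block = cast(size_t*) malloc(size_t.sizeof + capacity * T.sizeof);
    block[0] = capacity;         // the [-1] = n slot in the diagram
    return cast(T*)(block + 1);  // the array's ptr points at element [0]
}

// Read the capacity back from the word just before the data.
size_t capacityOf(T)(T* ptr)
{
    return (cast(size_t*) ptr)[-1];
}

The slice problem falls straight out of this: a slice's ptr points into the middle of the block, where ptr[-1] is just another element.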

June 20, 2002
I would do the "proxy" approach to slices.  Make slices a distinct type that has extra information, and *knows* it's a slice of another array.  If used in a situation where the length can change, it copies itself into a new distinct array and is no longer a slice.  If you don't change the length it's very fast.

So arrays and array slice proxies have the same signature up to a point, and yes, perhaps the length field is the best place to store the flag, since the inability to change length (or "copy if length changed") is the main difference between a slice and a regular array.
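A sketch of that flag-in-length idea (names and detach policy are mine, one possibility among several):

// Illustrative proxy: the top bit of length marks "slice of another
// array"; the first length change detaches it into its own storage.
struct ArrayProxy(T)
{
    T*     ptr;
    size_t length;   // high bit set => slice

    enum size_t SLICE_BIT = size_t(1) << (size_t.sizeof * 8 - 1);

    void setLength(size_t n)
    {
        if (length & SLICE_BIT)
        {
            size_t len  = length & ~SLICE_BIT;
            size_t keep = len < n ? len : n;
            auto fresh  = new T[n];          // copy out: no longer a slice
            fresh[0 .. keep] = ptr[0 .. keep];
            ptr = fresh.ptr;
        }
        // (A non-slice would grow or shrink in place here.)
        length = n;
    }
}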

Sean

"Patrick Down" <pat@codemoon.com> wrote in message news:Xns92335878DD4B0patcodemooncom@63.105.9.61...
> Pavel Minayev <evilone@omen.ru> wrote in news:CFN37426561534456@news.digitalmars.com:
>
> > On Wed, 19 Jun 2002 09:02:43 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:
> >
> > >> Seems like if we give the arrays a notion of capacity, all these sophistications (including the previously unliked ++) could be accommodated efficiently.
> >>
> >> Any reasons against?
> >
> > > None! An additional int field to store the "actual" size of the array is worth it. STL uses such an approach, after all, and it has proved its efficiency already; why not use it in D as well? Walter?
>
> I agree.
>
> Another idea might be to store the capacity
> as part of the memory block header and leave
> the array as is.
>
> [-1] = n
> [0] <- array(ptr,length)
> [1]
> [2]
> [3]
> [...]
> [length]
> [...]
> [n]
>
> Unfortunately this doesn't work for slices.  Perhaps
> the high order bit of length could be used as
> a flag to indicate an array is a slice.



June 20, 2002
Sean L. Palmer wrote:
> Why doesn't D specify what the behavior is for ++ and --, exactly?  Why
> leave this open to interpretation in the implementation?  That just creates
> subtle portability bugs.

C and D also don't define the order of subexpression evaluation in
function arguments or what order the functions in "a () + b () * c ()"
will be evaluated in.  The reasons are:

- These orders depend upon the style and behaviour of your parser.

- It depends upon what order the function arguments are built in.

- The machine code may be way more efficient in doing one order than in
doing the other, not just globally but locally to this one expression.

- Optimisation often wants to change the order, and can do much more in D than in C.  The C rules were very confusing anyway.

- Any hard decision on subexpression evaluation order would be as
arbitrary as the next.

- The only people it affects are those who should have been writing more careful code anyway.

The comp.lang.c freaks are the ones to go to for the full set of issues, but they haven't changed from C to D: if something must portably happen before or after something else, it must go in a separate statement or expression.
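A quick illustration of the point (modern syntax): whether f, g or h runs first is the implementation's business, so nothing portable may depend on it.

import core.stdc.stdio : printf;

int f() { printf("f "); return 1; }
int g() { printf("g "); return 2; }
int h() { printf("h "); return 3; }

void main()
{
    // Only the value, 7, is guaranteed; under the rules described
    // above, the order the three calls print in is not.
    int x = f() + g() * h();
    printf("= %d\n", x);
}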

June 20, 2002
Yes!

"Sean L. Palmer" <seanpalmer@earthlink.net> wrote in message news:aet11q$1veg$1@digitaldaemon.com...
> I would do the "proxy" approach to slices.  Make slices a distinct type that has extra information, and *knows* it's a slice of another array.  If used in a situation where the length can change, it copies itself into a new distinct array and is no longer a slice.  If you don't change the length it's very fast.
>
> So arrays and array slice proxies have the same signature up to a point, and yes, perhaps the length field is the best place to store the flag, since the inability to change length (or "copy if length changed") is the main difference between a slice and a regular array.
>
> Sean
>
> "Patrick Down" <pat@codemoon.com> wrote in message news:Xns92335878DD4B0patcodemooncom@63.105.9.61...
> > Pavel Minayev <evilone@omen.ru> wrote in news:CFN37426561534456@news.digitalmars.com:
> >
> > > On Wed, 19 Jun 2002 09:02:43 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:
> > >
> > >> Seems like if we give the arrays a notion of capacity, all these sophistications (including the previously unliked ++) could be accommodated efficiently.
> > >>
> > >> Any reasons against?
> > >
> > > None! An additional int field to store the "actual" size of the array is worth it. STL uses such an approach, after all, and it has proved its efficiency already; why not use it in D as well? Walter?
> >
> > I agree.
> >
> > Another idea might be to store the capacity
> > as part of the memory block header and leave
> > the array as is.
> >
> > [-1] = n
> > [0] <- array(ptr,length)
> > [1]
> > [2]
> > [3]
> > [...]
> > [length]
> > [...]
> > [n]
> >
> > Unfortunately this doesn't work for slices.  Perhaps
> > the high order bit of length could be used as
> > a flag to indicate an array is a slice.
>
>
>