Jump to page: 1 24  
Page
Thread overview
resizing arrays
Sep 06, 2002
Walter
Sep 07, 2002
anderson
Sep 07, 2002
Walter
Sep 07, 2002
Burton Radons
Sep 07, 2002
Walter
Sep 08, 2002
Sean L. Palmer
Sep 08, 2002
Walter
Sep 07, 2002
Pavel Minayev
Sep 09, 2002
Russell Lewis
Sep 09, 2002
Walter
Sep 10, 2002
Derek Parnell
Sep 10, 2002
Walter
Sep 10, 2002
Pavel Minayev
Sep 10, 2002
Walter
Sep 10, 2002
Sean L. Palmer
Sep 10, 2002
Walter
Sep 10, 2002
Pavel Minayev
Sep 10, 2002
Walter
Sep 11, 2002
Derek Parnell
Sep 13, 2002
Walter
Sep 09, 2002
Patrick Down
Sep 10, 2002
Walter
Sep 10, 2002
Sean L. Palmer
Sep 10, 2002
Walter
Sep 10, 2002
Pavel Minayev
Sep 10, 2002
Patrick Down
Sep 10, 2002
Walter
Sep 10, 2002
Patrick Down
Sep 11, 2002
Walter
Sep 11, 2002
Patrick Down
Sep 11, 2002
Pavel Minayev
September 06, 2002
Everyone has pointed out that the current implementation of array resizing is too slow to be useful - the always copy semantics just aren't good enough. Various remedies were proposed, most of which added bloat or slowed down loops. The reason for the copy semantics to begin with were to deal with the cases of resizing a slize out of a larger array.

So, perhaps this can be fixed by just a specification change - don't resize slices. While this may seem wrong, it was already the case. For example:

char[] a;
char[] b = a[0..3];

Now, let's resize b. Do a[0..3] and b[0..3] still point to the same memory? Maybe yes, maybe no, depending on the implementation.

If we write the rule as: If any slices are made of an array, and either the original array or any of the slices are resized, the resulting relationship of the resized array with the original layout is implementation defined.

What that means is, if you're going to resize an array and there are slices, make a copy of the array with .dup before resizing it, so that there are no slices on it.

(Note: since resizing does not free memory, there isn't a problem with dangling references to free'd memory. There is also no issue with a possible interior pointers only to an array, the gc is designed to regard interior pointers as valid references to an object.)


September 07, 2002
"Walter" <walter@digitalmars.com> wrote in message news:alb8qn$1pea$1@digitaldaemon.com...
> So, perhaps this can be fixed by just a specification change - don't
resize
> slices. While this may seem wrong, it was already the case. For example:
>
> char[] a;
> char[] b = a[0..3];

I suggested, something simular to this before, and I like the idea. One vote from me.


September 07, 2002
Walter wrote:

> Everyone has pointed out that the current implementation of array resizing
> is too slow to be useful - the always copy semantics just aren't good
> enough. Various remedies were proposed, most of which added bloat or slowed
> down loops. The reason for the copy semantics to begin with were to deal
> with the cases of resizing a slize out of a larger array.


I've done a mano a mano comparison on endemic access of the length property.  Using my code generator, there's a large amount of variability for both sides.  Converting the code to C with GCC, it's a simple "I don't give a damn".  I can only force a difference if I make the length volatile, which is a silly handycap.  Length can be optimised same as any other struct, only moreso.

You claim that length is dirtied too often to be viably sequestered away by the optimiser.  No it isn't.  Passing an array _pointer_ is rare, passing as an _out_ parameter is rare.  Passing an array itself is common, but if the array is resized in there then it obviously won't alter the array the loop is dealing with and the code generator doesn't need to deal with it.  It is just not common for the array being looped on to be dirtied; good optimisation material.

To put it another way, if your code generator pretends to optimise but creates significantly worse code using the ownership bit than without outside of fixed, unrealistic tests, it's broken.


> So, perhaps this can be fixed by just a specification change - don't resize
> slices. While this may seem wrong, it was already the case. For example:
> 
> char[] a;
> char[] b = a[0..3];
> 
> Now, let's resize b. Do a[0..3] and b[0..3] still point to the same memory?
> Maybe yes, maybe no, depending on the implementation.
> 
> If we write the rule as: If any slices are made of an array, and either the
> original array or any of the slices are resized, the resulting relationship
> of the resized array with the original layout is implementation defined.
> 
> What that means is, if you're going to resize an array and there are slices,
> make a copy of the array with .dup before resizing it, so that there are no
> slices on it.


I don't understand how this helps your situation.  What do you intend to do with the change?  For that matter, could you spell out what this does change?  The standard is fuzzy here.

September 07, 2002
Yes, I think it was you!

"anderson" <anderson@firestar.com.au> wrote in message news:albudv$tgr$1@digitaldaemon.com...
>
> "Walter" <walter@digitalmars.com> wrote in message news:alb8qn$1pea$1@digitaldaemon.com...
> > So, perhaps this can be fixed by just a specification change - don't
> resize
> > slices. While this may seem wrong, it was already the case. For example:
> >
> > char[] a;
> > char[] b = a[0..3];
>
> I suggested, something simular to this before, and I like the idea. One
vote
> from me.
>
>


September 07, 2002
"Burton Radons" <loth@users.sourceforge.net> wrote in message news:3D799369.5010305@users.sourceforge.net...
> You claim that length is dirtied too often to be viably sequestered away by the optimiser.

Not exactly, I said it was hard for the optimizer to prove that the length has not changed within the loop. Pulling out loop invariants is standard for advanced optimizers, but it isn't easy to write them.

> To put it another way, if your code generator pretends to optimise but creates significantly worse code using the ownership bit than without outside of fixed, unrealistic tests, it's broken.

Check this out. Here's some C code:
------------------------------------------
struct Array
{
    int length;
    char *ptr;
};

int loop(struct Array a)
{
    int i;
    int sum = 0;

    for (i = 0; i < (a.length & 0x7FFFFFFF); i++)
       sum += a.ptr[i];
    return sum;
}
-------------------------------------------
Compiled without optimization:

_loop:
  enter 8,0
  push EBX
  push ESI
  xor EAX,EAX
  mov -4[EBP],EAX
  mov -8[EBP],EAX
LE:  mov ECX,8[EBP]
  and ECX,07FFFFFFFh
  cmp ECX,-8[EBP]
  jle L2E
  mov EBX,0Ch[EBP]
  mov ESI,-8[EBP]
  movsx EDX,[ESI][EBX]
  add -4[EBP],EDX
  inc dword ptr -8[EBP]
  jmp short LE
L2E:  mov EAX,-4[EBP]
  pop ESI
  pop EBX
  leave
  ret
----------------------------------
Compiled with optimization:

_loop:
  push EAX
  mov EAX,8[ESP]
  xor ECX,ECX
  push EBX
  xor EDX,EDX
  and EAX,07FFFFFFFh
  push ESI
  cmp EAX,ECX
  push EDI
  jle L26
  mov EBX,EAX
L17:  mov EDI,018h[ESP]
  movsx ESI,[EDI][EDX]
  add ECX,ESI
  inc EDX
  cmp EBX,EDX
  jg L17
L26:  pop EDI
  mov EAX,ECX
  pop ESI
  pop EBX
  pop ECX
  ret
-----------------------------
So, you see, it does pull the mask out of the loop. The trouble is, it took me years to get all that to work right without bugs - and it took years for other C compiler vendors to catch up with me and implement the equivalent.

The next problem is if the dynamic array is actually a reference, such as if it's inside a class definition, it becomes almost impossible to prove the .length is loop invariant. Every assignment through a pointer and every function call must be assumed to invalidate all referenced data.


September 07, 2002
Walter wrote:

> What that means is, if you're going to resize an array and there are slices,
> make a copy of the array with .dup before resizing it, so that there are no
> slices on it.

My vote goes here!

September 08, 2002
Pavel Minayev wrote:

> Walter wrote:
>
> > What that means is, if you're going to resize an array and there are slices, make a copy of the array with .dup before resizing it, so that there are no slices on it.
>
> My vote goes here!

Perhaps arrays should have a "been sliced" bit, so that the .dup on resizing a sliced array could be implicit.  It would also be good if compile-time and run-time tests could detect and remark upon attempts to resize sliced arrays then go ahead and do an automatic .dup followed by the resize.

It seems to be the flip side of GC, where instead of reclaiming unused memory, D would allocate extra memory for you if you forget to do so manually.  Just as run-time array index checking can be disabled in D, so should the warning issued with an attempt to resize a sliced array.

For me, the main use would be to ensure I craft lean&mean code, since I intend to use D in embedded systems (where the GC would largely be idle).  However, if I have a bug in my system where I do attempt to resize a sliced array, I would still want the operation to succeed in the safest way possible, even if it means allocating extra memory.

If anything more sophisticated than a "been sliced" bit is needed, then the data shouldn't be in a "naked" array, but instead should be a properly designed object with an appropriate set of methods to completely control allocation, use, and reallocation.


-BobC


September 08, 2002
That's what I think sucks about C and C++;  the language assumes the worst. I'd rather it be more restrictive about what you can do and lean toward allowing the optimizer more flexibility.  Gently tell the programmer not to write wicked tricky crap.  ;)   Or if they persist, insist on them saying exactly what they're doing explicitly so the optimizer can tell exactly what's going on.

Sean

"Walter" <walter@digitalmars.com> wrote in message news:alcb1f$255r$1@digitaldaemon.com...
>
> "Burton Radons" <loth@users.sourceforge.net> wrote in message news:3D799369.5010305@users.sourceforge.net...
> > You claim that length is dirtied too often to be viably sequestered away by the optimiser.
>
> Not exactly, I said it was hard for the optimizer to prove that the length has not changed within the loop. Pulling out loop invariants is standard
for
> advanced optimizers, but it isn't easy to write them.
>
> > To put it another way, if your code generator pretends to optimise but creates significantly worse code using the ownership bit than without outside of fixed, unrealistic tests, it's broken.
>
> Check this out. Here's some C code:
> ------------------------------------------
> struct Array
> {
>     int length;
>     char *ptr;
> };
>
> int loop(struct Array a)
> {
>     int i;
>     int sum = 0;
>
>     for (i = 0; i < (a.length & 0x7FFFFFFF); i++)
>        sum += a.ptr[i];
>     return sum;
> }
> -------------------------------------------
> Compiled without optimization:
>
> _loop:
>   enter 8,0
>   push EBX
>   push ESI
>   xor EAX,EAX
>   mov -4[EBP],EAX
>   mov -8[EBP],EAX
> LE:  mov ECX,8[EBP]
>   and ECX,07FFFFFFFh
>   cmp ECX,-8[EBP]
>   jle L2E
>   mov EBX,0Ch[EBP]
>   mov ESI,-8[EBP]
>   movsx EDX,[ESI][EBX]
>   add -4[EBP],EDX
>   inc dword ptr -8[EBP]
>   jmp short LE
> L2E:  mov EAX,-4[EBP]
>   pop ESI
>   pop EBX
>   leave
>   ret
> ----------------------------------
> Compiled with optimization:
>
> _loop:
>   push EAX
>   mov EAX,8[ESP]
>   xor ECX,ECX
>   push EBX
>   xor EDX,EDX
>   and EAX,07FFFFFFFh
>   push ESI
>   cmp EAX,ECX
>   push EDI
>   jle L26
>   mov EBX,EAX
> L17:  mov EDI,018h[ESP]
>   movsx ESI,[EDI][EDX]
>   add ECX,ESI
>   inc EDX
>   cmp EBX,EDX
>   jg L17
> L26:  pop EDI
>   mov EAX,ECX
>   pop ESI
>   pop EBX
>   pop ECX
>   ret
> -----------------------------
> So, you see, it does pull the mask out of the loop. The trouble is, it
took
> me years to get all that to work right without bugs - and it took years
for
> other C compiler vendors to catch up with me and implement the equivalent.
>
> The next problem is if the dynamic array is actually a reference, such as
if
> it's inside a class definition, it becomes almost impossible to prove the .length is loop invariant. Every assignment through a pointer and every function call must be assumed to invalidate all referenced data.
>
>


September 08, 2002
"Sean L. Palmer" <seanpalmer@earthlink.net> wrote in message news:algcfk$2khe$1@digitaldaemon.com...
> That's what I think sucks about C and C++;  the language assumes the
worst.
> I'd rather it be more restrictive about what you can do and lean toward allowing the optimizer more flexibility.  Gently tell the programmer not
to
> write wicked tricky crap.  ;)   Or if they persist, insist on them saying exactly what they're doing explicitly so the optimizer can tell exactly what's going on.

Yeah, I've learned that through bitter experience trying to optimize C. You've always got to assume the worst. You can't even make use of const as being constant (one of the reasons I dropped const as a modifier in D, it looks good but is quite useless. The same goes for volatile.)


September 09, 2002
Walter wrote:
> Everyone has pointed out that the current implementation of array resizing
> is too slow to be useful - the always copy semantics just aren't good
> enough. Various remedies were proposed, most of which added bloat or slowed
> down loops. The reason for the copy semantics to begin with were to deal
> with the cases of resizing a slize out of a larger array.
> 
> So, perhaps this can be fixed by just a specification change - don't resize
> slices. While this may seem wrong, it was already the case. For example:
> 
> char[] a;
> char[] b = a[0..3];
> 
> Now, let's resize b. Do a[0..3] and b[0..3] still point to the same memory?
> Maybe yes, maybe no, depending on the implementation.
> 
> If we write the rule as: If any slices are made of an array, and either the
> original array or any of the slices are resized, the resulting relationship
> of the resized array with the original layout is implementation defined.
> 
> What that means is, if you're going to resize an array and there are slices,
> make a copy of the array with .dup before resizing it, so that there are no
> slices on it.
> 
> (Note: since resizing does not free memory, there isn't a problem with
> dangling references to free'd memory. There is also no issue with a possible
> interior pointers only to an array, the gc is designed to regard interior
> pointers as valid references to an object.)
> 
> 

I've pondered what the semantics of a "copy on write" array would be. COW is a virtual memory device where two virtual pages can point to the same physical page as long as neither is modified.  This is very common after fork()ing a process...both processes have identical page tables until one or the other writes.  The write fails (page fault) and the memory manager makes a second copy of the page in question, which is writable, and then re-enables the process.

It would be cool to have an array type such that you would be referencing the original array until you tried to modify it, either by resizing it or changing one of the elements.  I don't know what the syntax or usage would look like, though.

« First   ‹ Prev
1 2 3 4