Possible new COW/copy suggestions?

August 21, 2010
Posted by Era Scarecrow
   I was reading Andrei Alexandrescu's book on D when it occurred to me that there could be a couple of special-case copy methods for copy-on-write (COW) that work on arrays only. (On single variables COW does nothing special, since a change just replaces the variable's contents.) I also have a copying suggestion for structures.

  You _can_ live without these, but they would make certain tasks and cases a lot less repetitive and less error-prone.


  For COW on arrays, I'm using the toupper function from the D memory management page as a reference for how this would work and affect code: http://www.digitalmars.com/d/2.0/memory.html

--Strings (and Array) Copy-on-Write

char[] toupper(char[] s)
{
    int i;

    for (i = 0; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z')
	    s[i] = c - (cast(char)'a' - 'A');
    }
    return s;
}

  The later example Walter uses on that page would definitely work, but what if the compiler did most of the work for us, say by adding a keyword like cowref? Then the only visible change would be in the function signature.

char[] toupper(cowref char[] s)

  Internally the compiler would add a flag. Just before the function changes the array, it would check the flag and, if no copy has been made yet, duplicate the array. With this in mind, the parameter can be treated as const/in by calling functions and thought of as inout inside the function, which allows it to accept const/immutable data. This could be a permanent change in how arrays work, or maybe a subtype of array for these specific calls.

char[] toupper(cowref char[] s)
{
    bool __cow_s = true;
    int i;

    for (i = 0; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z') {
            if (__cow_s) {
                s = s.dup; /*make copy*/
                __cow_s = false;
            }
	    s[i] = c - (cast(char)'a' - 'A');
        }
    }
    return s;
}
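
  For comparison, the same copy-on-first-write behaviour can be written by hand without a new keyword. This is only a minimal sketch: the name toUpperCow is hypothetical, and a null slice stands in for the hidden flag the compiler would generate.

```d
// Sketch only: `toUpperCow` is a hypothetical name, not a library function.
// The input is const, so both mutable and immutable strings are accepted.
const(char)[] toUpperCow(const(char)[] s)
{
    char[] copy = null; // stays null until the first change is needed

    foreach (i, c; s)
    {
        if ('a' <= c && c <= 'z')
        {
            if (copy is null)
                copy = s.dup; // duplicate once, just before the first write
            copy[i] = cast(char)(c - ('a' - 'A'));
        }
    }

    // No lowercase letters found: hand back the original, no allocation.
    return copy is null ? s : copy;
}
```

  When the input has nothing to change, the original slice is returned untouched; otherwise the caller gets a fresh array and the input stays as it was.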

 As an optimization when only one cowref is involved, the compiler could emit two copies of the loop with an additional label/goto: the first time the code would modify the array, it makes the copy and then branches into the second loop, so the check isn't done on every pass. Ex:


char[] toupper(cowref char[] s)
{
    int i;

    for (i = 0; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z') {
            /*everything in this scope except the array copy is removed;
              the copy is made here, right before the jump. */
            goto __cowref_jump;
        }
    }
    return s;

    /*only copies code it can possibly return to, in a loop or goto jumps*/
    for (; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z') {
/*continue point at start of scope*/
__cowref_jump:
	    s[i] = c - (cast(char)'a' - 'A');
        }
    }
    return s;
}

  My second thought is for when you want to keep referring to the original array but copy forward only the specific elements that change (rather than the whole array). This would be especially useful when referencing sectors of 512 bytes or more as individual blocks. Perhaps a keyword like cowarray could be used. The array would work normally, with only a couple of extra lookups. This would also accept const/immutable data.

char[] toupper(cowarray char[] s)
{
//if the compiler knows the array is being returned, it might precopy the original.
//but if it does that, the bool change array is probably unneeded unless
//you need to know whether specific parts of the array were changed, which
//means it may just become a cowref instead of a cowarray.
//bool array still needed for multi-dimensional arrays.
    bool[] __cowarr_change_s = new bool[s.length];
    char[] __cowarr_arr_s = new char[s.length];

    int i;

    for (i = 0; i < s.length; i++)
    {
//if changed, use the changed value
//If the compiler sees it will never go over this again, it may
//skip this check and just read.
	char c = __cowarr_change_s[i] ? __cowarr_arr_s[i] : s[i];
//precopy
//	char c = __cowarr_arr_s[i];

	if ('a' <= c && c <= 'z') {
//change and ensure it's changed on the flag.
	    __cowarr_arr_s[i] = c - (cast(char)'a' - 'A');
            __cowarr_change_s[i] = true;
        }
    }

   /*when copying out or to another array or duplicating, the current view
     is used without the cow part active.*/
    return s;
}
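
  The per-element bookkeeping above can also be approximated as a library type. What follows is a rough sketch under assumptions: CowArray, changed, toArray and toUpperView are hypothetical names, and T is assumed to be a plain value type such as char or ubyte.

```d
// Hypothetical library stand-in for the proposed `cowarray` parameter.
struct CowArray(T)
{
    private const(T)[] original; // never written to
    private T[] overlay;         // holds the replacement values
    private bool[] changedFlags; // true where overlay[i] replaces original[i]

    this(const(T)[] source)
    {
        original = source;
        overlay = new T[source.length];
        changedFlags = new bool[source.length];
    }

    size_t length() const { return original.length; }

    // Read: prefer the changed value if there is one.
    T opIndex(size_t i) const
    {
        return changedFlags[i] ? overlay[i] : original[i];
    }

    // Write: store the value in the overlay and mark the element.
    void opIndexAssign(T value, size_t i)
    {
        overlay[i] = value;
        changedFlags[i] = true;
    }

    // Plays the role of the suggested s[i].changed query.
    bool changed(size_t i) const { return changedFlags[i]; }

    // Flatten the current view into an ordinary array.
    T[] toArray() const
    {
        auto result = new T[original.length];
        foreach (i; 0 .. original.length)
            result[i] = this[i];
        return result;
    }
}

// toupper written against the view instead of the raw array.
char[] toUpperView(const(char)[] s)
{
    auto view = CowArray!char(s);
    foreach (i; 0 .. view.length)
    {
        immutable c = view[i];
        if ('a' <= c && c <= 'z')
            view[i] = cast(char)(c - ('a' - 'A'));
    }
    return view.toArray();
}
```

  This version always allocates the overlay and the flag array up front, which matches the lowering above but costs memory proportional to the array length; a compiler-supported version could presumably be smarter about that.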

  If you needed to know whether a given block changed, perhaps .changed could be used, and the compiler would return true/false.
  if(s[i].changed) { /*code*/ }
//becomes
  if(__cowarr_change_s[i]) { /*code*/ }

  Finally, the last suggestion involves structure copying. Copying a structure is a bitwise copy; however, when a structure holds references to arrays/structures/classes, you may want to duplicate them rather than refer to the originals.

//Book example, pg 246
struct Widget {
   private int[] array;
   this(uint length) {
      array = new int[length];
   }
   // Postblit constructor
   this(this) {
      array = array.dup;
   }
   /*other code*/
}
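
  To make the difference concrete, here is a small sketch contrasting the default bitwise copy with a postblit copy. Shallow and Deep are hypothetical names used only for this illustration.

```d
// Default copy: only the slice (pointer + length) is copied,
// so both structs end up sharing the same underlying array.
struct Shallow
{
    int[] array;
}

// Postblit copy: runs after the bitwise copy and gives the new
// struct its own duplicate of the array.
struct Deep
{
    int[] array;

    this(this)
    {
        array = array.dup;
    }
}
```

  With Shallow, writing through a copy's array is visible in the original; with Deep, the original is unaffected.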

  Perhaps a keyword like oncopy (copy function defaults to dup) or onstructcopy (same idea) could be used. The compiler would gather all the oncopy members and build a default this(this) from them. If you need anything more complicated or extra during the copy, your own definition of this(this) would execute after the compiler-built one (appended to the compiler-generated code). Ex:

struct Widget {
//   private oncopy(dup) int[] array;
//       Name of function (is/could be) optional if the function dup
//       is used to create a copy. might be used as oncopy!(dup)
   private oncopy int[] array;

     this(this) {
         //compiler generated oncopy's
            array = array.dup; //dup is the copy name, which could be clone or something else.

         // User definition (if any) Appended here.
     }

   /*other code*/
}

  Naturally, immutable data doesn't need to be copied since it doesn't change. If it does need to change during the copy, the user would likely end up handling it manually, so using oncopy on immutable data would cause an error.

 Comments and suggestions? I'd like to hear Walter's feedback and opinions on these.

 Era