December 11, 2012
On 12/10/2012 10:40 AM, Dan wrote:
> For something as simple as this would you introduce COW?

Yes.
December 11, 2012
On 12/10/2012 3:41 PM, Jonathan M Davis wrote:
> Yes. I've done it. And I don't think that I've ever bothered with COW. The
> extra complication isn't generally worth it as far as I'm concerned.

I don't believe it is any more complex than writing a postblit is.

December 11, 2012
On Tuesday, 11 December 2012 at 01:47:38 UTC, Walter Bright wrote:
> On 12/10/2012 10:40 AM, Dan wrote:
>> For something as simple as this would you introduce COW?
>
> Yes.

Any D examples showing the pattern in action with structs? I did not see anything in the index of TDPL on COW except one reference regarding strings. If it is a better approach I would like to know more about how it is done and the trade-offs. Some of the responses refer to having to check on every access, so is that not tough to manage?

Thanks
Dan
December 11, 2012
On Tuesday, 11 December 2012 at 03:09:53 UTC, Dan wrote:
> On Tuesday, 11 December 2012 at 01:47:38 UTC, Walter Bright wrote:
>> On 12/10/2012 10:40 AM, Dan wrote:
>>> For something as simple as this would you introduce COW?
>>
>> Yes.
>
> Any D examples showing the pattern in action with structs? I did not see anything in the index of TDPL on COW except one reference regarding strings. If it is a better approach I would like to know more about how it is done and the trade-offs. Some of the responses refer to having to check on every access, so is that not tough to manage?
>
> Thanks
> Dan

Here is a example of using COW for a simple object that wraps
nothing more than an int:

//----
struct S
{
     static struct Payload
     {
         size_t count;
         int val;
     }
     Payload* payload;

     this(int i = 0)
     {payload = new Payload(1, i);}

     this(this)
     {++payload.count;}
     ~this()
     {--payload.count;}

     void opAssign(S other)
     {
         --payload.count;
         payload = other.payload;
         ++payload.count;
     }

     int get()
     {return payload.val;}

     void set(int i)
     {
         dupeIfNeeded();
         payload.val = i;
     }
     void dupeIfNeeded()
     {
         if (payload.count > 1)
         {
             writeln("payload duplication");
             --payload.count;
             payload = new Payload(1);
         }
     }
}

void main()
{
     S s1 = 5;
     S s2 = s1;
     writeln("before set");
     s2.set(4);
     writeln("after set");
}
//----
The basic idea is that on every write, you check the payload
count, and if it is not 1, then you duplicate the payload.

This means you only pay for the copy when you *actually* need it.

Drawbaks from this approach include:
1. Needs actual code.
2. NOT actually cheap: The counter goes up, and down, on each and
every copy/destruction/pass: Requires a destructor and a postblit
for RAII.
3 Implements deep ownership (*)

*Imo, this is the worst part of COW for a language like D. In
theory, you could just allocate your payload, and let the GC and
default copy take care of everything else, and move on with your
life. COW is (IMO) a c++ relic. Not only is this approach easy,
but it is *also* efficient. If you *do* need actual object
duplication, then you can implement "dup".

//---------------------------------------
BTW: There is a subtle bug in this code. Can you spot it ;) ?
December 11, 2012
On Tuesday, 11 December 2012 at 07:19:07 UTC, monarch_dodra wrote:
> Here is a example of using COW for a simple object that wraps
> nothing more than an int:
>
[snip]
> This means you only pay for the copy when you *actually* need it.
>
> Drawbaks from this approach include:
> 1. Needs actual code.
> 2. NOT actually cheap: The counter goes up, and down, on each and
> every copy/destruction/pass: Requires a destructor and a postblit
> for RAII.
> 3 Implements deep ownership (*)
>
> *Imo, this is the worst part of COW for a language like D. In
> theory, you could just allocate your payload, and let the GC and
> default copy take care of everything else, and move on with your
> life. COW is (IMO) a c++ relic. Not only is this approach easy,
> but it is *also* efficient. If you *do* need actual object
> duplication, then you can implement "dup".
>

Thank you for the example. I see the benefit if performance is an issue ... but what happened to premature optimization? For example, the difference between the simple code below and a version with COW may be easy - but I don't think it is trivial. There is a fair amount of boilerplate bookeeping. What if the struct had two, three, or more maps: V1[K1], V2[K2], V3[K3]. Would you have three separate reference counter payloads, or just one payload with three maps? If you choose the latter you are still aggressively copying when unnecessary. It seems one could quickly take this to extremes. I did not understand the part about "In theory, you could just allocate your payload, and let the GC and default copy take care of everything else".

---
struct AddressBook {
  alias Address[string] EmailAddressMap;
  this(this) { _emailAddressMap = _emailAddressMap.dup; }
  void addAddress(string email, Address address) {
    _emailAddressMap[email] = address;
  }
  private EmailAddressMap _emailAddressMap;
}
---


> //---------------------------------------
> BTW: There is a subtle bug in this code. Can you spot it ;) ?

For me it did not compile (No constructor for payload), maybe you had a ctor for Payload? I think opAssign is not necessary - if I remove it it still seems to work since default opAssign calls postblit. I added the ctor, removed opAssign and put it here: http://dpaste.dzfl.pl/6ecfe675

Other than that - is there still a subtle bug?

Thanks,
Dan
December 11, 2012
On 12/11/12 2:19 AM, monarch_dodra wrote:
> On Tuesday, 11 December 2012 at 03:09:53 UTC, Dan wrote:
>> On Tuesday, 11 December 2012 at 01:47:38 UTC, Walter Bright wrote:
>>> On 12/10/2012 10:40 AM, Dan wrote:
>>>> For something as simple as this would you introduce COW?
>>>
>>> Yes.
>>
>> Any D examples showing the pattern in action with structs? I did not
>> see anything in the index of TDPL on COW except one reference
>> regarding strings. If it is a better approach I would like to know
>> more about how it is done and the trade-offs. Some of the responses
>> refer to having to check on every access, so is that not tough to manage?
>>
>> Thanks
>> Dan
>
> Here is a example of using COW for a simple object that wraps
> nothing more than an int:
[snip]

Walter and I discussed that it should be possible to automate the dupIfNeeded() (I call it ensureUnique()) calls like this:

* If a method is const or immutable, leave as is
* For all other methods, insert a ensureUnique() automatically in the prolog code

For this to work, the state must be private and all primitives must be implemented via methods (as opposed to free functions).


Andrei
December 11, 2012
On Tuesday, 11 December 2012 at 12:52:03 UTC, Dan wrote:
>
> For me it did not compile (No constructor for payload), maybe you had a ctor for Payload? I think opAssign is not necessary - if I remove it it still seems to work since default opAssign calls postblit. I added the ctor, removed opAssign and put it here: http://dpaste.dzfl.pl/6ecfe675
>
> Other than that - is there still a subtle bug?
>
> Thanks,
> Dan

Strange, it had compiled for me this morning. Wonder what happened...

Anyways, this just highlights one of D's inconsistencies:
Payload  a = Payload(1, 1); //Legal
Payload* p1 = new Payload(1, 1); //Illegal???
Payload* p2 = [Payload(1, 1)].ptr; //Dirty workaround.

Yeah... You can use the "dirty workaround": http://dpaste.dzfl.pl/c0258f66

Anyways, opAssign is *definitely* necessary. Without it, assignment would be equivalent to aliasing, and you wouldn't have actual COW (both instances would end up modified). What's more, if the current object had a handle on another payload, you wouldn't release it, causing it to duplicate for no reason (but that's moot considering COW is already broken). In my example, it works because no-one actually *calls* opAssign.

The bug is this (on topic)
//----
S s2 = S(2);
S s0 = S();
s0 = s2; //Crap.
//----
Here, you aren't actually calling "this(int = 0)". Nope. What you are actually calling is... nothing! This means s0 does not have an actual payload. This means that in theory, on every write, you need to check the count... and check there is actually a payload there first :D ! Yay extra useless checks and increased lazy initialization on every call!

> There is a fair amount of boilerplate bookeeping. What if the struct had two, three, or more maps

Depends on the required granularity. I'd say ideally, you'd put all three maps in the same "Payload" struct, so things would not actually be more complicated.

If you try to make each map individually COW, it would only be slightly more complex. You *would* take a further performance hit having to call "ensureUnique" on *each* map every modifying call. Furthermore, you may also take a performance hit in terms of locality of reference. But that'd be WAY premature to really worry about.

What you have to keep in mind is that a "triple COW" design would only make sense if you have functions that modify one of your 3 maps. If they are always modified together, it'd make no sense to use "triple COW".

//----
When everything is said and done, most of that code can be templated, or mixed-in.
December 11, 2012
On Tuesday, 11 December 2012 at 13:22:42 UTC, monarch_dodra wrote:
> Anyways, opAssign is *definitely* necessary. Without it, assignment would be equivalent to aliasing, and you wouldn't have actual COW (both instances would end up modified). What's more, if the current object had a handle on another payload, you wouldn't release it, causing it to duplicate for no reason (but that's moot considering COW is already broken). In my example, it works because no-one actually *calls* opAssign.
>

Can you confirm this with an example? Again, I think default opAssign calls postblit. postblit increments. In this example everything looks fine without opAssign (except for the bug you point out regarding S()).

http://dpaste.dzfl.pl/7fe03a43


> What you have to keep in mind is that a "triple COW" design would only make sense if you have functions that modify one of your 3 maps. If they are always modified together, it'd make no sense to use "triple COW".
>

I see.

> //----
> When everything is said and done, most of that code can be templated, or mixed-in.

If so, sounds like useful additions to phobos.

Thanks
Dan
December 11, 2012
On Tuesday, 11 December 2012 at 13:39:22 UTC, Dan wrote:
>
> Can you confirm this with an example? Again, I think default opAssign calls postblit. postblit increments. In this example everything looks fine without opAssign (except for the bug you point out regarding S()).
>
> http://dpaste.dzfl.pl/7fe03a43
>
Oops. Correct.

Note that opAssign *also* calls the destructor. This is important too, or it would only get half the job done.
December 11, 2012
On Tuesday, 11 December 2012 at 13:01:44 UTC, Andrei Alexandrescu wrote:
> Walter and I discussed that it should be possible to automate the dupIfNeeded() (I call it ensureUnique()) calls like this:
>
> * If a method is const or immutable, leave as is
> * For all other methods, insert a ensureUnique() automatically in the prolog code
>
> For this to work, the state must be private and all primitives must be implemented via methods (as opposed to free functions).
>

This sounds good. But ensureUnique replacing dupIfNeeded really does two things - (1) determine if a copy is necessary (i.e. has a copy already been done) and (2) actually do the copy when necessary. What would it use to do the generic copy and what type of copy would it be (1 level deep (e.g. dup all fields of typeof(this)) or dup recursively deep)?

I don't think the compiler can choose which of these two is desired by the user. This would then call for some form of field copy mechanism similar to postblit, likely defined by user.

I think giving the struct designer a simpler way to do COW sounds very useful. Would love to read a DIP or notes on it. IMHO there are many coding use cases for me where even thinking about COW is premature optimization. For instance, when it comes to configuring a server at startup I really don't care too much about extra data copies.

In reading this news groups I see things like Walter is not a fan of postblits and sees no need for them OR static named field initialization of structs is up for deprecation. These kind of issues give pause. I don't think 100% guarantees are needed - but it would be nice if discussions hinting at existing features being in or out of the language be addressed quickly and vocally by either Walter or you.

Also, it would be nice if you release early and often when it comes to your thoughts and ideas on new features, like your thoughts on ensureUnique. Or, say, if you and Walter do have a potential solution or leaning when it comes to replacing postblits with true copy constructors it would be great to hear about.

I'm not complaining - here, though. Keep up the good work.

Thanks,
Dan