Thread overview
[phobos] Improving std.range.Zip
Aug 17, 2010
David Simcha
Aug 19, 2010
David Simcha
August 17, 2010
Apologies for the late answer (the message is from 2009!). This is about David Simcha's thoughts on improving std.range.Zip. In spite of its lateness, the response is rather timely for the ongoing discussion of lockstep().

Zip was indeed a kludge because it required algorithms to be modified to recognize it. Now I think we have a general method for expressing types that should be assignable and swappable, yet don't expose references.

The problem
===========

We want to define read and write methods for e.g. the front of a range. In addition we also want to swap a front of a range with another object of the same type without incurring unnecessary copies. This is because copying objects can be arbitrarily expensive due to this(this).

If all ranges exposed references, that all would be simple because access to objects is immediate. Yet some ranges don't want to expose references to their internals. I define below such ranges as "sealed ranges". Example of non-sealed range:

struct NonSealed {
     @property bool empty();
     @property ref Widget front();
     void popFront();
}

struct Sealed {
     @property bool empty();
     @property Widget front();
     void popFront();
}

Solution
========

Sealed ranges must define additional members as follows:

struct Sealed {
     @property bool empty();
     @property Widget front();
     @property void front(Widget rhs);
     @property Widget moveFront(Widget rhs);
     void popFront();
}

Similarly, if a range offers back() it should also offer back(Widget) and moveBack(). Finally, if a range offers opIndex(), it should also offer opIndeAssign(), opIndexOpAssign(), and Widget moveAt(size_t).

These three routines are all that's needed for implementing a host of efficient algorithms, starting of course with swap. I've already changed std.algorithm.swap and many other functions in std.algorithm to support this new concept. (I think most of these changes are not yet committed.) I haven't yet updated e.g. Zip.

I have followed a very similar pattern in std.container, which I'll check in soon.

Any comments, please chime in.


Andrei

On 11/13/2009 11:08 AM, David Simcha wrote:
> I was thinking about std.range.Zip and ways to improve it.  It would be really nice if we could give Zip assignable elements because right now, sorting of it works by a bunch of special-case ad-hockery. Furthermore, Andrei has mentioned the need for a stable O(N log N) sort in Phobos. The most obvious (and possibly the only) algorithm is a merge sort, but this can only be done (AFAIK) if you have assignable elements, not just swappable elements. Using some magic of postblits and opAssign, I think we can get assignable elements to work with Zip. First of all, there should be two kinds of Proxy structs: A value one and a reference one. This information would be stored in a flag in each instance. The compile time type would be the same. The values and the pointers would be stored together in a union. It would look something like this:
>
> struct Proxy {
> union {
> staticMap!(pointers, R) ptrs;
> R vals;
> }
>
> bool isReference; // Always true if this is stored inside a Zip.
> this(this) {
> if(isReference) {
> // Overwrite pointers with values.
> isReference = false;
> }
> }
>
> void opAssign(ref Proxy rhs) {
> if(isReference) {
> // Assign the values of rhs.at!(0)...
> // rhs.at!(rhs.length) to the deref
> // ptrs.
> } else {
> // Assign rhs's values to vals.
> }
> }
>
> auto at(int index)() {
> if(isReference) {
> return *(ptrs[index]);
> } else {
> return vals[index];
> }
> }
> }
>
> Then, the standard swap algorithm would work. Assuming random access:
>
> auto temp = myZip[index1]; // Postblit makes this a value proxy. myZip[index1] = myZip[index2]; // Writes to dereferences in myZip[index1]. myZip[index2] = temp; // Writes to dereferences in myZip[index2].
>
> Furthermore, you could store Proxy structs in an array as a buffer, etc. and it would (I think) just work. Yes, there would be some performance cost to all this branching and postblitting, but IMHO it's worth it to make Zip work the way that people who don't understand the implementation details would expect it to work.
>
> The only problem I see is uninitialized Proxy objects:
>
> Proxy proxy = void;
> proxy = myZip[5];
August 17, 2010
Two points:

1.  I generally consider making copying arbitrarily expensive via this(this) to be a *terrible* practice that we shouldn't bend over backwards to support efficiently.  Using this(this) to give eager value semantics is a holdover from C++, where objects must have a clear owner and aliasing must be controlled due to lack of GC.  In a GC'd language containers should have reference semantics with explicit duplication.  The proper use of this(this) is for simple O(1) things like reference counting.  My general feeling is that range definitions are getting too complex to support a corner case.

2.  What's the difference between moveFront(Widget rhs) and front(Widget
rhs)?  I understand the difference for the case of moveFront() vs. front(),
but not for the assignment case.

Ignoring the complexities of this(this)-related stuff, though, I definitely like the work on sealed containers, etc.

--Dave

On Tue, Aug 17, 2010 at 1:57 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:

> Apologies for the late answer (the message is from 2009!). This is about David Simcha's thoughts on improving std.range.Zip. In spite of its lateness, the response is rather timely for the ongoing discussion of lockstep().
>
> Zip was indeed a kludge because it required algorithms to be modified to recognize it. Now I think we have a general method for expressing types that should be assignable and swappable, yet don't expose references.
>
> The problem
> ===========
>
> We want to define read and write methods for e.g. the front of a range. In addition we also want to swap a front of a range with another object of the same type without incurring unnecessary copies. This is because copying objects can be arbitrarily expensive due to this(this).
>
> If all ranges exposed references, that all would be simple because access to objects is immediate. Yet some ranges don't want to expose references to their internals. I define below such ranges as "sealed ranges". Example of non-sealed range:
>
> struct NonSealed {
>    @property bool empty();
>    @property ref Widget front();
>    void popFront();
> }
>
> struct Sealed {
>    @property bool empty();
>    @property Widget front();
>    void popFront();
> }
>
> Solution
> ========
>
> Sealed ranges must define additional members as follows:
>
> struct Sealed {
>    @property bool empty();
>    @property Widget front();
>    @property void front(Widget rhs);
>    @property Widget moveFront(Widget rhs);
>    void popFront();
> }
>
> Similarly, if a range offers back() it should also offer back(Widget) and
> moveBack(). Finally, if a range offers opIndex(), it should also offer
> opIndeAssign(), opIndexOpAssign(), and Widget moveAt(size_t).
>
> These three routines are all that's needed for implementing a host of efficient algorithms, starting of course with swap. I've already changed std.algorithm.swap and many other functions in std.algorithm to support this new concept. (I think most of these changes are not yet committed.) I haven't yet updated e.g. Zip.
>
> I have followed a very similar pattern in std.container, which I'll check in soon.
>
> Any comments, please chime in.
>
>
> Andrei
>
>
> On 11/13/2009 11:08 AM, David Simcha wrote:
>
>> I was thinking about std.range.Zip and ways to improve it.  It would be really nice if we could give Zip assignable elements because right now, sorting of it works by a bunch of special-case ad-hockery. Furthermore, Andrei has mentioned the need for a stable O(N log N) sort in Phobos. The most obvious (and possibly the only) algorithm is a merge sort, but this can only be done (AFAIK) if you have assignable elements, not just swappable elements. Using some magic of postblits and opAssign, I think we can get assignable elements to work with Zip. First of all, there should be two kinds of Proxy structs: A value one and a reference one. This information would be stored in a flag in each instance. The compile time type would be the same. The values and the pointers would be stored together in a union. It would look something like this:
>>
>> struct Proxy {
>> union {
>> staticMap!(pointers, R) ptrs;
>> R vals;
>> }
>>
>> bool isReference; // Always true if this is stored inside a Zip.
>> this(this) {
>> if(isReference) {
>> // Overwrite pointers with values.
>> isReference = false;
>> }
>> }
>>
>> void opAssign(ref Proxy rhs) {
>> if(isReference) {
>> // Assign the values of rhs.at!(0)...
>> // rhs.at!(rhs.length) to the deref
>> // ptrs.
>> } else {
>> // Assign rhs's values to vals.
>> }
>> }
>>
>> auto at(int index)() {
>> if(isReference) {
>> return *(ptrs[index]);
>> } else {
>> return vals[index];
>> }
>> }
>> }
>>
>> Then, the standard swap algorithm would work. Assuming random access:
>>
>> auto temp = myZip[index1]; // Postblit makes this a value proxy. myZip[index1] = myZip[index2]; // Writes to dereferences in myZip[index1]. myZip[index2] = temp; // Writes to dereferences in myZip[index2].
>>
>> Furthermore, you could store Proxy structs in an array as a buffer, etc. and it would (I think) just work. Yes, there would be some performance cost to all this branching and postblitting, but IMHO it's worth it to make Zip work the way that people who don't understand the implementation details would expect it to work.
>>
>> The only problem I see is uninitialized Proxy objects:
>>
>> Proxy proxy = void;
>> proxy = myZip[5];
>>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100817/7d7a95b0/attachment.html>
August 17, 2010
On 08/17/2010 01:18 PM, David Simcha wrote:
> Two points:
>
> 1.  I generally consider making copying arbitrarily expensive via this(this) to be a *terrible* practice that we shouldn't bend over backwards to support efficiently.  Using this(this) to give eager value semantics is a holdover from C++, where objects must have a clear owner and aliasing must be controlled due to lack of GC.  In a GC'd language containers should have reference semantics with explicit duplication. The proper use of this(this) is for simple O(1) things like reference counting.  My general feeling is that range definitions are getting too complex to support a corner case.

I understand the sentiment, and I partially share it. Just like you I am tempted to decree that this(this) must be O(1) and call it a day. However, that often means that other operations all of a sudden become weirder.

Consider an array of integers that uses reference counting. That means that assigning an int to an existing slot of the array could throw. For someone who (a) knows the array uses RC/COW, and (b) has a good understanding of RC/COW, that is understandable. But at the end of the day, that has leaky abstraction written all over it.

This has been a known issue in C++ with std::string. The design of std::string has tried really hard to enable reference counting implementations, but has essentially failed. As an example, in STL everything that has capacity() and size() guarantees that size() <= capacity(). The semantics of capacity() is "maximum number of elements that can be stored without triggering reallocation". For a RC string with count > 1, capacity() should be zero even though size() is nonzero. The STL does not allow this, and current implementations essentially lie in capacity().

> 2.  What's the difference between moveFront(Widget rhs) and front(Widget
> rhs)?  I understand the difference for the case of moveFront() vs.
> front(), but not for the assignment case.

Sorry, I meant Widget moveFront(). It destructively reads the front of the range and returns it.

> Ignoring the complexities of this(this)-related stuff, though, I definitely like the work on sealed containers, etc.

Can't wait to find the time to write that article about it.


Andrei
August 19, 2010
I was thinking that std.range and std.algorithm needed to be fixed to propagate assignable elements in higher-order ranges, for example in std.retro.  Should I do this, or will any of the changes you're about to make pull the rug out from under me?

On 8/17/2010 1:57 PM, Andrei Alexandrescu wrote:
> Apologies for the late answer (the message is from 2009!). This is about David Simcha's thoughts on improving std.range.Zip. In spite of its lateness, the response is rather timely for the ongoing discussion of lockstep().
>
> Zip was indeed a kludge because it required algorithms to be modified to recognize it. Now I think we have a general method for expressing types that should be assignable and swappable, yet don't expose references.
>
> The problem
> ===========
>
> We want to define read and write methods for e.g. the front of a range. In addition we also want to swap a front of a range with another object of the same type without incurring unnecessary copies. This is because copying objects can be arbitrarily expensive due to this(this).
>
> If all ranges exposed references, that all would be simple because access to objects is immediate. Yet some ranges don't want to expose references to their internals. I define below such ranges as "sealed ranges". Example of non-sealed range:
>
> struct NonSealed {
>     @property bool empty();
>     @property ref Widget front();
>     void popFront();
> }
>
> struct Sealed {
>     @property bool empty();
>     @property Widget front();
>     void popFront();
> }
>
> Solution
> ========
>
> Sealed ranges must define additional members as follows:
>
> struct Sealed {
>     @property bool empty();
>     @property Widget front();
>     @property void front(Widget rhs);
>     @property Widget moveFront(Widget rhs);
>     void popFront();
> }
>
> Similarly, if a range offers back() it should also offer back(Widget) and moveBack(). Finally, if a range offers opIndex(), it should also offer opIndeAssign(), opIndexOpAssign(), and Widget moveAt(size_t).
>
> These three routines are all that's needed for implementing a host of efficient algorithms, starting of course with swap. I've already changed std.algorithm.swap and many other functions in std.algorithm to support this new concept. (I think most of these changes are not yet committed.) I haven't yet updated e.g. Zip.
>
> I have followed a very similar pattern in std.container, which I'll check in soon.
>
> Any comments, please chime in.
>
>
> Andrei
>
> On 11/13/2009 11:08 AM, David Simcha wrote:
>> I was thinking about std.range.Zip and ways to improve it.  It would be really nice if we could give Zip assignable elements because right now, sorting of it works by a bunch of special-case ad-hockery. Furthermore, Andrei has mentioned the need for a stable O(N log N) sort in Phobos. The most obvious (and possibly the only) algorithm is a merge sort, but this can only be done (AFAIK) if you have assignable elements, not just swappable elements. Using some magic of postblits and opAssign, I think we can get assignable elements to work with Zip. First of all, there should be two kinds of Proxy structs: A value one and a reference one. This information would be stored in a flag in each instance. The compile time type would be the same. The values and the pointers would be stored together in a union. It would look something like this:
>>
>> struct Proxy {
>> union {
>> staticMap!(pointers, R) ptrs;
>> R vals;
>> }
>>
>> bool isReference; // Always true if this is stored inside a Zip.
>> this(this) {
>> if(isReference) {
>> // Overwrite pointers with values.
>> isReference = false;
>> }
>> }
>>
>> void opAssign(ref Proxy rhs) {
>> if(isReference) {
>> // Assign the values of rhs.at!(0)...
>> // rhs.at!(rhs.length) to the deref
>> // ptrs.
>> } else {
>> // Assign rhs's values to vals.
>> }
>> }
>>
>> auto at(int index)() {
>> if(isReference) {
>> return *(ptrs[index]);
>> } else {
>> return vals[index];
>> }
>> }
>> }
>>
>> Then, the standard swap algorithm would work. Assuming random access:
>>
>> auto temp = myZip[index1]; // Postblit makes this a value proxy.
>> myZip[index1] = myZip[index2]; // Writes to dereferences in
>> myZip[index1].
>> myZip[index2] = temp; // Writes to dereferences in myZip[index2].
>>
>> Furthermore, you could store Proxy structs in an array as a buffer, etc. and it would (I think) just work. Yes, there would be some performance cost to all this branching and postblitting, but IMHO it's worth it to make Zip work the way that people who don't understand the implementation details would expect it to work.
>>
>> The only problem I see is uninitialized Proxy objects:
>>
>> Proxy proxy = void;
>> proxy = myZip[5];
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>

October 25, 2010
Continuing an older thread: David, where are you with this?

Andrei

On 8/19/10 18:56 CDT, David Simcha wrote:
> I was thinking that std.range and std.algorithm needed to be fixed to propagate assignable elements in higher-order ranges, for example in std.retro. Should I do this, or will any of the changes you're about to make pull the rug out from under me?
>
> On 8/17/2010 1:57 PM, Andrei Alexandrescu wrote:
>> Apologies for the late answer (the message is from 2009!). This is about David Simcha's thoughts on improving std.range.Zip. In spite of its lateness, the response is rather timely for the ongoing discussion of lockstep().
>>
>> Zip was indeed a kludge because it required algorithms to be modified to recognize it. Now I think we have a general method for expressing types that should be assignable and swappable, yet don't expose references.
>>
>> The problem
>> ===========
>>
>> We want to define read and write methods for e.g. the front of a range. In addition we also want to swap a front of a range with another object of the same type without incurring unnecessary copies. This is because copying objects can be arbitrarily expensive due to this(this).
>>
>> If all ranges exposed references, that all would be simple because access to objects is immediate. Yet some ranges don't want to expose references to their internals. I define below such ranges as "sealed ranges". Example of non-sealed range:
>>
>> struct NonSealed {
>> @property bool empty();
>> @property ref Widget front();
>> void popFront();
>> }
>>
>> struct Sealed {
>> @property bool empty();
>> @property Widget front();
>> void popFront();
>> }
>>
>> Solution
>> ========
>>
>> Sealed ranges must define additional members as follows:
>>
>> struct Sealed {
>> @property bool empty();
>> @property Widget front();
>> @property void front(Widget rhs);
>> @property Widget moveFront(Widget rhs);
>> void popFront();
>> }
>>
>> Similarly, if a range offers back() it should also offer back(Widget)
>> and moveBack(). Finally, if a range offers opIndex(), it should also
>> offer opIndeAssign(), opIndexOpAssign(), and Widget moveAt(size_t).
>>
>> These three routines are all that's needed for implementing a host of efficient algorithms, starting of course with swap. I've already changed std.algorithm.swap and many other functions in std.algorithm to support this new concept. (I think most of these changes are not yet committed.) I haven't yet updated e.g. Zip.
>>
>> I have followed a very similar pattern in std.container, which I'll check in soon.
>>
>> Any comments, please chime in.
>>
>>
>> Andrei
>>
>> On 11/13/2009 11:08 AM, David Simcha wrote:
>>> I was thinking about std.range.Zip and ways to improve it. It would be really nice if we could give Zip assignable elements because right now, sorting of it works by a bunch of special-case ad-hockery. Furthermore, Andrei has mentioned the need for a stable O(N log N) sort in Phobos. The most obvious (and possibly the only) algorithm is a merge sort, but this can only be done (AFAIK) if you have assignable elements, not just swappable elements. Using some magic of postblits and opAssign, I think we can get assignable elements to work with Zip. First of all, there should be two kinds of Proxy structs: A value one and a reference one. This information would be stored in a flag in each instance. The compile time type would be the same. The values and the pointers would be stored together in a union. It would look something like this:
>>>
>>> struct Proxy {
>>> union {
>>> staticMap!(pointers, R) ptrs;
>>> R vals;
>>> }
>>>
>>> bool isReference; // Always true if this is stored inside a Zip.
>>> this(this) {
>>> if(isReference) {
>>> // Overwrite pointers with values.
>>> isReference = false;
>>> }
>>> }
>>>
>>> void opAssign(ref Proxy rhs) {
>>> if(isReference) {
>>> // Assign the values of rhs.at!(0)...
>>> // rhs.at!(rhs.length) to the deref
>>> // ptrs.
>>> } else {
>>> // Assign rhs's values to vals.
>>> }
>>> }
>>>
>>> auto at(int index)() {
>>> if(isReference) {
>>> return *(ptrs[index]);
>>> } else {
>>> return vals[index];
>>> }
>>> }
>>> }
>>>
>>> Then, the standard swap algorithm would work. Assuming random access:
>>>
>>> auto temp = myZip[index1]; // Postblit makes this a value proxy.
>>> myZip[index1] = myZip[index2]; // Writes to dereferences in
>>> myZip[index1].
>>> myZip[index2] = temp; // Writes to dereferences in myZip[index2].
>>>
>>> Furthermore, you could store Proxy structs in an array as a buffer, etc. and it would (I think) just work. Yes, there would be some performance cost to all this branching and postblitting, but IMHO it's worth it to make Zip work the way that people who don't understand the implementation details would expect it to work.
>>>
>>> The only problem I see is uninitialized Proxy objects:
>>>
>>> Proxy proxy = void;
>>> proxy = myZip[5];
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
>>
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos