BIg semantic issues with SList & DList

Sep 12, 2012

Sep 13, 2012

Basically, the problem is that their copy semantics adhere to neither value nor reference semantics. The embody a "reference to a third party chain system" semantic. Erm... What does this mean? Basically, a "chain of nodes is created", and the lists reference points in that chain, but not necessarily the beginnings nor ends. How is this a problem? I'll start with SList: SList -------- import std.stdio, std.container, std.range, std.iostream, std.algorithm; void main() { auto a = SList!int([3, 4]); auto b = a; // ab -> 3 -> 4 -> null assert(a[].equal([3, 4])); assert(b[].equal([3, 4])); a.stableInsertFront(1); b.stableInsertFront(2); // a -> 1 // \ // 3 -> 4 -> null // / // b -> 2 assert(a[].equal([1, 3, 4])); assert(b[].equal([2, 3, 4])); //a: change the 2nd element (4) into 5; a[].drop(2).front = 5; // a -> 1 // \ // 3 -> 5 -> null // / // b -> 2 assert(a[].equal([1, 3, 5])); assert(b[].equal([2, 3, 5])); a.clear(); // a -> null // // 3 -> 4 -> null // / // b -> 2 assert(a[].empty); assert(b[].equal([2, 3, 5])); a.insert(1); // -> 1 -> null // // 3 -> 5-> null // / // -> 2 assert(a[].equal([1])); assert(b[].equal([2, 3, 5])); } -------- Is this a clear example of what I mean? NOW: The big question: IS THIS A LEGITIMATE BEHAVIOR? IMO, for SList, yes, yes it is. _HOWEVER_, it must be much clearly documented in the docs, as well as examples provided. I have done some relatively heavy testing, and I have found no bugs in SList. However, there are a couple *Gotcha's*, in particular, regarding the effects of "remove": Are we merely advancing the SList's "root pointer", or actually excising elements from the "chain" itself? Depends on context :D but nothing that can't be documented. DLists has the same semantic: -------- import std.stdio, std.container, std.range, std.algorithm; void main() { auto a = DList!int([3, 4]); auto b = a; // (3 - 4) assert(a[].equal([3, 4])); assert(b[].equal([3, 4])); b.stableInsertFront(1); b.stableInsertBack(5); // (2 - (3 - 4) - 5) assert(a[].equal([3, 4])); assert(b[].equal([1, 3, 4, 5])); a.stableInsertFront(2); // (1 - (2 - 3 - 4) - 5) assert(a[].equal([2, 3, 4])); writeln(b[]); assert(b[].equal([1, 2, 3, 4, 5])); //Does not work as expected! a.remove(a[]); writeln(a[]); //[2, 3, 4] writeln(b[]); //[1, 5] } -------- The example stops here, because DList was obviously written without this in mind. I don't have enough fingers on my hands to list the use-cases that break it. Again, same question: IS THIS A LEGITIMATE BEHAVIOR? The gotchas are even more prevalent here. Not only gotchas, but just plain problems regarding theoretical behavior: What does it mean to append two DLists, if those DLists are views into two different, but, bigger, chains? IMO: For DList, it is just too damn complicated to keep this semantic. I'd migrate to "true reference". -------- The good news is that having "true" reference semantics with {SD}List should be relatively easy. Depending on the output of this discussion, it is something <s>I wouldn't mind</s> I'd enjoy doing. TY for reading.

On Wednesday, 12 September 2012 at 17:24:57 UTC, monarch_dodra wrote: > [SNIP] I did some tests and implementations, and I came with 3 options: 1) Embrace the "view of a chain" implementation and document it. 2) Use classic reference semantics. However, ranges can still have an "out of bonds" view of the chain. This means the ranges are virtually never invalidated. 3) Use classic reference semantics. ENFORCE range bounds be inside the underlying DList. This forces us to systematically "detach" removed nodes (as opposed to just advancing the _first pointer) ... 4) There was a 4th solution I did not like: use classic reference semantics. Do not provide support for "out of bounds" view of the Range, but do not enforce it either. This is the "Bites you in the ass" solution, and kind of what we currently had :( All 3 solutions have been implemented. Common to all 3 implementations: *Added a LOT of unit tests. *Added a description in the beggining *Fixed the opOpAssign functions. I know they weren't working because their names were typoed (!). *MUCH more robust assertions of internals (and minor bugfixes). *Ranges are now assertion based validated. (as opposed to enforced). Implementation 1: This embraces the "chain" view. It works as advertised. The downside though is in the concept, in that it _IS_ actually possible to invalidate a DList itself (not just the range). Everything is Asserted, and I do not think it is possible to have some bad usage which isn't asserted. I had to change a few implementations, but everything works now. Here: https://github.com/monarchdodra/phobos/commit/054fe9bea5652aa0e3089d279d4df2ae069d9f30#std/container.d Implementation 2: Reference semantics. However, Ranges can still have an out of bounds view of the chain. This is because functions like "removeFront" don't actually detach nodes, merelly advance the get pointer. The *real* advantage of this approach is that Range are virtually never invalidated. Also had to change a few implementations. This is kind of a hybrid soltuion, as the DLists themselves keep the traditional semantics, but the ranges themselves are more of a "can see anywhere in the chain" object. https://github.com/monarchdodra/phobos/commit/2b08b46bf552f208deb897ec0f8c5b3327225c8c#std/container.d Implementation 3: Classic implementation. No surprises. Robust, safe, expected. Boring. Boring is good. IMO, this is what people expect. Not a lot of changes here actually. https://github.com/monarchdodra/phobos/commit/8ce21e1a9a0107e74ff86a9acc736595ef73362c#std/container.d Could I get any kind of feedback?

September 13, 2012

Re: BIg semantic issues with SList & DList

Posted by monarch_dodra
in reply to monarch_dodra

Permalink

monarch_dodra

Posted in reply to monarch_dodra

Permalink

On Thursday, 13 September 2012 at 19:37:41 UTC, monarch_dodra wrote:
> On Wednesday, 12 September 2012 at 17:24:57 UTC, monarch_dodra
> wrote:
>> [SNIP]
> [SNIP]

I'll just copy paste here the 3 headers which contain examples, to give a quicker idea of what I went for:

111 CHAIN IMPLEMENTATION 111
/**
   Implements a doubly-linked list.

   $(D DList) uses neither reference nor value semantics. They can be seen as
   several different handles into an external chain of nodes. Several different
   $(D DList)s can all reference different points in a same chain.

   $(D DList.Range) is, for all intents and purposes, a DList with range
   semantics. The $(D DList.Range) has a view directly into the chain itself.
   It is not tied to its parent $(D DList), and may be used to operate on
   other lists (that point to the same chain).

   The ONLY operation that can invalidate a $(D DList) or $(D DList.Range), but
   which will invalidate BOTH, is the $(D remove) operation, if the cut Range
   overlaps with the boundaries of another DList or DList.Range.

   Example:
----
auto a = DList!int([3, 4]); //Create a new chain
auto b = a; //Point to the same chain
// (3 - 4)
assert(a[].equal([3, 4]));
assert(b[].equal([3, 4]));

b.stableInsertFront(1); //insert before of b
b.stableInsertBack(5);  //insert after of b
// (2 - (3 - 4) - 5)
assert(a[].equal([3, 4])); //a is not changed
assert(b[].equal([1, 3, 4, 5])); // but a is

a.stableInsertFront(2); //insert in front of a, this will insert "inside" the chain
// (1 - (2 - 3 - 4) - 5)
assert(a[].equal([2, 3, 4])); //a is modified
assert(b[].equal([1, 2, 3, 4, 5])); //and so is b;

a.remove(a[]); //remove all the elements of a;
// (1 - 5)
assert(a[].empty); //a is empty
assert(b[].equal([1, 5])); //b has lost some of its elements;

a.insert(2); //insert in a. This will create a new chain
// (2)
// (1 - 5)
assert(a[].equal([2])); //a is empty
assert(b[].equal([1, 5])); //b has lost some of its elements;
----
 */

222 HYBRID REFERENCE IMPLEMENATION 222
/**
   Implements a doubly-linked list.

   $(D DList) use reference semantics. They give a view into a bidirectional
   chain of nodes.

   $(D DList.Range) can be used to have $(D BidirectionalRange) view of the
   chain of nodes. $(D DList.Ranges) are almost never invalidated, but they
   may have access to nodes that are past the end of original $(D DList)'s,
   if that $(D DList) was shortened.

   Example:
----
auto a = DList!int([1, 2, 3]); //Create a new DList
auto b = a; //A reference to that DList
auto r = a[]; //A range view of the list

a.removeFront();
a.removeBack();
assert(a[].equal([2])); // both a and b have been affected
assert(b[].equal([2])); // both a and b have been affected
assert(r.equal([1, 2, 3])); //But the range remains unchanged

a.insertFront(1);
assert(a[].equal([1, 2])); // both a and b have been affected
assert(b[].equal([1, 2])); // both a and b have been affected
assert(r.equal([1, 1, 2, 3])); //But so has the range

auto c = a; //Create a third reference
a.clear();  //Detaches from the chain
assert(a[].empty); //a is now empty
assert(b[].equal([1, 2])); //But a and b are unchanged
assert(c[].equal([1, 2])); //But a and b are unchanged
assert(r.equal([1, 1, 2, 3]));

b.remove(b[]); //Do a hard removal of the actual elements
assert(b[].empty); //b is empty of course
assert(c[].empty); //but so is c
assert(r.equal([1, 3])); //and the range has lost an element: It was removed from the chain

b.insert(1); //Insert an element into b
assert(a[].empty); //a remains unchanged
assert(b[].equal([1])); //b has the new element
assert(c[].equal([1])); //and so does c
assert(r.equal([1, 3])); //r points to the old chain, so does not see this new element
----
 */

333 CLASSIC IMPLEMENTATION 333
/**
   Implements a doubly-linked list.

   $(D DList) use reference semantics.

   $(D DList.Range) may be used to have a $(D BidirectionalRange) view into
   the DList.

   Example:
----
auto a = DList!int([1, 2, 3]); //Create a new DList
auto b = a; //A reference to that DList

a.removeFront();
a.removeBack();
assert(a[].equal([2])); // both a and b have been affected
assert(b[].equal([2])); // both a and b have been affected

a.insertFront(1);
assert(a[].equal([1, 2])); // both a and b have been affected
assert(b[].equal([1, 2])); // both a and b have been affected

auto c = a; //Create a third reference
a.clear();  //Detaches from the chain
assert(a[].empty); //a is now empty
assert(b[].equal([1, 2])); //But a and b are unchanged
assert(c[].equal([1, 2])); //But a and b are unchanged

b.remove(b[]); //Do a hard removal of the actual elements
assert(b[].empty); //b is empty of course
assert(c[].empty); //but so is c

b.insert(1); //Insert an element into b
assert(a[].empty); //a remains unchanged
assert(b[].equal([1])); //b has the new element
assert(c[].equal([1])); //and so does c
----
 */
/*
There is not much to show here actually: Ranges are *immediatly* invalidated, and attempt to use an invalid range will assert pretty quickly.
 */

Forums