November 03, 2011
On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath <tobias@pankrath.net> wrote:

> Dmitry Olshansky wrote:
>
>> And more importantly, it still would be horribly slow O(N^2).
>> Personally, because of that I'd prefer hand-rolled intrusive
>> singly-linked list any time of day.
>>
>
> To be honest, I don't understand this.
> A "remove_if" for non-intrusive single-linked lists should be doable in
> O(N).

SList is a poor singly linked list implementation.  It does not support O(1) removal.

-Steve
November 03, 2011
"Steven Schveighoffer" , dans le message (digitalmars.D.learn:30402), a
> The primitive for a container is remove(range).  Ranges are essential to containers, and should be the major interface to them.

Programmers have to learn ranges to use containers. Hiding ranges is not helping them.

But here, it is more complicated than that, because range may be more powerful than iterators, they are less friendly to use when a single element reference has to be used.

c.remove(find(c.all, E));

will not remove the first occurence of E, but all elements beyond E. so instead we have to write:

c.remove(take(find(c[], E), 1));

Then it's too much.

The problem is that range are not practical to refer to a single element in the container. We need to have single-element reference to manipulate the range. Then a function should be used to find such one-element reference. std.find is already taken, and can hardly be changed (although it should be name popUntil), but we can still use a find method of the container.

auto r = take(find(c[], E), 1);

should just be written:

auto r = c.find(E);

Then the syntax to remove a single element from c is acceptable:
c.remove(c.find(E)).

Now let us remove several consecutive elements from c, let us say, all elements between the value E and the next value F:

| auto r = find(c[], E);
| int i=0;
| foreach(e, r)
|   if (e == F) break;
|   else ++i;
| c.remove(take(r, i+1));

That is not practical at all, and in addition, it is not efficient, since r is walked again from E to F.

If we add little methods to single element reference to make them behave a little like iterators, we can recreate ranges from two single element references, and regain all the power of iterators. To remove all elements from E to the next F included:

auto r = c.find( E);
c.remove(r, r.find(F)++);
// or c.remove(range(r, r.find(F)++));

(I use the find method a bit like a joker in this exemple, it is just to get the idea).

In conclusion, to refer to a single element of a container for simple
operations, range looses against iterator. Ranges even loose to refer to
a range of consecutive elements...
Many alternatives are possible, but a simple iterator structure to refer
to a single element, and that you can combine to recreate a range (and
use all range algorithms) would be enough, and would complete the range
interface to make them have no drawback against iterators.
November 03, 2011
On 03.11.2011 19:34, Steven Schveighoffer wrote:
> On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
> <tobias@pankrath.net> wrote:
>
>> Dmitry Olshansky wrote:
>>
>>> And more importantly, it still would be horribly slow O(N^2).
>>> Personally, because of that I'd prefer hand-rolled intrusive
>>> singly-linked list any time of day.
>>>
>>
>> To be honest, I don't understand this.
>> A "remove_if" for non-intrusive single-linked lists should be doable in
>> O(N).

Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad.

As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free.
Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
>
> SList is a poor singly linked list implementation. It does not support
> O(1) removal.
>
> -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

Actually, it opens up a question of "Checked ranges" akin to "Checked iterators" used in many STL implementations. Any ideas what they can carry around as an "ID" of list? Root pointer won't cut it, as it can be easily removed during iteration. If SList was a class it's reference probably would do.

* I think I'll scrap together a pull request to address both these issues though.

-- 
Dmitry Olshansky
November 03, 2011
On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 03.11.2011 19:34, Steven Schveighoffer wrote:
>> On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
>> <tobias@pankrath.net> wrote:
>>
>>> Dmitry Olshansky wrote:
>>>
>>>> And more importantly, it still would be horribly slow O(N^2).
>>>> Personally, because of that I'd prefer hand-rolled intrusive
>>>> singly-linked list any time of day.
>>>>
>>>
>>> To be honest, I don't understand this.
>>> A "remove_if" for non-intrusive single-linked lists should be doable in
>>> O(N).
>
> Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad.
>
> As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free.
> Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
>>
>> SList is a poor singly linked list implementation. It does not support
>> O(1) removal.
>>
>> -Steve
>
> Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node.

The range type for a SList has a single pointer to the currently iterated node.  How do you remove that node without having access to the head/previous pointer?

I suggested to Andrei having the range keep track of the *previous* node/head, but he did not like that idea (too many dereferences for front()).  Another option is to have a removeAllButFirst method, but that's kind of awkward...

> Actually, it opens up a question of "Checked ranges" akin to "Checked iterators" used in many STL implementations. Any ideas what they can carry around as an "ID" of list? Root pointer won't cut it, as it can be easily removed during iteration. If SList was a class it's reference probably would do.

dcollections stipulates that all ranges/cursors can be verified in O(lgn) time or better to belong to a specific container.  In some cases, this adds an extra word to the range/cursor, and in others, it's easy to determine or the owner-reference was already needed.  Since everything is a class, the fallback is to just stick an owner class instance in the range.

This stipulation is necessary to allow safe slicing.

-Steve
November 03, 2011
On Thu, 03 Nov 2011 12:32:06 -0400, Christophe <travert@phare.normalesup.org> wrote:

> "Steven Schveighoffer" , dans le message (digitalmars.D.learn:30402), a
>> The primitive for a container is remove(range).  Ranges are essential to
>> containers, and should be the major interface to them.
>
> Programmers have to learn ranges to use containers. Hiding ranges is not
> helping them.
>
> But here, it is more complicated than that, because range may be more
> powerful than iterators, they are less friendly to use when a single
> element reference has to be used.
>
> c.remove(find(c.all, E));
>
> will not remove the first occurence of E, but all elements beyond E.
> so instead we have to write:
>
> c.remove(take(find(c[], E), 1));
>
> Then it's too much.
>
> The problem is that range are not practical to refer to a single element
> in the container. We need to have single-element reference to manipulate
> the range. Then a function should be used to find such one-element
> reference. std.find is already taken, and can hardly be changed
> (although it should be name popUntil), but we can still use a find
> method of the container.
>
> auto r = take(find(c[], E), 1);
>
> should just be written:
>
> auto r = c.find(E);
>
> Then the syntax to remove a single element from c is acceptable:
> c.remove(c.find(E)).
>
> Now let us remove several consecutive elements from c, let us say, all
> elements between the value E and the next value F:
>
> | auto r = find(c[], E);
> | int i=0;
> | foreach(e, r)
> |   if (e == F) break;
> |   else ++i;
> | c.remove(take(r, i+1));
>
> That is not practical at all, and in addition, it is not efficient,
> since r is walked again from E to F.
>
> If we add little methods to single element reference to make them behave
> a little like iterators, we can recreate ranges from two single element
> references, and regain all the power of iterators. To remove all
> elements from E to the next F included:
>
> auto r = c.find( E);
> c.remove(r, r.find(F)++);
> // or c.remove(range(r, r.find(F)++));
>
> (I use the find method a bit like a joker in this exemple, it is just
> to get the idea).
>
> In conclusion, to refer to a single element of a container for simple
> operations, range looses against iterator. Ranges even loose to refer to
> a range of consecutive elements...
> Many alternatives are possible, but a simple iterator structure to refer
> to a single element, and that you can combine to recreate a range (and
> use all range algorithms) would be enough, and would complete the range
> interface to make them have no drawback against iterators.

Preaching to the choir sir :)

http://www.dsource.org/projects/dcollections/browser/branches/d2/concepts.txt

If you can convince Andrei that cursors are a good addition to std.container, you have my admiration.  I've tried and failed quite a few times.

To illustrate syntax:

auto cursor = c.elemAt(E);
c.remove(c[cursor..c.end]); // note, $ could be used here, but opDollar is broken.

Note, this only works if c supports elemAt.  For example, TreeSet supports this, LinkList does not.  But dcollections supports even other slicing abilities.  For example TreeSet can do this too:

c.remove(c[E..F]);

If you have a container that doesn't support elemAt, you can use std.algorithm.find, and the dcollections' range.begin method to get a valid cursor:

auto cursor = find(c[], E).begin;

I realize this isn't much better than take(find(c[], E), 1), but I don't know if you realize what a task/chore it would be to create a simple shortcut in a class for std.algorithm.find.

-Steve
November 03, 2011
On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
> On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
> <dmitry.olsh@gmail.com> wrote:
>
>> On 03.11.2011 19:34, Steven Schveighoffer wrote:
>>> On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
>>> <tobias@pankrath.net> wrote:
>>>
>>>> Dmitry Olshansky wrote:
>>>>
>>>>> And more importantly, it still would be horribly slow O(N^2).
>>>>> Personally, because of that I'd prefer hand-rolled intrusive
>>>>> singly-linked list any time of day.
>>>>>
>>>>
>>>> To be honest, I don't understand this.
>>>> A "remove_if" for non-intrusive single-linked lists should be doable in
>>>> O(N).
>>
>> Yeah, poor wording going to kill me one day :) And looking at how
>> SList works with "checked" remove does O(N^2) turned out to be a
>> context for reference for intrusive singly-linked list, just my bad.
>>
>> As for why I'd rather go with intrusive lists, it's because of it
>> usually uses less memory due to node structures being more padding-free.
>> Anyway the point should have been focused on _hand-rolled_ part,
>> mostly because SList is plain not ready for prime time*. And btw
>> singly-linked list it's not hard to implement and you can fine tune it
>> how you like it e.g. they do play nice with free list allocation
>> strategy. Not to say of circular lists and/or using sentinels.
>>>
>>> SList is a poor singly linked list implementation. It does not support
>>> O(1) removal.
>>>
>>> -Steve
>>
>> Aye, looking at SList implementation I can say that it sort of tries
>> to verify that this is a correct list. Otherwise it would be O(1).
>> Indeed passing wrong range/iterator is quite bad in linked lists and
>> could lead to all manner of funky bugs, but the cost is horrific.
>> Especially since insert doesn't do this extra check.
>
> No, it's necessarily doing O(n) search for current node because it has
> no reference to the previous node.
>
> The range type for a SList has a single pointer to the currently
> iterated node. How do you remove that node without having access to the
> head/previous pointer?
>

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

0. Have a sentinel for end of list. (O(1) additional memory for the entire list). It is the only node in the list that has a null 'next' pointer.

example, for removing just the current node:
1. if the following node is not the sentinel
1.1. Move value from following node into the current node.
1.2. Remove following node.
2. if the following node is the sentinel
2.1. erase the contents of the current node (iff it has indirections)
2.2. set it's 'next' field to null


Removing an entire range is just erasing the contents of the current node and setting the 'next' field of the associated Node to zero. Removing a Take!Range is a simple generalization of the algorithm above.







November 03, 2011
On 03.11.2011 21:13, Steven Schveighoffer wrote:
> On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
> <dmitry.olsh@gmail.com> wrote:
>
>> On 03.11.2011 19:34, Steven Schveighoffer wrote:
>>> On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
>>> <tobias@pankrath.net> wrote:
>>>
>>>> Dmitry Olshansky wrote:
>>>>
>>>>> And more importantly, it still would be horribly slow O(N^2).
>>>>> Personally, because of that I'd prefer hand-rolled intrusive
>>>>> singly-linked list any time of day.
>>>>>
>>>>
>>>> To be honest, I don't understand this.
>>>> A "remove_if" for non-intrusive single-linked lists should be doable in
>>>> O(N).
>>
>> Yeah, poor wording going to kill me one day :) And looking at how
>> SList works with "checked" remove does O(N^2) turned out to be a
>> context for reference for intrusive singly-linked list, just my bad.
>>
>> As for why I'd rather go with intrusive lists, it's because of it
>> usually uses less memory due to node structures being more padding-free.
>> Anyway the point should have been focused on _hand-rolled_ part,
>> mostly because SList is plain not ready for prime time*. And btw
>> singly-linked list it's not hard to implement and you can fine tune it
>> how you like it e.g. they do play nice with free list allocation
>> strategy. Not to say of circular lists and/or using sentinels.
>>>
>>> SList is a poor singly linked list implementation. It does not support
>>> O(1) removal.
>>>
>>> -Steve
>>
>> Aye, looking at SList implementation I can say that it sort of tries
>> to verify that this is a correct list. Otherwise it would be O(1).
>> Indeed passing wrong range/iterator is quite bad in linked lists and
>> could lead to all manner of funky bugs, but the cost is horrific.
>> Especially since insert doesn't do this extra check.
>
> No, it's necessarily doing O(n) search for current node because it has
> no reference to the previous node.
>
> The range type for a SList has a single pointer to the currently
> iterated node. How do you remove that node without having access to the
> head/previous pointer?
>

removeAfter ? ;) Hm though it does hit as strange. Looking at my code, I usually keep previous node but only when I remove elements during iteration.

> I suggested to Andrei having the range keep track of the *previous*
> node/head, but he did not like that idea (too many dereferences for
> front()). Another option is to have a removeAllButFirst method, but
> that's kind of awkward...

And then using a sentinel as in Timon's idea doesn't look half bad.

>
>> Actually, it opens up a question of "Checked ranges" akin to "Checked
>> iterators" used in many STL implementations. Any ideas what they can
>> carry around as an "ID" of list? Root pointer won't cut it, as it can
>> be easily removed during iteration. If SList was a class it's
>> reference probably would do.
>
> dcollections stipulates that all ranges/cursors can be verified in
> O(lgn) time or better to belong to a specific container. In some cases,
> this adds an extra word to the range/cursor, and in others, it's easy to
> determine or the owner-reference was already needed. Since everything is
> a class, the fallback is to just stick an owner class instance in the
> range.
>
> This stipulation is necessary to allow safe slicing.
>

Seems reasonable, I'd expect checks to go away in release, right(?).


-- 
Dmitry Olshansky
November 03, 2011
On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr@gmx.ch> wrote:

> On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
>> On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
>> <dmitry.olsh@gmail.com> wrote:
>>
>>> On 03.11.2011 19:34, Steven Schveighoffer wrote:
>>>> On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
>>>> <tobias@pankrath.net> wrote:
>>>>
>>>>> Dmitry Olshansky wrote:
>>>>>
>>>>>> And more importantly, it still would be horribly slow O(N^2).
>>>>>> Personally, because of that I'd prefer hand-rolled intrusive
>>>>>> singly-linked list any time of day.
>>>>>>
>>>>>
>>>>> To be honest, I don't understand this.
>>>>> A "remove_if" for non-intrusive single-linked lists should be doable in
>>>>> O(N).
>>>
>>> Yeah, poor wording going to kill me one day :) And looking at how
>>> SList works with "checked" remove does O(N^2) turned out to be a
>>> context for reference for intrusive singly-linked list, just my bad.
>>>
>>> As for why I'd rather go with intrusive lists, it's because of it
>>> usually uses less memory due to node structures being more padding-free.
>>> Anyway the point should have been focused on _hand-rolled_ part,
>>> mostly because SList is plain not ready for prime time*. And btw
>>> singly-linked list it's not hard to implement and you can fine tune it
>>> how you like it e.g. they do play nice with free list allocation
>>> strategy. Not to say of circular lists and/or using sentinels.
>>>>
>>>> SList is a poor singly linked list implementation. It does not support
>>>> O(1) removal.
>>>>
>>>> -Steve
>>>
>>> Aye, looking at SList implementation I can say that it sort of tries
>>> to verify that this is a correct list. Otherwise it would be O(1).
>>> Indeed passing wrong range/iterator is quite bad in linked lists and
>>> could lead to all manner of funky bugs, but the cost is horrific.
>>> Especially since insert doesn't do this extra check.
>>
>> No, it's necessarily doing O(n) search for current node because it has
>> no reference to the previous node.
>>
>> The range type for a SList has a single pointer to the currently
>> iterated node. How do you remove that node without having access to the
>> head/previous pointer?
>>
>
> The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually.  A range is a reference to a Node struct.  But I still think the following approach will work.

>
> 0. Have a sentinel for end of list. (O(1) additional memory for the entire list). It is the only node in the list that has a null 'next' pointer.
>
> example, for removing just the current node:
> 1. if the following node is not the sentinel
> 1.1. Move value from following node into the current node.
> 1.2. Remove following node.
> 2. if the following node is the sentinel
> 2.1. erase the contents of the current node (iff it has indirections)
> 2.2. set it's 'next' field to null
>
>
> Removing an entire range is just erasing the contents of the current node and setting the 'next' field of the associated Node to zero. Removing a Take!Range is a simple generalization of the algorithm above.

This would be a good addition to the type.  It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed.  Can you put together a pull request?  If not, I can see about doing it.  One issue, you could not do this if the value is immutable/const.

I could use this idea, I think, to implement a singly linked list in dcollections as well (the prospect of not having O(1) removal is what has stopped me).  Thanks for the idea!

-Steve
November 03, 2011
On Thu, 03 Nov 2011 14:08:31 -0400, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> On 03.11.2011 21:13, Steven Schveighoffer wrote:
>>
>> dcollections stipulates that all ranges/cursors can be verified in
>> O(lgn) time or better to belong to a specific container. In some cases,
>> this adds an extra word to the range/cursor, and in others, it's easy to
>> determine or the owner-reference was already needed. Since everything is
>> a class, the fallback is to just stick an owner class instance in the
>> range.
>>
>> This stipulation is necessary to allow safe slicing.
>>
>
> Seems reasonable, I'd expect checks to go away in release, right(?).

For the moment, no.  I am not sure whether this is the right decision or not, because once you get beyond arrays, when to do bounds checks becomes fuzzy.

For example, imagine you have this:

auto ts = new TreeSet!int(1, 3, 5, 7, 9);

What does this mean?

auto r = ts[2..4];

Note that a range type for a treeset has a pointer to a begin and end node for the container.  For arrays, not doing a bounds check is simply less code.  For a RBTree, you still have to look for the specific node, even if you are in release mode.

Another example:

auto ts2 = ts.dup; // duplicate the treeset

auto c1 = ts2.elemAt(3); // get the node for 3 in ts2

auto r2 = ts[c1..ts.end];

Here I can say, we can skip the belongs check (which is an O(lgn) walk up the tree).

But I'm still doing it in release mode.  I'm not sure what's best.  Should I just do the least work possible, or should I make it consistent with ts[2..4]?

-Steve
November 03, 2011
On Thu, 03 Nov 2011 22:08:31 +0400, Dmitry Olshansky wrote:

> On 03.11.2011 21:13, Steven Schveighoffer wrote:
>> The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?
>>
>>
> removeAfter ? ;) Hm though it does hit as strange. Looking at my code, I usually keep previous node but only when I remove elements during iteration.

Matt Austern's excellent "STL Singly Linked Lists" presentation is relevant:

  http://accu.org/index.php/accu_branches/accu_usa/past

Slide 11 is titled "Solving the insert problem: off-by-one iterators". Also slide 16.

Ali