View mode: basic / threaded / horizontal-split · Log in · Help
February 22, 2011
Re: We need to rethink remove in std.container
On Tue, 22 Feb 2011 16:55:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> On 2/22/11 3:23 PM, Philippe Sigaud wrote:
>> Btw, is there a way to have
>> an empty container and use an output range to fill it? I didn't look
>> at std.container for quite some time.
>
> No, but this may be an interesting idea to pursue.

s/insert/put

Now all containers that support insertion are output ranges...

Or am I missing something?

-Steve
February 23, 2011
Re: We need to rethink remove in std.container
On Tue, Feb 22, 2011 at 23:20, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
> s/insert/put
>
> Now all containers that support insertion are output ranges...
>
> Or am I missing something?

I mean that, in the same way that a container can expose its elements
as an input range using [], a partially filled container could give
access to its empty 'slots' (created by 'reserve()' ) via an output
range.
February 23, 2011
Re: We need to rethink remove in std.container
On Tuesday 22 February 2011 05:01:24 Andrei Alexandrescu wrote:
> On 2/21/11 8:55 PM, Jonathan M Davis wrote:
> > Okay, removing elements from a container sucks right now. You can do
> > stuff like removeAny (generally pretty useless IMHO) or removeFront just
> > fine, but removing an arbitrary range from a container just plain sucks.
> > 
> > remove takes a range type which is the range type for the container that
> > it's on. That makes sense. It's obviously not going to be able to a take
> > an arbitrary range and remove that from itself. How would it know which
> > elements in that range corresponded to which elements in itself -
> > especially when it could be a range which skips elements or something
> > similar? So, _that_ part makes sense.
> > 
> > But have you actually tried to get a range of the appropriate type to
> > remove from a container? It seems like almost ever function in std.range
> > and std.algorithm returns a new range type, making them completely
> > useless for processing a range to be removed from a container.
> > 
> > I was looking to remove a single element for a RedBlackTree. The best
> > function that I could think to get the proper range was findSplit. The
> > middle portion of the return value would be the portion that I was
> > looking for. I'd take that and pass it to RedBlackTree's remove. Wrong.
> > It uses takeExactly in its implementation and the first two portions of
> > the result of findSplit aren't the right range type.
> 
> It's good that positioned remove does not work with findSplit. It would
> be inefficient to use the wonderfully structured tree for linear search.
> The algorithm should be simple - find the key to delete in O(log n),
> then remove it.
> 
> > So, what do I do? The _only_ thing that I can think of at the moment is
> > to use find to locate the beginning of the range that I want, take the
> > length of the range with walkLength and then use popBackN to pop of the
> > elements I don't want.
> 
> [snip]
> 
> What we need to do is add to RedBlackTree a function that accepts the
> result of take() - the same way SList does. It has only recently dawned
> on me that certain generic ranges are pivotal to algorithms and
> containers, and so far I identified take() and takeExactly() to be two
> such important ranges.
> 
> One other thing that we can add to RedBlackTree is a simple removeKey()
> convenience method that does the find-and-remove thing. Any Phobos
> developer (including JMD of course :o)), please feel free to implement
> all of the above and submit it for pulling. And thanks Jonathan for
> using std.container - I am sure there are a few other wrinkles that will
> be revealed and then ironed out following consistent usage.

I've made several improvements to RedBlackTree and created a pull request for 
them: https://github.com/D-Programming-Language/phobos/pull/10

- Jonathan M Davis
February 23, 2011
Re: We need to rethink remove in std.container
On 2/22/11 8:41 AM, Steven Schveighoffer wrote:
> On Tue, 22 Feb 2011 09:23:00 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 2/22/11 7:58 AM, Steven Schveighoffer wrote:
>>> A cursor in dcollections is actually a zero or one element range. It can
>>> only reference exactly one element. Since it has no references to other
>>> elements, it is immune to operations that use the surrounding elements.
>>
>> At this exact point I had what would be either a good idea or a
>> brainfart.
>>
>> I've been thinking for a good while to define a "singleton range", a
>> range that has at most one element. I had the intuition that a
>> singleton range is a useful concept, but I felt it was a bit tenuous
>> to argue in its favor (mostly because one could easily simulate it
>> with repeat(x, 1)), so I never put it in std.range.
>>
>> The definition would go like this:
>>
>> auto singletonRange(T)(T element)
>> {
>> static struct Result
>> {
>> private T _element;
>> private bool _done;
>> @property bool empty() { return _done; }
>> @property auto front() { assert(!empty); return _element; }
>> void popFront() { assert(!empty); _done = true; }
>> auto save() { return this; }
>> }
>>
>> return Result(element, false);
>> }
>
> This is almost the same as a dcollections cursor. The only difference
> is, popFront in a cursor not only sets the '_done' member, but also
> moves to the next element. This distinction is necessary with node-based
> containers that have an end marker node. So the cursor that is returned
> from 'end()' (or from find where the element was not found, etc.) is an
> already-empty cursor.

I thought about this overnight and reached the same conclusion. The 
thing is, what's needed is not one element, but one element of a 
specific range. Here's what I have right now in range.d:

/**
Returns a range with at most one element; for example, $(D
takeOne([42, 43, 44])) returns a range consisting of the integer $(D
42). Calling $(D popFront()) off that range renders it empty.

Sometimes an empty range with the same signature is needed. For such
ranges use $(D takeNone!R()). For example:

----
auto s = takeOne([42, 43, 44]);
static assert(isRandomAccessRange!(typeof(s)));
assert(s.length == 1);
assert(!s.empty);
assert(s.front == 42);
s.front() = 43;
assert(s.front == 43);
assert(s.back == 43);
assert(s[0] == 43);
s.popFront();
assert(s.length == 0);
assert(s.empty);
s = takeNone!(int[])();
assert(s.length == 0);
assert(s.empty);
----

In effect $(D takeOne(r)) is equivalent to $(take(r, 1)) and $(D
takeNone(r)) is equivalent to $(D take(r, 0)), but in certain
interfaces it is important to know statically that the range may only
have at most one element.

The type returned by $(D takeOne) and $(D takeNone) is a random-access
range.
 */
auto takeOne(R)(R source) if (isInputRange!R)
{
    static struct Result
    {
        private R _source;
        bool _empty;
        @property bool empty() const { return _empty; }
        @property auto ref front() { assert(!empty); return 
_source.front; }
        void popFront() { assert(!empty); _empty = true; }
        void popBack() { assert(!empty); _empty = true; }
        auto save() { return Result(_source.save, empty); }
        @property auto ref back() { assert(!empty); return _source.front; }
        @property size_t length() const { return !empty; }
        auto ref opIndex(size_t n) { assert(n < length); return 
_source.front; }
        auto opSlice(size_t m, size_t n)
        {
            assert(m <= n && n < length);
            return n > m ? this : Result(_source, false);
        }
    }

    return Result(source, source.empty);
}

/// Ditto
auto takeNone(R)() if (isInputRange!R)
{
    return typeof(takeOne(R.init))(R.init, true);
}

unittest
{
    auto s = takeOne([42, 43, 44]);
    static assert(isRandomAccessRange!(typeof(s)));
    assert(s.length == 1);
    assert(!s.empty);
    assert(s.front == 42);
    s.front() = 43;
    assert(s.front == 43);
    assert(s.back == 43);
    assert(s[0] == 43);
    s.popFront();
    assert(s.length == 0);
    assert(s.empty);
    s = takeNone!(int[])();
    assert(s.length == 0);
    assert(s.empty);
}

> I hadn't thought of just setting the empty variable, but I think it
> wouldn't work properly. You could possibly say "a cursor that points to
> an element but is empty is actually pointing *after* the element." But
> then I'm not sure how to construct an end cursor. It's advantageous to
> point at the end of a container, where there is no true node to point
> at. Is there another way to think about this?

Yah, I think the right approach is to base the 0/1 range on an existing 
range, not on an existing value.

>> But now I realized that singleton ranges are your cursors, so I'll
>> definitely add them. And you also made me realize that a singleton
>> range is very different from other ranges in the same way 1 is
>> different from other numbers. Great.
>
> Yes, that distinction is really important, I'm glad you see that!
>
>> This may as well be The Great Unification that was needed to converge
>> std.container with dcollections in a harmonious whole.
>
> I'm not sure what to think of this ;) Is this a "oh what could have
> been" statement or a "let's reopen negotiations" statement?
>
> Note, I would love for dcollections to be mainstream Phobos, but I have
> accepted the current reality and am fine with the way things are now too.

Absolutely, and your collaboration on e.g. RedBlackTree and others is 
very much appreciated.

The way I was thinking is, there would a lot of convergence if 
dcollections wiped the notion of "cursor" and replaced it with takeOne 
throughout. One obvious objection to the range/cursor design is there 
are two abstractions to deal with instead of one. I had the intuition 
that ranges should be enough, but was not quite able to substantiate it 
fully until just about now. With take, takeExactly, and takeOne, it 
seems that indeed there is only need for the range interface, and that 
particular range types can replace cursors.

I realize that there are other major differences in design between 
dcollections and std.container, probably the largest one being the use 
of a hierarchy vs. free-floating types, so probably full unification is 
still not possible or quite remote. Nevertheless, keeping the 
communication channels open would make both libraries better and all of 
us a bit wiser.


Andrei
February 23, 2011
Re: We need to rethink remove in std.container
On Wed, 23 Feb 2011 08:33:03 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> On 2/22/11 8:41 AM, Steven Schveighoffer wrote:
>> On Tue, 22 Feb 2011 09:23:00 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:

>>> This may as well be The Great Unification that was needed to converge
>>> std.container with dcollections in a harmonious whole.
>>
>> I'm not sure what to think of this ;) Is this a "oh what could have
>> been" statement or a "let's reopen negotiations" statement?
>>
>> Note, I would love for dcollections to be mainstream Phobos, but I have
>> accepted the current reality and am fine with the way things are now  
>> too.
>
> Absolutely, and your collaboration on e.g. RedBlackTree and others is  
> very much appreciated.
>
> The way I was thinking is, there would a lot of convergence if  
> dcollections wiped the notion of "cursor" and replaced it with takeOne  
> throughout. One obvious objection to the range/cursor design is there  
> are two abstractions to deal with instead of one. I had the intuition  
> that ranges should be enough, but was not quite able to substantiate it  
> fully until just about now. With take, takeExactly, and takeOne, it  
> seems that indeed there is only need for the range interface, and that  
> particular range types can replace cursors.

I don't think this is possible.  When passing a range to a container for  
removal, the container needs to access the underlying implementation (e.g.  
pointer to tree node) in order to immediately know what element is being  
removed.  The takeOne range you propose does not allow this, you can't get  
at the underlying range.  Even if you could, only takeOne!(myrangetype)  
would work, which leaves us right back at supporting a small number of  
specific ranges.

In addition, a cursor points to one element, a range points to multiple  
elements.  *conceptually* takeOne is the same, but it still stores the  
range internally, so it's actually more expensive than a range to pass  
around (additional _empty member).

I was thinking more along the lines of ranges being able to generate  
cursors if it makes sense (i.e. base range is a range of the container).   
Dcollections already supports this with the two accessors begin() and  
end().  We might also need a last() which gets the last valid element in  
the range.  This would be akin to back().

Here is what I was thinking that can help with unifying interface.  I'll  
just show you as a code snippet:

void remove(C)(C c) if (is(C == mycursortype))
{
   // since we know the structure of c, we can use it to remove the  
element from the container
}

void remove(R)(R r) if (is(R == myrangetype))
{
  // possibly can be optimized
}

void remove(R)(R r) if (!is(R == myrangetype) && isInputRange!R &&  
is(typeof(r.begin) == mycursortype))
{
  while(!r.empty)
  {
    auto c = r.begin;
    r.popFront();
    remove(c);
  }
}

If wrapper ranges can "forward" the begin and end functions through their  
API, then you can construct arbitrary ranges that will be accepted as  
parameters to remove.  We could potentially add forwarding functions to  
all wrapper ranges, or we could pick ones that make the most sense (like  
take).

In addition, dcollections (and any collection that can verify its  
endpoints and cursor membership, SList obviously cannot) allows slicing  
based on cursors in a limited fashion.  Therefore, this allows  
constructing ranges from cursors if you have the original container.

As an example of what this could allow (assuming retro and take forward  
the begin(), end() and last() methods):

container.remove(take(retro(container[]), 5)); // remove the last 5  
elements from the container.

>
> I realize that there are other major differences in design between  
> dcollections and std.container, probably the largest one being the use  
> of a hierarchy vs. free-floating types, so probably full unification is  
> still not possible or quite remote. Nevertheless, keeping the  
> communication channels open would make both libraries better and all of  
> us a bit wiser.

As I've said in the past, the interface hierarchy is not critical at the  
moment because D has very little shared library support.  I'm willing to  
remove the interfaces as long as the types remain classes (which I believe  
you are more agreeable with lately).  This allows the interfaces to be  
added on later if it makes sense.

The cursor concept has to stay though.

-Steve
February 23, 2011
Re: We need to rethink remove in std.container
On Wed, 23 Feb 2011 01:30:49 -0500, Philippe Sigaud  
<philippe.sigaud@gmail.com> wrote:

> On Tue, Feb 22, 2011 at 23:20, Steven Schveighoffer  
> <schveiguy@yahoo.com> wrote:
>>
>> s/insert/put
>>
>> Now all containers that support insertion are output ranges...
>>
>> Or am I missing something?
>
> I mean that, in the same way that a container can expose its elements
> as an input range using [], a partially filled container could give
> access to its empty 'slots' (created by 'reserve()' ) via an output
> range.

Oh, that's easy.  Any container range that allows setting front elements  
is fair game.

For example SList's range can be an output range right now.  (I believe  
isOutputRange!(SList.init[]) should be true)

What's not available is adding new elements to a container via the output  
range concept.

-Steve
February 23, 2011
Re: We need to rethink remove in std.container
On Wed, 23 Feb 2011 07:01:12 -0500, Jonathan M Davis <jmdavisProg@gmx.com>  
wrote:

> I've made several improvements to RedBlackTree and created a pull  
> request for
> them: https://github.com/D-Programming-Language/phobos/pull/10

I'll be sure to look at this soon.  I really need to ramp up my git  
skills...

Anyone know where I can find a 30 hour day? ;)

-Steve
February 23, 2011
Re: We need to rethink remove in std.container
On 2/23/11 8:51 AM, Steven Schveighoffer wrote:
> On Wed, 23 Feb 2011 08:33:03 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 2/22/11 8:41 AM, Steven Schveighoffer wrote:
>>> On Tue, 22 Feb 2011 09:23:00 -0500, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>
>>>> This may as well be The Great Unification that was needed to converge
>>>> std.container with dcollections in a harmonious whole.
>>>
>>> I'm not sure what to think of this ;) Is this a "oh what could have
>>> been" statement or a "let's reopen negotiations" statement?
>>>
>>> Note, I would love for dcollections to be mainstream Phobos, but I have
>>> accepted the current reality and am fine with the way things are now
>>> too.
>>
>> Absolutely, and your collaboration on e.g. RedBlackTree and others is
>> very much appreciated.
>>
>> The way I was thinking is, there would a lot of convergence if
>> dcollections wiped the notion of "cursor" and replaced it with takeOne
>> throughout. One obvious objection to the range/cursor design is there
>> are two abstractions to deal with instead of one. I had the intuition
>> that ranges should be enough, but was not quite able to substantiate
>> it fully until just about now. With take, takeExactly, and takeOne, it
>> seems that indeed there is only need for the range interface, and that
>> particular range types can replace cursors.
>
> I don't think this is possible. When passing a range to a container for
> removal, the container needs to access the underlying implementation
> (e.g. pointer to tree node) in order to immediately know what element is
> being removed. The takeOne range you propose does not allow this, you
> can't get at the underlying range.

I agree that takeOne's source should be made public.

> Even if you could, only
> takeOne!(myrangetype) would work, which leaves us right back at
> supporting a small number of specific ranges.

That's arguably correct and in any case better than supporting a range 
plus a distinct cursor abstraction. You remove the cursor abstraction 
and replace it with a specific realization of the range abstraction.

> In addition, a cursor points to one element, a range points to multiple
> elements. *conceptually* takeOne is the same, but it still stores the
> range internally, so it's actually more expensive than a range to pass
> around (additional _empty member).

I agree. There are a number of ways around this if important, but I need 
first to be clear of the projected uses of takeOne.

> I was thinking more along the lines of ranges being able to generate
> cursors if it makes sense (i.e. base range is a range of the container).
> Dcollections already supports this with the two accessors begin() and
> end(). We might also need a last() which gets the last valid element in
> the range. This would be akin to back().
>
> Here is what I was thinking that can help with unifying interface. I'll
> just show you as a code snippet:
>
> void remove(C)(C c) if (is(C == mycursortype))
> {
> // since we know the structure of c, we can use it to remove the element
> from the container
> }
>
> void remove(R)(R r) if (is(R == myrangetype))
> {
> // possibly can be optimized
> }
>
> void remove(R)(R r) if (!is(R == myrangetype) && isInputRange!R &&
> is(typeof(r.begin) == mycursortype))
> {
> while(!r.empty)
> {
> auto c = r.begin;
> r.popFront();
> remove(c);
> }
> }
>
> If wrapper ranges can "forward" the begin and end functions through
> their API, then you can construct arbitrary ranges that will be accepted
> as parameters to remove. We could potentially add forwarding functions
> to all wrapper ranges, or we could pick ones that make the most sense
> (like take).

Soon as we get to begin() and end() as primitives it all starts being 
problematic. We may as well design with STL-style iterators then. One 
good thing about ranges is that they _don't_ need to rely on lower-level 
primitives.

> In addition, dcollections (and any collection that can verify its
> endpoints and cursor membership, SList obviously cannot) allows slicing
> based on cursors in a limited fashion. Therefore, this allows
> constructing ranges from cursors if you have the original container.
>
> As an example of what this could allow (assuming retro and take forward
> the begin(), end() and last() methods):
>
> container.remove(take(retro(container[]), 5)); // remove the last 5
> elements from the container.

I understand and I see the advantages. Still I believe that the 
disadvantages of dealing with cursors overcome this potential 
flexibility. I'd be more inclined to look for a design for which ranges 
suffice for iteration.


Andrei
February 23, 2011
Re: We need to rethink remove in std.container
On Wed, 23 Feb 2011 12:16:26 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> On 2/23/11 8:51 AM, Steven Schveighoffer wrote:
>>
>> I don't think this is possible. When passing a range to a container for
>> removal, the container needs to access the underlying implementation
>> (e.g. pointer to tree node) in order to immediately know what element is
>> being removed. The takeOne range you propose does not allow this, you
>> can't get at the underlying range.
>
> I agree that takeOne's source should be made public.

But what about takeOne(retro(myrange))?  If support for that was desired,  
one would need to do it explicitly.  As each new wrapper type is desired,  
one needs to add support for the specific configuration of wrappers.

>> Even if you could, only
>> takeOne!(myrangetype) would work, which leaves us right back at
>> supporting a small number of specific ranges.
>
> That's arguably correct and in any case better than supporting a range  
> plus a distinct cursor abstraction. You remove the cursor abstraction  
> and replace it with a specific realization of the range abstraction.

At the end of the day, a container needs a cursor (or something like it)  
in order to remove an element.  The cursor is the gateway between the safe  
range abstraction, and the unsafe internal representation (with nasty  
pointers).

In reality, supporting cursors and ranges vs. takeOne and ranges only  
differs in that you need to add a cursor type.  The number of functions  
required remains the same (the cursor handling functions replace the  
takeOne handling functions).

> Soon as we get to begin() and end() as primitives it all starts being  
> problematic. We may as well design with STL-style iterators then. One  
> good thing about ranges is that they _don't_ need to rely on lower-level  
> primitives.

Ranges for containers necessarily need to rely on lower level primitives  
as I stated above.  A cursor gives a safe abstraction of such a primitive.

Example, An SList range:


    struct Range
    {
        private Node * _head; // <------ this is the primitive
        private this(Node * p) { _head = p; }
        /// Forward range primitives.
        bool empty() const { return !_head; }
        /// ditto
        @property Range save() { return this; }
        /// ditto
        @property T front() { return _head._payload; }
        /// ditto
        @property void front(T value)
        {
            enforce(_head);
            _head._payload = value;
        }
        ...

Note that the range *hides* the unsafe primitive.  But so does a cursor.   
Essentially, a cursor is a way to refer to one of those primitives.   
Seeing as how the implementation needs to use one of those primitives,  
being able to pass around a safe version of that primitive is very  
advantageous.  Requiring passing around a range of them is problematic for  
many functions.

>> In addition, dcollections (and any collection that can verify its
>> endpoints and cursor membership, SList obviously cannot) allows slicing
>> based on cursors in a limited fashion. Therefore, this allows
>> constructing ranges from cursors if you have the original container.
>>
>> As an example of what this could allow (assuming retro and take forward
>> the begin(), end() and last() methods):
>>
>> container.remove(take(retro(container[]), 5)); // remove the last 5
>> elements from the container.
>
> I understand and I see the advantages. Still I believe that the  
> disadvantages of dealing with cursors overcome this potential  
> flexibility. I'd be more inclined to look for a design for which ranges  
> suffice for iteration.

Then I think we shall keep things the way they are.  Cursors are  
absolutely necessary in my model of collections, and to get rid of them  
would abandon the most fundamental reason I created dcollections.

-Steve
February 23, 2011
Re: We need to rethink remove in std.container
On Wednesday, February 23, 2011 06:56:45 Steven Schveighoffer wrote:
> On Wed, 23 Feb 2011 07:01:12 -0500, Jonathan M Davis <jmdavisProg@gmx.com>
> 
> wrote:
> > I've made several improvements to RedBlackTree and created a pull
> > request for
> > them: https://github.com/D-Programming-Language/phobos/pull/10
> 
> I'll be sure to look at this soon.  I really need to ramp up my git
> skills...
> 
> Anyone know where I can find a 30 hour day? ;)

Yeah. It's hiding under couch, but sneaky. So, if you go looking, odds are you 
won't find it, and then you'll have even _fewer_ hours in the day. :)

- Jonathan M Davis
Next ›   Last »
1 2 3 4
Top | Discussion index | About this forum | D home