output ranges: by ref or by value? (page 4) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » output ranges: by ref or by value? (page 4)

January 03, 2010

Re: output ranges: by ref or by value?

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Steven Schveighoffer wrote:
>> My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).
>
> Why gratuitously limit the design? You're asking to replace this:
>
> R save() { return this; }
>
> with:
>
> enum thisIsAForwardRange = true;
>
> Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.

Well, one thing you could do is:

enum thisIsAnInputRange = true;

and then no special implementation is needed for normal forward ranges.  the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.

>
>>> Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider:
>>>
>>> abstract class Container(E) { // most general container
>>>      @property bool empty();
>>>      bool add(E element);
>>>      E removeAny();
>>>      InputRange!E opSlice();
>>> }
>>>
>>> That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.)
>>  The range interface is compile-time only, there is no need to define it in the container interface.  opSlice does not need to be part of the interface, even if it is defined on all the container types.
>>  opApply makes for a much better interface-friendly iteration anyways.
>
> I want container users to be able to save iteration state and also to open up containers to std.algorithm and other goodies, so I'm shying away from opApply.

Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces.  opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.

>
>> BTW, the primitives in dcollections are:
>>  clear(); // clear all elements
>> remove(V v); // remove an element
>
> Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?

Search and positioned removal are also a primitives, but not defined on the interface.  remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers.

I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.

>
>> contains(V v); // returns whether an element is contained in the colleciton
>
> I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.

Again, it's part of the minimal usable interface.  It's not a generic algorithm, because some containers can implement this more efficiently.  Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.

>
>> length(); // the length of the collection
>
> That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.

All current dcollection containers have O(1) length.

>
>> dup(); // duplicate the collection
>> opApply(int delegate(ref V v) dg); // iterate the collection
>> opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection
>>  That means it covers only empty in your list of must-have functions (via length() == 0).
>
> How do you implement length() for a singly-linked list? Is empty() going to take O(n)?

first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable.

second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list).  Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.

>
>> Add is not a primitive because the Map collections shouldn't assign some random key to the new element.  removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.
>
> add is a primitive that takes Tuple!(K, V) as the element type.

How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is an interface method.

what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)), but then trying to use the container functions are very cumbersome.  In dcollections, Map(K, V) inherits Collection(V).

>
>> Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range).  This is because those primitives return a struct that is different for every container type.
>
> So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.

No, you can easily write container-independent iteration as long as you have the implementation.

If you use interfaces you can write opApply wrappers to do different things.  I'm not sure what you mean by composition.

>
>> It also surpasses opSlice via opApply, since all an input range can do anyways is iterate.  In fact, opApply is more powerful because you can change elements and remove elements (via purging).  Plus it's more efficient than a range-via-interface.
>
> An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.

How is it more flexible?  You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives.  If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition?  casting?

>
>>>> I see a range as being useful for iteration or algorithms, but not for general use.  A great example is AAs.  Would you say that an AA *is* a range or should *provide* a range?  If it is a range, does that mean you remove elements as you call popFront?  Does that make any sense?  If it doesn't, then what happens if you add elements through another alias to that AA?
>>>
>>> An AA provides several ranges - among which byKey and byValue.
>>  I misunderstood your statement "[a container hierarchy] does need range interfaces."  I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.
>
> Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.

Not having opSlice on the interface guarantees you never have a virtual call for iteration :)  opApply mitigates the virtual call on the interface.

> Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.

If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.  If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface.  Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class.

What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.  *AND* I want to make each loop in the search algorithm call 3 virtual functions!"  How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?  In fact, my implementation is guaranteed to be faster than std.algorithm because of the lack of virtual calls.

-Steve

January 03, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>> My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).
>>
>> Why gratuitously limit the design? You're asking to replace this:
>>
>> R save() { return this; }
>>
>> with:
>>
>> enum thisIsAForwardRange = true;
>>
>> Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.
> 
> Well, one thing you could do is:
> 
> enum thisIsAnInputRange = true;
> 
> and then no special implementation is needed for normal forward ranges.  the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.

isForwardRange will be defined to yield true if and only if the range defines save. But I see you point - user code only asserts isForwardRange and then does not bother to use save(), just copies stuff around in confidence that copying does the right thing.

Thanks for this insight. I don't see how to reconcile that with class ranges and covariance.

> Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces.  opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.

The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.

>>> BTW, the primitives in dcollections are:
>>>  clear(); // clear all elements
>>> remove(V v); // remove an element
>>
>> Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?
> 
> Search and positioned removal are also a primitives, but not defined on the interface.  remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers.
> 
> I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.

Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.

>>> contains(V v); // returns whether an element is contained in the colleciton
>>
>> I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.
> 
> Again, it's part of the minimal usable interface.  It's not a generic algorithm, because some containers can implement this more efficiently.

But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.

> Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.

Why?

>>> length(); // the length of the collection
>>
>> That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.
> 
> All current dcollection containers have O(1) length.

Some containers can't define O(1) length conveniently.

>>> dup(); // duplicate the collection
>>> opApply(int delegate(ref V v) dg); // iterate the collection
>>> opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection
>>>  That means it covers only empty in your list of must-have functions (via length() == 0).
>>
>> How do you implement length() for a singly-linked list? Is empty() going to take O(n)?
> 
> first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable.
> 
> second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list).  Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.

Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.

>>> Add is not a primitive because the Map collections shouldn't assign some random key to the new element.  removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.
>>
>> add is a primitive that takes Tuple!(K, V) as the element type.
> 
> How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is an interface method.

Map!(K, V) has Tuple!(K, V) as its element type.

> what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)), but then trying to use the container functions are very cumbersome.  In dcollections, Map(K, V) inherits Collection(V).
> 
>>
>>> Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range).  This is because those primitives return a struct that is different for every container type.
>>
>> So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.
> 
> No, you can easily write container-independent iteration as long as you have the implementation.

In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.

> If you use interfaces you can write opApply wrappers to do different things.  I'm not sure what you mean by composition.

For example, compose ranges a la retro or take.

>>> It also surpasses opSlice via opApply, since all an input range can do anyways is iterate.  In fact, opApply is more powerful because you can change elements and remove elements (via purging).  Plus it's more efficient than a range-via-interface.
>>
>> An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.
> 
> How is it more flexible?  You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives.  If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition?  casting?

You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).

>>>>> I see a range as being useful for iteration or algorithms, but not for general use.  A great example is AAs.  Would you say that an AA *is* a range or should *provide* a range?  If it is a range, does that mean you remove elements as you call popFront?  Does that make any sense?  If it doesn't, then what happens if you add elements through another alias to that AA?
>>>>
>>>> An AA provides several ranges - among which byKey and byValue.
>>>  I misunderstood your statement "[a container hierarchy] does need range interfaces."  I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.
>>
>> Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.
> 
> Not having opSlice on the interface guarantees you never have a virtual call for iteration :)  opApply mitigates the virtual call on the interface.

And takes away the ability to compose ranges and to use algorithms with the container.

>> Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.
> 
> If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.

Why?

I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.

> If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface.

How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.

> Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class.

I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.

If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.

> What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.

This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.

> *AND* I want to make each loop in the search algorithm call 3 virtual functions!"

Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.

> How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?

It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.


Andrei

January 03, 2010

Re: output ranges: by ref or by value?

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Steven Schveighoffer wrote:
>
>> Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces.  opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.
>
> The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.

The answer is, you don't.  Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.

>>>> BTW, the primitives in dcollections are:
>>>>  clear(); // clear all elements
>>>> remove(V v); // remove an element
>>>
>>> Search and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?
>>  Search and positioned removal are also a primitives, but not defined on the interface.  remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers.
>>  I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.
>
> Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.

the function removes one value if it is in the container, zero if it is not.

Another primitive interface Multi(V) adds a removeAll(V v) function.

Again, it combines two functions that are awkward or difficult to implement using ranges via interfaces.

>>>> contains(V v); // returns whether an element is contained in the colleciton
>>>
>>> I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.
>>  Again, it's part of the minimal usable interface.  It's not a generic algorithm, because some containers can implement this more efficiently.
>
> But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.

Not really, I say right in the docs that contains(V) can take O(n) time.  There's no abstraction.  I don't say that the algorithmic complexity is defined by the implementation.  It's no different than std.algorithm's find.

>
>> Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.
>
> Why?

3 virtual calls per iteration, no possibility to inline, and reference copy semantics.  The latter is the biggest problem for me.  I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.

>
>>>> length(); // the length of the collection
>>>
>>> That's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.
>>  All current dcollection containers have O(1) length.
>
> Some containers can't define O(1) length conveniently.

length is actually defined to return ~0 if it cannot compute the length quickly :)  Forgot to mention that detail.  So such containers have an "out" so to say.

In fact, length is part of a different primitive I defined: Iterator(V)

Iterator(V) defines length and opApply and can be implemented by any class, not just containers.  My original goal when I planned dcollections for tango was for things like LineInput file readers and such to implement the Iterator interface.  That way you could do cool stuff like:

container.add(new LineInput("file.txt")); // add all the lines from the file

but it didn't come to fruition, so Iterator is only implemented on dcollections...

>>>> dup(); // duplicate the collection
>>>> opApply(int delegate(ref V v) dg); // iterate the collection
>>>> opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection
>>>>  That means it covers only empty in your list of must-have functions (via length() == 0).
>>>
>>> How do you implement length() for a singly-linked list? Is empty() going to take O(n)?
>>  first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable.
>>  second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list).  Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.
>
> Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.

That is possible with my design.  I realize I didn't fully disclose the nature of length before.  But of course, there aren't any containers in dcollections that *don't* define length.

However, you make it sound like there are so many containers that would want to not store the length.  There is only one -- linked lists, and only with special requirements (O(1) splicing)

I guess this all goes back to how empty is to be composed.  In dcollections, all containers implement O(1) length, so that is not an issue.  If you had a specialized linked list implementation, I'm not sure how you could compose it with my primitives.  But I don't consider it to be a worthwhile deficiency to fix.  most of the time, you care if something is empty because you are about to take an element, or iterate it.  Just call the thing you want to do, and there are ways to check if it didn't return anything.  In any case, empty could be defined as a primitive if necessary (in all dcollections it would be written return length == 0).

>>>> Add is not a primitive because the Map collections shouldn't assign some random key to the new element.  removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.
>>>
>>> add is a primitive that takes Tuple!(K, V) as the element type.
>>  How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is an interface method.
>
> Map!(K, V) has Tuple!(K, V) as its element type.

That makes things awkward.  For example, to remove, you must remove the K-V pair.  typically you only want to specify the K or the V.  I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.

>>>> Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range).  This is because those primitives return a struct that is different for every container type.
>>>
>>> So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.
>>  No, you can easily write container-independent iteration as long as you have the implementation.
>
> In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.
>
>> If you use interfaces you can write opApply wrappers to do different things.  I'm not sure what you mean by composition.
>
> For example, compose ranges a la retro or take.

how do you compose retro on an input range?  that's all you got with your container interface.  It's the same for opApply.

take is simple:

class TakeIterator(V) : Iterator(V)
{
   private Iterator!V src;
   private uint nelems;
   public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems;}
   public int opApply(int delegate(ref V v) dg)
   {
      uint elems = 0;
      int result = 0;
      foreach(ref v; src)
      {
          if(elems++ > nelems || (result = dg(v)) != 0)
		break;
      }
      return result;
   }
   public uint length()
   {
      uint len = src.length();
      return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems;
   }
}

>
>>>> It also surpasses opSlice via opApply, since all an input range can do anyways is iterate.  In fact, opApply is more powerful because you can change elements and remove elements (via purging).  Plus it's more efficient than a range-via-interface.
>>>
>>> An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.
>>  How is it more flexible?  You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives.  If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition?  casting?
>
> You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).

That is available in dcollections, just not part of the interface.  Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.

>>  Not having opSlice on the interface guarantees you never have a virtual call for iteration :)  opApply mitigates the virtual call on the interface.
>
> And takes away the ability to compose ranges and to use algorithms with the container.

As long as you only have the interface and not the actual container.  I have no problem with that.

>
>>> Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.
>>  If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.
>
> Why?

If I have to pull out std.algorithm to do simple things like remove a specific element from a container, where std.algorithm may not do the most efficient thing, then why the hell have an interface in the first place?  If I have an interface, I want to make it do all the things all containers can do, not delegate it to some other part of the library.  I trust that the interface knows how to do container functions in the best way possible.

>
> I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.

If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap.  I think absolutely containers should support std.algorithm via the method std.algorithm is usable by any other range -- compile-time interfaces.

>> If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface.
>
> How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.

It's a bastardized hack to use interfaces with std.algorithm.  I think it's as useful as functional qsort.  Yeah the big-O is the same, but the implementation sucks.

>> Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class.
>
> I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.

You're starting to get too smart for me here :)  I have no idea what disrderata means.

> If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.

The container hierarchy can support two *different* paradigms.  First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object.  Such as a socket cache that maps ip addresses to sockets.  This cache cares nothing about sorting or doing retro iteration on the socket cache.  It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.).

The other paradigm is to have access to std.algorithm in the most efficient way possible.  Such access requires full implementation knowledge to make the most efficient implementation of algorithms.  In fact, containers can *use* std.algorithm underneath their member functions if so desired.

>> What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.
>
> This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.

But STL isn't accessed without the full implementation, you are comparing apples to oranges here.

You said that find is something that should be relegated to std.algorithm.  Isn't std.algorithm's find a linear search?  If I have a Container!E, don't I need to use std.algorithm to search for an element with it's returned range?  How is this not forcing a linear search when using an interface to a container?  What is the answer to that, don't search for an element in a container unless you have the full implementation?  That works in dcollections too!

>> *AND* I want to make each loop in the search algorithm call 3 virtual functions!"
>
> Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.

Is that not the case we are discussing?

>
>> How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?
>
> It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.

That is available via the full implementation.  With the interface, it's the "best you can do", because that's all that is available.  "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element."

-Steve

January 03, 2010

Re: output ranges: by ref or by value?

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>
>>> Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces.  opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.
>>
>> The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.
> 
> The answer is, you don't.  Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.

I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case.

Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider:

interface InputIter(T) { ... }
interface ForwardIter(T) : InputIter!T { ... }

class Container(T) {
    InputIter!T opSlice() { ... }
}

class Array(T) : Container(T) {
    class Iterator : ForwardIter!T {
	... all final functions ...
    }
    Iterator opSlice();
}

Now there are two use cases:

a) The user uses Array specifically.

auto a = new Array!int;
sort(a[]);

In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost.

b) The user wants generality and OOP-style so uses Container throughout:

Container!int a = new Array!int;
auto found = find(a[], 5);

This time find gets an InputIter!int, so it will use virtual calls for iteration.

The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.

>> Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.
> 
> the function removes one value if it is in the container, zero if it is not.

So you need a way to do searching and a way to do positioned remove. Why combine them into one, when you can use both separately to great effect?

> Another primitive interface Multi(V) adds a removeAll(V v) function.

Why do you need a separate interface for removeAll? Are there containers that can remove one value but not all?

> Again, it combines two functions that are awkward or difficult to implement using ranges via interfaces.

I contend that point.

>>>>> contains(V v); // returns whether an element is contained in the colleciton
>>>>
>>>> I don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.
>>>  Again, it's part of the minimal usable interface.  It's not a generic algorithm, because some containers can implement this more efficiently.
>>
>> But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.
> 
> Not really, I say right in the docs that contains(V) can take O(n) time.  There's no abstraction.  I don't say that the algorithmic complexity is defined by the implementation.  It's no different than std.algorithm's find.
> 
>>
>>> Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.
>>
>> Why?
> 
> 3 virtual calls per iteration, no possibility to inline, and reference copy semantics.  The latter is the biggest problem for me.  I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.

Per my argument above, there may be zero virtual calls per iteration. (I want to make sure my point about covariance was understood even if it is not considered compelling.)

> However, you make it sound like there are so many containers that would want to not store the length.  There is only one -- linked lists, and only with special requirements (O(1) splicing)

I agree it's tenuous to leave length() out. It's a judgment call.

>> Map!(K, V) has Tuple!(K, V) as its element type.
> 
> That makes things awkward.  For example, to remove, you must remove the K-V pair.  typically you only want to specify the K or the V.  I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.

It is. What is true is a converse of your statement - a general container isn't very useful as a map - which I'd agree with, and is sensible OO design. As long as a map is seen as a container, it _does_ store K-V pairs. If I know it's a map, great, it defines a different primitive for removing an element given its key.

>>> If you use interfaces you can write opApply wrappers to do different things.  I'm not sure what you mean by composition.
>>
>> For example, compose ranges a la retro or take.
> 
> how do you compose retro on an input range?  that's all you got with your container interface.  It's the same for opApply.

You need to be a bit down in the hierarchy to use retro.

> take is simple:
> 
> class TakeIterator(V) : Iterator(V)
> {
>    private Iterator!V src;
>    private uint nelems;
>    public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems;}
>    public int opApply(int delegate(ref V v) dg)
>    {
>       uint elems = 0;
>       int result = 0;
>       foreach(ref v; src)
>       {
>           if(elems++ > nelems || (result = dg(v)) != 0)
>         break;
>       }
>       return result;
>    }
>    public uint length()
>    {
>       uint len = src.length();
>       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems;
>    }
> }

And then how do I compose take with something else, or pass it to std.algorithm.find()? This whole thing doesn't hold water.

>> You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).
> 
> That is available in dcollections, just not part of the interface.  Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.

Well that's a problem with the design, no? Why then define a hierarchy? A simple set of unrelated types may be a better design.

> If I have to pull out std.algorithm to do simple things like remove a specific element from a container, where std.algorithm may not do the most efficient thing, then why the hell have an interface in the first place?

Things can be arranged such that std.algorithm does do the best thing, and is also the most general and reusable approach. The STL has partly shown that. I plan to take that one step further - to show that container-independent code can be gainfully written with a hierarchy of containers (something that STL failed at).

> If I have an interface, I want to make it do all the things all containers can do, not delegate it to some other part of the library.  I trust that the interface knows how to do container functions in the best way possible.

The interface must be expressive enough to let generic algorithms do their job.

>> I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.
> 
> If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap.

Did my covariance argument above convince you otherwise?

> I think absolutely containers should support std.algorithm via the method std.algorithm is usable by any other range -- compile-time interfaces.

I agree :o).

>> I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.
> 
> You're starting to get too smart for me here :)  I have no idea what disrderata means.

Come on, the Internet is there. I meant "essential goal".

>> If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.
> 
> The container hierarchy can support two *different* paradigms.  First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object.  Such as a socket cache that maps ip addresses to sockets.  This cache cares nothing about sorting or doing retro iteration on the socket cache.  It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.).
> 
> The other paradigm is to have access to std.algorithm in the most efficient way possible.  Such access requires full implementation knowledge to make the most efficient implementation of algorithms.  In fact, containers can *use* std.algorithm underneath their member functions if so desired.

Yah. Covariance takes care of all of the above.

I'd almost agree with you as of a couple of months ago, when the whole covariance thing hadn't occurred to me.

>>> What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.
>>
>> This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.
> 
> But STL isn't accessed without the full implementation, you are comparing apples to oranges here.
> 
> You said that find is something that should be relegated to std.algorithm.

YES. If a design must express define linear search in more than one place, then that design is suboptimal. This is an important goal I want to follow.

> Isn't std.algorithm's find a linear search?

Yes.

> If I have a Container!E, don't I need to use std.algorithm to search for an element with it's returned range?

No. It's like in the STL - either the container defines its own searching primitives (e.g. set or map), or you can always grab the container's range and use the well-known linear search defined by std.algorithm.

> How is this not forcing a linear search when using an interface to a container?

It isn't forcing a linear search. It is fostering awareness of the capabilities needed by the compiler.

A design that defines a best-effort find() as a member is suboptimal. There's no guarantee it can provide. A putative user cannot tell anything about the complexity of their code that uses the find() member. A good design has the user say, hey, linear search is ok here; in this other place, I need a fast find so I can't use Container!T, I need AssociativeContainer!T.

> What is the answer to that, don't search for an element in a container unless you have the full implementation?  That works in dcollections too!

The shape and characteristic of the search is defined by the type of the container.

A great library that got mentioned here in the past:

http://www.gobosoft.com/eiffel/gobo/structure/container.html

I strongly recommend studying that design; it is very close to optimality. They define DS_SEARCHABLE as a subclass of DS_CONTAINER - as they should.

>>> *AND* I want to make each loop in the search algorithm call 3 virtual functions!"
>>
>> Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.
> 
> Is that not the case we are discussing?

No, there are two cases as I discussed above in this post.

>>> How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?
>>
>> It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.
> 
> That is available via the full implementation.  With the interface, it's the "best you can do", because that's all that is available.  "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element."

I think one issue might be that you think "interface" and "implementation" with nothing in between. I'm talking "gradually more specialized interface", as the Gobo library above defines.

Andrei

January 04, 2010

Re: output ranges: by ref or by value?

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

We are arguing in circles, so I will just stop :)

I'll address the one point I think we both feel is most important below

On Sun, 03 Jan 2010 17:19:52 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Steven Schveighoffer wrote:
>> On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> Steven Schveighoffer wrote:
>>>
>>>> Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces.  opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.
>>>
>>> The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.
>>  The answer is, you don't.  Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.
>
> I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case.
>
> Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider:
>
> interface InputIter(T) { ... }
> interface ForwardIter(T) : InputIter!T { ... }
>
> class Container(T) {
>      InputIter!T opSlice() { ... }
> }
>
> class Array(T) : Container(T) {
>      class Iterator : ForwardIter!T {
> 	... all final functions ...
>      }
>      Iterator opSlice();
> }
>
> Now there are two use cases:
>
> a) The user uses Array specifically.
>
> auto a = new Array!int;
> sort(a[]);
>
> In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost.
>
> b) The user wants generality and OOP-style so uses Container throughout:
>
> Container!int a = new Array!int;
> auto found = find(a[], 5);
>
> This time find gets an InputIter!int, so it will use virtual calls for iteration.
>
> The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.

I see why it is attractive to you.  But to me, algorithms are not the main driver for containers.  One thing we both agree on -- when you know the full implementation, algorithms from std.algorithm should be implemented as fast as possible.  Where we disagree is what is desired when the full implementation is not known.  I contend that the ability to link with std.algorithm isn't a requirement in that case, and is not worth complicating the whole world of ranges to do so (i.e. build std.algorithm just in case you have reference-type ranges with a "save" requirement).  If you don't know the implementation, don't depend on std.algorithm to have all the answers, depend on the implementation which is abstracted correctly by the interface.

What this means is there will be some overlap in functions that are defined on the interfaces and functions defined in std.algorithm.  Most likely the overlapping functions both point to the same implementation (naturally, this should live in std.algorithm).  This is for convenience to people who want to use containers in the interface form, so they do not need to concern themselves with the abilities of std.algorithm, they just want containers that help them get work done.

There is still one flaw in your ideas for covariant types:  even though final functions can be used throughout and the possibility for inlining exists, you *still* need to use the heap to make copies of ranges.  That was and still is my biggest concern.

I think the rest of this whole post is based on our disagreement of these design choices, and it really doesn't seem productive to continue all the subtle points.

[rest of growing disagreement snipped]

BTW, I use covariant return types freely in dcollections and I agree it kicks ass.  It seems like black magic especially when you are returning possibly a class or interface which need to be handled differently.

-Steve

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation