May 22, 2010
On 05/19/2010 03:07 PM, Steven Schveighoffer wrote:
> Ellery Newcomer Wrote:
>
>> Are the collections supposed to not have isEmpty members?
>
> No.  Use length == 0.  O(1) length is always supported for all
> collections.

One thing before I forget: I think any good collection abstraction must
be concretized back to the classic collection instances. Singly-linked
lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists". The New Yorker: "D's abstractions are too abstract". The National Enquirer: "The Bat Boy stole D's singly-linked lists". Pyongyang Times: "Another failure of the so-called Western Democracy -- yet Juche doesn't need singly-linked lists anyway."

Keeping the length cached in a singly-linked list is a costly mistake.
It works against splicing (an important list primitive) and most of the
time you don't need it so why waste time updating it. Adding any cruft
beyond { T payload; List* next; } is very strong incentive for the coder
to write their own. Why do you think they're using an SLL in the first
place? Because it's simple and has efficient primitives!

>> OTish: What are your thoughts on a bimap implementation and a
>> child/sibling or general tree implementation as part of
>> dcollections?
>
> I haven't the slightest what a bimap is :)  I am not an expert in
> collections or data structures, I just reimplement things I have
> understood.  My implementations are basically copied from my
> algorithm book, and refined as much as I can do.
>
> That being said, if you have any implementation of a tree or hash, it
> should be easy to insert into dcollections.  If you have ideas for
> other collection types (i.e. other than Map, Set, Multiset or List),
> then I can look into that if you point me at an implementation or
> have one of your own.  I purposefully left out multi-map because I've
> never had a huge use for it, and it seemed like a awkward thing to
> create an interface for...

Tries of various kinds come up again and again. I don't think dcollections' current abstractions capture them, which further supports my point that containers have too much personality to be caught in the straitjacket of class hierarchies.


Andrei
May 22, 2010
Andrei Alexandrescu:

> Keeping the length cached in a singly-linked list is a costly mistake. It works against splicing (an important list primitive) and most of the time you don't need it so why waste time updating it.

... LinkedList(T, bool WithLength=True) {
  static if(WithLength) {
    // defines  @attribute length here etc
  }
}
Inside the methods like the add there is another static if(WithLength) that enables/disables the code that updates the length.

This allows to keep the length by default, but makes it easy to remove it when desired.

Bye,
bearophile
May 23, 2010
Andrei Alexandrescu Wrote:
> 
> One thing before I forget: I think any good collection abstraction must be concretized back to the classic collection instances. Singly-linked lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists".

We could always say we were following C++'s lead :-)
May 23, 2010
On 05/22/2010 09:07 PM, Sean Kelly wrote:
> Andrei Alexandrescu Wrote:
>>
>> One thing before I forget: I think any good collection abstraction must
>> be concretized back to the classic collection instances. Singly-linked
>> lists definitely can't be left off the list! It would be an epic
>> failure. Imagine the papers! New York Times: "D has containers, but no
>> singly-linked lists".
>
> We could always say we were following C++'s lead :-)

C++(0|1)x has forward_list. Needless to say, supporting size() at constant complexity was a matter of huge debate. The proposal that I like is this:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm

Search the page for "size()".

I don't know how the proposal was amended.


Andrei
May 23, 2010
On 2010-05-22 16:01:37 -0400, Walter Bright <newshound1@digitalmars.com> said:

> Michel Fortin wrote:
>> What's the point of having extra indirection here?
> 
> Good question. I think the answer is:
> 
> 1. When do you ever want to copy a collection? I almost never do, because copying one is an inherently expensive operation.

Whenever you need to preserve the previous state of something before applying some transformation on it. But I agree that the copy should be explicit because it is O(n), hence my suggestion of disabling implicit copying for containers.

Since we're at it, a reference types container sometime makes it too easy to just create a new reference to the same container when what you really want is to make a copy. I happen to have a bug of this sort to fix in my Objective-C program right now where a reference to a container leaked where it should have been a copy, causing unwanted mutations to the .


> 2. When you copy a collection, do you copy the container or the elements in the container or both? One would have to carefully read the documentation to see which it is. That increases substantially the "cognitive load" of using them.

I don't see the extra cognitive load. Assuming you disable implicit copying of the container, you'll have to use ".dup", which will work exactly as an array. The way items are copied are exactly the same as if the container was a reference type (you call a "dup" or equivalent function and things get copied).


> 3. Going to lengths to make them value types, but then disabling copying them because you want people to use them as reference types, seems like a failure of design somewhere.

I agree in a way. But at the same time, forcing everyone to use a reference type when sometime a value-type would be more adequate also looks like a failure of design to me. To me, the best tradeoff seems to use a value-type because it's quite trivial to create a reference-type from a value type when you need it; the reverse is awkward.


> 4. That silly extra level of indirection actually isn't there. Consider that even value locals are accessed via indirection: offset[ESP]. For a reference collection we have: offset[EBX]. No difference (yes, EBX has to be loaded, but if it is done more than once it gets cached in a register by the compiler).

Have you ever worked with containers of containers? Surely yes since D associative arrays are one of them. So assume we want to implement our associative arrays like this:

	class HashTable(Key, Value) {
		Array!(Tuple!(Hash!Key, TreeSet!(Tuple!(Key, Value)))) buckets;
	}

Do you find it reasonable that the TreeSet be a reference type?

Reference-type containers would mean one indirection and one extra allocated block for each bucket. Then add that 'Value' could itself be a struct or class containing its own container, and you're stuck with a third unnecessary level of indirection and extra calls to the GC allocate containers and/or check for null. Sound quite wasteful to me. In addition, those extra allocations add more logic to our hash table and thus more chances for bugs.

Here I'm using a hash table as an example, the same problem applies to many other data structures, whether they are generic or specific to a particular problem. Container should be efficient and easy to use when composed one inside another. That's the greatest strengths of C++ value-type containers in my opinion.


> 5. Just making them all reference types zeans the documentation and use become a lot simpler.

Simpler to explain, maybe. Simpler to use, I have my doubts. You're just moving the burden to somewhere else. A reference-type container requires a "new Container()" somewhere, and some protection logic against null. In exchange, you don't need to write 'ref' in functions taking containers, and can easily copy references to the container everywhere (sometime too easily). But the reference-type benefits aren't entirely lost with a value-type, because it's trivial to change a value-type as a reference-type.


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

May 24, 2010
Fantastic work Steve,
Pretty good design IMHO, not sure why .. but somehow the collection battle reminds me to

Steve Vai vs Ry Cooder
<vbg>
Bjoern
May 24, 2010
On Sat, 22 May 2010 16:58:12 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/19/2010 03:07 PM, Steven Schveighoffer wrote:
>> Ellery Newcomer Wrote:
>>
>>> Are the collections supposed to not have isEmpty members?
>>
>> No.  Use length == 0.  O(1) length is always supported for all
>> collections.
>
> One thing before I forget: I think any good collection abstraction must
> be concretized back to the classic collection instances. Singly-linked
> lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists". The New Yorker: "D's abstractions are too abstract". The National Enquirer: "The Bat Boy stole D's singly-linked lists". Pyongyang Times: "Another failure of the so-called Western Democracy -- yet Juche doesn't need singly-linked lists anyway."
>
> Keeping the length cached in a singly-linked list is a costly mistake.
> It works against splicing (an important list primitive) and most of the
> time you don't need it so why waste time updating it. Adding any cruft
> beyond { T payload; List* next; } is very strong incentive for the coder
> to write their own. Why do you think they're using an SLL in the first
> place? Because it's simple and has efficient primitives!

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, but all dcollections define length.  In that case, you can do coll.begin.empty to determine if the collection has any elements.

But all dcollections are bi-directional anyways, there is no singly linked list.  That was a decision I made early on, because it allows more assumptions about dcollections' containers.  I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple.

And singly linked vs. doubly linked does not make any difference whether O(1) length is possible or not.  As you say, it's O(1) splicing or O(1) length, regardless of single or double links.

I disagree with your assessment that length is a less used operation than splicing.  I think I have never used splicing with std::list.  C++'s default is O(1) length, and I followed that design.

>
>>> OTish: What are your thoughts on a bimap implementation and a
>>> child/sibling or general tree implementation as part of
>>> dcollections?
>>
>> I haven't the slightest what a bimap is :)  I am not an expert in
>> collections or data structures, I just reimplement things I have
>> understood.  My implementations are basically copied from my
>> algorithm book, and refined as much as I can do.
>>
>> That being said, if you have any implementation of a tree or hash, it
>> should be easy to insert into dcollections.  If you have ideas for
>> other collection types (i.e. other than Map, Set, Multiset or List),
>> then I can look into that if you point me at an implementation or
>> have one of your own.  I purposefully left out multi-map because I've
>> never had a huge use for it, and it seemed like a awkward thing to
>> create an interface for...
>
> Tries of various kinds come up again and again. I don't think dcollections' current abstractions capture them, which further supports my point that containers have too much personality to be caught in the straitjacket of class hierarchies.

I am not familiar with tries, and dcollections has no class hierarchy.

-Steve
May 24, 2010
On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques <sandford@jhu.edu> wrote:

> On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>> Robert Jacques Wrote:
>>
>>> On Wed, 19 May 2010 21:42:35 -0400, Steven Schveighoffer
>>> >
>>> > Does that make sense?
>>> >
>>> > -Steve
>>>
>>> Yes and No. I understand where your coming from, but I think it's a bad
>>> idea. First, I think it needlessly expands the radius of comprehension
>>> needed to understand and use the library. (See Tangled up in tools
>>> http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools) Second, I
>>> think designing a library to be flexible enough to meet some future,
>>> anticipated need (e.g. dlls) is a good idea, but actually implementing
>>> vaporous future needs is fraught with peril; it's too easy to guess wrong.
>>> Third, interface base design is viral; If library X uses interfaces then I
>>> have to use interfaces to interface with it. And if another library Y uses
>>> classes, then I'm going have to write a (needless) wrapper around one of
>>> them.
>>
>> I understand these points, but I'm already using interfaces to copy data between containers.  I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers.  The problem with using generic code is that the compiler will needlessly duplicate functions that are identical.
>
> This sounds like a failure of design. Why aren't you using ranges to do this?

Why are ranges necessarily better?  I'm using the container's opApply, which I'd probably still use even if there were no interfaces.  opApply allows much more possibilities in traversal than ranges which cannot use stack recursion without heap activity.

>
>> Using interfaces is not as viral as you think.  My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces.  If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing?
>
> Well, needlessly duplicated functions for one. :) More importantly, the example I gave was about third party libraries which I have no control over. So this solution explicitly doesn't work. And even if everyone used templates everywhere in order to be compatible with both interfaces and classes, isn't that a viral effect of having both?

If a 3rd party library uses interfaces, it's probably for good reason.  They most likely want to remain binary compatible with other libs, and/or want to abstract the implementation of some custom container type.  If you don't like their requirements, don't use the library.

-Steve
May 24, 2010
On Fri, 21 May 2010 14:00:42 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/20/2010 05:34 AM, Steven Schveighoffer wrote:

>> I understand these points, but I'm already using interfaces to copy
>> data between containers.  I don't have to, I could have used generic
>> code, but this way, only one function is instantiated to copy data
>> from all the other containers.  The problem with using generic code
>> is that the compiler will needlessly duplicate functions that are
>> identical.
>
> There is a copy() function that copies any range to any other range. It
> might need a revisit, but I think the way to go about copying is generic.

This implies the space for the elements already exists in the target range.  Copying data from a list to a set for instance can't just allocate space in the set and then use some generic copy function to copy the data in.

If generic copying is to be used, it would be a copy function defined by the collection, not a standard one.

>
>> Using interfaces is not as viral as you think.  My interfaces can be
>> used in generic code, as long as the generic code uses functions in
>> the interfaces.  If a library returns an interface, the author is
>> saying "I don't want you using any functions outside this interface,"
>> so why is that a bad thing?
>>
>> Forcing people to *not* use interfaces has its drawbacks too.
>> Dcollections gives the most flexible design I could muster, while
>> still being useful.
>>
>> I'm not saying I'm against removing the interfaces until some later
>> date, but I don't see any convincing arguments yet, especially since
>> I've already seen benefits from having them.
>
> What are the benefits that you have noticed? I think you'd need to back up the copying argument with some data if you want to frame it as a benefit.

When I mean see benefits, the benefits are not data-related but design related -- more designs are possible with interfaces and generic programming combined than with generic programming alone.

-Steve
May 24, 2010
On Sat, 22 May 2010 07:56:31 -0400, Michel Fortin <michel.fortin@michelf.com> wrote:

> On 2010-05-21 22:55:16 -0400, Walter Bright <newshound1@digitalmars.com> said:
>
>> Walter Bright wrote:
>>> If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased.
>>  Mike Taylor has a phrase for that I think is well-coined: "impedance matching",
>> defined as the work necessary to get one library module to work with another
>> library module.
>
> This makes me think about something.
>
> In principle, I like the idea of containers being reference type. It works well when passing a container to functions. But at the same time, I despite it. By-reference containers forces you to have extra indirections even when you don't need them, and you have to worry about null. Sometime a value-type would be better, when creating more complex data structures for instance:
>
> 	class Channel {
> 		private {
> 			Array!Message inbound;
> 			Array!Message outbound;
> 		}
>
> 		...
> 	}
>
> What's the point of having extra indirection here?

Scope class members should solve this.  It's been thrown around forever -- essentially, you should be able to define that a class member's storage is contained within the owner object.

a value-type node-based containers just don't work well -- it's too easy to escape references, and too easy to accidentally duplicate the entire node set when passing to functions.  It works fine for arrays because the array has a very simple structure -- data and length.  And the length and data are somewhat orthogonal.

-Steve