May 24, 2010
On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:
>> On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:
>>>> length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
>>>> possible,
>>>
>>> Do you agree that's an awkwardness?
>>
>> Yes, at that point it's an optimization. The only place where I assume
>> length could be invalid is when working with the Iterator!V type, which
>> might not be a dcollections object.
>
> That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have interfaces, the door is ajar forever. Client code can't write this:
>
> auto total = cont1.length + cont2.length;
>
> because that is incorrect code (that compiles, runs, and produces some odd result).
>
> So don't take it lightly. NO_LENGTH_SUPPORT means no length support. Then pretending it's supported will only harm everyone.

I get what you are saying.  What I meant was it's moot if you are not using interfaces.  If you don't know what the underlying type is, then you have to do a runtime check.

I agree it's awkward, and I could have not included length as a member of Iterator, in which case it would be on all the container types and NO_LENGTH_SUPPORT would not exist.  All containers are meant to have a valid length, it is only with non-container Iterators that length can be NO_LENGTH_SUPPORT.

>> However, supporting non-length containers via
>> generic programming vs. interfaces would be much smoother.
>>
>>>> In that case, you can do
>>>> coll.begin.empty to determine if the collection has any elements.
>>>
>>> Then why not make length optional and define empty? From the above it
>>> looks like an obviously better design.
>>>
>>>> But all dcollections are bi-directional anyways, there is no singly
>>>> linked list. That was a decision I made early on, because it allows more
>>>> assumptions about dcollections' containers. I think length-less singly
>>>> linked lists would be a good addition to phobos that are not part of the
>>>> collection hierarchy, they are almost on par with builtin arrays as
>>>> being so simple.
>>>
>>> Do you see dcollections' inability to express slists as a problem?
>>
>> I don't see them as unable to express slists. They would break the
>> design guidelines that I set for collections being iterable in both
>> directions, but an slist fits fine within the List interface.
>
> What's the required complexity of concat?

O(n) with n being the number of elements added

> Is add expected to put things in a specific place? How does slist implement back()?
>
> slist does not fit the List interface.

A pointer to the end element would be required for both appending and back().

-Steve
May 24, 2010
On Mon, 24 May 2010 11:51:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 10:39 AM, Steven Schveighoffer wrote:
>> On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:
>>>> I am not familiar with tries,
>>>
>>> Missed that upon the first read. I suggest you look at tries and the
>>> following other structures as good examples that it's futile to fit
>>> collections into hierarchies.
>>>
>>> http://en.wikipedia.org/wiki/Trie
>>> http://linux.thai.net/~thep/datrie/datrie.html
>>> http://en.wikipedia.org/wiki/Suffix_tree
>>> http://en.wikipedia.org/wiki/Kd-tree
>>>
>>> We'd want to implement in time those and many more in Phobos without
>>> worrying that some of their primitives won't fit the existing
>>> interfaces, and also without the unjustified effort of formalizing
>>> interfaces for each of them in thinking that another very, very
>>> similar container will come along.
>>>
>>
>>  From a cursory look, I don't see why tries would not be possible in
>> dcollections.
>>
>> I'd probably start with something like this:
>>
>> class TrieMap(K, V) : Map!(K[], V)
>>
>> The array brackets enforce the ability to use prefixes on the keys.
>
> The key point of tries is that they have distributed storage. Thus, shoehorning them into the interface function
>
> Iterator!(K[]) keys();
>
> forces allocation and copying.

I wasn't thinking of that, I was thinking the key point of tries being more efficient at lookup for strings of things.

One solution is that Trie could be a separate interface (i.e. a sibling of Map).

One thing to point out is that D's arrays are adept at appending and prefixing.  A K[] key would not necessarily be reallocating for each node traversal, but it certainly would be copying data.

How would you define a way to get all the keys of a Trie?  Or would you simply leave that function off?  Is it unreasonable to expect to be able to iterate the keys in a trie?  (I don't really know, I've never worked with them)

-Steve
May 24, 2010
On 05/24/2010 11:14 AM, Steven Schveighoffer wrote:
> On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:
>>> On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>>> On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:
>>>>> length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
>>>>> possible,
>>>>
>>>> Do you agree that's an awkwardness?
>>>
>>> Yes, at that point it's an optimization. The only place where I assume
>>> length could be invalid is when working with the Iterator!V type, which
>>> might not be a dcollections object.
>>
>> That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have
>> interfaces, the door is ajar forever. Client code can't write this:
>>
>> auto total = cont1.length + cont2.length;
>>
>> because that is incorrect code (that compiles, runs, and produces some
>> odd result).
>>
>> So don't take it lightly. NO_LENGTH_SUPPORT means no length support.
>> Then pretending it's supported will only harm everyone.
>
> I get what you are saying. What I meant was it's moot if you are not
> using interfaces. If you don't know what the underlying type is, then
> you have to do a runtime check.
>
> I agree it's awkward, and I could have not included length as a member
> of Iterator, in which case it would be on all the container types and
> NO_LENGTH_SUPPORT would not exist. All containers are meant to have a
> valid length, it is only with non-container Iterators that length can be
> NO_LENGTH_SUPPORT.
>
>>> However, supporting non-length containers via
>>> generic programming vs. interfaces would be much smoother.
>>>
>>>>> In that case, you can do
>>>>> coll.begin.empty to determine if the collection has any elements.
>>>>
>>>> Then why not make length optional and define empty? From the above it
>>>> looks like an obviously better design.
>>>>
>>>>> But all dcollections are bi-directional anyways, there is no singly
>>>>> linked list. That was a decision I made early on, because it allows
>>>>> more
>>>>> assumptions about dcollections' containers. I think length-less singly
>>>>> linked lists would be a good addition to phobos that are not part
>>>>> of the
>>>>> collection hierarchy, they are almost on par with builtin arrays as
>>>>> being so simple.
>>>>
>>>> Do you see dcollections' inability to express slists as a problem?
>>>
>>> I don't see them as unable to express slists. They would break the
>>> design guidelines that I set for collections being iterable in both
>>> directions, but an slist fits fine within the List interface.
>>
>> What's the required complexity of concat?
>
> O(n) with n being the number of elements added
>
>> Is add expected to put things in a specific place? How does slist
>> implement back()?
>>
>> slist does not fit the List interface.
>
> A pointer to the end element would be required for both appending and
> back().

This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead.

I think we need to build the shared vision that in Phobos we're competing against hand-written, highly-optimized code. This is the fight STL took and largely won. We can't afford to lower our standards for one tiny bit.

I was once talking over a beer about the advantages of container abstractions. Walter slapped my face by defining the fastest SLL in the world on a napkin. It looked like this:

struct List { int v; List * next; }

He took so little time to define that, and the primitives he needed were so simple and fast, that it was an absolute overkill to replace that with a generic whiz-bang container. If I'd mentioned there'd be any extra data involved, that would mean the end of negotiation. "Why do you need that?" "For length()!" "But I don't need length()!" "Well you have to." "That's Communism in design!"

And I agree.


Andrei
May 24, 2010
On Mon, 24 May 2010 11:45:51 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 10:01 AM, Steven Schveighoffer wrote:
>> On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 05/19/2010 08:42 PM, Steven Schveighoffer wrote:
>>>> Andrei Alexandrescu Wrote:
>>>>
>>>>
>>>> I.e. there aren't many kinds of HashMaps that derive from each
>>>> other. But the interfaces are not detrimental to your ideas. The
>>>> only thing interfaces require is that the entities implementing
>>>> them are classes and not structs. As long as you agree that
>>>> classes are the right call, then interfaces can co-exist with
>>>> your other suggestions without interference.
>>>
>>> This brings back a discussion I had with Walter a while ago, with
>>> echoes in the newsgroup. Basically the conclusion was as follows:
>>> if a container never escapes the addresses of its elements, it can
>>> manage its own storage. That forces, however, the container to be a
>>> struct because copying references to a class container would break
>>> that encapsulation. I called those "perfectly encapsulated
>>> containers" and I think they are good candidates for manual memory
>>> management because they tend to deal in relatively large chunks.
>>>
>>> I noticed that your collections return things by value, so they are
>>>  good candidates for perfect encapsulation.
>>
>> Then you need reference counting, I don't think that is a good
>> design.
>
> Why?
>
> I think refcounting for containers is perfect.
>
> Refcounting small objects is bad because the counter overhead is large.
> Also, small objects tend to be created and passed around a lot, so the
> time overhead is significant.
>
> In contrast, refcounting and containers are a perfect marriage. The
> container is a relatively large conglomerate of objects so refcounting
> that is bound to yield good benefits in terms of memory reclamation.

I meant refcounting elements.  If you are simply refcounting the container, what do you do when someone wants to remove a node?  Not allow it if more than one reference to the container exists?

>>
>> Java has some warts as you have rightfully pointed out in the past
>> (i.e. O(n) lookup), but I have attempted to remove all those warts.
>> Dcollections I would hope does not suffer from the problems that
>> Java's collections suffer from.
>
> That's great. But let me quote what you said: "I myself don't really use
> the interface aspect of the classes, it is mostly a carryover from the
> Java/Tango inspirations." I took that to mean you don't have a strong
> justification for structuring dcollections as a hierarchy, and
> furthermore that makes me hope it's possible you'd be willing to yank
> that dinosaur brain.

I did not have a strong justification for using interfaces originally, and my first interface design shows that.  But I think with careful design, the interfaces in the D2 version are pretty useful in some situations, all of which could be essential to a project's design.

In other words, I don't see the fact that Java or Tango's original collection package had interfaces as a "wart".

The problem I see right now is, a lot of that utility is moot since D is not very good at dynamic linking.  Since you necessarily have to statically link Phobos and dcollections, it makes no sense to keep binary compatibility, or hide implementation.

> Looks like we're heading straight to stalemate once again.
>
> In the interest of time, it would be better to get to stalemate (or, hopefully, agreement) so we know whether dcollections will be integrated within Phobos or whether I should spend this and next weeks' evenings coding. To that end, please let me know whether it's worth that I spend time on putting together a list of proposed changes.

I think if we can keep dcollections as classes, then the opportunity to have interfaces still exists for the future.  If that's the case, we can ditch the interfaces for now, and revisit it when D's dynamic lib support gets better.

So let's move on to other issues.  I haven't changed my mind on interface utility, but there seems to be not much support for the idea.

-Steve
May 24, 2010
On Mon, 24 May 2010 12:06:18 -0400, Robert Jacques <sandford@jhu.edu> wrote:

> On Mon, 24 May 2010 08:06:29 -0400, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>> On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques <sandford@jhu.edu> wrote:
>>
>>> On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> [snip]
>>>>
>>>> I understand these points, but I'm already using interfaces to copy data between containers.  I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers.  The problem with using generic code is that the compiler will needlessly duplicate functions that are identical.
>>>
>>> This sounds like a failure of design. Why aren't you using ranges to do this?
>>
>> Why are ranges necessarily better?  I'm using the container's opApply, which I'd probably still use even if there were no interfaces.  opApply allows much more possibilities in traversal than ranges which cannot use stack recursion without heap activity.
>
> Ranges are not necessarily better and may have some minor amount of overhead over a well optimized opApply. Then again, the opposite may be true. The point is that if the difference between opApply and a range is more than trivial, you've probably had a failure of design occur.

The difference is trivial.  I support both ranges and opApply.  The main benefit from using opApply being only one function is compiled by the compiler.

>>>
>>>> Using interfaces is not as viral as you think.  My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces.  If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing?
>>>
>>> Well, needlessly duplicated functions for one. :) More importantly, the example I gave was about third party libraries which I have no control over. So this solution explicitly doesn't work. And even if everyone used templates everywhere in order to be compatible with both interfaces and classes, isn't that a viral effect of having both?
>>
>> If a 3rd party library uses interfaces, it's probably for good reason.  They most likely want to remain binary compatible with other libs, and/or want to abstract the implementation of some custom container type.  If you don't like their requirements, don't use the library.
>>
>> -Steve
>
> No, if a 3rd party library _needs_ to use interfaces it's probably for a good reason. The problem is, if they exist, people are going to use them even if they don't need them. Which therein lies the problem.

And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then?  I don't think there's any way you can define a generic standard library such that people won't create bad designs, that's a lost cause.

I think part of the problem with all this is that interfaces aren't likely to be needed any time soon (D's dynamic lib support is almost non-existent), so I'm probably going to drop the interfaces for now in the interest of progress.

-Steve
May 24, 2010
Steven Schveighoffer:
> Scope class members should solve this.  It's been thrown around forever -- essentially, you should be able to define that a class member's storage is contained within the owner object.

I too have proposed this idea, but in my opinion you can't design a big part of a standard lib on the base of a syntax/language optimization that doesn't exists yet.

Bye,
bearophile
May 24, 2010
On Mon, 24 May 2010 13:04:31 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Steven Schveighoffer:
>> Scope class members should solve this.  It's been thrown around forever --
>> essentially, you should be able to define that a class member's storage is
>> contained within the owner object.
>
> I too have proposed this idea, but in my opinion you can't design a big part of a standard lib on the base of a syntax/language optimization that doesn't exists yet.

Yeah, but essentially, it's an optimization.  Whether the storage is stored in the same heap block or a different one really doesn't matter in terms of functionality.  Allocating one heap block vs. two is faster, that's what the OP was saying.

In other words, scope class members aren't essential to using dcollections.

-Steve
May 24, 2010
Steven Schveighoffer:
> Is it unreasonable to expect to be able to iterate the keys in a trie?  (I don't really know, I've never worked with them)

On tries you can iterate on all keys or on a sorted range of the keys.

Bye,
bearophile
May 24, 2010
Steven Schveighoffer <schveiguy@yahoo.com> wrote:

> And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then?

You're at the point where the language allows you to create a class
following an interface, which whose all methods redirect to those of
the struct. This could even be done automagically.

-- 
Simen
May 24, 2010
On Mon, 24 May 2010 13:11:27 -0400, Simen kjaeraas <simen.kjaras@gmail.com> wrote:

> Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
>> And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then?
>
> You're at the point where the language allows you to create a class
> following an interface, which whose all methods redirect to those of
> the struct. This could even be done automagically.

A situation Robert has stated he would like to avoid.

-Steve