May 24, 2010
On 05/24/2010 12:11 PM, bearophile wrote:
> Steven Schveighoffer:
>> Is it unreasonable to expect to be able
>> to iterate the keys in a trie?  (I don't really know, I've never worked
>> with them)
>
> On tries you can iterate on all keys or on a sorted range of the keys.

There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration.

At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth comes to mind).


Andrei
May 24, 2010
On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 12:11 PM, bearophile wrote:
>> Steven Schveighoffer:
>>> Is it unreasonable to expect to be able
>>> to iterate the keys in a trie?  (I don't really know, I've never worked
>>> with them)
>>
>> On tries you can iterate on all keys or on a sorted range of the keys.
>
> There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration.
>
> At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth comes to mind).

That could be supported.

But iterating over the keys, is that not advisable?  Would your trie iterator be the only way to iterate over the elements?  It seems rather non-useful.

Or is there another way to iterating the keys that would be more efficient than building an array to contain the 'key' for each node?

-Steve
May 24, 2010
On Thu, 20 May 2010 02:11:58 -0400, Bernard Helyer <b.helyer@gmail.com> wrote:

> On 20/05/10 13:39, Bernard Helyer wrote:
>> Oooohhh goody goody goody!  n_n
>>
>>
>> I'm in the process of making a little toy project ATM. I'll shall
>> integrate dcollections 2.0a into ASAP.
>
> ArrayList doesn't compile with warnings as it overrides opEquals, but doesn't mark it as such.

http://www.dsource.org/projects/dcollections/ticket/5

Any other problems, please create a ticket (including your Thing one, but I'm not sure what you were doing there :)

-Steve
May 24, 2010
On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 11:14 AM, Steven Schveighoffer wrote:
>> On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:
>>>> On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
>>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>>
>>>>> On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:
>>>>>> length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
>>>>>> possible,
>>>>>
>>>>> Do you agree that's an awkwardness?
>>>>
>>>> Yes, at that point it's an optimization. The only place where I assume
>>>> length could be invalid is when working with the Iterator!V type, which
>>>> might not be a dcollections object.
>>>
>>> That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have
>>> interfaces, the door is ajar forever. Client code can't write this:
>>>
>>> auto total = cont1.length + cont2.length;
>>>
>>> because that is incorrect code (that compiles, runs, and produces some
>>> odd result).
>>>
>>> So don't take it lightly. NO_LENGTH_SUPPORT means no length support.
>>> Then pretending it's supported will only harm everyone.
>>
>> I get what you are saying. What I meant was it's moot if you are not
>> using interfaces. If you don't know what the underlying type is, then
>> you have to do a runtime check.
>>
>> I agree it's awkward, and I could have not included length as a member
>> of Iterator, in which case it would be on all the container types and
>> NO_LENGTH_SUPPORT would not exist. All containers are meant to have a
>> valid length, it is only with non-container Iterators that length can be
>> NO_LENGTH_SUPPORT.
>>
>>>> However, supporting non-length containers via
>>>> generic programming vs. interfaces would be much smoother.
>>>>
>>>>>> In that case, you can do
>>>>>> coll.begin.empty to determine if the collection has any elements.
>>>>>
>>>>> Then why not make length optional and define empty? From the above it
>>>>> looks like an obviously better design.
>>>>>
>>>>>> But all dcollections are bi-directional anyways, there is no singly
>>>>>> linked list. That was a decision I made early on, because it allows
>>>>>> more
>>>>>> assumptions about dcollections' containers. I think length-less singly
>>>>>> linked lists would be a good addition to phobos that are not part
>>>>>> of the
>>>>>> collection hierarchy, they are almost on par with builtin arrays as
>>>>>> being so simple.
>>>>>
>>>>> Do you see dcollections' inability to express slists as a problem?
>>>>
>>>> I don't see them as unable to express slists. They would break the
>>>> design guidelines that I set for collections being iterable in both
>>>> directions, but an slist fits fine within the List interface.
>>>
>>> What's the required complexity of concat?
>>
>> O(n) with n being the number of elements added
>>
>>> Is add expected to put things in a specific place? How does slist
>>> implement back()?
>>>
>>> slist does not fit the List interface.
>>
>> A pointer to the end element would be required for both appending and
>> back().
>
> This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead.

You make it sound like incrementing a counter and changing a pointer are incredibly slow operations :)  The overhead of storage is minimal compared to the elements of the list, which do not contain said overhead.

> I think we need to build the shared vision that in Phobos we're competing against hand-written, highly-optimized code. This is the fight STL took and largely won. We can't afford to lower our standards for one tiny bit.
>
> I was once talking over a beer about the advantages of container abstractions. Walter slapped my face by defining the fastest SLL in the world on a napkin. It looked like this:
>
> struct List { int v; List * next; }
>
> He took so little time to define that, and the primitives he needed were so simple and fast, that it was an absolute overkill to replace that with a generic whiz-bang container. If I'd mentioned there'd be any extra data involved, that would mean the end of negotiation. "Why do you need that?" "For length()!" "But I don't need length()!" "Well you have to." "That's Communism in design!"
>
> And I agree.

But something like the above is prone to all sorts of nasty things, like inadvertent cycles in your list.  Which may be acceptable to you if you want the bleeding fastest speed available.  There will always be tailored code crafted to be as fast as possible for some specific design and dcollections and your slist just won't fit the bill.

And dcollections' link list node is exactly what you wrote there (with a prev pointer of course).  The only difference is, I don't define an element is the same as a list.  The whole list has it's own data, including a pointer to the first element in the list.

Look at D's arrays.  Is appending with D's arrays the fastest it possibly could be?  Hell no, but it's good enough for most situations, and safe.

-Steve
May 24, 2010
On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote:
> struct List { int v; List * next; }
>

While I do agree with that design for a list, that is no reference type.

I thought we wanted reference types.
May 24, 2010
Steven Schveighoffer:
> Look at D's arrays.  Is appending with D's arrays the fastest it possibly could be?  Hell no, but it's good enough for most situations, and safe.

Append in D dynamic arrays is awful, it's slow and requires complex code (and currently there are not explanations about how it works in the D docs, see http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good example to be copied :-) It's not fast enough for most situations, that's why there is the need of an Appender still.

Bye,
bearophile
May 24, 2010
And the appending is hard to use too, see the ridiculously complex to use & messy capacity, reserve and  and assumeSafeAppend. So overall it's a good example of what I will never want to copy.

Bye,
bearophile
May 24, 2010
On Mon, 24 May 2010 15:32:01 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> And the appending is hard to use too, see the ridiculously complex to use & messy capacity, reserve and  and assumeSafeAppend. So overall it's a good example of what I will never want to copy.

That part is still pretty new.  What do you find is messy and complex about it?

When I said "good enough for most situations" I didn't mean "most situations where appending speed is crucial to the success of the application" :P

Most situations means you need to do appending once in a while.  A perfect example is something like toStringz.  And most situations do not require the three above mentioned functions.

-Steve
May 24, 2010
On Mon, 24 May 2010 15:29:23 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Steven Schveighoffer:
>> Look at D's arrays.  Is appending with D's arrays the fastest it possibly
>> could be?  Hell no, but it's good enough for most situations, and safe.
>
> Append in D dynamic arrays is awful, it's slow and requires complex code.

complex code?

a ~= b;

Seems pretty not complex.

-Steve
May 24, 2010
On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:
> On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 05/24/2010 12:11 PM, bearophile wrote:
>>> Steven Schveighoffer:
>>>> Is it unreasonable to expect to be able
>>>> to iterate the keys in a trie? (I don't really know, I've never worked
>>>> with them)
>>>
>>> On tries you can iterate on all keys or on a sorted range of the keys.
>>
>> There's one more way of iteration that's unique to tries - by key
>> element (e.g. if key is string, you iterate by character). That would
>> be an exploratory iteration.
>>
>> At each step in the iteration you pass a character, so popFront()
>> takes an argument, it's popFront(char). trie.popFront('a') takes you
>> to all elements in the trie that start with 'a'. Then you can continue
>> exploring by e.g. trie.popFront('x') which takes you to all elements
>> that start with "ax". At any point the range may become dry (nothing
>> with this prefix). If the range is not empty, front() gives you
>> information about all strings that share the particular prefix you
>> have spanned (count and breadth comes to mind).
>
> That could be supported.

To paraphrase a commercial, "there's an interface for that" :o).

> But iterating over the keys, is that not advisable? Would your trie
> iterator be the only way to iterate over the elements? It seems rather
> non-useful.

I think a trie, like many other nontrivial containers, supports several means of iteration.

> Or is there another way to iterating the keys that would be more
> efficient than building an array to contain the 'key' for each node?

I don't know. I'm just glad I can explore that without having to think in interfaces.


Andrei