May 24, 2010
Steven Schveighoffer:

> complex code?
> a ~= b;
> Seems pretty not complex.

I meant on the implementation side. Arrays are used all the time, and they are basic blocks to be used everywhere so a programmer probably must understand how they work inside.


> That part is still pretty new.  What do you find is messy and complex about it?

The internal semantics of the current design of dynamic arrays is not transparent enough, this means I often don't know what it is doing. This can be acceptable for a more complex data structure as the associative arrays.
And it's not easy to know exactly where to use assumeSafeAppend.
Despite their complexity, D dynamic arrays are not safe enough, see the small thread about passing D dynamic arrays by reference on default.


> When I said "good enough for most situations" I didn't mean "most situations where appending speed is crucial to the success of the application" :P

I have written few hundreds of small D programs, and I have seen that appending often happens (more often that I have originally thought) in points where performance is important enough. Using the Appender of my dlibs was usually able to solve the problem.

I don't know better solutions, so maybe I'm just wasting your time, I am sorry. Before seeing D I have never thought that designing good dynamic arrays can be so hard :-)

Bye,
bearophile
May 24, 2010
On 05/24/2010 01:49 PM, Steven Schveighoffer wrote:
> On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>>> A pointer to the end element would be required for both appending and
>>> back().
>>
>> This further erodes my confidence. Size needs to be maintained _and_ a
>> pointer to the last element must be maintained. For many uses, all of
>> this stuff is unnecessary overhead.
>
> You make it sound like incrementing a counter and changing a pointer are
> incredibly slow operations :)

They are. Ask Walter for a good example.

> The overhead of storage is minimal
> compared to the elements of the list, which do not contain said overhead.

I'm talking containers of lists here.

>> I think we need to build the shared vision that in Phobos we're
>> competing against hand-written, highly-optimized code. This is the
>> fight STL took and largely won. We can't afford to lower our standards
>> for one tiny bit.
>>
>> I was once talking over a beer about the advantages of container
>> abstractions. Walter slapped my face by defining the fastest SLL in
>> the world on a napkin. It looked like this:
>>
>> struct List { int v; List * next; }
>>
>> He took so little time to define that, and the primitives he needed
>> were so simple and fast, that it was an absolute overkill to replace
>> that with a generic whiz-bang container. If I'd mentioned there'd be
>> any extra data involved, that would mean the end of negotiation. "Why
>> do you need that?" "For length()!" "But I don't need length()!" "Well
>> you have to." "That's Communism in design!"
>>
>> And I agree.
>
> But something like the above is prone to all sorts of nasty things, like
> inadvertent cycles in your list.

Yeah, and I'd have my list of arguments to add too.

> Which may be acceptable to you if you
> want the bleeding fastest speed available. There will always be tailored
> code crafted to be as fast as possible for some specific design and
> dcollections and your slist just won't fit the bill.

Except that the set is considerably smaller for a well-designed slist.

> And dcollections' link list node is exactly what you wrote there (with a
> prev pointer of course). The only difference is, I don't define an
> element is the same as a list. The whole list has it's own data,
> including a pointer to the first element in the list.
>
> Look at D's arrays. Is appending with D's arrays the fastest it possibly
> could be? Hell no, but it's good enough for most situations, and safe.

If I could make them faster without adversely affecting the performance of other operations, I would. There's considerable constraints at plain in the array design for historical and other reasons.


Andrei
May 24, 2010
On 05/24/2010 02:01 PM, Pelle wrote:
> On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote:
>> struct List { int v; List * next; }
>>
>
> While I do agree with that design for a list, that is no reference type.

List* is.

My point was that the pressure for a really simple hand-rolled SLL is huge. A good abstraction should convince the Walters of this world that the abstraction is actually better than the hand-rolled SLL.

Andrei
May 24, 2010
On 05/24/2010 02:29 PM, bearophile wrote:
> Steven Schveighoffer:
>> Look at D's arrays.  Is appending with D's arrays the fastest it possibly
>> could be?  Hell no, but it's good enough for most situations, and safe.
>
> Append in D dynamic arrays is awful, it's slow and requires complex code (and currently there are not explanations about how it works in the D docs, see http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good example to be copied :-) It's not fast enough for most situations, that's why there is the need of an Appender still.

When was the last time you measured? I thought the speed has largely improved since Steve integrated his work.

Andrei
May 24, 2010
On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:
>> On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 05/24/2010 12:11 PM, bearophile wrote:
>>>> Steven Schveighoffer:
>>>>> Is it unreasonable to expect to be able
>>>>> to iterate the keys in a trie? (I don't really know, I've never worked
>>>>> with them)
>>>>
>>>> On tries you can iterate on all keys or on a sorted range of the keys.
>>>
>>> There's one more way of iteration that's unique to tries - by key
>>> element (e.g. if key is string, you iterate by character). That would
>>> be an exploratory iteration.
>>>
>>> At each step in the iteration you pass a character, so popFront()
>>> takes an argument, it's popFront(char). trie.popFront('a') takes you
>>> to all elements in the trie that start with 'a'. Then you can continue
>>> exploring by e.g. trie.popFront('x') which takes you to all elements
>>> that start with "ax". At any point the range may become dry (nothing
>>> with this prefix). If the range is not empty, front() gives you
>>> information about all strings that share the particular prefix you
>>> have spanned (count and breadth comes to mind).
>>
>> That could be supported.
>
> To paraphrase a commercial, "there's an interface for that" :o).

No there isn't.  I wouldn't make that an interface, as it's not foreachable.  I'd build a special range for it.

>> But iterating over the keys, is that not advisable? Would your trie
>> iterator be the only way to iterate over the elements? It seems rather
>> non-useful.
>
> I think a trie, like many other nontrivial containers, supports several means of iteration.

Including keys?!!!

Still not a straight answer.  If you should be able to iterate keys, then say you should be able to iterate keys.  If you shouldn't, then say that.

>> Or is there another way to iterating the keys that would be more
>> efficient than building an array to contain the 'key' for each node?
>
> I don't know. I'm just glad I can explore that without having to think in interfaces.

I don't really know what this means.  The interfaces are a way to plug containers in at runtime.  They are not meant to be the entire API for the collection.  You can absolutely add whatever functionality that does not fit in the interface as an additional feature, and users will have to use the exact type in order to use that feature.

-Steve
May 24, 2010
On 05/24/2010 03:18 PM, Steven Schveighoffer wrote:
> On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:
>>> On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>>> On 05/24/2010 12:11 PM, bearophile wrote:
>>>>> Steven Schveighoffer:
>>>>>> Is it unreasonable to expect to be able
>>>>>> to iterate the keys in a trie? (I don't really know, I've never
>>>>>> worked
>>>>>> with them)
>>>>>
>>>>> On tries you can iterate on all keys or on a sorted range of the keys.
>>>>
>>>> There's one more way of iteration that's unique to tries - by key
>>>> element (e.g. if key is string, you iterate by character). That would
>>>> be an exploratory iteration.
>>>>
>>>> At each step in the iteration you pass a character, so popFront()
>>>> takes an argument, it's popFront(char). trie.popFront('a') takes you
>>>> to all elements in the trie that start with 'a'. Then you can continue
>>>> exploring by e.g. trie.popFront('x') which takes you to all elements
>>>> that start with "ax". At any point the range may become dry (nothing
>>>> with this prefix). If the range is not empty, front() gives you
>>>> information about all strings that share the particular prefix you
>>>> have spanned (count and breadth comes to mind).
>>>
>>> That could be supported.
>>
>> To paraphrase a commercial, "there's an interface for that" :o).
>
> No there isn't. I wouldn't make that an interface, as it's not
> foreachable. I'd build a special range for it.
>
>>> But iterating over the keys, is that not advisable? Would your trie
>>> iterator be the only way to iterate over the elements? It seems rather
>>> non-useful.
>>
>> I think a trie, like many other nontrivial containers, supports
>> several means of iteration.
>
> Including keys?!!!
>
> Still not a straight answer. If you should be able to iterate keys, then
> say you should be able to iterate keys. If you shouldn't, then say that.

Sorry. Yes, by-key iteration should be possible.

Andrei
May 24, 2010
Andrei Alexandrescu:
> When was the last time you measured? I thought the speed has largely improved since Steve integrated his work.

I have timed it after that integration. I have seen a performance improvement, but it's small. I can perform some syntactic benchmarks.

Later,
bearophile
May 24, 2010
On Mon, 24 May 2010 16:35:03 -0400, bearophile <bearophileHUGS@lycos.com> wrote:

> Andrei Alexandrescu:
>> When was the last time you measured? I thought the speed has largely
>> improved since Steve integrated his work.
>
> I have timed it after that integration. I have seen a performance improvement, but it's small. I can perform some syntactic benchmarks.

Performance was vastly improved for situations where one was appending to multiple arrays at once.  For appending to a single array in a loop, it is improved, but not that much.

But the main improvement was for safety.  The append operation on D's dynamic arrays is a tradeoff between footprint, speed, and safety.  It also does not and should not compromise performance on operations besides appending.

You will always be able to outperform the D append operation with more focus on the append operation, but D's array appending is fine for most situations.

-Steve
May 24, 2010
On Mon, 24 May 2010 16:07:22 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 05/24/2010 01:49 PM, Steven Schveighoffer wrote:
>> On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>> A pointer to the end element would be required for both appending and
>>>> back().
>>>
>>> This further erodes my confidence. Size needs to be maintained _and_ a
>>> pointer to the last element must be maintained. For many uses, all of
>>> this stuff is unnecessary overhead.
>>
>> You make it sound like incrementing a counter and changing a pointer are
>> incredibly slow operations :)
>
> They are. Ask Walter for a good example.

Actually, I realize that pointing to the end node isn't good enough, because once you remove that node, there is no way to get to the previous node without traversing the list again.

So slist cannot implement the List interface, you were right.

>> The overhead of storage is minimal
>> compared to the elements of the list, which do not contain said overhead.
>
> I'm talking containers of lists here.

I was more talking about the ratio of elements to lists.  For example, the Hash implementation uses the same link list nodes as elements in its table, and generally a hash has very few elements per list, so overhead on the list head would be too much.  On top of that, a LinkList has its own allocator, meaning each one consumes a large chunk of data assuming you will be adding lots of elements to it.

So maybe we need to refine the implementation building blocks that back the dcollections classes so they are useable in special situations where performance is critical.  It actually would be pretty trivial to define SLink as a specialization of dcollections' Link struct, and that would essentially give you a lot of functionality in terms of your ideal list type.

>> Which may be acceptable to you if you
>> want the bleeding fastest speed available. There will always be tailored
>> code crafted to be as fast as possible for some specific design and
>> dcollections and your slist just won't fit the bill.
>
> Except that the set is considerably smaller for a well-designed slist.

The set of problems that an slist solves is also considerably smaller.  Having that backwards capability enables more algorithms.

-Steve
May 24, 2010
On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Sorry. Yes, by-key iteration should be possible.

OK, so we should be able to iterate keys.  And the keys are not stored in the trie structure itself.  So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap?  Forget about interfaces, pretend you were writing it separate from dcollections.

-Steve