August 14, 2019
On 08/13/2019 03:12 PM, Mirjam Akkersdijk wrote:

> For the application I am writing, it makes very much sense to use a DLL.

Yes, Dynamically Linked Libraries can be useful. Oh wait... You mean Doubly-Linked List. :o)

> I tried setting the length first and iterating through it with `nodes[i]
> = ...`

As Mike Parker said, it should be .reserve.

> and the implementation did not run noticeably faster (10 trials),

That's the most important part: You profiled and it was the same. (Although, profiling is tricky business as well; it can be difficult to measure the operation that you are interested in.)

The performance will depend on many factors:

* First, if there aren't many number of nodes it shouldn't matter. (That's why I would choose the simpler arrays.)

* Doubly-linked list nodes will have two extra pointers per element. As a general rule, larger the data slower the data structure.

* Pointer chasing through 'next' pointers will make the program jump around in memory. This will be slower compared to consecutive elements that arrays provide. But if processing the elements cause pointer chasing anyway, then it shouldn't matter much. Another topic to read and watch is "data oriented programming". This talk by Mike Acton is eye opening:

  https://www.youtube.com/watch?v=rX0ItVEVjHc

* Is data appended and removed throughout the life of the program or are the nodes set once in the beginning and then only used?

* Are elements accessed randomly or always in sequence. (If the latter, modern CPUs promise that arrays will be better.)

* etc.

Ali

August 14, 2019
On Tuesday, 13 August 2019 at 18:28:35 UTC, Ali Çehreli wrote:
> On 08/13/2019 10:33 AM, Mirjam Akkersdijk wrote:
> > On Tuesday, 13 August 2019 at 14:04:45 UTC, Sebastiaan Koppe
> wrote:
>
> >> Convert the nodes into an D array, sort the array with
> nodes.sort!"a.x
> >> < b.x" and then iterate the array and repair the next/prev
> pointers.
>
> If possible, I would go further and ditch the linked list altogether: Just append the nodes to an array and then sort the array. It has been shown in research, conference presentations, and in personal code to be the fasted option is most (or all) cases.
>
> > doesn't the nature of the dynamic array slow it down a bit?
>
> Default bounds checking is going to cost a tiny bit, which you can turn off after development with a compiler flag. (I still wouldn't.)
>
> The only other option that would be faster is an array that's sitting on the stack, created with alloca. But it's only for cases where the thread will not run out of stack space and the result of the array is not going to be used.
>
> > can't I define an array of fixed size, which is dependent on
> the input
> > of the function?
>
>     arr.length = number_of_elements;
>
> All elements will be initialized to the element's default value, which happens to be null for pointers. (If we are back to linked list Node pointers.)
>
> However, I wouldn't bother with setting length either as the cost of automatic array resizing is amortized, meaning that it won't hurt the O(1) algorithmic complexity in the general case. In the GC case that D uses, it will be even better: because if the GC knowns that the neighboring memory block is free, it will just add that to the dynamic array's capacity without moving elements to the new location.
>
> Summary: Ditch the linked list and put the elements into an array. :)
>

There are mainly three reasons why arrays are nowadays faster than double linked lists:
- pointer chasing can difficultly be paralized and defeats prefetching. Each pointer load may cost the full latency to memory (hundreds of cycles). In a multiprocessor machine may also trigger a lot of coherency trafic.
- on 64 bit systems 2 pointers cost 16 bytes. If the payload is small, there is more memory used in the pointer than in the data.
- when looping in an array the OO machinery will be able to parallelize execution beyond loop limits.
- reduced allocation, i.e. allocation is done in bulk => faster GC for D.

It is only when there are a lot of external references to the payload in the list that using an array may become too unwieldy, i.e. if moving an element in memory requires the update of other pointers outside of the list.

1 2 3
Next ›   Last »