D graph library -- update

July 10, 2013
Hi all,

Following earlier discussion about a D graph library
<http://forum.dlang.org/thread/5187D514.5070109@webdrake.net>, this evening I
sat down and had a go at coming up with some basic code to support such a venture.

You can find it here: https://github.com/WebDrake/Dgraph

This takes the basic data structure from the widely-used igraph library <http://igraph.sourceforge.net> but builds around it using idiomatic D structures and algorithms.

The code currently consists of the basic data structure, implemented as a final class with methods to extract the number of vertices, the list of edges, and the degree and neighbours of a vertex.
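
To give an idea of the shape of such a class, here is a minimal sketch (the field layout and method names are illustrative only, not the actual Dgraph API).  The real implementation also does indexing work when edges are added (see the indexEdges() discussion further down), which the sketch omits:

final class Graph(bool directed)
{
    private size_t[] _tail;          // tail (source) vertex of each edge
    private size_t[] _head;          // head (target) vertex of each edge
    private size_t _vertexCount;

    this(size_t n) { _vertexCount = n; }

    size_t vertexCount() const { return _vertexCount; }
    size_t edgeCount() const { return _head.length; }

    // Append an edge; the real code also has to update its internal
    // edge indexes here, which is where the interesting costs lie.
    void addEdge(size_t tail, size_t head)
    {
        assert(tail < _vertexCount && head < _vertexCount);
        _tail ~= tail;
        _head ~= head;
    }

    // Number of edges incident on vertex v.
    size_t degree(size_t v) const
    {
        size_t d = 0;
        foreach (i; 0 .. _head.length)
        {
            if (_tail[i] == v) ++d;
            if (_head[i] == v) ++d;
        }
        return d;
    }

    // Vertices adjacent to v (both directions; a directed graph would
    // normally distinguish in- and out-neighbours).
    size_t[] neighbours(size_t v) const
    {
        size_t[] n;
        foreach (i; 0 .. _head.length)
        {
            if (_tail[i] == v) n ~= _head[i];
            if (_head[i] == v) n ~= _tail[i];
        }
        return n;
    }
}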

There is also a simple test file that can be used for benchmarking against comparable igraph code.  With the current method of adding edges one by one, this code already benchmarks as faster than its igraph equivalent, running in 2.4s on my machine when compiled with gdmd -O -release -inline and 1.4s when compiled with ldmd2 and the same flags -- compared to 3s for the igraph code written in C.
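
In outline, the one-by-one benchmark amounts to a loop of roughly this shape (a sketch built on the illustrative class above; the graph size, edge count and random edge source are arbitrary, not the actual test file):

import std.random;

void main()
{
    enum vertexCount = 10_000;
    enum edgeCount = 20_000;

    auto g = new Graph!false(vertexCount);
    auto rng = Random(unpredictableSeed);

    foreach (i; 0 .. edgeCount)
    {
        // each call pays the full cost of updating the graph's internal
        // bookkeeping, which is what the one-by-one benchmark measures
        g.addEdge(uniform(0, vertexCount, rng), uniform(0, vertexCount, rng));
    }
}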

However, when igraph's option to add the edges all in one go in a vector is enabled, igraph is significantly faster.  This surely reflects a mix of memory management (how many allocs/appends?) and the amount of sorting and other updates that occur when edges are added.  So, I think that igraph can probably still be beaten here.

The memory usage for the example D code is slightly higher than for its comparable igraph C code, clocking in at about 2 MB as opposed to 1.7 MB.

I've chosen igraph as a point of comparison because it has a reputation as one of the fastest and most scalable graph libraries available.  Many of igraph's design choices seem focused on minimizing memory usage, sometimes at the expense of all-out speed: for example, if an algorithm needs an adjacency list representation of the graph, igraph will generate one at that moment and destroy it afterwards.

However, on the basis of the simple work done here, I'm fairly confident that D can do better.  The code here is _much_ simpler than the equivalent functions in igraph, and has not yet been optimized in any way either for speed or for memory management.  Yet it seems to be performing on par with or better than igraph within the limits of its current design constraints.

Over the next few days I'll try to implement some small additional pieces of functionality, perhaps some graph metrics which can give another point of performance comparison.

Anyway, comments welcome, both positive and negative.

Thanks & best wishes,

    -- Joe
July 11, 2013
Joseph Rushton Wakeling:

> The memory usage for the example D code is slightly higher than for its
> comparable igraph C code, clocking in at about 2MB as opposed to 1.7.

If you want a meaningful memory comparison then perhaps you need a 10 or 100 (or more) times larger graph.

Bye,
bearophile
July 11, 2013
On 07/11/2013 02:18 AM, bearophile wrote:
> If you want a meaningful memory comparison then perhaps you need a 10 or 100 (or more) times larger graph.

I know, and it's coming. :-)  The main memory-related issues will probably not show up in a situation like this where all we're doing is storing the graph data, but in the case where algorithms are being performed on the data.

July 11, 2013
On 07/11/2013 02:24 AM, Joseph Rushton Wakeling wrote:
> I know, and it's coming. :-)  The main memory-related issues will probably not show up in a situation like this where all we're doing is storing the graph data, but in the case where algorithms are being performed on the data.

For comparison, I performed the same tests but with a 10_000 node graph.  Here we see similar memory use, but igraph outperforms dgraph by a factor of nearly 10, even with edges being inserted one at a time.

Profiling shows that the time difference is accounted for by the sorting algorithm used in the indexEdges() method:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 93.12    110.01   110.01    20000     0.01     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  3.88    114.59     4.58    20000     0.00     0.00
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  1.81    116.73     2.14 1124043790     0.00     0.00
_D3std5range14__T3ZipTAmTAmZ3Zip7opIndexMFNaNbNfmZS3std8typecons14__T5TupleTmTmZ5Tuple
  0.59    117.42     0.70 203164131     0.00     0.00
_D3std9algorithm43__T6swapAtTS3std5range14__T3ZipTAmTAmZ3ZipZ6swapAtFNaNbNfS3std5range14__T3ZipTAmTAmZ3ZipmmZv
  0.42    117.92     0.50    20000     0.00     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv

By default, schwartzSort uses SwapStrategy.unstable, which means quicksort is used as the sorting mechanism.  If we replace this with SwapStrategy.stable or SwapStrategy.semistable, TimSort is used instead, and this dramatically cuts the running time -- from almost 2 minutes to under 3 seconds (compared to 13 seconds for igraph with one-by-one edge addition), and under 2 seconds if ldmd2 is used for compiling.
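
For anyone wanting to try the same switch, the change is just a matter of passing a different SwapStrategy to schwartzSort.  A minimal standalone illustration (the transform here is a stand-in for the real edge-index key, not the actual indexEdges() code):

import std.algorithm;

void main()
{
    auto words = ["graph", "edge", "vertex", "degree"];

    // default strategy is SwapStrategy.unstable, i.e. the quicksort path
    words.schwartzSort!(w => w.length);

    // stable (or semistable) selects the TimSort path instead
    words.schwartzSort!(w => w.length, "a < b", SwapStrategy.stable);
}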

This comes at the cost of increasing memory usage from about 3.7 MB (almost identical to igraph) to 5.6 MB.

Probably the best way to get optimal performance without the memory hit is to use an insertion sort (or maybe a smoothsort?) when edges are added one at a time.  TimSort would probably be best when adding multiple edges in one go, unless no edges at all have been added before, in which case quicksort would be optimal; quicksort might also remain the better choice if memory management is the priority.
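
For the one-edge-at-a-time case, the kind of sorted insertion I have in mind would look roughly like this (a sketch using std.range.assumeSorted; the function is illustrative, not code from the repository):

import std.range;

// Insert a single new value into an already-sorted index, shifting the
// tail of the array rather than re-sorting the whole thing.
void insertSorted(ref size_t[] index, size_t value)
{
    immutable pos = assumeSorted(index).lowerBound(value).length;
    index ~= value;                          // grow by one slot
    foreach_reverse (i; pos .. index.length - 1)
        index[i + 1] = index[i];             // shift the tail up
    index[pos] = value;
}

void main()
{
    size_t[] index = [1, 3, 5, 7];
    insertSorted(index, 4);
    size_t[] expected = [1, 3, 4, 5, 7];
    assert(index == expected);
}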

So, the new D code is still competitive with igraph, but needs some smoothing around the edges (quite literally! :-)
July 11, 2013
On Thursday, 11 July 2013 at 10:25:40 UTC, Joseph Rushton Wakeling wrote:
> [...]
> So, the new D code is still competitive with igraph, but needs some smoothing
> around the edges (quite literally! :-)

This is very promising. Great work!
July 11, 2013
On 7/11/13 3:25 AM, Joseph Rushton Wakeling wrote:
> By default, schwartzSort uses SwapStrategy.unstable, which means quicksort is
> used as the sorting mechanism.  If we replace this with SwapStrategy.stable or
> SwapStrategy.semistable, TimSort is used instead, and this dramatically cuts the
> running time -- from almost 2 minutes to under 3 seconds (compared to 13 seconds
> for igraph with one-by-one edge addition), and under 2 if ldmd2 is used for
> compiling.

Unstable sort should be faster than stable sort. If I remember correctly (1) our TimSort is indeed slower than quicksort on random numbers, (2) there is a performance bug that makes our quicksort perform quadratically on data that's essentially sorted but has one unsorted element at the end. Is that the case?

Andrei
July 11, 2013
On 07/11/2013 05:17 PM, Andrei Alexandrescu wrote:
> Unstable sort should be faster than stable sort. If I remember correctly (1) our TimSort is indeed slower than quicksort on random numbers, (2) there is a performance bug that makes our quicksort perform quadratically on data that's essentially sorted but has one unsorted element at the end. Is that the case?

That's _exactly_ the case here. :-\

Thanks for the clarification!
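
For anyone wanting to reproduce it, the offending input is just an already-sorted array with one out-of-place element appended, which is effectively what indexEdges() re-sorts after each new edge.  A minimal sketch (the array size is arbitrary):

import std.algorithm;
import std.array;
import std.range;

void main()
{
    // an already-sorted index with a single out-of-place element at the end
    auto index = array(iota(0, 100_000));
    index ~= 42;

    // the default unstable strategy takes the quicksort path, which is the
    // one that degrades on this input; stable selects TimSort instead
    sort!("a < b", SwapStrategy.unstable)(index);
}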
July 11, 2013
On Thursday, 11 July 2013 at 15:17:03 UTC, Andrei Alexandrescu wrote:
> (2) there is a performance bug that makes our quicksort perform quadratically on data that's essentially sorted but has one unsorted element at the end.

Do you have a link? I couldn't find it on the bugzilla, though I do remember a discussion of this from a while back.
July 11, 2013
I don't like it. Here is why.

1. You can only use a size_t type as the vertex type. What about strings?
2. The distinction between directed and undirected graphs is made with a boolean type parameter? Why not an enum? Better yet, why not just use interfaces? (You are already using classes and garbage-collected memory.)
3. I have to control the graph's capacity manually, and I can't use arbitrary numbers for edges.
4. Adding an edge doesn't add vertices.

My most common use of graphs is to start from nothing, build the graph up by adding edges arbitrarily (usually from IDs which may be integers or strings), and then use graph algorithms to gain some understanding of the relations between the objects mapped to these IDs. With this library, this will be difficult or impossible.
July 11, 2013
On Thursday, 11 July 2013 at 18:14:01 UTC, w0rp wrote:
> I don't like it. Here is why.
>
> 1. You can only use a size_t type as the vertex type. What about strings?

This is a core graph type designed explicitly for speed and efficiency. If you want string ids for vertices, it's better to wrap this with maps from string to size_t and back again. Having strings or other arbitrary data types involved in the core data type will just be a nightmare space- and speed-wise.
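
To give the idea, a rough sketch of the kind of wrapper I mean (the struct and its addVertex/addEdge calls are hypothetical, not part of the current code):

// Hypothetical wrapper, not part of the library: maps arbitrary string
// labels onto the contiguous size_t vertex IDs used by the core graph type.
struct LabelledGraph(G)
{
    G graph;
    size_t[string] idOf;    // label -> vertex index
    string[] labelOf;       // vertex index -> label

    size_t vertex(string label)
    {
        if (auto p = label in idOf)
            return *p;
        immutable id = labelOf.length;
        idOf[label] = id;
        labelOf ~= label;
        graph.addVertex();  // hypothetical method: grow the core graph by one
        return id;
    }

    void addEdge(string a, string b)
    {
        graph.addEdge(vertex(a), vertex(b));  // core graph's size_t-based addEdge
    }
}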

> 2. The distinction between directed and undirected graphs is made with an boolean type parameter? Why not an enum?

It is an enum.

> why not just use interfaces? (You are already using classes and garbage collected memory.)

It's under consideration. I haven't yet identified a clear need.

> 3. I have to control the graph's capacity manually, and I can't use arbitrary numbers for edges.
> 4. Adding an edge doesn't add vertices.

Again, those things are probably best achieved by wrappers that mediate the adding of edges.

> My most common use of graphs is to start from nothing, build from adding edges arbitrarily, usually from IDs which may be integers, and may be strings, and then use graph algorithms to gain some understanding of the relation between objects mapped to these IDs. With this library, this will be difficult or impossible.

I understand those requirements, I just think that the core data structure on which calculations are done is not the ideal place to implement them.

(Sorry for terse replies, I'm writing on my phone...)