Thread overview
Re: D graph library -- update
Jul 11, 2013
H. S. Teoh
July 11, 2013
On 07/11/2013 01:59 AM, Joseph Rushton Wakeling wrote:
> There is also a simple test file that can be used for benchmarking against comparable igraph code.  With the current method of adding edges one by one, this code already benchmarks as faster than its igraph equivalent, running in 2.4s on my machine when compiled with gdmd -O -release -inline and 1.4s when compiled with ldmd2 and the same flags -- compared to 3s for the igraph code written in C.

Comparable igraph code attached. :-)  You'll need to download and install igraph 0.6.5 if you want to try this out, then compile with

    gcc -O3 -o graph50 graph50.c -ligraph

There's some commented out stuff in the foo() function which implements the alternative means of adding edges to an igraph_graph_t, namely by adding them all in a big vector.  This makes igraph's performance _much_ faster than the current D implementation.

A note on the data model.  The _head and _tail arrays I guess are fairly self-explanatory, being the start and end vertices of edges in the graph. _indexHead and _indexTail are edge IDs sorted respectively by head and tail values.  Finally, _sumHead and _sumTail are vectors whose v'th entry corresponds to the total number of edges whose head (or tail) vertices have IDs less than v.

In other words, if we want to find out the number of outgoing edges from a vertex v, we can do so by calculating _sumHead[v + 1] - _sumHead[v].  If we want the number of incoming edges, we do the same but with _sumTail.

By similar tricks, we can cheaply list the neighbours of a vertex.  I'm reasonably confident that the D implementation here will allow it to reliably beat igraph for algorithms that rely on access to this information.

The cost of all this is that it's expensive to add edges, or at least, it's expensive to add edges one at a time -- but this is arguably a cost worth bearing for the speed and efficiency gains elsewhere.


July 11, 2013
On Thu, Jul 11, 2013 at 08:14:00PM +0200, w0rp wrote:
> I don't like it. Here is why.
> 
> 1. You can only use a size_t type as the vertex type. What about
> strings?
> 2. The distinction between directed and undirected graphs is made
> with an boolean type parameter? Why not an enum? Better yet, why not
> just use interfaces? (You are already using classes and garbage
> collected memory.)

I think an enum would be better.


> 3. I have to control the graph's capacity manually, and I can't use arbitrary numbers for edges.

Yeah, constraining the number of vertices/edges at ctor time is not feasible. We need to have an implementation that allows modifying the graph post-construction, such as adding edges. (Deleting may not be as important, and may be tricky to implement, but there *are* use cases for that too.)


> 4. Adding an edge doesn't add vertices.

Couldn't you just add vertices manually, since you already know what the edge is?


> My most common use of graphs is to start from nothing, build from adding edges arbitrarily, usually from IDs which may be integers, and may be strings, and then use graph algorithms to gain some understanding of the relation between objects mapped to these IDs. With this library, this will be difficult or impossible.

Yeah we need to cater to this use case too.

However, you should be able to use a helper class/graph to collect all vertices/edges, then create the library's graph type using that once all the information is known (e.g. number of vertices/edges, etc.). I wouldn't say this is impossible. Troublesome, yes, but definitely possible.


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
July 11, 2013
On 07/11/2013 08:27 PM, H. S. Teoh wrote:
> Yeah, constraining the number of vertices/edges at ctor time is not feasible. We need to have an implementation that allows modifying the graph post-construction, such as adding edges.

You can add as many edges as you like -- right now, the implementation doesn't even constrain you from adding _the same edge_ multiple times.  (I think that's a feature rather than a bug -- it allows for support of multigraphs.)

You can also extend the number of vertices.  Just use the addVertices() method, although to be honest I think the easiest thing might be to just allow the user to assign to the vertexCount @property, with some checks to make sure that they don't reduce the total number of vertices below the largest head/tail value.

> (Deleting may not be as
> important, and may be tricky to implement, but there *are* use cases for
> that too.)

I am also concerned about complications related to deletion, but I'm sure it's possible to address.

> Couldn't you just add vertices manually, since you already know what the edge is?

Before calling addEdge(head, tail), calculate m = max(head, tail).  Then if m >=
vertexCount, set vertexCount = m + 1.  Then add your edges.

This is easily done with wrappers.  I think there's a benefit to the core class being strict and _not_ just automatically extending the number of vertices, just as an array doesn't automatically expand if you try and read/write from an index beyond its current length.