Thread overview
D graph library
May 06, 2013
w0rp
May 08, 2013
qznc
May 06, 2013
Hello all,

I've recently been having some very tentative thoughts about whether to try and develop an effective graph/network library in D.  The goal would be to try and have a library that allows for highly efficient representation of graphs and multigraphs, both as static and dynamic entities, with very fast retrieval of graph properties (e.g. the neighbours of a given vertex, the weights of given links) for use in simulations _on_ the graph, and very fast calculation of graph metrics of various kinds.

The general field of application would be in various branches of complexity science with particular attention to those stemming from interdisciplinary physics.

There are a number of existing solutions in other languages.  igraph, written in C, is probably the most feature-complete solution, with Python and R interfaces also available; however, it has (to my eyes) a very complex and unfriendly API, albeit one that improves when the Python or R versions are used.

I haven't personally used the Boost Graph Library, beyond going through some of the documentation to explore it as an option, but more experienced colleagues have downplayed it as very complex to use.

Finally, Networkx is a Python library that seems quite friendly to use, but when exploring it I found some issues that made me reluctant to trust it as a scientifically reliable solution.  (To be fair to the maintainers, when I raised those issues with them, they did something about it quite quickly.)

I know that Philippe Sigaud started some work in this direction for D: http://svn.dsource.org/projects/dranges/trunk/dranges/docs/graph.html

... but I don't know if there is any ongoing status or intent to develop it further.  I haven't yet gone through the source code with sufficient scrutiny to decide whether this makes for a good basis for the kind of project I have in mind.

My main reasons for being tentative about moving forward with this are -- well, first, it's a major undertaking; as my day job is producing research, not code, there's something of a pressure to just use the tools already available rather than trying to optimize for my personal language choice, and I don't want to start something only to drop off because my research activities are taking me in a different direction.

Second, I don't know how useful it would be to others -- Python and C/C++ are still definitely the standard (and growing) tools.  So it would be good to know if anyone else would like to have such a library explicitly in D.

Third, in terms of producing productive code in a short space of time, I don't know if it's really worth implementing things in D rather than just providing bindings/wrappers to an existing library, with igraph being the obvious choice as it is written in C.  The reason for not doing that so far as been a certain amount of nervousness about trying to provide .di interfaces for a very large and complex codebase with lots of custom data structures (the default approach recommended for using igraph is to import the headers for THE ENTIRE LIBRARY, so it doesn't lend itself to piecewise construction of bindings).

Finally, given that many of my complaints about the existing tools relate to complex APIs and not-very-user-friendly code, I'm rather worried about my own ability (as a researcher and not a well-trained developer:-) to do something better.  My experience of code written by researchers is in general not good, though the D community has shown there are exceptions!

Now, that said, there is one very big positive motivation -- I think that D provides a language in which one could create a graph library with a genuinely friendly and well-designed API and with easily readable code, yet which has all the efficiency and scalability of igraph.  And I'd much rather work with that than any of the other solutions out there.

Some goals for a project like this would be something along the following lines:

   * Hyper-efficient performance for storage and manipulation of graphs.
     I'd want to be able to perform analyses of and/or run simulations on
     extremely large graph sizes.  This would include features such as effective
     use of parallelism (when appropriate), dynamic recalculation of various
     network quantities as nodes/edges are added/removed, and so on.

   * Written in idiomatic D style as exemplified by the best of Phobos, making
     best use of D's generics, templates, etc.

   * Easy to hook into other languages -- I think my colleagues would very much
     appreciate a Python interface.

   * Ideally, would rely _only_ on Phobos and not on any 3rd-party libraries.
     It may turn out that some desirable functionality is not yet in Phobos
     (or does not yet perform adequately), in which case those features should
     be implemented or improved in Phobos rather than this library.

Anyway, for a very tentative proposal (and I have to say, I'm really not sure whether I will commit to this), I think that's a long enough email for now -- I'd really welcome people's thoughts on these ideas and to get a feel of whether there's interest in seeing something like this implemented in D.

Thanks & best wishes,

     -- Joe
May 06, 2013
I have an interest in having a nice, reliable graph library in any language I use, whether that's Python, Java, JavaScript, or D. This desire comes from the fact that having a nice graph implementation makes it easy to solve a few particular problems.

I would be interested in assisting with development of a nice graph library in some fashion, if you decide to start such a project. I am merely the holder of a BSc, and I don't have far reaching experience with implementing graph data structures and algorithms. Nonetheless, implementing graph data structures and algorithms is something I just enjoy doing, and it's a good learning experience for me.
May 07, 2013
On 05/06/2013 07:02 PM, w0rp wrote:
> I have an interest in having a nice, reliable graph library in any language I use, whether that's Python, Java, JavaScript, or D. This desire comes from the fact that having a nice graph implementation makes it easy to solve a few particular problems.

Very much the same.  The actual range of specific issues I need to solve is fairly narrow, but having a good general solution would make it much easier.

> I would be interested in assisting with development of a nice graph library in some fashion, if you decide to start such a project. I am merely the holder of a BSc, and I don't have far reaching experience with implementing graph data structures and algorithms. Nonetheless, implementing graph data structures and algorithms is something I just enjoy doing, and it's a good learning experience for me.

Qualifications are not important, what matters is curiosity, fun and engagement with the problem :-)  Besides, if you already have some existing experience implementing these kinds of data structures and algorithms, you might actually have more practical understanding of the problem than I do right now.

Thanks for the interest!  It makes the whole idea seem much more worthwhile.

May 08, 2013
I am thinking about something like this as well, but finishing things is not my strength. ;)

What I desired were probably quite different requirements to other graph libraries. My background is compiler optimizations, where the programs are represented as a graph: libFirm [0].

1) A transaction mechanism. In debug mode, libfirm runs some checks on every graph change. However, sometimes a sequence of changes must be completed until the graph is in a valid state again. A transaction mechanism could be used to run the verifier after a complete transaction.

2) Node maps. Compiler optimizations sometimes rely on multiple analyses, which usually compute information for each node in the graph. While a hash map can be used to annotate nodes, a graph specific map should be more efficient. Additionally, a graph might know about its corresponding maps and update them implicitly, if the graph changes.


[0] http://pp.info.uni-karlsruhe.de/firm/Main_Page
May 08, 2013
On 05/08/2013 09:14 AM, qznc wrote:
> I am thinking about something like this as well, but finishing things is not my strength. ;)

Oh, tell me about it. :-P

> What I desired were probably quite different requirements to other graph libraries. My background is compiler optimizations, where the programs are represented as a graph: libFirm [0].
> 
> 1) A transaction mechanism. In debug mode, libfirm runs some checks on every graph change. However, sometimes a sequence of changes must be completed until the graph is in a valid state again. A transaction mechanism could be used to run the verifier after a complete transaction.
> 
> 2) Node maps. Compiler optimizations sometimes rely on multiple analyses, which usually compute information for each node in the graph. While a hash map can be used to annotate nodes, a graph specific map should be more efficient. Additionally, a graph might know about its corresponding maps and update them implicitly, if the graph changes.

That would be a great perspective to have.  I should add that the use of things like hash maps etc. for efficient data storage/retrieval is something I'm really not that familiar with, so it would be good to be able to get concrete advice on those aspects.  So, your thoughts and insights may still be very important even if you can't commit to a lot of programming time.

I suspect that where effective hashmaps and other container types are concerned, it may be necessary to either make (or push for) some quite extensive additions to Phobos and/or druntime.

We could of course use Steve Schveighoffer's excellent work, but I have a strong preference for avoiding 3rd-party dependencies in cases where the functionality clearly _should_ be available in standard libraries.

> [0] http://pp.info.uni-karlsruhe.de/firm/Main_Page

I love that you provided a brainfuck frontend ... :-)