May 06, 2013 Re: D graph library
On Mon, May 06, 2013 at 06:06:44PM +0200, Joseph Rushton Wakeling wrote:
> Hello all,
>
> I've recently been having some very tentative thoughts about whether to try and develop an effective graph/network library in D. The goal would be to try and have a library that allows for highly efficient representation of graphs and multigraphs, both as static and dynamic entities, with very fast retrieval of graph properties (e.g. the neighbours of a given vertex, the weights of given links) for use in simulations _on_ the graph, and very fast calculation of graph metrics of various kinds.

I would be interested in something like this in D. Though in my case, I'm not so much interested in the graph data structures themselves as in the graph algorithms that apply to them, which can be used to solve various programming problems, like connectivity analysis of large data sets, reachability analysis, and the like. Graph coloring algorithms are also of some interest to me, since they underlie some of the visualization techniques that I'm interested in.

So what I envision as far as graph libraries in D are concerned is something like the generic range API, where there is a fixed convention as to what constitutes a graph/digraph/etc., and the library itself is a template library that is able to handle any data type that conforms to the requirements. Of course, this does not preclude specific, non-generic data structures used internally by said graph algorithms, but for maximal usability the API shouldn't require the user to perform large-scale conversion of existing data structures into graph-specific structures just so he can run, say, a reachability analysis on it. When working with large data sets, one really wants to just map class X to nodes and class Y to edges, and then ask for the connectivity of the resulting graph (rather than looping over the existing data set to construct a separate graph representation thereof).

[...]

> My main reasons for being tentative about moving forward with this are -- well, first, it's a major undertaking; as my day job is producing research, not code, there's something of a pressure to just use the tools already available rather than trying to optimize for my personal language choice, and I don't want to start something only to drop off because my research activities are taking me in a different direction.
>
> Second, I don't know how useful it would be to others -- Python and C/C++ are still definitely the standard (and growing) tools. So it would be good to know if anyone else would like to have such a library explicitly in D.

I'm interested. Some years ago I implemented Tarjan's algorithm for computing strongly-connected components in C++. I needed to use it on an n-dimensional grid of cells, so I tried to make it generic, but due to limitations in C++ generic programming I ended up just using an abstract class to represent the graph. This required implementing a separate graph class for every data structure I wanted to run the algorithm on, which is rather ugly.

If something like this were done in D, I expect it would have a friendlier API by allowing Phobos-style genericity (using range-like abstractions for graph concepts like nodes and edges, alias parameters for callbacks/closures to abstract away graph implementation details, etc.). One could, for example, allow overriding of the methods that produce the SCC graph, so that the result is produced in a custom target type that can be used directly by the application elsewhere.

[...]

> Now, that said, there is one very big positive motivation -- I think that D provides a language in which one could create a graph library with a genuinely friendly and well-designed API and with easily readable code, yet which has all the efficiency and scalability of igraph. And I'd much rather work with that than any of the other solutions out there.

+1.

> Some goals for a project like this would be something along the following lines:
>
> * Hyper-efficient performance for storage and manipulation of graphs. I'd want to be able to perform analyses of and/or run simulations on extremely large graph sizes. This would include features such as effective use of parallelism (when appropriate), dynamic recalculation of various network quantities as nodes/edges are added/removed, and so on.

This is a bit beyond what I need, but if we can have it, why not?

> * Written in idiomatic D style as exemplified by the best of Phobos, making best use of D's generics, templates, etc.

Definitely!

> * Easy to hook into other languages -- I think my colleagues would very much appreciate a Python interface.

How would you go about doing this, though? Wouldn't it require some kind of wrapper API to make it compatible with the Python interpreter? Though, if the API is done correctly in a generic way (a la the range API), this should amount to just a simple set of wrappers that maps the various graph concepts directly onto Python types. It would also maximize performance by not requiring an intermediate conversion step.

> * Ideally, would rely _only_ on Phobos and not on any 3rd-party libraries. It may turn out that some desirable functionality is not yet in Phobos (or does not yet perform adequately), in which case those features should be implemented or improved in Phobos rather than in this library.

Yes.

> Anyway, for a very tentative proposal (and I have to say, I'm really not sure whether I will commit to this), I think that's a long enough email for now -- I'd really welcome people's thoughts on these ideas and to get a feel of whether there's interest in seeing something like this implemented in D.

[...]

I will be glad to have generic graph algorithms available in D, to minimize the need for reinventing the wheel. I've seen too much code that tries to reinvent known graph algorithms (usually rather poorly) just because existing graph libraries are too hard / too obscure to reuse.

T

--
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
May 07, 2013 Re: D graph library
Posted in reply to H. S. Teoh

>
>> * Easy to hook into other languages -- I think my colleagues would
>> very much appreciate a Python interface.
>
> How would you go about doing this, though? Wouldn't it require some kind
> of wrapper API to make it compatible with the Python interpreter?
It's trivial with PyD.
May 07, 2013 Re: D graph library
On 05/06/2013 07:09 PM, H. S. Teoh wrote:
> I would be interested in something like this in D. Though in my case, I'm not so much interested in the graph data structures themselves as in the graph algorithms that apply to them, which can be used to solve various programming problems, like connectivity analysis of large data sets, reachability analysis, and the like.

Sure. The data structures are a tool to let you implement those algorithms really well ... :-)

> Graph coloring algorithms are also of some interest to me, since they underlie some of the visualization techniques that I'm interested in.

This would be very nice to have. It's not really my direct area of interest, but it would be lovely to have a nice visualization solution as part of the library.

> So what I envision as far as graph libraries in D are concerned is something like the generic range API, where there is a fixed convention as to what constitutes a graph/digraph/etc., and the library itself is a template library that is able to handle any data type that conforms to the requirements. Of course, this does not preclude specific, non-generic data structures used internally by said graph algorithms, but for maximal usability the API shouldn't require the user to perform large-scale conversion of existing data structures into graph-specific structures just so he can run, say, a reachability analysis on it. When working with large data sets, one really wants to just map class X to nodes and class Y to edges, and then ask for the connectivity of the resulting graph (rather than looping over the existing data set to construct a separate graph representation thereof).

That's a nice way of looking at it. Probably the basic graph interface would be one which allows a few basic questions to be asked -- "give me a range of all the nodes in the graph", "give me a range of all the edge pairs", etc. -- and you'd want properties like isGraph, isDirectedGraph, isMultiGraph, isBipartiteGraph, ... and similarly you'd want a node or edge interface which allows for similar properties.

> I'm interested.

Thank you :-)

> Some years ago I implemented Tarjan's algorithm for computing strongly-connected components in C++. I needed to use it on an n-dimensional grid of cells, so I tried to make it generic, but due to limitations in C++ generic programming I ended up just using an abstract class to represent the graph. This required implementing a separate graph class for every data structure I wanted to run the algorithm on, which is rather ugly. If something like this were done in D, I expect it would have a friendlier API by allowing Phobos-style genericity (using range-like abstractions for graph concepts like nodes and edges, alias parameters for callbacks/closures to abstract away graph implementation details, etc.). One could, for example, allow overriding of the methods that produce the SCC graph, so that the result is produced in a custom target type that can be used directly by the application elsewhere.

Sounds fun! :-)

> This is a bit beyond what I need, but if we can have it, why not?

From my point of view speed and space efficiency are essential, because if I continue down the current planned research direction, I'll probably wind up working on graphs with hundreds of thousands of nodes and links.

> How would you go about doing this, though? Wouldn't it require some kind of wrapper API to make it compatible with the Python interpreter?

As Ellery suggested, PyD should be our friend here. But I don't think interfacing with Python is an essential short-term goal.

> I will be glad to have generic graph algorithms available in D, to minimize the need for reinventing the wheel. I've seen too much code that tries to reinvent known graph algorithms (usually rather poorly) just because existing graph libraries are too hard / too obscure to reuse.

Well, here's where I am right now -- I'm currently heading towards a writeup of some fun stuff with colleagues, which will probably take a few weeks. Assuming that we are going to follow up on this, there's a reasonable case for doing this kind of graph work. So, I'll continue to read and think -- and we can continue to discuss here! -- and hopefully we can get towards some kind of action plan.

One remark on licensing. I'm conscious of the fact that there is probably great advantage (from a code point of view) in careful examination of the internals of existing graph solutions. On the other hand, the most highly performing rival, igraph, is GPL-licensed. This obviously creates a potential problem where (conscious or unconscious) copying or influence is concerned, because it would probably be nice to continue the typical D licensing approach and use Boost. Not that I have any personal objection to the GPL -- it's a license that I would be inclined to choose for many projects -- but where such a generally useful library is concerned, my inclination is to be permissive. Anyone else have any thoughts on this?
May 08, 2013 Re: D graph library
On 05/06/2013 07:09 PM, H. S. Teoh wrote:
> So what I envision as far as graph libraries in D are concerned is something like the generic range API, where there is a fixed convention as to what constitutes a graph/digraph/etc., and the library itself is a template library that is able to handle any data type that conforms to the requirements.
So, let's have a bit of a think about what you really, really want in a graph interface.
Something along the lines of the following:
nodes /* @property that returns a RandomAccessRange to the collection
of nodes. You could also call this "vertices" or
"vertexes"; "nodes" is shorter. Should be O(1). */
edges /* Returns a RandomAccessRange to the collection of edges, i.e.
a list of node pairs. Should be O(1). */
Note that these properties should be O(1) to return, but obviously traversing
them is O(N) [O(V)] or O(E) respectively, where N [V] is the number of nodes
[vertices] and E is the number of edges.
Note also that both the above automatically define the count of nodes/edges, but we could of course add courtesy functions to give them directly.
Finally, a reasonable question is whether they should return ranges of actual nodes or edges, or ranges of node/edge IDs.
You'd also need to define various characteristics such as isDirected, isMultiGraph, isMixedGraph, isSimpleGraph, isWeighted, etc. Ideally most of these would be compile-time constraints but might also be implemented as runtime queries (e.g. isBipartiteGraph could be a compile-time constraint but equally one could at runtime query a graph without this constraint and prove that it could indeed be divided into two disjoint sets which satisfy the conditions of bipartitivity [bipartiteness?]).
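To make that concrete, compile-time checks of this kind could be expressed much like std.range's isInputRange. The following is only a minimal sketch, and the member names it assumes (nodes, edges, a compile-time isDirected flag) are illustrative rather than a settled API:

    /* Sketch of graph trait templates in the spirit of std.range.
       The members checked for (nodes, edges, isDirected) are assumptions. */
    template isGraph(G)
    {
        enum bool isGraph = is(typeof({
            G g = G.init;
            foreach (n; g.nodes) {}   // must expose an iterable collection of nodes
            foreach (e; g.edges) {}   // ... and of edges (node pairs)
        }));
    }

    template isDirectedGraph(G)
    {
        // assumes directedness is exposed as a compile-time flag,
        // e.g. "enum isDirected = true;" inside the graph type
        static if (isGraph!G && is(typeof(G.isDirected) : bool))
            enum bool isDirectedGraph = G.isDirected;
        else
            enum bool isDirectedGraph = false;
    }

    // Algorithms could then constrain themselves in the usual Phobos way:
    //     auto stronglyConnectedComponents(G)(G g) if (isDirectedGraph!G) { ... }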
Then for nodes/vertices you'd want to define a number of properties:
neighbours /* OK, OK, neighbors. Purely in order to save keystroke
time for programmers. Returns a RandomAccessRange to
the collection of nodes that the given node links to.
For directed graphs you might want to add a template
qualifier of "out" (default) or "in" to allow getting
neighbours that this node points to, or neighbours that
point to this node. Should be O(1) [but maybe needs to
be O(d)?] */
edges /* Returns RandomAccessRange of edges that connect to this
node. For a directed graph you might want again a
template qualifier of "out" (default), "in" or "all".
Should be O(1) [but maybe needs to be O(d)?] */
Again, these should ideally be O(1) to return but are obviously O(d) to
traverse where d is the degree of the node. Note that degree (and in/out degree
for directed graphs) can be implicitly calculated from the above, but we could
add courtesy functions to this extent.
It should also be possible to define arbitrary collections of node properties, such as colour, type, textual labels, numerical properties etc., and to query if a node has them (preferably at compile-time).
You'd also want to define functions on node [node ID?] pairs, such as a function that returns the edge [edge ID?] linking two nodes if they _are_ linked, and null otherwise.
Similarly for edges you might want to define various properties:
head, tail /* the origin and destination nodes respectively */
isDirected /* is this a directed link? */
isLoop /* does this edge connect a node to itself? */
isWeighted /* does this edge have an associated weight? */
weight /* ... if it does ... */
And again, you'd want to be able to define arbitrary collections of edge properties besides "weight", and be able to query (preferably at compile-time) if an edge has them.
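As a small sketch of what such a compile-time query could look like (the hasProperty name and the example edge type are made up purely for illustration):

    /* Sketch: querying an arbitrary edge (or node) property at compile
       time.  hasProperty and the example member names are illustrative. */
    enum bool hasProperty(E, string name) = __traits(hasMember, E, name);

    struct MyEdge
    {
        size_t head, tail;
        double weight;
    }

    static assert(hasProperty!(MyEdge, "weight"));
    static assert(!hasProperty!(MyEdge, "colour"));

    // An algorithm can then adapt itself to what the edge provides:
    double cost(E)(E e)
    {
        static if (hasProperty!(E, "weight"))
            return e.weight;   // weighted edge
        else
            return 1.0;        // unweighted edge counts as 1
    }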
OK, those are some VERY provisional thoughts (on a very tentative proposal:-) But I'd be happy to receive feedback, particularly on some of the choices (e.g. should we consider it essential for the ranges in all circumstances to be RandomAccessRanges or will it help to make some of them simpler? Should the ranges be of actual node/edge entities or node/edge IDs? etc.)
May 08, 2013 Re: D graph library
Posted in reply to Joseph Rushton Wakeling

> Then for nodes/vertices you'd want to define a number of properties:
>
> neighbours /* OK, OK, neighbors. Purely in order to save keystroke
> time for programmers. Returns a RandomAccessRange to
> the collection of nodes that the given node links to.
> For directed graphs you might want to add a template
> qualifier of "out" (default) or "in" to allow getting
> neighbours that this node points to, or neighbours that
> point to this node. Should be O(1) [but maybe needs to
> be O(d)?] */
>
> edges /* Returns RandomAccessRange of edges that connect to this
> node. For a directed graph you might want again a
> template qualifier of "out" (default), "in" or "all".
> Should be O(1) [but maybe needs to be O(d)?] */
>
According to The Queen, it is neighbours :o) I am Canadian, so
which spelling I use depends on how I am feeling on a particular
day.
I wonder if it is a good idea to try and guarantee specific times
for operations like these in all cases, or rather to support multiple
back-end representations that make specific operations more or
less efficient, and then document the trade-offs -- like linked-list
vs. vector for storing a list.
I don't really know what the state of the art is for graph
representations, but isn't it normally either an adjacency list
or an adjacency matrix that is used to represent the graph?
Adjacency lists allow the O(d) neighbours() access you want, while
the matrix representation is O(n). But if you want an
isNeighbour(A, B) function, then the matrix representation can do
that in O(1), but only O(d) for your adjacency list.
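A toy sketch of the two representations makes the trade-off concrete (illustration only, not a proposed API):

    // Toy sketch of the trade-off -- illustration only, not a proposed API.
    struct AdjacencyList
    {
        size_t[][] adj;   // adj[v] = vertices adjacent to v

        // O(1) to hand back, O(d) to traverse
        const(size_t)[] neighbours(size_t v) const { return adj[v]; }

        // O(d): has to scan v's list
        bool isNeighbour(size_t v, size_t w) const
        {
            foreach (x; adj[v]) if (x == w) return true;
            return false;
        }
    }

    struct AdjacencyMatrix
    {
        bool[][] adj;   // adj[v][w] = true iff the edge v-w exists

        // O(1) lookup
        bool isNeighbour(size_t v, size_t w) const { return adj[v][w]; }

        // O(n): has to scan the whole row
        size_t[] neighbours(size_t v) const
        {
            size_t[] result;
            foreach (w, linked; adj[v]) if (linked) result ~= w;
            return result;
        }
    }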
Perhaps there is some hybrid representation out there that is
pretty good at everything.
Craig
May 08, 2013 Re: D graph library
Posted in reply to Craig Dillabaugh

On 05/08/2013 05:48 PM, Craig Dillabaugh wrote:
> I wonder if it is a good idea to try and guarantee specific times for operations like these in all cases, or rather to support multiple back-end representations that make specific operations more or less efficient, and then document the trade-offs -- like linked-list vs. vector for storing a list.

Yes, in practice you'll probably want different backends which offer different performance priorities.

> I don't really know what the state of the art is for graph representations, but isn't it normally either an adjacency list or an adjacency matrix that is used to represent the graph? Adjacency lists allow the O(d) neighbours() access you want, while the matrix representation is O(n). But if you want an isNeighbour(A, B) function, then the matrix representation can do that in O(1), but only O(d) for your adjacency list.

Indeed, and there is also a trade-off with respect to storage. For one problem I worked on a few years back (it's the one described in my "dregs" library on GitHub), I found the best representation was just an array of link pairs. It would have been very nasty from the point of view of a neighbours() function, but I didn't need that functionality.

We'll probably also need to look into effective sparse matrix representations, as the adjacency matrix is usually extremely sparse for most actual graphs.

> Perhaps there is some hybrid representation out there that is pretty good at everything.

Well, when I said O(1) for some of those cases, I was thinking of something that would return a lazily-evaluated range. So you do the calculation only when you iterate across what has been returned, not when you return the value. By contrast, igraph_neighbors() is given as O(d), but I'm fairly sure that's less to do with the underlying data structure than with the fact that (AFAICS) they actually construct an array of length d and write the values into it. What you'd like to see in our case is that neighbours() would return a RandomAccessRange which is O(1) to create, and for which opIndex(), popFront() and popBack() are all O(1) to call.
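For illustration, a neighbour range that simply refers to the graph's stored adjacency data (rather than copying it into a fresh array) could look roughly like this; the sketch assumes an adjacency-list-style backend, and none of the names are meant as a final design:

    import std.range : isRandomAccessRange;

    /* Sketch: a neighbour range that is O(1) to create because it simply
       refers to data the graph already stores, rather than building an
       array of length d.  Assumes an adjacency-list-style backend. */
    struct NeighbourRange
    {
        private const(size_t)[] ids;   // the stored neighbour IDs of one node

        @property bool empty() const { return ids.length == 0; }
        @property size_t length() const { return ids.length; }
        @property size_t front() const { return ids[0]; }
        @property size_t back() const { return ids[$ - 1]; }
        @property NeighbourRange save() { return this; }
        size_t opIndex(size_t i) const { return ids[i]; }
        void popFront() { ids = ids[1 .. $]; }
        void popBack() { ids = ids[0 .. $ - 1]; }
    }

    static assert(isRandomAccessRange!NeighbourRange);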
May 08, 2013 Re: D graph library
On Wed, May 08, 2013 at 02:34:32PM +0200, Joseph Rushton Wakeling wrote:
> On 05/06/2013 07:09 PM, H. S. Teoh wrote:
> > So what I envision as far as graph libraries in D are concerned is something like the generic range API, where there is a fixed convention as to what constitutes a graph/digraph/etc., and the library itself is a template library that is able to handle any data type that conforms to the requirements.
>
> So, let's have a bit of a think about what you really, really want in a graph interface.
>
> Something along the lines of the following:
>
>     nodes /* @property that returns a RandomAccessRange to the collection
>              of nodes. You could also call this "vertices" or
>              "vertexes"; "nodes" is shorter. Should be O(1). */
>
>     edges /* Returns a RandomAccessRange to the collection of edges, i.e.
>              a list of node pairs. Should be O(1). */
>
> Note that these properties should be O(1) to return, but obviously traversing them is O(N) [O(V)] or O(E) respectively, where N [V] is the number of nodes [vertices] and E is the number of edges.

Hmm. Is it necessary to provide random-access ranges? For some graph algorithms that's necessary, but for others it may not be.

> Note also that both the above automatically define the count of nodes/edges, but we could of course add courtesy functions to give them directly.

Something along the lines of std.range's .hasLength, I guess? Makes sense.

> Finally, a reasonable question is whether they should return ranges of actual nodes or edges, or ranges of node/edge IDs.

I think the distinction is mostly superfluous. One could always wrap IDs in a struct:

    struct NodeID
    {
        int id;
        MyGraph* g;
        auto neighbors() { return g.nodesImpl[id].neighbors(); }
        ... // other wrappers around node methods
    }

So I'd say just return nodes and edges. They can be actual nodes and edges, or IDs equipped with the requisite methods, or references to node/edge objects (e.g. class instances -- references are essentially a kind of ID that just happens to be unique across all objects and directly mappable to memory addresses), etc.

Another thought I have is whether we should require a method that enumerates all edges. The number of edges in a graph can grow very quickly w.r.t. the number of nodes, and some graphs may be so complex that the edges can't all be stored in memory at once, but only computed per-node. In such cases, requiring edge enumeration may be counterproductive. I'm inclined towards leaving out edge enumeration unless some specific algorithms require it (even then, we could just make it a requirement for those particular algorithms). I admit my knowledge of graph algorithms is rather limited, but AFAIK none of them require edge enumeration on the entire graph.

> You'd also need to define various characteristics such as isDirected, isMultiGraph, isMixedGraph, isSimpleGraph, isWeighted, etc. Ideally most of these would be compile-time constraints but might also be implemented as runtime queries (e.g. isBipartiteGraph could be a compile-time constraint but equally one could at runtime query a graph without this constraint and prove that it could indeed be divided into two disjoint sets which satisfy the conditions of bipartitivity [bipartiteness?]).

Does it make sense to have a common interface between, say, directed and non-directed graphs? That is, are there algorithms that can operate on both, or do most of the interesting algorithms only work on one or the other? I'm just wondering if it's worth the trouble of doing compile-time (or runtime) checks if we could just have two unrelated graph types.

Also, I'm not sure about the usefulness of isBipartiteGraph, unless it entails some specific graph method (say a method that returns nodes grouped by which partition each lies in) that takes advantage of this fact. OTOH, maybe it does serve the purpose of letting the user promise that the input graph will be bipartite, so if it isn't and the algorithm blows up, it's not our fault. :-P

> Then for nodes/vertices you'd want to define a number of properties:
>
>     neighbours /* OK, OK, neighbors. Purely in order to save keystroke
>                   time for programmers. Returns a RandomAccessRange to
>                   the collection of nodes that the given node links to.
>                   For directed graphs you might want to add a template
>                   qualifier of "out" (default) or "in" to allow getting
>                   neighbours that this node points to, or neighbours that
>                   point to this node. Should be O(1) [but maybe needs to
>                   be O(d)?] */

Personally, I'd shorten it to ngbrs, but most people here would disagree. :-P

>     edges /* Returns RandomAccessRange of edges that connect to this
>              node. For a directed graph you might want again a
>              template qualifier of "out" (default), "in" or "all".
>              Should be O(1) [but maybe needs to be O(d)?] */

I'm not so sure about providing both in and out neighbours / edges. In some cases this may be very inefficient. I think it may be best simply to adopt the convention that a directed graph will always return out neighbours, and undirected graphs will always return all neighbours. We can always provide an optional method for inverting a graph where this is useful / necessary.

> Again, these should ideally be O(1) to return but are obviously O(d) to traverse where d is the degree of the node. Note that degree (and in/out degree for directed graphs) can be implicitly calculated from the above, but we could add courtesy functions to this extent.
>
> It should also be possible to define arbitrary collections of node properties, such as colour, type, textual labels, numerical properties etc., and to query if a node has them (preferably at compile-time).

Hmm. I'm not sure about requiring such attributes. I'd go for a more generic approach: if an algorithm requires some attribute (say weight), then it could take a compile-time parameter that maps .weight to whatever actual attribute is in the graph node / edge. For example:

    auto findShortestPath(alias weight, G)(G graph, G.node start,
                                           G.node end) { ... }

    struct Graph
    {
        struct Node { ... }
        alias node = Node;
        struct Edge
        {
            double length;
            @property double euclideanDistance();
            ...
        }
    }

    Graph g;
    Graph.Node x, y;
    auto pathByLength = findShortestPath!"length"(g, x, y);
    auto pathByDist = findShortestPath!"euclideanDistance"(g, x, y);

This allows the same graph to be evaluated in multiple ways, as this example shows, without needing to create wrappers or proxy objects. I'm tentatively considering defining the node type within the graph itself (G.node above) as a way of reducing the number of compile-time parameters, but this may not be necessary, and may be limiting in some cases. Have to think about it a bit more.

> You'd also want to define functions on node [node ID?] pairs, such as a function that returns the edge [edge ID?] linking two nodes if they _are_ linked, and null otherwise.

I would evaluate the graph algorithms we want to implement before doing something like this. I think it would be best to try to keep to as minimal an API as possible. Phobos ranges, for example, while very powerful, also suffer a bit from too many possible parameters, leading to a lot of complicated interactions that increase code complexity (hence Jonathan's push for simplifying the range API as much as possible).

> Similarly for edges you might want to define various properties:
>
>     head, tail /* the origin and destination nodes respectively */
>
>     isDirected /* is this a directed link? */
>
>     isLoop /* does this edge connect a node to itself? */
>
>     isWeighted /* does this edge have an associated weight? */
>
>     weight /* ... if it does ... */
>
> And again, you'd want to be able to define arbitrary collections of edge properties besides "weight", and be able to query (preferably at compile-time) if an edge has them.

I think it's better if each algorithm has a specific set of properties it expects to be present, and uses compile-time parameters to map them onto concrete graph properties. Also, I'm not sure about isWeighted... I would expect most graph algorithms to either not care about weights (in which case the entire graph can be weightless) or require weights on all edges (in which case isWeighted is redundant). Or am I ignorant of algorithms that process weights that only exist on some edges but not others?

> OK, those are some VERY provisional thoughts (on a very tentative proposal :-) But I'd be happy to receive feedback, particularly on some of the choices (e.g. should we consider it essential for the ranges in all circumstances to be RandomAccessRanges or will it help to make some of them simpler? Should the ranges be of actual node/edge entities or node/edge IDs? etc.)

I think it would help discussion if we had a list (even if tentative) of algorithms we'd like to implement, along with what each algorithm expects from the graph (weights, colors, directedness, etc.). Then we can evaluate which properties/functions/etc. are really necessary, and which are just icing on the cake that can be omitted with no real drawbacks. I think the more minimal the API the better -- it makes the resulting module easier to understand and use, and makes it more likely that we'll actually finish implementing it. :-P

Anyway, some further thoughts: we could have graph adaptors in the same vein as the range constructors in std.range, which allow one to wrap a graph in another form. E.g., transposeGraph(G) that returns a graph with the sense of its edges reversed, etc.

T

--
Computers shouldn't beep through the keyhole.
May 08, 2013 Re: D graph library
On 05/08/2013 06:12 PM, H. S. Teoh wrote:
> Hmm. Is it necessary to provide random-access ranges? For some graph algorithms that's necessary, but for others it may not be.

Agree that it's debatable. It's probably desirable but not necessary.

>> Note also that both the above automatically define the count of nodes/edges, but we could of course add courtesy functions to give them directly.
>
> Something along the lines of std.range's .hasLength, I guess? Makes sense.

If nodes() and edges() return RandomAccessRanges then they will automatically have the .length property. So you could add a courtesy function like nodecount() or edgecount() that in this case would just wrap nodes.length, but for other cases might calculate the node or edge count in a different way.

>> Finally, a reasonable question is whether they should return ranges of actual nodes or edges, or ranges of node/edge IDs.
>
> I think the distinction is mostly superfluous. One could always wrap IDs in a struct.

Well, I was considering whether making it a range of IDs might be more flexible with respect to underlying data structures. I may be falling into "premature optimization" here. :-)

> Another thought I have is whether we should require a method that enumerates all edges. The number of edges in a graph can grow very quickly w.r.t. the number of nodes, and some graphs may be so complex that the edges can't all be stored in memory at once, but only computed per-node. In such cases, requiring edge enumeration may be counterproductive. I'm inclined towards leaving out edge enumeration unless some specific algorithms require it (even then, we could just make it a requirement for those particular algorithms). I admit my knowledge of graph algorithms is rather limited, but AFAIK none of them require edge enumeration on the entire graph.

Well, if we accept that we're going to be offering the user a range that lists all edges, the underlying code might handle that in many different ways -- just reading from an array, caching input from a database, or whatever else is appropriate. That shouldn't make it difficult to get the overall number of edges -- that's always going to be a stored number of one kind or another, whether it's an array length or a record of the number of fields in a database, or whatever.

> Does it make sense to have a common interface between, say, directed and non-directed graphs? That is, are there algorithms that can operate on both, or do most of the interesting algorithms only work on one or the other? I'm just wondering if it's worth the trouble of doing compile-time (or runtime) checks if we could just have two unrelated graph types.

Reasonable question. As far as I can tell, most graph libraries offer a broadly common interface even though the underlying data structure may vary quite a bit. The practical implementation of those checks might well be simply reading an immutable boolean variable in the graph type.

> Also, I'm not sure about the usefulness of isBipartiteGraph, unless it entails some specific graph method (say a method that returns nodes grouped by which partition each lies in) that takes advantage of this fact. OTOH, maybe it does serve the purpose of letting the user promise that the input graph will be bipartite, so if it isn't and the algorithm blows up, it's not our fault. :-P

Yes, that would be the purpose. The bipartite graph might also be just a different data type.

> Personally, I'd shorten it to ngbrs, but most people here would disagree. :-P

I don't like those kinds of abbreviations in general; I like English. :-P

> I'm not so sure about providing both in and out neighbours / edges. In some cases this may be very inefficient. I think it may be best simply to adopt the convention that a directed graph will always return out neighbours, and undirected graphs will always return all neighbours. We can always provide an optional method for inverting a graph where this is useful / necessary.

Maybe, but in some cases you want to be able to quickly get the in-degree or whatever. Inverting the graph is going to be an expensive method that will require a lot of calculation (and maybe also a lot more memory, if you wind up storing both versions). I'd much rather have the option to get in-links and say, "Hey, in some cases this may be really expensive" than to have no option at all by design.

I'm getting dragged out of the office so will respond to your other points later ... :-P
May 10, 2013 Re: D graph library
On 05/08/2013 06:12 PM, H. S. Teoh wrote:
> On Wed, May 08, 2013 at 02:34:32PM +0200, Joseph Rushton Wakeling wrote:
>> Finally, a reasonable question is whether they should return ranges of actual nodes or edges, or ranges of node/edge IDs.
>
> I think the distinction is mostly superfluous. One could always wrap IDs in a struct:

I'm being daft. Of _course_ the ranges should be of nodes or edges, because you want to be able to do things like

    foreach (n; g.nodes)   // g is the graph ... :-)
    {
        foreach (m; n.neighbours)
        {
            ...
        }
    }

>> It should also be possible to define arbitrary collections of node properties, such as colour, type, textual labels, numerical properties etc., and to query if a node has them (preferably at compile-time).
>
> Hmm. I'm not sure about requiring such attributes. I'd go for a more generic approach: if an algorithm requires some attribute (say weight), then it could take a compile-time parameter that maps .weight to whatever actual attribute is in the graph node / edge. For example:
>
>     auto findShortestPath(alias weight, G)(G graph, G.node start,
>                                            G.node end) { ... }
>
>     struct Graph
>     {
>         struct Node { ... }
>         alias node = Node;
>         struct Edge
>         {
>             double length;
>             @property double euclideanDistance();
>             ...
>         }
>     }
>
>     Graph g;
>     Graph.Node x, y;
>     auto pathByLength = findShortestPath!"length"(g, x, y);
>     auto pathByDist = findShortestPath!"euclideanDistance"(g, x, y);
>
> This allows the same graph to be evaluated in multiple ways, as this example shows, without needing to create wrappers or proxy objects.

I don't think what I was proposing was at odds with what you describe here. For sure it's helpful to be able to map attributes in this way. What I mean is simply that it's helpful to be able to define arbitrary properties (and property names) that will be associated with edges, not that edges should be _obliged_ to have a certain set of associated parameters.

> I'm tentatively considering defining the node type within the graph itself (G.node above) as a way of reducing the number of compile-time parameters, but this may not be necessary, and may be limiting in some cases. Have to think about it a bit more.

Can you expand on this?

> I think it's better if each algorithm has a specific set of properties it expects to be present, and uses compile-time parameters to map them onto concrete graph properties.

Again, I don't think this is in contradiction with what I'm suggesting -- of course individual algorithms should expect and check for certain parameters, and compile-time parameters can be used to map them to the algorithm.

> Also, I'm not sure about isWeighted... I would expect most graph algorithms to either not care about weights (in which case the entire graph can be weightless) or require weights on all edges (in which case isWeighted is redundant). Or am I ignorant of algorithms that process weights that only exist on some edges but not others?

Consider path length -- e.g. a path v1 -> v2 -> v3 has length 2 for an unweighted graph, and length w_{1, 2} + w_{2, 3} for a weighted graph. That in turn has implications for shortest-path-length calculations, and the optimal algorithm to calculate shortest paths thus depends on whether you're dealing with the weighted or unweighted case. All of this in turn impacts on any metrics or algorithms that depend on distances between nodes.
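To put the distinction in code form (a tiny sketch; weightOf is an assumed callable, not part of any proposed API):

    // Unweighted: a path v1 -> v2 -> v3 has length 2 (the number of hops).
    size_t pathLength(Node)(Node[] path)
    {
        return path.length ? path.length - 1 : 0;
    }

    // Weighted: the same path has length w(v1,v2) + w(v2,v3).
    // weightOf is an assumed callable giving the weight of the edge
    // between two consecutive nodes on the path.
    double pathLength(alias weightOf, Node)(Node[] path)
    {
        double total = 0;
        if (path.length < 2) return total;
        foreach (i; 0 .. path.length - 1)
            total += weightOf(path[i], path[i + 1]);
        return total;
    }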
For example, today I'm going to be implementing [*] an algorithm to calculate betweenness centrality: http://www.cs.ucc.ie/~rb4/resources/Brandes.pdf

This does need to be modified, albeit trivially, for the weighted case.

[* Inside some custom-written code for a very particular application, I'm afraid. This is why I want to create a general-purpose graph library, to stop me having to do things like this :-P ]

> I think it would help discussion if we had a list (even if tentative) of algorithms we'd like to implement, along with what each algorithm expects from the graph (weights, colors, directedness, etc.). Then we can evaluate which properties/functions/etc. are really necessary, and which are just icing on the cake that can be omitted with no real drawbacks. I think the more minimal the API the better -- it makes the resulting module easier to understand and use, and makes it more likely that we'll actually finish implementing it. :-P

Agree about a minimal API, but I think what I've described previously is pretty minimal ... :-)

As for algorithms, igraph has a very substantial list: http://igraph.sourceforge.net/doc/html/igraph-Structural.html

We could probably "scratch our own itch" to a degree (ha!) in terms of choosing what to implement, but stuff like shortest-path calculation, centrality measures and so on seem like good starting points.

> Anyway, some further thoughts: we could have graph adaptors in the same vein as the range constructors in std.range, which allow one to wrap a graph in another form. E.g., transposeGraph(G) that returns a graph with the sense of its edges reversed, etc.

Yes, that would be nice. :-)
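On the adaptor idea, a transpose wrapper might look something like the sketch below; it assumes the wrapped graph exposes nodes plus per-node inNeighbours/outNeighbours, which is exactly the design point still under discussion, so treat it as nothing more than an illustration of std.range-style wrapping:

    /* Sketch of a graph adaptor in the spirit of std.range's wrappers.
       It assumes the wrapped graph exposes nodes plus per-node
       inNeighbours/outNeighbours -- an assumption, not a settled API. */
    struct TransposedGraph(G)
    {
        G g;

        enum isDirected = true;

        @property auto nodes() { return g.nodes; }

        // in and out simply swap roles
        auto outNeighbours(Node)(Node n) { return g.inNeighbours(n); }
        auto inNeighbours(Node)(Node n)  { return g.outNeighbours(n); }
    }

    auto transposeGraph(G)(G g)
    {
        return TransposedGraph!G(g);
    }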
May 15, 2013 Re: D graph library
Posted in reply to H. S. Teoh

On Wednesday, 8 May 2013 at 16:14:05 UTC, H. S. Teoh wrote:
> Does it make sense to have a common interface between, say, directed and non-directed graphs? That is, are there algorithms that can operate on both, or do most of the interesting algorithms only work on one or the other? I'm just wondering if it's worth the trouble of doing compile-time (or runtime) checks if we could just have two unrelated graph types.

I suggest building the graphs as structs with policy-based design, possibly having separate policies for how the graphs are stored in memory, possibly not. So the basic graph structure would be like this:

    // Struct so stack allocation becomes possible later.
    struct GraphImplementation(GraphPolicy)
    if (isGraphPolicy!GraphPolicy)
    {
        GraphPolicy graphPolicy = GraphPolicy();
    }

The graph policies could look like this:

    struct DigraphPolicy(Node, Weight)
    {
        // This could be the first naive policy to get the ball rolling.
        // Unit test to verify a few basic algorithms first, optimise later.
        Weight[Node][Node] edgeMap;
        ...
    }

The policies can then offer methods for accessing and mutating the graphs. So that would be stuff like bool addNode(Node), bool removeNode(Node), bool addEdge(Node, Node), NodeRange!Node neighbors(Node), etc. All of the internals of GraphImplementation could then access the very basic things through these methods, and then expose a few to the users by wrapping/aliasing the functions. Perhaps encapsulation + alias this is the way to go here:

    {
        ...
        alias graphPolicy this; // Now we expose addNode(Node), etc.
    }

Then for your final trick, you could throw in a few templates to bring it all together.

    template isGraphPolicy(T) { ... }
    template isSomeGraph(T) { ... }
    template NodeType(SomeGraph) { ... }
    template EdgeType(SomeGraph) { ... }

Most importantly...

    template Digraph(Node, Weight = int)
    {
        // an eponymous alias, so Digraph!char can be used as a type
        alias Digraph = GraphImplementation!(DigraphPolicy!(Node, Weight));
    }

Then we are back to basics:

    auto graph = Digraph!char();
    graph.addEdge('a', 'b'); // bla, bla, bla

I think doing things in this way makes it easy to get the ball rolling, but also easy to add in optimisations later. ("Aha, this is an undirected graph, I can assume this... static if") I think the most important thing is to get the graph basics right, then to worry about how storage works, etc. later.
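For what it's worth, one very rough shape the isGraphPolicy check could take is sketched below; it merely tests for the accessor/mutator methods named above, and the real set of requirements is exactly what would need discussing:

    // Rough sketch only: the required members are just the ones named above.
    template isGraphPolicy(T)
    {
        enum bool isGraphPolicy =
            __traits(hasMember, T, "addNode") &&
            __traits(hasMember, T, "removeNode") &&
            __traits(hasMember, T, "addEdge") &&
            __traits(hasMember, T, "neighbors");
    }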