May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On 05/15/2013 09:29 AM, w0rp wrote: > I suggest building the graphs as structs with policy based design, possibly having separate policies for how the graphs are stored in memory, possibly not. So the basic graph structure would be like this. What advantage does this have over just defining a few different structs, e.g., struct Graph { // undirected graph } struct Digraph { // directed graph } ... which each define certain expected interfaces such as nodes(), edges(), etc.? Not objecting, just curious. :-) > The policies can then offer methods for accessing and mutating the graphs. So that would be stuff like bool addNode(Node), bool removeNode(Node), bool addEdge(Node, Node), NodeRange!Node neighbors(Node), etc. All of the internals of GraphImplementation could then access the very basic things through these methods, and then expose a few to the users by wrapping/aliasing the functions. Perhaps encapsluation + alias this is the way to go here. ... why returning bool? Again, not objecting, just curious. > I think doing things in this way makes it easy to get the ball rolling, but also easy to add in optimisations later. ("Aha, this is an undirected graph, I can assume this... static if") I think the most important thing is to get the graph basics right, then to worry about how storage works, etc. later. I kind of agree, my inclination is that we need to be able to work out what are the interfaces we need in order for different algorithms to work, the underlying data structures can be refined later. Just as an example, take a look at this piece of code in igraph, which implements Brandes' algorithm for betweenness centrality: https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1714 (It's OK, the algorithm is public and any D implementation will look nothing like this, so you're not really killing your clean-room opportunities by glancing through this:-) It's a bit horrible to read, with lots of the inevitable C boilerplate, lots of macros, lots of custom data-structures etc., but in terms of the graph, what it really boils down to is just two things, line 1720: https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1720 ... which calls a function returning the number of vertices, and: https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1833 ... where they ask for the neighbours of a given node. Technically it's a little bit more complex than that (e.g. this function allocates and creates from scratch an adjacency-list representation of the whole graph, which might conceivably be expensive both memory-wise and time-wise) but you get the general picture -- there's actually a relatively small number of things you need to be able to get from a graph in order for different algorithms to work effectively. Their _actual_ underlying graph data structure (OK, OK, the room is starting to look dirty...) is incredibly simple -- "from" and "to" arrays of vertex IDs, which define the start- and endpoints of edges; and a few extra things built around that to facilitate calculating other graph quantities. And their actual definition of "directed" is literally a bool. If you think about that in D terms you might start with arrays from and to, and e.g. edges() would return lockstep(from, to). I suspect we could also do many nice things with D based around those fundamental structures. |
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
On 05/15/2013 07:21 PM, Joseph Rushton Wakeling wrote:
> If you think about that in D terms you might start with arrays from and to, and
> e.g. edges() would return lockstep(from, to).
... actually, probably better to use zip(from, to). Better still if there's
some way to combine this with labels like "from" and "to" (or "head" and "tail")
so that the user can do something like,
foreach(e; g.edge)
writeln(e.head, "\t", e.tail);
|
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Wednesday, 15 May 2013 at 17:21:24 UTC, Joseph Rushton Wakeling wrote: > On 05/15/2013 09:29 AM, w0rp wrote: >> I suggest building the graphs as structs with policy based design, possibly >> having separate policies for how the graphs are stored in memory, possibly not. >> So the basic graph structure would be like this. > > What advantage does this have over just defining a few different structs, e.g., > > struct Graph { > // undirected graph > } > > struct Digraph { > // directed graph > } > > ... which each define certain expected interfaces such as nodes(), edges(), etc.? > > Not objecting, just curious. :-) There are two advantages. 1. Some functions for undirected and directed graphs are either identical or nearly identical. So you can write a single implementation with different policies. 2. Storage implementation can be interchangeable, as you change the storage for a graph by changing its policy. Point 1 could be equally addressed by writing free template functions for the graphs. (So two structs and free functions work work there.) Point 2 is more interesting because a different method of storage (stack vs heap, nested map vs some array combination) implies a different addNode(Node) function, and so on. So the policies let you swap the underlying implementation out via a template parameter without having to write another struct to deal with it. >> The policies can then offer methods for accessing and mutating the graphs. So >> that would be stuff like bool addNode(Node), bool removeNode(Node), bool >> addEdge(Node, Node), NodeRange!Node neighbors(Node), etc. All of the internals >> of GraphImplementation could then access the very basic things through these >> methods, and then expose a few to the users by wrapping/aliasing the functions. >> Perhaps encapsluation + alias this is the way to go here. > > ... why returning bool? Again, not objecting, just curious. bool is a nice convenience for "did this change anything?" So you return false when addNode actually had no effect, say when the node was already in the graph. This idea is basically borrowed from Java collections. > If you think about that in D terms you might start with arrays from and to, and > e.g. edges() would return lockstep(from, to). I suspect we could also do many > nice things with D based around those fundamental structures. I think writing functions that return ranges for the graphs (vertices, edges, neighbours, etc.) and building the algorithms based on these ranges is the right way to go. |
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On 05/15/2013 07:40 PM, w0rp wrote: > There are two advantages. > > 1. Some functions for undirected and directed graphs are either identical or > nearly identical. So you can write a single implementation with different policies. > 2. Storage implementation can be interchangeable, as you change the storage for > a graph by changing its policy. > > Point 1 could be equally addressed by writing free template functions for the graphs. (So two structs and free functions work work there.) Point 2 is more interesting because a different method of storage (stack vs heap, nested map vs some array combination) implies a different addNode(Node) function, and so on. So the policies let you swap the underlying implementation out via a template parameter without having to write another struct to deal with it. Point 1 I'm not so sure matters for the reasons you cite. On the other hand point 2 seems quite relevant. For example, you might prefer to store your data as an edge list (à la igraph), as an adjacency list, or other functions, because depending on your needs the information retrieval might be quicker. Talking of adjacency lists, now I'm trying to think of a clever method for deriving an adjacency list from an edge list, using slices ... > bool is a nice convenience for "did this change anything?" So you return false when addNode actually had no effect, say when the node was already in the graph. This idea is basically borrowed from Java collections. Maybe I'm being harsh, but I was assuming you might actually throw an error if someone tries to add an edge that already exists (although perhaps you'd want to avoid throwing in order not to hit performance). > I think writing functions that return ranges for the graphs (vertices, edges, neighbours, etc.) and building the algorithms based on these ranges is the right way to go. Yep, me too. |
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Wednesday, 15 May 2013 at 17:59:01 UTC, Joseph Rushton Wakeling wrote:
> Maybe I'm being harsh, but I was assuming you might actually throw an error if
> someone tries to add an edge that already exists (although perhaps you'd want to
> avoid throwing in order not to hit performance).
I don't think that's too useful or necessary. A graph is at its heart a pair (V, E) where V is the vertex set and E is the edge set. So addNode is really just V' = V ∪ {n} for some node n. I think it's best to just allow redundantly adding a node, which ought to be an O(1) function.
|
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On the subject of possibly making addNode nothrow, I think it may be difficult or impossible at current to make the Graph functions pure, which I think is rather unfortunate. At my last try in D, I backed my graphs with Weight[Node][Node] exactly as mentioned before, but this ruined the purity. This was all because the properties for associative arrays are not pure, which is one of my top change requests for D. I really would like to see a graph type that is usable inside pure functions. |
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On 05/15/2013 08:28 PM, w0rp wrote:
> I don't think that's too useful or necessary. A graph is at its heart a pair (V, E) where V is the vertex set and E is the edge set. So addNode is really just V' = V ∪ {n} for some node n. I think it's best to just allow redundantly adding a node, which ought to be an O(1) function.
In the same way as one could add a word to a dictionary, say?
Consider that it's possible to have multigraphs where there _can_ be multiple edges from one node to another. You might well want to explicitly enforce the user's awareness of error if they try adding multiple edges to the wrong type of graph.
|
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to w0rp | On 05/15/2013 08:34 PM, w0rp wrote:
> On the subject of possibly making addNode nothrow, I think it may be difficult or impossible at current to make the Graph functions pure, which I think is rather unfortunate. At my last try in D, I backed my graphs with Weight[Node][Node] exactly as mentioned before, but this ruined the purity. This was all because the properties for associative arrays are not pure, which is one of my top change requests for D. I really would like to see a graph type that is usable inside pure functions.
I don't think that associative arrays are the right storage class anyway. My feeling is that if you're going to have public labels for nodes, the best way to do that is to have an associative array mapping those labels to underlying node IDs (and back again), and for the labels to not be used at all internally.
|
May 15, 2013 Re: D graph library | ||||
---|---|---|---|---|
| ||||
Posted in reply to Joseph Rushton Wakeling | On Wednesday, 15 May 2013 at 18:38:59 UTC, Joseph Rushton Wakeling wrote: > On 05/15/2013 08:28 PM, w0rp wrote: >> I don't think that's too useful or necessary. A graph is at its heart a pair (V, >> E) where V is the vertex set and E is the edge set. So addNode is really just V' >> = V ∪ {n} for some node n. I think it's best to just allow redundantly adding a >> node, which ought to be an O(1) function. > > In the same way as one could add a word to a dictionary, say? > > Consider that it's possible to have multigraphs where there _can_ be multiple > edges from one node to another. You might well want to explicitly enforce the > user's awareness of error if they try adding multiple edges to the wrong type of > graph. I have considered 'multigraphs,' which is kind of like having an STL multimap in my mind. Having policies again makes the implementation a little easier. I would prefer to not have addNode throw exceptions when applied redundantly. Not throwing is very familiar behaviour if you think in terms of set theory. Then you can have a separate function to support runtime checks to make sure you don't trip over yourself when really don't want to add more than once. void addNodeOnce(SomeGraph)(SomeGraph graph, NodeType!SomeGraph node) if (isSomeGraph!SomeGraph) in { assert(!graph.hasNode(node)); } body { graph.addNode(node); } out { // Let's go all the way with this. assert(graph.hasNode(node)); } Now you have graph.addNodeOnce(node). |
Copyright © 1999-2021 by the D Language Foundation