Thread overview
Re: D graph library
May 11, 2013
H. S. Teoh
May 22, 2013
H. S. Teoh
May 11, 2013
On 05/10/2013 07:52 PM, H. S. Teoh wrote:
> I suppose we can make it an optional thing, such that algorithm A may ask for only an InputRange (and a graph with bidirectional range of nodes will still work) but algorithm B may ask for more, say a ForwardRange or RandomAccessRange. Each algorithm will require the minimum necessary to perform efficiently (though of course it can be special-cased to take advantage of additional features of a higher range, if that is available in a given instantiation).

Yes, algorithms should certainly ask for no more than they need.

> In fact, now that I think of it... most of the features you listed can arguably be optional: only a few algorithms (none that I know of, actually) require enumeration of edges, some algorithms may only require an input range of nodes, each of which gives an input range of edges, etc.. Would it make sense to define a set of graph properties (e.g., via templates ala std.range.is{Input,Forward,Bidirectional,...}Range, or hasLength, hasSwappableElements, etc.), and have each algorithm require whatever minimal subset of them is necessary to do the work? Because it makes little sense to require, say, a graph that returns a total count of nodes, if a given algorithm never needs to use that information anyway. This way, a structure for which the total number of nodes is expensive to compute can still be used with that algorithm.

That's a fair point.  I don't know that I like the idea of defining all those different kinds of checks for different graph attributes, though.

> Not if the graph is computed on-the-fly. E.g. chess analysis applications in which the complete graph is infeasible to compute.

Well, it ought still to be possible to define a range -- albeit, not a random-access one -- that covers all the edges in that graph.  It's just that it'll take forever to finish.

Agree though that this is a case where an edges() property is probably not
desirable.

> In that case, I wonder if it's even necessary to unify the two graph types. Why not just have two separate types and have the type system sort out compatibility with algorithms for us?

That might be a possible solution, but whether directed or undirected (or mixed), a graph will broadly have the same general public interface.

> Are there any algorithms that need to do something different depending on whether the input graph is bipartite or not? Just wondering if it's worth the trouble of introducing an additional field (it may well be necessary -- I admit my knowledge of graph algorithms is rather spotty).

Put it this way: sometimes, you might want to check if a graph _is_ bipartite, that is, if its nodes can be divided into two disjoint sets A and B such that links in the graph are always between nodes in the different sets, never between nodes from the same set.  That could be a runtime test that you could apply to any graph.

Then again, you might deliberately want to construct an explicitly bipartite graph, and you might want to have checks and balances in place to make sure that you don't accidentally add nodes between nodes in the same set.

> Right. Which brings me back to my idea that perhaps these graph properties should be optional, or perhaps we should have some kind of hierarchy of graph types (ala input range, forward range, bidirectional range, etc.), so that individual algorithms can choose which features are mandatory and which are optional.
> 
> (I really dislike algorithms that are artificially restricted just because the author decided that feature X is required, no matter if the algorithm doesn't even use it.)

Fair point.  _Algorithms_ should require the minimal number of constraints. Most of my suggestions for range types etc. represent what I think is useful for a general-purpose graph data type rather than what is absolutely necessary for individual specialized applications.

Must dash again, so will catch up on your other points later ... ! :-)
May 11, 2013
On Sat, May 11, 2013 at 03:18:16PM +0200, Joseph Rushton Wakeling wrote:
> On 05/10/2013 07:52 PM, H. S. Teoh wrote:
[...]
> > In fact, now that I think of it... most of the features you listed can arguably be optional: only a few algorithms (none that I know of, actually) require enumeration of edges, some algorithms may only require an input range of nodes, each of which gives an input range of edges, etc.. Would it make sense to define a set of graph properties (e.g., via templates ala std.range.is{Input,Forward,Bidirectional,...}Range, or hasLength, hasSwappableElements, etc.), and have each algorithm require whatever minimal subset of them is necessary to do the work? Because it makes little sense to require, say, a graph that returns a total count of nodes, if a given algorithm never needs to use that information anyway. This way, a structure for which the total number of nodes is expensive to compute can still be used with that algorithm.
> 
> That's a fair point.  I don't know that I like the idea of defining all those different kinds of checks for different graph attributes, though.

Well, maybe what we can do is to make a list of attributes that are needed by various graph algorithms, and see if there's a way to distill them into a small set of attributes without too many scattered attribute checks. For example, if we can categorize graphs into a small set of types (ala input range, forward range, etc.) plus a small number of optoinal attributes, that would be ideal, I think.


> > Not if the graph is computed on-the-fly. E.g. chess analysis applications in which the complete graph is infeasible to compute.
> 
> Well, it ought still to be possible to define a range -- albeit, not a random-access one -- that covers all the edges in that graph.  It's just that it'll take forever to finish.
> 
> Agree though that this is a case where an edges() property is probably
> not desirable.

Are there actually algorithms that depend on having an .edges property? AFAIK, most iterate over nodes in some way. I can see *nodes* having an .edges property, but I'm not sure about the utility of enumerating edges over the entire graph. Or am I missing something obvious?


> > In that case, I wonder if it's even necessary to unify the two graph types. Why not just have two separate types and have the type system sort out compatibility with algorithms for us?
> 
> That might be a possible solution, but whether directed or undirected (or mixed), a graph will broadly have the same general public interface.

True.


> > Are there any algorithms that need to do something different depending on whether the input graph is bipartite or not? Just wondering if it's worth the trouble of introducing an additional field (it may well be necessary -- I admit my knowledge of graph algorithms is rather spotty).
> 
> Put it this way: sometimes, you might want to check if a graph _is_ bipartite, that is, if its nodes can be divided into two disjoint sets A and B such that links in the graph are always between nodes in the different sets, never between nodes from the same set.  That could be a runtime test that you could apply to any graph.
> 
> Then again, you might deliberately want to construct an explicitly bipartite graph, and you might want to have checks and balances in place to make sure that you don't accidentally add nodes between nodes in the same set.

Hmm. Maybe something similar to std.range.SortedRange would be useful:

	struct BipartiteGraph(G) {
		enum isBipartite = true;
		G impl;
		alias impl this;
	}

	BipartiteGraph!G verifyIsBipartite(G)(G graph) {
		// verifies bipartiteness of G and returns graph wrapped
		// in a BipartiteGraph!G.
	}

	auto createBipartiteGraph(T...)(T graphData) {
		// return an instantiation of BipartiteGraph that's
		// guaranteed to be bipartite by construction.
	}

	auto someBipartiteAlgo(G)(BipartiteGraph!G graph) {
		// freely assume bipartiteness of graph
	}

This way we can make use of the type system to statically ensure bipartiteness, while still allowing runtime verifications of arbitrary graphs.


> > Right. Which brings me back to my idea that perhaps these graph properties should be optional, or perhaps we should have some kind of hierarchy of graph types (ala input range, forward range, bidirectional range, etc.), so that individual algorithms can choose which features are mandatory and which are optional.
> > 
> > (I really dislike algorithms that are artificially restricted just because the author decided that feature X is required, no matter if the algorithm doesn't even use it.)
> 
> Fair point.  _Algorithms_ should require the minimal number of constraints.  Most of my suggestions for range types etc. represent what I think is useful for a general-purpose graph data type rather than what is absolutely necessary for individual specialized applications.

Aha. I think we found the source of contention here. You're thinking about concrete graph types that can be used for general graph applications, whereas I'm thinking of Phobos-style generic graph algorithm *templates* that can process any graph-like type satisfying certain requirements.

In that light, I think it makes sense to provide a set of concrete graph types that people can use (e.g., an adjacency list type, an incidence list type, matrix representations maybe, to list a few commonly-used graph representations), but the algorithms themselves will be generic templates that can work with any concrete type as long as it has the attributes required by the algorithm. The concrete graph types (which may themselves be templates, e.g. to allow arbitary node types or edge labels) will of course satisfy (most of) these requirements by default, so they can be immediately used with the algorithms without further ado. At the same time, the user can decide to create their own graph types, and as long as the target algorithm's prerequisites are satisfied, it can be used with those custom types.


> Must dash again, so will catch up on your other points later ... ! :-)

Actually, I'll be on vacation for a week and a half, with probably spotty internet connectivity, so no need to hurry. :)


--T
May 11, 2013
On 05/11/2013 07:08 PM, H. S. Teoh wrote:
> Well, maybe what we can do is to make a list of attributes that are needed by various graph algorithms, and see if there's a way to distill them into a small set of attributes without too many scattered attribute checks. For example, if we can categorize graphs into a small set of types (ala input range, forward range, etc.) plus a small number of optoinal attributes, that would be ideal, I think.

Sure, we can have a go at this.

> Are there actually algorithms that depend on having an .edges property? AFAIK, most iterate over nodes in some way.

I have written code that iterates over edges (see my "dregs" code on GitHub). In the case of the problem I was addressing, it enabled the most space-efficient storage of the data.

There's no reason why an algorithm should be required to assume the existence of
an edges() property, but for an explicit graph type I think having an edges()
property is generally expected (and is certainly implemented in all the examples
I've looked at).

> Hmm. Maybe something similar to std.range.SortedRange would be useful:
> 
> 	struct BipartiteGraph(G) {
> 		enum isBipartite = true;
> 		G impl;
> 		alias impl this;
> 	}
> 
> 	BipartiteGraph!G verifyIsBipartite(G)(G graph) {
> 		// verifies bipartiteness of G and returns graph wrapped
> 		// in a BipartiteGraph!G.
> 	}
> 
> 	auto createBipartiteGraph(T...)(T graphData) {
> 		// return an instantiation of BipartiteGraph that's
> 		// guaranteed to be bipartite by construction.
> 	}
> 
> 	auto someBipartiteAlgo(G)(BipartiteGraph!G graph) {
> 		// freely assume bipartiteness of graph
> 	}
> 
> This way we can make use of the type system to statically ensure bipartiteness, while still allowing runtime verifications of arbitrary graphs.

Yes, that could be nice.  Bear in mind that the BipartiteGraph type would have to not just have a boolean claiming bipartiteness, but checks to ensure that edges added did not violate the bipartite property.

> Aha. I think we found the source of contention here. You're thinking about concrete graph types that can be used for general graph applications, whereas I'm thinking of Phobos-style generic graph algorithm *templates* that can process any graph-like type satisfying certain requirements.

Yes, that seems to be the main source of our differences of opinion.  I entirely agree with you about the need for generic graph algorithm templates, but we do need graph types as well, and most of my specifications were intended with those in mind.

> In that light, I think it makes sense to provide a set of concrete graph types that people can use (e.g., an adjacency list type, an incidence list type, matrix representations maybe, to list a few commonly-used graph representations), but the algorithms themselves will be generic templates that can work with any concrete type as long as it has the attributes required by the algorithm. The concrete graph types (which may themselves be templates, e.g. to allow arbitary node types or edge labels) will of course satisfy (most of) these requirements by default, so they can be immediately used with the algorithms without further ado. At the same time, the user can decide to create their own graph types, and as long as the target algorithm's prerequisites are satisfied, it can be used with those custom types.

I think we've come to some broad agreement here ... :-)

> Actually, I'll be on vacation for a week and a half, with probably spotty internet connectivity, so no need to hurry. :)

So, have fun, and look forward to catching up with you when you get back :-)

May 22, 2013
Hi, I'm back! :)


On Sat, May 11, 2013 at 07:34:50PM +0200, Joseph Rushton Wakeling wrote:
> On 05/11/2013 07:08 PM, H. S. Teoh wrote:
[...]
> > Aha. I think we found the source of contention here. You're thinking about concrete graph types that can be used for general graph applications, whereas I'm thinking of Phobos-style generic graph algorithm *templates* that can process any graph-like type satisfying certain requirements.
> 
> Yes, that seems to be the main source of our differences of opinion. I entirely agree with you about the need for generic graph algorithm templates, but we do need graph types as well, and most of my specifications were intended with those in mind.
[...]

Agreed. I think your suggestions as described in previous posts are good for concrete graph data types provided by the library.

The algorithms themselves, of course, will obviously be geared toward these data types, but where possible, they should be made generic so that other data types satisfying the required APIs will be usable with them. So I think writing them as templates parametrized on input graph type (with signature constraints to ensure the type is usable with the algorithm) would be the way to go.

And since the algorithms themselves will be templates, if the library exports, say, BipartiteGraph, then that can be a dedicated bipartite graph type that doesn't need to have any relation (though it could if it improves the implementation) to more general graph types. Some optimizations specific to bipartite graphs can be used in the implementation, for example.


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
May 23, 2013
On 05/23/2013 01:55 AM, H. S. Teoh wrote:
> The algorithms themselves, of course, will obviously be geared toward these data types, but where possible, they should be made generic so that other data types satisfying the required APIs will be usable with them. So I think writing them as templates parametrized on input graph type (with signature constraints to ensure the type is usable with the algorithm) would be the way to go.

Another possibility is to have the algorithm extract the preferred data structure from the underlying type, and we could probably do very good things with compile-time constraints here.

For example, as I mentioned in another email, the igraph data type is essentially just a list of links.  Their implementation of betweenness centrality constructs an adjacency list internally in order to do the calculation.  My guess is that there are some tradeoffs to do with scale that make this worthwhile for them.

What _we_ can do, though, is have a compile time check for whether a particular graph data type has an adjacency-list interface or not -- if so, it can be used; if not, an adjacency list can be constructed; and we could also construct "wrapping" data structures that layer on additional interfaces (e.g. the basic graph type might just be a list of links, we could wrap an adjacency list around that, and so on).

> And since the algorithms themselves will be templates, if the library exports, say, BipartiteGraph, then that can be a dedicated bipartite graph type that doesn't need to have any relation (though it could if it improves the implementation) to more general graph types. Some optimizations specific to bipartite graphs can be used in the implementation, for example.

Indeed, and alternatively or additionally we could implement a bipartite graph type as a wrapper around the basic graph model, which might first check that any existing graph is indeed bipartite and then give a bipartite-friendly external API.