Thread overview
Kuhn-Munkres Algorithm (a.k.a. The Hungarian Algorithm)
Feb 19, 2021
Sameer Pradhan
Feb 19, 2021
Max Haughton
Feb 20, 2021
anonymous
Feb 25, 2021
Sameer Pradhan
Feb 25, 2021
anonymous
Feb 20, 2021
drug
Feb 25, 2021
Sameer Pradhan
Feb 25, 2021
bachmeier
Feb 26, 2021
drug
February 19, 2021
Dear D-ers,

I wanted to port a currently Perl code used for scoring an NLP task to D.  A few folks have reimplemented it in Python but I would like to create a D implementation.

The catch is that each of the above two routines relies on a library or module that implements the Kuhn-Munkres assignment algorithm (https://en.wikipedia.org/wiki/Hungarian_algorithm).  Both the Perl and Python implementations are based on the the following (likely more general version the algorithm—I believe one that handles rectangular matrices)

http://csclab.murraystate.edu/~bob.pilgrim/445/munkres.html

I did a quick search to find out whether there already exists a D library that implements this routine.   I did not find any library satisfying this requirement.  So, it is likely that I would have to write it myself. My assumption is that it won't be very complicated, but for the most efficient implementation I would probably need to use the one of the D libraries that do NumPy style, efficient computation over matrices.  OR, I could find a implementation in C and create a binding for it in D.  I am not sure what is the easiest and least effort path that would ensure an accurate and optimal (in terms of speed) solution.

I have a feeling that someone one on this forum might have more information that could shed some more light on the best path to take.


--
Sameer

February 19, 2021
On Friday, 19 February 2021 at 20:53:33 UTC, Sameer Pradhan wrote:
> Dear D-ers,
>
> I wanted to port a currently Perl code used for scoring an NLP task to D.  A few folks have reimplemented it in Python but I would like to create a D implementation.
>
> [...]

I'm not familiar with this algorithm, but the Mir libraries have fast matrix routines so worth taking a look at those.
February 20, 2021
On Friday, 19 February 2021 at 20:53:33 UTC, Sameer Pradhan wrote:

> I have a feeling that someone one on this forum might have more information that could shed some more light on the best path to take.

I'm solving the linear assignment problem (LAP), but not in D. Therefore I can't share code...

Some general notes:
* For larger problems
R. Jonker and A. Volgenant, "A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems," Computing, vol. 38, pp. 325-340, 1987.
  is often faster than the hungarian method. Scipy switched their implementation lately.
* you can use single commodity min-cost-flow algorithms for solving the LAP. In my experience lemon::NetworkSimplex [1] (written in C++ -> some glue code for a template free API needed) beats a reasonably good implementation of the hungarian method by a factor of 5-10.
   * min-cost flow can handle non-square matrices.
   * I have no implementation of the Jonker/Volgenant to compare to.



[1] http://lemon.cs.elte.hu/pub/doc/1.3.1/a00276.html
February 20, 2021
On 2/19/21 11:53 PM, Sameer Pradhan wrote:
> Dear D-ers,
> 
> I wanted to port a currently Perl code used for scoring an NLP task to D.  A few folks have reimplemented it in Python but I would like to create a D implementation.
> 
> The catch is that each of the above two routines relies on a library or module that implements the Kuhn-Munkres assignment algorithm (https://en.wikipedia.org/wiki/Hungarian_algorithm). Both the Perl and Python implementations are based on the the following (likely more general version the algorithm—I believe one that handles rectangular matrices)
> 
> http://csclab.murraystate.edu/~bob.pilgrim/445/munkres.html
> 
> I did a quick search to find out whether there already exists a D library that implements this routine.   I did not find any library satisfying this requirement.  So, it is likely that I would have to write it myself. My assumption is that it won't be very complicated, but for the most efficient implementation I would probably need to use the one of the D libraries that do NumPy style, efficient computation over matrices.  OR, I could find a implementation in C and create a binding for it in D.  I am not sure what is the easiest and least effort path that would ensure an accurate and optimal (in terms of speed) solution.
> 
> I have a feeling that someone one on this forum might have more information that could shed some more light on the best path to take.
> 
> 
> -- 
> Sameer
> 

As I understand all you need for efficient Hungarian algorithm implementation is a contiguous 2D array, not jagged one like built-in 2D array in D. All other operations (like conditional modification of a matrix element) are too trivial/specific to be highly optimized in 3rd party libraries (in contrast to finding the inverse of a matrix, for example). Also an operation like row modification should be well optimized by a compiler (ldc/gdc first of all). I would just take mir.ndslice to get a contiguous dynamic 2D array and implement the algorithm like in example you mentioned above. Or if your matrix has fixed size you can use gfm.math.Matrix for contiguous static arrays. gfm is less advanced than Mir because intended for 3D games first of all, but it can be much easier to use. Also gfm has only dependencies.
February 25, 2021
On Saturday, 20 February 2021 at 10:35:27 UTC, drug wrote:
> On 2/19/21 11:53 PM, Sameer Pradhan wrote:
>> Dear D-ers,
>> 
>> I wanted to port a currently Perl code used for scoring an NLP task to D.  A few folks have reimplemented it in Python but I would like to create a D implementation.
>> 
>> The catch is that each of the above two routines relies on a library or module that implements the Kuhn-Munkres assignment algorithm (https://en.wikipedia.org/wiki/Hungarian_algorithm). Both the Perl and Python implementations are based on the the following (likely more general version the algorithm—I believe one that handles rectangular matrices)
>> 
>> http://csclab.murraystate.edu/~bob.pilgrim/445/munkres.html
>> 
>> I did a quick search to find out whether there already exists a D library that implements this routine.   I did not find any library satisfying this requirement.  So, it is likely that I would have to write it myself. My assumption is that it won't be very complicated, but for the most efficient implementation I would probably need to use the one of the D libraries that do NumPy style, efficient computation over matrices.  OR, I could find a implementation in C and create a binding for it in D.  I am not sure what is the easiest and least effort path that would ensure an accurate and optimal (in terms of speed) solution.
>> 


Thanks for your responses. I had one other related question: I also found a C-implementation of the algorithm. I have never written a Dlang binding for a C library, but since D has binary compatibility with a C library/object file, would it be easier to write a wrapper for this C library in D? Or there can be some gotchas that I need to keep in mind if I take this path?

--
Sameer

February 25, 2021
On Saturday, 20 February 2021 at 07:50:40 UTC, anonymous wrote:
> * For larger problems
> R. Jonker and A. Volgenant, "A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems," Computing, vol. 38, pp. 325-340, 1987.
>   is often faster than the hungarian method. Scipy switched their implementation lately.
> * you can use single commodity min-cost-flow algorithms for solving the LAP. In my experience lemon::NetworkSimplex [1] (written in C++ -> some glue code for a template free API needed) beats a reasonably good implementation of the hungarian method by a factor of 5-10.
>    * min-cost flow can handle non-square matrices.
>    * I have no implementation of the Jonker/Volgenant to compare to.
> [1] http://lemon.cs.elte.hu/pub/doc/1.3.1/a00276.html

Thanks for these details! It is interesting that an algorithm from 1987 is valid and potentially 5-10x faster than the hungarian method. Sorry for my ignorance in this area, but do both algorithms provide optimal solutions, or would there be some compromise in quality of the solutions for the Jonker vs Hungarian method?

February 25, 2021
On Thursday, 25 February 2021 at 20:35:36 UTC, Sameer Pradhan wrote:

> Thanks for your responses. I had one other related question: I also found a C-implementation of the algorithm. I have never written a Dlang binding for a C library, but since D has binary compatibility with a C library/object file, would it be easier to write a wrapper for this C library in D? Or there can be some gotchas that I need to keep in mind if I take this path?

If you have a working implementation, save yourself the time and call the C code from D. You should not need to write bindings, you should use dstep to write a binding for you

https://github.com/jacob-carlborg/dstep

or dpp to include the header file directly in your D code

https://github.com/atilaneves/dpp

Once you get that working, you can create a wrapper with better functionality if it's worth your time to do so. I typically just call the C functions as they're written.
February 25, 2021
On Thursday, 25 February 2021 at 20:40:37 UTC, Sameer Pradhan wrote:
> Thanks for these details! It is interesting that an algorithm from 1987 is valid and potentially 5-10x faster than the hungarian method. Sorry for my ignorance in this area, but do both algorithms provide optimal solutions, or would there be some compromise in quality of the solutions for the Jonker vs Hungarian method?

well, there are at least 3 approaches to the linear assignment problem
1) the hungarian method (from the 1950s)
2) Jonker / Volgenant algorithm
3) reduction to min-cost flow. There are several algorithms for min-cost flow. One of them is the network simplex method.

all are exact, i.e. return solutions with the same costs/objective value. Note that there might be more than one optimal solution.

In theory one could also use a general purpose LP solver (any basic solution will be integer), but even the (expensive) commercial packages are slower than a reasonable implementation of the 3 approaches above.

As mentioned before, I have no practical experience with 2) as the license terms of the original Pascal implementation in  the paper are somewhat unclear (and therefore of all literal translations to other programming languages).
The 5-10x speedup is between a reasonable (school-bookish) implementation of the hungarian method I can not share and lemon::NetworkSimplex (-> option 3) - but lemon is a template heavy C++ library, so interfacing it to D will require some C++ programming.
February 26, 2021
On 2/25/21 11:35 PM, Sameer Pradhan wrote:
> 
> 
> Thanks for your responses. I had one other related question: I also found a C-implementation of the algorithm. I have never written a Dlang binding for a C library, but since D has binary compatibility with a C library/object file, would it be easier to write a wrapper for this C library in D? Or there can be some gotchas that I need to keep in mind if I take this path?
> 
> -- 
> Sameer
> 

Bindings would be the easier way. It is really trivial, imho, no gotchas at all. Also you can ask a question here.