Howdy all :)
We recently encountered an interesting issue during development of our automated packaging tool (`boulder new`).
The computation time for levenshteinDifference severely impeded performance when scanning LICENSE files and matching them against the SPDX data set.
As an experiment, we've begun porting Python's difflib to D and it can be found here:
It's a bit ugly internally right now but it does implement the bare basics, i.e.
SequenceMatcher with the following APIs:
- `longestMatch` - Return the longest match between two sequences
- `matchingBlocks` - Return Match of all encountered matches
- `ratio` - Similarity between two sequences, 0.0f-1.0f
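
For anyone who hasn't used the Python original, here is a rough sketch of the equivalent calls in difflib itself. The three D names above presumably correspond to `find_longest_match`, `get_matching_blocks` and `ratio`, but that mapping is an assumption, and the `compare` helper and the sample strings are made up purely for illustration:

```python
from difflib import SequenceMatcher


def compare(a, b):
    """Hypothetical helper: run the three difflib calls on two sequences."""
    # autojunk=False turns off difflib's popular-element heuristic, which
    # would otherwise kick in on sequences of 200+ items (e.g. license texts).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Longest contiguous matching block within the full range of both inputs.
    longest = matcher.find_longest_match(0, len(a), 0, len(b))
    # All matching blocks in order; the last entry is a zero-length sentinel.
    blocks = matcher.get_matching_blocks()
    # Similarity in the range 0.0 to 1.0: 2 * matched / (len(a) + len(b)).
    return longest, blocks, matcher.ratio()


if __name__ == "__main__":
    candidate = "Permission is hereby granted, free of charge, to any person"
    reference = "Permission is granted, free of charge, to any person"
    longest, blocks, score = compare(candidate, reference)
    print(longest)
    print(blocks)
    print(round(score, 3))
```

For license matching, the idea would be to compute `ratio` against each candidate text and keep the highest score.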
Anyway, thought it may come in handy for people in future, so if there's interest
we can flesh it out beyond our own use case. :)