wordladder - code improvement

Jan 30, 2020

mark

Jan 31, 2020

dwdv

Jan 31, 2020

mark

Jan 31, 2020

dwdv

Jan 31, 2020

mark

January 30, 2020

wordladder - code improvement

Posted by mark

Permalink

mark

Permalink

Below is a program that produces a wordladder.

The algorithm is probably suboptimal, but I don't care since I've implemented the same one in Python, Rust, Go, Java, and Nim, so I find it useful for language comparison purposes.

What I'd like some feedback on is how to improve the code (keeping the algorithm the same) to make it into more idiomatic D and to take the most advantage of D's features (while keeping it as understandable as possible).

For example, in the last function, compatibleWords(), I can't help thinking that I could avoid the foreach loop.

Also I'm still not really clear on the appropriateness of const/immutable/in in function arguments. The docs seem to discourage in, and other things I've read seem to favour const. Similarly, the appropriateness of const/immutable inside functions.

// wordladder.d
import core.time: MonoTime;
import std.algorithm: any, count, filter, map, sum, until;
import std.array: array, join;
import std.conv: dtext, to;
import std.functional: not;
import std.range: assocArray, repeat, zip;
import std.random: choice;
import std.stdio: File, write, writeln;
import std.uni: isAlpha, toUpper;

enum WORDFILE = "/usr/share/hunspell/en_GB.dic";
enum WORDSIZE = 4; // Should be even
enum STEPS = WORDSIZE;

alias WordList = string[];
alias WordSet = int[string]; // key = word; value = 0

void main() {
    immutable start = MonoTime.currTime;
    auto words = getWords(WORDFILE, WORDSIZE);
    int count = 1;
    WordList ladder;
    write("Try ");
    while (true) {
	write('.');
	ladder = generate(words, STEPS);
	if (ladder.length != 0)
	    break;
	++count;
    }
    writeln(' ', count);
    writeln(join(ladder, '\n'));
    writeln(MonoTime.currTime - start);
}

WordSet getWords(const string filename, const int wordsize) {
    return File(filename).byLine
	.map!(line => line.until!(not!isAlpha))
	.filter!(word => word.count == wordsize)
	.map!(word => word.to!string.toUpper)
	.assocArray(0.repeat);
}

WordList generate(WordSet allWords, const int steps) {
    WordList ladder;
    auto words = allWords.dup;
    auto compatibles = allWords.dup;
    auto prev = update(ladder, words, compatibles);
    for (int i = 0; i <= steps; ++i) {
	compatibles = compatibleWords(prev, words);
	if (compatibles.length == 0)
	    return [];
	prev = update(ladder, words, compatibles);
    }
    immutable first = dtext(ladder[0]);
    immutable last = dtext(ladder[$ - 1]);
    if (any!(t => t[0] == t[1])(zip(first, last)))
	return []; // Don't accept any vertical letters in common
    return ladder;
}

string update(ref WordList ladder, ref WordSet words,
	      const WordSet compatibles) {
    immutable word = compatibles.byKey.array.choice;
    ladder ~= word;
    words.remove(word);
    return word;
}

// Add words that are 1 letter different to prev
WordSet compatibleWords(const string prev, const WordSet words) {
    WordSet compatibles;
    immutable prevChars = dtext(prev);
    immutable size = prevChars.length - 1;
    foreach (word; words.byKey)
	if (sum(map!(t => int(t[0] == t[1]))
		     (zip(prevChars, dtext(word)))) == size)
	    compatibles[word] = 0;
    return compatibles;
}

From one noob to another: Not much of a difference, but levenshteinDistance seems to be a good fit here. About as fast as your solution, slightly lower memory usage. byCodeUnit/byChar might shave off a few more ms. For small scripts like these I'm usually not even bothering with const correctness and selective imports. There's also import std; but that might lead to ambiguous imports. --- import std.algorithm, std.array, std.conv, std.datetime.stopwatch, std.functional, std.random, std.range, std.stdio, std.uni; enum WORDFILE = "/usr/share/hunspell/en_GB.dic"; enum WORDSIZE = 4; enum STEPS = WORDSIZE; // could be verified at compile-time static assert(WORDSIZE % 2 == 0, "WORDSIZE should be even"); alias WordList = string[]; alias WordSet = bool[string]; void main() { auto sw = StopWatch(AutoStart.yes); auto words = getWords(WORDFILE, WORDSIZE); // would be slicker with proper destructuring support auto res = generate!(() => genLadder(words.dup, STEPS)) .enumerate(1) .find!(a => !a[1].empty) .front; writeln("tries: ", res[0]); res[1].each!writeln; writeln(sw.peek); // or sw.peek.total!"msecs" } WordSet getWords(string filename, uint wordsize) { return File(filename).byLine .map!(line => line.until!(not!isAlpha)) .filter!(word => word.count == wordsize) .map!(word => word.to!string.toUpper) .assocArray(true.repeat); } WordList genLadder(WordSet words, uint steps) { WordList ladder; string update(WordList comps) { ladder ~= comps.choice; words.remove(ladder.back); return ladder.back; } WordList compWords(string prev) { return words.byKey.filter!(word => levenshteinDistance(word, prev) == 1).array; } auto prev = update(words.byKey.array); foreach (comps; generate!(() => compWords(prev)).take(steps + 1)) { if (comps.empty) return comps; prev = update(comps); } if (levenshteinDistance(ladder.front, ladder.back) < WORDSIZE) return []; return ladder; }

Thanks for your implementation. I can't use the levenshtien distance because although it is a better solution, I want to keep the implementation as compatible with those in the other languages as possible. Your main() is much shorter than mine, but doesn't match the behaviour (and again I want to keep the same as the other implementations). Nonetheless, I didn't know about generate!()(), so I've learnt from yours. Nor did I know about the StopWatch which I'm now using, plus I've added your static assert, and also copied your idea of making update() an inner function (and now only needing one parameter). I still need to study your genLadder() more carefully to understand it because it is so compact.

I forgot to mention: I know it isn't worth bothering with const/immutable for this tiny example. But I want to learn how to write large D programs, so I need to get into the right habits and know the right things.

On 2020-01-31 09:44, mark via Digitalmars-d-learn wrote: > I can't use the levenshtien distance because although it is a better solution, [...] Nah, it isn't, sorry for the noise, should have slept before sending the message, was thinking of hamming distance: auto a = "abcd"; auto b = "bcda"; auto hammingDistance(string a, string b) { return zip(a, b).count!(t => t[0] != t[1]); } levenshteinDistance(a, b).writeln; // => 2 hammingDistance(a, b).writeln; // => 4 > main() [...] doesn't match the behaviour There's also std.range.tee if you ever need to perform additional side-effects in a range pipeline, which restores the original dot printing: write("Try "); auto res = generate!(() => genLadder(words.dup, STEPS)) .enumerate(1) .tee!(_ => write('.'), No.pipeOnPop) // No.pipeOnPop ensures we're not one dot short .find!(a => !a[1].empty) .front; writeln(" ", res[0]); > genLadder() [...] is so compact. Thinking about it, you could even eliminate the prev variable all together when using ladder.back in compWords. Regarding const correctness, this article and thread might contain useful information: https://forum.dlang.org/thread/2735451.YHZktzbKJo@lyonel By the way, keep the documentation improvements coming, much appreciated!

Forums