Posted by H. S. Teoh
While writing a tool for dissecting various file formats, I found a useful coding pattern that makes D code cleaner, more modular, and easier to unittest.
Initially, I wrote a parser module that directly accessed std.stdio.File to parse file contents. Pretty standard approach, but it had the disadvantage of needing all sorts of shenanigans around rawRead: setting up buffers, managing buffers, etc. This obscured an otherwise straightforward parser structure and made the code hard to maintain and hard to unittest.
So I came up with an idea to abstract file contents as a random-access range of ubyte with lazy loading, so that I can rewrite file parsing code in the nice, comfortable syntax of array slices without incurring the cost of actually loading the entire file into a ubyte[] buffer.
So I started a new module for wrapping std.stdio.File with a random-access range API. My initial attempt looked something like this:
module fileslice;
import std.stdio;
struct FileSlice {
    File f;
    ... // range API
}
Seems straightforward enough, and helps tidy up the code by localizing buffer management to a single place. But again, the dependency on std.stdio.File made it cumbersome to write unittests. It's better than before, because now I don't have to set up buffers and call rawRead directly, but I'd still have to create temporary files just to be able to test various parts of the code. It's ugly that unittests would have side-effects on the filesystem. Plus, I couldn't test error handling directly because I can't insert artificial I/O errors into std.stdio.File (at least, not without truly revolting system-dependent hacks).
My first idea was to change the above code to this:
module fileslice;
import std.stdio;
struct FileSliceImpl(F) {
    F f; // N.B.: concrete File type abstracted away
    ... // range API
}
// Preserve compatibility with existing code.
alias FileSlice = FileSliceImpl!File;
This allows me to substitute std.stdio.File with mock file objects designed to test various aspects of FileSliceImpl, for example:
unittest {
    struct MockFile {
        ubyte[] data = [1,2,3]; // prebaked data
        void seek(long offset) { ... }
        ubyte[] rawRead(ubyte[] buf) { ... }
        ...
    }
    FileSliceImpl!MockFile s;
    // ... do various tests on s
}
This allowed me to unittest FileSliceImpl without actually writing anything to the filesystem, and to prebake various test data into the code as array literals.
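To make the elided parts concrete, here is a minimal sketch of what such a range API and mock files might look like. Everything not shown in the snippets above is an assumption made for illustration: the block-caching scheme, the blockSize constant, the requirement that F also provide size, the reads counter, and the FailingFile type are all hypothetical, not the original implementation.

```d
import std.exception : enforce;
import std.range.primitives;

// Sketch only: F is any type offering size, seek(long) and rawRead(ubyte[]).
struct FileSliceImpl(F)
{
    private F f;
    private size_t start, end;  // window [start, end) into the file
    private ubyte[] buf;        // most recently loaded block
    private size_t bufStart;    // file offset of buf[0]
    enum blockSize = 4;         // tiny on purpose, so tests cross block boundaries

    this(F file)
    {
        f = file;
        end = cast(size_t) f.size;
    }

    @property bool empty() const { return start >= end; }
    @property size_t length() const { return end - start; }
    alias opDollar = length;

    ubyte opIndex(size_t i)
    {
        assert(i < length, "index out of range");
        immutable pos = start + i;
        if (pos < bufStart || pos >= bufStart + buf.length)
        {
            // Cache miss: load the block containing pos. A fresh buffer is
            // allocated each time so copies of this slice keep a valid cache.
            immutable blockStart = pos - pos % blockSize;
            f.seek(blockStart);
            auto loaded = f.rawRead(new ubyte[blockSize]);
            enforce(pos - blockStart < loaded.length, "short read");
            buf = loaded;
            bufStart = blockStart;
        }
        return buf[pos - bufStart];
    }

    typeof(this) opSlice(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= length, "slice out of range");
        auto r = this;
        r.end = start + hi;
        r.start = start + lo;
        return r;
    }

    @property ubyte front() { return this[0]; }
    @property ubyte back() { return this[length - 1]; }
    void popFront() { ++start; }
    void popBack() { --end; }
    @property typeof(this) save() { return this; }
}

// Hypothetical mock: serves prebaked bytes and counts rawRead calls.
struct MockFile
{
    ubyte[] data;
    size_t pos;
    size_t reads;

    @property ulong size() const { return data.length; }
    void seek(long offset) { pos = cast(size_t) offset; }
    ubyte[] rawRead(ubyte[] dst)
    {
        ++reads;
        immutable avail = pos < data.length ? data.length - pos : 0;
        immutable n = dst.length < avail ? dst.length : avail;
        dst[0 .. n] = data[pos .. pos + n];
        pos += n;
        return dst[0 .. n];
    }
}

// Hypothetical mock that injects an I/O error on every read.
struct FailingFile
{
    @property ulong size() const { return 8; }
    void seek(long) {}
    ubyte[] rawRead(ubyte[]) { throw new Exception("simulated I/O error"); }
}

unittest
{
    import std.algorithm.comparison : equal;
    import std.exception : assertThrown;

    alias S = FileSliceImpl!MockFile;
    static assert(isRandomAccessRange!S && hasSlicing!S);

    ubyte[] data = [10, 20, 30, 40, 50, 60, 70, 80, 90];
    auto s = S(MockFile(data));
    assert(s.length == 9 && s.f.reads == 0);  // nothing loaded yet
    assert(s[5] == 60 && s.f.reads == 1);     // first access loads one block
    assert(s[6] == 70 && s.f.reads == 1);     // same block, served from cache
    assert(s[2 .. 5].equal(data[2 .. 5]));    // slice spanning two blocks
    assert(s[$ - 1] == 90);

    auto bad = FileSliceImpl!FailingFile(FailingFile());
    assertThrown!Exception(bad[0]);           // error path, no filesystem needed
}
```

Something like dmd -unittest -main -run on this file should exercise the tests. Since std.stdio.File also offers size, seek and rawRead, the same sketch should in principle instantiate with File, though I have only reasoned about it against the mocks here.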
Coming back to the parser module, I now had something like this:
module parser;
import fileslice;
...
auto parseFile(FileSlice input) {
    ... // use nice input[x .. y] syntax, yay!
    return result;
}
This was nice, but the unittests would need to create FileSlice instances of various sorts, possibly by manually instantiating FileSliceImpl with various mock files to be able to use prebaked ubyte[] data as test cases. It didn't taste right: that's too much coupling between the parser and fileslice modules. FileSliceImpl really shouldn't be visible outside the fileslice module.
But then inspiration struck: the only thing parseFile() *really* depended on was array-slicing syntax in the input! The whole issue with lazy loading of data only mattered when the input happened to come from a File. But there's no reason it couldn't come from an array, where there's no need for any buffering or lazy loading! The dependency on FileSlice is actually redundant. So I rewrote the above code as:
module parser;
import std.range.primitives;
// N.B.: no more import of fileslice needed!!
...
auto parseFile(Slice)(Slice input)
    if (isRandomAccessRange!Slice && hasSlicing!Slice &&
        is(ElementType!Slice : ubyte))
{
    ... // use nice input[x .. y] syntax, yay!
    return result;
}
Voilà! Now my unittests can simply pass ubyte[] data directly into parseFile, and the parser module is no longer coupled to the fileslice module. The main program is what puts the two together. And now parseFile can handle data from any source, as long as slicing and random access are available. We don't even need to import std.stdio in the parser module!!
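As a rough illustration of how such a unittest might look, here is a sketch using the same template constraint. The file format is invented purely for this example (two magic bytes, a version byte, a length byte, then the payload), and the Parsed struct and its field names are hypothetical too; the real parser's body is elided above and is not reproduced here.

```d
import std.exception : enforce;
import std.range.primitives;

// Hypothetical result type: the payload is just a slice of the input.
struct Parsed(Slice)
{
    ubyte ver;
    Slice payload;
}

// Made-up format: 0xCA 0xFE, version byte, length byte n, then n payload bytes.
auto parseFile(Slice)(Slice input)
    if (isRandomAccessRange!Slice && hasSlicing!Slice &&
        is(ElementType!Slice : ubyte))
{
    enforce(input.length >= 4, "truncated header");
    enforce(input[0] == 0xCA && input[1] == 0xFE, "bad magic");
    immutable ubyte ver = input[2];
    immutable size_t n = input[3];
    enforce(input.length >= 4 + n, "truncated payload");
    return Parsed!Slice(ver, input[4 .. 4 + n]);
}

unittest
{
    import std.exception : assertThrown;

    // Plain array literals as test data: no files, no mocks, no fileslice.
    ubyte[] good = [0xCA, 0xFE, 1, 3, 10, 20, 30, 99];
    ubyte[] expected = [10, 20, 30];
    auto r = parseFile(good);
    assert(r.ver == 1);
    assert(r.payload == expected);

    ubyte[] badMagic = [0, 0, 1, 0];
    assertThrown!Exception(parseFile(badMagic));

    ubyte[] truncated = [0xCA, 0xFE, 1, 5, 10];
    assertThrown!Exception(parseFile(truncated));
}
```

Any other type that satisfies the constraint, such as a lazily loaded file slice, should be accepted by the same function without changing the parser.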
And to top that off, now that everything has become templatized, the compiler will do attribute inference for us for free, so we don't even need to deal with ugly attribute soup in the code. Huzzah!
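A small sketch of what that inference buys you. The checksum function and strictCaller are hypothetical names made up for this example. The template carries no attributes at all, yet a caller marked @safe pure nothrow @nogc can use it, because the compiler infers those attributes for the ubyte[] instantiation.

```d
import std.range.primitives;

// No attributes written here: they are inferred per instantiation.
uint checksum(Slice)(Slice input)
    if (isRandomAccessRange!Slice && is(ElementType!Slice : ubyte))
{
    uint sum = 0;
    foreach (i; 0 .. input.length)
        sum += input[i];
    return sum;
}

// This compiles only if checksum!(const(ubyte)[]) was inferred to be
// @safe pure nothrow @nogc.
uint strictCaller(const(ubyte)[] data) @safe pure nothrow @nogc
{
    return checksum(data);
}

unittest
{
    ubyte[] data = [1, 2, 3];
    assert(strictCaller(data) == 6);
}
```

If the same template is instantiated with a range whose indexing allocates or throws, that instantiation simply gets weaker attributes, and only strictly attributed callers of that particular instantiation are rejected.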
tl;dr: Whenever you have a data structure or a function that depends on a concrete type like File that introduces a dependency between modules, templatize it! In fact, templatize your code whenever possible -- the more the better. D templates make your code cleaner, more modular, and more unittestable. And you get attribute inference for free. Learn to love templates! :-P
T
--
I'm still trying to find a pun for "punishment"...