Proper way to work with huge binary files
January 24, 2012
Hello all.
After a quick recce on the main D site I couldn't find a how-to for large
binary files.

Suppose I'm using an old machine with limited memory, and I want to compute
the MD5 hash of a 4 GB file.
Do I need to buffer the input as in C++? And how should I check for exceptions?
Example code much appreciated.

Thank you for reading.
January 24, 2012
On Tue, 24 Jan 2012 15:53:48 +0000, C wrote:

> Hello all.
> After a quick recce on the main D site I couldn't find a how-to for
> large binary files.
> 
> Suppose I'm using an old machine with limited memory, and I want to
> compute the MD5 hash of a 4 GB file.
> Do I need to buffer the input as in C++? And how should I check for
> exceptions? Example code much appreciated.
> 
> Thank you for reading.

std.stream has a BufferedStream class which doesn't care whether the underlying stream is text or binary. I'd start there.

Justin
January 24, 2012
On Tuesday, January 24, 2012 15:53:48 C wrote:
> Hello all.
> After a quick recce on the main D site I couldn't find a how-to for large
> binary files.
> 
> Suppose I'm using an old machine with limited memory, and I want to compute
> the MD5 hash of a 4 GB file.
> Do I need to buffer the input as in C++? And how should I check for
> exceptions? Example code much appreciated.
> 
> Thank you for reading.

You'd probably just use std.stdio.File's byChunk with std.md5.MD5_CTX. Something like

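import std.md5;
import std.stdio;

// filename is the path of the file to hash
MD5_CTX md5;
md5.start();
auto file = File(filename);

// read 4 KB at a time so the whole file is never held in memory
foreach(ubyte[] buffer; file.byChunk(4096))
    md5.update(buffer);

ubyte[16] result;
md5.finish(result);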

I've never used std.md5 though, so I'm just going by the docs on how to use it.

- Jonathan M Davis
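
The question about exceptions is not covered by the replies above, so here is a minimal sketch of how the chunked approach might look as a complete program. It assumes a recent Phobos where std.md5 has been replaced by std.digest.md (an assumption, since the thread itself uses std.md5); the function name md5OfFile and the 64 KB default chunk size are made up for illustration. File's constructor throws ErrnoException when the file cannot be opened, and as far as I know a failed read surfaces the same way, so the sketch catches that type.

```d
import std.ascii : LetterCase;
import std.digest : toHexString;
import std.digest.md : MD5;
import std.exception : ErrnoException;
import std.stdio : File, stderr, writeln;

/// Hypothetical helper: hashes a file chunk by chunk, so memory use
/// depends on chunkSize and not on the size of the file.
string md5OfFile(string path, size_t chunkSize = 64 * 1024)
{
    MD5 md5;
    md5.start();

    // Throws ErrnoException if the file cannot be opened.
    auto file = File(path, "rb");

    // byChunk reuses one internal buffer; each chunk is only valid
    // for the current iteration, which is all the hash needs.
    foreach (ubyte[] chunk; file.byChunk(chunkSize))
        md5.put(chunk);

    ubyte[16] digest = md5.finish();

    // toHexString returns a fixed-size char array, so copy it into a string.
    auto hex = toHexString!(LetterCase.lower)(digest);
    return hex[].idup;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: md5file <path>");
        return;
    }

    try
        writeln(md5OfFile(args[1]));
    catch (ErrnoException e)
        stderr.writeln("I/O error: ", e.msg);
}
```

The File handle is closed automatically when it goes out of scope, including when an exception propagates, so no explicit cleanup is needed here.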