September 21, 2019
Hi,

This is a D port of a Go package implementing Content-Defined Chunking:

https://github.com/CyberShadow/chunker

The package contains the following modules:

- chunker.polynomials - implements Pol, a type which represents a polynomial from F_2[X]. I'm not quite sure what that is, but such polynomials seem to be very useful.

- chunker.rabin - implements RabinHash, which calculates a rolling Rabin Fingerprint.

- chunker - implements Chunker, an adapter range which accepts chunks of bytes (such as from File.byChunk) and emits variable-size content-defined chunks. A chunk boundary is placed wherever the rolling Rabin Fingerprint of the most recent bytes reaches a certain value.
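
For anyone wondering how the three pieces fit together, here is a minimal sketch in Python. It is not the package's D API and not a port of its code. It reads the bits of an integer as the coefficients of a polynomial from F_2[X], takes the fingerprint of a byte window as that polynomial's remainder modulo a fixed polynomial, updates the fingerprint as the window slides, and cuts a chunk wherever the low bits of the fingerprint are all zero. All names (pol_mod, fingerprint, rolling_fingerprints, chunks), the 16-byte window, the modulus x^64 + x^4 + x^3 + x + 1 and the 10-bit mask are assumptions chosen for illustration. The sketch also ignores practical concerns such as minimum and maximum chunk sizes.

```python
WINDOW = 16              # bytes in the sliding window (assumed value)
POLY = (1 << 64) | 0x1B  # x^64 + x^4 + x^3 + x + 1 (assumed modulus)
MASK = (1 << 10) - 1     # cut when the low 10 bits are zero (assumed value)


def pol_mod(a, p):
    """Remainder of a divided by p, reading each int's bits as F_2[X] coefficients."""
    width = p.bit_length()
    while a.bit_length() >= width:
        # Subtraction in F_2[X] is XOR: cancel the leading term of a.
        a ^= p << (a.bit_length() - width)
    return a


def fingerprint(window):
    """Fingerprint computed directly: the window as one big polynomial, mod POLY."""
    return pol_mod(int.from_bytes(window, "big"), POLY)


# What a byte contributes once the window has been shifted one byte past it.
OUT_TABLE = [pol_mod(b << (8 * WINDOW), POLY) for b in range(256)]


def rolling_fingerprints(data):
    """Yield (i, fingerprint of the WINDOW bytes ending at index i)."""
    digest = 0
    for i, byte in enumerate(data):
        digest = pol_mod((digest << 8) | byte, POLY)  # append the new byte
        if i >= WINDOW:
            digest ^= OUT_TABLE[data[i - WINDOW]]     # drop the oldest byte
        if i >= WINDOW - 1:                           # window is full
            yield i, digest


def chunks(data):
    """Split data after every index where the masked fingerprint is zero."""
    out, start = [], 0
    for i, digest in rolling_fingerprints(data):
        if (digest & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])  # trailing bytes after the last cut
    return out


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(1 << 16))
    before = chunks(data)
    after = chunks(b"!" + data)  # same data with one byte inserted in front
    shared = len(set(before) & set(after))
    print(len(before), "chunks,", shared, "of them unchanged after the insertion")
```

Because each cut depends only on the last WINDOW bytes, inserting a byte at the front can only disturb the first chunk in this sketch. Every later chunk comes out byte-for-byte identical, which is the property that makes content-defined chunking useful for deduplication.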

Links
-----

- Wikipedia: https://en.wikipedia.org/wiki/Rolling_hash#Rabin_fingerprint

- Original Go version: https://github.com/restic/chunker

- Dub package: https://code.dlang.org/packages/chunker

- Documentation: https://chunker.dpldocs.info/chunker.html (courtesy of Adam Ruppe's dpldocs service)

- Example: https://github.com/cybershadow/chunker/blob/master/src/chunker/example.d

Differences from the Go version
-------------------------------

- Chunker was adapted to be a D range and accept D ranges as input.

- The Rabin Fingerprint implementation was extracted out of Chunker and into its own module. It is usable stand-alone.

- Significant refactorings and simplifications of the implementation. The original code sacrificed some readability to work around limitations of the language and of the compiler's optimizer in order to achieve reasonable performance.

- 20% faster than the Go version (LDC release build).

- Improved test coverage and symbol documentation.

The original package was written by Alexander Neumann and is used in the restic backup program.

September 21, 2019
On Saturday, 21 September 2019 at 03:11:11 UTC, Vladimir Panteleev wrote:
> Hi,
>
> This is a D port of a Go package implementing Content-Defined Chunking:
>
> https://github.com/CyberShadow/chunker

[...]

> - Significant refactorings and simplifications of the implementation. The original code sacrificed some readability to work around limitations of the language and of the compiler's optimizer in order to achieve reasonable performance.
>
> - 20% faster than the Go version (LDC release build).

Marvellous! Well done.

[...]

> The original package was written by Alexander Neumann and is used in the restic backup program.

Sounds like D would have been the right language for restic. Maybe this is enough to spark Alexander’s interest in D?

Cheers,
Bastiaan.