September 21, 2019 Chunker - Content-Defined Chunking based on Rabin Checksums

Hi,

This is a D port of a Go package implementing Content-Defined Chunking:

https://github.com/CyberShadow/chunker

The package contains the following modules:

- chunker.polynomials - implements Pol, a type which represents a polynomial from F_2[X]. I'm not quite sure what that is, but they seem to be very useful.

- chunker.rabin - implements RabinHash, which calculates a rolling Rabin Fingerprint.

- chunker - implements Chunker, an adapter range which accepts chunks of bytes (such as from File.byChunk) and emits variable-size content-defined chunks, which are split when the local Rabin Fingerprint reaches a certain value.

Links
-----

- Wikipedia: https://en.wikipedia.org/wiki/Rolling_hash#Rabin_fingerprint

- Original Go version: https://github.com/restic/chunker

- Dub package: https://code.dlang.org/packages/chunker

- Documentation: https://chunker.dpldocs.info/chunker.html (courtesy of Adam Ruppe's dpldocs service)

- Example: https://github.com/cybershadow/chunker/blob/master/src/chunker/example.d

Differences from the Go version
-------------------------------

- Chunker was adapted to be a D range and accept D ranges as input.

- The Rabin Fingerprint implementation was extracted out of Chunker and into its own module. It is usable stand-alone.

- Significant refactorings and simplifications of the implementation. The original code made some sacrifices in code readability to work around limitations of the language and compiler optimization to achieve reasonable performance.

- 20% faster than the Go version (LDC release build).

- Improved test coverage and symbol documentation.

The original package was written by Alexander Neumann and is used in the restic backup program.
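
For readers who, like the poster, are unsure what a polynomial from F_2[X] is: it is a polynomial whose coefficients are single bits, so adding or subtracting two of them is a bitwise XOR. A Rabin fingerprint is the remainder of the data, read as one such polynomial, after division by a fixed polynomial. The sketch below illustrates the three ideas from the module list (polynomial remainder, a rolling fingerprint over a sliding window, and cutting where the fingerprint hits a chosen value). It is written in Python purely for illustration and is not the API or the exact algorithm of the D package or of restic/chunker. All names (`pol_mod`, `fingerprint`, `rolling_fingerprints`, `cut_points`), the polynomial, the 16-byte window, the 6-bit mask, and the minimum and maximum chunk sizes are assumptions made for this example. "Masked bits equal zero" is used here as one common reading of "reaches a certain value".

```python
# Illustrative sketch only: every name and parameter here is hypothetical,
# not the API of the D chunker package or of restic/chunker.
import random

# x^64 + x^4 + x^3 + x + 1, a well-known irreducible polynomial over GF(2).
# Chosen only for illustration; it is an assumption, not the package's choice.
POLY = (1 << 64) | 0x1B


def pol_mod(x, p):
    """Remainder of x divided by p, with ints read as GF(2) coefficient bits."""
    deg_p = p.bit_length() - 1
    while x.bit_length() - 1 >= deg_p:
        # In GF(2), subtraction is XOR: cancel the leading term of x.
        x ^= p << (x.bit_length() - 1 - deg_p)
    return x


def fingerprint(window_bytes, p=POLY):
    """Non-rolling reference: the bytes as one big polynomial, reduced mod p."""
    return pol_mod(int.from_bytes(window_bytes, "big"), p)


def rolling_fingerprints(data, window=16, p=POLY):
    """Yield the fingerprint of the last `window` bytes after each byte of `data`.

    `data` must be an indexable bytes-like object.
    """
    # out[b]: contribution of byte b when it is the oldest byte in the window.
    out = [pol_mod(b << (8 * (window - 1)), p) for b in range(256)]
    digest = 0
    for i, b in enumerate(data):
        if i >= window:
            digest ^= out[data[i - window]]     # drop the byte leaving the window
        digest = pol_mod((digest << 8) | b, p)  # append the new byte
        yield digest


def cut_points(data, window=16, mask=(1 << 6) - 1, min_size=32, max_size=1024):
    """Return chunk end offsets, cutting where the fingerprint's masked bits are zero."""
    cuts, start = [], 0
    for i, digest in enumerate(rolling_fingerprints(data, window)):
        size = i + 1 - start
        if (size >= min_size and (digest & mask) == 0) or size >= max_size:
            cuts.append(i + 1)
            start = i + 1
    if start < len(data):
        cuts.append(len(data))  # the final chunk may be shorter than min_size
    return cuts


if __name__ == "__main__":
    rng = random.Random(0)
    blob = bytes(rng.getrandbits(8) for _ in range(1 << 14))
    print(cut_points(blob)[:8])
```

Because the fingerprint depends only on the last `window` bytes, inserting data near the start of a file tends to move only the nearby boundaries, and later chunks usually line up again. That is what makes this kind of chunking useful for deduplicating backups. The minimum and maximum sizes in this sketch can delay that realignment, so it is a tendency rather than a guarantee.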
September 21, 2019 Re: Chunker - Content-Defined Chunking based on Rabin Checksums

Posted in reply to Vladimir Panteleev

On Saturday, 21 September 2019 at 03:11:11 UTC, Vladimir Panteleev wrote:
> Hi,
>
> This is a D port of a Go package implementing Content-Defined Chunking:
>
> https://github.com/CyberShadow/chunker

[...]

> - Significant refactorings and simplifications of the implementation. The original code made some sacrifices in code readability to work around limitations of the language and compiler optimization to achieve reasonable performance.
>
> - 20% faster than the Go version (LDC release build).

Marvellous! Well done.

[...]

> The original package was written by Alexander Neumann and is used in the restic backup program.

Sounds like D would have been the right language for Restic. Maybe this is enough to spark Alexander’s interest in D?

Cheers,
Bastiaan.

Copyright © 1999-2021 by the D Language Foundation