New hash API: Update - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » New hash API: Update

Thread overview

New hash API: Update

Jun 24, 2012

Jun 24, 2012

Jun 24, 2012

Jun 24, 2012

Masahiro Nakagawa

Jun 24, 2012

Jun 24, 2012

Jun 24, 2012

Dmitry Olshansky

Jun 24, 2012

David Nadlinger

Jun 24, 2012

Alex Rønne Petersen

Jun 25, 2012

Jun 25, 2012

Jonathan M Davis

Jun 24, 2012

Dmitry Olshansky

Jun 25, 2012

Jun 25, 2012

Jonathan M Davis

Jun 25, 2012

Jun 25, 2012

Alex Rønne Petersen

New hash API: namespace
Jun 25, 2012 Johannes Pfau
Jun 25, 2012 Jonathan M Davis
Jun 25, 2012 Piotr Szturmaj
Jun 25, 2012 Jacob Carlborg
Jun 25, 2012 Jonathan M Davis
Jun 25, 2012 Felix Hufnagel
Jun 25, 2012 Dmitry Olshansky
Jun 25, 2012 nazriel
Jun 25, 2012 Jesse Phillips
Jun 29, 2012 Don Clugston
Jun 29, 2012 Piotr Szturmaj

June 24, 2012

New hash API: Update

Posted by Johannes Pfau

Johannes Pfau

I'm mostly finished with my hash API proposal. I also ported the existing crc, md5 and the proposed sha1 hash to this new API.

I changed the namespace to std.util.digest. Andrei once said he thinks std.digest/std.hash is a too narrow package and someone else said putting crc into std.crypto.digest is ridiculous. So I did what tango and other libraries do and created a std.util module.

I think std.uuid would also fit well into std.util so it'd become std.util.uuid.

Here's the documentation: http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html

And here's a pull request for the code: https://github.com/D-Programming-Language/phobos/pull/646

Github branch:
https://github.com/jpf91/phobos/tree/newHash



There are still some open questions:

OOP interface: Digest.finish()

This can only throw if the supplied buffer is too small. Make this nothrow & throw an Error on too small buffer? Or check buffer only in debug mode using asserts or preconditions?



CRC32:
The current implementation doesn't seem to be compliant
to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
test vectors:
http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
http://www.lammertbies.nl/comm/info/crc-calculation.html
http://rosettacode.org/wiki/CRC-32

Turns out we'd need to invert the value and make sure to use LSB first order. I made those changes, so now we produce the same output as the rest of the world, but the new std.util.digest.crc will produce different values than the old std.crc now. Is this OK?

I'm also not happy about the crc license:
provided that the above copyright notice appear in all copies and
 * that both that copyright notice and this permission notice appear
 * in supporting documentation.


I'll put all my modifications under boost license. The only thing left of the original code is the crc32_table and the implementation of the put function.

The table and the one line of code is also available as public domain
code here:
http://www.csbruce.com/~csbruce/software/crc32.c

So I think it should be possible to change the license to boost?

June 24, 2012

Re: New hash API: Update

Posted by Piotr Szturmaj
in reply to Johannes Pfau

Piotr Szturmaj

Posted in reply to Johannes Pfau

Johannes Pfau wrote:
> I'm mostly finished with my hash API proposal. I also ported the
> existing crc, md5 and the proposed sha1 hash to this new API.
>
> I changed the namespace to std.util.digest. Andrei once said he thinks
> std.digest/std.hash is a too narrow package and someone else said
> putting crc into std.crypto.digest is ridiculous. So I did what tango
> and other libraries do and created a std.util module.
>
> I think std.uuid would also fit well into std.util so it'd become
> std.util.uuid.

I vote for std.crypto.hash and std.uuid. CRC, Adler and others could fit in std.checksum.

June 24, 2012

Re: New hash API: Update

Posted by Masahiro Nakagawa
in reply to Johannes Pfau

Masahiro Nakagawa

Posted in reply to Johannes Pfau

On Sunday, 24 June 2012 at 15:23:19 UTC, Johannes Pfau wrote:
> I'm mostly finished with my hash API proposal. I also ported the
> existing crc, md5 and the proposed sha1 hash to this new API.

Great! I will read docs and souce code later.

> I changed the namespace to std.util.digest. Andrei once said he thinks
> std.digest/std.hash is a too narrow package and someone else said
> putting crc into std.crypto.digest is ridiculous. So I did what tango
> and other libraries do and created a std.util module.
>
> I think std.uuid would also fit well into std.util so it'd become
> std.util.uuid.
>

I disagree this point. The name 'util' does not make any sense.
'util' seems to be very subjective.

June 24, 2012

Re: New hash API: Update

Posted by David
in reply to Masahiro Nakagawa

David

Posted in reply to Masahiro Nakagawa

Am 24.06.2012 18:13, schrieb Masahiro Nakagawa:
> I disagree this point. The name 'util' does not make any sense.
> 'util' seems to be very subjective.
>

Right `util` can be everything.

June 24, 2012

Re: New hash API: Update

Posted by Johannes Pfau
in reply to David

Johannes Pfau

Posted in reply to David

Am Sun, 24 Jun 2012 18:21:06 +0200
schrieb David <d@dav1d.de>:

> Am 24.06.2012 18:13, schrieb Masahiro Nakagawa:
> > I disagree this point. The name 'util' does not make any sense. 'util' seems to be very subjective.
> >
> 
> Right `util` can be everything.
> 

Yep that's the idea. You put everything in there which is not important enough to be in the std. namespace, but which also doesn't fit into another submodule either.

For example I don't think uuid should be in the top-level std namespace. But where to put it then?

June 24, 2012

Re: New hash API: Update

Posted by Johannes Pfau
in reply to Piotr Szturmaj

Johannes Pfau

Posted in reply to Piotr Szturmaj

Am Sun, 24 Jun 2012 18:07:53 +0200
schrieb Piotr Szturmaj <bncrbme@jadamspam.pl>:

> Johannes Pfau wrote:
> > I'm mostly finished with my hash API proposal. I also ported the existing crc, md5 and the proposed sha1 hash to this new API.
> >
> > I changed the namespace to std.util.digest. Andrei once said he thinks std.digest/std.hash is a too narrow package and someone else said putting crc into std.crypto.digest is ridiculous. So I did what tango and other libraries do and created a std.util module.
> >
> > I think std.uuid would also fit well into std.util so it'd become std.util.uuid.
> 
> I vote for std.crypto.hash and std.uuid. CRC, Adler and others could fit in std.checksum.

We could do this as well. Then we also have to decide whether we want to have a common API for std.checksum and std.crypto.hash. And we have to decide where to put the common parts, those that are in std.util.digest.digest right now.

June 24, 2012

Re: New hash API: Update

Posted by Dmitry Olshansky
in reply to Johannes Pfau

Dmitry Olshansky

Posted in reply to Johannes Pfau

On 24-Jun-12 19:23, Johannes Pfau wrote:
> I'm mostly finished with my hash API proposal. I also ported the
> existing crc, md5 and the proposed sha1 hash to this new API.
>
> I changed the namespace to std.util.digest. Andrei once said he thinks
> std.digest/std.hash is a too narrow package and someone else said
> putting crc into std.crypto.digest is ridiculous. So I did what tango
> and other libraries do and created a std.util module.
>
> I think std.uuid would also fit well into std.util so it'd become
> std.util.uuid.


To be frank almost anything could 'nicely fit' into util. That's why we should avoid this abomination at all costs.
std.crypto
or
std.checksum

-- 
Dmitry Olshansky

June 24, 2012

Re: New hash API: Update

Posted by David Nadlinger
in reply to Dmitry Olshansky

David Nadlinger

Posted in reply to Dmitry Olshansky

On Sunday, 24 June 2012 at 18:10:03 UTC, Dmitry Olshansky wrote:
> To be frank almost anything could 'nicely fit' into util. That's why we should avoid this abomination at all costs.

I tend to agree. An »util« package might make sense in an application, where it could e.g. hold smaller self-contained helper modules, but not so much in a library.

David

June 24, 2012

Re: New hash API: Update

Posted by Alex Rønne Petersen
in reply to Johannes Pfau

Alex Rønne Petersen

Posted in reply to Johannes Pfau

On 24-06-2012 17:23, Johannes Pfau wrote:
> I'm mostly finished with my hash API proposal. I also ported the
> existing crc, md5 and the proposed sha1 hash to this new API.

Thanks.

>
> I changed the namespace to std.util.digest. Andrei once said he thinks
> std.digest/std.hash is a too narrow package and someone else said
> putting crc into std.crypto.digest is ridiculous. So I did what tango
> and other libraries do and created a std.util module.

I am strongly against this. Java's 'java.util' package should be a fair indicator of why; text processing, collections, date/time, ...

>
> I think std.uuid would also fit well into std.util so it'd become
> std.util.uuid.

I really really don't like where that is going. A package with a name like 'util' basically means /nothing/. It's as good as having this in the 'std' package in the first place. I understand that you want to group things into proper packages (I do this heavily in my projects), but we need a better name than 'util'. I really don't think there's anything wrong with just plain 'std.hash', especially since we're likely to add more algorithms over time.

>
> Here's the documentation:
> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html

These modules aren't quite consistent: There exist many versions of SHA-1 and so the module is named 'sha'. Makes sense. There are several MD algorithms, but the module is named 'md5'?

I think we should either have a module per specific algorithm or modules with neutral names like 'md'.

>
> And here's a pull request for the code:
> https://github.com/D-Programming-Language/phobos/pull/646
>
> Github branch:
> https://github.com/jpf91/phobos/tree/newHash
>
>
>
> There are still some open questions:
>
> OOP interface: Digest.finish()
>
> This can only throw if the supplied buffer is too small. Make this
> nothrow & throw an Error on too small buffer? Or check buffer only in
> debug mode using asserts or preconditions?

Error. It's a logic error to pass in a too small buffer.

Also, most (if not all) Digest methods really should be pure nothrow. Same for some free functions (I think it's good practice to mark template functions as pure nothrow explicitly if you want to guarantee this).

I agree with the overall interface. I think it's going in the right direction.

>
>
>
> CRC32:
> The current implementation doesn't seem to be compliant
> to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
> test vectors:
> http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
> http://www.lammertbies.nl/comm/info/crc-calculation.html
> http://rosettacode.org/wiki/CRC-32
>
> Turns out we'd need to invert the value and make sure to use LSB first
> order. I made those changes, so now we produce the same output as the
> rest of the world, but the new std.util.digest.crc will produce
> different values than the old std.crc now. Is this OK?

Yes, correctness over *.

>
> I'm also not happy about the crc license:
> provided that the above copyright notice appear in all copies and
>   * that both that copyright notice and this permission notice appear
>   * in supporting documentation.

Yes, this annoyed me too.

>
>
> I'll put all my modifications under boost license. The only thing left
> of the original code is the crc32_table and the implementation of the
> put function.
>
> The table and the one line of code is also available as public domain
> code here:
> http://www.csbruce.com/~csbruce/software/crc32.c
>
> So I think it should be possible to change the license to boost?
>
>

Yes IMO. Anyone could have gone and implemented it from that C file.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org

June 24, 2012

Re: New hash API: Update

Posted by Dmitry Olshansky
in reply to Johannes Pfau

Dmitry Olshansky

Posted in reply to Johannes Pfau

On 24-Jun-12 19:23, Johannes Pfau wrote:
>
> There are still some open questions:
>
> OOP interface: Digest.finish()
>
> This can only throw if the supplied buffer is too small. Make this
> nothrow & throw an Error on too small buffer? Or check buffer only in
> debug mode using asserts or preconditions?
>

Yup. I'd suggest to simply assert on it.

>
> CRC32:
> The current implementation doesn't seem to be compliant
> to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass these
> test vectors:
> http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
> http://www.lammertbies.nl/comm/info/crc-calculation.html
> http://rosettacode.org/wiki/CRC-32
>

I believe there ought to be CRC32 with *any* kind of license on the web.

I'll throw in some things that looked plain wrong to me:
- calculateDigestOOP. Besides having awful, awful name it really should be final method of Digest interface (yes, we have these since quite some time)
- digestToString helper. What kind of string? Why not just toString as member? Or rather clarify that it obtains hexadecimal representation of digest.
e.g. toHexString looks far more intuitive for me  (again check out toString debate - I hardly believe that hashes have only one possible string representation)

- calculateDigest (also called calculateHash somewhere) - why not just digest ? In general I'm weary and tired of no-brainer prefixes. They add extra symbols for no benefit, because, of course, digest is calculated. And so does sum for instance, yet we (would) have sum in algorithm not calculate sum, same for min/max etc.
(I think one of hardest goals of std.* is to meet the best balance of clarity and brevity.)

- same goes for md5Sum --> md5Of (i'd love to do plain md5 but maybe it's much)
- crc32Sum --> scr32Of //basically the fact that both are sums helps very little, yet the suffix 'Of' (I think) indicates that the calculation is to happen right here (i.e. that it's not initialization or smth).

Everything else looks good, though docs may need some proof-reading.
Thanks for pushing this proposal.

-- 
Dmitry Olshansky

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation