New hash API: Update (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » New hash API: Update (page 2)

June 25, 2012

Re: New hash API: Update

Posted by Jonathan M Davis
in reply to Johannes Pfau

Jonathan M Davis

Posted in reply to Johannes Pfau

On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
> I'm mostly finished with my hash API proposal. I also ported the existing crc, md5 and the proposed sha1 hash to this new API.
> 
> I changed the namespace to std.util.digest. Andrei once said he thinks std.digest/std.hash is a too narrow package and someone else said putting crc into std.crypto.digest is ridiculous. So I did what tango and other libraries do and created a std.util module.
> 
> I think std.uuid would also fit well into std.util so it'd become std.util.uuid.

No, no, no, no, no. util is _useless_ as a name. _Everything_ in Phobos is a utiliity of one sort or another. Just leave it as std.hash and std.uuid.

> Here's the documentation: http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html
> 
> And here's a pull request for the code: https://github.com/D-Programming-Language/phobos/pull/646
> 
> Github branch:
> https://github.com/jpf91/phobos/tree/newHash

I'll have to look it over later, but this is enough of a change, that I suspect that a proper review cycle is order rather than simply making the tweaks and creating a pull request for it.

> The table and the one line of code is also available as public domain
> code here:
> http://www.csbruce.com/~csbruce/software/crc32.c
> 
> So I think it should be possible to change the license to boost?

As long as the only parts that are left of the original with the non-Boost license are publicly available, I don't see any reason why we couldn't put a Boost license on the version. Ideally, we would have _no_ licenses other than Boost in Phobos. The only reason that we do is due to old D1 code when Walter was doing most of it and the contributor situation was very different.

- Jonathan M Davis

June 25, 2012

Re: New hash API: Update

Posted by Johannes Pfau
in reply to Jonathan M Davis

Johannes Pfau

Posted in reply to Jonathan M Davis

Am Sun, 24 Jun 2012 17:58:47 -0700
schrieb Jonathan M Davis <jmdavisProg@gmx.com>:

> On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
> > I'm mostly finished with my hash API proposal. I also ported the existing crc, md5 and the proposed sha1 hash to this new API.
> > 
> > I changed the namespace to std.util.digest. Andrei once said he thinks std.digest/std.hash is a too narrow package and someone else said putting crc into std.crypto.digest is ridiculous. So I did what tango and other libraries do and created a std.util module.
> > 
> > I think std.uuid would also fit well into std.util so it'd become std.util.uuid.
> 
> No, no, no, no, no. util is _useless_ as a name. _Everything_ in Phobos is a utiliity of one sort or another. Just leave it as std.hash and std.uuid.

OK, OK I'm convinced.


> 
> > Here's the documentation: http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html
> > 
> > And here's a pull request for the code: https://github.com/D-Programming-Language/phobos/pull/646
> > 
> > Github branch:
> > https://github.com/jpf91/phobos/tree/newHash
> 
> I'll have to look it over later, but this is enough of a change, that I suspect that a proper review cycle is order rather than simply making the tweaks and creating a pull request for it.

Yeah probably. We really should disable the new std.crc32 then, though.

> 
> > The table and the one line of code is also available as public domain code here: http://www.csbruce.com/~csbruce/software/crc32.c
> > 
> > So I think it should be possible to change the license to boost?
> 
> As long as the only parts that are left of the original with the non-Boost license are publicly available, I don't see any reason why we couldn't put a Boost license on the version. Ideally, we would have _no_ licenses other than Boost in Phobos. The only reason that we do is due to old D1 code when Walter was doing most of it and the contributor situation was very different.
> 
Great, I'll change that then :-)

June 25, 2012

Re: New hash API: Update

Posted by Johannes Pfau
in reply to Dmitry Olshansky

Johannes Pfau

Posted in reply to Dmitry Olshansky

Am Mon, 25 Jun 2012 00:02:00 +0400
schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:

> On 24-Jun-12 19:23, Johannes Pfau wrote:
> >
> > There are still some open questions:
> >
> > OOP interface: Digest.finish()
> >
> > This can only throw if the supplied buffer is too small. Make this nothrow & throw an Error on too small buffer? Or check buffer only in debug mode using asserts or preconditions?
> >
> 
> Yup. I'd suggest to simply assert on it.
OK

> 
> >
> > CRC32:
> > The current implementation doesn't seem to be compliant
> > to the 'common' CRC-32-IEEE 802.3 form, at least it doesn't pass
> > these test vectors:
> > http://www.febooti.com/products/filetweak/members/hash-and-crc/test-vectors/
> > http://www.lammertbies.nl/comm/info/crc-calculation.html
> > http://rosettacode.org/wiki/CRC-32
> >
> 
> I believe there ought to be CRC32 with *any* kind of license on the web.
> 
> I'll throw in some things that looked plain wrong to me:
> - calculateDigestOOP. Besides having awful, awful name it really
> should be final method of Digest interface (yes, we have these since
> quite some time)
OK.

> - digestToString helper. What kind of string? Why not just toString
> as member? Or rather clarify that it obtains hexadecimal
> representation of digest.
> e.g. toHexString looks far more intuitive for me  (again check out
> toString debate - I hardly believe that hashes have only one possible
> string representation)

toString as a member doesn't work well for some hashes. Those hashes destroy the internal state in finish(). So after a toString() call the hash would be either invalid or reset to initial state, which is counterintuitive.

But toHexString sounds like a good solution. (digestToString was the
name used in std.md5, btw)

> 
> - calculateDigest (also called calculateHash somewhere) - why not
> just digest ? In general I'm weary and tired of no-brainer prefixes.
> They add extra symbols for no benefit, because, of course, digest is
> calculated. And so does sum for instance, yet we (would) have sum in
> algorithm not calculate sum, same for min/max etc.
> (I think one of hardest goals of std.* is to meet the best balance of
> clarity and brevity.)
OK, I'll rename it to digest.

> 
> - same goes for md5Sum --> md5Of (i'd love to do plain md5 but maybe
> it's much)
> - crc32Sum --> scr32Of //basically the fact that both are sums helps
> very little, yet the suffix 'Of' (I think) indicates that the
> calculation is to happen right here (i.e. that it's not
> initialization or smth).
OK. those were originally called 'sum', but I think it's common to use multiple hash modules at the same time so name clashes would happen. md5Of/crcOf sound great.

> 
> Everything else looks good, though docs may need some proof-reading. Thanks for pushing this proposal.

Yep that was only a first draft. I still need to run it through a spell checker, proof read it and add some links.

June 25, 2012

Re: New hash API: Update

Posted by Johannes Pfau
in reply to Alex Rønne Petersen

Johannes Pfau

Posted in reply to Alex Rønne Petersen

Am Sun, 24 Jun 2012 21:40:53 +0200
schrieb Alex Rønne Petersen <alex@lycus.org>:

> Also, most (if not all) Digest methods really should be pure nothrow. Same for some free functions (I think it's good practice to mark template functions as pure nothrow explicitly if you want to guarantee this).

I still don't understand pure on member functions completely. The this pointer is considered as a function parameter, right?

So even put, start, reset... can be pure as those produce the same result as long as they receive the same arguments and the same _this_ 'state'?

I tried to add pure to the interface, but I seems it's not a good idea right now. It doesn't work for the SHA1 implementation for example cause that uses memcpy.

June 25, 2012

New hash API: namespace

Posted by Johannes Pfau
in reply to Johannes Pfau

Johannes Pfau

Posted in reply to Johannes Pfau

OK, so I understand std.util is probably not a good idea.

So the candidates for the namespace are:
* std.crypto.hash
* std.checksum
* std.crypto.hash and std.checksum
* std.hash

and the same with hash replaced by digest.
So which one should we use?

June 25, 2012

Re: New hash API: namespace

Posted by Jonathan M Davis
in reply to Johannes Pfau

Jonathan M Davis

Posted in reply to Johannes Pfau

On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
> OK, so I understand std.util is probably not a good idea.
> 
> So the candidates for the namespace are:
> * std.crypto.hash
> * std.checksum
> * std.crypto.hash and std.checksum
> * std.hash
> 
> and the same with hash replaced by digest.
> So which one should we use?

The previous discussions on this resulted in us going with std.hash.md5, std.hash.sha1, and std.hash.crc32. I don't see any reason to change that, and crypto was specifically _not_ chosen, because crc32 isn't cryptographically sound. But std.hash encompasses things quite nicely, since they're all hashes.

- Jonathan M Davis

June 25, 2012

Re: New hash API: namespace

Posted by Piotr Szturmaj
in reply to Jonathan M Davis

Piotr Szturmaj

Posted in reply to Jonathan M Davis

Jonathan M Davis wrote:
> On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
>> OK, so I understand std.util is probably not a good idea.
>>
>> So the candidates for the namespace are:
>> * std.crypto.hash
>> * std.checksum
>> * std.crypto.hash and std.checksum
>> * std.hash
>>
>> and the same with hash replaced by digest.
>> So which one should we use?
>
> The previous discussions on this resulted in us going with std.hash.md5,
> std.hash.sha1, and std.hash.crc32. I don't see any reason to change that, and
> crypto was specifically _not_ chosen, because crc32 isn't cryptographically
> sound. But std.hash encompasses things quite nicely, since they're all hashes.

IMHO crypto should be chosen because beside of hashes there are other cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice to have them in one place. std.hash is too narrow because when std gets crypto there will be too many namespaces like std.ciphers, std.ssl, std.mac. All of them will nicely fit in std.crypto or similar.

As you can see crypto isn't good candidate for checksums so another package std.checksum is proposed. Likewise mixing checksums and cryptographic hashes under one namespace (std.hash) isn't a right choice IMO.

Having cryptographic primitives splitted to std.hash and std.crypto.* isn't a good choice either.

June 25, 2012

Re: New hash API: Update

Posted by Jonathan M Davis
in reply to Johannes Pfau

Jonathan M Davis

Posted in reply to Johannes Pfau

On Monday, June 25, 2012 11:30:41 Johannes Pfau wrote:
> Am Sun, 24 Jun 2012 21:40:53 +0200
> 
> schrieb Alex Rønne Petersen <alex@lycus.org>:
> > Also, most (if not all) Digest methods really should be pure nothrow. Same for some free functions (I think it's good practice to mark template functions as pure nothrow explicitly if you want to guarantee this).
> 
> I still don't understand pure on member functions completely. The this pointer is considered as a function parameter, right?
> 
> So even put, start, reset... can be pure as those produce the same result as long as they receive the same arguments and the same _this_ 'state'?
> 
> I tried to add pure to the interface, but I seems it's not a good idea right now. It doesn't work for the SHA1 implementation for example cause that uses memcpy.

_All_ that pure means by itself is that a function does not access any mutable global or static variables and that it does not call any functions which are not pure. It means _nothing_ more. There are _zero_ guarantees that the function will return the same result or anything like that. It's what we sometimes term a weakly pure function. That's all pure is by itself.

A strongly pure function, on the other hand, is a pure function whose arguments are all either immutable or implicitly convertible to immutable. And because that guarantees that the function's arguments won't be mutated, and because you have the guarantee that that function and any function it calls cannot access any state which isn't passed to it via those immutable arguments or which it creates itself, the compiler can guarantee that calling that function with the same arguments multiple times will result in the same return value.

So, in general, _all_ that you're really guaranteeing when you mark a function pure is that it's not accessing mutable global or static state either directly or via any functions that it calls. There are then cases where the compiler can generate improved guarantees based on that information (e.g. when the function is strongly pure), but that's arguably a detail of the compiler's optimizer.

David Nadlinger wrote up an article on it not to long ago which may help clarify matters (though I still haven't gotten around to reading it, so I don't really know how good it is), so you may want to check that out:

http://klickverbot.at/blog/2012/05/purity-in-d/

- Jonathan M Davis

June 25, 2012

Re: New hash API: Update

Posted by Alex Rønne Petersen
in reply to Johannes Pfau

Alex Rønne Petersen

Posted in reply to Johannes Pfau

On 25-06-2012 11:13, Johannes Pfau wrote:
> Am Sun, 24 Jun 2012 17:58:47 -0700
> schrieb Jonathan M Davis <jmdavisProg@gmx.com>:
>
>> On Sunday, June 24, 2012 17:23:18 Johannes Pfau wrote:
>>> I'm mostly finished with my hash API proposal. I also ported the
>>> existing crc, md5 and the proposed sha1 hash to this new API.
>>>
>>> I changed the namespace to std.util.digest. Andrei once said he
>>> thinks std.digest/std.hash is a too narrow package and someone else
>>> said putting crc into std.crypto.digest is ridiculous. So I did
>>> what tango and other libraries do and created a std.util module.
>>>
>>> I think std.uuid would also fit well into std.util so it'd become
>>> std.util.uuid.
>>
>> No, no, no, no, no. util is _useless_ as a name. _Everything_ in
>> Phobos is a utiliity of one sort or another. Just leave it as
>> std.hash and std.uuid.
>
> OK, OK I'm convinced.
>
>
>>
>>> Here's the documentation:
>>> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_digest.html
>>> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_crc.html
>>> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_md5.html
>>> http://dl.dropbox.com/u/24218791/d/phobos/std_util_digest_sha.html
>>>
>>> And here's a pull request for the code:
>>> https://github.com/D-Programming-Language/phobos/pull/646
>>>
>>> Github branch:
>>> https://github.com/jpf91/phobos/tree/newHash
>>
>> I'll have to look it over later, but this is enough of a change, that
>> I suspect that a proper review cycle is order rather than simply
>> making the tweaks and creating a pull request for it.
>
> Yeah probably. We really should disable the new std.crc32 then, though.

That, or make sure that your changes get into 2.060 (the changes aren't huge, so reviewing them shouldn't take /that/ long). But on the other hand, people seem to really want to get 2.060 out the door ASAP, so I don't really know...

In any case, just comment std.hash.crc32 in the makefiles and remove the deprecation label in the top-level crc32 module (yes, really, it's not even in std...).

>
>>
>>> The table and the one line of code is also available as public
>>> domain code here:
>>> http://www.csbruce.com/~csbruce/software/crc32.c
>>>
>>> So I think it should be possible to change the license to boost?
>>
>> As long as the only parts that are left of the original with the
>> non-Boost license are publicly available, I don't see any reason why
>> we couldn't put a Boost license on the version. Ideally, we would
>> have _no_ licenses other than Boost in Phobos. The only reason that
>> we do is due to old D1 code when Walter was doing most of it and the
>> contributor situation was very different.
>>
> Great, I'll change that then :-)
>
>

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org

June 25, 2012

Re: New hash API: namespace

Posted by Jacob Carlborg
in reply to Piotr Szturmaj

Jacob Carlborg

Posted in reply to Piotr Szturmaj

On 2012-06-25 12:24, Piotr Szturmaj wrote:
> Jonathan M Davis wrote:
>> On Monday, June 25, 2012 11:35:33 Johannes Pfau wrote:
>>> OK, so I understand std.util is probably not a good idea.
>>>
>>> So the candidates for the namespace are:
>>> * std.crypto.hash
>>> * std.checksum
>>> * std.crypto.hash and std.checksum
>>> * std.hash
>>>
>>> and the same with hash replaced by digest.
>>> So which one should we use?
>>
>> The previous discussions on this resulted in us going with std.hash.md5,
>> std.hash.sha1, and std.hash.crc32. I don't see any reason to change
>> that, and
>> crypto was specifically _not_ chosen, because crc32 isn't
>> cryptographically
>> sound. But std.hash encompasses things quite nicely, since they're all
>> hashes.
>
> IMHO crypto should be chosen because beside of hashes there are other
> cryptographic primitives (ciphers, PKI, MACs, etc.) and it would be nice
> to have them in one place. std.hash is too narrow because when std gets
> crypto there will be too many namespaces like std.ciphers, std.ssl,
> std.mac. All of them will nicely fit in std.crypto or similar.
>
> As you can see crypto isn't good candidate for checksums so another
> package std.checksum is proposed. Likewise mixing checksums and
> cryptographic hashes under one namespace (std.hash) isn't a right choice
> IMO.
>
> Having cryptographic primitives splitted to std.hash and std.crypto.*
> isn't a good choice either.

Can't we have two namespaces, one for checksums and one for the rest. Or one for cryptographically safe primitives and one for the rest.

Is there a general enough name to fit all these into one namespace?

-- 
/Jacob Carlborg

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation