August 15, 2012
On Wed, Aug 15, 2012 at 2:40 AM, RivenTheMage <riven-mage@id.ru> wrote:
> Another example is systematic error-correcting codes. The "only" difference between them and checksums is the ability to correct errors, not just detect them. CRC or MD5 can be viewed as a systematic code with zero error-correcting ability.
>
> Should we mix Reed-Solomon codes and MD5 in one module? I don't think so.

Some people's point is that MD5 was considered a cryptographic digest function 16 years ago. It is not considered cryptographically secure today. So why make any design assumption today about how the landscape will look tomorrow? Especially in a field that is always changing. Why not lump them all together and explain the current situation and recommendations in the comments?

Look at Python's passlib module for example. It enumerates every
password encoding scheme under the sun (except for scrypt :() and gives
a recommendation on the appropriate algorithm to use in the current
computing landscape.
http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash

Thanks,
-Jose
August 15, 2012
On Wednesday, 15 August 2012 at 14:36:00 UTC, José Armando García Sancio wrote:
> Some people's point is that MD5 was considered a cryptographic digest
> function 16 years ago. It is not considered cryptographically secure
> today. So why make any design assumption today about how the landscape
> will look tomorrow? Especially in a field that is always changing. Why
> not lump them all together and explain the current situation and
> recommendations in the comments?
>
> Look at Python's passlib module for example. It enumerates every
> password encoding scheme under the sun (except for scrypt :() and gives
> a recommendation on the appropriate algorithm to use in the current
> computing landscape.
> http://packages.python.org/passlib/lib/passlib.hash.html#module-passlib.hash
>
> Thanks,
> -Jose

I agree that MD5 isn't cryptographically secure anymore, but it was designed as a cryptographic hash algorithm, and it shows. Its statistical and performance properties are completely different from those of CRCs, and no matter how broken, it still has a little cryptographic strength (no practical preimage attack has been found to date, for example).

Note that in the Python passlib, there is no mention of CRC, FNV, ROT13, etc. Their place is different.
August 15, 2012
On 15-Aug-12 12:45, Kagamin wrote:
> On Wednesday, 15 August 2012 at 08:25:51 UTC, Dmitry Olshansky wrote:
>> Brrr. It's how convenience wrapper works :)
>>
>> And I totally expect this to call the same code and keep the same
>> state during the work.
>>
>> E.g. see the std.digest.digest functions digest or hexDigest; you
>> could call them stateless in the same vein.
>
> Well there was a wish for stateless hash, Walter even posted the
> required interface:
> auto result = file.byChunk(4096 * 1025).joiner.hash();

auto result = file.byChunk(4096 * 1025).joiner.digest();

and this is already supported in the proposal; peek at the updated docs.

There is no need for additional methods and whatnot.
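For comparison, a minimal Python sketch of the same idiom: feeding an iterable of byte chunks into a single hash state and finishing once, which is what `file.byChunk(...).joiner.digest()` expresses in the proposal (the `chunked_digest` helper name here is my own, not part of any API):

```python
import hashlib

def chunked_digest(chunks, algo="sha256"):
    # Feed an iterable of byte chunks into one hash state --
    # the Python analogue of file.byChunk(...).joiner.digest().
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# With a file object f, the chunk range would be:
#   chunked_digest(iter(lambda: f.read(4096), b""))
```

The point carries over: no extra methods are needed, just a range of chunks and one digest call at the end.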

-- 
Olshansky Dmitry
August 15, 2012
On Wed, Aug 15, 2012 at 8:11 AM, ReneSac <reneduani@yahoo.com.br> wrote:
>
> Note that in the Python passlib, there is no mention of CRC, FNV, ROT13, etc. Their place is different.

That's because it is a "password module", and nobody, or only a small percentage of the population, uses CRC for password digests. Note that the Python passlib module also has archaic plaintext encodings, mainly for interacting with legacy systems.

The basic point is that std.digest/std.hash (whatever people decide) should probably just have generic digest algorithms. The user can decide which one to use given their requirements. Also, it would be beneficial if the module included a section recommending digests based on the current landscape of computing. High-level documentation and suggestions are easy to change; APIs are not.
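Python's hashlib is an existing example of this approach: one generic interface over many digest algorithms, with the documentation rather than the API steering the choice. A small sketch (the `digest` wrapper is mine, only `hashlib.new` is the real API):

```python
import hashlib

def digest(algo: str, data: bytes) -> str:
    # One generic entry point; the algorithm is just a parameter.
    return hashlib.new(algo, data).hexdigest()

for algo in ("md5", "sha1", "sha256"):
    print(algo, digest(algo, b"abc"))
```

The user picks the algorithm by name; which names are advisable for which purpose is a documentation concern.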

Thanks,
-Jose
August 16, 2012
On Wednesday, 15 August 2012 at 19:38:34 UTC, José Armando
García Sancio wrote:

> That's because it is a "password module", and nobody, or only a small
> percentage of the population, uses CRC for password digests.

In turn, that's because CRC is not a cryptographic hash and
not suited for password hashing :)

> The basic point is that std.digest/std.hash (whatever people decide) should probably just have generic digest algorithms.

A generic digesting algorithm should probably go into std.algorithm.

It could be used like this:

------------
import std.algorithm;
import std.checksum;
import std.crypto.mdc;

ushort num = 1234;
auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>> 4) ^ a")(num); // indexing hash

string str = "abcd";
auto hash2 = hash!(CRC32)(str); // checksum
auto hash3 = hash!(MD5)(str); // cryptographic hash
------------

CRC32 and MD5 are ranges and/or classes derived from a
HashAlgorithm interface.
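As a loose illustration of that interface idea in Python (the `HashAlgorithm`, `put`, and `finish` names are borrowed from the proposal's vocabulary; this is not an existing API), a checksum and a cryptographic hash can share one shape and one generic driver:

```python
import hashlib
import zlib
from abc import ABC, abstractmethod

class HashAlgorithm(ABC):
    """Common interface for checksums and cryptographic hashes."""
    @abstractmethod
    def put(self, data: bytes) -> None: ...
    @abstractmethod
    def finish(self) -> bytes: ...

class CRC32(HashAlgorithm):
    def __init__(self):
        self.state = 0
    def put(self, data: bytes) -> None:
        self.state = zlib.crc32(data, self.state)
    def finish(self) -> bytes:
        return self.state.to_bytes(4, "big")

class MD5(HashAlgorithm):
    def __init__(self):
        self._h = hashlib.md5()
    def put(self, data: bytes) -> None:
        self._h.update(data)
    def finish(self) -> bytes:
        return self._h.digest()

def hash_with(algo: HashAlgorithm, data: bytes) -> bytes:
    # Generic driver, analogous to hash!(CRC32)(str) above.
    algo.put(data)
    return algo.finish()
```

Whether lumping a CRC and MD5 behind one interface is wise is exactly what the thread is debating; the sketch only shows that it is mechanically easy.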
August 16, 2012
On Thursday, 16 August 2012 at 03:02:59 UTC, RivenTheMage wrote:

> ushort num = 1234;
> auto hash1 = hash!("(a >>> 20) ^ (a >>> 12) ^ (a >>> 7) ^ (a >>>
> 4) ^ a")(str); // indexing hash

I forgot that this case is already covered by reduce!(...).
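For what it's worth, the "fold a custom hash over the input" case does map onto a reduce. A Python sketch using 32-bit FNV-1a (chosen purely as an illustration of a byte-at-a-time fold, not anything from the proposal):

```python
from functools import reduce

def fnv1a_32(data: bytes) -> int:
    # FNV-1a as a fold: state starts at the offset basis,
    # and each byte is folded in as h = (h ^ b) * prime.
    return reduce(lambda h, b: ((h ^ b) * 0x01000193) & 0xFFFFFFFF,
                  data, 0x811C9DC5)
```

The seed plays the role of the hash state, which is exactly why this style breaks down for block-based hashes that need a separate finish step.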
August 16, 2012
On 09/08/2012 11:48, Johannes Pfau wrote:
> On Wed, 08 Aug 2012 12:31:29 -0700,
> Walter Bright <newshound2@digitalmars.com> wrote:
>
>> On 8/8/2012 12:14 PM, Martin Nowak wrote:
>>> That hardly works for event based programming without using
>>> coroutines. It's the classical inversion-of-control dilemma of
>>> event based programming that forces you to save/restore your state
>>> with every event.
>>
>> See the discussion on using reduce().
>>
>
> I just don't understand it. Let's take the example by Martin Nowak and
> port it to reduce: (The code added as comments is the same code for
> hashes, working with the current API)
>
> int state; //Hash state;
>
> void onData(void[] data)
> {
>       state = reduce(state, data); //copy(data, state);
>       //state = copy(data, state); //also valid, but not necessary
>       //state.put(data); //simple way, doesn't work for ranges
> }
>
> void main()
> {
>       state = 0; //state.start();
>       auto stream = new EventTcpStream("localhost", 80);
>       stream.onData = &onData;
>       //auto result = hash.finish();
> }
>
> There are only 2 differences:
>
> 1:
> the order of the arguments passed to copy and reduce is swapped. This
> kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
> a method copyInto with swapped arguments if consistency is really so
> important.
>
> 2:
> We need an additional call to finish. I can't say it often enough, I
> don't see a sane way to avoid it. Hashes work on blocks, if you didn't
> pass enough data finish will have to fill the rest of the block with
> zeros before you can get the hash value. This operation can't be
> undone. To get a valid result with every call to copy, you'd have to
> always call finish. This is
> * inefficient, you calculate intermediate values you don't need at all
> * you have to copy the hash's state, as you can't continue hashing
>    after finish has been called
>
> and both, the state and the result would have to fit into the one value
> (called seed for reduce). But then it's still not 100% consistent, as
> reduce will return a single value, not some struct including internal
> state.

I'm pretty sure it is possible to pad and finish when a result is required without messing up the internal state.
August 17, 2012
On Thu, 16 Aug 2012 21:25:55 +0100, deadalnix <deadalnix@gmail.com> wrote:

> On 09/08/2012 11:48, Johannes Pfau wrote:
>> On Wed, 08 Aug 2012 12:31:29 -0700,
>> Walter Bright <newshound2@digitalmars.com> wrote:
>>
>>> On 8/8/2012 12:14 PM, Martin Nowak wrote:
>>>> That hardly works for event based programming without using
>>>> coroutines. It's the classical inversion-of-control dilemma of
>>>> event based programming that forces you to save/restore your state
>>>> with every event.
>>>
>>> See the discussion on using reduce().
>>>
>>
>> I just don't understand it. Let's take the example by Martin Nowak and
>> port it to reduce: (The code added as comments is the same code for
>> hashes, working with the current API)
>>
>> int state; //Hash state;
>>
>> void onData(void[] data)
>> {
>>       state = reduce(state, data); //copy(data, state);
>>       //state = copy(data, state); //also valid, but not necessary
>>       //state.put(data); //simple way, doesn't work for ranges
>> }
>>
>> void main()
>> {
>>       state = 0; //state.start();
>>       auto stream = new EventTcpStream("localhost", 80);
>>       stream.onData = &onData;
>>       //auto result = hash.finish();
>> }
>>
>> There are only 2 differences:
>>
>> 1:
>> the order of the arguments passed to copy and reduce is swapped. This
>> kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
>> a method copyInto with swapped arguments if consistency is really so
>> important.
>>
>> 2:
>> We need an additional call to finish. I can't say it often enough, I
>> don't see a sane way to avoid it. Hashes work on blocks, if you didn't
>> pass enough data finish will have to fill the rest of the block with
>> zeros before you can get the hash value. This operation can't be
>> undone. To get a valid result with every call to copy, you'd have to
>> always call finish. This is
>> * inefficient, you calculate intermediate values you don't need at all
>> * you have to copy the hash's state, as you can't continue hashing
>>    after finish has been called
>>
>> and both, the state and the result would have to fit into the one value
>> (called seed for reduce). But then it's still not 100% consistent, as
>> reduce will return a single value, not some struct including internal
>> state.
>
> I'm pretty sure it is possible to pad and finish when a result is required without messing up the internal state.

Without copying it?  AFAICR padding/finishing mutates the state, I mean, that's the whole point of it.
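For reference, Python's hashlib resolves exactly this tension by copying: `digest()`/`hexdigest()` return the digest of the data seen so far (padding is applied to a copy of the internal state), and `update()` can keep going afterwards:

```python
import hashlib

h = hashlib.md5()
h.update(b"hello ")
mid = h.hexdigest()    # intermediate digest; does not finalize the state
h.update(b"world")     # hashing continues from the unpadded state
final = h.hexdigest()
```

So a non-mutating finish is possible, but only at the cost Johannes describes: copying the state and computing intermediate values.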

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
August 20, 2012
Changelog:
* moved the package to std.digest:
    std.hash.hash --> std.digest.digest
    std.hash.md   --> std.digest.md
    std.hash.sha  --> std.digest.sha
    std.hash.crc  --> std.digest.crc

* make sure the docs are consistent regarding names (digest vs. hash)


Code: (location changed!)
https://github.com/jpf91/phobos/tree/newHash/std/digest
https://github.com/jpf91/phobos/compare/master...newHash

Docs: (location changed!)
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_digest.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_md.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_sha.html
http://dl.dropbox.com/u/24218791/d/phobos/std_digest_crc.html


August 29, 2012
All this discussion on the use of auto in the docs made me notice something else about the docs I missed.

I like how ranges are documented and think digest could do the same. Instead of an ExampleDigest, just write the details under isDigest.

I don't see a need for the template constraint example (D idiom).

This would require changing examples which use ExampleDigest, but maybe that should happen anyway since it doesn't exist.

I don't see a reason to change my vote because of this; it's all documentation.