August 08, 2012
Re: The review of std.hash package
"Regan Heath", in message (digitalmars.D:174462), wrote:
> "Message-Digest Algorithm" is the proper term, "hash" is another, correct,  
> more general term.
> 
> "hash" has other meanings, "Message-Digest Algorithm" does not.

I think the question is: is std.hash going to contain only 
message-digest algorithms, or could it also contain other hash functions?
I think there is enough room in a package to have both message-digest 
algorithms and other kinds of hash functions.
August 08, 2012
Re: The review of std.hash package
On 8/8/12 8:54 AM, Regan Heath wrote:
> "Hash" has too many meanings, we should avoid it.

Yes please.

Andrei
August 08, 2012
Re: The review of std.hash package
On Wednesday, 8 August 2012 at 13:38:26 UTC, 
travert@phare.normalesup.org (Christophe Travert) wrote:
> I think the question is: is std.hash going to contain only
> message-digest algorithm, or could it also contain other hash 
> functions?
> I think there is enough room in a package to have both 
> message-digest
> algorithm and other kinds of hash functions.

Even if that were the case, I'd say they should be kept separate. 
Cryptographic hash functions serve extremely different purposes 
from regular hash functions. There is no reason they should be 
categorized the same.
August 08, 2012
Re: The review of std.hash package
On Wed, 08 Aug 2012 14:50:22 +0100, Chris Cain <clcain@uncg.edu> wrote:

> On Wednesday, 8 August 2012 at 13:38:26 UTC,  
> travert@phare.normalesup.org (Christophe Travert) wrote:
>> I think the question is: is std.hash going to contain only
>> message-digest algorithm, or could it also contain other hash functions?
>> I think there is enough room in a package to have both message-digest
>> algorithm and other kinds of hash functions.
>
> Even if that were the case, I'd say they should be kept separate.  
> Cryptographic hash functions serve extremely different purposes from  
> regular hash functions. There is no reason they should be categorized  
> the same.

I don't think there is any reason to separate them.  People should know  
which digest algorithm they want, they're not going to pick one at random  
and assume it's "super secure!"(tm).  And if they do, well tough, they  
deserve what they get.

"std.digest" can encompass all message digest algorithms, whether secure  
or not.

We could create a 2nd level below "secure" or "crypto" or similar if we  
really want, but I don't see much point TBH.

R

August 08, 2012
Re: The review of std.hash package
"Chris Cain", in message (digitalmars.D:174466), wrote:
> On Wednesday, 8 August 2012 at 13:38:26 UTC, 
> travert@phare.normalesup.org (Christophe Travert) wrote:
>> I think the question is: is std.hash going to contain only
>> message-digest algorithm, or could it also contain other hash 
>> functions?
>> I think there is enough room in a package to have both 
>> message-digest
>> algorithm and other kinds of hash functions.
> 
> Even if that were the case, I'd say they should be kept separate. 
> Cryptographic hash functions serve extremely different purposes 
> from regular hash functions. There is no reason they should be 
> categorized the same.

They should not be categorized the same. I don't expect a regular hash 
function to pass the isDigest predicate. But they have many 
similarities, which explains why they are all called hash functions. 
There is enough room in a package to put several related concepts!

Here, we have a package of 4 files, with a total number of lines that is 
about one third of the single std.algorithm file (which is probably too 
big, I concede). There aren't hundreds of message-digest functions to 
add here.

If it were me, I would have the presently reviewed module std.hash.hash 
be called std.hash.digest, and leave room here for regular hash 
functions. In any case, I think regular hashes HAVE to be in a std.hash 
module or package, because people looking for a regular hash function 
will look there first.
August 08, 2012
Re: The review of std.hash package
On Wed, 08 Aug 2012 02:49:00 -0700,
Walter Bright <newshound2@digitalmars.com> wrote:

> 
> It should accept an input range. But using an Output Range confuses
> me. A hash function is a reduce algorithm - it accepts a sequence of
> input values, and produces a single value. You should be able to
> write code like:
> 
>    ubyte[] data;
>    ...
>    auto crc = data.crc32();

auto crc = crc32Of(data);
auto crc = data.crc32Of(); //ufcs

This doesn't work with every InputRange, and this needs to be fixed.
That's quite a simple fix (max 10 lines of code, one new overload) and
not an inherent problem of the API (see below for more).
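
For readers who want to try the one-shot form, here is a minimal sketch. It assumes the module as it later shipped in Phobos under the name std.digest.crc, so the import path is an assumption and not the std.hash name under review here.

```d
import std.digest.crc : crc32Of;

void main()
{
    ubyte[] data = [1, 2, 3, 4, 5];

    auto a = crc32Of(data);   // plain function call
    auto b = data.crc32Of();  // the same call written with UFCS

    // Both spellings hash the same bytes, so the results agree.
    assert(a == b);
    // A CRC32 digest is a fixed-size array of 4 bytes.
    static assert(is(typeof(a) == ubyte[4]));
}
```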

> 
> For example, the hash example given is:
> 
>    foreach (buffer; file.byChunk(4096 * 1024))
>        hash.put(buffer);
>    auto result = hash.finish();
> 
> Instead it should be something like:
> 
>    auto result = file.byChunk(4096 * 1024).joiner.hash();

But it also says this:
//As digests implement OutputRange, we could use std.algorithm.copy
//Let's do it manually for now

You can basically do this with a range interface in 1 line:
----
import std.algorithm : copy;

auto result = copy(file.byChunk(4096 * 1024), hash).finish();
----
or with ufcs:
----
auto result = file.byChunk(4096 * 1024).copy(hash).finish();
----
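
A self-contained sketch of the copy approach, runnable without a file. It assumes the later std.digest.crc module and uses std.range.chunks on an in-memory buffer as a stand-in for file.byChunk; the helper name crcOfChunks is made up for illustration.

```d
import std.algorithm : copy;
import std.digest.crc : CRC32, crc32Of;
import std.range : chunks;

// Hypothetical helper: feed a range of chunks into a digest with copy.
// copy takes its target by value and returns the updated target, so
// finish() has to be called on the value that copy returns.
ubyte[4] crcOfChunks(ubyte[] data, size_t chunkSize)
{
    CRC32 hash;
    hash.start();
    return copy(data.chunks(chunkSize), hash).finish();
}

void main()
{
    auto data = cast(ubyte[]) "The quick brown fox jumps over the lazy dog".dup;

    // Hashing chunk by chunk must agree with hashing the whole buffer.
    assert(crcOfChunks(data, 8) == crc32Of(data));
}
```

Note that calling copy(range, hash) and then hash.finish() on the original variable would not work for a struct digest, because the original is never updated.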

OK, you have to initialize hash and you have to call finish. With a new
overload for digest it's as simple as this:
----
auto result = file.byChunk(4096 * 1024).digest!CRC32();
auto result = file.byChunk(4096 * 1024).crc32Of(); //with alias
----
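
A sketch of what such an overload looks like in use. This assumes the range-accepting digest and crc32Of overloads as they exist in today's std.digest (an assumption relative to the code under review), again with std.range.chunks standing in for file.byChunk.

```d
import std.algorithm : joiner;
import std.digest : digest;
import std.digest.crc : CRC32, crc32Of;
import std.range : chunks;

void main()
{
    auto data = cast(ubyte[]) "hello, digest".dup;

    // A range of chunks (element type ubyte[]), as byChunk would yield.
    auto viaChunks = digest!CRC32(data.chunks(4));

    // The same bytes as a flat range of ubyte, via joiner.
    auto viaBytes = data.chunks(4).joiner.digest!CRC32();

    // The alias form accepts ranges as well.
    auto viaAlias = data.chunks(4).crc32Of();

    // All three must match the digest of the whole buffer.
    assert(viaChunks == crc32Of(data));
    assert(viaBytes == crc32Of(data));
    assert(viaAlias == crc32Of(data));
}
```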

The digests are OutputRanges: you can write data to them. There's
absolutely no need to make them InputRanges, as they only produce 1
value, and the hash sum is produced at once, so there's no way to
receive the result in a partial way. A digest is very similar to
Appender and its .data property in this regard.
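
To make the Appender comparison concrete, here is a small sketch (assuming the later std.digest.crc module): both types are sinks that accept put, and each hands over a single result once all the data has been written.

```d
import std.array : Appender, appender;
import std.digest.crc : CRC32, crc32Of;
import std.range : isOutputRange;

// Both types are sinks for bytes.
static assert(isOutputRange!(CRC32, ubyte));
static assert(isOutputRange!(Appender!(ubyte[]), ubyte));

void main()
{
    ubyte[][] parts = [[1, 2, 3], [4, 5], [6]];

    auto app = appender!(ubyte[])();
    CRC32 hash;
    hash.start();

    foreach (part; parts)
    {
        app.put(part);   // accumulates the bytes themselves
        hash.put(part);  // accumulates only the running checksum
    }

    ubyte[] all = app.data;        // Appender's single result
    ubyte[4] crc = hash.finish();  // the digest's single result

    assert(all == [1, 2, 3, 4, 5, 6]);
    assert(crc == crc32Of(all));
}
```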

The put function could accept an InputRange, but I think there was a
thread recently which said this is evil for OutputRanges as the same
feature can be achieved with copy.

There's also no big benefit in doing it that way. If your InputRange is
really unbuffered, you could avoid double buffering. But then you
transfer data byte by byte, which will be horribly slow.
If your InputRange has an internal buffer, copy should just copy from
that internal buffer to the 64 byte buffer used inside the digest
implementation.
This double buffering could only be avoided if the put function
accepted an InputRange and could supply a buffer for that InputRange so
the InputRange could write directly into the 64 byte buffer. But
there's nothing like that in Phobos, so this is all speculation.

(Also the internal buffer is only used for the first 64 bytes (or less)
of the supplied data. The rest is processed without copying. It could
probably be optimized so that there's absolutely no copying as long as
the input buffer length is a multiple of 64)

> 
> The magic is that any input range that produces bytes could be used,
> and that byte producing input range can be hooked up to the input of
> any reducing function.
See above. Every InputRange with byte element type does work. You just
have to use copy.
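
A sketch of that claim with a lazy range that is never stored as an array. It assumes the later std.digest.crc module. Passing a pointer to the digest is one way to make copy update the original struct instead of a by-value copy.

```d
import std.algorithm : copy, map;
import std.array : array;
import std.digest.crc : CRC32, crc32Of;
import std.range : iota;

void main()
{
    // A lazy InputRange of ubyte; nothing is materialised up front.
    auto lazyBytes = iota(0, 1000).map!(i => cast(ubyte) (i % 251));

    CRC32 hash;
    hash.start();
    // &hash is an OutputRange too, and writes through to `hash` itself.
    copy(lazyBytes, &hash);
    auto viaCopy = hash.finish();

    // Cross-check against hashing the same bytes as an array.
    assert(viaCopy == crc32Of(lazyBytes.array));
}
```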

> 
> The use of a member finish() is not what any other reduce algorithm
> has, and so the interface is not a general component interface.

It's a struct with state, not a simple reduce function, so it needs that
finish member. It works that way in every other language (and this
is not because those languages don't have ranges; streams and iterators
(as in C#) work exactly the same in this case).

Let's take a real world example: You want to download a huge file with
std.net.curl and hash it on the fly. Completely reading into a buffer
is not possible (large file!). Now std.net.curl has a callback
interface (which is forced on us by libcurl). How would you map that
into an InputRange? (The byLine range in std.net.curl is eager,
byLineAsync needs an additional thread). A newbie trying to do that
will despair as it would work just fine in every other language, but
D forces that InputRange interface.

Implementing it as an OutputRange is much better. The described
scenario works fine and hashing an InputRange also works fine - just
use copy. OutputRange is much more universal for this use case.
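
A sketch of the callback scenario. To stay runnable without a network, fakeDownload below is a hypothetical stand-in for a callback-driven downloader such as libcurl; its name and signature are invented for illustration and are not part of std.net.curl. The digest module assumed is the later std.digest.md.

```d
import std.digest.md : MD5, md5Of;

// Hypothetical downloader: pushes the payload to `onReceive` piece by
// piece and never exposes an InputRange.
void fakeDownload(const(ubyte)[] payload, size_t pieceSize,
                  scope void delegate(const(ubyte)[]) onReceive)
{
    while (payload.length)
    {
        auto n = payload.length < pieceSize ? payload.length : pieceSize;
        onReceive(payload[0 .. n]);
        payload = payload[n .. $];
    }
}

void main()
{
    auto payload = cast(const(ubyte)[]) "pretend this is a huge file";

    MD5 hash;
    hash.start();
    // The digest is an OutputRange, so the callback just writes into it.
    fakeDownload(payload, 5, (const(ubyte)[] piece) { hash.put(piece); });
    auto result = hash.finish();

    // Same digest as hashing the payload in one go.
    assert(result == md5Of(payload));
}
```

No intermediate buffer of the whole file and no extra thread are needed; the producer decides when data arrives and the digest only has to accept it.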

However, I do agree digest!Hash, md5Of, sha1Of should have an additional
overload which takes an InputRange. It would be implemented with copy
and be a nice convenience function.

> 
> I know the documentation on ranges in Phobos is incomplete and
> confusing.

Especially for copy, as the documentation doesn't indicate the line I
posted could work in any way ;-)
August 08, 2012
Re: The review of std.hash package
On Wed, 08 Aug 2012 11:27:49 +0200,
Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:

> > BTW: How does it work in CTFE? Don't you have to do endianness
> > conversions at some time? According to Don that's not really
> > supported.
> 
> std.bitmanip.swapEndian() works for me

Great! I always tried the *endianToNative and nativeTo*Endian functions.
So I didn't expect swapEndian to work.
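
A minimal check of this, for anyone who wants to reproduce it: assigning the result to an enum forces the call to run in CTFE.

```d
import std.bitmanip : swapEndian;

// Evaluated by the compiler: initialising an enum forces CTFE.
enum uint swapped = swapEndian(0x11223344u);
static assert(swapped == 0x44332211u);

void main()
{
    // The same call at run time gives the same answer.
    uint x = 0x11223344u;
    assert(swapEndian(x) == swapped);
}
```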
> 
> > Another problem which prevents CTFE for my proposal would be that the
> > internal state is currently implemented as an array of uints, but
> > the API uses ubyte[] as a return type. That sort of reinterpret
> > cast is not supposed to work in CTFE though. I wonder how you
> > avoided that issue?
> 
> There is a set of functions that abstract some operations to work with 
> CTFE and at runtime: 
> https://github.com/pszturmaj/phobos/blob/master/std/crypto/hash/base.d#L66. 
> Particularly memCopy().

I should definitely look at this later. Would be great if hashes worked
in CTFE.

> > And another problem is that void[][] (as used in the 'digest'
> > function) doesn't work in CTFE (and it isn't supposed to work). But
> > that's a problem specific to this API.
> 
> Yes, that's why I use ubyte[].
But then you can't even hash a string in CTFE. I wanted to special case
strings, but for various reasons it didn't work out in the end.
> 
> I don't think std.typecons.scoped is cumbersome:
> 
> auto sha = scoped!SHA1(); // allocates on the stack
> auto digest = sha.digest("test");

Yes, I'm not sure about this. But a class-only interface probably
doesn't have a high chance of being accepted into Phobos. And I think the
struct interface + wrappers approach isn't bad.

> 
> Why I think classes should be supported is the need of polymorphism.
And ABI compatibility and switching the backend (OpenSSL, native D,
Windows crypto) at runtime. I know it's very useful; this is why we
have the OOP API. It's very easy to wrap the OOP API onto the struct
API. These are the implementations of MD5Digest, CRC32Digest and
SHA1Digest:

alias WrapperDigest!CRC32 CRC32Digest;
alias WrapperDigest!MD5 MD5Digest;
alias WrapperDigest!SHA1 SHA1Digest;

With the support code in std.hash.hash, 1 LOC is enough to implement the
OOP interface if a struct interface is available, so I don't think
maintaining two APIs is a problem.

A bigger problem is that the real implementation must be the struct
interface, so you can't use polymorphism there. I hope alias this is
enough.
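
A sketch of the run-time switching that the OOP wrappers allow. It assumes the Digest interface and the CRC32Digest/MD5Digest classes as they later shipped in std.digest; makeDigest is a hypothetical factory written for this example.

```d
import std.digest : Digest;
import std.digest.crc : CRC32Digest;
import std.digest.md : MD5Digest, md5Of;

// Hypothetical factory: pick an implementation at run time.
// Callers only ever see the Digest interface.
Digest makeDigest(string name)
{
    switch (name)
    {
        case "crc32": return new CRC32Digest();
        case "md5":   return new MD5Digest();
        default:      throw new Exception("unknown digest: " ~ name);
    }
}

void main()
{
    auto data = cast(const(ubyte)[]) "test";

    Digest d = makeDigest("md5");
    d.put(data);
    ubyte[] viaClass = d.finish();

    // The class wrapper must agree with the struct-based template API.
    auto viaStruct = md5Of(data);
    assert(viaClass == viaStruct[]);

    // length reports the digest size in bytes: 4 for CRC32.
    assert(makeDigest("crc32").length == 4);
}
```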
August 08, 2012
OT: scrypt
On Wed, 08 Aug 2012 11:27:49 +0200,
Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:

> 
> Yes, there should be bcrypt, scrypt and PBKDF2.

Wow, I didn't know about scrypt. Seems to be pretty cool.
August 08, 2012
Re: The review of std.hash package
On Wednesday, 8 August 2012 at 14:14:29 UTC, Regan Heath wrote:
> I don't think there is any reason to separate them.  People 
> should know which digest algorithm they want, they're not going 
> to pick one at random and assume it's "super secure!"(tm).  And 
> if they do, well tough, they deserve what they get.

In this case, I'm not suggesting keeping them separate so as not to 
confuse those who don't know better. They're simply disparate in 
actual use.

What do you use a traditional hash function for? Usually to turn 
a large multibyte stream into some finite size so that you can 
use a lookup table or maybe to decrease wasted time in 
comparisons.

What do you use a cryptographic hash function for? Almost always 
it's to verify the integrity of some data (usually files) or 
protect the original form from prying eyes (passwords ... though, 
there are better approaches for that now).

You'd _never_ use a cryptographic hash function in place of a 
traditional hash function and vice versa because they are designed 
for completely different purposes. At a cursory glance, they bear 
only one similarity and that's the fact that they turn a big 
chunk of data into a smaller form that has a fixed size.
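
To illustrate the two kinds of use side by side, a small sketch. It assumes the later std.digest.sha module; hashOf is the built-in druntime hash and the associative array is the built-in lookup table.

```d
import std.digest : toHexString;
import std.digest.sha : sha256Of;

void main()
{
    // Traditional hashing: the built-in associative array hashes its
    // keys internally so that lookups are fast. The hash value itself
    // is a size_t and an implementation detail.
    int[string] wordCount;
    wordCount["hash"] = 1;
    wordCount["hash"] += 1;
    assert(wordCount["hash"] == 2);
    static assert(is(typeof(hashOf("hash")) == size_t));

    // Cryptographic hashing: a fixed-size digest that is stored or
    // published so the data can be verified later.
    ubyte[32] sum = sha256Of("hash");
    assert(sum == sha256Of("hash"));       // same input, same digest
    assert(sum != sha256Of("hash!"));      // changed input, changed digest
    assert(toHexString(sum).length == 64); // 32 bytes as 64 hex characters
}
```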

On Wednesday, 8 August 2012 at 14:16:40 UTC, 
travert@phare.normalesup.org (Christophe Travert) wrote:
> function to pass the isDigest predicate. But they have many
> similarities, which explains they are all called hash 
> functions. There
> is enough room in a package to put several related concepts!

Cryptographic hash functions are commonly built from "one-way 
compression functions." They also have similarities to file 
compression algorithms. After all, both of them turn large files 
into smaller data. However, the actual use of them is completely 
different and you wouldn't use one in place of the other. I 
wouldn't put the Burrows-Wheeler transform in the same package.



It's just my opinion of course, but I just feel it wouldn't be 
right to intermingle normal hash functions and cryptographic hash 
functions in the same package. If we had to make a compromise and 
group them with something else, I'd really like to see 
cryptographic hash functions put in the same place we'd put other 
cryptography (such as AES) ... in a std.crypto package. But 
std.digest is good if they can exist in their own package.


It also occurs to me that a lot of people are confounding 
cryptographic hash functions and normal hash functions enough 
that they think that a normal hash function has a "digest" ... 
I'm 99% sure that's exclusive to the cryptographic hash functions 
(at least, I've never heard of a normal hash function producing a 
digest).
August 08, 2012
Re: The review of std.hash package
On Wed, 8 Aug 2012 17:50:33 +0200,
Johannes Pfau <nospam@example.com> wrote:

> However, I do agree digest!Hash, md5Of, sha1Of should have an
> additional overload which takes a InputRange. It would be implemented
> with copy and be a nice convenience function.

I implemented the function, it's actually quite simple:
----
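import std.algorithm : copy;
import std.range : ElementType, isInputRange;

digestType!Hash digestRange(Hash, Range)(Range data)
    if (isDigest!Hash && isInputRange!Range &&
        __traits(compiles, digest!Hash(ElementType!(Range).init)))
{
    Hash hash;
    hash.start();
    // copy takes its target by value, so pass a pointer to update hash
    copy(data, &hash);
    return hash.finish();
}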
----

but I don't know how to make it an overload. See the thread "overloading a
function taking a void[][]" in D.learn for details.
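
For anyone who wants to experiment, here is a self-contained variant of that helper. It is a sketch written against the later std.digest modules (DigestType, isDigest and crc32Of are taken from there, which is an assumption relative to the code under review), and it keeps the separate name digestRange, which sidesteps the overloading question.

```d
import std.algorithm : copy;
import std.digest : DigestType, isDigest;
import std.digest.crc : CRC32, crc32Of;
import std.range : ElementType, isInputRange, isOutputRange, repeat;

// Hypothetical convenience function: hash any InputRange whose elements
// the digest accepts through put.
DigestType!Hash digestRange(Hash, Range)(Range data)
    if (isDigest!Hash && isInputRange!Range &&
        isOutputRange!(Hash, ElementType!Range))
{
    Hash hash;
    hash.start();
    // copy returns the updated target, so finish() sees all the data.
    return copy(data, hash).finish();
}

void main()
{
    // 100 copies of the byte 'a', produced lazily.
    auto bytes = repeat(cast(ubyte) 'a', 100);
    assert(digestRange!CRC32(bytes) == crc32Of(bytes));
}
```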