August 08, 2012
On Wed, 8 Aug 2012 14:16:40 +0000 (UTC),
travert@phare.normalesup.org (Christophe Travert) wrote:

> If it were me, I would have the presently reviewed module std.hash.hash be called std.hash.digest, and leave room here for regular hash functions. In any case, I think regular hashes HAVE to be in a std.hash module or package, because people looking for a regular hash function will look here first.
> 
> 

std.hash.digest doesn't sound too bad. We could have std.hash.func (or a better named module ;-) for general hash functions later.
August 08, 2012
On 08-Aug-12 18:16, Christophe Travert wrote:
> "Chris Cain", in message (digitalmars.D:174466), wrote:
>> On Wednesday, 8 August 2012 at 13:38:26 UTC,
>> travert@phare.normalesup.org (Christophe Travert) wrote:
>>> I think the question is: is std.hash going to contain only
>>> message-digest algorithms, or could it also contain other hash
>>> functions?
>>> I think there is enough room in a package to have both
>>> message-digest algorithms and other kinds of hash functions.
>>
>> Even if that were the case, I'd say they should be kept separate.
>> Cryptographic hash functions serve extremely different purposes
>> from regular hash functions. There is no reason they should be
>> categorized the same.
>
> They should not be categorized the same. I don't expect a regular hash
> function to pass the isDigest predicate. But they have many
> similarities, which explains why they are all called hash functions. There
> is enough room in a package to put several related concepts!
>

You can still use, say, crc32 as a normal hash function for some binary object. The notions are not as desperate as some designers would want them to be.
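
A minimal sketch of what that could look like, assuming the CRC32 support as it later shipped in Phobos (std.digest.crc); crcHash and the bucket count are hypothetical names used only for illustration:

```d
import std.bitmanip : littleEndianToNative;
import std.digest.crc : crc32Of, crcHexString;

// Hypothetical helper: fold the 4-byte CRC32 digest into a uint
// so it can serve as an ordinary (non-cryptographic) hash value.
uint crcHash(const(ubyte)[] data)
{
    const ubyte[4] digest = crc32Of(data);
    return littleEndianToNative!uint(digest);
}

void main()
{
    ubyte[] blob = [0xDE, 0xAD, 0xBE, 0xEF];

    // Same input, same hash: good enough to pick a bucket.
    enum nBuckets = 64; // assumed table size
    immutable bucket = crcHash(blob) % nBuckets;
    assert(bucket < nBuckets);
    assert(crcHash(blob) == crcHash(blob.dup));

    // The digest view of the same function still works as usual.
    assert(crcHexString(crc32Of("The quick brown fox jumps over the lazy dog"))
           == "414FA339");
}
```

The same bytes come out of both uses; only the way the result is consumed differs.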

> Here, we have a package for 4 files, with a total number of lines that is
> about one third of the single std.algorithm file (which is probably too
> big, I concede). There aren't hundreds of message-digest functions to
> add here.

I'd rather see a clean separation by family, as importing one huge digest module only to use SHA is kind of creepy. On the other hand, as all of the code is templated, it's not a big deal.
>
> If it were me, I would have the presently reviewed module std.hash.hash
> be called std.hash.digest, and leave room here for regular hash
> functions. In any case, I think regular hashes HAVE to be in a std.hash
> module or package, because people looking for a regular hash function
> will look here first.
>

One thing concerns me: if the incremental digest hashes are all in one module,
what are the (would-be) other modules in std.hash?


-- 
Dmitry Olshansky
August 08, 2012
On 08-Aug-12 21:00, Dmitry Olshansky wrote:
> On 08-Aug-12 18:16, Christophe Travert wrote:

>> They should not be categorized the same. I don't expect a regular hash
>> function to pass the isDigest predicate. But they have many
>> similarities, which explains why they are all called hash functions. There
>> is enough room in a package to put several related concepts!
>>
>
> You still can use say crc32 as normal hash function for some binary
> object. The notions are not as desperate as some designers would want
> them to be.

Damned spellcheckers: desperate -> disparate

-- 
Dmitry Olshansky
August 08, 2012
"Chris Cain", in message (digitalmars.D:174477), wrote:

I think you misunderstood me (and it's probably my fault, since I don't know much about hash functions). I wanted to compare two kinds of concepts:

1/ message digest functions, like md5, or sha1, used on large files,
which is what is covered by this std.hash proposal.
2/ small hash functions, like those used in an associative array, which
are called toHash when used as a member function.

And I didn't think of:
3/ cryptographic hash functions

My opinion was that in a module or package called hash, I expect tools concerning #2. But #1 and #2 can coexist in the same package. The proposed std.hash.hash defines a digest concept for #1. That's why I would rather have it named std.hash.digest, leaving room in the hash package to other concepts, like small hash functions that can be used in associative arrays (#2).

I don't know the difference between #1 and #3, so I can't tell if they should share a common package. In any case, I think putting #3 in a crypto package makes sense.

Having 3 different packages seems too much to me. #1 is too restricted to be a whole package IMHO, and should go alongside #2 or #3.

-- 
Christophe
August 08, 2012
Johannes Pfau, in message (digitalmars.D:174478), wrote:
> but I don't know how to make it an overload. See thread "overloading a function taking a void[][]" in D.learn for details.

Don't overload the function taking a void[][]. Replace it. void[][] is a range of void[].
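
A rough sketch of that replacement, assuming the SHA1 type as it later shipped in Phobos (std.digest.sha); sha1OfChunks is a hypothetical name, not part of the reviewed module:

```d
import std.digest.sha : SHA1, sha1Of;
import std.range.primitives : ElementType, isInputRange;

// One template taking any input range whose elements convert to
// const(void)[]. A void[][] argument satisfies the constraint, so
// no separate void[][] overload is needed.
ubyte[20] sha1OfChunks(R)(R chunks)
if (isInputRange!R && is(ElementType!R : const(void)[]))
{
    SHA1 ctx;
    ctx.start();
    foreach (chunk; chunks)
        ctx.put(cast(const(ubyte)[]) chunk); // feed each chunk in order
    return ctx.finish();
}

void main()
{
    void[][] parts = [cast(void[]) "ab".dup, cast(void[]) "c".dup];

    // Hashing the chunks equals hashing the concatenated data.
    assert(sha1OfChunks(parts) == sha1Of("abc"));
}
```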
August 08, 2012
On Wed, 08 Aug 2012 18:33:01 +0100, Christophe Travert <travert@phare.normalesup.org> wrote:

> "Chris Cain", in message (digitalmars.D:174477), wrote:
>
> I think you misunderstood me (and it's probably my fault, since I don't
> know much about hash functions). I wanted to compare two kinds of
> concepts:
>
> 1/ message digest functions, like md5, or sha1, used on large files,
> which is what is covered by this std.hash proposal.
> 2/ small hash functions, like those used in an associative array, which
> are called toHash when used as a member function.
>
> And I didn't think of:
> 3/ cryptographic hash functions
>
> My opinion was that in a module or package called hash, I expect tools
> concerning #2. But #1 and #2 can coexist in the same package. The
> proposed std.hash.hash defines a digest concept for #1. That's why I
> would rather have it named std.hash.digest, leaving room in the hash
> package to other concepts, like small hash functions that can be used in
> associative arrays (#2).
>
> I don't know the difference between #1 and #3, so I can't tell if they
> should share a common package. In any case, I think putting #3 in a
> crypto package makes sense.
>
> Having 3 different packages seems too much to me. #1 is too
> restricted to be a whole package IMHO, and should go alongside #2 or #3.

Here is a perfect example of why we need to avoid using "hash": it means too many different things to different people.

I suggest:

std.digest <- cryptographic "hash" algorithms
std.crc    <- crc "hash" algorithms
std.uuid   <- identity "hash" algorithms

This is assuming we cannot have more levels of depth in the package/module tree, otherwise you could group them all under the package "hash":

std.hash.digest
std.hash.crc
std.hash.uuid

Some people are going to argue it should be:

std.crypto.digest or..
std.crypto.hash

But that leads us to something like:

std.crypto.hash
std.crc.hash
std.uuid.hash

And that seems back-to-front to me, and more importantly would assume/suggest/require that we have more packages to put in std.crc and std.uuid, which I suspect we won't.

R

August 08, 2012
On Wednesday, 8 August 2012 at 17:33:01 UTC, travert@phare.normalesup.org (Christophe Travert) wrote:
> I think you misunderstood me (and it's probably my fault, since I don't
> know much about hash functions). I wanted to compare two kinds of
> concepts:
>
> 1/ message digest functions, like md5, or sha1, used on large files,
> which is what is covered by this std.hash proposal.
> 2/ small hash functions, like those used in an associative array, which
> are called toHash when used as a member function.
>
> And I didn't think of:
> 3/ cryptographic hash functions

Actually, maybe I'm the one not doing a good job of explaining.

1 and 3 are the same thing (what you're calling "message digest" functions are cryptographic hash functions). I'm saying that, even though they are similar in name, cryptographic hash functions really can't (IMO, I suppose I should make clear) be put in the same place as normal hash functions, because they have barely anything in common. You can't use one in place of the other, nor are they really used in similar ways.

> My opinion was that in a module or package called hash, I expect tools
> concerning #2.

I agree. I'd think similarly (I'd assume std.hash has something to do with hash tables or hash functions used for hash tables).


If I were looking to use a cryptographic hash function like SHA1 or (eh) MD5, I'd look for std.crypto first, and probably pick std.digest if I saw that. As a last resort I'd look in std.hash and vomit profusely after seeing it grouped with the "times 33" hash.
August 08, 2012
On 8/8/2012 3:12 AM, Piotr Szturmaj wrote:
> Walter Bright wrote:
>>
>>    auto result = file.byChunk(4096 * 1024).joiner.hash();
>>
>> The magic is that any input range that produces bytes could be used, and
>> that byte producing input range can be hooked up to the input of any
>> reducing function.
>
> Suppose you have a callback that will give you blocks of bytes to hash. The
> blocks come from a socket, but not a blocking one. Instead, the socket uses
> an eventing mechanism (libevent) to get notifications about its readiness.
>
> How would you use the hash API in this situation?

Have the callback supply a range interface to call the hash with.
August 08, 2012
On 8/8/2012 5:13 AM, Martin Nowak wrote:
>> It should accept an input range. But using an Output Range confuses me. A hash
>> function is a reduce algorithm - it accepts a sequence of input values, and
>> produces a single value. You should be able to write code like:
>>
>>    ubyte[] data;
>>    ...
>>    auto crc = data.crc32();
>>
>> For example, the hash example given is:
>>
>>    foreach (buffer; file.byChunk(4096 * 1024))
>>        hash.put(buffer);
>>    auto result = hash.finish();
>>
>> Instead it should be something like:
>>
>>    auto result = file.byChunk(4096 * 1024).joiner.hash();
>>
> I think sha1Of/digest!SHA1 should do this.
> It's also important to have a stateful hash implementation that can be updated
> incrementally, e.g. from a callback.

Take a look at the reduce function in http://dlang.org/phobos/std_algorithm.html#reduce

It has provision for an initial state that can be the current running total.
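
As a rough sketch of that idea, here is the "times 33" hash mentioned earlier in the thread, used as a stand-in for a real digest and written with std.algorithm.reduce and an explicit seed. The name times33 and the starting value 5381 are assumptions for illustration only:

```d
import std.algorithm : reduce;
import std.string : representation;

// The seed is the running state: pass the previous result back in
// to continue hashing with the next block of bytes.
uint times33(uint state, const(ubyte)[] data)
{
    return reduce!((h, b) => h * 33 + b)(state, data);
}

void main()
{
    enum uint start = 5381; // assumed initial state

    // Hash everything in one go...
    immutable whole = times33(start, "abc".representation);

    // ...or incrementally, e.g. one call per callback invocation.
    uint running = start;
    running = times33(running, "ab".representation);
    running = times33(running, "c".representation);

    assert(running == whole);
}
```

This only covers functions whose whole state fits in the result value; a digest that needs buffering and a finalization step carries more state than a plain running total.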

August 08, 2012
On Wednesday, August 08, 2012 18:12:23 Johannes Pfau wrote:
> On Wed, 08 Aug 2012 11:27:49 +0200,
> Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:
> > > BTW: How does it work in CTFE? Don't you have to do endianness conversions at some time? According to Don that's not really supported.
> > 
> > std.bitmanip.swapEndian() works for me
> 
> Great! I always tried the *endianToNative and nativeTo*Endian functions. So I didn't expect swapEndian to work.

What's wrong with the *endianToNative and nativeTo*Endian functions? They work just fine as far as I know. swapEndian works too if you want to use that, but there should be nothing wrong with the endian-specific ones.
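
For reference, a small sketch of how the two families in std.bitmanip relate at run time (it says nothing about how they behave in CTFE):

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian, swapEndian;

void main()
{
    uint value = 0x01020304;

    // Endian-specific pair: the result is a fixed-size byte array,
    // laid out the same way regardless of the host's byte order.
    ubyte[4] bytes = nativeToBigEndian(value);
    ubyte[4] expected = [1, 2, 3, 4];
    assert(bytes == expected);
    assert(bigEndianToNative!uint(bytes) == value);

    // swapEndian just reverses the bytes of the integer itself.
    assert(swapEndian(value) == 0x04030201);
}
```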

- Jonathan M Davis