August 08, 2012
Walter Bright wrote:
>
>    auto result = file.byChunk(4096 * 1024).joiner.hash();
>
> The magic is that any input range that produces bytes could be used, and
> that byte producing input range can be hooked up to the input of any
> reducing function.

Suppose you have a callback that gives you blocks of bytes to hash. The blocks come from a socket, but not a blocking one: the socket uses an event mechanism (libevent) to get notified when it is ready.

How would you use the hash API in this situation?
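Here is a rough sketch of the incremental style for this case. It is written against the `std.digest.sha` module that Phobos shipped later, not the 2012 proposal under discussion. `ChunkHasher`, its `begin`/`onData`/`end` methods, and `demo` are hypothetical names. The sketch assumes the event loop calls `onData` once per block, in arrival order.

```d
import std.digest.sha : SHA1, sha1Of;

// Hypothetical adapter for push-style input: an event loop (for example a
// libevent read callback) calls onData() whenever the non-blocking socket
// delivers a block. Nothing here blocks or needs an input range.
struct ChunkHasher
{
    private SHA1 sha;

    // Reset the hash state before the first block of a new message.
    void begin() { sha.start(); }

    // Feed one block; blocks must be fed in stream order.
    void onData(const(ubyte)[] block) { sha.put(block); }

    // Produce the 20-byte SHA-1 once the message is complete.
    ubyte[20] end() { return sha.finish(); }
}

// Simulated callback sequence: the message "abc" split across two reads.
// sha1Of (imported above) gives the one-shot value to compare against.
ubyte[20] demo()
{
    ChunkHasher h;
    h.begin();
    h.onData([0x61]);        // 'a'
    h.onData([0x62, 0x63]);  // 'b', 'c'
    return h.end();
}
```

Because the running state lives in a struct, a range-based, reduce-style function can be layered on top of this kind of object. The reverse is harder: a push-driven callback cannot easily be expressed as a single call on an input range.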
August 08, 2012
On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
>> > The std.hash package also includes:
>> I think "std.crypto" is a better name for the package. At first I
>> thought it contained an implementation of a Hash table.
>
> That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.

std.digest then?

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
August 08, 2012
On 08/08/2012 11:49, Walter Bright wrote:
> On 8/8/2012 1:44 AM, Johannes Pfau wrote:
>> On Tue, 07 Aug 2012 17:39:15 -0700
>> Walter Bright <newshound2@digitalmars.com> wrote:
>>
>>> On 8/7/2012 10:39 AM, Dmitry Olshansky wrote:
>>>> std.hash.hash is a new module for Phobos defining a uniform
>>>> interface for hashes and checksums. It also provides some useful
>>>> helper functions to deal with this new API.
>>>
>>> The hash functions must use a Range interface, not a file interface.
>>>
>>> This is extremely important.
>>
>> I guess this is meant as a general statement and not specifically
>> targeted at my std.hash proposal?
>
> Both.
>
>> I'm a little confused as all hashes already are OutputRanges in my
>> proposal. It's probably not explicit enough in the documentation, but
>> it's mentioned in one example and in the documentation for 'put';
>
> It should accept an input range. But using an Output Range confuses me.
> A hash function is a reduce algorithm - it accepts a sequence of input
> values, and produces a single value. You should be able to write code like:
>
> ubyte[] data;
> ...
> auto crc = data.crc32();
>
> For example, the hash example given is:
>
> foreach (buffer; file.byChunk(4096 * 1024))
> hash.put(buffer);
> auto result = hash.finish();
>
> Instead it should be something like:
>
> auto result = file.byChunk(4096 * 1024).joiner.hash();
>
> The magic is that any input range that produces bytes could be used, and
> that byte producing input range can be hooked up to the input of any
> reducing function.
>
> The use of a member finish() is not what any other reduce algorithm has,
> and so the interface is not a general component interface.
>
> I know the documentation on ranges in Phobos is incomplete and confusing.
>
> I appreciate the effort and care you're putting into this.
>

That is a really good point. +1
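For comparison, the reduce-style usage described above is roughly what later landed in Phobos' `std.digest`, where `digest!Hash` accepts an input range of bytes. This sketch assumes that later API. `sha1OfChunks` is a hypothetical helper, and in-memory chunks stand in for `file.byChunk(...)`.

```d
import std.algorithm.iteration : joiner;
import std.digest : digest, toHexString; // toHexString: for printing results
import std.digest.sha : SHA1, sha1Of;

// Hypothetical helper: takes any range of byte chunks (e.g. what
// file.byChunk(4096 * 1024) yields) and reduces it to one SHA-1 value.
ubyte[20] sha1OfChunks(R)(R chunks)
{
    // joiner flattens the chunks into a single input range of ubyte;
    // digest!SHA1 then consumes it like any other reduce algorithm,
    // with no visible start()/put()/finish() calls.
    return digest!SHA1(chunks.joiner);
}
```

The result does not depend on how the bytes are split into chunks, so a chunked source and a single array of the same bytes hash to the same value.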
August 08, 2012
I'm not familiar with hash functions in general.

I think the core of std.hash is the digest function:

digestType!Hash digest(Hash)(scope const(void[])[] data...)
if(isDigest!Hash)
{
    Hash hash;
    hash.start();
    foreach(datum; data)
        hash.put(cast(const(ubyte[]))datum);
    return hash.finish();
}

That seems to be too restrictive: you can only provide a void[][] or one or several void[], but you should be able to give it any range of void[] or of ubyte[] like:

auto dig = file.byChunk(4096).digest!MD5;

That's the point of the range interface.

This can be done by templatizing the function, something like (untested):

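import std.range : isInputRange, ElementType;

template digest(Hash) if (isDigest!Hash) // isDigest: from the proposed std.hash module
{
    auto digest(R)(R data)
        if (isInputRange!R && is(ElementType!R : const(void)[]))
    {
        Hash hash;
        hash.start();
        // Feed each chunk directly. data.copy(hash) would not work here:
        // copy() takes its target by value, so only a copy of the Hash
        // struct would be updated.
        foreach (chunk; data)
            hash.put(cast(const(ubyte)[]) chunk);
        return hash.finish();
    }
}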

An interesting overload for a range of single ubytes could also be provided. It would fill a buffer with data from the range, feed the buffer to the hash, and start again.
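A minimal sketch of that buffering overload follows. It uses Phobos' later `std.digest.sha.SHA1` as the example hash. `digestBytes` and the 4096-byte buffer size are illustrative assumptions, not part of the proposal.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range.primitives : isInputRange, ElementType;

// Hypothetical overload for an input range of single bytes: collect bytes
// into a fixed buffer and hand the hash whole blocks, instead of calling
// put() once per byte. Hash is any type with start/put/finish.
auto digestBytes(Hash, R)(R bytes)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    Hash hash;
    hash.start();
    ubyte[4096] buf;          // buffer size is an arbitrary choice
    size_t n = 0;
    foreach (b; bytes)
    {
        buf[n++] = b;
        if (n == buf.length)  // buffer full: feed it and start again
        {
            hash.put(buf[]);
            n = 0;
        }
    }
    hash.put(buf[0 .. n]);    // feed whatever is left over
    return hash.finish();
}
```

The buffer only changes how often `put` is called, not the bytes fed to it. The result should therefore match a one-shot hash of the same data. This also holds when the input spans several full buffers.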


August 08, 2012
On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
> On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> 
> >On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
> >>I think "std.crypto" is a better name for the package. At first I thought it contained an implementation of a Hash table.
> >
> >That doesn't fly, because crc32 is going to be in there, and while it's a hash, it's no good for cryptography.
> 
> std.digest then?
[...]

+1. I think std.hash is needlessly confusing (I thought it was another hashtable implementation until I read this thread more carefully).


T

-- 
Two wrongs don't make a right; but three rights do make a left...
August 08, 2012
On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:
> On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
>> On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis
>> <jmdavisProg@gmx.com> wrote:
>> 
>> >On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
>> >>I think "std.crypto" is a better name for the package. At first I
>> >>thought it contained an implementation of a Hash table.
>> >
>> >That doesn't fly, because crc32 is going to be in there, and while
>> >it's a hash, it's no good for cryptography.
>> 
>> std.digest then?
> [...]
>
> +1. I think std.hash is needlessly confusing (I thought it was another
> hashtable implementation until I read this thread more carefully).
>
>
> T

-1
std.digest makes me think of http://en.wikipedia.org/wiki/Digestion
In my circles, digest is just not common if you mean hash.

I didn't think of a hash table implementation; maybe you are spoiled by writing one at the moment? (no offence)





August 08, 2012
> It should accept an input range. But using an Output Range confuses me. A hash function is a reduce algorithm - it accepts a sequence of input values, and produces a single value. You should be able to write code like:
>
>    ubyte[] data;
>    ...
>    auto crc = data.crc32();
>
> For example, the hash example given is:
>
>    foreach (buffer; file.byChunk(4096 * 1024))
>        hash.put(buffer);
>    auto result = hash.finish();
>
> Instead it should be something like:
>
>    auto result = file.byChunk(4096 * 1024).joiner.hash();
>
I think sha1Of/digest!SHA1 should do this.
It's also important to have a stateful hash implementation that can be updated incrementally, e.g. from a callback.
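The stateful form offers something a single reduce call cannot. The sketch below assumes Phobos' later `std.digest.sha.SHA1`, which is a plain struct, so assigning it copies the running state. `prefixAndFull` is a hypothetical name. A single pass over the data yields both a prefix digest and the full digest.

```d
import std.digest.sha : SHA1, sha1Of;

// Stateful, incremental use of a digest struct: put() can be called any
// number of times, and copying the struct snapshots the state so far.
ubyte[20][2] prefixAndFull(const(ubyte)[] head, const(ubyte)[] tail)
{
    SHA1 sha;
    sha.start();
    sha.put(head);            // state now covers `head`
    SHA1 snapshot = sha;      // independent copy of that state
    sha.put(tail);            // original continues with `tail`
    ubyte[20][2] result;
    result[0] = snapshot.finish();  // digest of head alone
    result[1] = sha.finish();       // digest of head followed by tail
    return result;
}
```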
August 08, 2012
On Wed, 08 Aug 2012 13:11:43 +0100, Tobias Pankrath <tobias@pankrath.net> wrote:

> On Wednesday, 8 August 2012 at 12:00:42 UTC, H. S. Teoh wrote:
>> On Wed, Aug 08, 2012 at 11:37:35AM +0100, Regan Heath wrote:
>>> On Tue, 07 Aug 2012 19:41:12 +0100, Jonathan M Davis
>>> <jmdavisProg@gmx.com> wrote:
>>>
>>> >On Tuesday, August 07, 2012 15:31:57 Ary Manzana wrote:
>>> >>I think "std.crypto" is a better name for the package. At first I
>>> >>thought it contained an implementation of a Hash table.
>>> >
>>> >That doesn't fly, because crc32 is going to be in there, and while
>>> >it's a hash, it's no good for cryptography.
>>>
>>> std.digest then?
>> [...]
>>
>> +1. I think std.hash is needlessly confusing (I thought it was another
>> hashtable implementation until I read this thread more carefully).
>>
>>
>> T
>
> -1
> std.digest makes me think of http://en.wikipedia.org/wiki/Digestion

That's exactly what it's supposed to suggest. The algorithm digests the input (AKA the message) and outputs... something else :p

> In my circles, digest is just not common if you mean hash.

Like it or not, Digest is the correct term:
http://en.wikipedia.org/wiki/MD5
"The MD5 Message-Digest Algorithm .."

> I didn't think of a hash table implementation; maybe you are spoiled by writing one at the moment? (no offence)

"Hash" has too many meanings, we should avoid it.

R

August 08, 2012
On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:

> Like it or not, Digest is the correct term:
> http://en.wikipedia.org/wiki/MD5
> "The MD5 Message-Digest Algorithm .."

You could have cited the whole sentence

> The MD5 Message-Digest Algorithm is a widely used cryptographic hash function

So this at least implies that "hash function" is the more general term here, and the corresponding wiki article is named "hash function" and does not even mention "digest".

>> I didn't think of a hash table implementation; maybe you are spoiled by writing one at the moment? (no offence)
>
> "Hash" has too many meanings, we should avoid it.

At least hash table does not use a different meaning of the term hash.

But I'm not that deep into it, I'd just say that digest is not clearly better than hash.
August 08, 2012
On Wed, 08 Aug 2012 14:03:32 +0100, Tobias Pankrath <tobias@pankrath.net> wrote:

> On Wednesday, 8 August 2012 at 12:55:04 UTC, Regan Heath wrote:
>
>> Like it or not, Digest is the correct term:
>> http://en.wikipedia.org/wiki/MD5
>> "The MD5 Message-Digest Algorithm .."
>
> You could have cited the whole sentence

I could have, but I didn't read that far :p  I knew what I was looking for and I copy/pasted it.

>> The MD5 Message-Digest Algorithm is a widely used cryptographic hash function
>
> So this at least implies that "hash function" is the more general term here, and the corresponding wiki article is named "hash function" and does not even mention "digest".

"Message-Digest Algorithm" is the proper term, "hash" is another, correct, more general term.

"hash" has other meanings, "Message-Digest Algorithm" does not.

std.message-digest-algorithm is a bit wordy.

std.digest is not.

std.digest cannot be confused with anything else.

>>> I didn't think of a hash table implementation; maybe you are spoiled by writing one at the moment? (no offence)
>>
>> "Hash" has too many meanings, we should avoid it.
>
> At least hash table does not use a different meaning of the term hash.
>
> But I'm not that deep into it, I'd just say that digest is not clearly better than hash.

I think it is.

R
