August 08, 2012
On 8/8/12 4:34 PM, Jonathan M Davis wrote:
> I say just keep it simple and leave it at std.hash. It's plenty clear IMHO.

Not clear to quite a few of us. IMHO it just makes us seem (to the larger community) clever about a petty point. There are plenty of better names, and std.digest is very adequate.

Andrei


August 08, 2012
On Wednesday, August 08, 2012 18:47:04 Andrei Alexandrescu wrote:
> On 8/8/12 4:34 PM, Jonathan M Davis wrote:
> > I say just keep it simple and leave it at std.hash. It's plenty clear IMHO.
> 
> Not clear to quite a few of us. IMHO it just makes us seem (to the larger community) clever about a petty point. There are plenty of better names, and std.digest is very adequate.

I prefer std.hash to std.digest, but I don't necessarily care all that much. What I was objecting to in particular was the suggestion to split it into std.hash.digest and std.hash.func. I think that all of the hashing algorithms should just go in the one package. Adding another layer is an unnecessary complication IMHO.

- Jonathan M Davis
August 09, 2012
On Wed, 08 Aug 2012 16:44:03 -0400, "Jonathan M Davis" <jmdavisProg@gmx.com> wrote:

> > 
> > in CTFE?
> > http://dpaste.dzfl.pl/0503b8af
> > 
> > According to Don reinterpret casts (even if done through unions) won't be supported in CTFE. So you can't convert from uint-->ubyte[4]
> 
> No. It wouldn't work in CTFE, because it uses a union. But what it's trying to do doesn't really make sense in CTFE in most cases anyway, because the endianness of the target machine may not be the same as the endianness of the machine doing the compilation. Any computations which care about endianness must be in a state where they don't care about endianness anymore once CTFE has completed, or you're going to have bugs.

I completely agree, and this holds for hashes: once the final hash value is produced, it doesn't depend on the endianness.
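For completeness, a uint --> ubyte[4] conversion that does work in CTFE avoids the reinterpret cast entirely and fixes the byte order explicitly with shifts; a minimal sketch:

```d
// CTFE-friendly alternative to a union-based reinterpret cast:
// compute the bytes with shifts, fixing the byte order in code
// instead of relying on the host machine's endianness.
ubyte[4] toLittleEndianBytes(uint x)
{
    ubyte[4] r;
    r[0] = cast(ubyte)( x        & 0xFF);
    r[1] = cast(ubyte)((x >>  8) & 0xFF);
    r[2] = cast(ubyte)((x >> 16) & 0xFF);
    r[3] = cast(ubyte)((x >> 24) & 0xFF);
    return r;
}

// Evaluated entirely at compile time:
static assert(toLittleEndianBytes(0x04030201)[0] == 0x01);
```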

> 
> Though if the issue is std.hash being CTFEable, I don't know why anyone would even care. It's cool if it's CTFEable, but the sorts of things that you hash pretty much always require user or file input of some kind (which you can't do with CTFE).

Yeah, it's not that useful; that's why I didn't care about CTFE support right now. The only use case I can think of is to hash a string in CTFE; for example, UUID could use it to support name-based UUID literals.
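A CTFE-able string hash needs nothing beyond plain integer arithmetic; a 32-bit FNV-1a sketch (not part of the proposed std.hash API) illustrates the compile-time use case:

```d
// FNV-1a, 32-bit: only integer arithmetic, so it works in CTFE.
uint fnv1a(string s)
{
    uint h = 2166136261u;
    foreach (immutable char c; s)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hash computed at compile time, e.g. for a name-based identifier:
enum nameHash = fnv1a("example.org");
static assert(nameHash == fnv1a("example.org"));
```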

> You'd have to have a use case where something within the program itself needed to be hashed for some reason for it to matter whether std.hash was CTFEable or not, and it wouldn't surprise me at all if it were typical in hash functions to do stuff that isn't CTFEable anyway.
> 
> - Jonathan M Davis


August 09, 2012
On Wed, 08 Aug 2012 12:30:31 -0700, Walter Bright <newshound2@digitalmars.com> wrote:

> On 8/8/2012 12:05 PM, Johannes Pfau wrote:
> > See the post in D.learn for a detailed description. Yes, the code I posted takes a range, but digest (as it is now) takes void[][] to accept all kinds of types _without_ template bloat. The difficulty is to combine those two overloads without causing unnecessary template bloat.
> 
> Have the templated version with overloads simply call the single version (with a different name) with void[][].
> 
> 

Well, that's possible, but I don't like the template bloat it causes. AFAIK a function taking a void[][] is just one instance; with that redirecting approach we'll have one instance per array type. This seems unnecessary (and maybe the compiler can merge such template instances in the future), but I can't seem to find a way to avoid it, so we'll probably have to live with that.
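Walter's forwarding scheme can be sketched like this (names hypothetical); the per-type instances are exactly the bloat mentioned above, but each one only packs its arguments and forwards:

```d
// The single non-template worker: only one copy in the binary.
ubyte[16] digestImpl(scope const(void)[][] data)
{
    ubyte[16] result;
    // ... feed every slice into the hash context, then finish ...
    return result;
}

// Thin templated front end: one small instance per combination of
// argument types, each just building a void[][] and forwarding.
ubyte[16] digest(T...)(scope T data)
{
    const(void)[][T.length] slices;
    foreach (i, d; data)
        slices[i] = d;
    return digestImpl(slices[]);
}
```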

http://dpaste.dzfl.pl/f86717f7

I guess a second function digestRange is not acceptable?
August 09, 2012
On 8/9/2012 2:05 AM, Johannes Pfau wrote:
> I guess a second function digestRange is not acceptable?

It's more the user API that matters, not how it works under the hood.


August 09, 2012
On 8/9/2012 2:05 AM, Johannes Pfau wrote:
> http://dpaste.dzfl.pl/f86717f7

The Range argument - is it an InputRange, an OutputRange? While it's just a type name, the name should reflect what kind of range it is from the menagerie of ranges in std.range.

August 09, 2012
On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
> std.hash.hash is a new module for Phobos defining a uniform interface for hashes and checksums. It also provides some useful helper functions to deal with this new API.

Is it too late to ask to include MurmurHash 2 and/or 3? It's public domain, and great for things like hash tables.

You can steal some code from here:
https://github.com/CyberShadow/ae/blob/master/utils/digest.d
https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d

August 09, 2012
On Wed, 08 Aug 2012 12:27:39 -0700, Walter Bright <newshound2@digitalmars.com> wrote:

> On 8/8/2012 12:08 PM, Johannes Pfau wrote:
> > Now where's the difference, except that for hashes the context ('hash') has to be set up and finished manually?
> 
> 
> The idea is to have hash act like a component - not with special added code the user has to write.

Please explain that. Nobody's going to simply replace a call to reduce with a call to a fictional 'hashReduce'. Why is it so important that the reduce and hash APIs match 100%, even if the API doesn't fit hashes?

> 
> In this case, it needs to work like a reduce algorithm, because it is a reduce algorithm. Need to find a way to make this work.
> 

We could do some ugly, performance-killing hacks to make it possible, but I just don't see why this is necessary.

----
// assumes std.algorithm.copy and the MD5 context from the module under review
import std.algorithm : copy;

struct InterHash
{
    MD5 ctx;
    ubyte[16] finished;
    alias finished this;
}

InterHash hashReduce(Range)(Range data)
{
    InterHash hash;
    hash.ctx.start();
    return hashReduce(hash, data);
}

InterHash hashReduce(Range)(InterHash hash, Range data)
{
    copy(data, hash.ctx);     // feed the data into the hash context
    auto ctxCopy = hash.ctx;  // finish on a copy so hashing can continue
    hash.finished = ctxCopy.finish();
    return hash;
}

auto a = hashReduce([1,2,3]);
auto b = hashReduce(a, [3,4]);
----

However, a and b are still not really valid hash values. I just don't see why we should force an interface onto hashes which just doesn't fit.
August 09, 2012
On Wed, 08 Aug 2012 12:31:29 -0700, Walter Bright <newshound2@digitalmars.com> wrote:

> On 8/8/2012 12:14 PM, Martin Nowak wrote:
> > That hardly works for event based programming without using coroutines. It's the classical inversion-of-control dilemma of event based programming that forces you to save/restore your state with every event.
> 
> See the discussion on using reduce().
> 

I just don't understand it. Let's take the example by Martin Nowak and port it to reduce: (The code added as comments is the same code for hashes, working with the current API)

int state; //Hash state;

void onData(void[] data)
{
     state = reduce!((a, b) => a + b)(state, data); //copy(data, state);
     //state = copy(data, state); //also valid, but not necessary
     //state.put(data); //simple way, doesn't work for ranges
}

void main()
{
     state = 0; //state.start();
     auto stream = new EventTcpStream("localhost", 80);
     stream.onData = &onData;
     //auto result = state.finish();
}

There are only 2 differences:

1: the order of the arguments passed to copy and reduce is swapped. This kinda makes sense (if copy is interpreted as copyTo). Solution: provide a method copyInto with swapped arguments if consistency is really so important.

2: we need an additional call to finish. I can't say it often enough: I don't see a sane way to avoid it. Hashes work on blocks; if you didn't pass enough data, finish will have to fill the rest of the block with zeros before you can get the hash value. This operation can't be undone. To get a valid result with every call to copy, you'd have to always call finish. This is
* inefficient, as you calculate intermediate values you don't need at all
* a problem because you have to copy the hash's state, as you can't continue hashing after finish has been called

and both the state and the result would have to fit into the one value (called seed for reduce). But then it's still not 100% consistent, as reduce will return a single value, not some struct including internal state.
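The copyInto helper mentioned in point 1 could be a trivial wrapper; a sketch (the name is hypothetical):

```d
import std.algorithm : copy;

// Hypothetical helper: same as std.algorithm.copy, but with the
// argument order of reduce (target/seed first, data second).
auto copyInto(Target, Source)(Target target, Source source)
{
    return copy(source, target);
}
```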
August 09, 2012
On Thu, 09 Aug 2012 02:13:10 -0700, Walter Bright <newshound2@digitalmars.com> wrote:

> On 8/9/2012 2:05 AM, Johannes Pfau wrote:
> > http://dpaste.dzfl.pl/f86717f7
> 
> The Range argument - is it an InputRange, an OutputRange? While it's just a type name, the name should reflect what kind of range it is from the menagerie of ranges in std.range.
> 

It's an InputRange (of bytes) or an InputRange of some byte buffer (ElementType == ubyte[] || ElementType == ubyte[num]). We get the second version for free, so I just included it ;-)

The documentation would have to make that clear, of course. I could also change the name; it's just a proof of concept right now.
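One way to state the kind of range in the signature itself is a template constraint rather than a bare type name; a sketch (assuming the proposed context type has put overloads for both element kinds):

```d
import std.range : isInputRange, ElementType;

// Accept an input range of ubyte, or of ubyte buffers
// (static arrays implicitly convert to const(ubyte)[]).
void putRange(Context, R)(ref Context ctx, R range)
    if (isInputRange!R &&
        (is(ElementType!R : ubyte) || is(ElementType!R : const(ubyte)[])))
{
    foreach (elem; range)
        ctx.put(elem);
}
```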