August 09, 2012
Re: The review of std.hash package
Am Thu, 09 Aug 2012 11:32:34 +0200
schrieb "Vladimir Panteleev" <vladimir@thecybershadow.net>:

> On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
> > std.hash.hash is a new module for Phobos defining a uniform 
> > interface for hashes and checksums. It also provides some 
> > useful helper functions to deal with this new API.
> 
> Is it too late to ask to include MurmurHash 2 and/or 3? It's 
> public domain, and great for things like hash tables.
> 
> You can steal some code from here:
> https://github.com/CyberShadow/ae/blob/master/utils/digest.d
> https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d
> 

To be honest I didn't even know that MurmurHash can be used
incrementally. I could port that code soon, but I think it's
best to do it after the review.
After we have formalized a common API, adding new hashes probably won't
require a full review. It should be possible to do that as a pull
request.
August 09, 2012
Re: The review of std.hash package
On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
> The idea is to have hash act like a component - not with 
> special added code the user has to write.

Sorry, but I think this is a meaningless statement without 
specifying what kind of interface the component should adhere to. 
In my opinion, the proposed std.hash design would be a perfectly 
valid interface for »accumulate stuff and at some point get a 
result«-type components.

> In this case, it needs to work like a reduce algorithm, because 
> it is a reduce algorithm. Need to find a way to make this work.

Hash functions are _not_ analogous to reduce(), because the 
operation performed by reduce() is stateless, whereas hash 
functions generally have some internal state.

»Continuing« a reduce() operation by repeatedly calling it with 
the last partial result as the starting value is only possible 
because there is no additional state to carry over. To make this 
work with hashes, you'd have to return something encapsulating 
the internal state from your hash function. But then, you again 
need to obtain the actual result from that return value somehow, 
which defeats the original intent of making it work like reduce – 
and incidentally is what finish() does.
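
To make the distinction concrete, here is a minimal D sketch with a toy 
checksum; the type and its members are invented for illustration and are 
not the proposed std.hash API:

import std.algorithm : reduce;

// Toy checksum: it carries internal state (a byte count) beyond its
// running value, so the partial value alone cannot be used to continue it.
struct ToySum
{
    private uint sum;
    private ulong length;

    void put(const(ubyte)[] data)
    {
        foreach (b; data) sum += b;
        length += data.length;
    }

    uint finish()
    {
        return sum ^ cast(uint) length; // mix in the hidden state ("padding" analogue)
    }
}

unittest
{
    // reduce() can be resumed from its partial result alone:
    auto part = reduce!"a + b"(0, [1, 2, 3]);
    auto full = reduce!"a + b"(part, [4, 5, 6]);

    // The toy hash needs its whole state kept around, plus an explicit final step:
    ToySum h;
    ubyte[] first = [1, 2, 3], second = [4, 5, 6];
    h.put(first);
    h.put(second);
    auto digest = h.finish(); // only now is the actual result available
}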

David
August 09, 2012
Re: The review of std.hash package
On Thu, 09 Aug 2012 10:59:47 +0100, David Nadlinger <see@klickverbot.at>  
wrote:

> On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
>> The idea is to have hash act like a component - not with special added  
>> code the user has to write.
>
> Sorry, but I think this is a meaningless statement without specifying  
> what kind of interface the component should adhere to. In my opinion,  
> the proposed std.hash design would be a perfectly valid interface for  
> »accumulate stuff and at some point get a result«-type components.
>
>> In this case, it needs to work like a reduce algorithm, because it is a  
>> reduce algorithm. Need to find a way to make this work.
>
> Hash functions are _not_ analogous to reduce(), because the operation  
> performed by reduce() is stateless, whereas hash functions generally  
> have some internal state.
>
> »Continuing« a reduce() operation by repeatedly calling it with the last  
> partial result as the starting value is only possible because there is  
> no additional state to carry over. To make this work with hashes, you'd  
> have to return something encapsulating the internal state from your hash  
> function.

This isn't necessarily a problem.

> But then, you again need to obtain the actual result from that return  
> value somehow, which defeats the original intent of making it work like  
> reduce – and incidentally is what finish() does.

But, this is a problem.  finish in most cases pads the remaining data to a  
boundary of the internal state size, then completes one more iteration of  
the algorithm to produce the final result.

So, like David has just said, you can have one or the other.  Either you can  
chain hashreduce operations together, but you have to perform a manual  
finish step to get the actual result, or you cannot chain hashreduce  
operations together and finish is done automatically when the input range  
is consumed.

Wild thought... and I have to admit I've not followed the proposed or  
suggested API closely, nor have I used ranges extensively, so this may not  
be possible...

If the range/hash object stores the current state and returns this as the  
result of hashreduce, it would be chainable.  If it also had a "Digest"  
property/method which performed finish on a /temporary copy/ of the state  
it would almost be as automatic as reduce.  There would still be a manual  
step to get the result, but it would be analogous to calling toString on  
any range object to output its "value".  The Digest property/method would  
not modify the internal state, and could be called at any time between  
(not sure there is a point to this) or after chained hashreduce operations.
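
For illustration, a rough D sketch of that idea with a toy checksum (the 
names here are invented, not part of the reviewed API):

struct ChainableSum
{
    private uint sum;
    private ulong length;

    void put(const(ubyte)[] data)
    {
        foreach (b; data) sum += b;
        length += data.length;
    }

    // Non-destructive result: run the finish step on a value copy of the
    // state, so hashing can continue afterwards on the original.
    @property uint digest() const
    {
        ChainableSum tmp = this;  // temporary copy of the internal state
        return tmp.sum ^ cast(uint) tmp.length;
    }
}

// ChainableSum h;
// h.put(part1); auto intermediate = h.digest; // peek at the result so far
// h.put(part2); auto finalValue = h.digest;   // state was never disturbed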

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
August 09, 2012
Re: The review of std.hash package
On Thu, 09 Aug 2012 10:58:10 +0100, Johannes Pfau <nospam@example.com>  
wrote:

> Am Thu, 09 Aug 2012 11:32:34 +0200
> schrieb "Vladimir Panteleev" <vladimir@thecybershadow.net>:
>
>> On Tuesday, 7 August 2012 at 17:39:50 UTC, Dmitry Olshansky wrote:
>> > std.hash.hash is a new module for Phobos defining a uniform
>> > interface for hashes and checksums. It also provides some
>> > useful helper functions to deal with this new API.
>>
>> Is it too late to ask to include MurmurHash 2 and/or 3? It's
>> public domain, and great for things like hash tables.
>>
>> You can steal some code from here:
>> https://github.com/CyberShadow/ae/blob/master/utils/digest.d
>> https://github.com/CyberShadow/ae/blob/master/utils/digest_murmurhash3.d
>>
>
> To be honest I didn't even know that MurmurHash can be used
> incrementally. I could port that code soon, but I think it's
> best to do it after the review.
> After we have formalized a common API adding new hashes probably won't
> require a full review. It should be possible to do that as a pull
> request.

Once the API is formalised I can contribute the hashes I have also :)

R


-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
August 09, 2012
Re: The review of std.hash package
On 8/9/2012 2:59 AM, David Nadlinger wrote:
> On Wednesday, 8 August 2012 at 19:27:54 UTC, Walter Bright wrote:
>> The idea is to have hash act like a component - not with special added code
>> the user has to write.
>
> Sorry, but I think this is a meaningless statement without specifying what kind
> of interface the component should adhere to. In my opinion, the proposed
> std.hash design would be a perfectly valid interface for »accumulate stuff and
> at some point get a result«-type components.

It is not a meaningless statement in that components have a predictable set of 
methods and properties. That's all a range is. Requiring extra methods means 
there's either an error in the component interface design or an error in the 
component instance design.

What I'm trying to get away from is the C library style where every library 
lives in its own world, and when the user tries to connect them he's got a fair 
amount of work to do building a scaffolding between them.

With component programming, the interfaces between disparate things are 
standardized. A component should not have unique methods for different 
instances - for example, one component having a finish() method, another a 
getResult() method, and a third no such method at all. That is the 
situation I wish to avoid.


>> In this case, it needs to work like a reduce algorithm, because it is a reduce
>> algorithm. Need to find a way to make this work.
>
> Hash functions are _not_ analogous to reduce(), because the operation performed
> by reduce() is stateless, whereas hash functions generally have some internal
> state.
>
> »Continuing« a reduce() operation by repeatedly calling it with the last partial
> result as the starting value is only possible because there is no additional
> state to carry over. To make this work with hashes, you'd have to return
> something encapsulating the internal state from your hash function.

Wouldn't that be simply the handle to the hash?

> But then,
> you again need to obtain the actual result from that return value somehow,
> which defeats the original intent of making it work like reduce
> – and incidentally is what finish() does.

I understand what finish() does. The interesting part is trying to figure out 
a way to avoid needing that method. Or perhaps the reduce component design is 
incorrect.
August 09, 2012
Re: The review of std.hash package
On 8/9/2012 2:48 AM, Johannes Pfau wrote:
> Am Wed, 08 Aug 2012 12:31:29 -0700
> schrieb Walter Bright <newshound2@digitalmars.com>:
>
>> On 8/8/2012 12:14 PM, Martin Nowak wrote:
>>> That hardly works for event based programming without using
>>> coroutines. It's the classical inversion-of-control dilemma of
>>> event based programming that forces you to save/restore your state
>>> with every event.
>>
>> See the discussion on using reduce().
>>
>
> I just don't understand it. Let's take the example by Martin Nowak and
> port it to reduce: (The code added as comments is the same code for
> hashes, working with the current API)
>
> int state; //Hash state;
>
> void onData(void[] data)
> {
>       state = reduce(state, data); //copy(data, state);
>       //state = copy(data, state); //also valid, but not necessary
>       //state.put(data); //simple way, doesn't work for ranges
> }
>
> void main()
> {
>       state = 0; //state.start();
>       auto stream = new EventTcpStream("localhost", 80);
>       stream.onData = &onData;
>       //auto result = hash.finish();
> }
>
> There are only 2 differences:
>
> 1:
> the order of the arguments passed to copy and reduce is swapped. This
> kinda makes sense (if copy is interpreted as copyTo). Solution: Provide
> a method copyInto with swapped arguments if consistency is really so
> important.

Consistency is important so that disparate components can fit together.

>
> 2:
> We need an additional call to finish. I can't say it often enough, I
> don't see a sane way to avoid it. Hashes work on blocks; if you didn't
> pass enough data, finish will have to fill the rest of the block with
> zeros before you can get the hash value. This operation can't be
> undone. To get a valid result with every call to copy, you'd have to
> always call finish. This is
> * inefficient, you calculate intermediate values you don't need at all
> * you have to copy the hash's state, as you can't continue hashing
>    after finish has been called
>
> and both the state and the result would have to fit into the one value
> (called seed for reduce). But then it's still not 100% consistent, as
> reduce will return a single value, not some struct including internal
> state.

That isn't a problem, the internal state can be private data for the struct, and 
the "finish value" can be the result of overloading operator() on that struct. 
I'm not sure if that would work, but it's worth investigating.
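
A speculative sketch of that, again with a toy checksum rather than anything 
from the actual module:

struct OpCallSum
{
    private uint sum;     // internal state stays private
    private ulong length;

    void put(const(ubyte)[] data)
    {
        foreach (b; data) sum += b;
        length += data.length;
    }

    // overloaded operator(): yields the "finish value" without a named method
    uint opCall() const
    {
        return sum ^ cast(uint) length;
    }
}

// OpCallSum h;
// h.put(data);
// auto result = h(); // no finish() in the user-visible vocabulary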
August 09, 2012
Re: The review of std.hash package
If a hash is a range, it's an output range, because it's something you 
feed data to. Output ranges have only one method: put. Johannes used this 
method. But it's not sufficient; you need something to start and to 
finish the hash.

To bring consistency to the library, we should not remove these start and 
finish methods. We should make all output ranges in the library use the 
same functions.

In the library, we have really few output ranges. We have writable input 
ranges and we have Appender. Is there more? There should be files, 
sockets, maybe even signals, but IIRC these don't implement the output 
range interface at the moment. What did I miss?

Appender doesn't use a finish method, but we have to 'get the 
result' of the appender, and for this we use appender.data. This name is 
not appropriate for generically getting a result or terminating an 
output range.
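
For example, with nothing beyond std.array:

import std.array : appender;

void example()
{
    auto app = appender!string();
    app.put("hello, ");       // put: the one output-range primitive
    app.put("world");
    string result = app.data; // Appender's way of 'getting the result'
}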

So we need a name that fits most output range uses. start/finish doesn't 
sound bad. open/close fits files and sockets, but maybe not all output 
ranges. Relying solely on constructors, opCall or alias this seems 
dangerous to me.

-- 
Christophe
August 09, 2012
Re: The review of std.hash package
On 8/9/12 5:05 AM, Johannes Pfau wrote:
> Well that's possible, but I don't like the template bloat it causes.

What have you measured, and what is your dislike based upon?

The library function must be generic. Then users worried about bloating 
may use it with a limited number of types.

A digest function only dealing with void[void[]] is unacceptable.


Andrei
August 09, 2012
Re: The review of std.hash package
Am Thu, 09 Aug 2012 08:48:37 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> On 8/9/12 5:05 AM, Johannes Pfau wrote:
> > Well that's possible, but I don't like the template bloat it causes.
> 
> What have you measured, and what is your dislike based upon?

What annoys me is that as long as the function only supported arrays, it
didn't need templates _at all_. So template bloat for arrays = 0. But
adding range support means the version dealing with arrays now has to
be a template as well (which is probably a bug; you can't overload a
template and a non-template function) and will produce extra code for
every array type. I just think adding range support shouldn't cause the
array code to change in any way.

But that's why I said I don't like it, not that it's a show stopper.
The overhead is probably negligible, but in theory it shouldn't be
there at all.
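
Roughly what I mean, with illustrative signatures that are not the actual 
module's:

// Array-only: no template needed, a single copy of the code for all callers.
ubyte[16] digestOfArray(const(ubyte)[] data)
{
    ubyte[16] result;
    // ... feed data to the hash and finish into result ...
    return result;
}

// Range support: necessarily a template, one instantiation per range type R.
// And since a template and a non-template function can't currently overload
// each other under one name, the array version gets pulled into the template
// too, producing a copy per array type as well.
ubyte[16] digestOf(R)(R input)
{
    ubyte[16] result;
    // ... feed the elements of input to the hash and finish into result ...
    return result;
}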

> 
> The library function must be generic. Then users worried about
> bloating may use it with a limited number of types.
> 
> A digest function only dealing with void[void[]] is unacceptable.

Sure, I agree that range support is necessary; I just forgot to
implement it initially. I'm not against range support / ranges in
general / the template instances needed for ranges. I just dislike
that it affects the array implementation in this specific case.
August 09, 2012
Re: The review of std.hash package
Am Thu, 09 Aug 2012 11:16:36 +0100
schrieb "Regan Heath" <regan@netmail.co.nz>:

> 
> Once the API is formalised I can contribute the hashes I have also :)
> 
Great! With all those contributions we'll probably have a rather
complete set of digests soon.