August 11, 2012
N.B. I haven't yet reviewed the proposal.

There's been a lot of discussion about the behavior of hash accumulators, and I've just have a chat with Walter about such.

There are two angles in the discussion:

1. One is, the hash accumulator should work as an operand in an accumulation expression. Then the reduce() algorithm can be used as follows:

HashAccumulator ha;
reduce!((a, b) => a + b)(ha, [1, 2, 3]);
writeln(ha.finish());

This assumes the hash overloads operator +.

2. The other is, the hash accumulator is an output range - a sink! - that supports put() for a lot of stuff. Then the code would go:

HashAccumulator ha;
copy([1, 2, 3], ha);
writeln(ha.finish());

I think (2) is a much more fertile view than (1) because the notion of "reduce" emphasizes the accumulation operation (such as "+"), and that is a forced notion for hashes (we're not really adding stuff there). In contrast, the notion that the hash accumulator is a sink is very natural: you just dump a lot of stuff into the accumulator, and then you call finish and you get its digest.

So, where does this leave us?

I think we should reify the notion of finish() as an optional method for output ranges. We define in std.range a free finish(r) function that does nothing if r does not define a finish() method, and invokes the method if r does define it.

Then people can call r.finish() for all output ranges no problem.

For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.


Destroy!

Andrei
August 11, 2012
On Saturday, August 11, 2012 19:29:53 Andrei Alexandrescu wrote:
> I also wonder whether there exists a better name than
> finish()

finish is what I've used for similar functions in the past. It seems like a fine name to me.

> and how to handle cases in which e.g. you finish() an output
> range and then you put more stuff into it, or you finish() a range
> several times, etc.

In all of the cases that I've dealt with where anything like finish is required, it's made no sense whatsoever to call finish mulitple times.

> Destroy!

Overall, seems like a sensible idea to me.

- Jonathan M Davis
August 12, 2012
On Sat, 2012-08-11 at 19:29 -0400, Andrei Alexandrescu wrote: […]
> I think (2) is a much more fertile view than (1) because the notion of "reduce" emphasizes the accumulation operation (such as "+"), and that is a forced notion for hashes (we're not really adding stuff there). In contrast, the notion that the hash accumulator is a sink is very natural: you just dump a lot of stuff into the accumulator, and then you call finish and you get its digest.

One could also consider the hash generator to be a builder, which would support 2 rather than 1.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


August 12, 2012
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
> N.B. I haven't yet reviewed the proposal.
>
> For files, finish() should close the file (or at least flush it - unclear on that). I also wonder whether there exists a better name than finish(), and how to handle cases in which e.g. you finish() an output range and then you put more stuff into it, or you finish() a range several times, etc.

How about naming "finish", flush? Which is unambiguous...

August 12, 2012
Am Sat, 11 Aug 2012 19:29:53 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> 2. The other is, the hash accumulator is an output range - a sink! - that supports put() for a lot of stuff. Then the code would go:
> 
> HashAccumulator ha;
> copy([1, 2, 3], ha);

This is a little off topic, but when I implemented the recent changes
for std.hash I noticed the above code doesn't work, as ha is passed by
value. You currently have to do this:
ha = copy([1, 2, 3], ha); //or
copy([1, 2, 3], &ha);

> I think we should reify the notion of finish() as an optional method
> for output ranges. We define in std.range a free finish(r) function
> that does nothing if r does not define a finish() method, and invokes
> the method if r does define it.
> 
> Then people can call r.finish() for all output ranges no problem.

Sounds good.
> 
> For files, finish() should close the file (or at least flush it -
> unclear on that). I also wonder whether there exists a better name
> than finish(), and how to handle cases in which e.g. you finish() an
> output range and then you put more stuff into it, or you finish() a
> range several times, etc.

The current behavior in std.hash is to reset the 'HashAccumulator' to it's initial state after finish was called, so it can be reused. Finish does some computation which leaves the 'HashAccumulator' in an invalid state and resetting it is cheap, so I thought an implicit reset is convenient.

I'm not sure about files.

The data property in Appender is also similar, but it doesn't modify the internal state AFAIK, so it's possible to continue using Appender after accessing data. It's probably more a peek function than a finish function.

Probably we should distinguish between finish functions which destroy
the internal state and peek functions which do not modify the internal
state.
The implicit reset done in the hash finish functions would probably
have to be removed then. The downside of this is that it's then possible
to have a 'HashAccumulator' with invalid state, so 'put' would have to
check for that (at least in debug mode). With the implicit reset it's
not possible to get a 'HashAccumulator' with invalid state.
August 12, 2012
On 8/12/2012 1:25 AM, Daniel wrote:
> How about naming "finish", flush? Which is unambiguous...


We discussed that and rejected it, because "flush" has connotations of being an intermediate operation, not a final one.

August 12, 2012
On 8/12/2012 1:36 AM, Johannes Pfau wrote:
> Probably we should distinguish between finish functions which destroy
> the internal state and peek functions which do not modify the internal
> state.

I worry about too many parts to an interface.

A peek function needs a really strong use case to justify it.
August 12, 2012
On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
> > Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state.
> 
> I worry about too many parts to an interface.
> 
> A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A function could still be worth having on an individual type without making sense for output ranges in general as long as it's not required to use the type.

However, I really don't think that peek makes sense for output ranges in general. Most of the time, you're just writing to them and not worrying about what's already been written. It's basically the same as when you write to a stream. I don't think that I've seen a peek function on an output stream.

And I really don't think that peek makes sense in the context of hashes. You don't care what a hash is until it's finished. And once it's finished, it really doesn't make sense to keep adding to it. I don't know why you'd ever want an intermediate hash result.

If you call finish, you're done. And if finish gets called again, it's just like if you call popFront after an input range is empty. The behavior is undefined (though popFront - or finish in this case - probably has an assertion for checking in non-release mode). I really don't think that it's all that big a deal.

- Jonathan M Davis
August 12, 2012
On 12-Aug-12 14:35, Jonathan M Davis wrote:
> On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
>> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
>>> Probably we should distinguish between finish functions which destroy
>>> the internal state and peek functions which do not modify the internal
>>> state.
>>
>> I worry about too many parts to an interface.
>>
>> A peek function needs a really strong use case to justify it.
>
> The big question is whether it's merited in output ranges in general. A
> function could still be worth having on an individual type without making
> sense for output ranges in general as long as it's not required to use the
> type.
>
> However, I really don't think that peek makes sense for output ranges in
> general. Most of the time, you're just writing to them and not worrying about
> what's already been written. It's basically the same as when you write to a
> stream. I don't think that I've seen a peek function on an output stream.
>

Agreed. Current appender has .data (a-la peek) just because of implementation details that allow it. In fact there was a pull for Phobos including s better Appender (that would end up being a Builder I guess) that doesn't allow to peek at array in creation until the very end but provides much better performance profile.

BTW what's happened with that pull? I recall github nickname sandford, but can't recall whose awesome work that was. I'd love to see it make its way into Phobos.

> And I really don't think that peek makes sense in the context of hashes. You
> don't care what a hash is until it's finished. And once it's finished, it really
> doesn't make sense to keep adding to it. I don't know why you'd ever want an
> intermediate hash result.
>

Having partial hashes over data is very useful e.g. for fast binary diff algorithms. However it requires specific form of hash function (so that you can look at result) and/or operating at specific block granularity.

> If you call finish, you're done. And if finish gets called again, it's just like
> if you call popFront after an input range is empty. The behavior is undefined
> (though popFront - or finish in this case - probably has an assertion for
> checking in non-release mode). I really don't think that it's all that big a
> deal.

Agreed. finish seems as a good name because indicates that it's an end of operation. But consider that hash accumulator can be reset after finish and reused (unlike say network stream).

-- 
Dmitry Olshansky
August 12, 2012
On Sunday, August 12, 2012 14:57:00 Dmitry Olshansky wrote:
> Agreed. Current appender has .data (a-la peek) just because of implementation details that allow it. In fact there was a pull for Phobos including s better Appender (that would end up being a Builder I guess) that doesn't allow to peek at array in creation until the very end but provides much better performance profile.
> 
> BTW what's happened with that pull? I recall github nickname sandford, but can't recall whose awesome work that was. I'd love to see it make its way into Phobos.

I'm not sure. He was basically told to redo it as ArrayBuilder rather than changing Appender, since changing Appender would break a lot of code, and his changes arguably made it so that it wasn't an appender anymore anyway. But he has yet to post a new pull request with those changes.

- Jonathan M Davis
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home