View mode: basic / threaded / horizontal-split · Log in · Help
August 11, 2012
finish function for output ranges
N.B. I haven't yet reviewed the proposal.

There's been a lot of discussion about the behavior of hash 
accumulators, and I've just have a chat with Walter about such.

There are two angles in the discussion:

1. One is, the hash accumulator should work as an operand in an 
accumulation expression. Then the reduce() algorithm can be used as follows:

HashAccumulator ha;
reduce!((a, b) => a + b)(ha, [1, 2, 3]);
writeln(ha.finish());

This assumes the hash overloads operator +.

2. The other is, the hash accumulator is an output range - a sink! - 
that supports put() for a lot of stuff. Then the code would go:

HashAccumulator ha;
copy([1, 2, 3], ha);
writeln(ha.finish());

I think (2) is a much more fertile view than (1) because the notion of 
"reduce" emphasizes the accumulation operation (such as "+"), and that 
is a forced notion for hashes (we're not really adding stuff there). In 
contrast, the notion that the hash accumulator is a sink is very 
natural: you just dump a lot of stuff into the accumulator, and then you 
call finish and you get its digest.

So, where does this leave us?

I think we should reify the notion of finish() as an optional method for 
output ranges. We define in std.range a free finish(r) function that 
does nothing if r does not define a finish() method, and invokes the 
method if r does define it.

Then people can call r.finish() for all output ranges no problem.

For files, finish() should close the file (or at least flush it - 
unclear on that). I also wonder whether there exists a better name than 
finish(), and how to handle cases in which e.g. you finish() an output 
range and then you put more stuff into it, or you finish() a range 
several times, etc.


Destroy!

Andrei
August 11, 2012
Re: finish function for output ranges
On Saturday, August 11, 2012 19:29:53 Andrei Alexandrescu wrote:
> I also wonder whether there exists a better name than
> finish()

finish is what I've used for similar functions in the past. It seems like a fine 
name to me.

> and how to handle cases in which e.g. you finish() an output
> range and then you put more stuff into it, or you finish() a range
> several times, etc.

In all of the cases that I've dealt with where anything like finish is 
required, it's made no sense whatsoever to call finish mulitple times.

> Destroy!

Overall, seems like a sensible idea to me.

- Jonathan M Davis
August 12, 2012
Re: finish function for output ranges
On Sat, 2012-08-11 at 19:29 -0400, Andrei Alexandrescu wrote:
[…]
> I think (2) is a much more fertile view than (1) because the notion of 
> "reduce" emphasizes the accumulation operation (such as "+"), and that 
> is a forced notion for hashes (we're not really adding stuff there). In 
> contrast, the notion that the hash accumulator is a sink is very 
> natural: you just dump a lot of stuff into the accumulator, and then you 
> call finish and you get its digest.

One could also consider the hash generator to be a builder, which would
support 2 rather than 1.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
August 12, 2012
Re: finish function for output ranges
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu 
wrote:
> N.B. I haven't yet reviewed the proposal.
>
> For files, finish() should close the file (or at least flush it 
> - unclear on that). I also wonder whether there exists a better 
> name than finish(), and how to handle cases in which e.g. you 
> finish() an output range and then you put more stuff into it, 
> or you finish() a range several times, etc.

How about naming "finish", flush? Which is unambiguous...
August 12, 2012
Re: finish function for output ranges
Am Sat, 11 Aug 2012 19:29:53 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> 2. The other is, the hash accumulator is an output range - a sink! - 
> that supports put() for a lot of stuff. Then the code would go:
> 
> HashAccumulator ha;
> copy([1, 2, 3], ha);

This is a little off topic, but when I implemented the recent changes
for std.hash I noticed the above code doesn't work, as ha is passed by
value. You currently have to do this:
ha = copy([1, 2, 3], ha); //or
copy([1, 2, 3], &ha);

> I think we should reify the notion of finish() as an optional method
> for output ranges. We define in std.range a free finish(r) function
> that does nothing if r does not define a finish() method, and invokes
> the method if r does define it.
> 
> Then people can call r.finish() for all output ranges no problem.

Sounds good.
> 
> For files, finish() should close the file (or at least flush it - 
> unclear on that). I also wonder whether there exists a better name
> than finish(), and how to handle cases in which e.g. you finish() an
> output range and then you put more stuff into it, or you finish() a
> range several times, etc.

The current behavior in std.hash is to reset the 'HashAccumulator' to
it's initial state after finish was called, so it can be reused. Finish
does some computation which leaves the 'HashAccumulator' in an invalid
state and resetting it is cheap, so I thought an implicit reset is
convenient.

I'm not sure about files.

The data property in Appender is also similar, but it doesn't modify
the internal state AFAIK, so it's possible to continue using Appender
after accessing data. It's probably more a peek function than a finish
function.

Probably we should distinguish between finish functions which destroy
the internal state and peek functions which do not modify the internal
state.
The implicit reset done in the hash finish functions would probably
have to be removed then. The downside of this is that it's then possible
to have a 'HashAccumulator' with invalid state, so 'put' would have to
check for that (at least in debug mode). With the implicit reset it's
not possible to get a 'HashAccumulator' with invalid state.
August 12, 2012
Re: finish function for output ranges
On 8/12/2012 1:25 AM, Daniel wrote:
> How about naming "finish", flush? Which is unambiguous...


We discussed that and rejected it, because "flush" has connotations of being an 
intermediate operation, not a final one.
August 12, 2012
Re: finish function for output ranges
On 8/12/2012 1:36 AM, Johannes Pfau wrote:
> Probably we should distinguish between finish functions which destroy
> the internal state and peek functions which do not modify the internal
> state.

I worry about too many parts to an interface.

A peek function needs a really strong use case to justify it.
August 12, 2012
Re: finish function for output ranges
On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
> > Probably we should distinguish between finish functions which destroy
> > the internal state and peek functions which do not modify the internal
> > state.
> 
> I worry about too many parts to an interface.
> 
> A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A 
function could still be worth having on an individual type without making 
sense for output ranges in general as long as it's not required to use the 
type.

However, I really don't think that peek makes sense for output ranges in 
general. Most of the time, you're just writing to them and not worrying about 
what's already been written. It's basically the same as when you write to a 
stream. I don't think that I've seen a peek function on an output stream.

And I really don't think that peek makes sense in the context of hashes. You 
don't care what a hash is until it's finished. And once it's finished, it really 
doesn't make sense to keep adding to it. I don't know why you'd ever want an 
intermediate hash result.

If you call finish, you're done. And if finish gets called again, it's just like 
if you call popFront after an input range is empty. The behavior is undefined 
(though popFront - or finish in this case - probably has an assertion for 
checking in non-release mode). I really don't think that it's all that big a 
deal.

- Jonathan M Davis
August 12, 2012
Re: finish function for output ranges
On 12-Aug-12 14:35, Jonathan M Davis wrote:
> On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
>> On 8/12/2012 1:36 AM, Johannes Pfau wrote:
>>> Probably we should distinguish between finish functions which destroy
>>> the internal state and peek functions which do not modify the internal
>>> state.
>>
>> I worry about too many parts to an interface.
>>
>> A peek function needs a really strong use case to justify it.
>
> The big question is whether it's merited in output ranges in general. A
> function could still be worth having on an individual type without making
> sense for output ranges in general as long as it's not required to use the
> type.
>
> However, I really don't think that peek makes sense for output ranges in
> general. Most of the time, you're just writing to them and not worrying about
> what's already been written. It's basically the same as when you write to a
> stream. I don't think that I've seen a peek function on an output stream.
>

Agreed. Current appender has .data (a-la peek) just because of 
implementation details that allow it. In fact there was a pull for 
Phobos including s better Appender (that would end up being a Builder I 
guess) that doesn't allow to peek at array in creation until the very 
end but provides much better performance profile.

BTW what's happened with that pull? I recall github nickname sandford, 
but can't recall whose awesome work that was. I'd love to see it make 
its way into Phobos.

> And I really don't think that peek makes sense in the context of hashes. You
> don't care what a hash is until it's finished. And once it's finished, it really
> doesn't make sense to keep adding to it. I don't know why you'd ever want an
> intermediate hash result.
>

Having partial hashes over data is very useful e.g. for fast binary diff 
algorithms. However it requires specific form of hash function (so that 
you can look at result) and/or operating at specific block granularity.

> If you call finish, you're done. And if finish gets called again, it's just like
> if you call popFront after an input range is empty. The behavior is undefined
> (though popFront - or finish in this case - probably has an assertion for
> checking in non-release mode). I really don't think that it's all that big a
> deal.

Agreed. finish seems as a good name because indicates that it's an end 
of operation. But consider that hash accumulator can be reset after 
finish and reused (unlike say network stream).

-- 
Dmitry Olshansky
August 12, 2012
Re: finish function for output ranges
On Sunday, August 12, 2012 14:57:00 Dmitry Olshansky wrote:
> Agreed. Current appender has .data (a-la peek) just because of
> implementation details that allow it. In fact there was a pull for
> Phobos including s better Appender (that would end up being a Builder I
> guess) that doesn't allow to peek at array in creation until the very
> end but provides much better performance profile.
> 
> BTW what's happened with that pull? I recall github nickname sandford,
> but can't recall whose awesome work that was. I'd love to see it make
> its way into Phobos.

I'm not sure. He was basically told to redo it as ArrayBuilder rather than 
changing Appender, since changing Appender would break a lot of code, and his 
changes arguably made it so that it wasn't an appender anymore anyway. But he 
has yet to post a new pull request with those changes.

- Jonathan M Davis
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home