January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly wrote:
> I don't know if it matters, but I added the heap routines to Tango because I wanted heapsort, so they kind of came for free. Perhaps a key distinction between ranges and algorithms is that algorithms may mutate the underlying data, but ranges may not? I've been trying to think of another example of a mutating range and I haven't come up with one yet... unless you consider input ranges mutating, I suppose.
A great use for heaps is topNCopy: given an input range (!), give me back the top N items in it, fast. The way that works is by maintaining a heap in the return value.
I'm sure interesting cases will arise with ranges that mutate their host, but heap doesn't look like a good candidate.
Andrei
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | Brad Roberts wrote:
> Since take is a mutator, that implies that h needs to be a reference
> type or a by ref value, right? So, why not make it so?
That's a great question, and one that I don't have a definitive answer to. There are several reasons for which by-ref manipulation is dicey:
1. Composition is harder.
Consider:
foreach (e; take(10, map!("a*a")(range))) ...
In this case, take is passed an rvalue. Rvalues can't and shouldn't bind to references (I think C++'s rule of binding rvalues to const references was a subtle disaster that necessitated endless contortions in defining rvalue references).
So then we need to explicitate:
auto t = map!("a*a")(range);
foreach (e; take(10, t)) ...
2. Difficulties in coding.
My first attempt at ranges was in some formatting code. A by-ref doctrine requires that "ref" is added religiously to the trafficked values. That's not much of a problem except that, if you forget, the code still compiles and runs as told, except that the way it's told is not the way you meant.
3. More difficulties in coding
Coding with more non-local dependencies is harder than coding that creates new, relatively independent entities. The examples given so far were as simple as:
int[] a = [ 1, 2, 3 ];
take(1, a);
// WHY IS A NOT [2, 3] NOW???
In reality, code is almost never that simple and the reality is that it becomes exceedingly harder to assess the state of a range after several other lazy ranges mutate it at their leisure.
That's what I have so far.
On the other hand, other functions are a tougher call. Consider "drop". drop(n, range) throws away at most n elements from the range. There are a few possible ways to define drop:
a) By reference
int[] a = [ 1, 2, 3 ];
drop(1, a);
assert(a == [ 2, 3]);
b) By value
int[] a = [ 1, 2, 3 ];
drop(1, a);
assert(a == [ 1, 2, 3]);
a = drop(1, a);
assert(a == [ 2, 3]);
Unlike take, drop is eager, hence its by-ref implementation doesn't raise as many problems. I'm not sure what's the best choice for drop.
Andrei
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> Sean Kelly wrote:
>>> Andrei Alexandrescu wrote:
>>>>
>>>> Ah, never mind all that. I realized that I can't have a heap range. This is because heaps must mutate the underlying range in-place and any copy will mess the heap up. Here's the example that clarify it for me:
>>>>
>>>> int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
>>>> auto h = Heap!(int[])(a);
>>>> assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]);
>>>>
>>>> At this point the mutations done by take messed up h already!!
>>>
>>> Hm... so assuming this is a min heap, I have:
>>>
>>> a = [4,1,3,2,16,9,10,14,8,7];
>>> h = [1,2,3,4,7,9,10,14,8,16];
>>> take(5, h);
>>> h = [8,10,9,16,14];
>>>
>>> Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct?
>>
>> Oh, I would also expect:
>>
>> a = [8,10,9,16,14, garbage];
>>
>> Since it isn't aware of the length adjustment. Perhaps this is what you meant?
>>
>> Sean
>
> Yah, that's what I meant. h still thinks its length is 10 and that those 10 elements respect the heap property.
Actually, if ranges are by value, I'd expect
assert(equal(take(5, h) == take(5, h)));
OTOH, I'd expect something like the following to also work:
auto f = new File("blah.txt");
auto top = take(10, f.lines);
I.e. I'd expect some ranges to be Translators, and others to be Mutators. Translators are read-only views into underlying anything which may involve some expensive bookkeeping. Mutators are basically references into a particular container and make sure that the container and other Mutators stay consistent.
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov wrote: > Thu, 29 Jan 2009 21:26:39 -0800, Andrei Alexandrescu wrote: > >> Sean Kelly wrote: >>> Sean Kelly wrote: >>>> Andrei Alexandrescu wrote: >>>>> Ah, never mind all that. I realized that I can't have a heap range. This is because heaps must mutate the underlying range in-place and any copy will mess the heap up. Here's the example that clarify it for me: >>>>> >>>>> int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ]; >>>>> auto h = Heap!(int[])(a); >>>>> assert(equal(take(5, h) == [ 8, 9, 10, 14, 16 ]); >>>>> >>>>> At this point the mutations done by take messed up h already!! >>>> Hm... so assuming this is a min heap, I have: >>>> >>>> a = [4,1,3,2,16,9,10,14,8,7]; >>>> h = [1,2,3,4,7,9,10,14,8,16]; >>>> take(5, h); >>>> h = [8,10,9,16,14]; >>>> >>>> Shouldn't take call next repeatedly on the heap, which will in turn perform a popHeap operation? It's difficult to be sure what the result will be after take(5,h) occurs, but it should still be a valid heap, correct? >>> Oh, I would also expect: >>> >>> a = [8,10,9,16,14, garbage]; >>> >>> Since it isn't aware of the length adjustment. Perhaps this is what you meant? >>> >>> Sean >> Yah, that's what I meant. h still thinks its length is 10 and that those 10 elements respect the heap property. > > Actually, if ranges are by value, I'd expect > > assert(equal(take(5, h) == take(5, h))); Me too. It does work indeed BUT NOT FOR INPUT RANGES. So in short you are showing yet another reason why heaps can't be good ranges. > OTOH, I'd expect something like the following to also work: > > auto f = new File("blah.txt"); > auto top = take(10, f.lines); That also works (just that in the new std.stdio File is a struct, so it's not dynamically allocated). However, f.lines is understandably an input range so this will NOT work: auto f = new File("blah.txt"); assert(equal(take(5, f.lines) == take(5, f.lines))); The two f.lines instances operate on the same underlying file handle, so they are input ranges: advancing one advances all others. > I.e. I'd expect some ranges to be Translators, and others to be > Mutators. Translators are read-only views into underlying anything > which may involve some expensive bookkeeping. Mutators are basically > references into a particular container and make sure that the container > and other Mutators stay consistent. I think the distinction is input vs. forward and mutable vs. immutable ranges (the latter distinction is not checked in yet). Andrei | |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov Wrote: [snip] > OTOH, I'd expect something like the following to also work: > > auto f = new File("blah.txt"); > auto top = take(10, f.lines); > Same here. I feel that take (and some other adaptors) should accept range by reference. One of the advantages of ranges over opApply is that you can start foreach, break somewhere (throw an exception, recover) and then continue. It also means that the range is mutated inside foreach. If foreach mutates the range, so can other algorithms that operate on ranges do. > I.e. I'd expect some ranges to be Translators, and others to be Mutators. Translators are read-only views into underlying anything which may involve some expensive bookkeeping. Mutators are basically references into a particular container and make sure that the container and other Mutators stay consistent. | |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu Wrote:
[snip]
>
> int[] a = [ 1, 2, 3 ];
> take(1, a);
>
Why is a not [2, 3] now?
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Denis Koroskin | Denis Koroskin wrote:
> Sergey Gromov Wrote: [snip]
>> OTOH, I'd expect something like the following to also work:
>>
>> auto f = new File("blah.txt"); auto top = take(10, f.lines);
>>
>
> Same here. I feel that take (and some other adaptors) should accept
> range by reference. One of the advantages of ranges over opApply is
> that you can start foreach, break somewhere (throw an exception,
> recover) and then continue. It also means that the range is mutated
> inside foreach. If foreach mutates the range, so can other algorithms
> that operate on ranges do.
While I'm not 100% clear on what take should do, this particular argument is not that compelling. I really mean not compelling at all.
Consider:
int[] a = [ 1, 2, 3, 4, 5];
foreach (e; a)
{
if (e == 3) break;
}
assert(a == [4, 5]);
Do you seriously expect that to be the case? Of course not. However, you would inconsistently expect this to happen:
int[] a = [ 1, 2, 3, 4, 5];
foreach (e; take(4, a))
{
if (e == 3) break;
}
assert(a == [4, 5]);
Ranges are modeled to correspond to slices. Slices have few operations that manipulate them by reference (notably the controversial ~=) and range are made to be unsurprising when compared to slices. They are really extensions of slices.
Andrei
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:
> Consider:
>
> int[] a = [ 1, 2, 3, 4, 5];
> foreach (e; a)
> {
> if (e == 3) break;
> }
> assert(a == [4, 5]);
>
> Do you seriously expect that to be the case? Of course not. However, you would inconsistently expect this to happen:
>
> int[] a = [ 1, 2, 3, 4, 5];
> foreach (e; take(4, a))
> {
> if (e == 3) break;
> }
> assert(a == [4, 5]);
>
> Ranges are modeled to correspond to slices. Slices have few operations that manipulate them by reference (notably the controversial ~=) and range are made to be unsurprising when compared to slices. They are really extensions of slices.
>
>
> Andrei
I hate to come back to naming, but I think that's a major part of what's causing grief here. Take implies mutation of the input range. How about 'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal.
Later,
Brad
| |||
January 30, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Brad Roberts | Brad Roberts wrote:
> On Fri, 30 Jan 2009, Andrei Alexandrescu wrote:
>
>> Consider:
>>
>> int[] a = [ 1, 2, 3, 4, 5];
>> foreach (e; a)
>> {
>> if (e == 3) break;
>> }
>> assert(a == [4, 5]);
>>
>> Do you seriously expect that to be the case? Of course not. However, you would
>> inconsistently expect this to happen:
>>
>> int[] a = [ 1, 2, 3, 4, 5];
>> foreach (e; take(4, a))
>> {
>> if (e == 3) break;
>> }
>> assert(a == [4, 5]);
>>
>> Ranges are modeled to correspond to slices. Slices have few operations that
>> manipulate them by reference (notably the controversial ~=) and range are made
>> to be unsurprising when compared to slices. They are really extensions of
>> slices.
>>
>>
>> Andrei
>
> I hate to come back to naming, but I think that's a major part of what's causing grief here. Take implies mutation of the input range. How about 'subrange' or 'subset' or 'leftmost' anything that doesn't imply removal.
>
> Later,
> Brad
Names are important. "take" is taken from Haskell.
Andrei
| |||
January 31, 2009 Re: Heap: container or range? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Fri, 30 Jan 2009 14:46:28 -0800, Andrei Alexandrescu wrote: > Names are important. "take" is taken from Haskell. No it hasn't. Haskell still has it so its been "copied" from Haskell. :-) -- Derek Parnell Melbourne, Australia skype: derek.j.parnell | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply