July 06, 2019
On Saturday, July 6, 2019 12:17:26 PM MDT Jonathan M Davis via Digitalmars- d-learn wrote:
> On Saturday, July 6, 2019 8:12:36 AM MDT berni via Digitalmars-d-learn
>
> wrote:
> > Now it's getting weird. Meanwhile I encountered, that take() sometimes consumes and sometimes not. Where can I learn, what is the reason behind this behavior? And how can I handle this?
>
> take _always_ consumes the range that it's given. The problem is that some types of ranges are implicitly saved when they're copied, whereas others aren't, and when a range is implicitly saved when it's copied, you end up with the copy being consumed. Dynamic arrays are implicitly saved when they're copied, so the range you pass is saved, and the copy is consumed instead of the original.
>
> In generic code, you have to assume that once a range has been copied, you can't use it anymore (just the copy) precisely because the semantics of copying differ depending on the type of the range. You can only use a range after copying it if you know what type of range you're dealing with and how it behaves. So, you can rely on a dynamic array implicitly saving when it's passed to take, but in generic code, you really shouldn't be using a range again once you pass it to take, because what actually happens is dependent on the type of the range.
>
> In general what this means is that if you pass a range to a function, and you then want to use the range again afterwards, you need to call save when passing it to the function, and otherwise, you just assume that it's consumed and don't use it again. You certainly don't pass it to a function with the expectation of some elements being consumed and then continue to use the rest of the range unless the function takes its argument by ref or by pointer (which relatively few range-based functions do).
>
> If you want a function that's guaranteed to not implicitly copy a range, then it needs to accept the argument by ref or take a pointer to it. In the case where a function doesn't have to do the work lazily, ref would work, but in a case like take where you're returning a wrapper range, pointers would be required. So, a version of take that didn't ever copy the range it's given and thus never risked implicitly saving the range it was passed would have to either take a pointer to it or take it by ref and then take the address of the ref. In either case, the code using such a take would then have to ensure that the original range didn't leave scope and get destroyed before the take range was consumed, or the take range would then refer to invalid memory.

Another thing to consider about lazy ranges that take pointers to avoid implicit saving and ensure that they consume the original is that if the original has anything popped from it before the wrapper range is fully consumed, then that will screw up the wrapper range, because the elements in the range it's wrapping will have changed. Wrapper ranges are really designed with the idea that they're operating on their own copy of the range. Even with take as it's currently implemented, you risk bugs if you pass it a range which is a reference type without calling save, because if you don't fully consume the take range before using the original range again, you end up screwing up the elements in the take range when you pop anything off of the original range.

It really doesn't work well to have a wrapper range which doesn't have its own independent copy of the range to iterate over. The only case where it works cleanly to have a wrapper range like take gives you and not have to iterate through the elements again to have a range starting at the first element after the take range is to have a random-access range that has slicing, in which case, you can just slice it, and take isn't required. Without a random-access range with slicing, if you want to only iterate through the elements once, you really should just be iterating through the range manually rather than using a wrapper range.

- Jonathan M Davis



July 06, 2019
On Saturday, 6 July 2019 at 18:17:26 UTC, Jonathan M Davis wrote:
> take _always_ consumes the range that it's given

not if it hasSlicing. see http://dpldocs.info/experimental-docs/source/std.range.d.html#L2015

but yeah otherwise i agree with you
July 07, 2019
I start to understand: A (lazy) range is something like a voucher: byChunk() does not provide the data immediately, but only a voucher to hand you a chunk (an array of 2 bytes in my example above) a time. This voucher is passed to joiner(), which keeps that voucher and hands me out a new voucher for a byte a time (called "a" in my example above). This voucher is now passed to take(), whick again keeps that voucher and hands me out yet an other voucher for a byte a time, but limits itself to 5 items.

Up to now nothing has happend, but the exchange of that vouchers. Now when writeln() is called, it shows take() that voucher and says: "Give me the first of your numbers". take() uses his voucher to do the same with joiner(), which uses its voucher to get the first chunk of byChunk(), takes the first byte of that chunk and hands it to take() and take() hands it to writeln().

Now writeln() uses it's voucher once again, to get the next byte. take() asks joiner() again, but this time joiner() does not ask byChunk(), because it still has one number left in the chunk it got when asking the last time. Now joiner() uses this second byte to hand it back. And so on.

After writeln() finished I'm doing something evil: I use a private copy of that voucher, that I allready handed to take() to use it myself. That's sort of stealing back. And hence the results are more or less unpredictable. Especially, when I use this copy, before take() finished it's job, as Jonathan M Davis points out.

My misthinking was to assume, that my copy of the voucher is still the same as the voucher, that take() owns after having handed out the 5 items, so I can use it to get the other items. If it where possible to ask take() to hand me back it's voucher, I could do it with this voucher, but I don't see a possibility to get that voucher back.

What I actually want to do, when using take(), is not to get a new voucher, but I want to keep my voucher and just use this voucher to get the first 5 items - take() doesn't do that, so I've got to write my own function which cannot be lazy.

Or better: I'd like to hand in my voucher and get back two vouchers, one for the first 5 bytes and one for the rest. That's actually, what I thought, take() is doing...
July 07, 2019
On Saturday, July 6, 2019 11:02:15 PM MDT berni via Digitalmars-d-learn wrote:
> Or better: I'd like to hand in my voucher and get back two vouchers, one for the first 5 bytes and one for the rest. That's actually, what I thought, take() is doing...

Without slicing, that's impossible without iterating over the elements multiple times. It would be possible to have a function like take that did something like return a tuple of two ranges where the first one is the taken elements, and the second one is the rest, but without slicing, that could only be done by calling save so that the two ranges are independent copies of the original range, and then the range that doesn't have the elements that were taken off has to internally pop off all of the elements that were taken so that its first element is the first one that wasn't taken. There's no way to magically skip the taken elements for the second range without slicing.

The alternative is to manually iterate through all of the elements that you want to take, doing whatever you're going to do with them, and then doing whatever you want to do with rest of the range after that. Then iteration only occurs once. But that means that you don't have a range of the taken elements and can't do something like pass them to another algorithm as a range without constructing a new range from them, and if you're going to do that, you might as well just call save, pass the range to take, and the call drop on the original range to pop off the elements that were taken.

- Jonathan M Davis



July 07, 2019
On Saturday, July 6, 2019 12:58:52 PM MDT Adam D. Ruppe via Digitalmars-d- learn wrote:
> On Saturday, 6 July 2019 at 18:17:26 UTC, Jonathan M Davis wrote:
> > take _always_ consumes the range that it's given
>
> not if it hasSlicing. see http://dpldocs.info/experimental-docs/source/std.range.d.html#L2015
>
> but yeah otherwise i agree with you

True enough, though if the code is generic, it still pretty much has to assume that the range was consumed.

- Jonathan M Davis



July 07, 2019
On Sunday, 7 July 2019 at 09:01:53 UTC, Jonathan M Davis wrote:
> Without slicing, that's impossible without iterating over the elements multiple times.

That's what I thought too, but meanwhile I think it is possible. To get it working, the second range needs to know about the first one and when the second one is queried the first time it has to notify the first one, that it has to remove all remaining items from the underlying range (and in case it still needs them, save them) and never use it again.

The advantage of this setup over copying the items immediately is some sort of laziness. It might happen, that the second one will never query or it might happen, that the first one finished allready when the second one queries. In this cases no additional memory is needed.
Even if the second one is queried before the first one finished, the needed buffer might be much smaller.
July 07, 2019
On Sunday, July 7, 2019 12:36:42 PM MDT berni via Digitalmars-d-learn wrote:
> On Sunday, 7 July 2019 at 09:01:53 UTC, Jonathan M Davis wrote:
> > Without slicing, that's impossible without iterating over the elements multiple times.
>
> That's what I thought too, but meanwhile I think it is possible. To get it working, the second range needs to know about the first one and when the second one is queried the first time it has to notify the first one, that it has to remove all remaining items from the underlying range (and in case it still needs them, save them) and never use it again.
>
> The advantage of this setup over copying the items immediately is
> some sort of laziness. It might happen, that the second one will
> never query or it might happen, that the first one finished
> allready when the second one queries. In this cases no additional
> memory is needed.
> Even if the second one is queried before the first one finished,
> the needed buffer might be much smaller.

Having one range know about the other isn't enough. That just means that the take range would tell the other range that it had popped an element off, and then the other would know that it had to pop an element off. That still involves popping the same element on different ranges twice. In order to have popFront only called once, they have to be the same range, so when take pops an element off, it pops an element off of the other range. If both the take range and the drop range are wrappers, then it would be possible to do something with pointers to the original range where the number of elements that has been popped off is kept track of, and as long as the take range pops them all first, the drop range can just continue iterating through the original range, but if any elements from the drop range got iterated first, then it would have to save the range and start popping from there not to screw up the take range. And since pointers to the original range would have to be used to avoid having independent copies, you run the risk of all of this being screwed up by a piece of code popping an element from the original range at some point. Also, all of that machinery would likely quickly become far more expensive than simply saving the range to call take on it and then calling drop on the original. Yes, that pops the same elements multiple times, but wrapper ranges already end up calling chains of popFronts, fronts, and emptys as they iterate through a range. You rarely really get a single popFront call for a popFront unless the optimizer has actually managed to optimize out all of the layers.

You can try mucking around with your own implemention of take if you want, but I'd suggest that you're just better off taking the approach that it's expected that if you use take, you're going to have to iterate through those elements twice and that if that's not acceptable, it's better to do whatever you need to do with the take range by manually operating on each of the elements rather than having them be a range. Unless random-access ranges with slicing are involved, trying to do anything that effectively slices a range without iterating over the elements multiple times is almost certainly going to be a mess. At some point, if you really want slicing semantics, then you need to make sure that your range actually has slicing rather than trying to act like it does when it doesn't.

- Jonathan M Davis



July 09, 2019
On Sunday, 7 July 2019 at 21:55:17 UTC, Jonathan M Davis wrote:
> Having one range know about the other isn't enough. That just means that the take range would tell the other range that it had popped an element off, and then the other would know that it had to pop an element off. That still involves popping the same element on different ranges twice.

Sounds like you didn't understand what I meant. So I tried to implement what I had in mind and I have to confess, that there is something about ranges that I do not understand yet:

>import std.stdio;
>import std.array;
>
>void main()
>{
>    auto a = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
>
>    call(a,5);
>}
>
>void call(T)(ref T range, int c)
>{
>    struct Take
>    {
>        T* take_range;
>        int count;
>
>        @property bool empty=false;
>        @property auto front() { return (*take_range).front; }
>        void popFront()
>        {
>            (*take_range).popFront();
>            if (--count<=0) empty=true;
>        }
>
>        T* get_range()
>        {
>            assert(count==0);
>            while (!empty) popFront();
>            return take_range;
>        }
>    }
>
>    auto take = Take(&range,c);
>
>//    writeln(take);
>    while (!take.empty) { write(take.front); take.popFront(); }
>
>    auto original_range = take.get_range();
>
>    writeln(*original_range);
>}

Running this program leads to
>12345[6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
as expected.

But when I replace that loop at the bottom by writeln(take) the assertion in get_range fails (count==5). As all the functions are called in the same order by both the loop and writeln, I confess, that writeln somehow manages to use a copy of "take" although this isn't a ForwardRange and copying should therefore not be possible.

> You can try mucking around with your own implemention of take if you want, but I'd suggest that you're just better off taking the approach that it's expected that if you use take,

I't not about improving or doing things different. It's just about me understanding, what happens.



July 09, 2019
Hurray, it works! :-)

https://run.dlang.io/is/2GMq34

I have to use classes to avoid copying when arguments are passed to a function. (And yes, there should of course be much more checks, especially when there are to few elements in the original range. And it could be speed improved and so on... I wanted to keep it simple.)


1 2
Next ›   Last »