Is there something like a consuming take?
berni
July 06, 2019
I want to copy the first n items of a range to an array, removing these items from the range.

This works:

> foreach (i;0..n)
> {
>    data ~= r.front;
>    r.popFront();
> }

but looks a little bit awkward.

I came up with this now:

> data = r.take(n).array;

This only partly works, because the values of r are not consumed. So I have to call afterwards:

> r = r.drop(n);

Now I wonder if it is possible to do this with one single call, something like

> data = r.take_consuming(n).array;

Does something like this exist?


July 06, 2019
On Saturday, 6 July 2019 at 11:20:50 UTC, berni wrote:
> I want to copy the first n items of a range to an array, I came up with this now:
>> data = r.take(n).array;
> This only partly works, because the values of r are not consumed. So I have to call afterwards:
>> r = r.drop(n);
> Now I wonder if it is possible to do this with one single call, something like
> data = r.take_consuming(n).array;
> Does something like this exist?

sure
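auto take_consuming( R )( ref R r, int cnt ) {
    import std.array : array;
    import std.range : drop, take;
    auto tmp = r.take( cnt ).array; // copy the first cnt items
    r = r.drop( cnt );              // then skip them in the caller's range
    return tmp;
}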
don't thank
July 06, 2019
On Saturday, 6 July 2019 at 11:48:51 UTC, a11e99z wrote:
> sure
> auto take_consuming( R )( ref R r, int cnt ) {
>     auto tmp = r.take( cnt ).array;
>     r = r.drop( cnt );
>     return tmp;
> }
> don't thank

That doesn't look like what I'm looking for, as it is exactly the same as what I already found.

Maybe I need to explain what I dislike about this approach: take() calls popFront n times and drop() calls popFront another n times, giving a total of 2n calls (depending on the underlying range, this might cause lots of calculations to be done twice). The first version with the foreach loop calls popFront only n times.
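
The difference in popFront calls can be checked with a small counting range. This is only a sketch: Counting, popsTakeThenDrop and popsLoop are hypothetical names made up for the illustration, and the counter is shared through a pointer so that calls made on copies of the range are recorded as well.

```d
import std.array : array;
import std.range : drop, take;

// Hypothetical infinite input range that records every popFront call.
struct Counting
{
    int value;
    int* pops; // shared between copies, so calls on copies are counted too

    enum bool empty = false;
    @property int front() const { return value; }
    void popFront() { ++value; ++*pops; }
}

// Number of popFront calls made by take(n).array followed by drop(n).
int popsTakeThenDrop(int n)
{
    int pops;
    auto r = Counting(0, &pops);
    auto data = r.take(n).array; // iterates a copy of r
    r = r.drop(n);               // advances over the same items again
    assert(data.length == n && r.front == n);
    return pops;
}

// Number of popFront calls made by the explicit loop.
int popsLoop(int n)
{
    int pops;
    auto r = Counting(0, &pops);
    int[] data;
    foreach (i; 0 .. n)
    {
        data ~= r.front;
        r.popFront();
    }
    assert(data.length == n && r.front == n);
    return pops;
}

void main()
{
    assert(popsTakeThenDrop(5) == 10); // 2n calls
    assert(popsLoop(5) == 5);          // n calls
}
```

For a range where popFront is cheap this hardly matters, but for one that computes each element on demand the second pass repeats that work.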


July 06, 2019
On Saturday, 6 July 2019 at 12:10:13 UTC, berni wrote:
> On Saturday, 6 July 2019 at 11:48:51 UTC, a11e99z wrote:
>
> Maybe I need to explain what I dislike about this approach: take() calls popFront n times and drop() calls popFront another n times, giving a total of 2n calls (depending on the underlying range, this might cause lots of calculations to be done twice). The first version with the foreach loop calls popFront only n times.

auto take_consuming( R )( ref R r, int cnt ) {
    import std.array : uninitializedArray;
    import std.range.primitives : ElementType, empty, front, hasSlicing, popFront;
    static if (hasSlicing!R) { // without allocations
        auto tmp = r[0..cnt];
        r = r[cnt..$]; // or r.popFrontN( cnt ); // O(1)
        return tmp;
    } else { // loop over the range once
        auto tmp = uninitializedArray!( ElementType!R[])( cnt);
        int k = 0;
        for (; !r.empty && k<cnt; ++k, r.popFront()) tmp[ k] = r.front;
        return tmp[ 0..k]; // fewer than cnt items if the range ran out
    }
}

July 06, 2019
Now it's getting weird. Meanwhile I found that take() sometimes consumes and sometimes doesn't. Where can I learn the reason behind this behavior? And how can I handle this?
July 06, 2019
A small example showing this strange behaviour:

>import std.stdio;
>import std.algorithm.iteration;
>import std.range;
>
>enum BUFFER_SIZE = 1024;
>
>void main(string[] args)
>{
>    auto a = (new File(args[1]))
>        .byChunk(BUFFER_SIZE)
>        .joiner;
>
>    writeln(a.take(5));
>    writeln(a);
>}

Using a file containing the bytes 1 to 10, I get:

>[ 1, 2, 3, 4, 5 ]
>[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

take does not consume.

When I now change BUFFER_SIZE to 2 I get:

>[ 1, 2, 3, 4, 5 ]
>[ 5, 6, 7, 8, 9, 10 ]

Now the first two buffers have been consumed and the third ([5, 6]) has not.

Feels like a bug in Phobos. But maybe I do not understand what's happening and this is correct behaviour. Can anyone explain, or confirm that this is a bug?
July 06, 2019
On Saturday, 6 July 2019 at 14:12:36 UTC, berni wrote:
> Meanwhile I found that take() sometimes consumes and sometimes doesn't.

It depends on what you're passing.

take is defined as just getting the first N elements from the given range, so what happens next depends on what it is "taking" from. (I don't like the name "take", exactly because it implies, well, taking. What the function really does is more like "view into the first N elements".)


With input ranges, iterating over them consumes them implicitly, so you could say that take *always* consumes input ranges (at least once it gets iterated over).

But, it will frequently consume a copy of the range instead of the one you have at the top level, since ranges are passed by value.

If you want it to be seen outside, `ref` is the general answer. You might find success with refRange in some cases.

http://dpldocs.info/experimental-docs/std.range.refRange.html
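
A minimal sketch of the refRange idea, assuming the range variable outlives the wrapper. takeViaRefRange is a hypothetical helper name, not something from Phobos; refRange, take and array are the regular Phobos functions.

```d
import std.array : array;
import std.range : refRange, take;

// Hypothetical helper: copy the first n items and consume them from r.
auto takeViaRefRange(R)(ref R r, size_t n)
{
    // refRange wraps a pointer to r, so popFront calls made through the
    // wrapper advance r itself instead of a copy.
    return refRange(&r).take(n).array;
}

void main()
{
    int[] r = [1, 2, 3, 4, 5];
    auto data = takeViaRefRange(r, 3);
    assert(data == [1, 2, 3]);
    assert(r == [4, 5]); // the first three items are gone from r
}
```

Because the wrapper only stores a pointer, it should not be kept around after the original range has gone out of scope.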



But if you are passing something with slicing, like a plain array, take will never actually consume it, and instead just slices the input, even if it is ref. This gets the view of those first elements in a cheaper way.

So going back to your original definition:

> I want to copy the first n items of a range to an array, removing these items from the range.

As far as I know, none of the std.range functions are defined to do this.

I'd probably write your own function that:

1) takes the range by ref so changes are visible outside
2) iterates over it with popFront
3) returns the copy

That should fulfill all the requirements. You could slightly optimize arrays (or other hasSlicing things) like

int[] yourFunction(ref int[] arr, int n) {
   auto ret = arr[0 .. n];
   arr = arr[n .. $];
   return ret;
}


That is, just slicing and consuming in one go for each side, and then you don't even have to actually copy it, just return the slice.
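
For ranges without slicing, the three steps listed above (take by ref, iterate with popFront, return the copy) could be sketched roughly as follows. takeConsuming is a hypothetical name; nothing with that name exists in Phobos as far as this thread goes.

```d
import std.array : appender;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

// Hypothetical helper, not a Phobos function.
ElementType!R[] takeConsuming(R)(ref R r, size_t n)
if (isInputRange!R)
{
    auto result = appender!(ElementType!R[])();
    for (size_t i = 0; i < n && !r.empty; ++i)
    {
        result.put(r.front); // copy the element
        r.popFront();        // consume it from the caller's range
    }
    return result.data;
}

void main()
{
    import std.algorithm.iteration : filter;
    import std.range : iota;

    auto evens = iota(10).filter!(a => a % 2 == 0);
    auto firstTwo = takeConsuming(evens, 2);
    assert(firstTwo == [0, 2]);
    assert(evens.front == 4); // evens itself has advanced
}
```

Each element is visited once, and if the range runs out early the result is simply shorter than n.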
July 06, 2019
On Saturday, 6 July 2019 at 14:40:23 UTC, berni wrote:
>>        .byChunk(BUFFER_SIZE)

byChunk is defined to reuse its buffer between calls.

http://dpldocs.info/experimental-docs/std.stdio.byChunk.1.html#examples

This means previous contents are overwritten when you advance.


> When I now change BUFFER_SIZE to 2 I get:
>
>>[ 1, 2, 3, 4, 5 ]
>>[ 5, 6, 7, 8, 9, 10 ]
>
> Now the first two buffers have been consumed and the third ([5, 6]) has not.

So here, the take call gave you a view into the first 5 elements. It read one and two and printed them, then byChunk.popFront was called, overwriting the buffer with 3,4 and take passed that to writeln, then popFront again, overwriting with 5,6.

writeln printed out 5, and take, having finished its work, left the buffer alone in its state.


Now you print the original range, which still has 5,6 in the buffer; then popFront overwrites it with 7,8, and so on.


So this is a case of input range behavior - always consuming the underlying file - combined with buffering of two elements at once, leaving 5,6 behind, and the reuse of the buffer meaning you see that 5,6 again on the next call.
July 06, 2019
On Saturday, 6 July 2019 at 14:48:04 UTC, Adam D. Ruppe wrote:
> [...]
> So this is a case of input range behavior - always consuming the underlying file - combined with buffering of two elements at once, leaving 5,6 behind, and the reuse of the buffer meaning you see that 5,6 again on the next call.

Thanks for clarifying what happens. In my opinion the behaviour of take() should be better defined. It's clear that take() returns a range with the first n elements of the underlying range (and that this is done lazily). But it's not specified what happens to the underlying range. As the behaviour is unpredictable (or at least hard to predict), one should assume that the underlying range is completely destroyed by take(). This makes take() much less useful than it could be, in my eyes. :-(


July 06, 2019
On Saturday, July 6, 2019 8:12:36 AM MDT berni via Digitalmars-d-learn wrote:
> Now it's getting weird. Meanwhile I found that take() sometimes consumes and sometimes doesn't. Where can I learn the reason behind this behavior? And how can I handle this?

take _always_ consumes the range that it's given. The problem is that some types of ranges are implicitly saved when they're copied, whereas others aren't, and when a range is implicitly saved when it's copied, you end up with the copy being consumed. Dynamic arrays are implicitly saved when they're copied, so the range you pass is saved, and the copy is consumed instead of the original.

In generic code, you have to assume that once a range has been copied, you can't use it anymore (just the copy) precisely because the semantics of copying differ depending on the type of the range. You can only use a range after copying it if you know what type of range you're dealing with and how it behaves. So, you can rely on a dynamic array implicitly saving when it's passed to take, but in generic code, you really shouldn't be using a range again once you pass it to take, because what actually happens is dependent on the type of the range.
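
To illustrate the difference in copy semantics, here is a small sketch. Counter is a hypothetical class written just for this example; the point is only that copying a class reference does not save the range, whereas copying a dynamic array does.

```d
import std.array : array;
import std.range : take;

// Hypothetical reference-type range: copying it copies only the class
// reference, so every copy shares the same position.
final class Counter
{
    private int current;
    private immutable int limit;

    this(int limit) { this.limit = limit; }

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

void main()
{
    // A dynamic array is implicitly saved when it is copied into take.
    int[] arr = [0, 1, 2, 3, 4];
    assert(arr.take(2).array == [0, 1]);
    assert(arr == [0, 1, 2, 3, 4]); // the original is untouched

    // The class range is not saved, so take advances the original.
    auto c = new Counter(5);
    assert(c.take(2).array == [0, 1]);
    assert(c.front == 2);
}
```

The same call therefore leaves one range untouched and advances the other, which is why generic code cannot rely on either outcome.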

In general what this means is that if you pass a range to a function, and you then want to use the range again afterwards, you need to call save when passing it to the function, and otherwise, you just assume that it's consumed and don't use it again. You certainly don't pass it to a function with the expectation of some elements being consumed and then continue to use the rest of the range unless the function takes its argument by ref or by pointer (which relatively few range-based functions do).

If you want a function that's guaranteed to not implicitly copy a range, then it needs to accept the argument by ref or take a pointer to it. In the case where a function doesn't have to do the work lazily, ref would work, but in a case like take where you're returning a wrapper range, pointers would be required. So, a version of take that didn't ever copy the range it's given and thus never risked implicitly saving the range it was passed would have to either take a pointer to it or take it by ref and then take the address of the ref. In either case, the code using such a take would then have to ensure that the original range didn't leave scope and get destroyed before the take range was consumed, or the take range would then refer to invalid memory.

- Jonathan M Davis


