getNext

I think I figured out a comfortable and all-encompassing means to define a simplified interface for an input range.

Currently input ranges need to define empty, front, and popFront. That works but it's a bit heavy for simple input ranges. We've been discussing simplified interfaces in this group but couldn't find one that satisfied all use cases.

Consider this:

T* getNext(R, T)(ref R range, ref T item);

Semantics: if the range wants to expose addresses of its elements, it returns a pointer to the current element and also advances to the next element. Otherwise (i.e. the range does not have or does not want to expose addresses of its elements), the range fills "item" with the current value, again moves on to the next value, and returns &item.

In all cases, when there are no more elements in the range, getNext returns null.

getNext is easy to define for e.g. arrays and files. How does it sound? Does it bring significant simplification?


Andrei

July 13, 2010

Re: getNext

Posted by Jonathan M Davis
in reply to Andrei Alexandrescu

Permalink

Jonathan M Davis

Posted in reply to Andrei Alexandrescu

Permalink

On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
> I think I figured out a comfortable and all-encompassing means to define a simplified interface for an input range.
> 
> Currently input ranges need to define empty, front, and popFront. That works but it's a bit heavy for simple input ranges. We've been discussing simplified interfaces in this group but couldn't find one that satisfied all use cases.
> 
> Consider this:
> 
> T* getNext(R, T)(ref R range, ref T item);
> 
> Semantics: if the range wants to expose addresses of its elements, it returns a pointer to the current element and also advances to the next element. Otherwise (i.e. the range does not have or does not want to expose addresses of its elements), the range fills "item" with the current value, again moves on to the next value, and returns &item.
> 
> In all cases, when there are no more elements in the range, getNext returns null.
> 
> getNext is easy to define for e.g. arrays and files. How does it sound? Does it bring significant simplification?
> 
> 
> Andrei

What happens to empty, front, and popFront then? Is this a case where something must define either empty, front, and popFront or getNext to be an input range? Or is this something else? Personally, I find empty, front, and popFront quite useful and simple for anything that I've done, and I'd find getNext to be a lot more verbose. getNext may be great if you're using empty, front, and popFront pretty much simultaneously, but if you don't want to use all of them for whatever you're doing, then getNext is overkill.

So, essentially, I suppose the issue is that I don't see what you intend to do to front, popFront, and empty if you add getNext into the mix. I do not want to see front, popFront, or empty go away. Having getNext as an additional function to make it easier to iterate over a range and do something with each element as you iterate wouldn't hurt my feelings any, but I definitely don't want to lose popFront, empty, or front.

As for simplification, it strikes me as more complicated in every case except where you are iterating over a range and processing each element as you iterate. Granted, that's a common use case, but there are plenty of other cases, where if you were forced to use getNext instead of having popFront, empty, and front, that would be a major problem.

- Jonathan M Davis

July 13, 2010

Re: getNext

Posted by Andrei Alexandrescu
in reply to Jonathan M Davis

Permalink

Andrei Alexandrescu

Posted in reply to Jonathan M Davis

Permalink

On 07/12/2010 11:21 PM, Jonathan M Davis wrote:
> On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
>> I think I figured out a comfortable and all-encompassing means to define
>> a simplified interface for an input range.
>>
>> Currently input ranges need to define empty, front, and popFront. That
>> works but it's a bit heavy for simple input ranges. We've been
>> discussing simplified interfaces in this group but couldn't find one
>> that satisfied all use cases.
>>
>> Consider this:
>>
>> T* getNext(R, T)(ref R range, ref T item);
>>
>> Semantics: if the range wants to expose addresses of its elements, it
>> returns a pointer to the current element and also advances to the next
>> element. Otherwise (i.e. the range does not have or does not want to
>> expose addresses of its elements), the range fills "item" with the
>> current value, again moves on to the next value, and returns&item.
>>
>> In all cases, when there are no more elements in the range, getNext
>> returns null.
>>
>> getNext is easy to define for e.g. arrays and files. How does it sound?
>> Does it bring significant simplification?
>>
>>
>> Andrei
>
> What happens to empty, front, and popFront then? Is this a case where something
> must define either empty, front, and popFront or getNext to be an input range? Or
> is this something else? Personally, I find empty, front, and popFront quite
> useful and simple for anything that I've done, and I'd find getNext to be a lot
> more verbose. getNext may be great if you're using empty, front, and popFront
> pretty much simultaneously, but if you don't want to use all of them for
> whatever you're doing, then getNext is overkill.

An input range could either define the troika empty/front/popFront or getNext. In the former case, getNext detects the presence of the troika and uses it transparently. That's great because client code can simply use getNext throughout.

> So, essentially, I suppose the issue is that I don't see what you intend to do
> to front, popFront, and empty if you add getNext into the mix.

There is no aggravation brought to the current definitions.

> I do not want to
> see front, popFront, or empty go away. Having getNext as an additional function
> to make it easier to iterate over a range and do something with each element as
> you iterate wouldn't hurt my feelings any, but I definitely don't want to lose
> popFront, empty, or front.

My feelings too. The troika is here to stay, and definitely necessary for any range richer than an input range.

> As for simplification, it strikes me as more complicated in every case except
> where you are iterating over a range and processing each element as you iterate.
> Granted, that's a common use case, but there are plenty of other cases, where if
> you were forced to use getNext instead of having popFront, empty, and front,
> that would be a major problem.

Yah, truth be told getNext won't win a prize for brevity. You need to define both a variable and a pointer to use it:

T meh;
T * neh;
while ((neh = getNext(r, meh))) {
   ... process *neh ...
}

I've just had an idea that is so dark and devious, I was almost afraid to try it. But it works like a charm. Consider:

T * getNext(R, E)(ref R range,
                  ref E store = *(cast(E*) alloca(E.sizeof))
{
    ...
}

With this, allocating a dummy buffer on caller's stack is automated, so client code can just write:

for (T * p; (p = getNext(r)); )  {
   ... process *p ...
}

I feel dirty.


Andrei

July 13, 2010

Re: getNext

Posted by Jesse Phillips
in reply to Andrei Alexandrescu

Permalink

Jesse Phillips

Posted in reply to Andrei Alexandrescu

Permalink

Andrei Alexandrescu Wrote:

> getNext is easy to define for e.g. arrays and files. How does it sound? Does it bring significant simplification?
> 
> 
> Andrei

I'm with Jonathan on this. I don't really see much of a benefit. popFront, empty, front are very easy to define and simple to use.

Java uses getNext for its iterators. Though it calls it 'next' and throws an exception when trying to call without any elements. This leads it to also provide a hasNext function.

July 13, 2010

Re: getNext

Posted by Ellery Newcomer
in reply to Andrei Alexandrescu

Permalink

Ellery Newcomer

Posted in reply to Andrei Alexandrescu

Permalink

On 07/12/2010 10:48 PM, Andrei Alexandrescu wrote:
> I think I figured out a comfortable and all-encompassing means to define
> a simplified interface for an input range.
>
> Currently input ranges need to define empty, front, and popFront. That
> works but it's a bit heavy for simple input ranges. We've been
> discussing simplified interfaces in this group but couldn't find one
> that satisfied all use cases.
>
> Consider this:
>
> T* getNext(R, T)(ref R range, ref T item);
>
> Semantics: if the range wants to expose addresses of its elements, it
> returns a pointer to the current element and also advances to the next
> element. Otherwise (i.e. the range does not have or does not want to
> expose addresses of its elements), the range fills "item" with the
> current value, again moves on to the next value, and returns &item.
>
> In all cases, when there are no more elements in the range, getNext
> returns null.

let's see here

Range AhmAStream;

ubyte[] buf = new ubyte[4];
ubyte[]* ruf = AhmAStream.getNext(buf);
assert(ruf && ruf.length == 4);
int i = * (cast(int*) ruf.ptr)
buf = new ubyte[i];
ruf = AhmAStream.getNext(buf);
assert(ruf && ruf.length == i);

Something like this could work within this interface, couldn't it?

Think you might be on to something

>
> getNext is easy to define for e.g. arrays and files. How does it sound?
> Does it bring significant simplification?
>
>
> Andrei

July 13, 2010

Re: getNext

Posted by Rory McGuire
in reply to Andrei Alexandrescu

Permalink

Rory McGuire

Posted in reply to Andrei Alexandrescu

Permalink

On Tue, 13 Jul 2010 06:38:55 +0200, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 07/12/2010 11:21 PM, Jonathan M Davis wrote:
>> On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
>>> I think I figured out a comfortable and all-encompassing means to define
>>> a simplified interface for an input range.
>>>
>>> Currently input ranges need to define empty, front, and popFront. That
>>> works but it's a bit heavy for simple input ranges. We've been
>>> discussing simplified interfaces in this group but couldn't find one
>>> that satisfied all use cases.
>>>
>>> Consider this:
>>>
>>> T* getNext(R, T)(ref R range, ref T item);
>>>
>>> Semantics: if the range wants to expose addresses of its elements, it
>>> returns a pointer to the current element and also advances to the next
>>> element. Otherwise (i.e. the range does not have or does not want to
>>> expose addresses of its elements), the range fills "item" with the
>>> current value, again moves on to the next value, and returns&item.
>>>
>>> In all cases, when there are no more elements in the range, getNext
>>> returns null.
>>>
>>> getNext is easy to define for e.g. arrays and files. How does it sound?
>>> Does it bring significant simplification?
>>>
>>>
>>> Andrei
>>
>> What happens to empty, front, and popFront then? Is this a case where something
>> must define either empty, front, and popFront or getNext to be an input range? Or
>> is this something else? Personally, I find empty, front, and popFront quite
>> useful and simple for anything that I've done, and I'd find getNext to be a lot
>> more verbose. getNext may be great if you're using empty, front, and popFront
>> pretty much simultaneously, but if you don't want to use all of them for
>> whatever you're doing, then getNext is overkill.
>
> An input range could either define the troika empty/front/popFront or getNext. In the former case, getNext detects the presence of the troika and uses it transparently. That's great because client code can simply use getNext throughout.
>
>> So, essentially, I suppose the issue is that I don't see what you intend to do
>> to front, popFront, and empty if you add getNext into the mix.
>
> There is no aggravation brought to the current definitions.
>
>> I do not want to
>> see front, popFront, or empty go away. Having getNext as an additional function
>> to make it easier to iterate over a range and do something with each element as
>> you iterate wouldn't hurt my feelings any, but I definitely don't want to lose
>> popFront, empty, or front.
>
> My feelings too. The troika is here to stay, and definitely necessary for any range richer than an input range.
>
>> As for simplification, it strikes me as more complicated in every case except
>> where you are iterating over a range and processing each element as you iterate.
>> Granted, that's a common use case, but there are plenty of other cases, where if
>> you were forced to use getNext instead of having popFront, empty, and front,
>> that would be a major problem.
>
> Yah, truth be told getNext won't win a prize for brevity. You need to define both a variable and a pointer to use it:
>
> T meh;
> T * neh;
> while ((neh = getNext(r, meh))) {
>     ... process *neh ...
> }
>
> I've just had an idea that is so dark and devious, I was almost afraid to try it. But it works like a charm. Consider:
>
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {
>      ...
> }
>
> With this, allocating a dummy buffer on caller's stack is automated, so client code can just write:
>
> for (T * p; (p = getNext(r)); )  {
>     ... process *p ...
> }
>
> I feel dirty.
>
>
> Andrei

:) I like it.

But then looking at the alloca man page... yikes

July 13, 2010

alloca() notes [Was: Re: getNext]

Posted by bearophile
in reply to Andrei Alexandrescu

Permalink

bearophile

Posted in reply to Andrei Alexandrescu

Permalink

Andrei Alexandrescu:
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {

Time ago I have filed bug http://d.puremagic.com/issues/show_bug.cgi?id=3822 on alloca(), but I think this code is not hit by it.

I have recently written some C99 code and I have appreciated its Variable Length Arrays both in syntax, safety and performance gain.

Compared to alloca() the VLAs have some advantages:
- Their syntax is shorter and nicer, and it's natural for a C programmer;
- no imports needed;
- the semantics is more clear, because they define a new variable, so the size of their scope is the same as the in all other variables, while alloca() seems to have two possible different implementations;
- VLAs are more typesafe, there is no need to use a cast. While the cast needed by alloca() may forbid it in SafeD code.
- VLAs don't need sizeof(T), you just specify a type.

The result is that the usage of alloca() feels dirty in both C and D, but Variable Length Arrays (VLA) of C99 don't feel dirty at all.

On the other hand VLAs (and alloca) can produce a stack overflow, they are not so commonly useful (as alloca), and they can become essentially a third kind of arrays for D (this is not good).

In D I'd like something like alloca() that needs no casts and is able to find the size by itself, avoiding the bug prone usage of T.sizeof.

A way to do it is to use the same syntax used by C99 and allow a variable in the definition of a stack array:

auto foo(int n) {
    int[n] arr;
    return arr;
}

The main difference is that in D when you return arr it gets copied by value.

Currently this code works:

int foo(int n) {
     return 3 * n + 1;
}
auto bar() {
    immutable int n = 5;
    int[foo(n)] arr;
    return arr;
}
void main() {}

because dmd runs foo() at compile time, so arr is allocated on the stack with a statically known size. If VLAs get introduced in D, then in this case the compiler has to do what it currently it doesn't do: to run a function at compile-time if possible (and create a fixed length array) and run it at runtime if that's impossible (and create a VLA).

To avoid that in some cases you can use something like:

int[StaticValue!(n)] arr;

Where StaticValue is a template that makes sure n is a value always known at compile-time.

But this is a little messy, and I don't like it too much.

Keep in mind that currently this gives an error:

int[] bar() {
    int[2] arr;
    return arr;
}
void main() {}

You have to use the auto return type or:

int[2] bar() {
    int[2] arr;
    return arr;
}
void main() {}

With VLAs you are forced to use auto, but it can't work anyway at the return point.
So I think the normal C99 syntax for VLAs is not good for D2. On the other hand alloca() semantics and syntax are bad. So D alloca() can be replaced by something better like:

T* ptr = StackAlloc!T(n);
Or:
T* ptr = Alloca!T(n);
Or:
T[] arr = VLA!T(n);

But this created an array that looks like a dynamic array, but its memory is on the stack so it must not escape the function. So it can be a bug-prone, similar to this that currently compiles:

int[] foo() {
    int[1] stackArray;
    int[] dynArray = stackArray[0 .. 1];
    return dynArray;
}
void main() {}

See enhancement request http://d.puremagic.com/issues/show_bug.cgi?id=4451

So with the syntax T[] arr=VLA!T(n); the compiler has to disallow the return of arr and its slices.

Whith those three syntaxes your signature becomes:

T* getNext(R, E)(ref R range, ref E store = *StackAlloc!E(1)) {
T* getNext(R, E)(ref R range, ref E store = *Alloca!E(1)) {
T* getNext(R, E)(ref R range, ref E store = *(VLA!E(1).ptr)) {

Bye,
bearophile

July 13, 2010

Re: getNext

Posted by Shin Fujishiro
in reply to Andrei Alexandrescu

Permalink

Shin Fujishiro

Posted in reply to Andrei Alexandrescu

Permalink

Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I've just had an idea that is so dark and devious, I was almost afraid to try it. But it works like a charm. Consider:
> 
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {
>      ...
> }
> 
> With this, allocating a dummy buffer on caller's stack is automated, so client code can just write:
> 
> for (T * p; (p = getNext(r)); )  {
>     ... process *p ...
> }
> 
> I feel dirty.

How about a TLS variable?

template temporary(T)
{
    static T temporary;
}
E* getNext(R, E)(ref R range, ref E store = temporary!E);


Shin

July 13, 2010

Re: getNext

Posted by Shin Fujishiro
in reply to Shin Fujishiro

Permalink

Shin Fujishiro

Posted in reply to Shin Fujishiro

Permalink

Shin Fujishiro <rsinfu@gmail.com> wrote:
> How about a TLS variable?
> 
> template temporary(T)
> {
>     static T temporary;
> }
> E* getNext(R, E)(ref R range, ref E store = temporary!E);

Yeah, I just missed it.  TLS is not usable.  I had spoken carelessly...


Shin

July 13, 2010

Re: getNext

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Permalink

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

Permalink

On Mon, 12 Jul 2010 23:48:05 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I think I figured out a comfortable and all-encompassing means to define a simplified interface for an input range.
>
> Currently input ranges need to define empty, front, and popFront. That works but it's a bit heavy for simple input ranges. We've been discussing simplified interfaces in this group but couldn't find one that satisfied all use cases.
>
> Consider this:
>
> T* getNext(R, T)(ref R range, ref T item);
>
> Semantics: if the range wants to expose addresses of its elements, it returns a pointer to the current element and also advances to the next element. Otherwise (i.e. the range does not have or does not want to expose addresses of its elements), the range fills "item" with the current value, again moves on to the next value, and returns &item.
>
> In all cases, when there are no more elements in the range, getNext returns null.
>
> getNext is easy to define for e.g. arrays and files. How does it sound? Does it bring significant simplification?

Yes, yes, yes!

A question though -- whenever a pointer occurs, we always cringe, especially in safeD.  will getNext be unsafe?

BTW, I like the alloca thingy, that's really cool.

One thing I just thought of, getNext should be split into two functions, the one you have, and:

ElementType!R *getNext(R)(ref R range)

To avoid having to supply the item or use alloca when the range is going to give you back a pointer to its internals anyways (tested with a template constraint).

-Steve

Top | Forum index | About this forum

Forums