View mode: basic / threaded / horizontal-split · Log in · Help
July 13, 2010
getNext
I think I figured out a comfortable and all-encompassing means to define 
a simplified interface for an input range.

Currently input ranges need to define empty, front, and popFront. That 
works but it's a bit heavy for simple input ranges. We've been 
discussing simplified interfaces in this group but couldn't find one 
that satisfied all use cases.

Consider this:

T* getNext(R, T)(ref R range, ref T item);

Semantics: if the range wants to expose addresses of its elements, it 
returns a pointer to the current element and also advances to the next 
element. Otherwise (i.e. the range does not have or does not want to 
expose addresses of its elements), the range fills "item" with the 
current value, again moves on to the next value, and returns &item.

In all cases, when there are no more elements in the range, getNext 
returns null.

getNext is easy to define for e.g. arrays and files. How does it sound? 
Does it bring significant simplification?


Andrei
July 13, 2010
Re: getNext
On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
> I think I figured out a comfortable and all-encompassing means to define
> a simplified interface for an input range.
> 
> Currently input ranges need to define empty, front, and popFront. That
> works but it's a bit heavy for simple input ranges. We've been
> discussing simplified interfaces in this group but couldn't find one
> that satisfied all use cases.
> 
> Consider this:
> 
> T* getNext(R, T)(ref R range, ref T item);
> 
> Semantics: if the range wants to expose addresses of its elements, it
> returns a pointer to the current element and also advances to the next
> element. Otherwise (i.e. the range does not have or does not want to
> expose addresses of its elements), the range fills "item" with the
> current value, again moves on to the next value, and returns &item.
> 
> In all cases, when there are no more elements in the range, getNext
> returns null.
> 
> getNext is easy to define for e.g. arrays and files. How does it sound?
> Does it bring significant simplification?
> 
> 
> Andrei

What happens to empty, front, and popFront then? Is this a case where something 
must define either empty, front, and popFront or getNext to be an input range? Or 
is this something else? Personally, I find empty, front, and popFront quite 
useful and simple for anything that I've done, and I'd find getNext to be a lot 
more verbose. getNext may be great if you're using empty, front, and popFront 
pretty much simultaneously, but if you don't want to use all of them for 
whatever you're doing, then getNext is overkill.

So, essentially, I suppose the issue is that I don't see what you intend to do 
to front, popFront, and empty if you add getNext into the mix. I do not want to 
see front, popFront, or empty go away. Having getNext as an additional function 
to make it easier to iterate over a range and do something with each element as 
you iterate wouldn't hurt my feelings any, but I definitely don't want to lose 
popFront, empty, or front.

As for simplification, it strikes me as more complicated in every case except 
where you are iterating over a range and processing each element as you iterate. 
Granted, that's a common use case, but there are plenty of other cases, where if 
you were forced to use getNext instead of having popFront, empty, and front, 
that would be a major problem.

- Jonathan M Davis
July 13, 2010
Re: getNext
On 07/12/2010 11:21 PM, Jonathan M Davis wrote:
> On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
>> I think I figured out a comfortable and all-encompassing means to define
>> a simplified interface for an input range.
>>
>> Currently input ranges need to define empty, front, and popFront. That
>> works but it's a bit heavy for simple input ranges. We've been
>> discussing simplified interfaces in this group but couldn't find one
>> that satisfied all use cases.
>>
>> Consider this:
>>
>> T* getNext(R, T)(ref R range, ref T item);
>>
>> Semantics: if the range wants to expose addresses of its elements, it
>> returns a pointer to the current element and also advances to the next
>> element. Otherwise (i.e. the range does not have or does not want to
>> expose addresses of its elements), the range fills "item" with the
>> current value, again moves on to the next value, and returns&item.
>>
>> In all cases, when there are no more elements in the range, getNext
>> returns null.
>>
>> getNext is easy to define for e.g. arrays and files. How does it sound?
>> Does it bring significant simplification?
>>
>>
>> Andrei
>
> What happens to empty, front, and popFront then? Is this a case where something
> must define either empty, front, and popFront or getNext to be an input range? Or
> is this something else? Personally, I find empty, front, and popFront quite
> useful and simple for anything that I've done, and I'd find getNext to be a lot
> more verbose. getNext may be great if you're using empty, front, and popFront
> pretty much simultaneously, but if you don't want to use all of them for
> whatever you're doing, then getNext is overkill.

An input range could either define the troika empty/front/popFront or 
getNext. In the former case, getNext detects the presence of the troika 
and uses it transparently. That's great because client code can simply 
use getNext throughout.

> So, essentially, I suppose the issue is that I don't see what you intend to do
> to front, popFront, and empty if you add getNext into the mix.

There is no aggravation brought to the current definitions.

> I do not want to
> see front, popFront, or empty go away. Having getNext as an additional function
> to make it easier to iterate over a range and do something with each element as
> you iterate wouldn't hurt my feelings any, but I definitely don't want to lose
> popFront, empty, or front.

My feelings too. The troika is here to stay, and definitely necessary 
for any range richer than an input range.

> As for simplification, it strikes me as more complicated in every case except
> where you are iterating over a range and processing each element as you iterate.
> Granted, that's a common use case, but there are plenty of other cases, where if
> you were forced to use getNext instead of having popFront, empty, and front,
> that would be a major problem.

Yah, truth be told getNext won't win a prize for brevity. You need to 
define both a variable and a pointer to use it:

T meh;
T * neh;
while ((neh = getNext(r, meh))) {
   ... process *neh ...
}

I've just had an idea that is so dark and devious, I was almost afraid 
to try it. But it works like a charm. Consider:

T * getNext(R, E)(ref R range,
                  ref E store = *(cast(E*) alloca(E.sizeof))
{
    ...
}

With this, allocating a dummy buffer on caller's stack is automated, so 
client code can just write:

for (T * p; (p = getNext(r)); )  {
   ... process *p ...
}

I feel dirty.


Andrei
July 13, 2010
Re: getNext
Andrei Alexandrescu Wrote:

> getNext is easy to define for e.g. arrays and files. How does it sound? 
> Does it bring significant simplification?
> 
> 
> Andrei

I'm with Jonathan on this. I don't really see much of a benefit. popFront, empty, front are very easy to define and simple to use.

Java uses getNext for its iterators. Though it calls it 'next' and throws an exception when trying to call without any elements. This leads it to also provide a hasNext function.
July 13, 2010
Re: getNext
On 07/12/2010 10:48 PM, Andrei Alexandrescu wrote:
> I think I figured out a comfortable and all-encompassing means to define
> a simplified interface for an input range.
>
> Currently input ranges need to define empty, front, and popFront. That
> works but it's a bit heavy for simple input ranges. We've been
> discussing simplified interfaces in this group but couldn't find one
> that satisfied all use cases.
>
> Consider this:
>
> T* getNext(R, T)(ref R range, ref T item);
>
> Semantics: if the range wants to expose addresses of its elements, it
> returns a pointer to the current element and also advances to the next
> element. Otherwise (i.e. the range does not have or does not want to
> expose addresses of its elements), the range fills "item" with the
> current value, again moves on to the next value, and returns &item.
>
> In all cases, when there are no more elements in the range, getNext
> returns null.

let's see here

Range AhmAStream;

ubyte[] buf = new ubyte[4];
ubyte[]* ruf = AhmAStream.getNext(buf);
assert(ruf && ruf.length == 4);
int i = * (cast(int*) ruf.ptr)
buf = new ubyte[i];
ruf = AhmAStream.getNext(buf);
assert(ruf && ruf.length == i);

Something like this could work within this interface, couldn't it?

Think you might be on to something

>
> getNext is easy to define for e.g. arrays and files. How does it sound?
> Does it bring significant simplification?
>
>
> Andrei
July 13, 2010
Re: getNext
On Tue, 13 Jul 2010 06:38:55 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> On 07/12/2010 11:21 PM, Jonathan M Davis wrote:
>> On Monday 12 July 2010 20:48:05 Andrei Alexandrescu wrote:
>>> I think I figured out a comfortable and all-encompassing means to  
>>> define
>>> a simplified interface for an input range.
>>>
>>> Currently input ranges need to define empty, front, and popFront. That
>>> works but it's a bit heavy for simple input ranges. We've been
>>> discussing simplified interfaces in this group but couldn't find one
>>> that satisfied all use cases.
>>>
>>> Consider this:
>>>
>>> T* getNext(R, T)(ref R range, ref T item);
>>>
>>> Semantics: if the range wants to expose addresses of its elements, it
>>> returns a pointer to the current element and also advances to the next
>>> element. Otherwise (i.e. the range does not have or does not want to
>>> expose addresses of its elements), the range fills "item" with the
>>> current value, again moves on to the next value, and returns&item.
>>>
>>> In all cases, when there are no more elements in the range, getNext
>>> returns null.
>>>
>>> getNext is easy to define for e.g. arrays and files. How does it sound?
>>> Does it bring significant simplification?
>>>
>>>
>>> Andrei
>>
>> What happens to empty, front, and popFront then? Is this a case where  
>> something
>> must define either empty, front, and popFront or getNext to be an input  
>> range? Or
>> is this something else? Personally, I find empty, front, and popFront  
>> quite
>> useful and simple for anything that I've done, and I'd find getNext to  
>> be a lot
>> more verbose. getNext may be great if you're using empty, front, and  
>> popFront
>> pretty much simultaneously, but if you don't want to use all of them for
>> whatever you're doing, then getNext is overkill.
>
> An input range could either define the troika empty/front/popFront or  
> getNext. In the former case, getNext detects the presence of the troika  
> and uses it transparently. That's great because client code can simply  
> use getNext throughout.
>
>> So, essentially, I suppose the issue is that I don't see what you  
>> intend to do
>> to front, popFront, and empty if you add getNext into the mix.
>
> There is no aggravation brought to the current definitions.
>
>> I do not want to
>> see front, popFront, or empty go away. Having getNext as an additional  
>> function
>> to make it easier to iterate over a range and do something with each  
>> element as
>> you iterate wouldn't hurt my feelings any, but I definitely don't want  
>> to lose
>> popFront, empty, or front.
>
> My feelings too. The troika is here to stay, and definitely necessary  
> for any range richer than an input range.
>
>> As for simplification, it strikes me as more complicated in every case  
>> except
>> where you are iterating over a range and processing each element as you  
>> iterate.
>> Granted, that's a common use case, but there are plenty of other cases,  
>> where if
>> you were forced to use getNext instead of having popFront, empty, and  
>> front,
>> that would be a major problem.
>
> Yah, truth be told getNext won't win a prize for brevity. You need to  
> define both a variable and a pointer to use it:
>
> T meh;
> T * neh;
> while ((neh = getNext(r, meh))) {
>     ... process *neh ...
> }
>
> I've just had an idea that is so dark and devious, I was almost afraid  
> to try it. But it works like a charm. Consider:
>
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {
>      ...
> }
>
> With this, allocating a dummy buffer on caller's stack is automated, so  
> client code can just write:
>
> for (T * p; (p = getNext(r)); )  {
>     ... process *p ...
> }
>
> I feel dirty.
>
>
> Andrei

:) I like it.

But then looking at the alloca man page... yikes
July 13, 2010
alloca() notes [Was: Re: getNext]
Andrei Alexandrescu:
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {

Time ago I have filed bug http://d.puremagic.com/issues/show_bug.cgi?id=3822 on alloca(), but I think this code is not hit by it.

I have recently written some C99 code and I have appreciated its Variable Length Arrays both in syntax, safety and performance gain.

Compared to alloca() the VLAs have some advantages:
- Their syntax is shorter and nicer, and it's natural for a C programmer;
- no imports needed;
- the semantics is more clear, because they define a new variable, so the size of their scope is the same as the in all other variables, while alloca() seems to have two possible different implementations;
- VLAs are more typesafe, there is no need to use a cast. While the cast needed by alloca() may forbid it in SafeD code.
- VLAs don't need sizeof(T), you just specify a type.

The result is that the usage of alloca() feels dirty in both C and D, but Variable Length Arrays (VLA) of C99 don't feel dirty at all. 

On the other hand VLAs (and alloca) can produce a stack overflow, they are not so commonly useful (as alloca), and they can become essentially a third kind of arrays for D (this is not good).

In D I'd like something like alloca() that needs no casts and is able to find the size by itself, avoiding the bug prone usage of T.sizeof.

A way to do it is to use the same syntax used by C99 and allow a variable in the definition of a stack array:

auto foo(int n) {
   int[n] arr; 
   return arr;
}

The main difference is that in D when you return arr it gets copied by value.

Currently this code works:

int foo(int n) {
    return 3 * n + 1;
}
auto bar() {
   immutable int n = 5;
   int[foo(n)] arr;
   return arr;
}
void main() {}

because dmd runs foo() at compile time, so arr is allocated on the stack with a statically known size. If VLAs get introduced in D, then in this case the compiler has to do what it currently it doesn't do: to run a function at compile-time if possible (and create a fixed length array) and run it at runtime if that's impossible (and create a VLA).

To avoid that in some cases you can use something like:

int[StaticValue!(n)] arr;

Where StaticValue is a template that makes sure n is a value always known at compile-time.

But this is a little messy, and I don't like it too much.

Keep in mind that currently this gives an error:

int[] bar() {
   int[2] arr;
   return arr;
}
void main() {}

You have to use the auto return type or:

int[2] bar() {
   int[2] arr;
   return arr;
}
void main() {}

With VLAs you are forced to use auto, but it can't work anyway at the return point.
So I think the normal C99 syntax for VLAs is not good for D2. On the other hand alloca() semantics and syntax are bad. So D alloca() can be replaced by something better like:

T* ptr = StackAlloc!T(n);
Or:
T* ptr = Alloca!T(n);
Or:
T[] arr = VLA!T(n);

But this created an array that looks like a dynamic array, but its memory is on the stack so it must not escape the function. So it can be a bug-prone, similar to this that currently compiles:

int[] foo() {
   int[1] stackArray;
   int[] dynArray = stackArray[0 .. 1];
   return dynArray;
}
void main() {}

See enhancement request http://d.puremagic.com/issues/show_bug.cgi?id=4451

So with the syntax T[] arr=VLA!T(n); the compiler has to disallow the return of arr and its slices.

Whith those three syntaxes your signature becomes:

T* getNext(R, E)(ref R range, ref E store = *StackAlloc!E(1)) {
T* getNext(R, E)(ref R range, ref E store = *Alloca!E(1)) {
T* getNext(R, E)(ref R range, ref E store = *(VLA!E(1).ptr)) {

Bye,
bearophile
July 13, 2010
Re: getNext
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I've just had an idea that is so dark and devious, I was almost afraid 
> to try it. But it works like a charm. Consider:
> 
> T * getNext(R, E)(ref R range,
>                    ref E store = *(cast(E*) alloca(E.sizeof))
> {
>      ...
> }
> 
> With this, allocating a dummy buffer on caller's stack is automated, so 
> client code can just write:
> 
> for (T * p; (p = getNext(r)); )  {
>     ... process *p ...
> }
> 
> I feel dirty.

How about a TLS variable?

template temporary(T)
{
   static T temporary;
}
E* getNext(R, E)(ref R range, ref E store = temporary!E);


Shin
July 13, 2010
Re: getNext
Shin Fujishiro <rsinfu@gmail.com> wrote:
> How about a TLS variable?
> 
> template temporary(T)
> {
>     static T temporary;
> }
> E* getNext(R, E)(ref R range, ref E store = temporary!E);

Yeah, I just missed it.  TLS is not usable.  I had spoken carelessly...


Shin
July 13, 2010
Re: getNext
On Mon, 12 Jul 2010 23:48:05 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> I think I figured out a comfortable and all-encompassing means to define  
> a simplified interface for an input range.
>
> Currently input ranges need to define empty, front, and popFront. That  
> works but it's a bit heavy for simple input ranges. We've been  
> discussing simplified interfaces in this group but couldn't find one  
> that satisfied all use cases.
>
> Consider this:
>
> T* getNext(R, T)(ref R range, ref T item);
>
> Semantics: if the range wants to expose addresses of its elements, it  
> returns a pointer to the current element and also advances to the next  
> element. Otherwise (i.e. the range does not have or does not want to  
> expose addresses of its elements), the range fills "item" with the  
> current value, again moves on to the next value, and returns &item.
>
> In all cases, when there are no more elements in the range, getNext  
> returns null.
>
> getNext is easy to define for e.g. arrays and files. How does it sound?  
> Does it bring significant simplification?

Yes, yes, yes!

A question though -- whenever a pointer occurs, we always cringe,  
especially in safeD.  will getNext be unsafe?

BTW, I like the alloca thingy, that's really cool.

One thing I just thought of, getNext should be split into two functions,  
the one you have, and:

ElementType!R *getNext(R)(ref R range)

To avoid having to supply the item or use alloca when the range is going  
to give you back a pointer to its internals anyways (tested with a  
template constraint).

-Steve
« First   ‹ Prev
1 2 3 4 5
Top | Discussion index | About this forum | D home