Jump to page: 1 2
Thread overview
Iterating over multiple collections in parallel
Jul 03, 2008
Koroskin Denis
Jul 03, 2008
BCS
Jul 04, 2008
downs
Jul 04, 2008
bearophile
Jul 05, 2008
Fawzi Mohamed
Jul 05, 2008
bearophile
Jul 05, 2008
Walter Bright
Jul 05, 2008
Manfred_Nowak
Jul 05, 2008
Yigal Chripun
Jul 06, 2008
Manfred_Nowak
Jul 06, 2008
Yigal Chripun
Jul 05, 2008
Koroskin Denis
Jul 06, 2008
Manfred_Nowak
July 03, 2008
If found myself solving the same problem again and again:
how to implement simultaneous iteration over two (or more collections)?

suppose, we have the arrays:
int[] a = [1, 2, 3];
int[] b = [1, 4, 9];

and would like to multiply them per-component, like this:

int[] c = new int[a.length];
for (int i = 0; i < a.length; ++j) {
    c[i] = a[i] * b[i];
}

Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this:

int[] c = new int[a.length];
foreach (i : a; j : b; ref k : c) {
    k = a + b;
}

But this isn't supported (yet?)
More complicated example would be an iterating over two (or more) collection with *no* random access iterators available, opApply only.

I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more.

How about an enhancement proposal? :)
July 03, 2008
"Koroskin Denis" <2korden@gmail.com> wrote in message news:op.udp4pcjxenyajd@proton.creatstudio.intranet...
> If found myself solving the same problem again and again:
> how to implement simultaneous iteration over two (or more collections)?
>
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
>
> and would like to multiply them per-component, like this:
>
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
>     c[i] = a[i] * b[i];
> }
>
> Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this:
>
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
>     k = a + b;
> }
>
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more)
> collection with *no* random access iterators available, opApply only.
>
> I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more.
>
> How about an enhancement proposal? :)

It's something Walter mentioned before, but without any kind of public "plan" of these ideas he has, there's no way to know if it's even on his plate.

I think the syntax he used was something like this:

foreach(a, b; c)(d, e; f) { blah }

In any case, it's difficult in the general case to do parallel iteration with D's inside-out iterators.  To do it generally, you'd have to use coroutines.  But you can currently write a templated zip to iterate over arrays simultaneously, only because they're so easy to iterate over.  I'm sure downs will post something horrid-looking that you can look over.


July 03, 2008
Reply to Koroskin,

> If found myself solving the same problem again and again: how to
> implement simultaneous iteration over two (or more collections)?
> 
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
> and would like to multiply them per-component, like this:
> 
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
> c[i] = a[i] * b[i];
> }
> Since foreach is the prefered (and sometimes the only) way to iterate
> over  collection, there should be a way to iterate over multiple
> collections  simultaneously, like (just an example) this:
> 
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
> k = a + b;
> }
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more)
> collection with *no* random access iterators available, opApply only.
> I spend a lot of time on this and have found no solution how to do it
> with  existing D feature set, but this is surely doable with a few
> inter-function gotos and an exception if collections sizes don't
> match.  It's just a small layer written in asm, nothing more.
> 
> How about an enhancement proposal? :)
> 

Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib.

call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.


July 03, 2008
"Koroskin Denis" wrote
> If found myself solving the same problem again and again:
> how to implement simultaneous iteration over two (or more collections)?
>
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
>
> and would like to multiply them per-component, like this:
>
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
>     c[i] = a[i] * b[i];
> }
>
> Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this:
>
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
>     k = a + b;
> }
>
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more)
> collection with *no* random access iterators available, opApply only.
>
> I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more.
>
> How about an enhancement proposal? :)

The issue here is how to formulate each iteration.  foreach works because it is well defined "iterate over each value in the container".  But iterating over multiple containers, you could want different orders of iteration, iterating for each unique set of indexes, etc.

Or you could want what you had originally stated, in which cases the lengths must be correct.  We already have foreach_reverse, which is like foreach, but iterates in the opposite direction.  I don't want foreach_X for everyones preferred iteration method of choice.

I think the best solution here is a fruct (foreach struct) in a library:

struct myfruct
{
   static myfruct opCall(int[] a, int[] b, int[] c)
   {
      myfruct result;
      result.a = a;
      result.b = b;
      result.c = c;
      assert(a.length == b.length && a.length == c.length);
   }

   int[] a;
   int[] b;
   int[] c;
   int opApply(int delegate(ref int i, ref int j, ref int k) dg)
   {
      int result = 0;
      for(int i = 0; i < a.length; i++)
      {
         if((result = dg(a[i], b[i], c[i])) != 0)
             break;
      }
      return result;
   }
}

usage:

foreach(i, j, ref k; myfruct(a, b, c))
{
    k = i * j;
}

Now, the issue here is, how can this be made generic?  The answer is, we need iterators.  Otherwise you cannot iterate over multiple containers that are not index-based (such as AA's or maps).  With iterators, that have a well defined interface (like opApply), that allows one to increment individual indexes, and not have to know what indexes are made of how how they work.  They just define opInc and opStar.  Oh, and we would need reference return types too (so opStar can return an actual reference).

-Steve


July 03, 2008
"BCS" <ao@pathlink.com> wrote in message news:55391cb32ecb08caab0811331728@news.digitalmars.com...
>
> Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib.
>
> call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.

Of course, they're called stackthreads or Fibers.  Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.


July 04, 2008
Jarrett Billingsley wrote:
> "BCS" <ao@pathlink.com> wrote in message news:55391cb32ecb08caab0811331728@news.digitalmars.com...
>> Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib.
>>
>> call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.
> 
> Of course, they're called stackthreads or Fibers.  Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.
> 
> 
They're pretty stable now, but they depend on a Phobos patch to run acceptably fast (500 cycles per context switch on my Pentium M).

And I don't know if they run on DMD; there was some problem with the GDC assembler statements. If somebody finds a problem with it, tell me in a reply and I'll try to fix it.

Anyway, here's parallel iteration over two foreachables:

gentoo-pc ~/d $ cat test33.d && echo "----" && rebuild test33.d && ./test33
module test33;
import std.stdio, tools.stackthreads;

void main() {
  auto it1 = [1, 2, 3];
  auto it2 = [1, 4, 9];
  auto iter2 = generator((void delegate(int) yield) {
    foreach (entry; it2) yield(entry);
  });

  foreach (entry1; it1) {
    auto entry2 = iter2();
    writefln(entry1, " - ", entry2);
  }
}
----
1 - 1
2 - 4
3 - 9
gentoo-pc ~/d $
July 04, 2008
downs:
>   auto iter2 = generator((void delegate(int) yield) {
>     foreach (entry; it2) yield(entry);
>   });
> 
>   foreach (entry1; it1) {
>     auto entry2 = iter2();
>     writefln(entry1, " - ", entry2);

Nice :-)

Bye,
bearophile
July 05, 2008
Iterating over many collection in parallel is very nice to have, what would be nice to have with a cleaner syntax for it.

Aldor had it, and since then I always missed it.
It worked like this:
Iterator/loops can be declared like this:

for a in [1,2,3]
for b in [1,0,7,3]
for i in 1..
while (b<7)
repeat {
	...
}

which mean follow *all* iterators in parallel, and execute ..., when any iterator stop, stop everything

very clear, nice and flexible.

Other syntaxes look like a kludge with respect to it.

Python for example has generators/iterators, and builds derived iterators one on the top of the other

for a in [1,2,3]
for i in 0..
repeat { ... }

becomes
for i,a in enumerate([1,2,3].iterator()):
 ...

C++ iterators can be used in parallel, because all the difficulty of keeping the current position is moved to the iterator writer, but the syntax is just plain ugly and writing efficient iterators for complex structures can be painful.

Anyway leaving away that syntax, in D without changes one can already have a reasonable syntax (even if not as nice).
Using fibers, as in the nice example of downs it should also be possible to convert automatically an opApply in a generator, so technically it should be feasible, even if 500 cycles per conetext switch, might be too much for some applications (tight loops).
So in D one could have a something workable
foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
...
}

where collectLoop is a variadic template function that converts opApply to a generator and returns an object having a opApply that loops in parallel.
Implementation of it is left as exercice to the reader ;-)

Fawzi

July 05, 2008
Fawzi Mohamed:

> for i,a in enumerate([1,2,3].iterator()):

I may have lost you a bit here, anyway in real Python that code is:

for idx, el in enumerate([1, 2, 3]):
    print idx, el

> Using fibers, as in the nice example of downs it should also be
> possible to convert automatically an opApply in a generator, so
> technically it should be feasible, even if 500 cycles per conetext
> switch, might be too much for some applications (tight loops).
> So in D one could have a something workable
> foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
> ...
> }
> Implementation of it is left as exercice to the reader ;-)

In my libs there are zip() and azip() for something like that, but they don't use fibers yet... Once the fibers are added, the code can be made smart at compile-time, so it can use the current fast code when possible, and fibers when it must.

Bye,
bearophile
July 05, 2008
Koroskin Denis wrote:
> I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. 

It's a well-known problem, and not so easy to solve in an intuitive, non-klunky way. Some of the issues are what do you do when the collections are of different sizes? Error? Iterate to the minimum? Iterate to the maximum and use default values for the shorter collections?

The best I can offer at the moment is:

    foreach (i; 0 .. a.length)
	a[i] = b[i] + c[i];
« First   ‹ Prev
1 2