View mode: basic / threaded / horizontal-split · Log in · Help
July 03, 2008
Iterating over multiple collections in parallel
If found myself solving the same problem again and again:
how to implement simultaneous iteration over two (or more collections)?

suppose, we have the arrays:
int[] a = [1, 2, 3];
int[] b = [1, 4, 9];

and would like to multiply them per-component, like this:

int[] c = new int[a.length];
for (int i = 0; i < a.length; ++j) {
    c[i] = a[i] * b[i];
}

Since foreach is the prefered (and sometimes the only) way to iterate over  
collection, there should be a way to iterate over multiple collections  
simultaneously, like (just an example) this:

int[] c = new int[a.length];
foreach (i : a; j : b; ref k : c) {
    k = a + b;
}

But this isn't supported (yet?)
More complicated example would be an iterating over two (or more)  
collection with *no* random access iterators available, opApply only.

I spend a lot of time on this and have found no solution how to do it with  
existing D feature set, but this is surely doable with a few  
inter-function gotos and an exception if collections sizes don't match.  
It's just a small layer written in asm, nothing more.

How about an enhancement proposal? :)
July 03, 2008
Re: Iterating over multiple collections in parallel
"Koroskin Denis" <2korden@gmail.com> wrote in message 
news:op.udp4pcjxenyajd@proton.creatstudio.intranet...
> If found myself solving the same problem again and again:
> how to implement simultaneous iteration over two (or more collections)?
>
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
>
> and would like to multiply them per-component, like this:
>
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
>     c[i] = a[i] * b[i];
> }
>
> Since foreach is the prefered (and sometimes the only) way to iterate over 
> collection, there should be a way to iterate over multiple collections 
> simultaneously, like (just an example) this:
>
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
>     k = a + b;
> }
>
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more) 
> collection with *no* random access iterators available, opApply only.
>
> I spend a lot of time on this and have found no solution how to do it with 
> existing D feature set, but this is surely doable with a few 
> inter-function gotos and an exception if collections sizes don't match. 
> It's just a small layer written in asm, nothing more.
>
> How about an enhancement proposal? :)

It's something Walter mentioned before, but without any kind of public 
"plan" of these ideas he has, there's no way to know if it's even on his 
plate.

I think the syntax he used was something like this:

foreach(a, b; c)(d, e; f) { blah }

In any case, it's difficult in the general case to do parallel iteration 
with D's inside-out iterators.  To do it generally, you'd have to use 
coroutines.  But you can currently write a templated zip to iterate over 
arrays simultaneously, only because they're so easy to iterate over.  I'm 
sure downs will post something horrid-looking that you can look over.
July 03, 2008
Re: Iterating over multiple collections in parallel
Reply to Koroskin,

> If found myself solving the same problem again and again: how to
> implement simultaneous iteration over two (or more collections)?
> 
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
> and would like to multiply them per-component, like this:
> 
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
> c[i] = a[i] * b[i];
> }
> Since foreach is the prefered (and sometimes the only) way to iterate
> over  collection, there should be a way to iterate over multiple
> collections  simultaneously, like (just an example) this:
> 
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
> k = a + b;
> }
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more)
> collection with *no* random access iterators available, opApply only.
> I spend a lot of time on this and have found no solution how to do it
> with  existing D feature set, but this is surely doable with a few
> inter-function gotos and an exception if collections sizes don't
> match.  It's just a small layer written in asm, nothing more.
> 
> How about an enhancement proposal? :)
> 

Can a singe thread have more than one stack? If someone can figure out how 
to do that, then this wouldn't be "to hard" to do in a lib.

call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
dg is called on the threads normal stack and calls the real body.
July 03, 2008
Re: Iterating over multiple collections in parallel
"Koroskin Denis" wrote
> If found myself solving the same problem again and again:
> how to implement simultaneous iteration over two (or more collections)?
>
> suppose, we have the arrays:
> int[] a = [1, 2, 3];
> int[] b = [1, 4, 9];
>
> and would like to multiply them per-component, like this:
>
> int[] c = new int[a.length];
> for (int i = 0; i < a.length; ++j) {
>     c[i] = a[i] * b[i];
> }
>
> Since foreach is the prefered (and sometimes the only) way to iterate over 
> collection, there should be a way to iterate over multiple collections 
> simultaneously, like (just an example) this:
>
> int[] c = new int[a.length];
> foreach (i : a; j : b; ref k : c) {
>     k = a + b;
> }
>
> But this isn't supported (yet?)
> More complicated example would be an iterating over two (or more) 
> collection with *no* random access iterators available, opApply only.
>
> I spend a lot of time on this and have found no solution how to do it with 
> existing D feature set, but this is surely doable with a few 
> inter-function gotos and an exception if collections sizes don't match. 
> It's just a small layer written in asm, nothing more.
>
> How about an enhancement proposal? :)

The issue here is how to formulate each iteration.  foreach works because it 
is well defined "iterate over each value in the container".  But iterating 
over multiple containers, you could want different orders of iteration, 
iterating for each unique set of indexes, etc.

Or you could want what you had originally stated, in which cases the lengths 
must be correct.  We already have foreach_reverse, which is like foreach, 
but iterates in the opposite direction.  I don't want foreach_X for 
everyones preferred iteration method of choice.

I think the best solution here is a fruct (foreach struct) in a library:

struct myfruct
{
  static myfruct opCall(int[] a, int[] b, int[] c)
  {
     myfruct result;
     result.a = a;
     result.b = b;
     result.c = c;
     assert(a.length == b.length && a.length == c.length);
  }

  int[] a;
  int[] b;
  int[] c;
  int opApply(int delegate(ref int i, ref int j, ref int k) dg)
  {
     int result = 0;
     for(int i = 0; i < a.length; i++)
     {
        if((result = dg(a[i], b[i], c[i])) != 0)
            break;
     }
     return result;
  }
}

usage:

foreach(i, j, ref k; myfruct(a, b, c))
{
   k = i * j;
}

Now, the issue here is, how can this be made generic?  The answer is, we 
need iterators.  Otherwise you cannot iterate over multiple containers that 
are not index-based (such as AA's or maps).  With iterators, that have a 
well defined interface (like opApply), that allows one to increment 
individual indexes, and not have to know what indexes are made of how how 
they work.  They just define opInc and opStar.  Oh, and we would need 
reference return types too (so opStar can return an actual reference).

-Steve
July 03, 2008
Re: Iterating over multiple collections in parallel
"BCS" <ao@pathlink.com> wrote in message 
news:55391cb32ecb08caab0811331728@news.digitalmars.com...
>
> Can a singe thread have more than one stack? If someone can figure out how 
> to do that, then this wouldn't be "to hard" to do in a lib.
>
> call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
> dg is called on the threads normal stack and calls the real body.

Of course, they're called stackthreads or Fibers.  Downs has implemented 
them in his tools package (in an unstable state) for Phobos, and they've 
been in Tango for more than two years.
July 04, 2008
Re: Iterating over multiple collections in parallel
Jarrett Billingsley wrote:
> "BCS" <ao@pathlink.com> wrote in message 
> news:55391cb32ecb08caab0811331728@news.digitalmars.com...
>> Can a singe thread have more than one stack? If someone can figure out how 
>> to do that, then this wouldn't be "to hard" to do in a lib.
>>
>> call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
>> dg is called on the threads normal stack and calls the real body.
> 
> Of course, they're called stackthreads or Fibers.  Downs has implemented 
> them in his tools package (in an unstable state) for Phobos, and they've 
> been in Tango for more than two years. 
> 
> 
They're pretty stable now, but they depend on a Phobos patch to run acceptably fast (500 cycles per context switch on my Pentium M).

And I don't know if they run on DMD; there was some problem with the GDC assembler statements. If somebody finds a problem with it, tell me in a reply and I'll try to fix it.

Anyway, here's parallel iteration over two foreachables:

gentoo-pc ~/d $ cat test33.d && echo "----" && rebuild test33.d && ./test33
module test33;
import std.stdio, tools.stackthreads;

void main() {
 auto it1 = [1, 2, 3];
 auto it2 = [1, 4, 9];
 auto iter2 = generator((void delegate(int) yield) {
   foreach (entry; it2) yield(entry);
 });

 foreach (entry1; it1) {
   auto entry2 = iter2();
   writefln(entry1, " - ", entry2);
 }
}
----
1 - 1
2 - 4
3 - 9
gentoo-pc ~/d $
July 04, 2008
Re: Iterating over multiple collections in parallel
downs:
>   auto iter2 = generator((void delegate(int) yield) {
>     foreach (entry; it2) yield(entry);
>   });
> 
>   foreach (entry1; it1) {
>     auto entry2 = iter2();
>     writefln(entry1, " - ", entry2);

Nice :-)

Bye,
bearophile
July 05, 2008
Re: Iterating over multiple collections in parallel
Iterating over many collection in parallel is very nice to have, what 
would be nice to have with a cleaner syntax for it.

Aldor had it, and since then I always missed it.
It worked like this:
Iterator/loops can be declared like this:

for a in [1,2,3]
for b in [1,0,7,3]
for i in 1..
while (b<7)
repeat {
	...
}

which mean follow *all* iterators in parallel, and execute ..., when 
any iterator stop, stop everything

very clear, nice and flexible.

Other syntaxes look like a kludge with respect to it.

Python for example has generators/iterators, and builds derived 
iterators one on the top of the other

for a in [1,2,3]
for i in 0..
repeat { ... }

becomes
for i,a in enumerate([1,2,3].iterator()):
 ...

C++ iterators can be used in parallel, because all the difficulty of 
keeping the current position is moved to the iterator writer, but the 
syntax is just plain ugly and writing efficient iterators for complex 
structures can be painful.

Anyway leaving away that syntax, in D without changes one can already 
have a reasonable syntax (even if not as nice).
Using fibers, as in the nice example of downs it should also be 
possible to convert automatically an opApply in a generator, so 
technically it should be feasible, even if 500 cycles per conetext 
switch, might be too much for some applications (tight loops).
So in D one could have a something workable
foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
...
}

where collectLoop is a variadic template function that converts opApply 
to a generator and returns an object having a opApply that loops in 
parallel.
Implementation of it is left as exercice to the reader ;-)

Fawzi
July 05, 2008
Re: Iterating over multiple collections in parallel
Fawzi Mohamed:

> for i,a in enumerate([1,2,3].iterator()):

I may have lost you a bit here, anyway in real Python that code is:

for idx, el in enumerate([1, 2, 3]):
   print idx, el

> Using fibers, as in the nice example of downs it should also be 
> possible to convert automatically an opApply in a generator, so 
> technically it should be feasible, even if 500 cycles per conetext 
> switch, might be too much for some applications (tight loops).
> So in D one could have a something workable
> foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
> ...
> }
> Implementation of it is left as exercice to the reader ;-)

In my libs there are zip() and azip() for something like that, but they don't use fibers yet... Once the fibers are added, the code can be made smart at compile-time, so it can use the current fast code when possible, and fibers when it must.

Bye,
bearophile
July 05, 2008
Re: Iterating over multiple collections in parallel
Koroskin Denis wrote:
> I spend a lot of time on this and have found no solution how to do it 
> with existing D feature set, but this is surely doable with a few 
> inter-function gotos and an exception if collections sizes don't match. 

It's a well-known problem, and not so easy to solve in an intuitive, 
non-klunky way. Some of the issues are what do you do when the 
collections are of different sizes? Error? Iterate to the minimum? 
Iterate to the maximum and use default values for the shorter collections?

The best I can offer at the moment is:

    foreach (i; 0 .. a.length)
	a[i] = b[i] + c[i];
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home