June 16, 2015
On Monday, 15 June 2015 at 22:40:31 UTC, Baz wrote:

> Right, my bad. This one would work:
>
> ---
> float[] test(float[] x) {
>     auto result = x.dup;
>     result.each!((ref a) => (a = exp(a)));
>     return result;
> }
> ---

That works. Thanks.

I did some benchmarking and found that map tended to be faster than each. For some large arrays, it was dramatically faster. Perhaps it has to do with the extra copying in the each version?

I also wrote an alternative to each using a plain foreach loop, and the two were exactly the same speed.
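
For reference, here's roughly what I compared (a sketch, not my exact benchmark code):

---
import std.algorithm : each, map;
import std.array : array;
import std.math : exp;

// each: mutate a copy in place
float[] testEach(float[] x)
{
    auto result = x.dup;
    result.each!((ref a) => (a = exp(a)));
    return result;
}

// map: build the result lazily, then allocate once with .array
float[] testMap(float[] x)
{
    return x.map!(a => exp(a)).array;
}

// foreach: a plain loop over a copy
float[] testForeach(float[] x)
{
    auto result = x.dup;
    foreach (ref a; result)
        a = exp(a);
    return result;
}
---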
June 16, 2015
On Tuesday, 16 June 2015 at 13:06:58 UTC, jmh530 wrote:
> On Monday, 15 June 2015 at 22:40:31 UTC, Baz wrote:
>
>> Right, my bad. This one would work:
>>
>> ---
>> float[] test(float[] x) {
>>     auto result = x.dup;
>>     result.each!((ref a) => (a = exp(a)));
>>     return result;
>> }
>> ---
>
> That works. Thanks.
>
> I did some benchmarking and found that map tended to be faster than each. For some large arrays, it was dramatically faster. Perhaps it has to do with the extra copying in the each version?
>
> I also wrote an alternative to each using a plain foreach loop, and the two were exactly the same speed.

Range based code is very dependent on aggressive optimisation to get good performance. DMD does a pretty bad/patchy job of this; LDC and GDC will normally give you more consistently* fast code.

*consistent as in different implementations performing very similarly instead of seeing big differences like you have here.
June 16, 2015
On Tuesday, 16 June 2015 at 13:15:05 UTC, John Colvin wrote:

>
> *consistent as in different implementations performing very similarly instead of seeing big differences like you have here.

That's a good point. I tried numpy's exp (which uses C at a low level, I think) and found it takes about a fifth as long. I went searching for numpy's implementation, but could only find a C header containing the function prototype.

I only have dmd on my work computer and it probably would be a hassle to get the others working right now.
June 16, 2015
On Tuesday, 16 June 2015 at 14:43:17 UTC, jmh530 wrote:
> On Tuesday, 16 June 2015 at 13:15:05 UTC, John Colvin wrote:
>
>>
>> *consistent as in different implementations performing very similarly instead of seeing big differences like you have here.
>
> That's a good point. I tried numpy's exp (which uses C at a low level, I think) and found it takes about a fifth as long. I went searching for numpy's implementation, but could only find a C header containing the function prototype.
>
> I only have dmd on my work computer and it probably would be a hassle to get the others working right now.

Have you tried using core.stdc.math.exp instead of std.math.exp? It's probably faster, although not necessarily quite as accurate.
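
e.g. something like this (an untested sketch, using the C runtime's single-precision expf):

---
import core.stdc.math : expf; // C's single-precision exp

float[] test(float[] x)
{
    auto result = x.dup;
    foreach (ref a; result)
        a = expf(a);
    return result;
}
---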

If you want really fast exponentiation of an array though, you want to use SIMD. Something like http://www.yeppp.info would be easy to use from D.
June 16, 2015
On Tuesday, 16 June 2015 at 14:43:17 UTC, jmh530 wrote:
> On Tuesday, 16 June 2015 at 13:15:05 UTC, John Colvin wrote:
>
>>
>> *consistent as in different implementations performing very similarly instead of seeing big differences like you have here.
>
> That's a good point. I tried numpy's exp (which uses C at a low level, I think) and found it takes about a fifth as long. I went searching for numpy's implementation, but could only find a C header containing the function prototype.
>
> I only have dmd on my work computer and it probably would be a hassle to get the others working right now.

What OS are you on? See http://wiki.dlang.org/Compilers
June 16, 2015
On Tuesday, 16 June 2015 at 16:38:55 UTC, John Colvin wrote:
>
> What OS are you on? See http://wiki.dlang.org/Compilers

I'm on Windows 7 at work, and I have both Win7 and Linux at home, so I figure I can try it on Linux at home. Sometimes the work computer is a bit funky about installing things, so I didn't want to bother with it there.

On Tuesday, 16 June 2015 at 16:37:35 UTC, John Colvin wrote:
>
> If you want really fast exponentiation of an array though, you want to use SIMD. Something like http://www.yeppp.info would be easy to use from D.

I wasn't familiar with yeppp. Thanks. I'll probably keep things in as native D for now, but it's good to know there are other options.

I compared the results with Julia and R while I was at it. The D code was quite a bit faster than both; it's just that numpy is doing something that gets better performance. After some investigation, it's possible that my version of numpy is using SSE, Intel's form of SIMD. That doesn't seem to be easy to check, though; the one method I found on Stack Overflow doesn't work for me...

It looks like D has some support for SIMD, but only for a limited subset of matrices.
June 16, 2015
Err...vectors not matrices.
June 23, 2015
On Tuesday, 16 June 2015 at 16:37:35 UTC, John Colvin wrote:
> If you want really fast exponentiation of an array though, you want to use SIMD. Something like http://www.yeppp.info would be easy to use from D.

I've been looking into SIMD a little. It turns out that core.simd only works for DMD on Linux machines. Not sure about the other compilers; I was stuck on that for a little while. I read up on SIMD, since I had no real understanding of it before you mentioned it, and at least now I understand why all the types in core.simd are so small. My initial reaction was that there's no way I'd want to write code just for float[4], but now I'm like "oh, that's the whole point".

Anyway, I might try to put something together on my other machine one of these days, but I was able to make a little bit more progress with D's std.parallelism. The parallel foreach loops work great, even on Windows, with little extra work required.
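
Something like this is all it takes (a rough sketch of what I mean by "little extra work"):

---
import std.math : exp;
import std.parallelism : parallel;

void expInPlace(float[] x)
{
    // chunks of x are handed out to the worker threads in the task pool
    foreach (ref a; parallel(x))
        a = exp(a);
}
---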

That being said, I'm not seeing any speed-up from parallel map. I put some code below trying some variations on std.algorithm.map and taskPool.map. The more memory allocation there is (through .array), the longer everything takes. Keeping things as lazy ranges seems to be much faster (though I suppose that's partly because a lazy map hasn't actually computed anything until the range is consumed).

The most interesting result to me was that taskPool.map was slower than std.algorithm.map in every case; maybe it's a difference between being semi-eager versus lazy. The code below doesn't show it, but the parallel foreach loop seems to be faster than either std.algorithm.map or taskPool.map when doing everything with arrays.



import std.algorithm;        // needed for the qualified std.algorithm.map calls
import std.datetime;
import std.parallelism;
import std.conv : to;
import std.math : exp;
import std.stdio : writeln;
import std.array : array;
import std.range : iota;

enum real x_size = 100_000;

void f0()
{
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size));
}

void f1()
{
	auto y = taskPool.map!exp(iota(x_size));
}

void f2()
{
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size)).array;
}

void f3()
{
	auto y = taskPool.map!exp(iota(x_size)).array;
}

void f4()
{
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size).array);
}

void f5()
{
	auto y = taskPool.map!exp(iota(x_size).array);
}

void f6()
{
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size).array).array;
}

void f7()
{
	auto y = taskPool.map!exp(iota(x_size).array).array;
}

void main() {
	auto r = benchmark!(f0, f1, f2, f3, f4, f5, f6, f7)(100);
	auto f0Result = to!Duration(r[0]);
	auto f1Result = to!Duration(r[1]);
	auto f2Result = to!Duration(r[2]);
	auto f3Result = to!Duration(r[3]);
	auto f4Result = to!Duration(r[4]);
	auto f5Result = to!Duration(r[5]);
	auto f6Result = to!Duration(r[6]);
	auto f7Result = to!Duration(r[7]);
	writeln(f0Result);			//prints ~ 17us on my machine
	writeln(f1Result);			//prints ~ 4.3ms on my machine
	writeln(f2Result);			//prints ~ 1.7s on my machine
	writeln(f3Result);			//prints ~ 3.5s on my machine
	writeln(f4Result);			//prints ~ 471ms on my machine
	writeln(f5Result);			//prints ~ 473ms on my machine
	writeln(f6Result);			//prints ~ 1.9s on my machine
	writeln(f7Result);			//prints ~ 3.9s on my machine
}
June 23, 2015
On Tuesday, 23 June 2015 at 01:27:21 UTC, jmh530 wrote:
> On Tuesday, 16 June 2015 at 16:37:35 UTC, John Colvin wrote:
>> If you want really fast exponentiation of an array though, you want to use SIMD. Something like http://www.yeppp.info would be easy to use from D.
>
> I've been looking into SIMD a little. It turns out that core.simd only works for DMD on Linux machines.

If I remember correctly, core.simd should work with every compiler on every supported OS. What did you try that didn't work?
June 23, 2015
On Tuesday, 23 June 2015 at 10:50:51 UTC, John Colvin wrote:

> If I remember correctly, core.simd should work with every compiler on every supported OS. What did you try that didn't work?

I figured out the issue! You have to compile with the -m64 flag to get it to work on Windows (this works with both dmd and rdmd); the default 32-bit build doesn't support SIMD. I hadn't noticed that I wasn't doing a 64-bit compilation.
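
For reference, the whole fix was just compiling like this (file name arbitrary):

---
dmd -m64 simdtest.d
---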

I was a little disheartened to get an error running one of the first pieces of code on http://dlang.org/simd.html. The second line below fails with some casting issue:

---
int4 v = 7;
v = 3 * v;   // multiply each element in v by 3
---

Outside of that, I can see one issue: I was calling the overload of exp that takes a real and returns a real. There's no real SIMD type, perhaps because CPUs don't support one, so I could really only work with the float or double versions.
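
Basic lane-wise arithmetic on the float type does work for me, though (a small sketch; exp itself isn't a single SIMD instruction, so it would still have to be built up per lane or via a library):

---
import core.simd;

void test()
{
    float4 a = 1.0f;   // broadcasts 1.0f into all four lanes
    float4 b = 2.0f;
    float4 c = a + b;  // four single-precision adds in one instruction
}
---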

On std.parallelism, I noticed that I could only loop through static arrays with foreach when I appended them with [] (I still get mixed up on that slicing syntax). The good thing about static arrays is that the length is known at compile time. I'm not positive, but I think I could set things up with different code paths, one non-parallel below some length and one parallel above it, as in the sketch below. That's useful because the parallel path may not be able to carry all the function attributes of the non-parallel one.
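
Roughly what I have in mind (untested; the threshold is a made-up number):

---
import std.math : exp;
import std.parallelism : parallel;

// hypothetical cutoff below which the parallel overhead isn't worth it
enum size_t parallelThreshold = 10_000;

void expInPlace(size_t N)(ref float[N] x)
{
    static if (N < parallelThreshold)
    {
        foreach (ref a; x[])           // x[] slices the static array
            a = exp(a);
    }
    else
    {
        foreach (ref a; parallel(x[]))
            a = exp(a);
    }
}
---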

I haven't been able to get anything like that to work for dynamic arrays, since the length isn't known at compile time; there it's just one big function.