May 16, 2004
hellcatv@hotmail.com wrote:

> First of all I'd like to say that if walter wishes to integrate my vec.d into his language he can have it under the BSD license or another license if he wants to talk to me about it. Other users must talk to me about changing the license but I'm quite flexible.
> 
> I was a bit brief about how my float2 float3 and float4 work in http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

very nifty. I feel your pain implementing all those swizzle operators. It feels like lisp again with cdr, cadr, cadadr etc etc :-) How many are there - it looks like >50. yikes.

But what's with the pretty wacky idea that for nsize < 3 the z() property should return x()? Does Cg really do that? That seems pretty pervasive that asking for higher dimension information just picks some valid dimension and uses that. Seems random to me. But then again maybe there's a reason.

Otherwise it is very cool to have vectorized math operations and such. I
definitely wouldn't mind seeing a simplified version of vec getting
included in phobos somewhere. All those swizzles make my head ... well...
spin.
What I've been using is just:

// helper to make an array "literal" with value semantics
// Example:
//   uintn!(3)(100,200,300)
struct InlineArray(T,int N) {
  T[N] array;

  static .InlineArray!(T,N) opCall(T x0,...) {
    .InlineArray!(T,N) res;
    res.array[] = (&x0)[0..N][];
    return res;
  }

  static .InlineArray!(T,N) opCall(T[N] x) {
    .InlineArray!(T,N) res;
    res.array[] = x[];
    return res;
  }

  T opIndex(int i) {
    return array[i];
  }

  void opIndex(int i, T val) {
    array[i] = val;
  }

  // todo: arithmetic, cmp, etc
}
template uintn(int N) {
  alias InlineArray!(uint,N) uintn;
}
template intn(int N) {
  alias InlineArray!(int,N) intn;
}
template floatn(int N) {
  alias InlineArray!(float,N) floatn;
}
template doublen(int N) {
  alias InlineArray!(double,N) doublen;
}
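
For anyone reading this with a current D compiler: fixed-size arrays (T[N]) are value types there, so the same idea needs less machinery. Below is a minimal sketch only. The names Vec and float3 are made up for illustration and are not taken from vec.d or Phobos.

```d
// Hypothetical wrapper: a fixed-size array with value semantics
// and element-wise arithmetic.
struct Vec(T, size_t N)
{
    T[N] array;

    // Element-wise +, -, *, / between two vectors of the same size.
    Vec opBinary(string op)(Vec rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Vec result;
        foreach (i; 0 .. N)
            result.array[i] = mixin("array[i] " ~ op ~ " rhs.array[i]");
        return result;
    }

    // Read/write access to a single component.
    ref inout(T) opIndex(size_t i) inout
    {
        return array[i];
    }
}

alias float3 = Vec!(float, 3);

void main()
{
    auto a = float3([1.0f, 2.0f, 3.0f]);
    auto b = a;      // copies the three floats
    b[0] = 10.0f;    // does not touch a

    auto c = a + b;
    assert(a[0] == 1.0f);
    assert(c.array == [11.0f, 4.0f, 6.0f]);
}
```

The copy on assignment is the "value semantics" part; the operator template stands in for the "todo: arithmetic" comment above.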

May 16, 2004
hellcatv@hotmail.com wrote:

>First of all I'd like to say that if walter wishes to integrate my vec.d into
>his language he can have it under the BSD license or another license if he wants
>to talk to me about it. Other users must talk to me about changing the license
>but I'm quite flexible.
>
>I was a bit brief about how my float2 float3 and float4 work
>in http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d
>
>but it's almost exactly like the Cg spec except for a few caveats
>a) as you can see in my posts before I was complaining that the opCmp operator
>must only return an int hence I can't do the 4-way < and > and ==
>comparisons...so I use the dot product to get a partial ordering
>b) you can still do component-wise compares using the opLess and opGreater and
>opLEqual and so forth.
>c) you can assign using the swizzle operators
>float4 myvar=float4(1,2,3,4);
>float3 mytmp.yzx = myvar.zyx;
>..
>d) you cannot repeat letters in the swizzle operators unless you are on gdc
>because digital mars' link.exe has a bug that crashes if too many functions are
>defined in a single file
>float3 mytmp.xyz = myvar.zyz; <-- only works if you define Swizzle as a compiler
>flag
>otherwise the alternative is
>float3 mytmp.xyz = myvar.swizzle(2,1,2);
>it's just as powerful a syntax and only necessary if you repeat components.
>
>e) I have provided real2 real3 and real4 and double2 double3 and double4
>vectors.
>
>f) I welcome contributions to the lib... and especially benchmarking of it.
>g) I have provided all the intrinsic functions within Cg (cos, lerp, etc)--they
>also work on the intrinsic float,double and real types.
>h) this lib really pushes the digital mars compiler to its limit--adding one or
>two functions causes the linker to crash under windows.
>of course gdc is golden
>i) The lib is created using a single template class and a *lot* of
>instantiations of that class (so the user mustn't type vec!(real,4)  of course
>that syntax works as well)
>--Daniel
>  
>
Nice.  Perhaps you should check out, or take some ideas from, Burton's math.d class (in undig).  It seemed pretty complete and had some nifty ideas.
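
On the swizzle operators in the quoted post (points c and d): current D compilers have opDispatch, which lets one template stand in for every letter combination, repeated letters included. The sketch below is not how vec.d does it. Vec4, componentIndex and isSwizzle are hypothetical names, and it only covers reading a swizzle, not assigning through one.

```d
struct Vec4
{
    float[4] v;

    // Map a swizzle letter to a component index; -1 for anything else.
    private static int componentIndex(char c)
    {
        switch (c)
        {
            case 'x': return 0;
            case 'y': return 1;
            case 'z': return 2;
            case 'w': return 3;
            default:  return -1;
        }
    }

    // A valid swizzle is 2 to 4 letters drawn from x, y, z, w.
    private static bool isSwizzle(string name)
    {
        if (name.length < 2 || name.length > 4)
            return false;
        foreach (c; name)
            if (componentIndex(c) < 0)
                return false;
        return true;
    }

    // a.zyx, a.zyz, a.xxww ... all resolve to this single template.
    float[name.length] opDispatch(string name)() const
        if (isSwizzle(name))
    {
        float[name.length] result;
        foreach (i, c; name)
            result[i] = v[componentIndex(c)];
        return result;
    }
}

void main()
{
    auto a = Vec4([1.0f, 2.0f, 3.0f, 4.0f]);

    float[3] r = a.zyx;   // reversed first three components
    float[3] s = a.zyz;   // repeated letter, same template
    assert(r == [3.0f, 2.0f, 1.0f]);
    assert(s == [3.0f, 2.0f, 3.0f]);
}
```

Only the combinations a program actually uses get instantiated, which may sidestep the "too many functions in one file" problem, though that is a guess rather than something measured.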

-- 
-Anderson: http://badmama.com.au/~anderson/
May 16, 2004
In article <c85srd$268c$1@digitaldaemon.com>, Ben Hinkle says...
>
>Does float4 have value or reference semantics?
>I don't think I'd use a float4 myself since I'm not a game programmer but it
>has repeatedly come up about using (shortish) arrays with value semantics.
>Some generic "static array with value-semantics" would be cool. Right now
>to get something like it I'm using a struct with the type and length as
>template parameters. It works fine but is verbose.
>


Sorry, I'm not explaining myself properly.

float4 would have value semantics and would represent a vector hardware register (e.g. an SSE register in X86.) and use SIMD instructions to perform add/sub/mul/div/etc... It would be subject to all of the optimizations that the compiler can currently do on floats and ints.

Modern C++ compilers support the use of these registers/instructions through intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and unless D can at least match them, then as a professional games programmer I'll never be able to justify the use of D - even in spite of all the other great language features :(

But I'd actually like to see D go one step further by supporting these types in the language and leapfrogging C++ in an important way in the process.

I'm only interested in extreme performance, so making a struct that looks like a Cg type is pointless, although interesting :)

Si



May 16, 2004
Ideally, a struct with vector-like math ops should vectorize, but I don't think
any compiler we have right now matches that definition of ideal.

in some ways it would be best to concentrate on figuring out how to enable fast optimizations on structures that happen to do vector ops rather than programming specific types for architectures that have SIMD instructions that may go away in the very next gen of hardware (what if they have scalar units instead next time around)

I would be curious what the hit would be of using my Cg struct as opposed to doing the raw math on 3 local vars... I suspect it's a lot. In C++ it certainly is: Visual Studio makes it about a 50% speed hit, but with gcc it's more like a 75% speed hit.

In article <c87bp5$186f$1@digitaldaemon.com>, Simon Hobbs says...
>
>In article <c85srd$268c$1@digitaldaemon.com>, Ben Hinkle says...
>>
>>Does float4 have value or reference semantics?
>>I don't think I'd use a float4 myself since I'm not a game programmer but it
>>has repeatedly come up about using (shortish) arrays with value semantics.
>>Some generic "static array with value-semantics" would be cool. Right now
>>to get something like it I'm using a struct with the type and length as
>>template parameters. It works fine but is verbose.
>>
>
>
>Sorry, I'm not explaining myself properly.
>
>float4 would have value semantics and would represent a vector hardware register (e.g. an SSE register in X86.) and use SIMD instructions to perform add/sub/mul/div/etc... It would be subject to all of the optimizations that the compiler can currently do on floats and ints.
>
>Modern C++ compilers support the use of these registers/instructions through intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and unless D can at least match them, then as a professional games programmer I'll never be able to justify the use of D - even in spite of all the other great language features :(
>
>But I'd actually like to see D go one step further by supporting these types in the language and leapfrogging C++ in an important way in the process.
>
>I'm only interested in extreme performance, so making a struct that looks like a Cg type is pointless, although interesting :)
>
>Si
>
>
>


May 16, 2004
In article <c87g2f$1ea5$1@digitaldaemon.com>, hellcatv@hotmail.com says...
>
>ideally having a struct with vector-like math ops should vectorize
>but I don't think any compiler right now we have matches that definition of
>ideal
>
>in some ways it would be best to concentrate on figuring out how to enable fast optimizations on structures that happen to do vector ops rather than programming specific types for architectures that have SIMD instructions that may go away in the very next gen of hardware (what if they have scalar units instead next time around)
>

Well, in the future when X86 and PowerPC 'go away' it would still be trivial for the compiler to implement a vector add using scalar units. As you point out, it is doing the opposite that proves problematic.

Si


May 16, 2004
Simon Hobbs wrote:

> In article <c85srd$268c$1@digitaldaemon.com>, Ben Hinkle says...
>>
>>Does float4 have value or reference semantics?
>>I don't think I'd use a float4 myself since I'm not a game programmer but
>>it has repeatedly come up about using (shortish) arrays with value
>>semantics. Some generic "static array with value-semantics" would be cool.
>>Right now to get something like it I'm using a struct with the type and
>>length as template parameters. It works fine but is verbose.
>>
> 
> 
> Sorry, I'm not explaining myself properly.
> 
> float4 would have value semantics and would represent a vector hardware register (e.g. an SSE register in X86.) and use SIMD instructions to perform add/sub/mul/div/etc... It would be subject to all of the optimizations that the compiler can currently do on floats and ints.

OK. My first thought was to use the inline assembler, but now that you say it uses SSE registers I guess even with asm blocks you'd have to make sure the right registers are filled when you call add/sub/etc. That would mess up the data-flow optimizations. Still, it is an option. It is kind of like bringing back the "register" storage attribute from C (shudder).

> Modern C++ compilers support the use of these registers/instructions through intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and unless D can at least match them, then as a professional games programmer I'll never be able to justify the use of D - even in spite of all the other great language features :(

If these GCC extensions work on the x86 then gdc could pick them up. DMD would take longer though.

> But I'd actually like to see D go one step further by supporting these types in the language and leapfrogging C++ in an important way in the process.
> 
> I'm only interested in extreme performance, so making a struct that looks like a Cg type is pointless, although interesting :)

D is young so the performance will certainly improve somewhat, maybe not to
the extreme you are looking for. When I read about D it struck me as a
slightly lower level version of Java/Csharp. I never expected it to have
extreme performance of something like Fortran or Cg or even all the
customizability of C++. So for me that's all bonus :-)
Still some vectorized support could benefit both the game and scientific
computing worlds.

I was just googling to see if anyone has tried putting BLAS on the GPU and
sure enough people are looking into it. See for example
  http://wwwcg.in.tum.de/Research/data/Publications/sig03.pdf
That would mean numerical algorithms run on the GPU instead of the CPU for
the vectorized ops. There are probably tons of problems with getting the
data there and back but it's a neat possibility. Who says a graphics card
is just for graphics? ;-)


May 16, 2004
Actually, my research project involves doing things like BLAS on the GPU: http://graphics.stanford.edu/projects/brookgpu/

in fact we benchmarked a lot of the blas stuff...
(the code for the benchmarks is available on the above website and in CVS)
on the matrix-vector operations the performance was quite impressive (SAXPY and
Dot)
however matrix-matrix multiply sucks on the GPU... we get the full bandwidth out
of the cache, but--full bandwidth out of the cache is half or a quarter the full
bandwidth out of the CPU cache... so there's no chance you win on matrix-matrix.
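
For readers who don't have the BLAS names memorised, here is a plain CPU sketch in D of the two kernels mentioned above. It is not Brook code and not taken from the benchmarks; the function names simply follow BLAS usage. Both kernels read each input element once and never revisit it, which is the low-reuse access pattern under discussion.

```d
// SAXPY: y = alpha * x + y, element by element.
void saxpy(float alpha, const(float)[] x, float[] y)
{
    assert(x.length == y.length);
    y[] += alpha * x[];   // D array operation
}

// Dot product: sum of x[i] * y[i].
float dot(const(float)[] x, const(float)[] y)
{
    assert(x.length == y.length);
    float sum = 0.0f;
    foreach (i; 0 .. x.length)
        sum += x[i] * y[i];
    return sum;
}

void main()
{
    float[] x = [1.0f, 2.0f, 3.0f];
    float[] y = [10.0f, 20.0f, 30.0f];

    saxpy(2.0f, x, y);
    assert(y == [12.0f, 24.0f, 36.0f]);
    assert(dot(x, y) == 168.0f);
}
```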

anyhow feel free to download our brook platform and try writing some GPU
programs yourself... (I recommend getting the CVS version right now--the
released version is falling behind in features)
and feel free to chat with me about what kinds of apps will work well on the
GPU... the answer is apps that reuse their data a finite number of times...
things that get huge cache performance on the CPU are not likely candidates.
--Daniel


In article <c87rg0$1ugl$1@digitaldaemon.com>, Ben Hinkle says...
>
>Simon Hobbs wrote:
>
>> In article <c85srd$268c$1@digitaldaemon.com>, Ben Hinkle says...
>>>
>>>Does float4 have value or reference semantics?
>>>I don't think I'd use a float4 myself since I'm not a game programmer but
>>>it has repeatedly come up about using (shortish) arrays with value
>>>semantics. Some generic "static array with value-semantics" would be cool.
>>>Right now to get something like it I'm using a struct with the type and
>>>length as template parameters. It works fine but is verbose.
>>>
>> 
>> 
>> Sorry, I'm not explaining myself properly.
>> 
>> float4 would have value semantics and would represent a vector hardware register (e.g. an SSE register in X86.) and use SIMD instructions to perform add/sub/mul/div/etc... It would be subject to all of the optimizations that the compiler can currently do on floats and ints.
>
>OK. My first thought was to use the inline assembler but now that you say it uses SSE registers I guess even with asm blocks you'd have to make sure the right registers are filled when you call add/sub/etc. That would mess up the data-flow optimizations. Still it is an option. It is kindof like bringing back the "register" storage attribute from C (shudder).
>
>> Modern C++ compilers support the use of these registers/instructions through intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and unless D can at least match them, then as a professional games programmer I'll never be able to justify the use of D - even in spite of all the other great language features :(
>
>If these GCC extensions work on the x86 then gdc could pick them up. DMD would take longer though.
>
>> But I'd actually like to see D go one step further by supporting these types in the language and leapfrogging C++ in an important way in the process.
>> 
>> I'm only interested in extreme performance, so making a struct that looks like a Cg type is pointless, although interesting :)
>
>D is young so the performance will certainly improve somewhat, maybe not to
>the extreme you are looking for. When I read about D it struck me as a
>slightly lower level version of Java/Csharp. I never expected it to have
>extreme performance of something like Fortran or Cg or even all the
>customizability of C++. So for me that's all bonus :-)
>Still some vectorized support could benefit both the game and scientific
>computing worlds.
>
>I was just googling to see if anyone has tried putting BLAS on the GPU and sure enough people are looking into it. See for example
>  http://wwwcg.in.tum.de/Research/data/Publications/sig03.pdf
>That would mean numerical algorithms run on the GPU instead of the CPU for the vectorized ops. There are probably tons of problems with getting the data there and back but it's a neat possibility. Who says a graphics card is just for graphics? ;-)
>
>


May 16, 2004
On Sat, 15 May 2004 11:36:40 -0700, Walter wrote:

> The language already supports vector operations on arrays of floats, doubles, or anything else. Currently, however, it is not implemented in the compiler.
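
For reference, current D compilers do implement these array operations. A minimal sketch of what the quoted sentence describes follows; the helper names add4 and scaleSub4 are made up for the example.

```d
// Element-wise add of two 4-vectors using a D array operation.
float[4] add4(const float[4] a, const float[4] b)
{
    float[4] r;
    r[] = a[] + b[];
    return r;
}

// A scalar can be mixed in: r[i] = a[i] * s - b[i].
float[4] scaleSub4(const float[4] a, float s, const float[4] b)
{
    float[4] r;
    r[] = a[] * s - b[];
    return r;
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [10.0f, 20.0f, 30.0f, 40.0f];

    assert(add4(a, b) == [11.0f, 22.0f, 33.0f, 44.0f]);
    assert(scaleSub4(a, 2.0f, b) == [-8.0f, -16.0f, -24.0f, -32.0f]);
}
```

Whether the generated code actually uses SIMD instructions depends on the compiler and target; the syntax alone does not promise it.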

As far as I can see from http://developer.nvidia.com/attach/6043, what Cg has and D is missing is matrix multiplication, vector swizzling and write masking.

I suggest the following syntax for D using the example from the Cg link.

float[4] vec1={4.0,-2.0,5.0,3.0};

float[2] vec2 =vec1[1,0];  // vec2 ={-2.0,4.0}
float scalar =vec1[3];     // scalar = 3.0
float[3] vec3=scalar;      // vec3 = {3.0,3.0,3.0}

write masking

vec1[0,3]=vec3;     // vec1 = {3.0,-2.0,5.0,3.0}
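
The comma-index syntax above is a proposal, not something D accepts. As a rough sketch of the same two operations in D as it exists today, they can be written as library templates with compile-time indices. The names swizzle and writeMask are hypothetical, and the write mask here takes exactly one source value per masked component.

```d
// Gather: result[k] = src[idx[k]], indices fixed at compile time.
auto swizzle(idx...)(const float[4] src)
{
    float[idx.length] result;
    foreach (k, i; idx)      // unrolled over the index list
        result[k] = src[i];
    return result;
}

// Scatter: dst[idx[k]] = src[k]; other components are left alone.
void writeMask(idx...)(ref float[4] dst, const float[idx.length] src)
{
    foreach (k, i; idx)
        dst[i] = src[k];
}

void main()
{
    float[4] vec1 = [4.0f, -2.0f, 5.0f, 3.0f];

    float[2] vec2 = swizzle!(1, 0)(vec1);
    assert(vec2 == [-2.0f, 4.0f]);

    float scalar = vec1[3];
    float[3] vec3 = scalar;   // a scalar initializer fills every element
    assert(vec3 == [3.0f, 3.0f, 3.0f]);

    writeMask!(0, 3)(vec1, [scalar, scalar]);
    assert(vec1 == [3.0f, -2.0f, 5.0f, 3.0f]);
}
```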

Something that has been bothering me about D is
that a slice 0..4 means 0,1,2,3 and not 0,1,2,3,4.
If you choose to use the comma notation for masking,
I think it would be better to have 0..4 as shorthand
for 0,1,2,3,4 instead of 0,1,2,3.


For matrix multiplication I think it is
better to use the more general Einstein summation,
which would allow a very short notation for vector
calculations.

In this notation an affine transformation (a*v+b) on a vector would be written like this:

double[4] vec1, vec2, b;
double[4][4] a;

  vec2[i=0..3]=a[i][j=0..3]*vec1[j] + b[i];


But an array of 100 vectors could be transformed with:

double[4][100] arr1,arr2;

  arr2[i=0..3][k=0..99]=a[i][j=0..3]*arr1[j][k] + b[i];
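
Spelled out as ordinary loops, the two summation expressions above would look roughly like the sketch below. The function names affine and affineAll are made up. Two things differ from the proposal: D ranges exclude the upper bound (0 .. 4 covers 0 to 3), and D reads double[4][100] as 100 rows of 4 doubles, so the batch index comes first when indexing.

```d
// vec2[i] = sum over j of a[i][j] * vec1[j], plus b[i]
double[4] affine(const double[4][4] a, const double[4] v, const double[4] b)
{
    double[4] result;
    foreach (i; 0 .. 4)
    {
        double sum = b[i];
        foreach (j; 0 .. 4)      // the repeated index j is summed over
            sum += a[i][j] * v[j];
        result[i] = sum;
    }
    return result;
}

// Same transformation applied to every vector in a batch.
void affineAll(const double[4][4] a, const double[4] b,
               const(double[4])[] src, double[4][] dst)
{
    assert(src.length == dst.length);
    foreach (k; 0 .. src.length)
        dst[k] = affine(a, src[k], b);
}

void main()
{
    // Scale by 2, then shift by 1.
    double[4][4] a = [[2.0, 0.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0, 0.0],
                      [0.0, 0.0, 2.0, 0.0],
                      [0.0, 0.0, 0.0, 2.0]];
    double[4] b = [1.0, 1.0, 1.0, 1.0];

    double[4][100] arr1, arr2;
    foreach (ref row; arr1)
        row = [1.0, 2.0, 3.0, 4.0];

    affineAll(a, b, arr1[], arr2[]);
    assert(arr2[0] == [3.0, 5.0, 7.0, 9.0]);
    assert(arr2[99] == [3.0, 5.0, 7.0, 9.0]);
}
```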

The advantage of having this implemented in the core
language is that it can exploit the processor's vector unit (MMX),
a graphics processor (GPU), or a math unit like
http://www.clearspeed.com/ without rewriting the program in assembler for
the specific hardware.

Maybe it would be a good idea to have compiler modules for different types of hardware.

Knud
May 16, 2004
Knud Sørensen wrote:

>the advantage of having this implemented in the core language is to exploit the processor's vector unit (MMX),
>a graphics processor (GPU) or a math unit like http://www.clearspeed.com/ without rewriting the program in assembler for
>the specific hardware.
>  
>
I would argue that having matrix multiplication and such will bloat the language.  It should be a library feature.  I see no problem with writing it in assembler (as long as I don't have to write it <g>).  It would be better to have this as part of the standard library.  The language won't be able to provide much additional speed by hard-wiring things like MMX into the language.  Remember, MMX and the like are designed to work well as language extensions in the first place.

Why not include, in the language, every useful hardware data structure under the sun?  Data structures should only be put into the language when they make sense and can be done much more cleanly than with libraries.

>Maybe it would be a good idea to have compiler modules for different types
>of hardware.
>
>  
>
The language shouldn't be tied to the hardware.  It's the job of library vendors to make porting hell, not the language.

>Knud
>  
>


-- 
-Anderson: http://badmama.com.au/~anderson/
May 16, 2004
J Anderson wrote:
> I would argue that having matrix multiplication and such will bloat the language.  It should be a library feature.  I see no problem with writing it in assembler (as long as I don't have to write it <g>).  It would be better to have this as part of the standard library.  The language won't be able to provide much additional speed by hard-wiring things like MMX into the language.  Remember, MMX and the like are designed to work well as language extensions in the first place.

If Phobos included some types and functions for these sorts of operations, compiler vendors would hypothetically be able to implement those operations as intrinsics.

 -- andy