January 27, 2007
Re: seeding the pot for 2.0 features
On Thu, 25 Jan 2007 20:08:29 +0200, BCS <BCS@pathlink.com> wrote:
> Kristian Kilpi wrote:
>> On Wed, 24 Jan 2007 02:09:09 +0200, BCS <ao@pathlink.com> wrote:
>> [snip]
>>
>>> my choice: dynamic init structs and explicit context for delegates
>>>
>>> int i,j,k;
>>> // pickle i,j,k then use them
>>> auto (new struct {int i_ = i; int j_ = j; int k_ = k; }).delegate(int   
>>> m){return ((i_ * m)+j_)*m +k_;}
>>>
>>  While we are at it, why not just let the compiler generate the context
>> for delegates? ;)
>> E.g.
>>    int i, j;
>>   return int delegate(int m) {return i * m + j;}
>> ->
>>   return int (new struct {int i_ = i; int j_ = j}).delegate(int m)
>> {return  i_ * m + j_;}
>
>
> I think that has been proposed but has some problems. For instance
>
>
> auto dg = int delegate(int m) {return i * m + j;}
>
> fnUsingDg(dg);	// doesn't need context
>
> i++;
> j++;	// did that affect the context?
>
> if(boolFn())
> 	return dg;	// now we need context
> else
> 			// or do we??
> 	return int delegate(int m) {return i + j;}
[snip]

Hmm, yep, that's problematic.
First, I tried to come up with some simple rules. For example, the context
should be automatically generated when:

  1) A delegate is returned.

  2) A delegate is assigned to something that is not a local variable.
     A class object is considered to be a local when 'scope' is used with  
it.

But nothing is ever that simple, is it? ;) The rules would likely
cause *way* more trouble than good.

However, it would be nice if one could tell the compiler to automatically  
create the context for a delegate (probably your explicit syntax should be  
possible too). For example, *something* like this:

  dg.createContext();  //compiler generates the context


Hmm, maybe the compiler should automatically generate context for  
anonymous delegates, after all, when they are returned from a function:

  int i, j;
  return delegate int(int m) {return i + j;};  //creates context

But not when it's returned via a variable:

  int i, j;
  auto dg = delegate int(int m) {return i + j;};
  return dg;  //no context is created

You would have to call 'createContext()' yourself.

Preventing context from being generated should also be possible, of course:

  return (delegate int(int m) {return i + j;}).noContext();  //don't create context


I think implicit context for anonymous delegates returned from functions  
would reduce the number of errors people will make with them. But it's an  
exception to the rules, which is never good.
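
For comparison, the explicit-context idea can already be written by hand with a heap-allocated struct. This is only a minimal sketch with hypothetical names ('LinearContext', 'makeLinear'); it is not the proposed syntax, just a manual equivalent of what the compiler would have to generate:

```d
// Hypothetical context struct: holds copies of the locals the delegate needs.
struct LinearContext
{
    int i, j;

    int eval(int m)
    {
        return i * m + j;
    }
}

// Returns a delegate whose context pointer refers to a heap-allocated
// LinearContext, so it stays valid after this function returns.
int delegate(int) makeLinear(int i, int j)
{
    auto ctx = new LinearContext;
    ctx.i = i;
    ctx.j = j;
    return &ctx.eval;
}

void main()
{
    int i = 2, j = 3;
    auto dg = makeLinear(i, j);
    i++;
    j++;                    // the context holds copies, so dg is unaffected
    assert(dg(5) == 13);    // 2 * 5 + 3
}
```

Because the context holds copies, later changes to the original locals do not reach the delegate. That answers the "did that affect the context?" question with snapshot semantics, which may or may not be what one wants.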
January 27, 2007
Re: seeding the pot for 2.0 features
I'll bite.  Here are the two features I consider most important:

1. Dynamic Closures:
See: 
http://lists.puremagic.com/pipermail/digitalmars-d/2006-August/007520.html

2. Low dimensional vectors as primitive types

Specifically I would like to see the types int, real, float etc. 
extended into 2-4 dimensional vectors, e.g. int2, real4, float3.

This would be exceptionally useful in many applications which require 
coordinate geometry.  Here is a very brief list:

Scientific Programs
Physics Simulation
Computer Graphics
Video Games
User Interfaces
Computational Geometry
Robotics
Fluid Simulation

etc.

I would prefer not to recount the number of times I have written my own 
vector library, and how tedious they are to create.  For most every 
language I learn it is the first thing I need to write, since so few are 
willing to provide a default implementation.  In my opinion, this is 
unacceptable for a problem which occurs so frequently.

One option is to extend the standard library with a vector-types class, 
but this is not nearly as nice as a compiler-level implementation.

1. The 90 degrees rotation trick
This is based on the following article:
http://www.flipcode.com/articles/article_fastervectormath.shtml

The basic idea is we use templates to expand vector expressions to 
achieve better register use and memory locality.  Here is a simple example:

a = b + c * d;

Normally, we would translate this as something like:

a = b.opAdd(c.opMul(d));

Which gives the following assembler:

//Do opMul
mov 	reg, c.x
mul	reg, d.x
mov 	tmp.x, reg
mov 	reg, c.y
mul	reg, d.y
mov 	tmp.y, reg
mov 	reg, c.z
mul	reg, d.z
mov 	tmp.z, reg

//Do opAdd
mov 	reg, tmp.x
add	reg, b.x
mov 	a.x, reg
mov 	reg, tmp.y
add	reg, b.y
mov 	a.y, reg
mov 	reg, tmp.z
add	reg, b.z
mov 	a.z, reg


This is not particularly efficient, since it requires the creation of a 
temporary vector to store the result from c * d.  A better strategy 
involves 'rotating the computation 90 degrees' and performing the 
expression on a per-component level:

//x - component
mov	reg, c.x
mul	reg, d.x
add	reg, b.x
mov	a.x, reg

//y - component
mov	reg, c.y
mul	reg, d.y
add	reg, b.y
mov	a.y, reg

//z - component
mov	reg, c.z
mul	reg, d.z
add	reg, b.z
mov	a.z, reg

The performance improvement becomes more substantial the longer the 
expression.  Since overloaded operators do not instantiate templates, 
there is no obvious way to obtain this result in the current language spec.
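
To make the difference concrete at the source level, here is a rough sketch. It uses the later D2 'opBinary' overload form in place of opAdd/opMul, and 'Vec3' and 'mulAdd' are hypothetical names. Whether an optimizer already fuses the naive form is compiler dependent, so treat this as an illustration of the two shapes rather than a benchmark:

```d
struct Vec3
{
    float x, y, z;

    // Naive overloads: every + or * returns a whole temporary Vec3.
    Vec3 opBinary(string op)(Vec3 r) const
        if (op == "+" || op == "*")
    {
        return Vec3(mixin("x " ~ op ~ " r.x"),
                    mixin("y " ~ op ~ " r.y"),
                    mixin("z " ~ op ~ " r.z"));
    }
}

// "Rotated" form: the whole expression b + c * d is evaluated one
// component at a time, so no intermediate vector is materialized.
Vec3 mulAdd(Vec3 b, Vec3 c, Vec3 d)
{
    return Vec3(c.x * d.x + b.x,
                c.y * d.y + b.y,
                c.z * d.z + b.z);
}

void main()
{
    auto b = Vec3(1, 2, 3);
    auto c = Vec3(2, 2, 2);
    auto d = Vec3(3, 4, 5);

    Vec3 naive = b + c * d;        // builds a temporary for c * d
    Vec3 fused = mulAdd(b, c, d);  // per-component evaluation
    assert(naive == fused);
}
```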

2. Architecture specific optimizations (SIMD)

For low dimensional arithmetic, many architectures provide specially 
optimized instructions for small vectors.  The problem is most 
languages do not exploit them.  Creating efficient SIMD code is 
impossible for a library, since each opAdd/opMul must be written using 
inline assembler and therefore incurs the overhead of a function call 
regardless.  This is worsened by the fact that moving to/from a vector 
register is typically very expensive.

A compiler level implementation can easily avoid these issues by 
assigning vector expressions to a register when passing them.  Moreover 
it is more portable than compiler intrinsics like MSVC's SSE extensions. 
The implementation can easily emit fallback code if the architecture 
does not support SIMD instructions.

3. Swizzles

A swizzle is a reordering of the elements in a vector.  Shader languages 
like Cg or GLSL typically support them, given their utility in certain 
types of computations.  Here are some examples:

v.xxxx 	// Returns a vector with v.x broadcast to all components
v.xyz	// Returns only the xyz components of v
v.zyx	// Returns a vector consisting of the reverse of v's xyz components

Enumerating all possible swizzles within a template is impossible, and 
therefore requires one function per swizzle.  The result is massive code 
bloat, and many lines of automatically generated gibberish.  To get an 
idea of how many functions this requires, the total number of swizzles 
for a 4-component vector (results of length 1 to 4) is 4^4 + 4^3 + 4^2 + 4 
or 340.  Multiply that by the number of primitive types and the result 
becomes quite large.
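
The count is easy to check mechanically. A small sketch, with 'swizzleCount' as a hypothetical helper, sums the number of swizzles of each length from 1 up to 4 over the four component names x, y, z, w:

```d
// Number of swizzles of length 1..maxLength that can be formed from
// `components` distinct component names (repetition allowed).
size_t swizzleCount(size_t components, size_t maxLength)
{
    size_t total = 0;
    size_t perLength = 1;
    foreach (len; 0 .. maxLength)
    {
        perLength *= components;   // components ^ (len + 1)
        total += perLength;
    }
    return total;
}

// Evaluated at compile time.
static assert(swizzleCount(4, 4) == 340);

void main()
{
    assert(swizzleCount(4, 4) == 4 + 16 + 64 + 256);
}
```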



-Mik
January 28, 2007
Re: seeding the pot for 2.0 features
A quite cool feature would be something similar to Python's
__getattr__, __setattr__ special functions.

So if one writes

class foo
{
       ... // foo does not declare a function or member bar

       // special operators do not return anything
       opGetAttr(out T val, char[] identifier);
       opSetAttr(in T val, char[] identifier);
};

void test()
{
       foo test = new foo;
       test.bar = 5;
       assert( test.bar == 5 );
}

The idea is that the compiler implicitly translates the
assignment/access to the undefined identifier bar to calls of
opSetAttr and opGetAttr.

Functions should also be accessible this way, so a function
call test.foobar(...) should be translated too. Of course any valid
function prototype must be made available through proper
op(G|S)etAttr overloads.

Eventually we could also think about op(G|S)etAttr being variadic
or template functions. The result would be some kind of late
linking capability.

The main application I'm thinking of is scripting language
bindings. One would have to write only one universal scripting
object class, which would be instantiated for every object to be
accessed from D and parameterized to call the right interface
functions.

A call to, say, spam.eggs on a Python adapter class instance would
be translated into opGetAttr(f, "eggs"), which would give back a
function object; calling it would result in the Python API calls
that invoke the function in the Python interpreter.
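
D2 later gained a hook close to this idea, 'opDispatch', which receives the undefined member name as a compile-time string. Here is a minimal sketch, assuming a hypothetical 'DynamicObject' that stores int attributes in an associative array; a real binding would forward the name to the scripting API instead:

```d
// Hypothetical dynamic object: unknown members are looked up by name
// in an associative array instead of being rejected by the compiler.
struct DynamicObject
{
    int[string] attrs;

    // Instantiated for any member name the struct does not declare.
    // Returning by ref lets the same hook serve reads and assignments.
    ref int opDispatch(string name)()
    {
        if (auto p = name in attrs)
            return *p;
        attrs[name] = 0;        // create the attribute on first use
        return attrs[name];
    }
}

void main()
{
    DynamicObject test;
    test.bar = 5;               // becomes test.opDispatch!"bar"() = 5
    assert(test.bar == 5);
}
```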

Wolfgang Draxinger
-- 
E-Mail address works, Jabber: hexarith@jabber.org, ICQ: 134682867
January 28, 2007
Re: seeding the pot for 2.0 features
Wolfgang Draxinger wrote:
> The main application I'm thinking of are scripting language
> bindings. One would have to write only one universal scripting
> object class, that would be instanced for every object to be
> accessed from D, being parameterized to call the right interface
> functions.
> 
> A call to say spam.eggs of a Python adapter class instance would
> be translated into opGetAttr(f, "eggs"), which would give back a
> function object, which call will result in the function calls of
> the Python API to call the function in the python interpreter.
> 

Have you seen Pyd, I wonder?

http://pyd.dsource.org/

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
January 28, 2007
Re: seeding the pot for 2.0 features
Mikola Lysenko wrote:
> I'll bite.  Here are the two features I consider most important:
> 
> 1. Dynamic Closures:
> See: 
> http://lists.puremagic.com/pipermail/digitalmars-d/2006-August/007520.html
> 
> 2. Low dimensional vectors as primitive types
> 
> Specifically I would like to see the types int, real, float etc. 
> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
> 
> This would be exceptionally useful in many applications which require 
> coordinate geometry.  Here is a very brief list:
> 
> Scientific Programs
> Physics Simulation
> Computer Graphics
> Video Games
> User Interfaces
> Computational Geometry
> Robotics
> Fluid Simulation
> 
> etc.
> 
> I would prefer not to recount the number of times I have written my own 
> vector library, and how tedious they are to create.  For most every 
> language I learn it is the first thing I need to write, since so few are 
> willing to provide a default implementation.  In my opinion, this is 
> unacceptable for a problem which occurs so frequently.
> 
> One option is to extend the standard library with a vector-types class, 
> but this is not nearly as nice a compiler level implementation.
> 
> 1. The 90 degrees rotation trick
> This is based on the following article:
> http://www.flipcode.com/articles/article_fastervectormath.shtml
> 
> The basic idea is we use templates to expand vector expressions to 
> achieve better register use and memory locality.  Here is a simple example:
> 
> a = b + c * d;
> 
> Normally, we would translate this as something like:
> 
> a = b.opAdd(c.opMul(d));
> 
> Which gives the following assembler:
> 
> //Do opMul
> mov     reg, c.x
> mul    reg, d.x
> mov     tmp.x, reg
> mov     reg, c.y
> mul    reg, d.y
> mov     tmp.y, reg
> mov     reg, c.z
> mul    reg, d.z
> mov     tmp.z, reg
> 
> //Do opAdd
> mov     reg, tmp.x
> add    reg, b.x
> mov     a.x, reg
> mov     reg, tmp.y
> add    reg, b.y
> mov     a.y, reg
> mov     reg, tmp.z
> add    reg, b.z
> mov     a.z, reg
> 
> 
> This is not particularly efficient, since it requires the creation of a 
> temporary vector to store the result from c * d.  A better strategy 
> involves 'rotating the computation 90 degrees' and performing the 
> expression on a per-component level:
> 
> //x - component
> mov    reg, c.x
> mul    reg, d.x
> add    reg, b.x
> mov    a.x, reg
> 
> //y - component
> mov    reg, c.y
> mul    reg, d.y
> add    reg, b.y
> mov    a.y, reg
> 
> //z - component
> mov    reg, c.z
> mul    reg, d.z
> add    reg, b.z
> mov    a.z, reg
> 
> The performance improvement becomes more substantial the longer the 
> expression.  Since overloaded operators do not instantiate templates, 
> there is no obvious way to obtain this result in the current language spec.
> 
> 2. Architecture specific optimizations (SIMD)
> 
> For low dimensional arithmetic, many architectures provide specially 
> optimized instruction for low dimensional vectors.  The problem is most 
> languages do not exploit them.  Creating efficient SIMD code is 
> impossible for a library, since each opAdd/opMul must be written using 
> inline assembler and therefore incurs the overhead of a function call 
> regardless.  This is worsened by the fact that moving to/from a vector 
> register is typically very expensive.
> 
> A compiler level implementation can easily avoid these issues by 
> assigning vector expressions to a register when passing them.  Moreover 
> it is more portable than compiler intrinsics like MSVC's SSE extensions. 
>  The implementation can easily emit fallback code if the architecture 
> does not support SIMD instructions.
> 
> 3. Swizzles
> 
> A swizzle is a reordering of the elements in a vector.  Shader languages 
> like Cg or GLSL typically support them, given their utility in certain 
> types of computations.  Here are some examples:
> 
> v.xxxx     // Returns a vector with v.x broadcast to all components
> v.xyz    // Returns only the xyz components of v
> v.zyx    // Returns a vector consisting of the reverse of v's xyz 
> components
> 
> Enumerating all possible swizzles within a template is impossible, and 
> therefore requires one function per swizzle.  The result is massive code 
> bloat, and many lines of automatically generated gibberish.  To get an 
> idea at how many functions this requires, the total number of swizzles 
> for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  Multiply that 
> by the number of primitive types and the result becomes quite large.
> 
> 
> 
> -Mik

Low dimensional primitive vectors would rock.  I have to wonder, though, 
if array operations would do the trick instead, especially with 
static arrays.

uint4 foo;
uint4 bar = someDefaultValue;
foo.x = x;
foo.y = y;
foo.z = z;
uint4 result = foo + bar;

becomes

uint[4] foo;
uint[4] bar = someDefaultValue;
foo[0] = x;
foo[1] = y;
foo[2] = z;
uint[4] result = foo + bar;

Maybe not as readable, depending on the context, but that should 
suffice, right?

The only trick then is to make sure the compiler recognizes these cases 
and optimizes for them.  For example, large dynamic arrays are worth 
some fixed overhead before jumping into SIMD instructions or whatever, 
whereas that same fixed overhead may change the way these small static 
things get optimized.  Point is, maybe this is just a quality of 
implementation issue once array operations are added?
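
For reference, D2 array operations use explicit slice syntax on both sides of the assignment. This is a minimal sketch with a hypothetical 'addVec' helper; whether the compiler turns it into SIMD instructions is still a quality of implementation matter:

```d
// Element-wise addition of two small static arrays using a D2
// array operation; no explicit loop is written.
uint[4] addVec(uint[4] a, uint[4] b)
{
    uint[4] result;
    result[] = a[] + b[];
    return result;
}

void main()
{
    uint[4] foo = [1, 2, 3, 0];
    uint[4] bar = [10, 10, 10, 10];
    uint[4] expected = [11, 12, 13, 10];
    assert(addVec(foo, bar) == expected);
}
```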

Swizzling not templatable, really? ...

import std.stdio;
import std.traits;

template doAssignment(dchar[] ordering, uint index = 0)
{
	static if ( index < ordering.length )
	{
		// dummy will get optimized away by dmd
		auto dummy = temp[index] = vector[cast(uint)ordering[index]];
		mixin doAssignment!(ordering, index + 1);
	}
}

void swizzle(T, dchar[] ordering)(T vector)
{
	T temp;
	static if ( !isStaticArray!(T) )
		temp.length = vector.length;
	
	mixin doAssignment!(ordering);
	vector[0..ordering.length] = temp[0..ordering.length];
	
	static if ( !isStaticArray!(T) )
		delete temp;
}

void main()
{
	int[4] point = [7,73,42,5];
	writefln( point ); // prints [7,73,42,5]
	swizzle!(int[],[3,1,2,0])( point );
	writefln( point ); // prints [5,73,42,7]
	
	real[] array = [81345.536,5.67,43.351,0.0,932.4,0.03,0.9852,57];
	writefln( array );
	swizzle!(real[],[0,1,2,4,3,5])(array);
	writefln( array );
}

Now this is suboptimal, but I have made an effort to get all of the 
ordering values at compile time.  Now we can hand those to a template 
and have it return information about optimal order of assignment and 
temporary usage.  Now when you need to use said information, it should 
be doable either by clever writing of the assignment operations or by 
pasting together stuff using mixins.  As for the fact that swizzle is a 
function, well it will probably get inlined, and if it doesn't, it 
should (quality of implementation issue).
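
For what it's worth, a member-style syntax can also be had in library code with D2's 'opDispatch', which receives the swizzle name as a compile-time string. The sketch below is an illustration under that assumption. 'Vec', 'componentIndex' and 'isSwizzle' are hypothetical names, and no attempt is made at optimal code generation:

```d
// Maps a component letter to its index; size_t.max marks an invalid letter.
size_t componentIndex(char c)
{
    switch (c)
    {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        case 'w': return 3;
        default:  return size_t.max;
    }
}

// True if s is 1 to 4 letters long and every letter names a component
// that exists in an n-component vector.
bool isSwizzle(string s, size_t n)
{
    if (s.length == 0 || s.length > 4)
        return false;
    foreach (c; s)
        if (componentIndex(c) >= n)
            return false;
    return true;
}

struct Vec(T, size_t N)
{
    T[N] data;

    // v.zyx, v.xxxx, ... are all handled by this one template; the
    // constraint rejects names that are not valid swizzles.
    auto opDispatch(string s)() const
        if (isSwizzle(s, N))
    {
        Vec!(T, s.length) result;
        foreach (i, c; s)
            result.data[i] = data[componentIndex(c)];
        return result;
    }
}

void main()
{
    auto v = Vec!(int, 4)([1, 2, 3, 4]);
    assert(v.wzyx.data == [4, 3, 2, 1]);
    assert(v.xxxx.data == [1, 1, 1, 1]);
}
```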
January 28, 2007
Re: seeding the pot for 2.0 features
Chad J wrote:
> Low dimensional primitive vectors would rock.  I have to wonder though 
> if array operations would do the trick instead though, especially with 
> static arrays.
> 
> uint4 foo;
> uint4 bar = someDefaultValue;
> foo.x = x;
> foo.y = y;
> foo.z = z;
> uint4 result = foo + bar;
> 
> becomes
> 
> uint[4] foo;
> uint[4] bar = someDefaultValue;
> foo[0] = x;
> foo[1] = y;
> foo[2] = z;
> uint[4] result = foo + bar;
> 
> Maybe not as readable, depending on the context, but that should 
> suffice, right?
> 
> The only trick then is to make sure the compiler recognizes these cases 
> and optimizes for them.  For example large dynamic arrays which are 
> worthy of some fixed overhead before jumping into simd instructions or 
> whatever, whereas the possibility of fixed overhead may change the way 
> these small static things get optimized.  Point is, maybe this is just a 
> quality of implementation issue once array operations are added?
> 
> Swizzling not templatable, really? ...
> 
> import std.stdio;
> import std.traits;
> 
> template doAssignment(dchar[] ordering, uint index = 0)
> {
>     static if ( index < ordering.length )
>     {
>         // dummy will get optimized away by dmd
>         auto dummy = temp[index] = vector[cast(uint)ordering[index]];
>         mixin doAssignment!(ordering, index + 1);
>     }
> }
> 
> void swizzle(T, dchar[] ordering)(T vector)
> {
>     T temp;
>     static if ( !isStaticArray!(T) )
>         temp.length = vector.length;
>     
>     mixin doAssignment!(ordering);
>     vector[0..ordering.length] = temp[0..ordering.length];
>     
>     static if ( !isStaticArray!(T) )
>         delete temp;
> }
> 
> void main()
> {
>     int[4] point = [7,73,42,5];
>     writefln( point ); // prints [7,73,42,5]
>     swizzle!(int[],[3,1,2,0])( point );
>     writefln( point ); // prints [5,73,42,7]
>     
>     real[] array = [81345.536,5.67,43.351,0.0,932.4,0.03,0.9852,57];
>     writefln( array );
>     swizzle!(real[],[0,1,2,4,3,5])(array);
>     writefln( array );
> }
> 
> Now this is suboptimal, but I have made an effort to get all of the 
> ordering values at compile time.  Now we can hand those to a template 
> and have it return information about optimal order of assignment and 
> temporary usage.  Now when you need to use said information, it should 
> be doable either by clever writing of the assignment operations or by 
> pasting together stuff using mixins.  As for the fact that swizzle is a 
> function, well it will probably get inlined, and if it doesn't, it 
> should (quality of implementation issue).


Right.  However, compare the syntax in the following two cases:

real4 a, b;
a = b.wzyx;

versus:

real[4] a, b;
a = swizzle!(real[], [3, 2, 1, 0])(b);

There is also a chance that the compiler may miss inlining the swizzle, 
resulting in code bloat (as you pointed out).  Add in the fact that a 
compiler can exploit SSE shuffle instructions such as shufps or pshufd, 
and it becomes pretty clear that for low-d vectors a compiler level 
implementation is superior.

On the topic of using array operations as a replacement for 
low-dimension vectors, I still have some mixed feelings about it.  I 
think higher dimensional array ops are of dubious value, and vaguely 
reminiscent of APL.  For most applications, the higher dimension array 
operations are overkill, and they will inevitably commit horrible acts 
of obfuscation:

bool stricmp(char[] str1, char[] str2)
{
	return ((str1 ^ str2) & ~('a' - 'A')) == 0;
}

Faced with such monstrosities, it might be best to keep vector code in 
the lower dimensions where it is most strongly connected to its 
geometric meaning.


-Mik
January 28, 2007
Re: seeding the pot for 2.0 features [small vectors]
Mikola Lysenko wrote:
> I'll bite.  Here are the two features I consider most important:
> 
> 2. Low dimensional vectors as primitive types
> 
> Specifically I would like to see the types int, real, float etc. 
> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
> 
> This would be exceptionally useful in many applications which require 
> coordinate geometry.  Here is a very brief list:
> 
> Scientific Programs
> Physics Simulation
> Computer Graphics
> Video Games
> User Interfaces
> Computational Geometry
> Robotics
> Fluid Simulation

It's still a tiny fraction of the number of applications that use, say, 
strings.  So the ubiquity argument for inclusion is pretty weak, I think.

> 
> etc.
> 
> I would prefer not to recount the number of times I have written my own 
> vector library, and how tedious they are to create.  For most every 
> language I learn it is the first thing I need to write, since so few are 
> willing to provide a default implementation.  In my opinion, this is 
> unacceptable for a problem which occurs so frequently.

Again, it occurs for *you* frequently (and I'll admit for me too), but 
still the vast majority of programmers out there have never had a 
burning need for a float3 with all the bells and whistles.

If the need for a vector library were truly ubiquitous, it seems like it 
would be easier to find a decent implementation on the web, or that one 
would at least be available in the standard library of the given 
programming language.

As far as D is concerned, Helix has a pretty decent implementation.  See 
http://www.dsource.org/projects/helix.  It lacks Vector2's but I've 
added them to my own copy and I'd be happy to send it to you if you like.

> One option is to extend the standard library with a vector-types class, 
> but this is not nearly as nice a compiler level implementation.

I'm not convinced that a compiler-level implementation of these things 
is necessary.

> 1. The 90 degrees rotation trick
> This is based on the following article:
> http://www.flipcode.com/articles/article_fastervectormath.shtml
> ...
> The performance improvement becomes more substantial the longer the 
> expression.  Since overloaded operators do not instantiate templates, 
> there is no obvious way to obtain this result in the current language spec.

I thought the new opAssign was supposed to be enough to make expression 
templates work in D.  Don Clugston even posted a proof-of-concept that 
would use templates to rearrange expressions a while back.

Anyway, for this one, I think the preferred approach is to make the core 
language expressive enough so that tricks like expression templates can 
work, rather than implementing such optimizations for particular cases 
in the compiler.

> 2. Architecture specific optimizations (SIMD)
> 
> For low dimensional arithmetic, many architectures provide specially 
> optimized instruction for low dimensional vectors.  The problem is most 
> languages do not exploit them.  Creating efficient SIMD code is 
> impossible for a library, since each opAdd/opMul must be written using 
> inline assembler and therefore incurs the overhead of a function call 
> regardless.  This is worsened by the fact that moving to/from a vector 
> register is typically very expensive.
> 
> A compiler level implementation can easily avoid these issues by 
> assigning vector expressions to a register when passing them.  Moreover 
> it is more portable than compiler intrinsics like MSVC's SSE extensions. 
>  The implementation can easily emit fallback code if the architecture 
> does not support SIMD instructions.

Again, this sounds like it would be better to solve the generic issue of 
libraries not being able to take maximum advantage of existing hardware 
optimizations, like the issue with ASM methods not being inline-able.

> 3. Swizzles
> 
> A swizzle is a reordering of the elements in a vector.  Shader languages 
> like Cg or GLSL typically support them, given their utility in certain 
> types of computations.  Here are some examples:
> 
> v.xxxx     // Returns a vector with v.x broadcast to all components
> v.xyz    // Returns only the xyz components of v
> v.zyx    // Returns a vector consisting of the reverse of v's xyz 
> components
> 
> Enumerating all possible swizzles within a template is impossible, and 
> therefore requires one function per swizzle.  The result is massive code 
> bloat, and many lines of automatically generated gibberish.  To get an 
> idea at how many functions this requires, the total number of swizzles 
> for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  Multiply that 
> by the number of primitive types and the result becomes quite large.

Are swizzles all that useful outside of shader languages?  Part of the 
reason they are useful in shaders is that GPUs can do swizzles for 
free.  Can CPUs (I dunno)?  Another part of the reason is that all 
operations happen on 4-components no matter what, so if you want to 
multiply a scalar inside a vector times another vector, you might as 
well write it as v.xxxx * v2.  A third reason swizzles are useful on 
GPUs is because you often end up stuffing completely unrelated junk into 
them in the name of efficiency.  I'm not sure that's necessary or useful 
on a CPU architecture that isn't quite as tied to float4 as GPUs are.
--

I'm sure I'm among those who would use built-in small vector classes, 
but I don't think it's clear that they should be built into the compiler 
of a general purpose programming language.

On the other hand, if you can convince me that it really is impossible 
to maximize performance (while maintaining convenience) any other way, 
then I could be swayed.  Also if CPUs themselves are moving in this 
direction, then that also is something to think about.  By that I mean 
if float4 becomes (or already is) what could be considered a "native 
type" on the major desktop CPUs, then I can see that it would make sense 
for a programming language to reflect that by making it a built-in type.

--bb
January 28, 2007
Re: seeding the pot for 2.0 features
Kirk McDonald wrote:

> Have you seen Pyd, I wonder?
> 
> http://pyd.dsource.org/

Yes I have, but I was more thinking of using Python modules from
D.

Having such an interface would be cool for other things, too. For
example, one could use this as a convenient interface to things
like the OpenGL API. My 3D engine has wrapper classes, and one of
them abstracts things like textures. Textures have several
parameters, which are set through glTexParameter and glTexEnv.
Currently there is a pair of property functions for each
parameter. But each OpenGL extension that extends the set of
available texture parameters requires either extending the texture
class or deriving from it. However, most texture parameters are
numeric. The class could have a publicly available AA which maps
property names to the OpenGL tokens. A universal handler
function would then use that array to get the OpenGL token for the
requested member and perform the function calls. There would be
a basic set of functions of course, but upon extension loading
the extension wrapper could extend the mapping AA appropriately.
This way the texture class wouldn't have to be extended/derived,
keeping the codebase small and consistent.

Wolfgang Draxinger
-- 
E-Mail address works, Jabber: hexarith@jabber.org, ICQ: 134682867
January 28, 2007
Re: seeding the pot for 2.0 features [small vectors]
Bill Baxter wrote:
> Mikola Lysenko wrote:
> 
>> I'll bite.  Here are the two features I consider most important:
>>
>> 2. Low dimensional vectors as primitive types
>>
>> Specifically I would like to see the types int, real, float etc. 
>> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
>>
>> This would be exceptionally useful in many applications which require 
>> coordinate geometry.  Here is a very brief list:
>>
>> Scientific Programs
>> Physics Simulation
>> Computer Graphics
>> Video Games
>> User Interfaces
>> Computational Geometry
>> Robotics
>> Fluid Simulation
> 
> 
> It's still a tiny fraction of the number of applications that use, say, 
> strings.  So the ubiquity argument for inclusion is pretty weak, I think.
> 

What applications don't use vector instructions?

Also, I think it's more important to consider what /D applications/ will 
be using SIMD instructions, rather than what applications in general do 
or do not use coordinate geometry.  That's because a lot of those 
applications may not even be written in D or have anything to do with D, 
like the mass of stuff written in dynamic languages like perl, python, 
ruby, etc.

I have to wonder, has any language out there really given good support 
for SIMD primitives, besides assembly?  I think D could stand to gain a 
lot here.  That said, I don't mind if it's done in a library as long as 
it looks polished and is not cumbersome.

>>
>> etc.
>>
>> I would prefer not to recount the number of times I have written my 
>> own vector library, and how tedious they are to create.  For most 
>> every language I learn it is the first thing I need to write, since so 
>> few are willing to provide a default implementation.  In my opinion, 
>> this is unacceptable for a problem which occurs so frequently.
> 
> 
> Again, it occurs for *you* frequently (and I'll admit for me too), but 
> still the vast majority of programmers out there have never had a 
> burning need for a float3 with all the bells and whistles.
> 
> If the need for a vector library were truly ubiquitous, it seems like it 
> would be easier to find a decent implementation on the web, or that one 
> would at least be available in the standard library of the given 
> programming language.
> 
> As far as D is concerned, Helix has a pretty decent implementation.  See 
> http://www.dsource.org/projects/helix.  It lacks Vector2's but I've 
> added them to my own copy and I'd be happy to send it to you if you like.
> 
>> One option is to extend the standard library with a vector-types 
>> class, but this is not nearly as nice a compiler level implementation.
> 
> 
> I'm not convinced that a compiler-level implementation of these things 
> is necessary.
> 
>> 1. The 90 degrees rotation trick
>> This is based on the following article:
>> http://www.flipcode.com/articles/article_fastervectormath.shtml
>> ...
>> The performance improvement becomes more substantial the longer the 
>> expression.  Since overloaded operators do not instantiate templates, 
>> there is no obvious way to obtain this result in the current language 
>> spec.
> 
> 
> I thought the new opAssign was supposed to be enough to make expression 
> templates work in D.  Don Clugston even posted a proof-of-concept that 
> would use templates to rearrange expressions a while back.
> 
> Anyway, for this one, I think preferred approach is to make the core 
> language expressive enough so that tricks like expression templates can 
> work, rather than implementing such optimizations for particular cases 
> in the compiler.
> 
>> 2. Architecture specific optimizations (SIMD)
>>
>> For low dimensional arithmetic, many architectures provide specially 
>> optimized instruction for low dimensional vectors.  The problem is 
>> most languages do not exploit them.  Creating efficient SIMD code is 
>> impossible for a library, since each opAdd/opMul must be written using 
>> inline assembler and therefore incurs the overhead of a function call 
>> regardless.  This is worsened by the fact that moving to/from a vector 
>> register is typically very expensive.
>>
>> A compiler level implementation can easily avoid these issues by 
>> assigning vector expressions to a register when passing them.  
>> Moreover it is more portable than compiler intrinsics like MSVC's SSE 
>> extensions.  The implementation can easily emit fallback code if the 
>> architecture does not support SIMD instructions.
> 
> 
> Again, this sounds like it would be better to solve the generic issue of 
> libraries not being able to take maximum advantage of existing hardware 
> optimizations, like the issue with ASM methods not being inline-able.
> 
>> 3. Swizzles
>>
>> A swizzle is a reordering of the elements in a vector.  Shader 
>> languages like Cg or GLSL typically support them, given their utility 
>> in certain types of computations.  Here are some examples:
>>
>> v.xxxx     // Returns a vector with v.x broadcast to all components
>> v.xyz    // Returns only the xyz components of v
>> v.zyx    // Returns a vector consisting of the reverse of v's xyz 
>> components
>>
>> Enumerating all possible swizzles within a template is impossible, and 
>> therefore requires one function per swizzle.  The result is massive 
>> code bloat, and many lines of automatically generated gibberish.  To 
>> get an idea of how many functions this requires, the total number of 
>> swizzles for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  
>> Multiply that by the number of primitive types and the result becomes 
>> quite large.
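
The count quoted above can be checked with a few lines of D. This is only a sketch: countSwizzles is a hypothetical helper, and it counts swizzle names of length 1 through 4 over four components, which corresponds to the terms 4 + 4^2 + 4^3 + 4^4.

```d
import std.stdio;

// Number of distinct swizzle names of length 1..maxLen over `components` letters.
size_t countSwizzles(size_t components, size_t maxLen)
{
    size_t total = 0;
    size_t perLength = 1;
    foreach (len; 1 .. maxLen + 1)
    {
        perLength *= components; // components^len names of this length
        total += perLength;
    }
    return total;
}

void main()
{
    // 4 + 16 + 64 + 256
    assert(countSwizzles(4, 4) == 340);
    writeln(countSwizzles(4, 4));
}
```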
> 
> 
> Are swizzles all that useful outside of Shader languages?  Part of the 
> reason they are useful in shaders is that GPUs can do swizzles for 
> free.  Can CPUs (I dunno)?  Another part of the reason is that all 
> operations happen on 4-components no matter what, so if you want to 
> multiply a scalar inside a vector times another vector, you might as 
> well write it as v.xxxx * v2.  A third reason swizzles are useful on 
> GPUs is because you often end up stuffing completely unrelated junk into 
> them in the name of efficiency.  I'm not sure that's necessary or useful 
> on a CPU architecture that isn't quite as tied to float4 as GPUs are.
> -- 
> 
> I'm sure I'm among those who would use built-in small vector classes, 
> but I don't think it's clear that they should be built into the compiler 
> of a general purpose programming language.
> 
> On the other hand, if you can convince me that it really is impossible 
> to maximize performance (while maintaining convenience) any other way, 
> then I could be swayed.  Also if CPUs themselves are moving in this 
> direction, then that also is something to think about.  By that I mean 
> if float4 becomes (or already is) what could be considered a "native 
> type" on the major desktop CPUs, then I can see that it would make sense 
> for a programming language to reflect that by making it a built-in type.
> 
> --bb

I'd say float4 has been a native type for a couple years now.  A desktop 
computer that doesn't have SSE or Altivec or some other SIMD is probably 
quite antiquated and not running D programs.  This is because SSE was 
around in 1999 running on 450 MHz CPUs.
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

The only computers I know of that lack float4 are smartphone and PDA 
type devices running modern ARM processors.  Even some of the recent 
ARM-XSCALE processors have MMX instructions, which don't give float4 
but does give short4 and int2.  I'm also not sure about modern 
supercomputers and the like, since I haven't worked with those.
January 28, 2007
Re: seeding the pot for 2.0 features [small vectors]
Bill Baxter wrote:
> Mikola Lysenko wrote:
>> 2. Low dimensional vectors as primitive types
>>
>> This would be exceptionally useful in many applications which require 
>> coordinate geometry.  Here is a very brief list:
>>
>> Scientific Programs
>> Physics Simulation
>> Computer Graphics
>> Video Games
>> User Interfaces
>> Computational Geometry
>> Robotics
>> Fluid Simulation
> 
> It's still a tiny fraction of the number of applications that use, say, 
> strings.  So the ubiquity argument for inclusion is pretty weak, I think.
> 

This was the same sentiment I felt several years ago, but over time I 
have come to realize that vectors are probably one of the most widely 
used, yet least supported programming structures.  The thought 
eventually crystallized for me while learning ActionScript 3.  I was 
reading through the example projects they provided with the SDK, and was 
struck by the fact that each project used a custom 2-d vector class. 
Given the prevalence of such a construct, it seems that a language ought 
to provide a good common implementation.

Pretty much any application that eventually needs to produce graphical 
output requires the use of vectors somewhere down the line, be it a 
simple GUI reading in mouse clicks, a video game or a scientific GUI. 
The idea of low dimensional vectors is firmly rooted in geometry, which 
is one of the most basic branches of mathematics.  Vectors are easily 
used more often than complex numbers, which D already supports.

I would like to see some more detailed usage statistics on vectors, 
but this is difficult to check reliably.  A Google code search on 
vector vs. complex gives the following results, though this may be 
somewhat skewed given that vector is a loaded word in computer science:

Vector Math: 300k  results
http://www.google.com/codesearch?q=vector+math&hl=en&btnG=Search+Code

Complex Math: 100k results
http://www.google.com/codesearch?q=complex+math&hl=en&btnG=Search+Code


I suspect that there are probably better ways to measure the use of 
vectors, especially since the synonym Point is often used in C++ 
programs (std::vector creates name conflicts).  Also, contractions like 
vec, vect, vecr are often used instead of the full word.  As a percentage 
of total applications containing vectors, it may be difficult to 
estimate.  From my personal experience, I would guess that they are used 
very frequently.

For such a widespread structure, there is remarkably little language 
support.  For example, the Java standard library contains hundreds of 
classes.  It has everything from a GregorianCalendar to a Midi 
synthesizer.  It does not have vector support.  Even in newer languages 
like Python, there is still an inordinate amount of time wasted on 
reimplementing vectors.

Part of the problem is that vectors are pretty easy to hack together, so 
everyone basically shlocks something that barely works and uses that. 
The other option is that they get hung up on the damn vector class 
before they even write the project and end up wasting all their time on 
that instead.  There are literally hundreds of vector libraries for C++ 
alone.  A Google search for vector library gives over 26 million 
results!  I've even contributed a few of my own, such as:

http://www.assertfalse.com/downloads/vecmat-0.1.zip

Moreover, interoperability is a big concern.  Many times I have tried to 
interface with some code online, perhaps a model loader or some 
numerical application.  Since so few languages have standard vectors, 
they all use custom formats for processing data.  In order to add them 
to my project, I typically have to do a find/replace on the vector type 
they are using and pray to god it compiles.  Even then, there are 
sometimes sneaky problems that only show up after I've finished 
importing and connecting everything.


> 
> As far as D is concerned, Helix has a pretty decent implementation.  See 
> http://www.dsource.org/projects/helix.  It lacks Vector2's but I've 
> added them to my own copy and I'd be happy to send it to you if you like.
> 
>> One option is to extend the standard library with a vector-types 
>> class, but this is not nearly as nice as a compiler-level implementation.
> 
> I'm not convinced that a compiler-level implementation of these things 
> is necessary.
> 

Helix is not a bad project, but in order for a standard vector to be 
useful it needs to come packaged with the compiler.  The temptation to 
roll your own is too great otherwise.  Even putting it in the standard 
library is not enough to prevent people from reinventing the wheel, as 
we have seen by the numerous variations on C++'s std::list.  Ideally, 
small vectors should be in the global namespace, right alongside complex 
numbers and dynamic arrays.

Effective vector code needs correct data alignment, instruction 
scheduling and register use.  Each of these issues is most effectively 
handled in the compiler/code gen stage, and therefore suggests that at 
the very least the compiler implementation ought to be aware of the 
vector type in some way.  By applying the "D Builtin Rationale," it is 
easy to see that vectors meet all the required criteria.

An optimal strategy would be to have the vector types builtin, with a 
simple per-component multiplication defined, and a separate standard 
math library for doing more complex operations.  Here is an example:

import std.vecmath;

float4 a = [1, 2, 3, 4];
float4 b = [2, 3, 4, 5];
float4 c = a * b;		//c = [2, 6, 12, 20]
float  d = dot(a, b);		//Defined in std.vecmath
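
Neither a builtin float4 nor std.vecmath exists as written above; both are part of the proposal. As a rough sketch of the same semantics at library level, assuming later D2 operator overloading, something like the following would behave the same way. Float4 and dot here are hypothetical names, and this version makes no attempt at alignment or SIMD code generation, which is exactly the part the proposal wants the compiler to handle.

```d
import std.stdio;

// Hypothetical library stand-in for the proposed builtin float4.
struct Float4
{
    float[4] v;

    // Per-component +, -, * and / between two vectors.
    Float4 opBinary(string op)(Float4 rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Float4 r;
        foreach (i; 0 .. 4)
            mixin("r.v[i] = v[i] " ~ op ~ " rhs.v[i];");
        return r;
    }
}

// Dot product, the kind of function the post would place in a math library.
float dot(Float4 a, Float4 b)
{
    float sum = 0;
    foreach (i; 0 .. 4)
        sum += a.v[i] * b.v[i];
    return sum;
}

void main()
{
    auto a = Float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Float4([2.0f, 3.0f, 4.0f, 5.0f]);

    auto c = a * b;     // per-component product
    float d = dot(a, b);

    assert(c.v == [2.0f, 6.0f, 12.0f, 20.0f]);
    assert(d == 40.0f);
    writeln(c.v, " ", d);
}
```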

-Mik