February 05, 2012 Re: std.simd module
> First criticism I expect is for many to insist on a class-style vector
> library, which I personally think has no place as a low level, portable API.
> Everyone has a different idea of what the perfect vector lib should look
> like, and it tends to change significantly with respect to its application.

- Things like toDouble, toFloat don't map to vectors if they change the number of elements.
- What is getX(byte16)? Swizzling makes sense for float4/double2, not sure about the rest.
- Using named free functions for operators (or, complement, neg) is overly verbose.

So indeed my proposal is to prefer GLSL-like syntax.

// construction and conversion
auto f = float4(1.0, 2.0f, 3, 4);
auto f2 = v.yxzw; // using opDispatch
auto d = double2(v.wy);
auto d = double2(v.get!(0), v.get!(1)); // probably someone knows a trick for compile time indexing (v[0])
auto f3 = float4(1.0, d, 2);
double d = f3.z;

// a lot of operators can be mapped and already are
f |= f2;
f = f & ~f2;

// named functions for the rest
auto f1 = float4.loadAligned(p);
auto f2 = float4.loadUnaligned(p);
auto f3 = float4.broadcast(1.0f);

> I feel this flat API is easier to implement, maintain, and understand, and
> I expect the most common use of this lib will be in the back end of peoples
> own vector/matrix/linear algebra libs that suit their apps.

Phobos should provide at least basic vector and matrix implementations.
February 05, 2012 Re: std.simd module
On 5 February 2012 02:17, Martin Nowak <dawg@dawgfoto.de> wrote:

>> First criticism I expect is for many to insist on a class-style vector
>> library, which I personally think has no place as a low level, portable API.
>> Everyone has a different idea of what the perfect vector lib should look
>> like, and it tends to change significantly with respect to its application.
>
> - Things like toDouble, toFloat don't map to vectors if they change the
> number of elements.

You'll notice my comment above those functions, where I mention that this is a perfect opportunity for multiple return values; short of that, maybe I'll remove the functions, or change them in some way.

> - What is getX(byte16)? Swizzling makes sense for float4/double2, not sure
> about the rest.

I agree that's not clear. I had intended to add some asserts limiting those to 2-4d vectors, or actually change the names of those functions completely. They really just call through to a broadcast swizzle!().

> - Using named free functions for operators (or, complement, neg) is overly
> verbose.

Is it possible to implement global operators like this? I think it's important to keep the functions there anyway... they offer explicit versioning, and some architectures may have non-trivial implementations for those functions (for instance, neg requires 0-x on some hardware). The operators are still supported if you prefer to use them, obviously.

> So indeed my proposal is to prefer GLSL-like syntax.
>
> // construction and conversion
> auto f = float4(1.0, 2.0f, 3, 4);
> auto f2 = v.yxzw; // using opDispatch

I planned to do this, but so far I've spent a lot of time with swizzle!(). Getting that right is tricky, and it's the meat of all permutation operations. Everything else will be layers of prettiness over that.

> auto d = double2(v.wy);
> auto d = double2(v.get!(0), v.get!(1)); // probably someone knows a
> trick for compile time indexing (v[0])

swizzle!() already does this work (at compile time); I'll be wrapping more handy helpers around it when it's complete.

> auto f3 = float4(1.0, d, 2);
> double d = f3.z;

This API will never offer this operation. It is the single worst violation of the API, and the fastest way to make the whole implementation pointless: swapping register types is often the worst hazard a CPU is capable of. Casting to scalar will only be implemented with explicit functions, with the performance hazards detailed in the documentation, which hopefully people will read, since they'll need to look up the function to use it ;)

> // a lot of operators can be mapped and already are
> f |= f2;
> f = f & ~f2;

Sure, what's your point? You'll see I use them all over the place. That said, encouraging usage of functions like andn() in this case will result in an optimisation on some hardware which the compiler is probably not capable of without specific tweaking by someone who knows what they're doing. It's really bad form to depend on an optimiser to fix my code when you could have just given it the appropriate instruction right up front. That's especially true for an open source project like D, where contributions of that sort are unreliable and the number of people qualified to improve the optimiser in that way is very low.

>> I feel this flat API is easier to implement, maintain, and understand, and
>> I expect the most common use of this lib will be in the back end of peoples
>> own vector/matrix/linear algebra libs that suit their apps.
>
> Phobos should provide at least basic vector and matrix implementations.

It will, but that's a separate (higher level) module, although this one provides most of the foundational vector operations. I want to keep this level primitive and simple. If there's any dispute over the implementation of this layer, just imagine the dispute when trying to implement matrix and quaternion libs... they can be so context specific. Consider the differences between realtime and scientific uses... This library can't really go wrong at that level, but there's a LOT more to consider when working at that layer.
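To make the andn() point above concrete, a naive generic fallback might look like the sketch below. This is illustrative only, not the std.simd implementation; the function name and constraint are assumptions, and a hardware-aware version would select an and-not instruction (the ANDNPS/PANDN family on SSE) per hardware version instead of relying on the optimiser.

// Illustrative fallback only: andn(a, b) computes a & ~b in one named call,
// so a hardware-aware implementation can substitute a single and-not
// instruction where the target supports it.
V andn(V)(V a, V b) if (is(typeof(a & ~b) : V))
{
    return a & ~b;
}

unittest
{
    uint a = 0b1100, b = 0b1010;
    assert(andn(a, b) == 0b0100);
}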
February 05, 2012 Re: std.simd module
On 05.02.2012, 02:13, Manu <turkeyman@gmail.com> wrote:
> On 5 February 2012 03:08, Martin Nowak <dawg@dawgfoto.de> wrote:
>
>> Let me restate the main point.
>> Your approach to a higher level module wraps intrinsics with named
>> functions.
>> There is little gain in making simd(AND, f, f2) to and(f, f2) when you can
>> easily take this to the level GLSL achieves.
>>
>
> What is missing to reach that level in your opinion? I think I basically
> offer that (with some more work)
> It's not clear to me what you object to...
> I'm not prohibiting the operators, just adding the explicit functions,
> which may be more efficient in certain cases (they receive the version).
>
> Also the 'gains' of wrapping an intrinsic in an almost identical function
> are, portability, and potential optimisation for hardware versioning. I'm
> specifically trying to build something that's barely above the intrinsics
> here, although a lot of the more arcane intrinsics are being collated into
> their typically useful functionality.
>
> Are you just focused on the primitive math ops, or something broader?
GLSL achieves very clear and simple-to-write construction and conversion of values.
I think wrapping the core.simd vector types in an alias this struct makes it a snap
to define conversion through constructors and swizzling through properties/opDispatch.
Then you can overload operators to do the implementation-specific stuff and add named
methods for the rest.
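A minimal sketch of that idea, using a plain float[4] purely as a stand-in for core.simd's float4 so the example stands alone; the struct and member names are hypothetical and this is not the proposed std.simd API:

// Hypothetical wrapper in the spirit of the proposal above: constructors for
// conversion, opDispatch for swizzling, alias this to the underlying storage.
struct Float4
{
    float[4] data;
    alias data this;

    this(float x, float y, float z, float w) { data = [x, y, z, w]; }

    private static bool isSwizzle(string s)
    {
        if (s.length < 2 || s.length > 4) return false;
        foreach (c; s)
            if (c != 'x' && c != 'y' && c != 'z' && c != 'w') return false;
        return true;
    }

    // swizzling through opDispatch, e.g. v.yxzw or v.wy
    float[s.length] opDispatch(string s)() const if (isSwizzle(s))
    {
        float[s.length] r;
        foreach (i, c; s)
            r[i] = data[c == 'x' ? 0 : c == 'y' ? 1 : c == 'z' ? 2 : 3];
        return r;
    }
}

unittest
{
    auto v = Float4(1, 2, 3, 4);
    assert(v.yxzw == [2.0f, 1, 4, 3]);
    assert(v.wy == [4.0f, 2]);
}

A real version would wrap a core.simd vector and route the swizzle through a shuffle intrinsic rather than element-by-element copies.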
February 05, 2012 Re: std.simd module
On 5 February 2012 03:37, Martin Nowak <dawg@dawgfoto.de> wrote:
> On 05.02.2012, 02:13, Manu <turkeyman@gmail.com> wrote:
>
>
> On 5 February 2012 03:08, Martin Nowak <dawg@dawgfoto.de> wrote:
>>
>> Let me restate the main point.
>>> Your approach to a higher level module wraps intrinsics with named
>>> functions.
>>> There is little gain in making simd(AND, f, f2) to and(f, f2) when you
>>> can
>>> easily take this to the level GLSL achieves.
>>>
>>>
>> What is missing to reach that level in your opinion? I think I basically
>> offer that (with some more work)
>> It's not clear to me what you object to...
>> I'm not prohibiting the operators, just adding the explicit functions,
>> which may be more efficient in certain cases (they receive the version).
>>
>> Also the 'gains' of wrapping an intrinsic in an almost identical function are, portability, and potential optimisation for hardware versioning. I'm specifically trying to build something that's barely above the intrinsics here, although a lot of the more arcane intrinsics are being collated into their typically useful functionality.
>>
>> Are you just focused on the primitive math ops, or something broader?
>>
>
> GLSL achieves very clear and simple to write construction and conversion of values.
>
> I think wrapping the core.simd vector types in an alias this struct makes
> it a snap
> to define conversion through constructors and swizzling through
> properties/opDispatch.
> Then you can overload operands to do the implementation specific stuff and
> add named methods
> for the rest.
>
So you are referring to the light wrapper class, that's what I thought.
I think that's overcooking it a bit. Also, you seem to have ignored, twice now, the
reason I created the primitive operator functions: they are needed to take advantage
of the hardware with respect to different versions.
At the lowest level, I am generally favouring performance over usage (if it
doesn't cause serious damage to the API).
You're suggesting exactly what I expected half the forum to suggest, and
I'm not against it in any way, but I think it's a layer above this. This
should remain pure, and performance orientated...
There must be a library that provides the opportunity for the best performance before
sugary libs get layered over the top, and, as I've said twice now, everyone will have
a different idea of what that sugar API will look like, so I feel it should live above
this... or perhaps beside this (in the same file?), but I wouldn't want to remove this
API in favour of a vector 'class'.
February 05, 2012 Re: std.simd module
On 05.02.2012, 02:46, Manu <turkeyman@gmail.com> wrote:

> So you are referring to the light wrapper class, that's what I thought.
> I think that's overcooking it a bit. Also, you seem to have ignored, twice now, the
> reason I created the primitive operator functions: they are needed to take advantage
> of the hardware with respect to different versions.

No, I do see your point, but you can do it in the private part of your module, in a vector struct, or in a different module just as well. IMHO it's just too shallow a layer over intrinsics to make a useful Phobos module. In fact, for arithmetic operations it's lower level than what dmd can already do for SSE vectors. OTOH, dot is very good as it is.

> At the lowest level, I am generally favouring performance over usage (if it
> doesn't cause serious damage to the API).
> You're suggesting exactly what I expected half the forum to suggest, and
> I'm not against it in any way, but I think it's a layer above this. This
> should remain pure, and performance orientated...
> There must be a library that provides the opportunity for the best performance before
> sugary libs get layered over the top, and, as I've said twice now, everyone will have
> a different idea of what that sugar API will look like, so I feel it should live above
> this... or perhaps beside this (in the same file?), but I wouldn't want to remove this
> API in favour of a vector 'class'.

A lot of people will be happy to use off-the-shelf primitives.
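For reference, what the "dmd can already do" remark refers to: on D_SIMD platforms the core.simd vector types already support the basic arithmetic operators directly, so a thin named wrapper for plain arithmetic adds little by itself. A rough illustration (the function here is hypothetical, not from std.simd):

import core.simd;

version (D_SIMD)
{
    // Plain operator arithmetic on core.simd vectors already maps to SIMD
    // instructions (typically mulps/addps on SSE) without wrapper functions.
    float4 scaleAndOffset(float4 v, float4 scale, float4 offset)
    {
        return v * scale + offset;
    }
}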
February 05, 2012 Re: std.simd module
> There must be a library that provides the opportunity for the best performance before
> sugary libs get layered over the top, and, as I've said twice now, everyone will have
> a different idea of what that sugar API will look like, so I feel it should live above
> this... or perhaps beside this (in the same file?), but I wouldn't want to remove this
> API in favour of a vector 'class'.
Having those generalized intrinsics publicly available in the same file probably makes sense.
February 05, 2012 Re: std.simd module
On 2/4/2012 7:37 PM, Martin Nowak wrote:
> [...]
>
> GLSL achieves very clear and simple-to-write construction and conversion
> of values.
>
> I think wrapping the core.simd vector types in an alias this struct
> makes it a snap
> to define conversion through constructors and swizzling through
> properties/opDispatch.
> Then you can overload operators to do the implementation-specific stuff
> and add named methods
> for the rest.
The GLSL or HLSL syntax is fairly nice, but it relies on a few hardware advantages that are harder to come by with PC SIMD:
The hardware that runs HLSL can operate on data types 'smaller' than the register, either handling them natively or by turning all the instructions into a mass of scalar ops that are then run in parallel as best as possible. In SIMD land on CPUs the design is much more rigid: we are effectively stuck using float and float4 data types, and emulating float2 and float3. For a very long time there was not even a dot product instruction, since from Intel's point of view your data is transposed incorrectly if you need one (plus they would have to handle dot2, dot3, dot4, etc.).
The cost of this emulation of float2 and float3 types is that we have to put 'some data' in the unused slots of the SIMD register on swizzle operations, which will usually lead to the SIMD instructions generating INFs and NaNs in those slots and hurting performance.
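To make that concrete, here is a tiny sketch of why float3 emulation has to pad the unused lane with something benign. It uses a plain float[4] as a stand-in for a SIMD register, and the helper name is purely illustrative:

// Sketch only: a float3 "load" must put a harmless value (here 0) in the
// unused fourth lane; garbage there can turn into INF/NaN during later
// full-register arithmetic and hurt performance on some CPUs.
float[4] loadFloat3(const float[3] v)
{
    return [v[0], v[1], v[2], 0.0f];
}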
The other major problem with the shader swizzle syntax is that it doesn't scale. If you are using a 128-bit register holding 8 shorts or 16 bytes, what are the letters then? Shaders assume 4 is the limit, so you have either xyzw or rgba. Then there are platform considerations (e.g. you can't swizzle 8-bit data on SSE, you have to use a series of pack/unpack and shuffles, but VMX can do it easily).
That said, shader swizzle syntax is very nice; it can certainly reduce the amount of code you write by a huge factor (though the codegen is another matter). Even silly tricks with swizzling literals in HLSL are useful, like the following code to sum up some numbers:
if (dot(a, 1.f.xxx) > 0)
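A rough D analogue of that trick, with a naive scalar dot() standing in for whatever std.simd ends up providing; both function names here are illustrative only:

import core.simd;

version (D_SIMD)
{
    // Naive fallback dot product, purely for illustration; a real std.simd
    // dot() would use hardware instructions where available.
    float dot(float4 a, float4 b)
    {
        float s = 0;
        foreach (i; 0 .. 4)
            s += a.array[i] * b.array[i];
        return s;
    }

    // HLSL's dot(a, 1.f.xxx) > 0 becomes roughly:
    bool sumIsPositive(float4 a)
    {
        float4 ones = [1.0f, 1.0f, 1.0f, 0.0f]; // ones in xyz, w unused
        return dot(a, ones) > 0;
    }
}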