std.simd
March 15, 2012
Hey chaps (and possibly lasses?)

I've been slowly working on a std.simd library, the aim of which is to provide
a lowest-level, hardware-independent SIMD interface. core.simd currently
implements SSE for x86; other architectures are currently exposed via
gcc.builtins.
The purpose of std.simd is to be the lowest-level API that people make
direct use of, with as-close-to-direct-as-possible mapping to the hardware
opcodes while still being portable. I would expect that custom,
more-feature-rich SIMD/vector/matrix/linear-algebra libraries will be
built on top of std.simd in future, that way staying portable to as many
systems as possible.

Now I've reached a question in the design of the library on which I'd like to take a general consensus.

The lowest-level vectors are defined by: __vector(type[width])
But core.simd also defines a bunch of handy 'nice' aliases for common
vector types, e.g. float4, int4, short8, etc.
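For reference, those aliases are just names over the built-in vector syntax, along the lines of:

    // core.simd-style aliases over the built-in vector syntax
    alias float4 = __vector(float[4]);
    alias int4   = __vector(int[4]);
    alias short8 = __vector(short[8]);
    alias byte16 = __vector(byte[16]);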

I want to claim those names for std.simd. They should be the lowest-level
names that people use, and therefore associate with the std.simd
functionality.
I also want to enhance them a bit:
  I want to make them a struct that wraps the primitive rather than an
alias. I understand this single-POD struct will be handled the same as the
POD itself, is that right? If I pass the wrapper struct by value to a
function, it will be passed in a register as it should be, yeah?
  I then intend to add CTFE support, and maybe some properties and
opDispatch bits.
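For illustration, a minimal sketch of the kind of wrapper I mean (named Float4 here purely to avoid clashing with core.simd's existing alias; none of this is final API) might look something like:

    import core.simd;

    struct Float4
    {
        float4 v;   // the underlying __vector(float[4])

        // forward arithmetic to the primitive
        Float4 opBinary(string op)(Float4 rhs)
        {
            return Float4(mixin("v " ~ op ~ " rhs.v"));
        }

        // named element access via opDispatch: a.x, a.y, a.z, a.w
        float opDispatch(string c)()
            if (c == "x" || c == "y" || c == "z" || c == "w")
        {
            enum idx = c == "x" ? 0 : c == "y" ? 1 : c == "z" ? 2 : 3;
            return v.array[idx];
        }
    }

Since the struct has a single vector field, it should still be passed and returned in a SIMD register exactly like the raw __vector type.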

Does this sound reasonable?


March 15, 2012
On Thu, 15 Mar 2012 12:09:58 -0500, Manu <turkeyman@gmail.com> wrote:
> [snip]

This sounds reasonable. However, please realize that if you wish to use the short vector names (i.e. float4, float3, float2, etc.) you should support the full set with a decent range of operations and methods. Several people (myself included) have written similar short vector libraries; I think having short vectors in Phobos is important, but having one library provide float4 and another float2 is less than ideal, even if not all of the types could leverage the SIMD backend. For myself, the killer feature for such a library would be having CUDA-compatible alignments for the types (or an equivalent enum to that effect).
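To be concrete, by CUDA-compatible alignments I mean matching the alignment CUDA mandates for its short vector types (float2 is 8-byte aligned, float3 only element-aligned, float4 16-byte aligned), so buffers can be shared with device code without repacking. Roughly, in D terms (names purely illustrative):

    align(8)  struct Float2 { float x, y; }
    align(4)  struct Float3 { float x, y, z; }
    align(16) struct Float4 { float x, y, z, w; }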
March 15, 2012
On 15 March 2012 20:35, Robert Jacques <sandford@jhu.edu> wrote:

> [snip]

I can see how you come to that conclusion, but I generally feel that that's
a problem for a higher layer of library.
I really feel it's important to keep std.simd STRICTLY about the hardware
simd operations, only implementing what the hardware can express
efficiently, and not trying to emulate anything else. In some areas I feel
I've already violated that premise, by adding some functions to make good
use of something that NEON/VMX can express in a single opcode, but takes
SSE 2-3 opcodes. I don't want to push that bar, otherwise the user will lose
confidence that the functions in std.simd will actually work efficiently on
any given hardware.
It's not a do-everything library, it's a hardware SIMD abstraction, and
most functions map to exactly one hardware opcode. I expect most people
will want to implement their own higher level lib on top tbh; almost nobody
will ever agree on what the perfect maths library should look like, and
it's also context specific.
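A concrete example of the kind of function I mean: a multiply-add is a single opcode on VMX (vmaddfp) and NEON (vmla), but takes two on pre-FMA SSE (mulps + addps). The signature here is just illustrative (float4 being core.simd's alias), not necessarily what std.simd ends up with:

    // one opcode on VMX/NEON, two on SSE without FMA
    float4 madd(float4 a, float4 b, float4 c)
    {
        return a * b + c;
    }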


March 15, 2012
On 16 March 2012 08:02, Manu <turkeyman@gmail.com> wrote:
> [snip]

I think that having the low-level vectors makes sense. Since technically only float4, int4, short8 and byte16 actually make sense in the context of direct SIMD, providing other vectors would be straying into vector-library territory, as people would then expect interoperability between them, standard vector/matrix operations, and that could get too high-level. Third-party libraries have to be useful for something!

Slightly off topic questions:
Are you planning on providing a way to fall back if certain operations
aren't supported? Even if it can only be picked at compile time? Is
your work on Github or something? I wouldn't mind having a peek, since
this stuff interests me. How well does this stuff inline? I can
imagine that a lot of the benefit of using SIMD would be lost if every
SIMD instruction ends up wrapped in 3-4 more instructions, especially
if you need to do consecutive operations on the same data.

--
James Miller
March 15, 2012
Great to hear this is coming along. Can I get a link to the (github?) source?

Do the simd functions have fallback functionality for unsupported hardware? Is that planned? Or is that something I'd be writing into my own Vector structures?

Also, I noticed Phobos now includes an "etc" library; do you have plans to eventually make a general-purpose, higher-level linear systems library in that?


March 15, 2012
On 16 March 2012 11:14, F i L <witte2008@gmail.com> wrote:
> Great to hear this is coming along. Can I get a link to the (github?)
> source?
>
> Do the simd functions have fallback functionality for unsupported hardware? Is that planned? Or is that something I'd be writing into my own Vector structures?
>
> Also, I noticed Phobos now includes an "etc" library; do you have plans to eventually make a general-purpose, higher-level linear systems library in that?

Looks like we have the same questions. Great minds think alike and all that :D

--
James Miller
March 15, 2012
On 15 March 2012 22:27, James Miller <james@aatch.net> wrote:

> [snip]
>
> Slightly off topic questions:
> Are you planning on providing a way to fall back if certain operations
> aren't supported?


I think it depends on HOW unsupported they are. If it can be emulated efficiently (and in the context, the emulation would be as efficient as possible on the architecture anyway), then probably, but if it's a problem that should simply be solved another way, I'd rather encourage that with a compile error.

> Even if it can only be picked at compile time? Is
> your work on Github or something?


Yup: https://github.com/TurkeyMan/phobos/commits/master/std/simd.d


> I wouldn't mind having a peek, since
> this stuff interests me. How well does this stuff inline?


It inlines perfectly; I pay very close attention to the codegen of every single function, and have loads of static branches to select more efficient versions for more recent revisions of the SSE instruction set.
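To give an idea, the static branching looks roughly like this (names hypothetical, not the actual std.simd code):

    import core.simd;

    enum SIMDVer { SSE2, SSE3, SSSE3, SSE41 }
    enum simdVer = SIMDVer.SSE2;   // in practice derived from the build target

    float horizontalSum(float4 v)
    {
        static if (simdVer >= SIMDVer.SSE3)
        {
            // an SSE3+ build would use the HADDPS intrinsic here;
            // this sketch only shows where the branch sits
        }
        // generic path
        return v.array[0] + v.array[1] + v.array[2] + v.array[3];
    }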


> I can
> imagine that a lot of the benefit of using SIMD would be lost if every
> SIMD instruction ends up wrapped in 3-4 more instructions, especially
> if you need to do consecutive operations on the same data.
>

It will lose 100% of its benefit if it is wrapped up in even ONE function
call, and equally so if the vectors don't pass/return in hardware registers
as they should.
I'm crafting it to have the same performance characteristics as 'int'.


March 15, 2012
On 16 March 2012 00:14, F i L <witte2008@gmail.com> wrote:

> Do the simd functions have fallback functionality for unsupported hardware? Is that planned? Or is that something I'd be writing into my own Vector structures?


I am thinking more and more that it'll have a fallback for unsupported
hardware (since the same code will need to run for CTFE anyway), and just
pipe unsupported platforms through that code as well.
But it probably won't be as efficient as possible for those platforms, so
the jury is still out. It might be better to encourage people to do it
properly.


> Also, I noticed Phobos now includes an "etc" library; do you have plans to eventually make a general-purpose, higher-level linear systems library in that?
>

I don't plan to. If I end up using one in my personal code, I'll share it though.


March 15, 2012
On 16 March 2012 11:44, Manu <turkeyman@gmail.com> wrote:
> [snip]

Cool, thanks for answering my questions. Some of what I'm working on atm would benefit from SIMD.

--
James Miller
March 15, 2012
On Thu, 15 Mar 2012 14:02:15 -0500, Manu <turkeyman@gmail.com> wrote:
> On 15 March 2012 20:35, Robert Jacques <sandford@jhu.edu> wrote:
>> [snip]
>
> I can see how you come to that conclusion, but I generally feel that that's
> a problem for a higher layer of library.

Then you should leave namespace room for that higher-level library.