July 10, 2007
Craig Black wrote:
> "janderson" <askme@me.com> wrote in message news:f6rkr3$nbm$1@digitalmars.com...
>> Craig Black wrote:
>>> Lately I've been learning about GPU's, shaders, and general purpose GPU computation. I'm still just getting introduced to it and haven't gotten very deep yet, so there's probably a few of you out there who know a lot more than me about this subject.  It's probably  been discussed before in this news group, but I've been thinking about how important GPU's will be in the coming years.
>>>
>>> For those who may not know, GPU performance has been improving at a rate faster than Moore's Law.  Current high-end GPU's have many times more floating point performance than high-end CPU's.  The latest GPU's from NVidia and AMD/ATI boast a massive 500 and 400 single-precision gigaflops respectively.  Traditionally GPU's were used for graphics only, but recently GPU's have been used for general purpose computation as well.  The newer GPU's are including general purpose computation in their design considerations.
>>>
>>> The problem with GPU programming is that computation is radically different from computation on a conventional CPU.  Because of the way the hardware is designed, there are more restrictions on GPU programs.  For example, there are no function pointers, no virtual methods, and hence no OOP.  Branching support is minimal, so conditional statements are highly inefficient and should be avoided.  Because of these constraints, special purpose programming languages are required to program natively on a GPU.  These special purpose languages are called shading languages, and include Cg, HLSL, and GLSL.
>>>
>>> GPU computation is performed on data streams in parallel, where the operation on each item in the stream is independent.  GPU's work most effectively on large arrays of data.  The proposed "array operations" feature in D has been discussed a lot.  It is even mentioned in the "future directions" page on the D web site.  However, I don't remember the details of the array operations feature.  What are the design goals of this feature?  To leverage multi-cores and SSE?  Are GPU's also a consideration?
>>>
>>> There are already C++ libraries available that provide general purpose computation using GPU's without shader programming.  When it comes time to implement array operations in D, I feel that GPU's should be the primary focus.  (However, I'm not saying that multicore CPU's or SSE should be ignored.)  Design goals should be performance, simplicity, and flexibility.
>>>
>>> Thoughts?
>>>
>>> -Craig
>> While you can do much of the CPU work with the GPU, I think in its present state it requires a very custom program.  For instance, there are bandwidth issues, which mean you don't get the results back until the next frame.  Therefore your program has to be designed to work in a particular way.
>>
>> Secondly, the GPU is being used for other things, so the time at which you use these operations is critical; it can't just happen at any stage, otherwise you blow away the current state of the GPU.
>>
>> Thirdly, you can only run a couple of these huge processing operations on the GPU at once, or you have to come up with a smart way to put them all into the same operation.  Therefore the usability of this is limited.
>>
>> Anyway, I think it seems more like an API sort of thing, so that the user has control over when the GPU is used.
>>
>> That's my present understanding.  Maybe things will change when AMD combines the GPU into the CPU.
> 
> You are right, but don't underplay the importance of GPU computation.   GPU's
> are becoming more and more powerful, so they will be able to handle more and more general purpose computation.  Modern games use GPU's for more than just rendering, and it has become a design goal of many game engines to transfer more of the workload to the GPU.  Some domains, such as scientific simulations, don't care about graphics at all and would rather use the GPU for computation exclusively.

Having written a couple of these in the past, I think they still only apply to specific problems where you are doing the same calculation loads of times, like physics, collision detection, or particle effects.  Also, you often end up doing more calculations on the GPU than you would on the CPU, but it works out because the GPU is still so much faster.  You can't traverse complex structures like deep binary trees very easily, at least.  The results take a frame to get back and are normally computed in a particular part of the frame cycle (unless you have two GPUs), so it's really very specific to the problem that is being solved.

Maybe things have changed since I last worked on this stuff. Man, the number of times I've had to re-learn DirectX.  I'm sure this thread will be out of date by the time I press the send button ;)

> 
> At any rate I think we should keep our eyes peeled for opportunities to leverage this capability.  I personally am trying to learn more about it myself.

I agree.

> 
> -Craig
> 
July 10, 2007
Lutger wrote:
> Craig Black wrote:
>> Another idea.  What if this could be done using the recently added mixin feature?  Then you could use a shading language directly rather than trying to convert D code to a shading language.  I've never used mixins, so I can't even think of how the syntax would look.  Is this a practical idea?
>>
>> -Craig
> 
> A solution like Blade* should be possible, especially when D gets some macro syntactic sugar, but somewhat involved probably.

It's certainly possible. It might not be too terrible, actually. I'm rewriting BLADE for the D conference; it now has syntax like:

import blade;

void main()
{
    real[] result; double[] b;
    alias b c;
    const x = 8.9884;
    real y = 234543;
    mixin(VEC("result+=b*3.5645 - x*43645*3124.543 +c[2..18]*y"));
}

By analyzing the complexity of the expression, it's possible to determine if
it can easily be done with SSE1, SSE2, or x87, and if not, just do it with straight D instead. This is nice, because it means you don't have to worry about  tricky cases. Using a mixin instead of operator overloading means you can detect aliasing. The same thing could be done for a GPU.
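
For the simplest case, the straight-D fallback amounts to nothing more than an element-wise loop, along these lines (a rough sketch, reduced to a single term):

// plain-D path for result += b * 3.5645 -- no SSE, no GPU
for (size_t i = 0; i < result.length; ++i)
    result[i] += b[i] * 3.5645;
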
In fact, the examples in NVIDIA's CUDA programming guide are perfect for BLADE, except that they aren't in C!

Do you know where I could find some examples of using a GPU to do calculations (using standard C)?
I think it would be easy to get BLADE to do it for a few simple cases.
July 10, 2007
On Mon, 09 Jul 2007 19:43:27 -0500, Craig Black wrote:

> Interesting syntax.  I like it, but in order to support GPU's there would have to be some way that the programmer specified GPU or CPU computation. Perhaps a new keyword or pragma?
> 
The purpose of the syntax is to have a hardware independent way to specify large array calculations.

If you specify in the code whether it should be CPU or GPU, then you have
lost this independence. So I think it should be automatic by the compiler,
or a compiler switch, so if you get new hardware you just recompile your
code.
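
Until the compiler can do that for you, you can at least mimic the switch part by hand with version blocks, something like this (everything here is made up, just to show the idea):

// build with -version=UseGPU to retarget; otherwise the plain CPU loop runs
void scale(float[] dst, float[] src, float k)
{
    version (UseGPU)
    {
        // hand dst/src over to some GPU back end here (left out, since
        // there is no standard API for it)
    }
    else
    {
        for (size_t i = 0; i < dst.length; ++i)
            dst[i] = src[i] * k;
    }
}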




> I'm fuzzy on the details of how this could actually be done on the GPU.  The compiler would have to somehow generate either high-level or low-level shading language routines for each statement.  That means each function called would also have to be converted to GPU code.
> 
> There would obviously be some restrictions as to what kinds of language features could be used when the GPU option is enabled.  For example, no function pointers, no heap allocations, etc, etc.  I'm sure there would end up being a lot.

Yes, that is why we need a special syntax and can not just use the foreach
statement.


> I suppose special floating point types could automatically be mapped.  For example the compiler could automatically convert the static array float[3] to the shading language type float3.
> 
> -Craig
> 
> "Knud Soerensen" <4tuu4k002@sneakemail.com> wrote in message news:f6uiqk$2qad$2@digitalmars.com...
>> Take a look on the vectorization suggestion on the wish list
>> http://all-technology.com/eigenpolls/dwishlist/index.php?it=10
>> this notation lets you specify vector calculation in such
>> a way that the compiler can optimize them and let them run on the GPU
>> if that is preferred.
July 10, 2007
Don Clugston wrote:
> Lutger wrote:
>> Craig Black wrote:
>>> Another idea.  What if this could be done using the recently added mixin feature?  Then you could use a shading language directly rather than trying to convert D code to a shading language.  I've never used mixins, so I can't even think of how the syntax would look.  Is this a practical idea?
>>>
>>> -Craig
>>
>> A solution like Blade* should be possible, especially when D gets some macro syntactic sugar, but somewhat involved probably.
> 
> It's certainly possible. It might not be too terrible, actually. I'm rewriting BLADE for the D conference; it now has syntax like:
> 
> import blade;
> 
> void main()
> {
> real result[]; double b[];
> alias b c;
> const x = 8.9884;
> real y = 234543;
> mixin(VEC("result+=b*3.5645 - x*43645*3124.543 +c[2..18]*y"));
> }
> 
> By analyzing the complexity of the expression, it's possible to determine if
> it can easily be done with SSE1, SSE2, or x87, and if not, just do it with straight D instead. This is nice, because it means you don't have to worry about  tricky cases. Using a mixin instead of operator overloading means you can detect aliasing. The same thing could be done for a GPU.
> In fact, the examples in NVIDIA's CUDA programming guide are perfect for BLADE, except that they aren't in C!
> 
> Do you know where I could find some examples of using a GPU to do calculations (using standard C)?
> I think it would be easy to get BLADE to do it for a few simple cases.

Have you looked at GPGPU.org?
They have forums there too.  I don't know if they're still active, but several of the pioneers of GPGPU used to hang out there.

But is it that hard to translate the core bits of the code samples back into C?

--bb
July 10, 2007
"Knud Soerensen" <4tuu4k002@sneakemail.com> wrote in message news:f6vb8r$2qad$4@digitalmars.com...
> On Mon, 09 Jul 2007 19:43:27 -0500, Craig Black wrote:
>
>> Interesting syntax.  I like it, but in order to support GPU's there would have to be some way that the programmer specified GPU or CPU computation. Perhaps a new keyword or pragma?
>>
> The purpose of the syntax is to have a hardware independent way to specify large array calculations.
>
> If you specify in the code whether it should be CPU or GPU, then you have
> lost this independence. So I think it should be automatic by the compiler,
> or a compiler switch, so if you get new hardware you just recompile your
> code.

I do not believe a compiler switch would provide enough granularity. The programmer should have complete control over what hardware is used for each vectorized operation.  It is good practice to load the CPU and GPU equally, and this can't be achieved without benchmarking the code first.  The compiler would be powerless to make the correct decision, and a compiler switch would make this balancing act very difficult.
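
For instance, something along these lines would give each operation its own choice of target (the names are invented, purely to illustrate the kind of granularity I mean):

enum Target { CPU, GPU }

// each vectorized operation says where it wants to run
void axpy(Target t, float[] y, float[] x, float a)
{
    if (t == Target.GPU)
    {
        // dispatch to a GPU back end here (not shown)
    }
    else
    {
        for (size_t i = 0; i < y.length; ++i)
            y[i] += a * x[i];
    }
}

Then you can benchmark and shuffle individual operations between the two until the load is balanced.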


>> I'm fuzzy on the details of how this could actually be done on the GPU.
>> The
>> compiler would have to somehow generate either high-level or low-level
>> shading language routines for each statement.  That means each function
>> called would also have to be converted to GPU code.
>>
>> There would obviously be some restrictions as to what kinds of language
>> features could be used when the GPU option is enabled.  For example, no
>> function pointers, no heap allocations, etc, etc.  I'm sure there would
>> end
>> up being a lot.
>
> Yes, that is why we need a special syntax and can not just use the foreach statement.

Yes, but even functions that are called in this special syntax would have to be checked to ensure that the rules are being obeyed.
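
For example, a harmless-looking helper like this couldn't be used, because it allocates on the heap (a made-up example):

float[] duplicatePoint(float v)
{
    auto a = new float[2];   // heap allocation: fine on the CPU, not in a shader
    a[0] = v;
    a[1] = v * 2;
    return a;
}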

-Craig


July 10, 2007
I don't know about standard C, but I can recommend the book GPU Gems 2. It's full of content.  A large portion of the book is dedicated to GPGPU stuff.

-Craig

"Don Clugston" <dac@nospam.com.au> wrote in message news:f6vaei$25hu$1@digitalmars.com...
> Lutger wrote:
>> Craig Black wrote:
>>> Another idea.  What if this could be done using the recently added mixin feature?  Then you could use a shading language directly rather than trying to convert D code to a shading language.  I've never used mixins, so I can't even think of how the syntax would look.  Is this a practical idea?
>>>
>>> -Craig
>>
>> A solution like Blade* should be possible, especially when D gets some macro syntactic sugar, but somewhat involved probably.
>
> It's certainly possible. It might not be too terrible, actually. I'm rewriting BLADE for the D conference; it now has syntax like:
>
> import blade;
>
> void main()
> {
> real result[]; double b[];
> alias b c;
> const x = 8.9884;
> real y = 234543;
> mixin(VEC("result+=b*3.5645 - x*43645*3124.543 +c[2..18]*y"));
> }
>
> By analyzing the complexity of the expression, it's possible to determine
> if
> it can easily be done with SSE1, SSE2, or x87, and if not, just do it with
> straight D instead. This is nice, because it means you don't have to worry
> about tricky cases. Using a mixin instead of operator overloading means
> you can detect aliasing. The same thing could be done for a GPU.
> In fact, the examples in NVIDIA's CUDA programming guide are perfect for
> BLADE, except that they aren't in C!
>
> Do you know where I could find some examples of using a GPU to do
> calculations (using standard C)?
> I think it would be easy to get BLADE to do it for a few simple cases.


July 10, 2007
"Craig Black" <craigblack2@cox.net> wrote in message news:f6p829$1l06$1@digitalmars.com...
<snip>
> The proposed "array operations" feature in D has been discussed a lot.  It is even mentioned in the "future directions" page on the D web site. However, I don't remember the details of the array operations feature.

Basically, the idea was to allow such things as

   int[] x, y, z;
   ....
   z = x + y;
   y = x * 3;

However, it was too ill-defined to implement, and this is one of the reasons it was withdrawn.  I proposed a rewrite of it a while ago:

http://www.digitalmars.com/d/archives/digitalmars/D/16647.html

> What are the design goals of this feature?

To enable vector arithmetic expressions to be intuitively notated.
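
That is, instead of having to spell the loops out by hand:

   // what z = x + y and y = x * 3 stand for (length checks omitted)
   for (size_t i = 0; i < z.length; ++i)
       z[i] = x[i] + y[i];
   for (size_t i = 0; i < y.length; ++i)
       y[i] = x[i] * 3;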

> To leverage multi-cores and SSE?  Are GPU's also a consideration?

That's another potential benefit that has been considered.

Stewart. 
