November 29, 2019
On Friday, 29 November 2019 at 15:56:19 UTC, Jab wrote:
> On Friday, 29 November 2019 at 10:12:32 UTC, Ethan wrote:
>> On Friday, 29 November 2019 at 05:16:08 UTC, Jab wrote:
>>> IIRC GPUs are limited in what they can do in parallel, so if you only need to do one thing for a specific job the rest of the GPU isn't really being fully utilized.
>>
>> Yeah, that's not how GPUs work. They have a number of shader units that execute on outputs in parallel. It used to be an explicit split between vertex and pixel pipelines in the early days, where it was very easy to underutilise the vertex pipeline. But shader units have been unified for a long time. Queue up a bunch of outputs and the driver and hardware will schedule it properly.
>
> Wasn't really talking about different processes such as that. You can't run 1000s of different kernels in parallel. In graphical terms it'd be like queuing up a command to draw a single pixel. While that single pixel is drawing, the rest of the stack allocated to that kernel would be idle. It wouldn't be able to be utilized by another kernel.

That's not necessarily true anymore. Some more recent GPU architectures have been extended so that different cores can run different kernels. I haven't looked at the details because I haven't had a need for these features yet, but they're exposed and documented.
November 29, 2019
On Friday, 29 November 2019 at 16:40:01 UTC, Gregor Mückl wrote:
> This presentation is of course a simplification of what is going on in a GPU, but it gets the core idea across. AMD and nVidia do have a lot of documentation that goes into some more detail, but at some point you're going to hit a wall.

I think it is a bit interesting that Intel was pushing their Phi solution (many Pentium cores) but doesn't seem to have updated it recently. So I wonder if they will be pushing more independent GPU cores on-die (on the CPU chip). It would make sense for them to build one architecture that can cover many market segments.

> The convolutions for aurealization are done in the frequency domain. Room impulse responses are quite long (up to several seconds), so time-domain convolution is barely feasible even offline. The only feasible way is to use the convolution theorem, transform everything into frequency space, multiply it there, and transform things back...

I remember reading a paper in the mid-90s about casting rays into a 3D model to estimate an acoustic model of the room. I assume they didn't do it in real time.

I guess you could create a psychoacoustic parametric model that works in the time domain... it wouldn't be very accurate, but I wonder if it could still be effective. It is not like Hollywood movies have accurate sound... We have optical illusions for the visual system, but there are also auditory illusions for hearing, e.g. Shepard tones that ascend forever. I've heard the same has been done with the motion of sound, by morphing the phase of a sound across speakers placed at an exact distance from each other, so that the sound seems to move to the left forever. I find such things kinda neat... :)

Some electroacoustic composers explore this field; I think it is called spatialization/diffusion? I viewed one of your videos and the phasing reminded me a bit of how these composers work. I don't have access to my record collection right now, but there are some soundtracks that are surprisingly spatial. Kind of like audio versions of non-photorealistic rendering techniques. :-) The only one I can remember right now is Utility of Space by N. Barrett (unfortunately a short clip):
https://electrocd.com/en/album/2322/Natasha_Barrett/Isostasie

> There are a lot of pitfalls. I'm doing all of the convolution on the CPU because the output buffer is read from main memory by the sound hardware. Audio buffer updates are not in lockstep with screen refreshes, so you can't reliably copy the next audio frame to the GPU, convolve it there and read it back in time, because the GPU is on its own schedule.

Right, why can't audio buffers be supported in the same way as screen buffers? Anyway, if Intel decides to integrate GPU cores and CPU cores more tightly, then... maybe. Unfortunately, Intel tends to focus on making existing apps run faster, not on enabling the next big thing.

> Perceptually, it seems that you can get away with a fairly low update rate for the reverb in many cases.

If the sound sources are at a distance, then there should be some time to work it out? I haven't actually thought very hard about that... You could also treat early and late reflections separately (like in a classic reverb).

I wonder, though, if it actually has to be physically correct, because it seems to me that Hollywood movies can create more intense experiences by breaking physical rules. But the problem is coming up with a good psychoacoustic model, I guess. So in a way, going with the physical model is easier... it's easier to evaluate, anyway.

> And those pesky graphics programmers want every ounce of GPU performance all to themselves and never share! ;)

Yes, but maybe the current focus on neural networks will make hardware vendors focus on reducing latency and thus improve the situation for audio as well. That is my prediction, but I could be very wrong. Maybe they will just insist on making completely separate coprocessors for NNs.

November 30, 2019
On Friday, 29 November 2019 at 13:27:17 UTC, Gregor Mückl wrote:
> A complete wall of text that missed the point entirely.

Wow.

Well. I said it would need to be thorough; I didn't say it would need to be filled with lots of irrelevant facts to hide the fact that you couldn't give a thorough answer to most things.

1 and 2 can both be answered with "a method of hidden surface removal." A more detailed explanation of 1 is "a method of hidden surface removal using a scalar buffer representing the distance of an object from the viewpoint", whereas 2 is "a method of hidden surface removal using a set of planes or a matrix to discard non-visible objects". Orthographic is a projectionless frustum, i.e. nothing is distorted based on distance and there is no field of view. Given your ranting about how hard clipping 2D surfaces is, the fact that you didn't tie these questions together speaks volumes.
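
(For anyone following along, a minimal sketch of the plane-based variant. The Plane/Sphere structs and the function name are made up for illustration, not lifted from any particular engine:)

struct Plane  { float nx, ny, nz, d; };   // plane: n.p + d = 0, normal pointing into the frustum
struct Sphere { float cx, cy, cz, r; };   // object's bounding sphere

// Discard an object before it ever reaches rasterization: if its bounding
// sphere lies entirely outside any one of the six frustum planes, it cannot
// be visible.
bool insideFrustum(const Plane (&planes)[6], const Sphere& s)
{
    for (const Plane& p : planes)
    {
        float dist = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (dist < -s.r)
            return false; // fully outside this plane
    }
    return true; // inside or straddling the frustum
}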

3, it's a simplistic understanding at best. Paint calls are no longer based on whether a region of the screen buffer needs to be filled; they're issued for each control that the compositor handles whenever that control is dirty.

4 entirely misses the point. Entirely. ImGui retains state behind the scenes, and *then* decides how best to batch that up for rendering. The advantage of using the API is that you don't need to keep state yourself, and zero data is required from disk to lay out your UI.

5, pathetic. The thorough answer is "determine the distance of your output pixel from the line and emit a colour accordingly." Which, consequently, is exactly how you'd handle filling regions: your line has a direction from which you can derive positive and negative space. No specific curve was asked for. But especially rich is that the article you linked provides an example of how to render text on the GPU.

(Anyone actually reading: You'd use this methodology these days to build a distance field atlas of glyphs that you'd then use to render strings of text. Any game you see with fantastic quality text these days uses this. Its application in the desktop space is that you don't necessarily need to re-render your glyph atlas for zooming text or different font sizes. But as others point out: each operating system has its own text rendering engine that gives distinctive output even with the same typefaces, so while you could homebrew it like this, you'd ideally want to let the OS render your text and carry on from there.)
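
(And to make the distance idea concrete, here's a tiny CPU-side sketch of the per-pixel logic, using an analytic distance to a line segment instead of a glyph atlas lookup; a fragment shader does the same per pixel, and with an SDF atlas the distance comes from a texture sample. The function names are just for illustration:)

#include <algorithm>
#include <cmath>

// Unsigned distance from point (px, py) to the segment (ax, ay)-(bx, by).
float distanceToSegment(float px, float py, float ax, float ay, float bx, float by)
{
    float abx = bx - ax, aby = by - ay;
    float apx = px - ax, apy = py - ay;
    float t = (apx * abx + apy * aby) / (abx * abx + aby * aby);
    t = std::max(0.0f, std::min(1.0f, t));               // clamp onto the segment
    float dx = px - (ax + t * abx), dy = py - (ay + t * aby);
    return std::sqrt(dx * dx + dy * dy);
}

// "Emit a colour accordingly": full coverage inside the stroke, then a
// smooth one-pixel falloff past the edge for antialiasing.
float strokeCoverage(float dist, float halfWidth)
{
    float t = std::max(0.0f, std::min(1.0f, dist - halfWidth));
    float s = t * t * (3.0f - 2.0f * t);                  // smoothstep
    return 1.0f - s;                                      // 1 inside, 0 outside
}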

So, short story: if I wanted a bunch of barely relevant facts, I'd read Wikipedia. If I wanted someone with a thorough understanding of rendering technology and how to apply it to a desktop environment, you'd be well down towards the bottom of the list.
November 30, 2019
On Friday, 29 November 2019 at 23:55:55 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 29 November 2019 at 16:40:01 UTC, Gregor Mückl wrote:
>> This presentation is of course a simplification of what is going on in a GPU, but it gets the core idea across. AMD and nVidia do have a lot of documentation that goes into some more detail, but at some point you're going to hit a wall.
>
> I think it is a bit interesting that Intel was pushing their Phi solution (many Pentium cores) but doesn't seem to have updated it recently. So I wonder if they will be pushing more independent GPU cores on-die (on the CPU chip). It would make sense for them to build one architecture that can cover many market segments.

Yep, they are; it seems that Intel Xe will succeed Phi:

https://en.wikipedia.org/wiki/Intel_Xe

But probably only for special purposes and enthusiasts, so not very relevant for UI. One interesting point regarding UI is that the Linux drivers for AMD APUs support zero-copy using the unified GPU/CPU memory:

https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture#Software_support

Intel on-die GPUs, on the other hand, seem to require copying, but it isn't quite clear to me whether that will be faster when the data is in cache or not... I suspect it will have to flush first... :-/

Another interesting fact is that NVIDIA has shown interest in RISC-V:

https://abopen.com/news/nvidia-turns-to-risc-v-for-rc18-research-chip-io-core/

November 30, 2019
On Saturday, 30 November 2019 at 10:12:42 UTC, Ethan wrote:
> (Anyone actually reading: You'd use this methodology these days to build a distance field atlas of glyphs that you'd then use to render strings of text. Any game you see with fantastic quality text these days uses this. Its application in the desktop space is that you don't necessarily need to re-render your glyph atlas for zooming text or different font sizes.

Distance fields are OK for VR/AR applications, but they are not accurate enough and waste way too many resources for a generic UI that covers all platforms and use cases.

This is where the game-engine mindset goes wrong: a generic portable UI framework should leave as many resources as possible for the application (CPU/GPU/memory/power). You don't need to do real-time scaling at a high frame rate in a generic UI.

So, even if you can get to a game-engine-like solution that only uses 5% of the resources on a high-end computer, that still translates to eating up 50% of the resources on the low end. Which is unacceptable. A UI framework has to work equally well on devices that are nowhere near being able to run games...


November 30, 2019
On Saturday, 30 November 2019 at 11:13:42 UTC, Ola Fosheim Grøstad wrote:
> Another wall of text

Amazing. Every word you just

Ah, why am I even bothering? You've proven many times over that you love the sound of your own voice and nothing else. You didn't even read the section of my post you quoted correctly, for starters.
November 30, 2019
On Saturday, 30 November 2019 at 10:12:42 UTC, Ethan wrote:
> 4 entirely misses the point. Entirely. ImGui retains state behind the scenes, and *then* decides how best to batch that up for rendering. The advantage of using the API is that you don't need to keep state yourself, and zero data is required from disk to lay out your UI.
>

I described the actual Vulkan implementation of ImGui rendering based on its source code. And it does exactly what you just said: batching! I even included reasons for why it does things the way it does. It's straightforward. Go check the code yourself if you don't believe me.
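
For reference, this is roughly what the immediate mode side looks like from the application's point of view (a minimal sketch using Dear ImGui's core API; platform/renderer backend setup and the backend-specific call that actually submits the draw data are omitted, and the little mixer window is just an invented example):

#include "imgui.h"

// Called once per frame, after the backend's own NewFrame calls.
void buildUi(float* volume, bool* muted)
{
    ImGui::NewFrame();

    // No widget objects, no retained layout tree on the application side:
    // the UI is simply described every frame.
    ImGui::Begin("Mixer");
    ImGui::SliderFloat("Volume", volume, 0.0f, 1.0f);
    if (ImGui::Button(*muted ? "Unmute" : "Mute"))
        *muted = !*muted;
    ImGui::End();

    // Internally ImGui keeps the state it needs and packs everything into
    // vertex/index buffers; GetDrawData() is what the Vulkan (or any other)
    // backend then turns into a handful of draw calls.
    ImGui::Render();
    ImDrawData* drawData = ImGui::GetDrawData();
    (void)drawData; // handed to the renderer backend in real code
}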

> 5, pathetic. The thorough answer is "determine the distance of your output pixel from the line and emit a colour accordingly." Which, consequently, is exactly how you'd handle filling regions: your line has a direction from which you can derive positive and negative space. No specific curve was asked for. But especially rich is that the article you linked provides an example of how to render text on the GPU.
>
> (Anyone actually reading: You'd use this methodology these days to build a distance field atlas of glyphs that you'd then use to render strings of text. Any game you see with fantastic quality text these days uses this. Its application in the desktop space is that you don't necessarily need to re-render your glyph atlas for zooming text or different font sizes. But as others point out: each operating system has its own text rendering engine that gives distinctive output even with the same typefaces, so while you could homebrew it like this, you'd ideally want to let the OS render your text and carry on from there.)
>

I didn't point anybody to distance field based text rendering because it doesn't handle a few things that desktop graphics care about. The main things are font hinting (which depends on font size in relation to screen resolution, so glyphs change *shape* when scaling under hinting to retain readability) and text shaping, so that text glyphs change shape depending on the neighboring glyphs. Text shaping is mandatory for Arabic script, for example. Also, distance field based text rendering is prone to artifacts under magnification: texture interpolation on the distance field texture causes sharp edges to be rounded off (that's even described in the original Valve paper!).

I'll stop this discussion with you here, Ethan. This is becoming unhealthy. We need to take a step back from this.

November 30, 2019
On Saturday, 30 November 2019 at 12:44:51 UTC, Ethan wrote:
> On Saturday, 30 November 2019 at 11:13:42 UTC, Ola Fosheim Grøstad wrote:
>> Another wall of text
>
> Amazing. Every word you just
>
> Ah, why am I even bothering? You've proven many times over that you love the sound of your own voice and nothing else. You didn't even read the section of my post you quoted correctly, for starters.

I mean, you're being pretty condescending and keep assuming people's knowledge of GPUs is 20 years old for some reason. I get that impression from you: you just want to hear your own voice.
November 30, 2019
On Friday, 29 November 2019 at 23:55:55 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 29 November 2019 at 16:40:01 UTC, Gregor Mückl wrote:
>> This presentation is of course a simplification of what is going on in a GPU, but it gets the core idea across. AMD and nVidia do have a lot of documentation that goes into some more detail, but at some point you're going to hit a wall.
>
> I think it is a bit interesting that Intel was pushing their Phi solution (many Pentium cores) but doesn't seem to have updated it recently. So I wonder if they will be pushing more independent GPU cores on-die (on the CPU chip). It would make sense for them to build one architecture that can cover many market segments.
>

Intel Xe is supposed to be a dedicated GPU. I expect a radical departure from their x86 cores and their previous Xeon Phi chips that used a reduced x86 instruction set. Any successor to that needs more cores, but these can be a lot simpler.

>> The convolutions for aurealization are done in the frequency domain. Room impulse responses are quite long (up to several seconds), so time-domain convolution is barely feasible even offline. The only feasible way is to use the convolution theorem, transform everything into frequency space, multiply it there, and transform things back...
>
> I remember reading a paper in the mid-90s about casting rays into a 3D model to estimate an acoustic model of the room. I assume they didn't do it in real time.
>

Back in the 90s they probably didn't. But this is slowly becoming feasible. See e.g.

https://www.oculus.com/blog/simulating-dynamic-soundscapes-at-facebook-reality-labs/

This was released as part of the Oculus Audio SDK earlier this year.

> I guess you could create a psychoacoustic parametric model that works in the time domain... it wouldn't be very accurate, but I wonder if it could still be effective. It is not like Hollywood movies have accurate sound... We have optical illusions for the visual system, but there are also auditory illusions for hearing, e.g. Shepard tones that ascend forever. I've heard the same has been done with the motion of sound, by morphing the phase of a sound across speakers placed at an exact distance from each other, so that the sound seems to move to the left forever. I find such things kinda neat... :)
>

I think what you're getting at is filter chains that emulate reverb but stay in the time domain. The canonical artificial reverb is the Schroeder reverberator. However, you still need a target RT60 to get the correct reverb tail length.

You can try to derive that time in various ways. Path tracing is one. Maybe you could get away with an estimated reverb time based on the Sabine equation. I've never tried. Microsoft Research is working on an approach that precomputes wave propagation using FDTD and resorts to runtime lookup of the results.
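
Just to sketch what I mean by that (untested, and the helper names are invented): the Sabine equation estimates RT60 from the room volume and total absorption, and that estimate can then set the feedback gain of a Schroeder-style comb filter so that its tail decays by 60 dB over that time:

#include <cmath>
#include <cstddef>
#include <vector>

// Sabine estimate: RT60 ~ 0.161 * V / A, with V the room volume in m^3 and A
// the total equivalent absorption area (surface area times absorption
// coefficient, summed over all surfaces) in m^2.
double sabineRt60(double volumeM3, double absorptionAreaM2)
{
    return 0.161 * volumeM3 / absorptionAreaM2;
}

// Feedback gain for a comb filter with the given delay so that its impulse
// response has decayed by 60 dB after rt60 seconds: g = 10^(-3 * delay / rt60).
double combGainForRt60(double delaySeconds, double rt60Seconds)
{
    return std::pow(10.0, -3.0 * delaySeconds / rt60Seconds);
}

// One feedback comb filter, the basic building block of a Schroeder
// reverberator (a full one runs several combs in parallel followed by
// allpass filters).
struct CombFilter
{
    std::vector<float> buffer;
    std::size_t pos = 0;
    float gain = 0.0f;

    CombFilter(std::size_t delaySamples, float g) : buffer(delaySamples, 0.0f), gain(g) {}

    float process(float input)
    {
        float delayed = buffer[pos];
        buffer[pos] = input + gain * delayed;   // feed the decayed output back in
        pos = (pos + 1) % buffer.size();
        return delayed;
    }
};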

> Some electroacoustic composers explore this field; I think it is called spatialization/diffusion? I viewed one of your videos and the phasing reminded me a bit of how these composers work. I don't have access to my record collection right now, but there are some soundtracks that are surprisingly spatial. Kind of like audio versions of non-photorealistic rendering techniques. :-) The only one I can remember right now is Utility of Space by N. Barrett (unfortunately a short clip):
> https://electrocd.com/en/album/2322/Natasha_Barrett/Isostasie
>

Spatialization is something slightly different. It refers to creating the illusion that a sound originates from a specific point or volume in space. That's surprisingly hard to get right and it's an active area of research.

That track is interesting. I don't remember encountering any other purely artistic use of audio spatialization.

>> There are a lot of pitfalls. I'm doing all of the convolution on the CPU because the output buffer is read from main memory by the sound hardware. Audio buffer updates are not in lockstep with screen refreshes, so you can't reliably copy the next audio frame to the GPU, convolve it there and read it back in time, because the GPU is on its own schedule.
>
> Right, why can't audio buffers be supported in the same way as screen buffers? Anyway, if Intel decides to integrate GPU cores and CPU cores more tightly, then... maybe. Unfortunately, Intel tends to focus on making existing apps run faster, not on enabling the next big thing.
>

A GPU in compute mode doesn't really care about the semantics of the data in the buffers it gets handed. FIR filters should map fine to GPU computing, IIR filters not so much. So, depending on the workload, GPUs can do just fine.
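
To spell out why (a toy CPU-side sketch, the function names are invented, but it's the data dependencies that matter): every FIR output sample depends only on the input, so the outputs can be computed in parallel across GPU threads, whereas an IIR filter feeds its previous output back in, which serializes the whole computation:

#include <cstddef>
#include <vector>

// FIR: y[n] = sum_k h[k] * x[n - k]. Every y[n] depends only on x, so the
// outer loop can be split across GPU threads (or SIMD lanes, OpenMP, ...).
std::vector<float> fir(const std::vector<float>& x, const std::vector<float>& h)
{
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n)              // independent iterations
        for (std::size_t k = 0; k < h.size() && k <= n; ++k)
            y[n] += h[k] * x[n - k];
    return y;
}

// IIR (one-pole): y[n] = x[n] + a * y[n - 1]. Each output needs the previous
// one, so the loop carries a dependency and can't be parallelized naively.
std::vector<float> onePole(const std::vector<float>& x, float a)
{
    std::vector<float> y(x.size(), 0.0f);
    float prev = 0.0f;
    for (std::size_t n = 0; n < x.size(); ++n)
    {
        prev = x[n] + a * prev;
        y[n] = prev;
    }
    return y;
}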

The real problem is one of keeping different sets of deadlines in a realtime system. Graphics imposes one set (the screen refresh rate) and audio imposes another (audio output buffer update rate). The GPU is usually lagging behind the CPU rather than in perfect lockstep and it's typically under high utilization, so it won't have appropriate open timeslots to meet other deadlines in most situations.

>> Perceptually, it seems that you can get away with a fairly low update rate for the reverb in many cases.
>
> If the sound sources are at a distance, then there should be some time to work it out? I haven't actually thought very hard about that... You could also treat early and late reflections separately (like in a classic reverb).
>

Early and late reverb need to be treated separately for perceptual reasons. The crux that I didn't mention previously is that you need an initial reverb ready as soon as a sound source starts playing. That can be a problem with low update rates in games where sound sources come and go quite often.

> I wonder, though, if it actually has to be physically correct, because it seems to me that Hollywood movies can create more intense experiences by breaking physical rules. But the problem is coming up with a good psychoacoustic model, I guess. So in a way, going with the physical model is easier... it's easier to evaluate, anyway.
>

I'm really taking a hint from graphics here: animation studios started to use PBR (that is, path tracing with physically plausible materials), the same as VFX houses that do photorealistic effects. They want it that way because then the default is correctness. They can always stylize later.

If you're interested, we can take this discussion offline. This thread is the wrong place for this.
November 30, 2019
On Friday, 29 November 2019 at 16:40:01 UTC, Gregor Mückl wrote:
>
> The OpenGL part of my method is for actually propagating sound through the scene and computing the impulse response from that. That is typically so expensive that it's also run asynchronously to the audio processing and mixing. Only the final impulse response is moved to the audio processing thread. Perceptually, it seems that you can get away with a fairly low update rate for the reverb in many cases.


Very interesting; early reflections are hard. I'm interested to hear your results!