May 24, 2019
With linear interpolation you get roughly the same result with floats, and it's a little more efficient too (half the memory and a bit faster):

import std.math : sin, floor, PI;

// Quarter-wave table: 512 entries plus one extra so the interpolation can
// safely read QuarterSinTab[i+1] at the last index.
__gshared float[512+1] QuarterSinTab;

void init(){
    const int res = cast(int)QuarterSinTab.length - 1;   // 512
    for(int i = 0; i < res; i++)
        QuarterSinTab[i] = sin(PI*(0.5*i/cast(double)res));
    QuarterSinTab[$-1] = sin(PI*0.5);
}

// x is the phase in cycles; returns approximately sin(2*PI*x).
float fastQuarterLookup(float x){
    // 24-bit fixed-point phase from the fractional part of x.
    const uint mantissa = cast(uint)( (x - floor(x)) * (cast(float)(1U<<24)) );
    // Bit 23 selects the half cycle (sign), bit 22 mirrors within the half cycle.
    const float sign = cast(float)(1 - cast(int)((mantissa>>22)&2));
    const uint phase = (mantissa^((1U<<23)-((mantissa>>22)&1)))&((1U<<23)-1);
    // Top 9 bits of the quarter phase index the table, low 13 bits interpolate.
    const uint quarterphase = (phase>>13)&511;
    const float frac = cast(float)(phase&((1U<<13)-1))*cast(float)(1.0f/(1U<<13));
    return sign*((1.0f-frac)*QuarterSinTab[quarterphase] + frac*QuarterSinTab[quarterphase+1]);
}
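
A quick way to sanity-check the table against std.math.sin (a rough sketch; the probe count is arbitrary):

void main(){
    import std.math : sin, PI, fabs;
    import std.stdio : writefln;

    init();

    float maxErr = 0;
    foreach(i; 0 .. 1_000_000){
        const float x = i / 1_000_000.0f;   // phase in cycles
        const float err = fabs(fastQuarterLookup(x) - cast(float)sin(2*PI*x));
        if(err > maxErr) maxErr = err;
    }
    writefln("max abs error: %g", maxErr);
}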
May 24, 2019
On Friday, 24 May 2019 at 13:57:30 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 24 May 2019 at 12:24:02 UTC, Alex wrote:
>> If it truly is 27x faster then that is very relevant and knowing why is important.
>>
>> Of course, a lot of that might simply be due to LDC and I wasn't able to determine this.
>
> Just one more thing you really ought to consider:
>
> It isn't obvious that a LUT using double will be more precise than computing sin using single precision float.
>
> So when I use single precision float with ldc and "-O3 -ffast-math" then I get roughly 300ms. So in that case the LUT is only 3 times faster.
>
> Perhaps not worth it then. You might as well just use float instead of double.
>
> The downside is that -ffast-math makes all your computations more inaccurate.
>
> It is also possible that recent processors can do multiple sin/cos as simd.
>
> Too many options… I know.

The thing is, the LUT can have as much precision as one wants. One could even spend several days calculating it and then load it from disk.

I'm not sure what the real precision of the built-in functions is, but it shouldn't be hard to max out a double using standard methods (even if slow, that's irrelevant after the LUT has been created).
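
For example (a sketch; the table name and size are arbitrary): evaluate each entry in real (80-bit on x86) and only round once on the store, so each stored double loses as little as possible:

import std.math : sin, PI;

__gshared double[4096+1] preciseQuarterTab;

void initPrecise(){
    foreach(i; 0 .. preciseQuarterTab.length){
        // Evaluate in real precision, round to double only when storing.
        const real phase = (PI/2.0L) * i / cast(real)(preciseQuarterTab.length - 1);
        preciseQuarterTab[i] = cast(double) sin(phase);
    }
}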

Right now I'm just using the built-ins... Maybe later I'll get back around to all this and make some progress.

May 24, 2019
On Friday, 24 May 2019 at 17:04:33 UTC, Alex wrote:
> I'm not sure what the real precision of the built-in functions is, but it shouldn't be hard to max out a double using standard methods (even if slow, that's irrelevant after the LUT has been created).

LUTs are primarily useful when you use sin(x) as a signal or when a crude approximation is good enough.

One advantage of a LUT is that you can store a more complex computation than the basic function.  Like a filtered square wave.
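
For instance, filling a wavetable with a band-limited square wave by summing odd harmonics (a minimal sketch; table size and harmonic count are arbitrary):

import std.math : sin, PI;

enum tableLen = 2048;
__gshared float[tableLen] squareTab;

void initSquareTab(int harmonics = 15){
    foreach(i; 0 .. tableLen){
        const real phase = 2.0 * PI * i / tableLen;
        real acc = 0;
        // Fourier series of a square wave: odd harmonics with 1/n amplitude.
        for(int n = 1; n <= 2*harmonics - 1; n += 2)
            acc += sin(n * phase) / n;
        squareTab[i] = cast(float)(acc * 4.0 / PI);
    }
}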


May 25, 2019
On Friday, 24 May 2019 at 17:04:33 UTC, Alex wrote:
> On Friday, 24 May 2019 at 13:57:30 UTC, Ola Fosheim Grøstad wrote:
>> On Friday, 24 May 2019 at 12:24:02 UTC, Alex wrote:
>>> If it truly is 27x faster then that is very relevant and knowing why is important.
>>>
>>> Of course, a lot of that might simply be due to LDC and I wasn't able to determine this.
>>
>> Just one more thing you really ought to consider:
>>
>> It isn't obvious that a LUT using double will be more precise than computing sin using single precision float.
>>
>> So when I use single precision float with ldc and "-O3 -ffast-math" then I get roughly 300ms. So in that case the LUT is only 3 times faster.
>>
>> Perhaps not worth it then. You might as well just use float instead of double.
>>
>> The downside is that -ffast-math makes all your computations more inaccurate.
>>
>> It is also possible that recent processors can do multiple sin/cos as simd.
>>
>> Too many options… I know.
>
> The thing is, the LUT can have as much precision as one wants. One could even spend several days calculating it then loading it from disk.

With linear interpolation of what is effectively a discrete-time signal, you get an extra 12 dB of signal-to-noise ratio each time you 2x oversample the input. So basically, if you start with a LUT of 4 samples, that should give you about 8 dB at baseline, and each time you double the length you get an extra 12 dB.

Or in simpler terms: double the table length and you get 2 bits less error. So a table size of 8192 should give you approximately 24 bits clear of errors.

But be aware that the error is relative to the maximum magnitude of the signal in the table.
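
A rough way to check the 2-bits-per-doubling rule is to measure the worst-case error of linearly interpolating sin from tables of increasing size (a sketch; the probe density is arbitrary):

import std.math : sin, PI, fabs, log2;
import std.stdio : writefln;

// Worst-case error of linearly interpolating sin(2*PI*x) from a table of n entries.
double maxInterpError(size_t n){
    auto tab = new double[n + 1];
    foreach(i; 0 .. n + 1)
        tab[i] = sin(2.0 * PI * i / n);

    double maxErr = 0;
    enum probes = 1 << 20;
    foreach(k; 0 .. probes){
        const double x = cast(double)k / probes;   // phase in cycles
        const double pos = x * n;
        const size_t idx = cast(size_t)pos;
        const double frac = pos - idx;
        const double approx = (1 - frac)*tab[idx] + frac*tab[idx + 1];
        const double err = fabs(approx - sin(2.0 * PI * x));
        if(err > maxErr) maxErr = err;
    }
    return maxErr;
}

void main(){
    foreach(bits; 2 .. 14)
        writefln("n = %5d  ->  ~%.1f clean bits", 1 << bits, -log2(maxInterpError(1 << bits)));
}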


May 25, 2019
On Friday, 24 May 2019 at 17:40:40 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 24 May 2019 at 17:04:33 UTC, Alex wrote:
>> I'm not sure what the real precision of the built-in functions is, but it shouldn't be hard to max out a double using standard methods (even if slow, that's irrelevant after the LUT has been created).
>
> LUTs are primarily useful when you use sin(x) as a signal or when a crude approximation is good enough.
>
> One advantage of a LUT is that you can store a more complex computation than the basic function.  Like a filtered square wave.

It's a pretty common technique in audio synthesis. What I've done in the past is store a table of polynomial segments that were optimised with curve fitting. It's a bit of extra work to calculate the waveform, but it actually works out faster than having huge LUTs, since you're typically only producing maybe 100 samples in each interrupt callback, so it gets pretty likely that your LUT is pushed into slower cache memory in between calls to generate the audio.
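
For illustration, evaluating such a table of cubic segments might look roughly like this (a sketch; the segment count, coefficient layout and the offline fitting step are assumptions):

// One cubic segment: y = ((c3*t + c2)*t + c1)*t + c0, with t in [0, 1).
struct Segment { float c0, c1, c2, c3; }

enum numSegments = 64;                  // assumed; a power of two here
__gshared Segment[numSegments] segTab;  // filled offline by the curve fitter

// phase is in cycles [0, 1).
float evalSegments(float phase){
    const float pos = phase * numSegments;
    const uint idx = cast(uint)pos;
    const float t = pos - idx;                          // position within the segment
    const Segment s = segTab[idx & (numSegments - 1)];  // wrap as a safety net
    return ((s.c3*t + s.c2)*t + s.c1)*t + s.c0;         // Horner evaluation
}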
May 25, 2019
On Saturday, 25 May 2019 at 09:04:31 UTC, NaN wrote:
> It's a pretty common technique in audio synthesis.

Indeed. CSound does this.

> What I've done in the past is store a table of polynomial segments that were optimised with curve fitting.

That's an interesting solution. How do you avoid higher-order discontinuities between segments? Crossfading? Or maybe it wasn't audible.


May 25, 2019
On Saturday, 25 May 2019 at 09:04:31 UTC, NaN wrote:
> It's a bit of extra work to calculate the waveform, but it actually works out faster than having huge LUTs, since you're typically only producing maybe 100 samples in each interrupt callback

Another hybrid option when filling a buffer might be to fill two buffers with an approximation that is crossfaded between the end-points.

A bit tricky, but say you have a Taylor polynomial (or even a recurrence relation) that is "correct" near the beginning of the buffer, and another one that is correct near the end of the buffer.

Then you could fill two buffers from each end and crossfade between the buffers. You might get some phase-cancellation errors and phasing-like distortion, though.
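
A rough sketch of that idea (the names and the linear blend are just placeholders; startApprox/endApprox stand in for whatever local approximation is used at each end):

// Fill outBuf by crossfading two cheap local approximations: one accurate
// near the start of the buffer, one accurate near the end.
void fillCrossfaded(float[] outBuf,
                    float delegate(size_t i) startApprox,  // accurate near i = 0
                    float delegate(size_t i) endApprox)    // accurate near the last index
{
    const size_t n = outBuf.length;
    if(n < 2) return;
    foreach(i; 0 .. n){
        const float w = cast(float)i / (n - 1);  // linear blend; equal-power is an alternative
        outBuf[i] = (1.0f - w)*startApprox(i) + w*endApprox(i);
    }
}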

May 25, 2019
On Saturday, 25 May 2019 at 09:52:22 UTC, Ola Fosheim Grøstad wrote:
> On Saturday, 25 May 2019 at 09:04:31 UTC, NaN wrote:
>> It's a pretty common technique in audio synthesis.
>
> Indeed. CSound does this.
>
>> What I've done in the past is store a table of polynomial segments that were optimised with curve fitting.
>
> That's an interesting solution. How do you avoid higher-order discontinuities between segments? Crossfading? Or maybe it wasn't audible.

I used an evolutionary optimisation algorithm on the table all at once. So you do a weighted sum of the max deviation and the 1st- and 2nd-order discontinuities at the joins, and minimise that across the table as a whole. It seemed you could massively overweight the discontinuities without really affecting the curve fitting that much. So perfect joins only cost a little extra deviation in the fit of the polynomial.
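
For illustration, the cost being minimised might look something like this (a sketch with made-up weights; the sampling density and delegate signatures are assumptions):

// Cost over the whole table: worst-case fit error plus heavily weighted
// penalties on value/slope/curvature mismatches at the segment joins.
// poly(i, t) evaluates segment i at local position t in [0, 1];
// dpoly and ddpoly are its 1st and 2nd derivatives; target(x) is the
// waveform being approximated, with x the global position in [0, 1).
double tableCost(size_t numSegs,
                 double delegate(size_t, double) poly,
                 double delegate(size_t, double) dpoly,
                 double delegate(size_t, double) ddpoly,
                 double delegate(double) target)
{
    import std.math : fabs;

    double maxDev = 0;
    enum samplesPerSeg = 64;  // arbitrary sampling density
    foreach(i; 0 .. numSegs)
        foreach(k; 0 .. samplesPerSeg){
            const double t = cast(double)k / samplesPerSeg;
            const double dev = fabs(poly(i, t) - target((i + t) / numSegs));
            if(dev > maxDev) maxDev = dev;
        }

    double joinPenalty = 0;
    foreach(i; 0 .. numSegs - 1)
        joinPenalty += fabs(poly(i, 1.0)  - poly(i + 1, 0.0))
                     + fabs(dpoly(i, 1.0) - dpoly(i + 1, 0.0))
                     + fabs(ddpoly(i, 1.0) - ddpoly(i + 1, 0.0));

    enum joinWeight = 1000.0;  // made-up; the point is it can be very large
    return maxDev + joinWeight*joinPenalty;
}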


May 25, 2019
On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
> I am trying to create some fast sin, sinc, and exponential routines to speed up some code by using tables... but it seems it's slower than the function itself?!?
>
> [...]

I'll just leave this here:

https://www.dannyarends.nl/Using%20CTFE%20in%20D%20to%20speed%20up%20Sine%20and%20Cosine?
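
The gist of the CTFE approach, as a minimal sketch (not the article's code, and it assumes std.math.sin can be evaluated in CTFE):

import std.math : sin, PI;

// Build the table with an ordinary function...
float[] genSinTab(size_t n){
    auto tab = new float[n + 1];
    foreach(i; 0 .. n + 1)
        tab[i] = cast(float) sin(2.0 * PI * i / n);
    return tab;
}

// ...and force it to run at compile time by using it as an initializer,
// so the finished table is baked into the binary.
static immutable float[] ctfeSinTab = genSinTab(1024);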

May 25, 2019
On Saturday, 25 May 2019 at 12:51:20 UTC, NaN wrote:
> I used an evolutionary optimisation algorithm on the table all at once. So you do a weighted sum of the max deviation and the 1st- and 2nd-order discontinuities at the joins, and minimise that across the table as a whole. It seemed you could massively overweight the discontinuities without really affecting the curve fitting that much. So perfect joins only cost a little extra deviation in the fit of the polynomial.

Wow, it is pretty cool that you managed to minimize 2nd order discontinuity like that! Is there a paper describing the optimisation algorithm you used?