result of FFT

Hi,

I'm writing a program to decode DTMF tones. I've already completed the wave-file decoder, and I know I'm supposed to use an FFT to transform the samples from the time domain to the frequency domain, but I'm stuck on determining which of the DTMF frequencies are present.

Consider the following; the input is the sound-wave samples as an array of floats between -1 and +1.

```d
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;

	size_t fft_size = 4096;

	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		// res is an array of 4096 complex numbers
	}
}
```

I can't figure out how the 4096 results of the FFT relate to the frequencies in the input.

I tried taking the magnitude of each element, I tried taking just the real or imaginary parts. I plotted them but the graph doesn't look how I'm expecting.

What do the 4096 resulting complex numbers represent?
How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

Thanks,
Matthew

4 days ago

On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
> I can't figure out how the 4096 results of the FFT relate to the frequencies in the input.
>
> I tried taking the magnitude of each element,

That's correct!

> What do the 4096 resulting complex numbers represent?

The magnitude of each element (computed with std.complex.abs) corresponds to the amplitude of that frequency component; the angle in the complex plane (computed with std.complex.arg, in radians) represents the phase.

The frequencies are all relative to the FFT window. res[0] is 0 Hz, res[1] corresponds to a sine wave that fits 1 cycle inside your window, res[2] is 2 cycles, etc. The frequency in Hz depends on your sample rate: each bin spans sample_rate / fft_size Hz. If the sample rate is 44100, then 44100 / 4096 ≈ 10.77, so res[1] is about 10.8 Hz, res[2] about 21.5 Hz, and so on, up to res[4095] at roughly 44089 Hz. Everything from res[2048] onwards is just a mirrored copy, though, since 44100 samples/s can only capture frequencies up to 22.05 kHz (for more info, search for 'Nyquist frequency' and 'aliasing').
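A quick way to sanity-check this mapping (Python/NumPy rather than D, purely for illustration): synthesize a tone at one of the DTMF frequencies and confirm the magnitude peak lands in the predicted bin.

```python
import numpy as np

fs, N = 44100, 4096
t = np.arange(N) / fs
tone = np.sin(2 * np.pi * 1336 * t)   # pure 1336 Hz sine

mag = np.abs(np.fft.fft(tone))        # magnitude spectrum
peak = int(np.argmax(mag[:N // 2]))   # search only below the Nyquist bin

print(peak)                  # 124
print(round(1336 * N / fs))  # 124, i.e. the predicted bin
```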

> How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

The closest bucket to 1209 Hz is 1209 * (4096 / 44100) = 112.3, which is not an exact match, so the tone will leak into all the other bins. It will still mostly contribute to bins 112 and 113, though, so it's probably good enough to just check those. If you need better frequency resolution, you can apply a 'window function' to reduce spectral leakage, or increase the window size, either by including more samples (reducing the time resolution) or by padding the window with 0's, which essentially adds interpolated bins.
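The same arithmetic gives the nearest bin for each of the eight DTMF frequencies (Python here just for illustration; the only thing that matters is the formula round(f * fft_size / sample_rate)):

```python
fs, N = 44100, 4096  # sample rate and FFT window size

dtmf = [697, 770, 852, 941, 1209, 1336, 1477, 1633]
bins = {f: round(f * N / fs) for f in dtmf}

print(bins)
# {697: 65, 770: 72, 852: 79, 941: 87, 1209: 112, 1336: 124, 1477: 137, 1633: 152}
```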

I haven't programmed pitch detection myself yet (still on my todo list!) so I don't know how much of this is needed for your use case, but you can just start by checking the closest bin and see how far you get. Good luck!
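If leakage does turn out to matter for noisy input, the window function mentioned above is cheap to try. A sketch (Python/NumPy for illustration, using a Hann window):

```python
import numpy as np

fs, N = 44100, 4096
t = np.arange(N) / fs
tone = np.sin(2 * np.pi * 1209 * t)  # 1209 Hz sits between bins 112 and 113

# Hann window: tapers the chunk to zero at both ends before the FFT
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

plain = np.abs(np.fft.fft(tone))
windowed = np.abs(np.fft.fft(tone * hann))

# far away from the tone (e.g. bin 300) the leakage is much lower with the window
print(windowed[300] < plain[300])  # True
```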

4 days ago

On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
> What do the 4096 resulting complex numbers represent?
> How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

The result of the FFT should be the same as in NumPy [1], I suppose. D's standard module doesn't have all the functionality you want, so you'll need to write some of it yourself.

ChatGPT showed this Python code, which you can port to D:

```python
import numpy as np

# assumes `data` (the samples) and `fs` (the sample rate) are already defined
n = len(data)
fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(n, d=1/fs)
magnitude = np.abs(fft_result)

# Frequencies to detect
dtmf_freqs = [697, 770, 852, 941, 1209, 1336, 1477, 1633]

# Frequency resolution
resolution = 5  # ±5 Hz tolerance (make sure this is at least the bin spacing fs/n)

# Find peaks near DTMF frequencies
detected_freqs = []
for target in dtmf_freqs:
    idx = np.where((frequencies > target - resolution) & (frequencies < target + resolution))
    if np.any(magnitude[idx] > np.max(magnitude) * 0.1):  # adjust threshold if needed
        detected_freqs.append(target)

print("Detected DTMF frequencies:", detected_freqs)
```

There is no fftfreq equivalent in D's standard library, but you can write your own implementation based on the NumPy documentation [2].
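For what it's worth, fftfreq's layout is simple enough to reimplement from the docs. Here's a plain-Python sketch of the same logic (no NumPy), which should be mechanical to port to D:

```python
def fftfreq(n, d=1.0):
    """Bin centre frequencies for an n-point FFT with sample spacing d,
    laid out like numpy.fft.fftfreq: non-negative frequencies first,
    then the negative ones."""
    result = [0.0] * n
    half = (n - 1) // 2 + 1
    for i in range(half):        # bins 0 .. half-1 -> 0 .. +max
        result[i] = i / (n * d)
    for i in range(half, n):     # bins half .. n-1 -> most negative .. -1 step
        result[i] = (i - n) / (n * d)
    return result

print(fftfreq(8))  # [0.0, 0.125, 0.25, 0.375, -0.5, -0.375, -0.25, -0.125]
```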

You may also be interested in some other packages and examples:

  • audio-formats [3] for reading and decoding different formats
  • numir [4], a package with NumPy-like functionality, which includes an audio-separation example

From my perspective, solving it in NumPy would be the safer approach, but it should be doable in D as well.

References:
[1] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fft.html
[2] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fftfreq.html
[3] https://github.com/AuburnSounds/audio-formats
[4] https://github.com/libmir/numir/blob/master/example/audio_separation/source/app.d

4 days ago
On 7/8/25 20:11, Matthew wrote:
> Hi,
> 
> I'm writing a program where I'm trying to decode DTMF tones. I already completed the wave file decoder and I know I'm supposed to use an FFT to transform the samples from time domain to frequency domain but I'm stuck on determining which of the DTMF frequencies are present.
> 
> Consider the following, the input is the sound wave samples as an array of floats between -1 and +1.
> 
> ```D
> void decode_sound(float[] samples)
> {
>      import std.numeric;
>      import std.math;
>      import std.complex;
>      import std.algorithm;
> 
>      size_t fft_size = 4096;
> 
>      auto f = new Fft(fft_size);
> 
>      foreach(ch; samples.chunks(fft_size))
>      {
>          auto res = f.fft(ch);
>          // res is an array of 4096 complex numbers
>      }
> }
> ```
> 
> I can't figure out how the 4096 results of the FFT relate to the frequencies in the input.
> 
> I tried taking the magnitude of each element, I tried taking just the real or imaginary parts.  I plotted them but the graph doesn't look how I'm expecting.
> 
> What do the 4096 resulting complex numbers represent?
> How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?
> 
> Thanks,
> Matthew

Well, with `N=fft_size`, `fft` computes

```
output[j]=∑ₖe^(-2πijk/N) input[k].
```

and we know that ifft gives:

```
input[k]=∑ⱼe^(2πijk/N) output[j]/N.
```

You are trying to write your input as a linear combination of sine waves, so we interpret `k` as time and `j/N` as frequency:

```
input[k]=∑ⱼe^(2πi k j/N)·output[j]/N.
```

Or, with `f_j[k]=e^(2πi k j/N)`:

```
input[k]=∑ⱼ(output[j]/N)·f_j[k].
```

One complication here is that both `f_j` and `output/N` are complex functions.

Note the relationship:

```
cos(ω·t+φ) = ½·(e^(i(ωt+φ))+e^(-i(ωt+φ))) = ½·(e^(iωt)·e^(iφ)+e^(-iωt)·e^(-iφ))
```

In our setup, e^(iωt) corresponds to f_j, e^(-iωt) corresponds to f_(N-j), φ corresponds to the phase component of our outputs.

Interestingly, as your `input` signal is real, we actually know that `output[j]` and `output[N-j]` must be complex conjugates of each other, as this is the only way for the imaginary parts to cancel out.

This also implies that `output[N/2]` is real, which works out because `f_(N/2)[k]=(-1)^k`.
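Both of these facts are easy to verify numerically. A quick check (NumPy rather than D, just for brevity; the output layout is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)      # arbitrary real-valued input, N = 16
X = np.fft.fft(x)

# Hermitian symmetry: X[j] == conj(X[N-j]) for j = 1 .. N-1
assert np.allclose(X[1:], np.conj(X[:0:-1]))

# X[0] and X[N/2] are (numerically) real
assert abs(X[0].imag) < 1e-9
assert abs(X[8].imag) < 1e-9
```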


Putting it in practice:

```d
import std;

alias K=double;

enum sample_rate=44100;

auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}

void main(){
    enum N=4096;
    auto input=wave(N,1336.0f);
    auto output=fft(input);
    auto coefficients_f=output.map!(o=>o/N);
    //auto input2=linearCombination(N,coefficients_f,f(N)); // (slow, just for illustration)
    //assert(isClose(input,input2,1e-7,1e-7));
    auto coefficients_w=fToWave(N,coefficients_f);
    //auto input3=reconstructWave(N,coefficients_w); // (slow, just for illustration)
    //assert(isClose(input,input3,1e-3,1e-3));
    auto sorted=coefficients_w.array.sort.retro;
    foreach(magnitude,frequency,phase;sorted.until!(coeff=>coeff[0]<1e-2)){
        writefln!"%s·cos(2π·%s·t%s%s)"(magnitude,frequency,text(phase).startsWith("-")?"":"+",phase);
    }
}

// helper functions for illustration

auto time(size_t N)=>iota(N).map!(i=>K(i)/sample_rate);

auto wave(size_t N,K frequency,K phase=0.0f){
    K angularVelocity=2*PI*frequency;
    return time(N).map!(t=>cos(angularVelocity*t+phase));
}

/+
// helper functions for slow illustrations:

enum I=Complex!K(0,1);

auto f(size_t N){
    return iota(N).map!(j=>iota(N).map!(k=>exp(2*PI*I*j*k/N)));
}

auto linearCombination(R,RoR)(size_t N,R coefficients,RoR functions){ // compute ∑ⱼcoefficients[j]·functions[j]
    return iota(N).map!(k=>sum(iota(coefficients.length).map!(j=>coefficients[j]*functions[j][k])));
}

auto reconstructWave(R)(size_t N,R coefficients_w){
    auto coefficients=coefficients_w.map!(coeff=>coeff.magnitude);
    auto functions=coefficients_w.map!(coeff=>wave(N,coeff.frequency,coeff.phase));
    return linearCombination(N,coefficients,functions);
}
+/

```

(Note that additional processing may be helpful, as `fft` is based on the assumption that signals repeat with a period of `N`.)
4 days ago

On Tuesday, 8 July 2025 at 19:59:39 UTC, Serg Gini wrote:
> From my perspective, solving it in NumPy would be the safer approach, but it should be doable in D as well.

Or even better: create a D library for this :)
It seems (after some quick googling) that there are plenty of examples on the Web:

https://github.com/gdereese/dtmf
https://github.com/PlasterPate/DTMF_Decoder
https://github.com/hfeeki/dtmf

3 days ago

On Tuesday, 8 July 2025 at 21:10:35 UTC, Serg Gini wrote:
> Or even better: create a D library for this :)
> It seems (after some quick googling) that there are plenty of examples on the Web:
>
> https://github.com/gdereese/dtmf
> https://github.com/PlasterPate/DTMF_Decoder
> https://github.com/hfeeki/dtmf

I believe dplug exists for this kind of thing.

https://github.com/AuburnSounds/Dplug
https://github.com/AuburnSounds/Dplug/tree/0e96b27db15bd0a3d6c69e4d5827175337310225/fft/dplug/fft

3 days ago

On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
> [...]
> What do the 4096 resulting complex numbers represent?
> How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

I haven't done DSP work since 2018 [3]. I'm not sure how similar DTMF is to a regular sound signal, but I remember a few things from sound signals. The FFT result is symmetric, so you probably only need the first 2048 of the 4096 bins. A spectrogram [1] would be useful for visualizing the frequency components; see [2] for a walkthrough of manually decoding DTMF from a spectrogram. If this is not the issue, sorry, and please ignore this.

[1] https://en.wikipedia.org/wiki/Spectrogram
[2] https://medium.com/@adler7210/manually-decoding-dtmf-through-spectrogram-562e4b0b99c3
[3] Kurtulmuş, F., Öztüfekçi, S., & Kavdır, İ. (2018). Classification of chestnuts according to moisture levels using impact sound analysis and machine learning. Journal of Food Measurement and Characterization, 12(4), 2819-2834.

3 days ago
On 7/8/25 23:06, Timon Gehr wrote:
> 
> auto fToWave(R)(size_t N,R coefficients_f){
>      return iota(N/2+1).map!(j=>
>          tuple!("magnitude","frequency","phase")(
>              (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
>              K(j)*sample_rate/N,
>              std.complex.log(coefficients_f[j]).im)
>          );
> }

As `f_0` is also real, similar to `f_(N/2)`, this should of course have been:

```d
auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==0||j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}
```

3 days ago

On Tuesday, 8 July 2025 at 19:39:37 UTC, Dennis wrote:
> The magnitude of each element (computed with std.complex.abs) corresponds to the amplitude of that frequency component; the angle in the complex plane (computed with std.complex.arg, in radians) represents the phase.

This is what I had picked up from Wikipedia.

> The frequencies are all relative to the FFT window. res[0] is 0 Hz, res[1] corresponds to a sine wave that fits 1 cycle inside your window, res[2] is 2 cycles, etc. [...]

This was the key bit of information I didn't know.

> The closest bucket to 1209 Hz is 1209 * (4096 / 44100) = 112.3, which is not an exact match, so the tone will leak into all the other bins. It will still mostly contribute to bins 112 and 113, though, so it's probably good enough to just check those.

Checking the closest bins to each frequency seems to work well, at least with clean sound from a wav file. It remains to be seen how it fares against noisy real-life signals. I'll likely need a window function or interpolation, but this is a good start.

```d
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;

	size_t fft_size = 4096;
	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		double[] magnitudes = res.map!(x => x.abs).array;

		// nearest bins: round(freq * fft_size / sample_rate) at 44100 Hz
		auto high = [magnitudes[112], magnitudes[124], magnitudes[137], magnitudes[152]].maxIndex; // 1209, 1336, 1477, 1633 Hz
		auto low  = [magnitudes[65], magnitudes[72], magnitudes[79], magnitudes[87]].maxIndex;     // 697, 770, 852, 941 Hz

		// high and low are 0,1,2,3 depending on which DTMF tone pair was detected
	}
}
```
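As a sanity check of the nearest-bin approach (in Python/NumPy just for convenience), synthesizing the DTMF digit '6' (770 Hz row tone + 1477 Hz column tone) and running the same maxIndex-style logic picks out the expected pair:

```python
import numpy as np

fs, N = 44100, 4096
t = np.arange(N) / fs

# DTMF '6' = 770 Hz (row) + 1477 Hz (column)
x = np.sin(2 * np.pi * 770 * t) + np.sin(2 * np.pi * 1477 * t)
mag = np.abs(np.fft.fft(x))

rows = [697, 770, 852, 941]
cols = [1209, 1336, 1477, 1633]
low = int(np.argmax([mag[round(f * N / fs)] for f in rows]))
high = int(np.argmax([mag[round(f * N / fs)] for f in cols]))

print(low, high)  # 1 2  ->  row index of 770 Hz, column index of 1477 Hz
```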

Thanks,
Matthew

3 days ago

On Wednesday, 9 July 2025 at 19:39:43 UTC, Matthew wrote:
> Checking the closest bins to each frequency seems to work well, at least with clean sound from a wav file. It remains to be seen how it fares against noisy real-life signals. I'll likely need a window function or interpolation, but this is a good start.

Many moons ago I had to fix Asterisk's DTMF detection/decoding. Ultimately, fixes (though not my code) made it into their source repo; if you need guidance on a very mature DTMF treatment, main/dsp.c in the Asterisk source would be a solid resource.

Andy
