November 29, 2015
On Sunday, 29 November 2015 at 13:21:53 UTC, Ola Fosheim Grøstad wrote:
>
> I remember the electro acoustic people here in Oslo (NoTAM) doing live pitchtracking 20 years ago, I believe they used an envelope follower of some sort. Just measuring the distance between the tops? That was to have the "electronic accompaniment" follow the lead of the vocalist I believe.
>

The hard thing about live pitch-tracking is getting minimal latency while keeping reliability. It's not that simple. You also want "voicedness", which is more challenging than pitch.

>> But it has different latency characteristics, overlapped FFT easily goes into the 20/30 ms.
>
> It depends on how low down in frequency you need to go, a female voice is at 160 hz and up, and a child is at 250hz and up. In that frequency range one could do better. And at the cost of complexity you could use two FFTs, one for the lower range and another for the higher range.

I thought about it, but a singer can usually cover a range of 3 octaves, even if very few songs demand it: https://www.youtube.com/watch?v=cveoHrMyUDs&t=41s

A male voice could go as low as, say, 40 Hz.
Even if you needed only one period to guess the pitch (unlikely), that's already 25 ms of latency guaranteed, and that's before you introduce FFT overlap (if you want e.g. to track harmonics, get formants through linear prediction, etc.)!
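
To put numbers on that latency floor, here is a small Python sketch. The function names and the 44.1 kHz default sample rate are assumptions made for illustration, not something from the posts; the code only restates the arithmetic that one period at 40 Hz lasts 25 ms, and adds what buffering one analysis window costs.

```python
def period_latency_ms(lowest_pitch_hz, periods_needed=1):
    """Time in ms needed to observe `periods_needed` periods of the lowest pitch."""
    return 1000.0 * periods_needed / lowest_pitch_hz


def window_latency_ms(window_size, sample_rate=44100):
    """Time in ms needed to fill one analysis window (sample rate is an assumption)."""
    return 1000.0 * window_size / sample_rate


# One period of a 40 Hz voice takes 25 ms; two periods take 50 ms.
one_period = period_latency_ms(40)
two_periods = period_latency_ms(40, periods_needed=2)
# Filling a 1024-sample window at 44.1 kHz costs roughly 23 ms on its own.
fft_window = window_latency_ms(1024)
```

Any overlap or look-ahead an actual tracker needs comes on top of these figures.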

I've not tried the multiple FFT, I was worried pitch would lag oddly when changing FFT size. Perhaps it could work.


> Or maybe one can use wavelets, but I don't know much about wavelet transforms (they don't map to cosine, so imagine it will be much harder to do well).

I have trouble imagining the reconstruction, so I don't use them (well, I did once, but didn't _get_ it).
November 29, 2015
On Sunday, 29 November 2015 at 15:13:41 UTC, Guillaume Piolat wrote:
> The hard thing about live pitch-tracking is getting minimal latency while keeping reliability. It's not that simple. You also want "voicedness", which is more challenging than pitch.

I think they developed it for a specific work, but IIRC it was challenging to get it accurate.

I don't know much about current pitch trackers, but I think you can do a high quality one for voice using filterbanks. Some people do resynthesis that way (and well, that is just an alternative to FFT after all). That's pretty much how the cochlea works, I think, by having overlapping frequency bands. But it probably is hard to get right. I assume you can make a better pitch tracker that is specialized for voice by thinking about FoF synthesis, the sound of the voice is really a sequence of bursts of roughly the same shape (like granular synthesis in a way) and you should be able to figure out some statistical relationship between formants and how they change with pitch. I'm not saying it is easy. Probably a lot published on this though.

I don't know what "voicedness" is? You mean things like vibrato?

> I've not tried the multiple FFT, I was worried pitch would lag oddly when changing FFT size. Perhaps it could work.

I think it should work in theory, but you'll probably get some complications due to the distortions that come with the windowing function etc. And making a real time phase vocoder is more work than it looks like on paper... Obviously doable, but there are some "missing bits" in the theoretical descriptions. I guess that's why IRCAM can sell licenses to superVP. :)

>> Or maybe one can use wavelets, but I don't know much about wavelet transforms (they don't map to cosine, so imagine it will be much harder to do well).
>
> I have trouble imagining the reconstruction, so I don't use them (well, I did once, but didn't _get_ it).

Yeah, I don't know. Still, in the past few years it has been popular with distorted and glitchy sounds, so maybe one could do some cool distorted effects with it.

November 29, 2015
On Sunday, 29 November 2015 at 15:34:34 UTC, Ola Fosheim Grøstad wrote:
>
> I don't know much about current pitch trackers, but I think you can do a high quality one for voice using filterbanks. Some people do resynthesis that way (and well, that is just an alternative to FFT after all).

You are precisely right, if you don't need reconstruction nothing forces you to use the FFT!
There is also a sample-wise FFT I've come across, which is expensive but avoids chunking.
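
The post doesn't name the algorithm. One technique that fits the description is the sliding DFT, which updates every bin on each incoming sample instead of waiting for a full block. Treating it as the method meant here is an assumption, and the class and function names below are made up for illustration. A minimal Python sketch:

```python
import cmath
from collections import deque


class SlidingDFT:
    """Keeps the DFT of the last `size` samples up to date, one sample at a time."""

    def __init__(self, size):
        self.size = size
        self.window = deque([0.0] * size, maxlen=size)  # last `size` samples
        self.bins = [0j] * size
        # Per-bin rotation applied at every step: exp(+j*2*pi*k/N).
        self.twiddles = [cmath.exp(2j * cmath.pi * k / size) for k in range(size)]

    def push(self, sample):
        oldest = self.window[0]
        self.window.append(sample)  # maxlen drops the oldest sample
        delta = sample - oldest
        # Recurrence: X_k(n) = (X_k(n-1) + x(n) - x(n-N)) * exp(j*2*pi*k/N)
        self.bins = [(b + delta) * w for b, w in zip(self.bins, self.twiddles)]
        return self.bins


def direct_dft(block):
    """Plain O(N^2) DFT, used only to check the sliding version."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(block))
        for k in range(n)
    ]
```

Each `push` costs one complex multiply per bin, so a full spectrum per sample is expensive, which matches the "expensive but avoids chunking" remark. Rounding errors accumulate in the recurrence, so long-running implementations usually add some damping or periodic resynchronisation.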


> I assume you can make a better pitch tracker that is specialized for voice by thinking about FoF synthesis, the sound of the voice is really a sequence of bursts of roughly the same shape (like granular synthesis in a way) and you should be able to figure out some statistical relationship between formants and how they change with pitch.

Looking for similar grains is the idea behind the popular auto-correlation pitch detection methods. They require two periods, though, otherwise there is no autocorrelation peak. Rumor has it that the non-realtime Autotune works that way, along with many modern pitch detection methods.
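
As a rough illustration of the autocorrelation idea (not Autotune's actual method, which is not public here), the following Python sketch picks the lag at which a frame best matches a shifted copy of itself. The function name, the 80 to 500 Hz search range and the plain unnormalised sum are all assumptions chosen to keep the example short.

```python
import math


def autocorrelation_pitch(frame, sample_rate, min_hz=80.0, max_hz=500.0):
    """Return a pitch estimate in Hz, or None if no positive peak is found."""
    min_lag = int(sample_rate / max_hz)
    # Only lags up to half the frame are tried, so the frame has to hold
    # at least two periods of the pitch for its lag to be reachable.
    max_lag = min(int(sample_rate / min_hz), len(frame) // 2)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None
    return sample_rate / best_lag


# Usage: a 200 Hz sine sampled at 8 kHz, four periods long.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(160)]
estimate = autocorrelation_pitch(frame, sample_rate)
```

The lag resolution is one sample, so real detectors typically interpolate around the peak and normalise the correlation before trusting it.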


> I'm not saying it is easy. Probably a lot published on this though.
>
> I don't know what "voicedness" is? You mean things like vibrato?

Vibrato is the pitch variation that occurs when the larynx is well relaxed.

Voicedness is the difference between sssssss (unvoiced) and zzzzzz (voiced).
A phoneme is voiced when there are periodic glottal closures and openings.

When the sound isn't voiced, there is no period. There isn't a "pitch" there. So pitch detection tends to come with a confidence measure.
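
One common way to get such a confidence value is the height of the normalised autocorrelation peak: close to 1 for a periodic frame, much lower for noise. The Python sketch below is only an assumption about what a confidence measure could look like; the function name and the lag range are hypothetical, and real detectors use more elaborate criteria.

```python
import math
import random


def voicing_confidence(frame, min_lag, max_lag):
    """Highest normalised autocorrelation over the lag range (at most about 1)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if energy == 0.0:
            continue  # silent segment, nothing to correlate
        correlation = sum(a * b for a, b in zip(head, tail)) / energy
        best = max(best, correlation)
    return best


# Usage: a "zzzz"-like periodic frame versus an "ssss"-like noisy frame.
sample_rate = 8000
voiced = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
rng = random.Random(0)
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(400)]
voiced_score = voicing_confidence(voiced, 16, 100)
unvoiced_score = voicing_confidence(unvoiced, 16, 100)
```

A tracker would then report a pitch only when the score exceeds some threshold, and the threshold itself is one of the things that needs tuning.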

The devil in the details is that voicedness itself is half a lie, or let's say a leaky abstraction: it breaks down for distorted vocals.

> I guess that's why IRCAM can sell licenses to superVP. :)

Their papers on that topic are interesting: they group spectral peaks by formants and move them together.




November 29, 2015
On Sunday, 29 November 2015 at 16:15:32 UTC, Guillaume Piolat wrote:
> There is also a sample-wise FFT I've come across, which is expensive but avoids chunking.

Hm, I don't know what that is :).

> Looking for similar grains is the idea behind the popular auto-correlation pitch detection methods. They require two periods, though, otherwise there is no autocorrelation peak. Rumor has it that the non-realtime Autotune works that way, along with many modern pitch detection methods.

I thought they used Laroche and Dolson's FFT based one combined with a peak detector, but maybe that was the real time version.

There are other full spectral resynthesis methods that throw away phase information and represent each spectral component as a bandpass filter of noise. That is rather expressive since you can do morphing with it. (Like you can do with images). But since you throw away phase information I guess some attacks suffer, so you have to special case the attacks as "residue" samples that are left in the time domain (the difference between what you can represent as spectral components and the left over bits).

>> I don't know what "voicedness" is? You mean things like vibrato?
>
> Vibrato is the pitch variation that occurs when the larynx is well relaxed.

Yes, so that will generate sidebands in the frequency spectrum, like FM synthesis, right? So in order to pick up fast vibrato I would assume you would also need to do analysis of the spectrum, or?

> Voicedness is the difference between sssssss (unvoiced) and zzzzzz (voiced).
> A phoneme is voiced when there are periodic glottal closures and openings.

Ah! In the 90s I read a paper in Computer Music journal where they did song synthesis by emulating the vocal tract as a "physical" filter-model. I'm not sure if they used FoF for generating the sound. I think there was a vinyl flexi disc with it too. :-) I have it somewhere...

You might find it interesting.

> When the sound isn't voiced, there is no period. There isn't a "pitch" there. So pitch detection tends to come with a confidence measure.

So it is a problem for real time, but in non-real time you can work your way backwards and fill in the missing parts before doing resynthesis? I guess?

> The devil in the details is that voicedness itself is half a lie, or let's say a leaky abstraction: it breaks down for distorted vocals.

Right. You have a lot of these problems in sound analysis. Like sound separation. The brain is so impressive. I still have a problem understanding how we can hear 3D with two ears. Like distinguishing above and below. I understand the basics of it, but it is still impressive when you try to figure out _how_.

>> I guess that's why IRCAM can sell licenses to superVP. :)
>
> Their papers on that topic are interesting: they group spectral peaks by formants and move them together.

I've read the Laroche and Dolson paper in detail, and more or less know it by heart now, but maybe you are thinking about some other paper? Their paper was good on the science part, but they leave the artistic engineering part open to the reader... ;-) More insight on the artistic engineering part is most welcome!!



November 29, 2015
On Sunday, 29 November 2015 at 17:23:20 UTC, Ola Fosheim Grøstad wrote:


> Yes, so that will generate sidebands in the frequency spectrum, like FM synthesis, right?

It won't because the vibrato frequency is too low, around 10hz.

> So in order to pick up fast vibrato I would assume you would also need to do analysis of the spectrum, or?

Or just be fast to react, else this vibrato amplitude would be smoothed out.


> So it is a problem for real time, but in non-real time you can work your way backwards and fill in the missing parts before doing resynthesis? I guess?

In non-realtime, everything is possible: you can have a bigger analysis window and far fewer problems.
For example, some use dynamic programming for pitch detection.
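
A hedged sketch of what dynamic programming can look like here, in Python: each frame offers a few pitch candidates with a local cost, and a Viterbi-style pass picks the path that balances local cost against pitch jumps. The candidate format, the octave-based jump penalty and the function name are assumptions for illustration, not a description of any particular published tracker.

```python
import math


def smooth_pitch_track(candidates, jump_weight=1.0):
    """candidates: one list per frame of (frequency_hz, local_cost) pairs.

    Returns one frequency per frame, minimising local costs plus
    jump_weight * |octaves jumped| between consecutive frames.
    """
    best = [[cost for _, cost in candidates[0]]]  # best[i][j]: cheapest path ending at j
    back = []                                     # back[i][j]: predecessor of j in frame i+1
    for prev, cur in zip(candidates, candidates[1:]):
        row, pointers = [], []
        for freq, cost in cur:
            options = [
                best[-1][j] + jump_weight * abs(math.log2(freq / prev_freq))
                for j, (prev_freq, _) in enumerate(prev)
            ]
            j_best = min(range(len(options)), key=options.__getitem__)
            row.append(options[j_best] + cost)
            pointers.append(j_best)
        best.append(row)
        back.append(pointers)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j][0] for i, j in enumerate(path)]


# Usage: the middle frame locally prefers an octave error (400 Hz).
frames = [
    [(200.0, 0.1), (400.0, 0.3)],
    [(400.0, 0.1), (200.0, 0.2)],
    [(200.0, 0.1), (400.0, 0.3)],
]
track = smooth_pitch_track(frames)  # stays at 200 Hz in all three frames
```

This only works offline or with look-ahead, because the choice for an early frame depends on frames that come after it.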

> I've read the Laroche and Dolson paper in detail, and more or less know it by heart now, but maybe you are thinking about some other paper? Their paper was good on the science part, but they leave the artistic engineering part open to the reader... ;-) More insight on the artistic engineering part is most welcome!!

Whatever the method, it's important to spend a lot of time in tuning.


November 29, 2015
On Sunday, 29 November 2015 at 17:36:47 UTC, Guillaume Piolat wrote:
> On Sunday, 29 November 2015 at 17:23:20 UTC, Ola Fosheim Grøstad wrote:
>
>
>> Yes, so that will generate sidebands in the frequency spectrum, like FM synthesis, right?
>
> It won't because the vibrato frequency is too low, around 10hz.

Right, so it will only be a serious problem if one went beyond 2048 samples FFT, which isn't really necessary.
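
The reasoning can be checked with the FFT bin width, which is the sample rate divided by the FFT size. The Python sketch below assumes a 44.1 kHz sample rate (the posts don't state one) and uses "bin width smaller than the sideband spacing" as a deliberately crude criterion; windowing widens the main lobe, so real resolution is worse than this suggests.

```python
def bin_width_hz(fft_size, sample_rate=44100):
    """Spacing between adjacent FFT bins in Hz (sample rate is an assumption)."""
    return sample_rate / fft_size


def may_resolve_sidebands(fft_size, spacing_hz=10.0, sample_rate=44100):
    """Crude check: sidebands can only show up separately if bins are finer than their spacing."""
    return bin_width_hz(fft_size, sample_rate) < spacing_hz


# 2048 samples: about 21.5 Hz per bin, so 10 Hz sidebands merge into the partial.
# 8192 samples: about 5.4 Hz per bin, so they start to appear as separate peaks.
widths = {size: bin_width_hz(size) for size in (1024, 2048, 4096, 8192)}
```

Under these assumptions a 4096-sample FFT (about 10.8 Hz per bin) is still too coarse, which is consistent with the problem only appearing well beyond 2048 samples.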

This is an interesting Swedish paper on vibrato btw:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.4888&rep=rep1&type=pdf

December 17, 2015
On Thursday, 26 November 2015 at 15:48:48 UTC, Guillaume Piolat wrote:
> OT: Readers of this NG probably know me under the name "ponce", however over the year I was made aware that it's an english swear word so I'll post under my IRL name from now on.
>
> [...]

Hi, Is there a tutorial on how to design VST's using D? I would like to get into vst programming using D but I've found little useful information.

Thanks
December 17, 2015
On Thursday, 17 December 2015 at 18:17:41 UTC, Thomas wrote:
> On Thursday, 26 November 2015 at 15:48:48 UTC, Guillaume Piolat wrote:
>> OT: Readers of this NG probably know me under the name "ponce", however over the year I was made aware that it's an english swear word so I'll post under my IRL name from now on.
>>
>> [...]
>
> Hi, Is there a tutorial on how to design VST's using D? I would like to get into vst programming using D but I've found little useful information.
>
> Thanks

- check out the dplug repository
- make sure you have DUB and DMD
  * On Windows, build the example in examples/distort with DUB by typing "dub"
  * For Mac VST bundles, you will need to build the "release" tool in tools/release
    then type "release"
- to make a new VST, copy-paste the examples/distort directory
February 09, 2016
On Thursday, 17 December 2015 at 18:17:41 UTC, Thomas wrote:
> On Thursday, 26 November 2015 at 15:48:48 UTC, Guillaume Piolat wrote:
>> OT: Readers of this NG probably know me under the name "ponce", however over the year I was made aware that it's an english swear word so I'll post under my IRL name from now on.
>>
>> [...]
>
> Hi, Is there a tutorial on how to design VST's using D? I would like to get into vst programming using D but I've found little useful information.
>
> Thanks

I've written a tutorial to hopefully make it easier.

http://www.auburnsounds.com/blog/2016-02-08_Making-a-Windows-VST-plugin-with-D.html