Examples of low level control (synthesis) of audio? - speakers, HDMI audio
August 22

Are there good explanations or examples of code for low level control of audio in Dlang?

I assume that all high-level audio synthesis programs (music, speech synthesis, etc.) rely on some common method of outputting sound. I also assume that output to analog speakers must use different code than output over HDMI. Is that correct?

I visualize the final output as data that stores a waveform as a sequence of values representing amplitudes. Is that low-level representation necessary, or do modern speakers and audio devices have built-in programs that accept more concise formats?

August 22

On Thursday, 22 August 2024 at 16:56:05 UTC, Stephen Tashiro wrote:

> Are there good explanations or examples of code for low level control of audio in Dlang?
> [...]

There are many ways to output audio. This is what I use: https://www.portaudio.com/

And yes, audio is typically just two channels (left + right) of amplitudes. The number of values per unit of time depends on the sample rate of the audio device. A sketch of what such a buffer looks like is below.
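
As a minimal illustration (my sketch, not from this post): an interleaved stereo buffer filled with a 440 Hz sine in 16-bit samples, assuming a 44100 Hz sample rate. Any output API, PortAudio included, ultimately consumes a buffer shaped like this:

```D
import std.math : sin, PI;
import std.stdio : writefln;

void main()
{
    enum sampleRate = 44_100; // samples per second (assumed)
    enum freq       = 440.0;  // pitch of the test tone, Hz
    enum seconds    = 1;

    // Interleaved stereo: [L0, R0, L1, R1, ...]
    auto buf = new short[](sampleRate * seconds * 2);

    foreach (i; 0 .. sampleRate * seconds)
    {
        // One amplitude value per sampling instant, scaled to 16-bit range.
        immutable s = cast(short)(sin(2.0 * PI * freq * i / sampleRate) * short.max * 0.8);
        buf[2 * i]     = s; // left channel
        buf[2 * i + 1] = s; // right channel
    }

    writefln("generated %s interleaved samples", buf.length);
}
```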

August 22

On Thursday, 22 August 2024 at 16:56:05 UTC, Stephen Tashiro wrote:

> I visualize the final output as data that stores a waveform as a sequence of values that represent amplitudes.

Yes, they represent the sound pressure at your headphones or speakers, at least at the sampling instants. :)

> Is that low level of representation necessary? - or do modern speakers and audio devices have built-in programs that accept more concise formats?

There is no other representation that is as amenable to processing:

  • time-domain is, as you say, direct access to amplitude
  • spectral representations must choose a transform, which is very much destructive, and they are also harder to understand (a naive illustration follows this list)
  • the galaxy of wavelet-like representations is even harder to use
  • encoded spaces, like ADPCM, are more remote still
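
To make the spectral point concrete, here is a naive sketch (mine, not from this post) of the discrete Fourier transform that turns time-domain samples into a spectral view; real code would use an FFT, but the round trip through a transform is the same:

```D
import std.algorithm : maxIndex;
import std.complex : Complex, abs;
import std.math : cos, sin, PI;
import std.stdio : writeln;

// Naive O(n^2) DFT magnitudes for a real-valued signal x,
// up to the Nyquist bin.
double[] dftMagnitudes(const double[] x)
{
    immutable n = x.length;
    auto mags = new double[](n / 2);
    foreach (k; 0 .. n / 2)
    {
        auto acc = Complex!double(0, 0);
        foreach (t, sample; x)
        {
            immutable phase = -2.0 * PI * k * t / n;
            acc += sample * Complex!double(cos(phase), sin(phase));
        }
        mags[k] = abs(acc);
    }
    return mags;
}

void main()
{
    // 64 samples of a sine completing 8 cycles: energy lands in bin 8.
    auto x = new double[](64);
    foreach (t, ref v; x)
        v = sin(2.0 * PI * 8 * t / 64);
    writeln("peak bin: ", dftMagnitudes(x).maxIndex); // prints 8
}
```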

The problem is the surprising resolution of human hearing, which routinely distinguishes sounds 100 or 110 dB below others; 100 dB is a factor of 1e-5 in amplitude, since 20 * log10(1e-5) = -100 dB.

Now, codecs by themselves are very nice and do not hamper the experience very much, often bringing a color of their own, just like any other coloration. It can be a sought-after sound, much like vinyl or tape.

August 22

On Thursday, 22 August 2024 at 16:56:05 UTC, Stephen Tashiro wrote:

> Are there good explanations or examples of code for low level control of audio in Dlang?
> [...]

I'd try repeating this article, but with FFmpeg instead of ImageMagick:

https://crazymonkyyy.github.io/writings/gif.html
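
In that spirit, a minimal sketch of the FFmpeg leg (my guess at how the article would translate, not from this post): synthesize raw 16-bit samples in D and pipe them into the ffmpeg CLI, which wraps them into a playable file. It assumes ffmpeg is on PATH.

```D
import std.math : sin, PI;
import std.process : pipeProcess, Redirect, wait;

void main()
{
    enum sampleRate = 44_100;

    // Two seconds of a 220 Hz sine as raw signed 16-bit samples.
    auto samples = new short[](sampleRate * 2);
    foreach (i, ref s; samples)
        s = cast(short)(sin(2.0 * PI * 220 * i / sampleRate) * short.max * 0.5);

    // Tell ffmpeg how to interpret the raw bytes arriving on stdin:
    // signed 16-bit little-endian, 44100 Hz, one channel. rawWrite emits
    // host byte order, so this assumes a little-endian machine.
    auto ff = pipeProcess(
        ["ffmpeg", "-y", "-f", "s16le", "-ar", "44100", "-ac", "1",
         "-i", "-", "tone.wav"],
        Redirect.stdin);

    ff.stdin.rawWrite(samples);
    ff.stdin.close(); // signal end of input
    wait(ff.pid);     // let ffmpeg finish writing tone.wav
}
```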

August 24
Stephen Tashiro wrote on 22 August 2024 at 19:56:
> I visualize the final output as data that stores a waveform as a sequence of values that represent amplitudes.  Is that low level of representation necessary?

I have dabbled a bit with this idea using SDL. You can write 16-bit integers generated with a sine function into a raw SDL_mixer audio chunk, and your speaker will really output exactly that waveform as sound, at least on Linux. This program loops a sound resembling a throttling and de-throttling combustion engine:

```D
import bindbc.sdl;

int main()
{ import std;
  auto err = initialize;
  if(err.length)
  { writeln(err);
    return 1;
  }

  auto chunk = makeCombustionEngine();
  writeln("press enter to continue");
  Mix_PlayChannel(0, &chunk, -1);
  readln; // waits for keypress

  return 0;
}

auto makeCombustionEngine()
{ import std;

  // 0x20000 bytes = 0x8000 stereo frames of two 16-bit samples each.
  auto chunk = Mix_Chunk(0, new ubyte[0x20000].ptr, 0x20000, 255);

  iota(0u, 0x8000u)
  .map!(to!double)
  // Accumulate a running phase whose instantaneous frequency rises and
  // falls once over the buffer - the rev-up/rev-down engine effect.
  .cumulativeFold!((prev, b) => (prev + (2.0 + sin(b * PI * 2 / 0x8000)) / 0x800 * PI) % (PI*2))(0.0)
  // Turn each phase into a sound sample, multiplexed into two 16-bit
  // channels that both carry the same value: one for the left stereo
  // channel, one for the right.
  .map!(x => (y => (y << 16) + y)(cast(ushort)(sin(x)*0xFFFF)))
  .copy(cast(uint[])chunk.abuf[0 .. 0x20000]);

  return chunk;
}

string initialize()
{ import std;
  static import bindbc.loader.sharedlib;

  switch(loadSDL)
  { case sdlSupport: break;

    case SDLSupport.badLibrary:
    return "SDL2 library (.so or .dll) only partially loaded. Most probably too old version of it.";

    default:
    return "Could not find SDL2 library! (.so or .dll)";

  }

  switch(loadSDLMixer)
  { case SDLMixerSupport.noLibrary:
    return "Could not find SDL2_mixer library! (.so or .dll)";

    case SDLMixerSupport.badLibrary:
    return "SDL2_mixer library (.so or .dll) only partially loaded. Most probably too old version of it.";

    default: break;

  }

  if(SDL_Init(SDL_INIT_VIDEO|SDL_INIT_AUDIO) == -1)
  { return text("Error: failed to init SDL: ", SDL_GetError().fromStringz);
  }

  if(Mix_OpenAudio(11025, MIX_DEFAULT_FORMAT, 2, 2048) < 0)
  { return text("SDL_mixer could not initialize! SDL_mixer Error: ", Mix_GetError().fromStringz);
  }

  return "";
}
```
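
For completeness, a dub.sdl that should be roughly enough to build this; the exact bindbc-sdl version identifiers are an assumption on my part, so check them against the release you use:

```sdl
name "engine-sound"
dependency "bindbc-sdl" version="~>1.4"
// Version identifiers select which SDL/SDL_mixer versions the binding
// loads; see the bindbc-sdl README for the ones matching your setup.
versions "SDL_2012" "SDL_Mixer_204"
```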

August 26

On Thursday, 22 August 2024 at 16:56:05 UTC, Stephen Tashiro wrote:

> Are there good explanations or examples of code for low level control of audio in Dlang?
> [...]

https://code.dlang.org/packages/iota

It's a bit janky and preliminary ATM, but you can get an audio stream on both Linux and Windows. The only caveat is that the audio callback must run in a @nogc thread, which is pretty much expected for real-time applications. It has a few test cases, and my game engine also uses it for its software synths. The general shape of such a callback is sketched below.
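
For context on the @nogc requirement: a real-time audio callback must never block on the garbage collector. A generic sketch of the shape (hypothetical, not iota's actual API):

```D
import std.math : sin, PI; // std.math.sin is nothrow @nogc

// The driver hands you a buffer to fill; the body must not allocate
// from the GC, take locks, or do I/O, or the stream may glitch.
void renderAudio(float[] output) @nogc nothrow
{
    static float phase = 0;                  // carried across calls
    enum float step = 2 * PI * 440 / 48_000; // 440 Hz at an assumed 48 kHz

    foreach (ref sample; output)
    {
        sample = sin(phase) * 0.25f;
        phase += step;
        if (phase > 2 * PI)
            phase -= 2 * PI;
    }
}

void main()
{
    auto buf = new float[](512); // one block, as a driver might request
    renderAudio(buf);            // in real use, the audio thread calls this
}
```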

Besides audio, it can also do windowing, input handling, OpenGL initialization, etc.