simple-asa uses FFT to get the frequency domain of sampled audio data with the help of the fftw3 library.

This is a description of simple-asa's internal mathematical workings and not a manual or guide!
simple-asa is essentially an endless loop working on an audio stream with these steps:

- Take a sequence: $s$ PCM samples
- Stop if there were not enough samples
- Window the samples (Hann, Boxcar, Blackman-Harris, Flat Top, etc.)
- Zero-pad to the FFT size $n$ if $s < n$.
- Run FFT to get $m = 1 + n / 2$ bins
- Use $b$ bins from $b_0$ to $b_1$ where $b ≤ m$
- Calculate the bin magnitudes
- Combine bins to get $l$ analyser lines, see "Power of Two"
- Scale, optionally apply analyser gravity and convert to unsigned 8-bit
- Output the $l$ bytes
- Advance the start of the next sequence by $d$ samples
- Go back to step 1.

It's a variant of the procedure described in this StackOverflow answer.
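
The loop above can be sketched in a few lines of Python. This is only an illustration under several assumptions that the text does not state: a naive DFT stands in for fftw3, the window is fixed to Hann, $b_0$ and $b_1$ are treated as inclusive bin indices, lines are formed from equally sized groups of bins (taking the maximum of each group), scaling is relative to the loudest line of each frame, and analyser gravity is left out. The function names `hann`, `rdft` and `analyse` are hypothetical.

```python
import cmath
import math

def hann(s):
    # Symmetric Hann window of length s (s >= 2).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (s - 1)) for i in range(s)]

def rdft(x):
    # Naive O(n^2) DFT of a real sequence; keeps the 1 + n // 2 non-redundant bins.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(1 + n // 2)]

def analyse(samples, s, n, b0, b1, l, d):
    """Yield l bytes per analysed sequence (assumes s <= n and b1 - b0 + 1 >= l)."""
    window = hann(s)
    start = 0
    while start + s <= len(samples):                 # stop if not enough samples
        seq = [v * w for v, w in zip(samples[start:start + s], window)]
        seq += [0.0] * (n - s)                       # zero-pad to the FFT size
        mags = [abs(c) for c in rdft(seq)[b0:b1 + 1]]  # magnitudes of the b used bins
        per_line = len(mags) // l                    # assumption: equal groups, remainder dropped
        lines = [max(mags[j * per_line:(j + 1) * per_line]) for j in range(l)]
        peak = max(lines) or 1.0                     # assumption: scale to the frame's peak
        yield bytes(min(255, int(255 * v / peak)) for v in lines)
        start += d                                   # advance by d samples
```

Feeding it a sine wave that falls exactly on bin 8 of a 64-point transform, with 32 bins grouped into 8 lines, puts the peak in the second line of every frame.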
FFT, as an implementation of the DFT, assumes that the signal repeats smoothly beyond the ends of the sequence. For music this is usually not the case (except in the rare case where the samples are the same at the start and at the end of the sequence, for example for a sine wave of exactly the right period). The FFT therefore treats the abrupt changes between sequences as high-frequency signals. Window functions smooth the sequence boundaries down to zero so that these changes disappear; however, this introduces other distortions.
For details about the different window functions, read the Wikipedia page on Window Function; for the rationale of windowing, see this StackOverflow answer.
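
To make the boundary effect concrete, the following sketch compares the jump between the last and the first sample of a sequence before and after applying a Hann window. The test signal (a cosine with 2.5 periods in 64 samples) and the helper names `boxcar`, `hann` and `edge_jump` are chosen for illustration only.

```python
import math

def boxcar(s):
    # Rectangular window: leaves the samples unchanged.
    return [1.0] * s

def hann(s):
    # Symmetric Hann window: zero at both ends, one in the middle.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (s - 1)) for i in range(s)]

def edge_jump(seq):
    # Size of the discontinuity if the sequence were repeated end to end.
    return abs(seq[0] - seq[-1])

s = 64
# 2.5 periods do not fit the sequence evenly, so the ends do not match.
signal = [math.cos(2 * math.pi * 2.5 * i / s) for i in range(s)]

plain = [v * w for v, w in zip(signal, boxcar(s))]
tapered = [v * w for v, w in zip(signal, hann(s))]

print("jump without window:", edge_jump(plain))
print("jump with Hann window:", edge_jump(tapered))
```

With the boxcar window the jump is close to the full peak-to-peak amplitude, while the Hann window reduces it to practically zero.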
The FFT algorithm produces bins and the analyser shows lines. The number of bins is $m = 1 + n / 2$ for an FFT of size $n$, as given in the list of steps above.
cava calculates frequency cut-offs, but this is not necessary if we just omit frequency bins.
FFT has as result
A practical example of how to cut off frequencies: you want a visualiser with 28 lines and a frequency resolution of 200 Hz, with the first line starting at 600 Hz and 27 additional lines, each 200 Hz higher, so that the last one is at 6000 Hz. Your data is sampled at 44100 Hz.
A frequency resolution of 200 Hz at 44100 Hz means 220.5 bins. We round up to
We run FFT and omit the first 3 bins and all bins after the 31st.
If we discover that FFT at size 440 runs slowly, we can try 512 instead, omit the first 4 bins and all bins after the 32nd, and get a frequency resolution of 44100 Hz / 255 ≈ 173 Hz, with the first line starting at 4 · 173 Hz ≈ 692 Hz and the last at 31 · 173 Hz ≈ 5361 Hz. Or we can try an FFT size of 384. In other words, you will need some experiments to find out what works for you.
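
The bin bookkeeping in this example can be checked with a short sketch. It only deals with bin counts and indices, not with frequencies; `bin_count` and `keep_bins` are hypothetical helper names, and bins are numbered from 0 as an assumption.

```python
def bin_count(n):
    # Number of bins produced for an FFT of size n: m = 1 + n / 2.
    return 1 + n // 2

def keep_bins(bins, omit_first, keep_through):
    # Drop the first `omit_first` bins and everything after bin number
    # `keep_through` (counted from 1), keeping the bins in between.
    return bins[omit_first:keep_through]

for n, omit_first, keep_through in [(440, 3, 31), (512, 4, 32)]:
    indices = list(range(bin_count(n)))
    kept = keep_bins(indices, omit_first, keep_through)
    print(n, bin_count(n), kept[0], kept[-1], len(kept))
```

Both variants keep exactly 28 bins, one per analyser line: indices 3 to 30 for size 440 and indices 4 to 31 for size 512.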
When combining bins we can try to mimic music perception. Octaves are separated by a doubling of the frequency. For example, the note C'' has four times the frequency of C. This is the musical octave law.
However, a naive spectrum analyser might neglect this law. For example, it may show C and D in the same line but C'' and D'' in two different lines, which can feel subtly wrong.
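
A small numeric sketch illustrates the effect. The values are assumptions made for this example: C is taken as 65.41 Hz, D as two equal-tempered semitones above it, and the naive analyser uses lines of 20 Hz each. `linear_line` and `octave_line` are hypothetical helpers, and `octave_line` is just one possible way to respect the octave law, not necessarily the mapping simple-asa uses.

```python
import math

LINE_WIDTH_HZ = 20.0       # assumed width of a line in the naive analyser
C = 65.41                  # assumed frequency of the note C in Hz
D = C * 2 ** (2 / 12)      # D lies two equal-tempered semitones above C

def linear_line(freq_hz, width_hz=LINE_WIDTH_HZ):
    # Naive mapping: every line covers the same number of Hz.
    return int(freq_hz // width_hz)

def octave_line(freq_hz, ref_hz=C, lines_per_octave=1):
    # Octave-aware mapping: every line covers the same frequency ratio.
    return math.floor(lines_per_octave * math.log2(freq_hz / ref_hz))

for name, freq in [("C", C), ("D", D), ("C''", 4 * C), ("D''", 4 * D)]:
    print(name, round(freq, 2), linear_line(freq), octave_line(freq))
```

With the linear mapping, C and D share a line while C'' and D'' are split across two lines; with the ratio-based mapping both pairs stay together.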
The base is the function
test/power
.
If
We have solved for
- Calculate naively $\bar{g}(j) = \lceil g(j) \rceil$ for all $j$.
- However the sum overshoots $b$: $\sum_j \bar{g}(j) > b$
- Let's correct the overshoot by calculating weights $\omega(j) = \left| \bar{g}(j) - g(j) \right| \cdot \ln(g(j))$ for all $j$.
- And take away one bin for line $j$ where $\omega(j)$ is at maximum
- Repeat the previous two steps till the overshoot disappeared
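
The correction steps above can be sketched as follows. The values of $g(j)$ are passed in as a plain list because the function itself is not reproduced here, and the numbers in the usage example are made up. Two further assumptions are not stated in the text: a line that has already lost a bin is not reduced again, and no line is reduced below one bin. `correct_overshoot` is a hypothetical name.

```python
import math

def correct_overshoot(g, b):
    """Round each g(j) up, then remove bins until the total no longer exceeds b."""
    g_bar = [math.ceil(v) for v in g]          # naive rounding up
    reduced = set()                            # lines that already gave up a bin
    while sum(g_bar) > b:                      # the sum still overshoots b
        # Assumption: each line is reduced at most once and keeps at least one bin.
        candidates = [j for j in range(len(g))
                      if j not in reduced and g_bar[j] > 1]
        if not candidates:
            break                              # nothing left to take away
        # Weight: rounding error times ln(g(j)), as in the steps above.
        weights = {j: abs(g_bar[j] - g[j]) * math.log(g[j]) for j in candidates}
        worst = max(weights, key=weights.get)  # line with the maximum weight
        g_bar[worst] -= 1                      # take away one bin
        reduced.add(worst)
    return g_bar

# Made-up line widths that sum to 20 bins.
g = [1.2, 1.7, 2.4, 3.4, 4.8, 6.5]
print(correct_overshoot(g, 20))
```

Rounding up gives 23 bins for this input, so three bins have to be taken away before the total matches $b = 20$.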