# Speech & Audio Algorithms and Machine Learning

Feel free to dive into any section that interests you or aligns with your focus.

## Table of contents

- Acoustics
- Electronics
- Signal Processing
- Deep Learning

## Acoustics

- What is sound intensity, and how do acoustic instruments measure it?
- How do you convert sound pressure between dB SPL and pascals (Pa)? (See the conversion sketch after this list.)
- Discuss the difference between the dB SPL and dB(A) scales.
- How do the density and elasticity of a medium affect the speed of sound?
- What is a room impulse response (RIR), and how do we measure it?
- Discuss the concept of reverberation and its implications in room acoustics.
- What methods are used to measure reverberation time (RT60)? (A Schroeder-integration sketch follows this list.)
- How does the GCC-PHAT algorithm differ from plain cross-correlation? (Also sketched after this list.)
- What factors would you consider when selecting a microphone?
- Describe the microphone calibration process.
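
To make the dB SPL ↔ pascal conversion above concrete, here is a minimal worked sketch; the 20 µPa reference is the standard one for airborne sound, and the function names are mine.

```python
import math

# Standard reference pressure for airborne sound: 20 micropascals.
P_REF = 20e-6  # Pa

def pa_to_db_spl(p_pa: float) -> float:
    """Sound pressure in Pa -> level in dB SPL."""
    return 20.0 * math.log10(p_pa / P_REF)

def db_spl_to_pa(level_db: float) -> float:
    """Level in dB SPL -> sound pressure in Pa."""
    return P_REF * 10.0 ** (level_db / 20.0)

print(pa_to_db_spl(1.0))   # ~94.0 dB SPL (a typical calibrator level)
print(db_spl_to_pa(94.0))  # ~1.0 Pa
```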
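
For the RT60 question, a minimal numpy sketch assuming you already have a measured room impulse response `rir` at sample rate `fs`, and that the decay actually reaches -25 dB: it backward-integrates the squared RIR (Schroeder's method) and fits the -5 to -25 dB region, extrapolated to a 60 dB decay.

```python
import numpy as np

def rt60_from_rir(rir: np.ndarray, fs: float) -> float:
    """Estimate RT60 from a room impulse response via Schroeder integration."""
    # Energy decay curve: backward integral of the squared RIR
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)  # normalized decay in dB
    # Fit a line to the -5 .. -25 dB region and extrapolate to -60 dB (T20)
    idx = np.where((edc_db <= -5.0) & (edc_db >= -25.0))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second (negative)
    return -60.0 / slope

# Toy usage: synthetic exponentially decaying noise as a stand-in for an RIR
fs = 16000
t = np.arange(fs) / fs
rir = np.random.randn(fs) * np.exp(-t / 0.1)
print(rt60_from_rir(rir, fs))  # ~0.7 s for this decay constant
```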
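
For the GCC-PHAT question, a numpy sketch of the PHAT-weighted cross-correlation. The only difference from plain cross-correlation is the normalization that discards the magnitude of the cross-spectrum and keeps its phase, which sharpens the correlation peak and adds robustness to reverberation. The function name and the regularization constant are mine.

```python
import numpy as np

def gcc_phat(x: np.ndarray, y: np.ndarray, fs: float) -> float:
    """Estimate the delay of x relative to y (seconds) with GCC-PHAT."""
    n = len(x) + len(y) - 1
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:               # map upper half to negative lags
        shift -= n
    return shift / fs

fs = 16000
sig = np.random.randn(fs)
delayed = np.concatenate([np.zeros(40), sig])[:fs]  # delay by 40 samples
print(gcc_phat(delayed, sig, fs))  # ~40 / 16000 = 2.5 ms
```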

## Electronics

- Describe the process of converting analog signals into digital data. (A toy A/D sketch follows this list.)
- What is the role of an anti-aliasing filter?
- What are the typical sampling rates and bit depths commonly used in audio?
- What digital protocols are used by microphones, such as I2S (Inter-IC Sound) and PCM (Pulse Code Modulation)?
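
A toy simulation of the analog-to-digital chain asked about above: anti-alias lowpass, integer-factor decimation, then uniform quantization. The filter order and cutoff here are arbitrary illustrative choices, not a reference design.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def adc(analog: np.ndarray, fs_in: int, fs_out: int, bits: int) -> np.ndarray:
    """Toy A/D chain: anti-alias filter, decimate, uniform quantizer."""
    # Anti-aliasing: remove content above the new Nyquist (fs_out / 2),
    # otherwise it would fold back (alias) into the band after decimation.
    sos = butter(8, 0.45 * fs_out, btype="low", fs=fs_in, output="sos")
    filtered = sosfiltfilt(sos, analog)
    # Integer-factor decimation (assumes fs_in is a multiple of fs_out)
    sampled = filtered[:: fs_in // fs_out]
    # Uniform quantization to 2**bits levels over [-1, 1)
    half = 2.0 ** (bits - 1)
    return np.clip(np.round(sampled * half) / half, -1.0, 1.0 - 1.0 / half)

fs_in, fs_out = 48000, 16000
t = np.arange(fs_in) / fs_in
x = 0.5 * np.sin(2 * np.pi * 440 * t)     # "analog" 440 Hz tone
digital = adc(x, fs_in, fs_out, bits=16)  # 16 kHz / 16-bit, common for speech
```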

## Signal Processing

- What are the key differences between Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters?
- Explain the usage of the filtfilt function. (See the zero-phase sketch after this list.)
- How can zero-phase filtering be implemented, and what advantages does it offer?
- What are the various methods for testing the stability of digital filters? (A pole-magnitude check is sketched after this list.)
- What is energy in the context of speech signals, and how is it computed? (See the framewise sketch after this list.)
- What are the advantages of using the zero-crossing rate (ZCR) compared to the Fast Fourier Transform (FFT)?
- What methods are commonly used to estimate the pitch of a speech signal?
- What are some common audio features, and how are they extracted?
- How can we test the similarity between two audio signals?
- Explain the Short-Time Fourier Transform (STFT) and its implementation. (Sketched after this list.)
- Why do we use zero padding in STFT?
- Why do we use overlap and windowing in STFT?
- What are the trade-offs when determining the STFT parameters?
- What do people usually use Mel-frequency cepstral coefficients (MFCCs) for in audio processing?
- How does the number of quantizer levels affect the dynamic range?
- Describe the operation of adaptive differential pulse code modulation (ADPCM).
- What is linear predictive coding (LPC), and how does it represent speech signals?
- How does mu-law quantization differ from linear quantization, and what advantages does it offer? (See the companding sketch after this list.)
- How does spectral subtraction work? (Sketched after this list.)
- What is the Wiener filtering method?
- When are wavelet-based denoising techniques effective?
- What is Speech Presence Probability (SPP), and how is it used in noise reduction?
- How is adaptive filtering used in noise reduction and echo cancellation?
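
A small scipy example for the filtfilt / zero-phase questions. Filtering forward and then backward cancels the phase response (the effective filter is |H(f)|^2), so waveform features are not shifted in time; the price is that the filter becomes non-causal and needs the whole signal, which suits offline analysis only. The signal and cutoff are illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(fs)  # noisy 50 Hz tone

b, a = butter(4, 200, btype="low", fs=fs)

y_causal = lfilter(b, a, x)  # single pass: output lags the input (phase delay)
y_zero = filtfilt(b, a, x)   # forward + backward pass: zero phase, no lag
```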
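
One standard answer to the stability question, sketched below: a causal IIR filter is BIBO-stable when every pole of its transfer function lies strictly inside the unit circle.

```python
import numpy as np

def is_stable(a: np.ndarray) -> bool:
    """Stability check for an IIR filter with denominator
    a[0] + a[1] z^-1 + a[2] z^-2 + ...: all poles must satisfy |z| < 1."""
    poles = np.roots(a)
    return bool(np.all(np.abs(poles) < 1.0))

print(is_stable(np.array([1.0, -0.5])))  # pole at z = 0.5 -> True (stable)
print(is_stable(np.array([1.0, -1.1])))  # pole at z = 1.1 -> False (unstable)
```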
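
A framewise energy and zero-crossing-rate sketch for the two feature questions above; the 25 ms frame / 10 ms hop at 16 kHz is a common convention, not a requirement.

```python
import numpy as np

def frame_features(x: np.ndarray, frame_len: int = 400, hop: int = 160):
    """Per-frame short-time energy and zero-crossing rate (ZCR).

    Energy: sum of squared samples in the frame.
    ZCR: fraction of adjacent sample pairs with differing signs; it is a
    cheap time-domain indicator of spectral content (no FFT required).
    """
    energies, zcrs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start : start + frame_len]
        energies.append(float(np.sum(frame ** 2)))
        signs = np.sign(frame)
        zcrs.append(float(np.mean(signs[1:] != signs[:-1])))
    return np.array(energies), np.array(zcrs)
```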
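
A from-scratch STFT sketch, covering the STFT questions above, that makes the windowing, overlap, and zero-padding choices explicit; the parameter defaults are illustrative.

```python
import numpy as np

def stft(x: np.ndarray, frame_len: int = 400, hop: int = 160,
         n_fft: int = 512) -> np.ndarray:
    """Short-Time Fourier Transform.

    - Windowing (Hann) tapers frame edges to reduce spectral leakage.
    - Overlap (hop < frame_len) follows spectra that change within a frame.
    - Zero padding (n_fft > frame_len) samples the spectrum on a finer grid,
      but does not add true frequency resolution; that is set by frame_len.
    """
    window = np.hanning(frame_len)
    frames = [np.fft.rfft(x[s : s + frame_len] * window, n=n_fft)
              for s in range(0, len(x) - frame_len + 1, hop)]
    return np.array(frames).T  # shape: (n_fft // 2 + 1, n_frames)
```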
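
A companding sketch for the mu-law question, using the standard mu = 255 curve from 8-bit telephony. The logarithmic curve spends more quantizer levels on small amplitudes, where most speech energy lives, so quiet passages keep a better SNR than with a linear quantizer at the same bit depth.

```python
import numpy as np

MU = 255.0  # standard value for 8-bit mu-law telephony

def mu_law_encode(x: np.ndarray) -> np.ndarray:
    """Compress x in [-1, 1] with the mu-law curve (quantize afterwards)."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_decode(y: np.ndarray) -> np.ndarray:
    """Invert the companding curve (expand after dequantizing)."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.linspace(-1.0, 1.0, 5)
assert np.allclose(mu_law_decode(mu_law_encode(x)), x)  # exact round trip
```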
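
A bare-bones magnitude spectral subtraction sketch for the noise-reduction question above. It assumes the first few frames of the input are noise-only, reuses the noisy phase for resynthesis, and skips exact overlap-add normalization; all are common textbook simplifications, and the floor constant is a placeholder.

```python
import numpy as np

def spectral_subtraction(noisy: np.ndarray, frame_len: int = 512,
                         hop: int = 256, noise_frames: int = 10,
                         floor: float = 0.05) -> np.ndarray:
    """Subtract an average noise magnitude per frequency bin, with a floor."""
    window = np.hanning(frame_len)
    n_frames = (len(noisy) - frame_len) // hop + 1
    spec = np.array([np.fft.rfft(noisy[i*hop : i*hop + frame_len] * window)
                     for i in range(n_frames)])
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)           # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # floor vs. musical noise
    # Resynthesize with the noisy phase via (unnormalized) overlap-add
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        frame = np.fft.irfft(clean_mag[i] * np.exp(1j * phase[i]), n=frame_len)
        out[i*hop : i*hop + frame_len] += frame * window
    return out
```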

## Deep Learning

- What challenges are faced in sound classification tasks?
- How can deep learning be applied to sound classification?
- What metrics assess classification model performance?
- What deep network architectures are common for speech enhancement?
- How is the phase treated in speech enhancement?
- What loss functions are typical in speech enhancement, and why might Mean Squared Error (MSE) have limitations?
- Which objective metrics evaluate speech enhancement, and how do they differ?
- Distinguish between speaker diarization, identification, and verification.
- What are typical deep network architectures for speaker recognition?
- What are speaker embeddings, and how are they extracted and used? (A verification-scoring sketch follows this list.)
- What are x-vectors, and how do they differ from i-vectors?
- What methods are used in speech recognition?
- How is audio data preprocessed for speech recognition?
- What evaluation methods are used for speech recognition models? (A WER sketch follows this list.)
- How does Whisper employ weak supervision, and what is its architecture?
- Describe training and optimization for Whisper models.
- What distinguishes Wav2Vec2 from Wav2Vec?
- How does CTC address limitations in decoding Wav2Vec outputs? (A greedy CTC collapse is sketched after this list.)
- Explain the role of Beam Search in Wav2Vec models.
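
For the ASR evaluation question, a self-contained word error rate (WER) computation: Levenshtein distance over words, divided by the reference length.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1/6 ~= 0.167
```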
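
For the CTC question, a greedy CTC collapse over per-frame argmax ids: merge repeats, then drop blanks. This is how CTC reconciles the many-frames-per-character outputs of models like Wav2Vec2 with a character sequence; the vocabulary mapping and blank id below are made up for the example.

```python
def ctc_greedy_decode(frame_ids, blank=0, id_to_char=None):
    """Collapse per-frame predictions: merge repeated ids, drop blanks."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return "".join(id_to_char[i] for i in out) if id_to_char else out

# The blank (0) separates genuine repeats: the two 1-runs survive as "aa"
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0],
                        id_to_char={1: "a", 2: "b"}))  # -> "aab"
```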
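
For the speaker-embedding questions, a sketch of cosine-similarity scoring as used in verification. The embeddings are assumed to come from some trained encoder (e.g. an x-vector network); the 0.7 threshold is a placeholder that would in practice be tuned on a development set, for instance at the equal error rate.

```python
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(emb_a, emb_b) /
                 (np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-12))

def same_speaker(emb_a: np.ndarray, emb_b: np.ndarray,
                 threshold: float = 0.7) -> bool:
    """Verification decision: accept if the score clears the tuned threshold."""
    return cosine_score(emb_a, emb_b) >= threshold
```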