- Poetry.
- Conda. I advise using Miniconda.
- VS Code — Ubuntu, macOS and Windows installation guides.
- (Optional) CUDA Version: 11.4; Driver Version: 470.129.06 — Installation.
The course will use Python and PyTorch as the main Deep Learning framework, so it is essential to have basic PyTorch knowledge and a bit of hands-on experience.
To refresh or gain this knowledge, you can use the following tutorials and resources:
- Official PyTorch Tutorials
- Stanford PyTorch Tutorials
- Repository with different PyTorch Tutorials
- Detailed example of Sequence Classification and NER
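As a rough gauge of that baseline, you should be comfortable reading and running something like the minimal sketch below (the toy model and data are invented purely for illustration):

```python
import torch
import torch.nn as nn

# Toy data: 64 samples, 20 features, 3 classes (purely illustrative)
x = torch.randn(64, 20)
y = torch.randint(0, 3, (64,))

# Small feed-forward classifier
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One standard training step: forward pass, loss, backward pass, parameter update
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```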
- Install Poetry using the Poetry full guide.
  - Important: check that it works by running `poetry --version`.
- Run the following command to keep your `.venv` folder right in your project: `poetry config virtualenvs.in-project true`
- Run `poetry shell`.
  - Important: if you have `conda` and two environments end up activated, run `conda deactivate`.
- Run `poetry install --no-root`.

To activate the environment on the next use, run `poetry shell` again.
Important: you should be inside your project folder to do it.
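To confirm that the in-project virtual environment is actually the one in use, here is a quick check you can run from inside `poetry shell` (a minimal sketch):

```python
import sys
from pathlib import Path

# With `virtualenvs.in-project true`, the interpreter prefix should be a
# `.venv` folder that sits next to the project's pyproject.toml.
prefix = Path(sys.prefix)
print("Interpreter prefix:", prefix)
print("In-project .venv:", prefix.name == ".venv" and (prefix.parent / "pyproject.toml").exists())
```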
The next tutorial is mostly taken from the NeMo repo.
conda create --name nemo python==3.10.12
conda activate nemo
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install Cython
pip install nemo_toolkit['all']
pip install jupyterlab
jupyter lab --port 7766
Note: you may use any port.
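Once the install finishes, a short sanity check (a sketch, run inside the activated `nemo` environment) confirms that PyTorch sees the GPU and that NeMo imports cleanly:

```python
import torch
import nemo
import nemo.collections.asr as nemo_asr  # ASR collection; imported here only to verify the full install

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("NeMo version:", nemo.__version__)
```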
- Intro in Audio ML. Basic Audio Processing. Self Supervised Representations
  - Intro in Audio ML. Digital wave representation. Spectral audio representation
    - Author: Volodymyr, Andrii
    - Recording:
  - Pre-processing, Filtering, Clustering
    - Author: Oles, Andrii
    - Colab: https://colab.research.google.com/drive/1PaM4K2eJoqeC8s0JLiiJ7RrDFI56rSxn?usp=sharing
    - Recording:
  - Self Supervised Representations
    - Author: Volodymyr, Andrii
    - Recording:
- Audio Classification and Detection. Validation
  - Basic Audio Classification model
    - Author: Volodymyr, Andrii
    - Kaggle Inference: https://www.kaggle.com/code/vladimirsydor/ucu-hms-inference/notebook
    - Processed Data: https://www.kaggle.com/datasets/vladimirsydor/ucu-hms-h5py/data
    - Recording:
  - Validation
    - Author: Anton, Andrii
    - Recording:
  - Speaker diarization
    - Author: Anton, Andrii
    - Recording:
- ASR
  - Introduction to ASR
    - Author: Oles
    - Colab: https://colab.research.google.com/drive/1iJvuurEQDaOkBba2zm1DSEkcdNDGhmU4
    - Recording:
  - Deeper ASR overview. CTC loss and Encoder architecture for ASR
    - Author: Yurii, Volodymyr
    - Recording:
  - Introduction into Transformers. Whisper
    - Author: Yurii, Andrii
    - Recording:
- TTS
  - Introduction into TTS. Tacotron2
    - Author: Taras, Volodymyr
    - Original source: https://github.com/NVIDIA/NeMo?tab=readme-ov-file
    - Recording:
  - Pheme TTS
    - Author: Taras, Yurii
    - Recording:
- Create a Kaggle account.
- Create a Notebook.
- Explore the docs and find out how to:
  - Add the Kaggle dataset to the notebook.
  - Turn on GPU.
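Once the notebook is created, both points can be verified from a notebook cell (a minimal sketch; `/kaggle/input` is the standard mount point for attached datasets):

```python
import os
import torch

# Datasets attached to the notebook appear as read-only folders under /kaggle/input
print("Attached datasets:", os.listdir("/kaggle/input"))

# True only if a GPU accelerator is enabled in the notebook settings
print("GPU available:", torch.cuda.is_available())
```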
TODO
- Create a Kaggle account.
- Proceed with Installation & Authentication.
- Don't forget to join a competition and accept its rules on the Kaggle website.
- Download the dataset with an API command.
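For the last step, here is a sketch of the download via the official `kaggle` Python client (the competition name below is a placeholder; the CLI equivalent is `kaggle competitions download -c <competition-name>`):

```python
# Requires kaggle.json credentials in ~/.kaggle/ (set up during "Installation & Authentication")
from kaggle.api.kaggle_api_extended import KaggleApi

COMPETITION = "<competition-name>"  # placeholder: the competition you joined and accepted rules for

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json
api.competition_download_files(COMPETITION, path="data/")  # saves a zip archive into ./data/
```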
Raw table: https://docs.google.com/spreadsheets/d/1zpYV6K_BtvOUqX09dIwwgJLwhoPH0q5Eu7ZaOF0-fcA/edit?usp=sharing
@misc{ucu_audio_processing_course_2024,
author = {Volodymyr Sydorskyi and Anton Bazdyrev and Oles Dobosevych and Yurii Laba and Andrii Shevtsov and Taras Sereda},
title = {UCU Audio Processing Course 2024},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/VSydorskyy/ucu_audio_processing_course}},
}