
Homeworks for the Spring 2024 Audio Processing course


tyaroshko/audio_processing



UCU Audio Processing Course

Setup Working Environment

Pre-requirements

PyTorch base knowledge

The course uses Python, with PyTorch as the main deep learning framework, so basic PyTorch knowledge and some hands-on experience are essential.

To refresh or gain this knowledge, you can use the following tutorials and resources:

Setup environment

Poetry (Main)

  1. Install Poetry by following the official Poetry installation guide.

    Important: check that it works by running poetry --version.

  2. Run the following command so that the .venv folder is created inside your project: poetry config virtualenvs.in-project true
  3. Run poetry shell.

    Important: if you use conda and two environments end up activated at once, run conda deactivate.

  4. Run poetry install --no-root.

To activate the environment in later sessions:

poetry shell

Important: run this from inside your project folder.
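Because step 2 sets virtualenvs.in-project to true, Poetry should create the environment in a .venv folder at the project root. A minimal sketch (the helper name `is_project_venv` is hypothetical, not part of Poetry) that checks whether the running interpreter is that environment, assuming you launch it with python from the project folder after poetry shell:

```python
import sys
from pathlib import Path


def is_project_venv(prefix: Path, project_dir: Path) -> bool:
    """Return True if `prefix` is the in-project `.venv` of `project_dir`.

    Assumes `poetry config virtualenvs.in-project true` was set, so Poetry
    creates the environment at `<project_dir>/.venv`.
    """
    return prefix.resolve() == (project_dir / ".venv").resolve()


if __name__ == "__main__":
    # sys.prefix differs from sys.base_prefix only inside a virtual environment
    print("inside a virtual environment:", sys.prefix != sys.base_prefix)
    print("is this project's .venv:", is_project_venv(Path(sys.prefix), Path.cwd()))
```

If both lines print True, the shell is using the project's own environment rather than a global or conda one.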

Conda (For TTS)

The following steps are mostly taken from the NeMo repository:

conda create --name nemo python==3.10.12
conda activate nemo
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install Cython
pip install "nemo_toolkit[all]"
pip install jupyterlab
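To confirm that the packages above ended up in the nemo environment, a quick sketch like the following can be run with python after conda activate nemo. The module list is an assumption mapped from the install commands (for example, nemo_toolkit is imported as nemo), and `missing_modules` is a hypothetical helper.

```python
from __future__ import annotations

import importlib.util
import sys

# Import names assumed for the packages installed above (nemo_toolkit -> nemo).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "Cython", "nemo", "jupyterlab"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])  # the environment above pins 3.10.12
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing or "none")
    if "torch" not in missing:
        import torch

        # Should print True if the pytorch-cuda=11.8 build matches your GPU driver
        print("CUDA available:", torch.cuda.is_available())
```

`find_spec` only locates modules without importing them, so the check stays fast even for large packages like NeMo.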

Start Jupyter

jupyter lab --port 7766

Note: you may use any port.
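If port 7766 is already taken, a small sketch like this one can pick a free port before starting Jupyter. It relies only on the standard socket module; both helper names are hypothetical, and a port reported as free can still be taken by another process before Jupyter binds it.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to `port` on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # Prefer the port used above, fall back to any port the OS hands out
    port = 7766 if is_port_free(7766) else find_free_port()
    print(f"jupyter lab --port {port}")
```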

Content

  1. Intro to Audio ML. Basic Audio Processing. Self-Supervised Representations
    1. Intro to Audio ML. Digital wave representation. Spectral audio representation
      • Authors: Volodymyr, Andrii
      • Recording:
    2. Pre-processing, Filtering, Clustering
    3. Self-Supervised Representations
      • Authors: Volodymyr, Andrii
      • Recording:
  2. Audio Classification and Detection. Validation
    1. Basic Audio Classification model
    2. Validation
      • Authors: Anton, Andrii
      • Recording:
    3. Speaker diarization
      • Authors: Anton, Andrii
      • Recording:
  3. ASR
    1. Introduction to ASR
    2. Deeper ASR overview. CTC loss and Encoder architecture for ASR
      • Authors: Yurii, Volodymyr
      • Recording:
    3. Introduction to Transformers. Whisper
      • Authors: Yurii, Andrii
      • Recording:
  4. TTS
    1. Introduction to TTS. Tacotron2
    2. Pheme TTS
      • Authors: Taras, Yurii
      • Recording:

Use Kaggle or Colab for computations

Kaggle

  1. Create a Kaggle account.
  2. Create a Notebook.
  3. Explore the docs and find out how to:
    • Add the Kaggle dataset to the notebook.
    • Turn on GPU.

Colab

  1. Create a Notebook in Colab.
  2. Enable GPU.
  3. Add the Kaggle dataset to Colab following the guide.
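Once the GPU is enabled on Kaggle or Colab, you can check that PyTorch actually sees it. A minimal sketch, assuming PyTorch is preinstalled in the notebook runtime (the helper `gpu_status` is hypothetical):

```python
import importlib.util


def gpu_status() -> str:
    """Describe whether PyTorch can see a CUDA GPU in the current runtime."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also runs where torch is absent

    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "no GPU visible to PyTorch"


if __name__ == "__main__":
    print(gpu_status())
```

If it reports no GPU after you switched the accelerator on, restarting the notebook session is usually needed for the change to take effect.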

Data

TODO

How to use Kaggle datasets

  1. Create a Kaggle account.
  2. Proceed with Installation & Authentication.
  3. Don't forget to join the competition and accept its rules on the Kaggle website.
  4. Download the dataset with an API command.
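Steps 2 and 4 can be sketched in Python as below. This assumes the official kaggle CLI is installed and authenticated either through the KAGGLE_USERNAME/KAGGLE_KEY environment variables or a kaggle.json file in ~/.kaggle (or in KAGGLE_CONFIG_DIR); the competition slug and both helper names are placeholders, not part of the course materials.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def kaggle_credentials_present(env: Mapping[str, str], home: Path) -> bool:
    """Check for Kaggle API credentials in env vars or in a kaggle.json file."""
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR", home / ".kaggle"))
    return (config_dir / "kaggle.json").is_file()


def build_download_command(competition: str, dest: str) -> list[str]:
    """Build the kaggle CLI call that downloads a competition's files into `dest`."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


if __name__ == "__main__":
    competition = "your-competition-slug"  # hypothetical placeholder slug
    if kaggle_credentials_present(os.environ, Path.home()):
        # Run this in a terminal, or pass the list to subprocess.run(..., check=True)
        print(" ".join(build_download_command(competition, "data")))
    else:
        print("Kaggle credentials not found: see Installation & Authentication.")
```

The CLI typically saves competition files as a zip archive in the destination folder, which you can unpack with unzip or Python's zipfile module.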

Feedback [Only for Lecturers]

2024

Raw table: https://docs.google.com/spreadsheets/d/1zpYV6K_BtvOUqX09dIwwgJLwhoPH0q5Eu7ZaOF0-fcA/edit?usp=sharing

Citation

@misc{ucu_audio_processing_course_2024,
  author = {Volodymyr Sydorskyi and Anton Bazdyrev and Oles Dobosevych and Yurii Laba and Andrii Shevtsov and Taras Sereda},
  title = {UCU Audio Processing Course 2024},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/VSydorskyy/ucu_audio_processing_course}},
}


Languages

  • Jupyter Notebook 100.0%