From e976d93ecc3865e5757426930257e200846a520a Mon Sep 17 00:00:00 2001
From: William Dye
Date: Thu, 16 Nov 2023 07:32:35 -0500
Subject: [PATCH] Minor documentation updates (#565)

* Minor documentation updates

* Update readme

* Update api.md
---
 README.md       | 24 +++++++++++++-----------
 docs/api.md     |  2 +-
 docs/mac.md     | 12 ++++++------
 docs/windows.md |  2 +-
 4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index 347f3142..fe7e77b9 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ and only important bug fixes will be processed on the new repo. Please do not op
 This is the 4th release of Demucs (v4), featuring Hybrid Transformer based source separation.
 **For the classic Hybrid Demucs (v3):** [Go to this commit][demucs_v3].
 
-If you are experiencing issues and want the old Demucs back, please fill an issue, and then you can get back to the v3 with
+If you are experiencing issues and want the old Demucs back, please file an issue, and then you can get back to Demucs v3 with
 `git checkout v3`. You can also go to [Demucs v2][demucs_v2].
 
@@ -19,7 +19,7 @@ Demucs is a state-of-the-art music source separation model, currently capable of
 drums, bass, and vocals from the rest of the accompaniment.
 Demucs is based on a U-Net convolutional architecture inspired by [Wave-U-Net][waveunet].
 The v4 version features [Hybrid Transformer Demucs][htdemucs], a hybrid spectrogram/waveform separation model using Transformers.
-It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo) with the innermost layers are
+It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo), with the innermost layers
 replaced by a cross-domain Transformer Encoder.
 This Transformer uses self-attention within each domain, and cross-attention across domains.
 The model achieves an SDR of 9.00 dB on the MUSDB HQ test set. Moreover, when using sparse attention
@@ -127,7 +127,7 @@ python3 -m pip install -U git+https://github.com/facebookresearch/demucs#egg=dem
 Advanced OS support is provided on the following pages, **you must read the page for your OS before posting an issue**:
 
 - **If you are using Windows:** [Windows support](docs/windows.md).
-- **If you are using MAC OS X:** [Mac OS X support](docs/mac.md).
+- **If you are using macOS:** [macOS support](docs/mac.md).
 - **If you are using Linux:** [Linux support](docs/linux.md).
 
 ### For machine learning scientists
@@ -143,7 +143,7 @@ pip install -e .
 
 This will create a `demucs` environment with all the dependencies installed.
 
-You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on Mac OSX you can do `brew install sound-touch`,
+You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on macOS you can do `brew install sound-touch`,
 and on Ubuntu `sudo apt-get install soundstretch`. This is used for the
 pitch/tempo augmentation.
@@ -198,16 +198,18 @@ demucs --two-stems=vocals myfile.mp3
 ```
 
-If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce length of each split. `SEGMENT` should be changed to a integer. Personally recommend not less than 10 (the bigger the number is, the more memory is required, but quality may increase). Create an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still cannot help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration.
+If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce the length of each split. `SEGMENT` should be an integer describing the length of each segment in seconds.
+A segment length of at least 10 is recommended (the larger the segment, the more memory is required, but quality may increase). Note that the Hybrid Transformer models only support a maximum segment length of 7.8 seconds.
+Setting the environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` can also help. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration.
 
 Separated tracks are stored in the `separated/MODEL_NAME/TRACK_NAME` folder. There you will find four stereo wav files sampled at 44.1 kHz: `drums.wav`, `bass.wav`,
 `other.wav`, `vocals.wav` (or `.mp3` if you used the `--mp3` option).
-All audio formats supported by `torchaudio` can be processed (i.e. wav, mp3, flac, ogg/vorbis on Linux/Mac OS X etc.). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything.
+All audio formats supported by `torchaudio` can be processed (i.e. wav, mp3, flac, ogg/vorbis on Linux/macOS, etc.). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything.
 Audio is resampled on the fly if necessary.
-The output will be a wave file encoded as int16.
+The output will be a wav file encoded as int16.
 You can save as float32 wav files with `--float32`, or 24 bits integer wav with `--int24`.
-You can pass `--mp3` to save as mp3 instead, and set the bitrate with `--mp3-bitrate` (default is 320kbps).
+You can pass `--mp3` to save as mp3 instead, and set the bitrate (in kbps) with `--mp3-bitrate` (default is 320).
 
 It can happen that the output would need clipping, in particular due to some separation artifacts.
 Demucs will automatically rescale each output stem so as to avoid clipping. This can however break
@@ -230,8 +232,8 @@ The list of pre-trained models is:
     but quality can be slightly worse.
 - `SIG`: where `SIG` is a single model from the [model zoo](docs/training.md#model-zoo).
 
-The `--two-stems=vocals` option allows to separate vocals from the rest (e.g. karaoke mode).
-`vocals` can be changed into any source in the selected model.
+The `--two-stems=vocals` option allows separating vocals from the rest of the accompaniment (i.e., karaoke mode).
+`vocals` can be changed to any source in the selected model.
 This will mix the files after separating the mix fully, so this won't be faster or use less memory.
 
 The `--shifts=SHIFTS` option performs multiple predictions with random shifts (a.k.a. the *shift trick*) of the input and averages them. This makes prediction `SHIFTS` times
@@ -252,7 +254,7 @@ If you do not have enough memory on your GPU, simply add `-d cpu` to the command line.
 
 ## Calling from another Python program
 
-The main function provides a `opt` parameter as a simple API. You can just pass the parsed command line as this parameter:
+The main function provides an `opt` parameter as a simple API. You can just pass the parsed command line as this parameter:
 ```python
 # Assume that your command is `demucs --mp3 --two-stems vocals -n mdx_extra "track with space.mp3"`
 # The following code is the same as the command above:
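Note: the hunk above ends inside the README's Python example, so the call itself is cut off by the diff context. For reference, a minimal sketch of the parsed-command-line call it introduces (the argument list comes from the comment in the hunk; `my_track.mp3` in the second call is a placeholder file name):

```python
import demucs.separate

# Equivalent to: demucs --mp3 --two-stems vocals -n mdx_extra "track with space.mp3"
demucs.separate.main(["--mp3", "--two-stems", "vocals", "-n", "mdx_extra", "track with space.mp3"])

# The memory-saving flags discussed earlier in the README can be passed the same way:
demucs.separate.main(["--segment", "10", "-d", "cpu", "my_track.mp3"])
```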
diff --git a/docs/api.md b/docs/api.md
index ab55f922..dbd858a7 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -47,7 +47,7 @@ for file, sources in separated:
 
 ## API References
 
-The types of each parameter and return value is not listed in this document. To know the exact type of them, please read the type hints in api.py (most modern code editors support infering types based on type hints).
+The types of each parameter and return value are not listed in this document. To know their exact types, please read the type hints in api.py (most modern code editors support inferring types based on type hints).
 
 ### `class Separator`
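Note: since the api.md hunk shows only a fragment of the surrounding example, here is a minimal sketch of how the `Separator` class referenced above is typically used. The constructor argument and the method/helper names (`separate_audio_file`, `save_audio`, `samplerate`) are assumptions based on api.py, not something shown in this diff:

```python
import demucs.api

# Load a pretrained model (assumed constructor argument).
separator = demucs.api.Separator(model="htdemucs")

# Assumed to return the original waveform plus a dict mapping stem names to waveforms.
origin, separated = separator.separate_audio_file("test.mp3")

for stem, source in separated.items():
    # Write each stem to disk at the model's sample rate (assumed helper).
    demucs.api.save_audio(source, f"{stem}.wav", samplerate=separator.samplerate)
```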
diff --git a/docs/mac.md b/docs/mac.md
index 6e6c3d0c..62dd235e 100644
--- a/docs/mac.md
+++ b/docs/mac.md
@@ -1,6 +1,6 @@
-# Mac OS X support for Demucs
+# macOS support for Demucs
 
-If you have a sufficiently recent version of OS X, you can just run
+If you have a sufficiently recent version of macOS, you can just run
 
 ```bash
 python3 -m pip install --user -U demucs
@@ -10,10 +10,10 @@ python3 -m demucs -d cpu PATH_TO_AUDIO_FILE_1
 demucs -d cpu PATH_TO_AUDIO_FILE_1
 ```
 
-If you do not already have Anaconda installed or much experience with the terminal on Mac OS X here are some detailed instructions:
+If you do not already have Anaconda installed or much experience with the terminal on macOS, here are some detailed instructions:
 
-1. Download [Anaconda 3.8 (or more recent) 64 bits for MacOS][anaconda]:
-2. Open [Anaconda Prompt in MacOSX][prompt]
+1. Download [Anaconda 3.8 (or more recent) 64-bit for macOS][anaconda]
+2. Open the [Anaconda Prompt on macOS][prompt]
 3. Follow these commands:
 ```bash
 conda activate
@@ -24,5 +24,5 @@ demucs -d cpu PATH_TO_AUDIO_FILE_1
 
 **Important, torchaudio 0.12 update:** Torchaudio no longer supports decoding mp3s without ffmpeg installed. You must have ffmpeg installed, either through Anaconda (`conda install ffmpeg -c conda-forge`) or with Homebrew for instance (`brew install ffmpeg`).
 
-[anaconda]: https://www.anaconda.com/distribution/#download-section
+[anaconda]: https://www.anaconda.com/download
 [prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-nav-mac
diff --git a/docs/windows.md b/docs/windows.md
index acc48c70..b259b765 100644
--- a/docs/windows.md
+++ b/docs/windows.md
@@ -63,5 +63,5 @@ If you have an error saying that `mkl_intel_thread.dll` cannot be found, you can
 
 **If you get a permission error**, please try starting the Anaconda Prompt as administrator.
 
-[install]: https://www.anaconda.com/distribution/#windows
+[install]: https://www.anaconda.com/download
 [prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-prompt-win
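Note: as a quick sanity check for the ffmpeg requirement mentioned in the docs/mac.md hunk above, one can try decoding an mp3 directly with torchaudio; on torchaudio >= 0.12 this load is expected to fail if ffmpeg is missing (`test.mp3` is a placeholder path):

```python
import torchaudio

# Decoding an mp3 exercises the ffmpeg-backed path described in docs/mac.md.
waveform, sample_rate = torchaudio.load("test.mp3")  # placeholder file
print(waveform.shape, sample_rate)
```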