
Minor documentation updates (#565)
* Minor documentation updates

* Update readme

* Update api.md
will2dye4 authored Nov 16, 2023
1 parent fcd0600 commit e976d93
Showing 4 changed files with 21 additions and 19 deletions.
24 changes: 13 additions & 11 deletions README.md
@@ -11,15 +11,15 @@ and only important bug fixes will be processed on the new repo. Please do not op

This is the 4th release of Demucs (v4), featuring Hybrid Transformer based source separation.
**For the classic Hybrid Demucs (v3):** [Go to this commit][demucs_v3].
-If you are experiencing issues and want the old Demucs back, please fill an issue, and then you can get back to the v3 with
+If you are experiencing issues and want the old Demucs back, please file an issue, and then you can get back to Demucs v3 with
`git checkout v3`. You can also go back to [Demucs v2][demucs_v2].


Demucs is a state-of-the-art music source separation model, currently capable of separating
drums, bass, and vocals from the rest of the accompaniment.
Demucs is based on a U-Net convolutional architecture inspired by [Wave-U-Net][waveunet].
The v4 version features [Hybrid Transformer Demucs][htdemucs], a hybrid spectrogram/waveform separation model using Transformers.
-It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo) with the innermost layers are
+It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo), with the innermost layers
replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain,
and cross-attention across domains.
The model achieves an SDR of 9.00 dB on the MUSDB HQ test set. Moreover, when using sparse attention
@@ -127,7 +127,7 @@ python3 -m pip install -U git+https://github.com/facebookresearch/demucs#egg=dem

Advanced OS support is provided on the following pages; **you must read the page for your OS before posting an issue**:
- **If you are using Windows:** [Windows support](docs/windows.md).
-- **If you are using MAC OS X:** [Mac OS X support](docs/mac.md).
+- **If you are using macOS:** [macOS support](docs/mac.md).
- **If you are using Linux:** [Linux support](docs/linux.md).

### For machine learning scientists
@@ -143,7 +143,7 @@ pip install -e .

This will create a `demucs` environment with all the dependencies installed.
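For reference, the full sequence looks roughly like this (a sketch — the environment file name is an assumption about the repo layout; `pip install -e .` comes from the hunk header above):

```bash
# Create/update the conda environment from the repo's environment file
# (file name assumed; check the repository for the exact yml files).
conda env update -f environment-cpu.yml
conda activate demucs
# Editable install of the demucs source tree.
pip install -e .
```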

-You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on Mac OSX you can do `brew install sound-touch`,
+You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on macOS you can do `brew install sound-touch`,
and on Ubuntu `sudo apt-get install soundstretch`. This is used for the
pitch/tempo augmentation.
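The two install commands above, for easy copying:

```bash
brew install sound-touch           # macOS (Homebrew)
sudo apt-get install soundstretch  # Ubuntu
```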

@@ -198,16 +198,18 @@
```bash
demucs --two-stems=vocals myfile.mp3
```


-If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce length of each split. `SEGMENT` should be changed to a integer. Personally recommend not less than 10 (the bigger the number is, the more memory is required, but quality may increase). Create an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still cannot help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration.
+If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce the length of each split. `SEGMENT` should be changed to an integer describing the length of each segment in seconds.
+A segment length of at least 10 is recommended (the bigger the number, the more memory is required, but quality may increase). Note that the Hybrid Transformer models only support a maximum segment length of 7.8 seconds.
+Creating an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration.
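For illustration, the flags described above combine as follows (a sketch; `my_song.mp3` is a placeholder file name):

```bash
# Use short segments (Hybrid Transformer models cap the segment length at 7.8 s)
# and disable the CUDA memory cache to reduce peak GPU memory usage.
PYTORCH_NO_CUDA_MEMORY_CACHING=1 demucs --segment 7 my_song.mp3

# If that is still not enough, fall back to the CPU.
demucs -d cpu my_song.mp3
```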

Separated tracks are stored in the `separated/MODEL_NAME/TRACK_NAME` folder. There you will find four stereo wav files sampled at 44.1 kHz: `drums.wav`, `bass.wav`,
`other.wav`, `vocals.wav` (or `.mp3` if you used the `--mp3` option).

-All audio formats supported by `torchaudio` can be processed (i.e. wav, mp3, flac, ogg/vorbis on Linux/Mac OS X etc.). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything.
+All audio formats supported by `torchaudio` can be processed (e.g. wav, mp3, flac, ogg/vorbis on Linux/macOS). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything.
Audio is resampled on the fly if necessary.
-The output will be a wave file encoded as int16.
+The output will be a wav file encoded as int16.
You can save as float32 wav files with `--float32`, or 24-bit integer wav with `--int24`.
-You can pass `--mp3` to save as mp3 instead, and set the bitrate with `--mp3-bitrate` (default is 320kbps).
+You can pass `--mp3` to save as mp3 instead, and set the bitrate (in kbps) with `--mp3-bitrate` (default is 320).
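As a quick illustration of the output options (again a sketch with a placeholder file name):

```bash
demucs --float32 my_song.mp3                 # stems as 32-bit float wav
demucs --int24 my_song.mp3                   # stems as 24-bit integer wav
demucs --mp3 --mp3-bitrate 192 my_song.mp3   # stems as mp3 at 192 kbps
```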

The output may need clipping, in particular due to separation artifacts.
Demucs will automatically rescale each output stem so as to avoid clipping. This can however break
@@ -230,8 +232,8 @@ The list of pre-trained models is:
but quality can be slightly worse.
- `SIG`: where `SIG` is a single model from the [model zoo](docs/training.md#model-zoo).

-The `--two-stems=vocals` option allows to separate vocals from the rest (e.g. karaoke mode).
-`vocals` can be changed into any source in the selected model.
+The `--two-stems=vocals` option allows separating vocals from the rest of the accompaniment (i.e., karaoke mode).
+`vocals` can be changed to any source in the selected model.
This will mix the files after separating the mix fully, so this won't be faster or use less memory.
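For instance, to isolate drums instead of vocals (hypothetical, assuming the selected model includes a `drums` source):

```bash
demucs --two-stems=drums my_song.mp3   # expected to yield drums.wav and no_drums.wav
```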

The `--shifts=SHIFTS` option performs multiple predictions with random shifts of the input (a.k.a. the *shift trick*) and averages them. This makes prediction `SHIFTS` times
@@ -252,7 +254,7 @@ If you do not have enough memory on your GPU, simply add `-d cpu` to the command

## Calling from another Python program

-The main function provides a `opt` parameter as a simple API. You can just pass the parsed command line as this parameter:
+The main function provides an `opt` parameter as a simple API. You can just pass the parsed command line as this parameter:
```python
# Assume that your command is `demucs --mp3 --two-stems vocals -n mdx_extra "track with space.mp3"`
# The following code is the same as the command above:
import demucs.separate
# A minimal sketch of the equivalent call; the rest of this block is collapsed in the diff view.
demucs.separate.main(["--mp3", "--two-stems", "vocals", "-n", "mdx_extra", "track with space.mp3"])
```
2 changes: 1 addition & 1 deletion docs/api.md
@@ -47,7 +47,7 @@ for file, sources in separated:

## API References

-The types of each parameter and return value is not listed in this document. To know the exact type of them, please read the type hints in api.py (most modern code editors support infering types based on type hints).
+The types of each parameter and return value are not listed in this document. To learn their exact types, please read the type hints in api.py (most modern code editors support inferring types based on type hints).

### `class Separator`

12 changes: 6 additions & 6 deletions docs/mac.md
@@ -1,6 +1,6 @@
-# Mac OS X support for Demucs
+# macOS support for Demucs

-If you have a sufficiently recent version of OS X, you can just run
+If you have a sufficiently recent version of macOS, you can just run

```bash
python3 -m pip install --user -U demucs
```

@@ -10,10 +10,10 @@
```bash
python3 -m demucs -d cpu PATH_TO_AUDIO_FILE_1
demucs -d cpu PATH_TO_AUDIO_FILE_1
```

-If you do not already have Anaconda installed or much experience with the terminal on Mac OS X here are some detailed instructions:
+If you do not already have Anaconda installed or much experience with the terminal on macOS, here are some detailed instructions:

-1. Download [Anaconda 3.8 (or more recent) 64 bits for MacOS][anaconda]:
-2. Open [Anaconda Prompt in MacOSX][prompt]
+1. Download [Anaconda 3.8 (or more recent) 64-bit for macOS][anaconda]:
+2. Open [Anaconda Prompt in macOS][prompt]
3. Follow these commands:
```bash
conda activate
# ...
```

@@ -24,5 +24,5 @@ demucs -d cpu PATH_TO_AUDIO_FILE_1

**Important, torchaudio 0.12 update:** Torchaudio no longer supports decoding mp3s without ffmpeg installed. You must have ffmpeg installed, either through Anaconda (`conda install ffmpeg -c conda-forge`) or with Homebrew for instance (`brew install ffmpeg`).
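For reference, either route works (commands as given in the note above):

```bash
conda install ffmpeg -c conda-forge   # via Anaconda
# or:
brew install ffmpeg                   # via Homebrew
```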

-[anaconda]: https://www.anaconda.com/distribution/#download-section
+[anaconda]: https://www.anaconda.com/download
[prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-nav-mac
2 changes: 1 addition & 1 deletion docs/windows.md
Expand Up @@ -63,5 +63,5 @@ If you have an error saying that `mkl_intel_thread.dll` cannot be found, you can
**If you get a permission error**, please try starting the Anaconda Prompt as administrator.


-[install]: https://www.anaconda.com/distribution/#windows
+[install]: https://www.anaconda.com/download
[prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-prompt-win
