Low VRAM Option
The Low VRAM option is designed to improve TTS performance when VRAM (Video RAM) is constrained. TTS models that run on CUDA can require 2-4GB of VRAM to run effectively, which can be hard to spare when running alongside other VRAM-intensive applications like Large Language Models (LLMs).
The Low VRAM mode intelligently manages the relocation of the entire Text-to-Speech (TTS) model between your system's Random Access Memory (RAM) and VRAM. Here's the process:
- When not in use, the TTS model is stored in system RAM.
- When the TTS engine requires processing, the entire model seamlessly moves into VRAM.
- This movement causes the LLM to unload/displace some layers, ensuring optimal performance of the TTS engine.
- After TTS processing, the model moves back to system RAM, freeing up VRAM space for the LLM to reload its displaced layers.
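The steps above can be sketched as a small context manager. This is a minimal illustration, not the project's actual implementation: `FakeTTSModel`, `synthesize`, and `in_vram` are hypothetical names, and the stand-in class only records which device it is on. With a real PyTorch model, the same `.to("cuda")` / `.to("cpu")` calls exist on `torch.nn.Module`, and `torch.cuda.empty_cache()` is commonly called after moving back so the freed memory is released to other processes.

```python
from contextlib import contextmanager


class FakeTTSModel:
    """Hypothetical stand-in for a real model; it only tracks its device."""

    def __init__(self):
        self.device = "cpu"   # parked in system RAM by default
        self.moves = []       # history of device moves, for illustration

    def to(self, device):
        # Mirrors the shape of torch.nn.Module.to(): move and return self.
        self.device = device
        self.moves.append(device)
        return self

    def synthesize(self, text):
        # Refuse to run unless the whole model is on the GPU.
        if self.device != "cuda":
            raise RuntimeError("model must be in VRAM to generate")
        return f"<audio for {text!r}>"


@contextmanager
def in_vram(model, gpu="cuda", parked="cpu"):
    """Move the whole model to the GPU for one job, then park it in RAM."""
    model.to(gpu)
    try:
        yield model
    finally:
        # Always move back, even if generation fails, so VRAM is freed
        # for the LLM to reload its displaced layers.
        model.to(parked)


model = FakeTTSModel()
with in_vram(model) as m:
    audio = m.synthesize("hello")
```

The `try`/`finally` is the main design choice here: the model returns to system RAM even when generation raises, so a failed TTS request does not leave VRAM occupied.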
This process adds about 1-2 seconds to both LLM text generation and TTS generation, but it provides significant benefits in constrained VRAM environments:
- Prevents Fragmentation: By transferring the entire model between RAM and VRAM, the Low VRAM option avoids model fragmentation, ensuring the TTS model remains cohesive.
- Optimizes VRAM Usage: Ensures the TTS model has all the working space it needs in your GPU without having to work on small bits of the TTS model at a time.
- Performance Boost: Particularly beneficial for users with less than 2GB of free VRAM after loading their LLM, delivering a substantial 5-10x improvement in TTS generation speed by avoiding fragmentation and repeated moves of the TTS model's layers.
The Low VRAM option is most useful when:
- You have a smaller graphics card with limited VRAM.
- Your LLM has filled most of your available VRAM.
- You want to maintain performance while running both TTS and LLM models.
If you have ample free VRAM after loading your LLM, there's no benefit to using the Low VRAM option. In such cases, keeping the TTS model in VRAM will provide the best performance.
If you are using a TTS model like XTTS on a card with only 2GB of VRAM, and that model is the only thing you have loaded, you will probably find no benefit from Low VRAM mode, provided the XTTS model fits in VRAM at all.
An Nvidia graphics card with CUDA is required for the Low VRAM option to work effectively. Without an Nvidia GPU, the system will default to using system RAM for TTS processing.
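The guidance above can be condensed into a rough decision rule. This is only a sketch: `should_enable_low_vram` is a hypothetical helper, not a setting or function from the project, and the sizes passed in are illustrative. With PyTorch installed, `torch.cuda.is_available()` and `torch.cuda.mem_get_info()` (which returns free and total bytes) could supply the inputs.

```python
def should_enable_low_vram(has_cuda, free_vram_gb, tts_model_gb):
    """Rough rule of thumb; all three inputs are supplied by the caller.

    has_cuda      -- True if an Nvidia GPU with CUDA is present
    free_vram_gb  -- VRAM still free after the LLM (if any) is loaded
    tts_model_gb  -- approximate VRAM the TTS model needs (e.g. 2-4)
    """
    if not has_cuda:
        # No Nvidia GPU: TTS runs from system RAM, so there is nothing to swap.
        return False
    # Enable only when the whole TTS model cannot sit in VRAM next to the LLM.
    return free_vram_gb < tts_model_gb


# An LLM has left 1.5GB free and the TTS model needs about 3GB.
print(should_enable_low_vram(True, 1.5, 3.0))
# Plenty of VRAM left: keep the TTS model resident instead.
print(should_enable_low_vram(True, 8.0, 3.0))
```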
Without the Low VRAM option, when VRAM is constrained, the GPU has to constantly swap in and out bits of the TTS model, causing significant slowdowns. The Low VRAM option mitigates this issue by ensuring the TTS model has the necessary working space when needed.