AllTalk V2 QuickStart Guide
For more in-depth information, check out the AllTalk Wiki.
To start using AllTalk, you'll use various start_xxx files:
- start_alltalk: Launches the main AllTalk TTS application.
- start_environment: Activates the AllTalk Python environment.
- start_finetune: Starts the finetuning process.
- start_diagnostics: Generates a diagnostics file for troubleshooting.
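As a rough sketch of how these launchers could be picked from a script, the helper below builds a launcher file name for the current platform. The `.bat` and `.sh` extensions are an assumption (inferred from the `atsetup.bat` and `atsetup.sh` files in the AllTalk folder), and `launcher_name` is a hypothetical helper, not part of AllTalk.

```python
import platform

# The four launchers named in this guide.
LAUNCHERS = (
    "start_alltalk",
    "start_environment",
    "start_finetune",
    "start_diagnostics",
)


def launcher_name(base, system=None):
    """Return the assumed launcher file name for a platform.

    Assumption: Windows launchers end in .bat, everything else uses .sh.
    """
    if base not in LAUNCHERS:
        raise ValueError(f"unknown launcher: {base}")
    system = system or platform.system()
    return f"{base}.bat" if system == "Windows" else f"{base}.sh"


if __name__ == "__main__":
    for base in LAUNCHERS:
        print(launcher_name(base))
```

Check the actual file names in your AllTalk folder before relying on this.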
After running start_alltalk, the application displays key details in the console, including update status, API address, and Gradio links.
- GitHub Updated: Shows the last update date. Pull new changes by running git pull.
- API Address: 127.0.0.1:7851 (the API endpoint for TTS calls and a basic web interface).
- Gradio Interface:
  - Light Mode: 127.0.0.1:7852
  - Dark Mode: 127.0.0.1:7852?__theme=dark
Tip: Open the Gradio links with CTRL+Left Click to access the main UI.
Tip: Errors and issues are displayed in the terminal/console. Known problems are listed on the Known Errors page.
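To confirm that AllTalk has finished starting, you can check whether anything is listening on the two addresses above. The following is a minimal sketch using only the Python standard library. The `http://` scheme and the helper names `gradio_url` and `is_listening` are assumptions for illustration. The check only tests that a TCP connection is accepted and makes no API calls.

```python
import socket

API_ADDRESS = ("127.0.0.1", 7851)     # API endpoint and basic web interface
GRADIO_ADDRESS = ("127.0.0.1", 7852)  # main Gradio UI


def gradio_url(host, port, dark=False):
    """Build the Gradio link; dark mode appends the ?__theme=dark query."""
    url = f"http://{host}:{port}"  # assumption: plain HTTP on localhost
    return f"{url}?__theme=dark" if dark else url


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for label, (host, port) in (("API", API_ADDRESS), ("Gradio", GRADIO_ADDRESS)):
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{label} {host}:{port} is {state}")
    print("Light:", gradio_url(*GRADIO_ADDRESS))
    print("Dark: ", gradio_url(*GRADIO_ADDRESS, dark=True))
```

If a port is reported as not reachable, check the console output first, since errors are shown there.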
The "Generate TTS" tab is where you control TTS engine selection and model loading:
- Change TTS Engine: Go to "Generate TTS" > "Generate" tab.
- Swap Engine: Use the "Swap TTS Engine" button to select different TTS engines (e.g., XTTS, Piper, VITS).
- Load Model: Click "Load Different Model" to change the model for the chosen engine.
- Help: The "Generate Help" tab provides detailed explanations for this section.
- General: In the "Generate TTS" tab you can generate text-to-speech with all the settings available for the loaded TTS engine. Some TTS engines don't support every feature, so some settings may be greyed out or unavailable.
Note: Loading an engine without downloading a compatible model/voice will result in an error at the console.
Tip: This interface controls which TTS engine, and which of its models, is loaded, but you first need to download models for each TTS engine. To do this, open the "TTS Engine Settings" tab and select the TTS engine you want to work with. There you can review the details of each engine, see how it works, adjust engine-specific settings, and download its known model files.
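Before loading an engine, it can help to check that something has actually been downloaded for it. The sketch below assumes the `models/<engine>/` layout shown in the folder structure later in this guide. `has_model_files` is a hypothetical helper, and the check is only a heuristic: it confirms that the folder is non-empty, not that the files form a valid model.

```python
from pathlib import Path


def has_model_files(alltalk_root, engine):
    """Return True if models/<engine>/ exists and contains at least one entry."""
    engine_dir = Path(alltalk_root) / "models" / engine
    return engine_dir.is_dir() and any(engine_dir.iterdir())


if __name__ == "__main__":
    root = Path("alltalk_tts")  # assumed install folder; adjust to your setup
    for engine in ("f5tts", "xtts"):
        status = "has files" if has_model_files(root, engine) else "nothing downloaded yet"
        print(f"{engine}: {status}")
```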
Global Settings is where you adjust settings that apply across AllTalk. Examples include:
- Enable/Disable audio transcoding to different file formats, e.g., MP3.
- Delete Old WAVs: Automatically delete output TTS files older than a specified period on start-up.
- Adjust API settings: Globally change the behaviour of how AllTalk responds to TTS generation requests.
- RVC Pipeline: Enable it in the "RVC Settings" tab if you need this feature; AllTalk will then download the required models and set up the folders.
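To illustrate what the "Delete Old WAVs" setting does conceptually, here is a minimal sketch of age-based cleanup. It is not AllTalk's actual implementation; the function names are hypothetical, and it assumes age is measured by file modification time in an `outputs/`-style folder.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_wavs(output_dir, max_age_days, now=None):
    """List .wav files in output_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(output_dir).glob("*.wav")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_old_wavs(output_dir, max_age_days, now=None):
    """Delete the files found by find_old_wavs and return how many were removed."""
    old = find_old_wavs(output_dir, max_age_days, now)
    for path in old:
        path.unlink()
    return len(old)
```

Calling `find_old_wavs` first gives a dry run, which is a sensible step before deleting anything.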
The "TTS Engine Settings" tab lets you configure each TTS engine individually:
- Engine Information: Detailed descriptions of each engine (F5-TTS, Piper, XTTS, Parler, etc.) and links to developer sites.
- Models/Voices Download: Download models or voices specific to each TTS engine.
- Default Settings: Set default parameters, including temperature, pitch, and repetition.
- Engine Help: Instructions on using each engine, managing models, and troubleshooting.
Model | DeepSpeed | Pitch | Speed | RepPen | MultiLang | Streaming | Low VRAM | Temp | Multi Model | Notes |
---|---|---|---|---|---|---|---|---|---|---|
F5-TTS | No | No | Yes | No | *Yes | No | Yes | No | Yes | * |
Parler-TTS | No | No | No | No | No | No | Yes | No | Yes | ** |
Piper | No | No | Yes | No | *No | No | No | No | Yes | *** |
Coqui VITS | No | No | No | No | *No | No | Yes | No | Yes | *** |
Coqui XTTS | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | **** |
Notes for the table above:
- F5-TTS (*): Voice cloning is supported in Chinese and English only.
- Parler-TTS (**): Likely supports English TTS generation only.
- Piper and Coqui VITS (***): Language support depends on the model file loaded.
- Coqui XTTS (****): Supports multiple languages and voice cloning.
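If you want to pick an engine by capability, the table above can be expressed as a small lookup. This is only a sketch of the table's contents: the field names are made up for illustration, and the starred entries (`*Yes`, `*No`) are recorded as plain Yes/No, so the notes still apply.

```python
# Columns follow the table above; True means "Yes", False means "No".
FEATURES = ("deepspeed", "pitch", "speed", "rep_pen", "multi_lang",
            "streaming", "low_vram", "temp", "multi_model")

ENGINES = {
    "F5-TTS":     (False, False, True,  False, True,  False, True,  False, True),
    "Parler-TTS": (False, False, False, False, False, False, True,  False, True),
    "Piper":      (False, False, True,  False, False, False, False, False, True),
    "Coqui VITS": (False, False, False, False, False, False, True,  False, True),
    "Coqui XTTS": (True,  False, True,  True,  True,  True,  True,  True,  True),
}


def supports(engine, feature):
    """Return True if the table lists the feature as supported by the engine."""
    return ENGINES[engine][FEATURES.index(feature)]


def engines_with(*features):
    """Return engines (in table order) that support every requested feature."""
    return [name for name in ENGINES
            if all(supports(name, f) for f in features)]


if __name__ == "__main__":
    print("Streaming:", engines_with("streaming"))
    print("Speed control:", engines_with("speed"))
```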
AllTalk organizes files in the following structure:
alltalk_tts/
├── .GitHub/                 # Git's version management tracking
├── alltalk_environment/     # Python packages for AllTalk
├── finetune/                # XTTS finetuning dataset files
├── models/                  # Engine model files are stored in here
│   ├── f5tts/               # F5-TTS's model files/folders
│   ├── xtts/                # XTTS's model files/folders
│   └── etc.../
├── system/                  # System components and config
│   ├── espeak-ng/           # Windows installer for espeak-ng
│   ├── gradio_pages/        # Gradio interface pages
│   ├── requirements/        # Requirement files
│   ├── TGWUI Extension/     # TGWUI remote extension
│   └── tts_engines/         # TTS engines setup scripts
│       ├── tts_engines.json # TTS engine configuration file
│       ├── new_engines.json # New TTS engine configuration file
│       ├── f5tts/
│       ├── parler/
│       ├── piper/
│       ├── rvc/
│       ├── template-tts-engine/
│       ├── vits/
│       └── xtts/
├── voices/                  # WAV samples for voice cloning
├── outputs/                 # TTS output files
├── confignew.json           # Main configuration file
├── atsetup.bat              # Windows setup file
├── atsetup.sh               # Linux setup file
├── Other Files...
├── script.py                # Main start-up script
└── tts_server.py            # Engine management script
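When troubleshooting an installation, a quick check of the layout above can be useful. The sketch below tests for a subset of the entries in the tree. `missing_entries` is a hypothetical helper, and a missing entry is only a hint: some folders may be created on first use.

```python
from pathlib import Path

# A subset of the entries shown in the folder structure above.
EXPECTED_DIRS = ("models", "system", "system/tts_engines", "voices", "outputs")
EXPECTED_FILES = ("confignew.json", "script.py", "tts_server.py")


def missing_entries(alltalk_root):
    """Return expected folders (with a trailing slash) and files that are absent."""
    root = Path(alltalk_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries("alltalk_tts")  # assumed install folder
    print("Layout looks complete" if not problems else f"Missing: {problems}")
```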
- Auto-Delete WAVs: Set in Global Settings; controls automatic deletion of old output files.
For detailed help, refer to the relevant tabs in the Gradio interface or the AllTalk Wiki.