AllTalk V2 QuickStart Guide

erew123 edited this page Nov 25, 2024 · 35 revisions

For more in-depth information, check out the AllTalk Wiki.


1. Starting AllTalk

To start using AllTalk, you’ll use various start_xxx files:

  • start_alltalk - Launches the main AllTalk TTS application.
  • start_environment - Activates the AllTalk Python environment.
  • start_finetune - Starts the finetuning process.
  • start_diagnostics - Generates a diagnostics file for troubleshooting.

After running start_alltalk, the application will display key details in the console, including update status, API address, and Gradio links.


2. Console/Terminal Information & API

Key Console/Terminal Details:

  • GitHub Updated: Shows the last update date. Pull new changes by running git pull.
  • API Address: 127.0.0.1:7851 - the API endpoint for TTS calls and a basic web interface.
  • Gradio Interface:

Tip: Open the Gradio links by CTRL+Left Click to access the main UI.
Tip: Errors and issues are displayed in the terminal/console. Common ones are listed on the Known Errors page.
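
If the Gradio links or the API do not respond, it can help to confirm that something is actually listening on the API address first. The following is a minimal sketch that only tests TCP reachability of 127.0.0.1:7851, the address given above; it does not call any AllTalk endpoint, and the function name `is_port_open` is a hypothetical helper, not part of AllTalk.

```python
import socket

# Address and port as shown in the console output described above.
ALLTALK_HOST = "127.0.0.1"
ALLTALK_PORT = 7851


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open(ALLTALK_HOST, ALLTALK_PORT):
        print(f"Something is listening at http://{ALLTALK_HOST}:{ALLTALK_PORT}")
    else:
        print("Nothing is listening yet; check the console for errors.")
```

A successful connection only shows that the port is open; any errors from AllTalk itself still appear in the console.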


3. Using the Gradio Interface

Generate TTS Tab

This is where you control TTS engine selection and model loading:

  1. Change TTS Engine: Go to the "Generate TTS" > "Generate" tab.
  2. Swap Engine: Use the "Swap TTS Engine" button to select different TTS engines (e.g., XTTS, Piper, VITS).
  3. Load Model: Click "Load Different Model" to change the model for the chosen engine.
  4. Help: The "Generate Help" tab provides detailed explanations for this section.
  5. General: In the "Generate TTS" tab you can generate text-to-speech with all the settings available for the loaded TTS engine. Some TTS engines don't support every feature, so some settings may be greyed out or unavailable.

Note: Loading an engine without first downloading a compatible model/voice will result in an error at the console.

Tip: This tab controls which TTS engine, and which of its models, is loaded, but you must first download at least one model for each TTS engine you want to use. To do this, open the TTS Engine Settings tab and select the engine you want to work with. There you can review how the engine works, adjust engine-specific settings, and download its known model files.
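
To avoid the console error mentioned in the note, you could check whether an engine's model folder contains anything before loading it. The sketch below assumes models live under models/<engine>/ as shown in the Folder Structure section (for example models/xtts/); the helper `engine_has_models` and the install path "alltalk_tts" are hypothetical, and AllTalk's own checks may work differently.

```python
from pathlib import Path


def engine_has_models(alltalk_root: str, engine: str) -> bool:
    """Return True if models/<engine>/ exists and holds at least one entry."""
    engine_dir = Path(alltalk_root) / "models" / engine
    return engine_dir.is_dir() and any(engine_dir.iterdir())


if __name__ == "__main__":
    # "alltalk_tts" is an assumed install location; adjust it to your setup.
    for name in ("xtts", "f5tts"):
        found = engine_has_models("alltalk_tts", name)
        print(f"{name}: {'files found' if found else 'nothing downloaded'}")
```

This only detects an empty or missing folder; it cannot tell whether the files inside are a complete, compatible model.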


4. Global Settings

Adjust global settings here. Examples include:

  • Enable/Disable audio transcoding to different file formats, e.g., MP3.
  • Delete Old WAVs: Automatically delete output TTS files older than a specified period on start-up.
  • Adjust API settings: Globally change the behaviour of how AllTalk responds to TTS generation requests.
  • RVC Pipeline: If you need this feature, enable it in the "RVC Settings" tab. AllTalk will then download the required models and set up the folders.
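
To illustrate what the "Delete Old WAVs" setting does conceptually, here is a minimal sketch that removes .wav files older than a given number of days from a folder. It is not AllTalk's implementation; the function name `delete_old_wavs` and the use of file modification time as the age are assumptions made for the example.

```python
import time
from pathlib import Path


def delete_old_wavs(output_dir: str, max_age_days: float) -> list[str]:
    """Delete .wav files in output_dir older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for wav in Path(output_dir).glob("*.wav"):
        # Age is judged by last-modified time (an assumption of this sketch).
        if wav.is_file() and wav.stat().st_mtime < cutoff:
            wav.unlink()
            removed.append(wav.name)
    return sorted(removed)
```

For example, `delete_old_wavs("alltalk_tts/outputs", 7)` would remove WAV files last modified more than a week ago, leaving other file types untouched. The path is an assumed install location.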


5. TTS Engine Settings

This section lets you configure each TTS engine individually:

  1. Engine Information: Detailed descriptions of each engine (F5-TTS, Piper, XTTS, Parler, etc.) and links to developer sites.
  2. Models/Voices Download: Download models or voices specific to each TTS engine.
  3. Default Settings: Set default parameters, including temperature, pitch, and repetition.
  4. Engine Help: Instructions on using each engine, managing models, and troubleshooting.

Model        DeepSpeed  Pitch  Speed  RepPen  MultiLang  Streaming  Low VRAM  Temp  Multi Model  Notes
F5-TTS       No         No     Yes    No      *Yes       No         Yes       No    Yes          *
Parler-TTS   No         No     No     No      No         No         Yes       No    Yes          **
Piper        No         No     Yes    No      *No        No         No        No    Yes          ***
Coqui VITS   No         No     No     No      *No        No         Yes       No    Yes          ***
Coqui XTTS   Yes        No     Yes    Yes     Yes        Yes        Yes       Yes   Yes          ****

Notes

  • F5-TTS (*): Supports only Chinese and English voice cloning.
  • Parler-TTS (**): Likely English TTS generation only.
  • Piper and Coqui VITS (***): Language support depends on the model file loaded.
  • Coqui XTTS (****): Multi-language and voice cloning capability.
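
When choosing an engine for a particular feature, the table above can be treated as a simple lookup. The sketch below copies the Yes/No values from the table into a Python dictionary and lists the engines that support a given feature. The names `FEATURES`, `SUPPORT` and `engines_supporting` are illustrative only; entries marked with an asterisk in the table still carry the caveats in the Notes.

```python
# Feature support copied from the table above (True = "Yes", False = "No").
# "*Yes" / "*No" cells are stored as plain booleans; see the Notes for caveats.
FEATURES = ("DeepSpeed", "Pitch", "Speed", "RepPen", "MultiLang",
            "Streaming", "Low VRAM", "Temp", "Multi Model")

SUPPORT = {
    "F5-TTS":     (False, False, True,  False, True,  False, True,  False, True),
    "Parler-TTS": (False, False, False, False, False, False, True,  False, True),
    "Piper":      (False, False, True,  False, False, False, False, False, True),
    "Coqui VITS": (False, False, False, False, False, False, True,  False, True),
    "Coqui XTTS": (True,  False, True,  True,  True,  True,  True,  True,  True),
}


def engines_supporting(feature: str) -> list[str]:
    """Return the engines whose table entry for `feature` is Yes."""
    idx = FEATURES.index(feature)  # raises ValueError for unknown features
    return [name for name, flags in SUPPORT.items() if flags[idx]]


if __name__ == "__main__":
    print(engines_supporting("Speed"))      # ['F5-TTS', 'Piper', 'Coqui XTTS']
    print(engines_supporting("Streaming"))  # ['Coqui XTTS']
```
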

6. Folder Structure

AllTalk organizes files in the following structure:

alltalk_tts/
    ├── .GitHub/                 # Git's version management tracking
    ├── alltalk_environment/     # Python packages for AllTalk
    ├── finetune/                # XTTS finetuning dataset files
    ├── models/                  # Engine model files are stored here
    │   ├── f5tts/               # F5-TTS's model files/folders
    │   ├── xtts/                # XTTS's model files/folders
    │   └── etc.../
    ├── system/                  # System components and config
    │   ├── espeak-ng/           # Windows installer for espeak-ng
    │   ├── gradio_pages/        # Gradio interface pages
    │   ├── requirements/        # Requirement files
    │   ├── TGWUI Extension/     # TGWUI remote extension
    │   └── tts_engines/         # TTS engines setup scripts
    │       ├── tts_engines.json # TTS engine configuration file
    │       ├── new_engines.json # New TTS engine configuration file
    │       ├── f5tts/
    │       ├── parler/
    │       ├── piper/
    │       ├── rvc/
    │       ├── template-tts-engine/
    │       ├── vits/
    │       └── xtts/
    ├── voices/                  # WAV samples for voice cloning
    ├── outputs/                 # TTS output files
    ├── confignew.json           # Main configuration file
    ├── atsetup.bat              # Windows setup file
    ├── atsetup.sh               # Linux setup file
    ├── Other Files...
    ├── script.py                # Main start-up script
    └── tts_server.py            # Engine management script
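
As a quick sanity check of an install, the layout above can be compared against what is actually on disk. This is a minimal sketch that looks for a handful of entries taken from the tree; the helper `missing_entries` and the path "alltalk_tts" are assumptions, and some folders (such as outputs/) may legitimately not exist until AllTalk has been run.

```python
from pathlib import Path

# A subset of the entries shown in the folder tree above.
EXPECTED = ("models", "system", "voices", "outputs",
            "confignew.json", "script.py", "tts_server.py")


def missing_entries(alltalk_root: str) -> list[str]:
    """Return the expected files/folders that are absent under alltalk_root."""
    root = Path(alltalk_root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # "alltalk_tts" is an assumed install location; adjust it to your setup.
    missing = missing_entries("alltalk_tts")
    print("All expected entries found." if not missing else f"Missing: {missing}")
```
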


7. Additional Information

  • Auto-Delete WAVs: Set in Global Settings; controls automatic deletion of old output files.

For detailed help refer to the relevant tabs in the Gradio interface or the AllTalk Wiki.

