Identify speakers with stable voice timbre

English | 简体中文

Identify speakers with stable voice timbre

This script aims to identify speakers with stable voice timbre.

The evaluation is based on the Institute for Intelligent Computing's ERes2NetV2 speaker recognition model.

File Description

audio_generator.py: Script to generate random speakers and create test audio
consistency_evaluator.py: Script to evaluate stability
test_data.yaml: Text data for generating audio

Installation

Please ensure you have installed the ChatTTS project and it is running correctly. Installation steps

Clone this project and copy the speaker_consistency folder to the root directory of the ChatTTS project.

The directory should look like this:

├── ChatTTS
│         ├── __init__.py
│         ├── core.py
│         └── ...
└── speaker_consistency
          ├── audio_generator.py
          ├── consistency_evaluator.py
          ├── requirements.txt
          └── test_data.yaml

Install dependencies:

pip install -r ./speaker_consistency/requirements.txt

Usage

1. Generate Test Audio

Run the audio_generator.py script to generate test audio:

python speaker_consistency/audio_generator.py

2. Evaluate the Stability of the Generated Audio

Run the consistency_evaluator.py script to perform the evaluation:

python speaker_consistency/consistency_evaluator.py

The evaluation results will be saved in the evaluation_results.csv file.

3. Dataset

You can edit the test_data.yaml file to modify the test text.

Command Line Parameters

1. Generation Script (`audio_generator.py`)

python audio_generator.py --dir <output_directory> --num <number_of_speakers> --ds <dataset_yaml_file>

Parameter Description:

--dir: Directory to store the generated test audio files. Default is ./test_audio.
--num: Number of random speakers to generate. Default is 10.
--ds: Path to the dataset YAML file. Default is test_data.yaml.

2. Evaluation Script (`consistency_evaluator.py`)

python consistency_evaluator.py --dir <output_directory>

Parameter Description:

--dir: Directory containing the audio files generated in the previous step. Default is ./test_audio.

Rank Metric Calculation

The cosine similarity between the embedding vectors of each pair of audio segments is calculated to obtain the mean and standard deviation of the similarity. After normalizing the standard deviation, the rank metric assigns a weight of 70% to the mean and 30% to the standard deviation. Generally, the higher the rank metric, the better the consistency of the audio segments.

Sample Output

id	...	rank_TestA	rank_TestB
0000	...	0.802779	0.809263
0001	...	0.858448	0.773149
0002	...	0.763376	0.779981

Speakers with consistently high scores across all sample sets have relatively stable voice timbre. You can adjust the test set according to your specific use case.

How to Use Speaker Timbre

The speaker timbre is stored by default in the directory of the generated audio under the folder, with the filename speaker.pt.

You can load the speaker timbre with the following code:

spk = torch.load(<PT-FILE-PATH>, map_location=torch.device("cpu"))
params_infer_code = {
    'spk_emb': spk,
}

Limitations

The evaluation results may be limited by the quantity and diversity of the samples.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
speaker_consistency		speaker_consistency
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.zh.md		README.zh.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Identify speakers with stable voice timbre

File Description

Installation

Usage

1. Generate Test Audio

2. Evaluate the Stability of the Generated Audio

3. Dataset

Command Line Parameters

1. Generation Script (`audio_generator.py`)

2. Evaluation Script (`consistency_evaluator.py`)

Rank Metric Calculation

Sample Output

How to Use Speaker Timbre

Limitations

Related Projects

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

2noise/ChatEval

Folders and files

Latest commit

History

Repository files navigation

Identify speakers with stable voice timbre

File Description

Installation

Usage

1. Generate Test Audio

2. Evaluate the Stability of the Generated Audio

3. Dataset

Command Line Parameters

1. Generation Script (audio_generator.py)

2. Evaluation Script (consistency_evaluator.py)

Rank Metric Calculation

Sample Output

How to Use Speaker Timbre

Limitations

Related Projects

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

1. Generation Script (`audio_generator.py`)

2. Evaluation Script (`consistency_evaluator.py`)

Packages