This repo supports one speech dataset out of the box (see the preprocessors in `preprocess.py`). You can use any other dataset if you write a preprocessor for it.
Each training example consists of:
- The text that was spoken
- A mel-scale spectrogram of the audio
- A linear-scale spectrogram of the audio
The preprocessor is responsible for generating these; see `nawar.py` for a commented example.
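To make the shape of a training example concrete, here is what one might look like on disk after preprocessing. The filenames and array dimensions below are illustrative, not something the repo fixes:

```python
import numpy as np

# One training example after preprocessing (names and shapes are examples only).
text = 'an example transcript'
mel = np.load('mydataset-mel-00001.npy')      # time-major, e.g. (n_frames, 80)
linear = np.load('mydataset-spec-00001.npy')  # time-major, e.g. (n_frames, 1025)
```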
For each training example, a preprocessor should:
- Load the audio file:

  ```python
  wav = audio.load_wav(wav_path)
  ```
- Compute linear-scale and mel-scale spectrograms (float32 numpy arrays):

  ```python
  spectrogram = audio.spectrogram(wav).astype(np.float32)
  mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
  ```
- Save the spectrograms to disk:

  ```python
  np.save(os.path.join(out_dir, spectrogram_filename), spectrogram.T, allow_pickle=False)
  np.save(os.path.join(out_dir, mel_spectrogram_filename), mel_spectrogram.T, allow_pickle=False)
  ```

  Note that the transpose of the matrix returned by `audio.spectrogram` is saved so that it's in time-major format.
- Generate a tuple `(spectrogram_filename, mel_spectrogram_filename, n_frames, text)` to write to train.txt. `n_frames` is just the length of the time axis of the spectrogram.
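Putting these steps together, a minimal preprocessor might look like the sketch below. It is single-process and assumes a pipe-delimited metadata.csv mapping wav filenames to transcripts; the metadata format, directory layout, naming scheme, and the `from util import audio` import path are assumptions to adapt to your dataset and this repo's actual structure.

```python
import os

import numpy as np

from util import audio  # assumed location of the repo's audio helpers


def build_from_path(in_dir, out_dir):
  '''Minimal single-process sketch. Assumes a pipe-delimited metadata.csv of
  "wav_name|text" lines; adapt the parsing to your dataset's layout.'''
  metadata = []
  with open(os.path.join(in_dir, 'metadata.csv'), encoding='utf-8') as f:
    for index, line in enumerate(f, start=1):
      wav_name, text = line.strip().split('|')
      wav_path = os.path.join(in_dir, 'wavs', wav_name + '.wav')

      # 1. Load the audio file.
      wav = audio.load_wav(wav_path)

      # 2. Compute linear-scale and mel-scale spectrograms.
      spectrogram = audio.spectrogram(wav).astype(np.float32)
      mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
      n_frames = spectrogram.shape[1]  # length of the time axis

      # 3. Save both to disk, transposed into time-major format.
      spectrogram_filename = 'mydataset-spec-%05d.npy' % index
      mel_filename = 'mydataset-mel-%05d.npy' % index
      np.save(os.path.join(out_dir, spectrogram_filename), spectrogram.T, allow_pickle=False)
      np.save(os.path.join(out_dir, mel_filename), mel_spectrogram.T, allow_pickle=False)

      # 4. Collect the tuple that becomes one line of train.txt.
      metadata.append((spectrogram_filename, mel_filename, n_frames, text))
  return metadata
```

See `nawar.py` for a fully worked, commented version of this flow.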
After you've written your preprocessor, you can add it to `preprocess.py` by following the example of the other preprocessors in that file.
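The exact wiring depends on what `preprocess.py` currently contains, but mirroring the existing preprocessors it will look roughly like this sketch; `mydataset`, the directory names, and the `write_metadata` helper are placeholders for whatever the file actually uses:

```python
# In preprocess.py (sketch; mirror the existing preprocessors in the file).
import os

from datasets import mydataset  # placeholder: your new preprocessor module


def preprocess_mydataset(args):
  in_dir = os.path.join(args.base_dir, 'MyDataset')  # wherever your data lives
  out_dir = os.path.join(args.base_dir, args.output)
  os.makedirs(out_dir, exist_ok=True)
  metadata = mydataset.build_from_path(in_dir, out_dir)
  write_metadata(metadata, out_dir)  # placeholder for whatever writes train.txt
```

You would then register the new dataset name wherever `preprocess.py` selects a preprocessor by dataset name.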