Skip to content

johnidm/kaldi-asr-architeture

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kaldi ASR Architeture Proposal

This repository contains a proposal for a speech recognition solution based on Kaldi ASR.

There are two main parts involved:

  • Model training: the file Dockerfile.train includes the steps for training a model.
  • Server API: the file Dockerfile.api sets up a REST API for consuming a trained model.

Benefits:

  • Training a new model is simple.
  • Consuming a trained model via a REST API is also simple.
  • All artifacts are produced using Docker files.

How to train a model:

docker build -f Dockerfile.train -t iara-train/latest .
docker run -v $PWD/model:/model --rm -it iara-train/latest

The training process may take a long time, about 12 hours.

How to consume a model:

docker build -f Dockerfile.api -t iara-api/latest .
docker run --rm -p 8000:8000 -it iara-api/latest

The transcription endpoint is avaliable at http://localhost:8000/transcribe.

Sample transcript request

curl -i -X POST \
   -H "Content-Type:application/json" \
   -d \
'{
  "url": "https://raw.githubusercontent.com/johnidm/kaldi-asr-architeture/master/audio.wav" 
}' \
 'http://127.0.0.1:8000/transcribe'

Sample transcript response

{
"elapsed_time": "0:00:24.651320",
"transcription": "esta é a da república federativa do brasil triangular las representantes da empresa brasileira é o em assembleia nacional constituinte eleita instituir o estado democrático ..."
}

Don't forget to visit the Kaldi ASR documentation.

The alphacep git repo contains several interesting projects involving Kaldi ASR.

The FalaBrasil scripts for Kaldi is a set of scripts to help creating a Kaldi ASR model for Brazilian Portuguese.

Feel free to make suggestions or contributions to the project.

I hope these ideias can be helpful!

About

Kaldi ASR Architeture Proposal

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published