Language Models for German ASR

This repository contains scripts to train German language models. Before using these language models, check the licenses on the sites of the respective datasets.

Data Sources

Name	Num. Sentences	Site
EuroParl	1920208	http://www.statmt.org/europarl/
News-Commentary	383764	https://www.statmt.org/wmt13/translation-task.html
Tuda-Text	7776674	https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html

Reproduce

To reproduce the language models, first prepare the data. The required commands are in prepare_data.sh.

kenlm

N-gram language models are trained with KenLM (see run_kenlm.sh).
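
As a rough illustration of what n-gram training with KenLM typically involves, the sketch below wraps the two standard KenLM tools, lmplz (estimation) and build_binary (conversion to the binary format). It is not taken from run_kenlm.sh: the function name build_lm, the DRY_RUN switch, the n-gram order and all paths are hypothetical, and the actual script may use different options.

```shell
#!/bin/sh
# Sketch only: the options actually used live in run_kenlm.sh and may differ.

# build_lm ORDER CORPUS OUT_PREFIX   (hypothetical helper, not part of the repository)
# With DRY_RUN=1 (the default here) the commands are only printed;
# with DRY_RUN=0 they are executed and require KenLM to be installed.
build_lm() {
    order="$1"
    corpus="$2"
    prefix="$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "lmplz -o ${order} < ${corpus} > ${prefix}.arpa"
        echo "build_binary ${prefix}.arpa ${prefix}.bin"
    else
        # Estimate an n-gram model and write it in ARPA format.
        lmplz -o "${order}" < "${corpus}" > "${prefix}.arpa"
        # Convert the ARPA file to KenLM's binary format for faster loading.
        build_binary "${prefix}.arpa" "${prefix}.bin"
    fi
}

# Example call with assumed paths: one sentence per line in the corpus file.
build_lm 3 data/sentences.txt lm/german_3gram
```

Setting DRY_RUN=0 runs the commands instead of printing them. The corpus is assumed to be a plain-text file with one normalized sentence per line.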

Trained Models

Trained models are available as attachments to the repository's releases.
