Sylvain Chevalier edited this page Jun 1, 2014 · 3 revisions

This page summarizes frequent questions about pocketsphinx.js.

Why are there so many levels of wrapping?

With the new API based on embind, there are not that many levels of wrapping anymore. Here are the reasons why we have these different levels:

  1. The public C API of PocketSphinx is fairly large, and actions such as initialization and adding grammars require many steps that involve passing and receiving pointers. That is why we added another layer that makes it easy to initialize the recognizer and exchange data with it with only limited use of pointers.
  2. We used C++ to implement that layer above the original PocketSphinx API, which lets us use convenient C++ features such as containers and strings. Using embind, that layer is directly accessible from JavaScript.
  3. The JavaScript generated from PocketSphinx is fairly large and loading it directly in the HTML gives a quite unpleasant experience as it blocks the UI thread. That's why we have wrapped it inside a Web Worker.
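The worker wrapping described in point 3 can be sketched as follows. The file name recognizer.js and the message fields (command, data) follow the pocketsphinx.js examples, but treat the exact shapes as assumptions and check them against the version you ship:

```javascript
// A message in the shape the worker protocol expects (assumed from the
// pocketsphinx.js examples; verify against your distribution).
var initMessage = { command: "initialize" };

// Only spawn the worker in a browser context; Node.js lacks the Web Worker
// API. Running the Emscripten-generated code inside the worker keeps the
// large script off the UI thread, which is the point of this layer.
if (typeof Worker !== "undefined") {
  var recognizer = new Worker("recognizer.js");
  recognizer.onmessage = function (e) {
    // Command acknowledgements and recognition hypotheses arrive here.
    console.log(e.data);
  };
  recognizer.postMessage(initMessage);
}
```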

Recognition is not very accurate

This is a broad question. To start with, you should set reasonable expectations based on your experience with other open-source speech recognizers. A few areas you could look at to improve accuracy:

  • Make sure your grammar is able to catch what you or your users actually say.
  • If you have audio data from your expected users, you can try training your own acoustic model or adapting an existing one. You can also use one of the other available acoustic models; PocketSphinx ships with several.
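As an illustration of the first point, here is a small finite-state grammar of the kind passed to the recognizer's addGrammar command. The field names follow the pocketsphinx.js README, but verify them against your version; the words themselves are made up. Keeping the grammar tight around the phrases users actually say is usually the cheapest accuracy win:

```javascript
// A three-state grammar accepting "PLAY MUSIC" or "STOP MUSIC".
// States are numbered, and each transition consumes one word
// (field names assumed from the pocketsphinx.js documentation).
var grammar = {
  numStates: 3,
  start: 0,
  end: 2,
  transitions: [
    { from: 0, to: 1, word: "PLAY" },
    { from: 0, to: 1, word: "STOP" },
    { from: 1, to: 2, word: "MUSIC" }
  ]
};
```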

I would like to recognize a language other than English

Refer to the CMU Sphinx documentation; what you will need is:

  • A pronunciation dictionary for your language
  • An acoustic model
  • Grammars that use words from your dictionary.
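Once you have a dictionary for your language, entries are typically fed to the recognizer as word/pronunciation pairs, the format used by the addWords command in the pocketsphinx.js examples (assumed; check your version). The phone symbols must come from the acoustic model you load; the French-style entries below are purely illustrative:

```javascript
// Dictionary entries as [word, pronunciation] pairs. The phone set
// ("b", "o", "n", ...) must match the acoustic model in use; these
// transcriptions are illustrative, not from a real French model.
var words = [
  ["BONJOUR", "b o n zh u r"],
  ["MERCI", "m eh r s i"]
];

// In the browser, these would be posted to the recognizer worker, e.g.:
// recognizer.postMessage({ command: "addWords", data: words });
```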

You can find a lot of resources on the CMU Sphinx website, or on Voxforge for instance.