- Installation 📀
- Constructor 🦺
- Props 📦
- Methods 🖇
- Browser Support 🔮
- Contributing 🏗
- Getting Help ☎️
- Changelog 💾
- Further Reading 📚
Install the latest version
npm i @vatis-tech/asr-client-js
This will install the latest version of @vatis-tech/asr-client-js and record it in the package.json file with the caret (^) symbol on its version.
This means that a later install in your project will pick up the latest minor version.
You can read more about this here: npm caret and tilde.
Install the exact latest version.
npm i -E @vatis-tech/asr-client-js
This will install the latest version of @vatis-tech/asr-client-js without the caret (^).
This means that each new install will keep the version that was installed initially.
You can read more about this here: npm install --save-exact.
You can also use this plugin via CDN, inside an HTML & JavaScript project that runs in browsers. Just copy and paste the following script into your project:
<script src="https://unpkg.com/@vatis-tech/asr-client-js@1.2.1/umd/vatis-tech-asr-client.umd.js" charset="utf-8"></script>
You can also download it and use it locally instead of through a CDN. You can download it by pressing the following link: download here. Or, download it from GitHub here. After that, copy and paste the following script into your app:
<script src="%path%/asr-client-js/dist/umd/vatis-tech-asr-client.umd.js" charset="utf-8"></script>
And replace %path% with the path where you've downloaded and unzipped the plugin.
First you need to import the plugin:
import VTC from "@vatis-tech/asr-client-js";
After that, you can initialize it like so:
const vtc = new VTC({
service: "LIVE_ASR",
language: "ro_RO",
apiKey: "YOUR_API_KEY",
onData: (data) => { console.log(data); },
log: true,
});
If you opted to use it as a download or via CDN (i.e. via a script tag inside a static HTML & JavaScript project), you can use the constructor as follows:
const vtc = new VatisTechClient.default({
service: "LIVE_ASR",
language: "ro_RO",
apiKey: "YOUR_API_KEY",
onData: (data) => { console.log(data); },
log: true,
});
This is an Object with the following structure:
{
"spokenCommandsList": [
{
"command": "COMMAND_NAME",
"regex": [ "regex1", "regex2", "regex3", ... ]
},
...
],
"findReplaceList": [
{
"replacement": "REPLACEMENT",
"regex": [ "regex1", "regex2", "regex3", ... ]
}
]
}
Where the value of spokenCommandsList is an array of objects that have two properties, command and regex.
The value of command, i.e. COMMAND_NAME, is a String.
The value of regex, i.e. [ "regex1", "regex2", "regex3", ... ], is an Array of Strings, i.e. regex1, regex2, regex3 are Strings.
The idea behind spokenCommandsList is that each time one of the values from the regex array is matched in the transcript, the client fires the onCommandData callback, with a special header on the data, named SpokenCommand.
The value of the SpokenCommand header is exactly the value of command, i.e. COMMAND_NAME.
For example, you can use spokenCommandsList to define rules for when you want a new paragraph:
{
"spokenCommandsList": [
{
"command": "NEW_LINE",
"regex": ["new line", "new paragraph", "from the start", "start new line"]
}
]
}
So each time the back-end algorithm finds one of the phrases "new line", "new paragraph", "from the start", or "start new line" in the transcript, the VTC client fires the onCommandData callback. This way, your application knows when to start a new paragraph.
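As a rough illustration of how an application might react to this command, the sketch below keeps a list of paragraphs and opens a new one when a NEW_LINE command arrives. The helper names (createParagraphBuffer, appendTranscript, handleCommand) are hypothetical and not part of the plugin. The command is read from the SpokenCommand header, with a fallback to a spokenCommand property, since both spellings appear in this README.

```javascript
// Hypothetical helper, not part of @vatis-tech/asr-client-js.
function createParagraphBuffer() {
  const paragraphs = [""];

  // Assumption: the command name is either in data.headers.SpokenCommand
  // or in data.spokenCommand.
  function readCommand(data) {
    const headers = (data && data.headers) || {};
    return headers.SpokenCommand || (data && data.spokenCommand) || null;
  }

  return {
    // Intended for finalized transcript chunks (e.g. from onFinalData).
    appendTranscript(data) {
      const text = ((data && data.transcript) || "").trim();
      if (!text) return;
      const last = paragraphs.length - 1;
      paragraphs[last] = paragraphs[last] ? paragraphs[last] + " " + text : text;
    },
    // Intended for onCommandData: opens a new paragraph on NEW_LINE,
    // unless the current paragraph is still empty.
    handleCommand(data) {
      const current = paragraphs[paragraphs.length - 1];
      if (readCommand(data) === "NEW_LINE" && current !== "") {
        paragraphs.push("");
      }
    },
    // Returns only the non-empty paragraphs collected so far.
    getParagraphs() {
      return paragraphs.filter((p) => p !== "");
    },
  };
}
```

With this in place, appendTranscript and handleCommand could be passed as the onFinalData and onCommandData props respectively. Whether the command phrase itself also appears in the transcript is not covered here.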
And the value of findReplaceList is an array of objects that have two properties, replacement and regex.
The value of replacement, i.e. REPLACEMENT, is a String.
The value of regex, i.e. [ "regex1", "regex2", "regex3", ... ], is an Array of Strings, i.e. regex1, regex2, regex3 are Strings.
The idea behind findReplaceList is that each time one of the values from the regex array is matched in the transcript, the match is changed to the replacement.
For example, you can use findReplaceList to define rules that correct wrongly transcribed named entities:
{
"findReplaceList": [
{
"replacement": "SpongeBob",
"regex": ["Spange Bwab", "SpanBob", "Spwange Bob", "Sponge Boob"]
}
]
}
So each time the back-end algorithm finds one of the phrases "Spange Bwab", "SpanBob", "Spwange Bob", or "Sponge Boob" in the transcript, it changes it to "SpongeBob".
You can also have replacements as symbols and punctuation marks:
{
"findReplaceList": [
{
"replacement": "(",
"regex": ["open parentheses", "new parentheses"]
},
{
"replacement": ")",
"regex": ["close parentheses", "stop parentheses"]
},
{
"replacement": "[",
"regex": ["open square brackets", "new square brackets"]
},
{
"replacement": "]",
"regex": ["close square brackets", "stop square brackets"]
}
]
}
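To preview what a findReplaceList would do before sending it, the rules can be approximated locally. The sketch below is only an approximation: the real replacement happens on the back-end, and it is an assumption that each regex entry is a valid JavaScript regular expression matched case-sensitively. The function name applyFindReplace is hypothetical.

```javascript
// Local approximation only: the real replacement happens on the back-end.
function applyFindReplace(transcript, findReplaceList) {
  return findReplaceList.reduce((text, rule) => {
    return rule.regex.reduce((current, pattern) => {
      // Assumption: each entry is a valid JavaScript regular expression source.
      // A replacer function keeps any "$" in the replacement literal.
      return current.replace(new RegExp(pattern, "g"), () => rule.replacement);
    }, text);
  }, transcript);
}

const findReplaceList = [
  { replacement: "SpongeBob", regex: ["Spange Bwab", "SpanBob"] },
  { replacement: "(", regex: ["open parentheses"] },
  { replacement: ")", regex: ["close parentheses"] },
];

const preview = applyFindReplace(
  "SpanBob open parentheses the sponge close parentheses",
  findReplaceList
);
console.log(preview); // SpongeBob ( the sponge )
```

Rules are applied in list order, so an earlier replacement can be matched again by a later rule. Keep that in mind when previewing overlapping patterns.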
When sending a config to the client, the first callback to be fired is the onConfig callback.
This is a String that refers to the service that you would like to use.
Vatis Tech offers two speech-to-text services: LIVE_ASR, where you receive the transcript while recording from your microphone, and STATIC_ASR, where you upload a file and receive the transcript at a given link (at the moment, this plugin does not support this feature).
Only LIVE_ASR can be used at the moment.
This is a String that represents the ID of the model you want to use.
If not specified, the default model of the selected language will be used.
This is a String for the language you want to transcribe from.
It must be in the following format: language_region.
At the moment, only ro_RO is available.
This is a String of your API key.
To get one, please follow these instructions:
- If you do not have one, please create an account on https://vatis.tech/.
- Log in to your account on https://vatis.tech/login.
- Go to the API key page on your account, https://vatis.tech/account/api-key.
- Copy the API key from there and add it to the @vatis-tech/asr-client-js constructor.
This is an Object with the following structure:
{
"service_host": "service_host",
"use_same_service_host_on_ws_connection": true | false,
"auth_token": "auth_token"
}
Where service_host is a String whose value is the host where the Vatis Tech Transcription Service is located, and auth_token is a String holding the authentication token for connecting to the Vatis Tech Transcription Service.
The use_same_service_host_on_ws_connection flag specifies whether the returned live service IP should be ignored when making the connection, with service_host used instead. It defaults to false.
Use only one of connectionConfig or apiKey to connect to the Vatis Tech Transcription Service.
Use apiKey when connecting to the Vatis Tech Cloud API, and use connectionConfig when using the Vatis Tech On Premise Installation, in which case you will be provided with the necessary connectionConfig object.
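Because only one of the two connection methods should be used, an application might want to check its own settings before calling the constructor. The following is a minimal sketch of such a check. The helper resolveConnectionOptions is hypothetical and not part of the plugin, and rejecting the "both" and "neither" cases is a choice made for this example, not documented plugin behavior.

```javascript
// Hypothetical helper: picks exactly one connection method for the constructor.
function resolveConnectionOptions({ apiKey, connectionConfig } = {}) {
  const hasKey = typeof apiKey === "string" && apiKey.length > 0;
  const hasConfig = connectionConfig !== null && typeof connectionConfig === "object";

  // Both set, or neither set: this sketch treats it as a configuration mistake.
  if (hasKey === hasConfig) {
    throw new Error("Provide exactly one of apiKey or connectionConfig.");
  }
  if (hasKey) return { apiKey };

  const { service_host, auth_token } = connectionConfig;
  if (typeof service_host !== "string" || typeof auth_token !== "string") {
    throw new Error("connectionConfig needs string service_host and auth_token.");
  }
  return {
    connectionConfig: {
      service_host,
      auth_token,
      // Defaults to false, as described above.
      use_same_service_host_on_ws_connection:
        connectionConfig.use_same_service_host_on_ws_connection === true,
    },
  };
}
```

The returned object could then be spread into the constructor options next to service, language, and the callbacks.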
This is a Function on which you will receive the transcript chunks from the back-end. This callback is always fired, for both partial and final frames.
It has the following signature:
const onData = (data) => {
/* do something with data */
}
Or with function names:
function onData(data) {
/* do something with data */
}
The data object that is received has the following structure:
{
"type": "<str>",
"headers": {
"key1": "value1",
"key2": "value2"
}
}
{
"type": "TIMESTAMPED_TRANSCRIPTION",
"headers": {},
"transcript": "hello world",
"words": [
{
"word": "hello",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
]
}
{
"type": "PROCESSED_TIMESTAMPED_TRANSCRIPTION",
"headers": {},
"transcript": "Hello, world!",
"words": [
{
"word": "hello",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
],
"processed_words": [
{
"word": "Hello,",
"start_time": 1350.39,
"end_time": 4600.5,
"speaker": "Speaker 1",
"confidence": 0.96,
"entity": null,
"entity_group_id": null
},
{
"word": "world!",
"start_time": 6200.3,
"end_time": 8020.0,
"speaker": "Speaker 1",
"confidence": 0.98,
"entity": null,
"entity_group_id": null
}
]
}
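The per-word fields shown above can be used, for instance, to flag words that the recognizer was less sure about. The sketch below assumes the words array has the shape shown in the examples. The helper name lowConfidenceWords and the 0.97 threshold are illustrative choices, not part of the plugin.

```javascript
// Hypothetical helper: lists words whose confidence is below a threshold.
function lowConfidenceWords(data, threshold = 0.97) {
  // Packets without a words array simply yield an empty list.
  const words = Array.isArray(data && data.words) ? data.words : [];
  return words
    .filter((w) => typeof w.confidence === "number" && w.confidence < threshold)
    .map((w) => ({ word: w.word, confidence: w.confidence, start_time: w.start_time }));
}

const sample = {
  type: "TIMESTAMPED_TRANSCRIPTION",
  headers: {},
  transcript: "hello world",
  words: [
    { word: "hello", start_time: 1350.39, end_time: 4600.5, confidence: 0.96 },
    { word: "world", start_time: 6200.3, end_time: 8020.0, confidence: 0.98 },
  ],
};

console.log(lowConfidenceWords(sample).map((w) => w.word)); // [ 'hello' ]
```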
Name | Type | Description |
---|---|---|
PacketNumber | int | Incremental packet number |
Sid | string | Session id |
FrameStartTime | double | Frame start time in milliseconds |
FrameEndTime | double | Frame end time in milliseconds |
FinalFrame | boolean | Flag for marking that a segment of speech has ended and it won't be updated |
SilenceDetected | boolean | Flag to indicate silence was detected on the audio frame |
ProcessingTimeSeconds | double | Time of inferencing |
SplitPacket | boolean | Flag that indicates the response packet was split and this is one of the pieces |
FinalSplitPacket | boolean | Flag that indicates this is the final piece of the split response |
SplitId | string | Full packet id in format <packet_number>.<split_id>.<sub-split-id>.<sub-sub-split-id> |
RequestBytes | int | Additional bytes requested to produce a frame. This is just an estimation, any number of bytes can be sent |
SpokenCommand | string | Command detected in frame |
So, the data can be a final frame, i.e. the back-end has fully finalized the transcript for those words and their time intervals (start and end time).
Or it can be a partial frame, i.e. the back-end has not yet finalized the transcript for those words and their time intervals, and it will most likely change until it is superseded by a final frame.
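The distinction between partial and final frames can be handled in a single onData callback by looking at the FinalFrame header. The sketch below is one possible way to assemble a running transcript. It assumes that FinalFrame is present and true on final frames, and that packets without a transcript string can be ignored. createTranscriptAssembler is a hypothetical name.

```javascript
// Hypothetical helper: builds a running transcript from onData packets.
function createTranscriptAssembler() {
  const finalized = []; // transcripts of frames that will not change anymore
  let pending = "";     // latest partial transcript, replaced on every update

  return {
    // Intended to be passed as the onData callback.
    onData(data) {
      if (!data || typeof data.transcript !== "string") return;
      // Assumption: final frames carry headers.FinalFrame === true.
      const isFinal = Boolean(data.headers && data.headers.FinalFrame);
      if (isFinal) {
        finalized.push(data.transcript);
        pending = "";
      } else {
        pending = data.transcript;
      }
    },
    // Finalized text followed by the current partial text, if any.
    getText() {
      return finalized.concat(pending ? [pending] : []).join(" ");
    },
  };
}
```

Each new partial frame replaces the previous one, and a final frame clears the pending text, which mirrors the override behavior of the partial and final callbacks.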
This is a Function on which you will receive from the back-end the partial transcript chunks.
It is identical to the onData callback, except that the data always represents partial frames.
It has the following signature:
const onPartialData = (data) => {
/* do something with data */
}
Or with function names:
function onPartialData(data) {
/* do something with data */
}
The data object received on the current onPartialData call overrides the data object received on the previous onPartialData call.
This is a Function on which you will receive from the back-end the final transcript chunks.
It is identical to the onData callback, except that the data always represents final frames.
It has the following signature:
const onFinalData = (data) => {
/* do something with data */
}
Or with function names:
function onFinalData(data) {
/* do something with data */
}
The data object received on the onFinalData callback overrides the data object received on the previous onPartialData call.
This is a Function on which you will receive a message from the back-end saying whether the config was successfully applied or not.
It has the following signature:
const onConfig = (data) => {
/* do something with data */
}
Where the data object has the following structure:
{
"type": "CONFIG_APPLIED",
"headers": {},
"config_packet": {
"type": "CONFIG",
"headers": {},
"spokenCommandsList": [
{
"command": "NEW_PARAGRAPH",
"regex": ["new line"]
}
]
}
}
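An onConfig handler might reduce this packet to the two things an application usually cares about: whether the config was applied, and which commands are now active. The sketch below assumes the structure shown above. Only the CONFIG_APPLIED type appears in this README, so treating any other type as "not applied" is an assumption, and summarizeConfigResponse is a hypothetical name.

```javascript
// Hypothetical helper: extracts the essentials from an onConfig packet.
function summarizeConfigResponse(data) {
  // Assumption: any type other than CONFIG_APPLIED means the config was not applied.
  const applied = Boolean(data) && data.type === "CONFIG_APPLIED";
  const packet = (data && data.config_packet) || {};
  const commands = (packet.spokenCommandsList || []).map((entry) => entry.command);
  return { applied, commands };
}

const response = {
  type: "CONFIG_APPLIED",
  headers: {},
  config_packet: {
    type: "CONFIG",
    headers: {},
    spokenCommandsList: [{ command: "NEW_PARAGRAPH", regex: ["new line"] }],
  },
};

console.log(summarizeConfigResponse(response));
// { applied: true, commands: [ 'NEW_PARAGRAPH' ] }
```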
This is a Function on which you will receive the transcript chunks for specific commands from the back-end.
For example, if you initialize the plugin with a set of commands (e.g. {spokenCommandsList: [ { "command": "NEW_PARAGRAPH", "regex": ["start new paragraph", "new phrase", "new sentence"] } ] }), each time the back-end algorithm finds one of these commands, it sends the data to this function.
It has the following signature:
const onCommandData = (data) => {
/* do something with data */
}
Or with function names:
function onCommandData(data) {
/* do something with data */
}
The data object from this callback is the same as the one from the onData callback, but it also has a new property, named spokenCommand, with the actual command that triggered the callback.
This is a Boolean prop.
If set to true, it will call the logger function with an object that has the following structure:
{
currentState: ...,
description: ....
}
This tells you the current state of the plugin.
The last state will be the following:
{
currentState: `@vatis-tech/asr-client-js: Initialized the "MicrophoneGenerator" plugin.`,
description: `@vatis-tech/asr-client-js: The MicrophoneGenerator was successful into getting user's microphone, and will start sending data each 1 second.`,
}
This is a Function on which you will receive data about the plugin state.
It has the following signature:
const logger = (info) => {
/* do something with info */
}
Or with function names:
function logger(info) {
/* do something with info */
}
The info object that is received has the props from above.
If the log prop is set to true and the logger prop is not set, or is not a function with the above signature, the plugin will default the logger to console.log.
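Instead of printing every state to the console, a custom logger can keep a short history that the application can inspect later. The following is a minimal sketch. createStateLogger and its maxEntries parameter are hypothetical, and the info object is assumed to carry the currentState and description props described above.

```javascript
// Hypothetical helper: a logger that keeps the most recent plugin states.
function createStateLogger(maxEntries = 50) {
  const history = [];

  function logger(info) {
    history.push({
      at: Date.now(),
      currentState: info.currentState,
      description: info.description,
    });
    // Drop the oldest entry once the history grows past the limit.
    if (history.length > maxEntries) history.shift();
  }

  return { logger, history };
}
```

The returned logger could be passed to the constructor together with log: true, and history[history.length - 1] then holds the latest reported state.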
This is a Function that will be called upon successful destruction.
This is a Function that will be called upon errors.
This is the host for generating a key. It defaults to "https://vatis.tech/".
How fast you want data to be captured from the microphone. Default is 250 milliseconds.
The frame length of what the microphone catches. Default is 0.3 seconds. (For a microphoneTimeslice of 250, the frameLength is 0.3).
Default is 0.3 seconds.
Default is 0.3 seconds.
This is a number that needs to be > 0. It represents the number of messages to be sent to the ASR Service before waiting for a response. Default is 5.
This is a boolean. If set to true, each time the transcription detects a command, it will trigger a final frame there.
This will destroy the instantiated @vatis-tech/asr-client-js.
The destroy method is also invoked if any error comes through the socket.io-client as a response from the Vatis Tech ASR SERVICE.
NOTE! If the VTC plugin did not send all messages, or it did not receive all messages, the destruction will not happen instantly.
NOTE! The destruction of the VTC plugin will happen only when all messages have been sent and received.
NOTE! If you wish to destroy the VTC plugin without waiting for all messages to be sent and received, you can pass { hard: true } as a parameter to the .destroy call.
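The two ways of calling destroy can be wrapped in one small function, so the rest of the application only has to decide whether it wants to wait for pending messages. This is a sketch under the assumption that the client exposes destroy exactly as described in the notes above. shutdownClient and its force option are hypothetical names.

```javascript
// Hypothetical wrapper around the destroy method described above.
function shutdownClient(client, { force = false } = {}) {
  if (!client || typeof client.destroy !== "function") return false;
  if (force) {
    // Do not wait for pending messages to be sent and received.
    client.destroy({ hard: true });
  } else {
    // Waits until all messages have been sent and received.
    client.destroy();
  }
  return true;
}
```

A page could, for example, call shutdownClient(vtc) when the user stops dictating and shutdownClient(vtc, { force: true }) when the user navigates away.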
Call this method if you want to pause the recording for a while.
After calling the pause method, you can call this one to resume recording.
This specifies which audioinput device id should be used by the client. If it is undefined, or the browser does not have that audioinput device id, a default one will be selected.
You can read more on the following links:
- https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/enumerateDevices
- https://developer.mozilla.org/en-US/docs/Web/API/MediaDeviceInfo
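As a sketch of how a device id might be chosen, the helper below searches a device list, such as the one returned by navigator.mediaDevices.enumerateDevices(), for an audio input whose label contains a given text. pickAudioInputId is a hypothetical name. Note that browsers typically leave labels empty until the user has granted microphone permission.

```javascript
// Hypothetical helper: finds the deviceId of an audio input by label text.
function pickAudioInputId(devices, labelFragment) {
  const wanted = String(labelFragment || "").toLowerCase();
  if (!wanted) return undefined;
  const match = devices.find(
    (d) => d.kind === "audioinput" && (d.label || "").toLowerCase().includes(wanted)
  );
  // undefined lets the client fall back to a default device.
  return match ? match.deviceId : undefined;
}

// In a browser, the list would come from:
// navigator.mediaDevices.enumerateDevices().then((devices) => pickAudioInputId(devices, "usb"));
```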
Call this method if you want to download the audio file as audio/webm type.
Call this method if you want to get all chunks from your microphone as blobs.
You can then use this to download the audio as you wish. Below is an example of downloading as audio/webm.
// ... code
try {
const allBlobData = vtc.getRecordingAsBlobChunks();
if (allBlobData && allBlobData.length) {
const audioBlob = new Blob(allBlobData, {
type: "audio/webm",
});
const audioUrl = URL.createObjectURL(audioBlob);
const anchor = document.createElement("a");
anchor.style.display = "none";
document.body.appendChild(anchor);
anchor.href = audioUrl;
anchor.download = "audio.webm";
anchor.click();
window.URL.revokeObjectURL(audioUrl);
anchor?.remove();
}
} catch (error) {
console.error(error);
}
// ... code
We officially support the latest versions of the following browsers:
Chrome | Firefox | Safari | Safari | Edge |
---|---|---|---|---|
We love pull requests!
Our community is safe for all. Before submitting a pull request, please review and agree to our Code of Conduct, and then check the Contribution guidelines.
If you have questions, you need some help, you've found a bug, or you have an improvement idea, do not hesitate to open an issue here.
There are three types of issues:
To keep the README a bit lighter, you can read the Changelog here.
If you are a developer, the following links might interest you:
- API documentation: https://vatis.tech/documentation/
- API status: https://vatistech.statuspage.io/
- Supported languages: https://vatis.tech/languages
- Accepted file formats: https://vatis.tech/formats
- Check the pricing: https://vatis.tech/pricing
- Join the team: https://vatis.tech/careers
If you are just curious to learn more about Vatis Tech, please refer to these links:
- Landing page for Vatis Tech: https://vatis.tech/
- About Vatis Tech: https://vatis.tech/about
- Vatis Tech newsroom: https://vatis.tech/press
- Message us on Facebook: https://www.facebook.com/VatisTech/
- Connect with us on LinkedIn: https://www.linkedin.com/company/vatis-tech/
- Chat with our Facebook community: https://www.facebook.com/groups/1630293847133624