Improve Speech-To-Text (Rules-Based Approach) #17

chidiewenike · 2020-10-17T01:48:31Z

Objective

The DeepSpeech Speech-To-Text system needs to be improved to handle uncommon and non-English words. The rules-based approach is to inspect the output of the current DeepSpeech model and build a mapping from the transcribed text to the expected output. After transcription, every substring of the transcribed text is looked up in this mapping, and any substring that matches a known error is replaced with its expected text. The mappings can be stored in a JSON file.
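A minimal sketch of the substring look-up described above. It assumes the mapping is keyed by lowercase, space-separated word sequences (like the examples in this thread) and that the transcript has no punctuation. The function name apply_mapping and the longest-match-first policy are illustrative choices, not part of the swanton code.

```python
def apply_mapping(transcript, mapping):
    """Replace known mis-transcribed word sequences with their expected text.

    Scans left to right; at each position the longest word n-gram is tried
    first, so a multi-word key wins over a shorter key it contains.
    """
    words = transcript.split()
    # The longest key (in words) bounds how many words are tried at once
    max_len = max((len(key.split()) for key in mapping), default=0)
    output = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in mapping:
                output.append(mapping[candidate])
                i += n
                break
        else:
            # No substring starting at word i is a known error; keep the word
            output.append(words[i])
            i += 1
    return " ".join(output)


mapping = {"cause uh very day": "Casa Verde"}
print(apply_mapping("what is cause uh very day used for", mapping))
# what is Casa Verde used for
```

Trying longer n-grams first matters once the mapping contains overlapping phrases (for example, a key for a place name and a longer key for the same name followed by another word).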

Key Result

Using the run_stt function of stream_deepspeech.py, return a correctly transcribed string for the audio input.

swanton/stream_deepspeech.py, line 16 (commit b8e5502):

def run_stt(time_len=TIME_LEN):

Example

Expected transcription: "What is Casa Verde used for?"
DeepSpeech transcription: "What is cause uh very day used for?"

mapping = {
    "cause uh very day": "Casa Verde"
}
transcription_substring = "cause uh very day"
print(mapping[transcription_substring])  # Output: Casa Verde
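The dictionary lookup above only works once the exact substring has been isolated. One way to apply the mapping directly to a full sentence like the example (which has capitals and a trailing "?") is a case-insensitive regular expression anchored at word boundaries. This is a sketch; replace_phrases is a hypothetical helper name.

```python
import re


def replace_phrases(text, mapping):
    """Replace whole-phrase occurrences of mapping keys in text, ignoring case."""
    # Longer keys first, so a short key cannot break up a longer phrase containing it
    for wrong in sorted(mapping, key=len, reverse=True):
        right = mapping[wrong]
        # \b anchors the key at word boundaries, so "cause" will not match inside "because";
        # re.escape makes any regex metacharacters in the key literal
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement inserts `right` verbatim (no backslash or group expansion)
        text = re.sub(pattern, lambda _match: right, text, flags=re.IGNORECASE)
    return text


sentence = "What is cause uh very day used for?"
print(replace_phrases(sentence, {"cause uh very day": "Casa Verde"}))
# What is Casa Verde used for?
```

Because replacements are applied one key at a time, a later key could match text produced by an earlier replacement; keeping the expected values distinct from the error keys avoids that.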

Details

Correctly transcribe all QA pairs in the question-answer pairs Google Sheet.
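To track progress toward this goal, each corrected transcription can be compared with the expected text from the sheet. The sketch below assumes DeepSpeech emits lowercase words without punctuation, so both sides are normalized before comparing; normalize, accuracy, and the sample pairs (taken from the example above) are illustrative.

```python
import string


def normalize(text):
    # Drop punctuation and case so "Used for?" and "used for" compare equal
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def accuracy(pairs):
    """Fraction of (expected, corrected) pairs that match after normalization."""
    if not pairs:
        return 0.0
    hits = sum(normalize(expected) == normalize(got) for expected, got in pairs)
    return hits / len(pairs)


pairs = [
    ("What is Casa Verde used for?", "what is casa verde used for"),
    ("What is Casa Verde used for?", "what is cause uh very day used for"),
]
print(accuracy(pairs))  # 0.5
```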

You will need the following DeepSpeech model and DeepSpeech scorer to use run_stt. Ensure that these files are in the same directory as the stream_deepspeech.py program.

If you need assistance, please ask @chidiewenike.

chidiewenike · 2020-10-28T02:46:04Z

Algorithms to consider in the future:

Levenshtein Distances/Fuzzy Matching
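Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A fuzzy lookup could use it to catch near-misses of known errors that an exact mapping would miss. This is a sketch: levenshtein, fuzzy_lookup, and the max_ratio threshold of 0.25 are hypothetical and untuned.

```python
def levenshtein(a, b):
    """Minimum single-character edits (insert, delete, substitute) turning a into b."""
    # previous holds one row of the DP table: distance from a[:i] to each prefix b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(phrase, mapping, max_ratio=0.25):
    """Return the value for the closest key, or None if no key is close enough.

    max_ratio (hypothetical) bounds edit distance divided by key length.
    """
    best_key, best_ratio = None, None
    for key in mapping:
        ratio = levenshtein(phrase, key) / max(len(key), 1)
        if best_ratio is None or ratio < best_ratio:
            best_key, best_ratio = key, ratio
    if best_key is not None and best_ratio <= max_ratio:
        return mapping[best_key]
    return None


mapper = {"romona roderigo": "Ramon Rodriguez"}
print(fuzzy_lookup("romona rodrigo", mapper))         # Ramon Rodriguez
print(fuzzy_lookup("swanton pacific ranch", mapper))  # None
```

The threshold would need tuning against the QA sheet: too loose and correctly transcribed words get rewritten. The standard library's difflib.get_close_matches offers a similar lookup based on SequenceMatcher similarity rather than Levenshtein distance.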

chidiewenike · 2020-10-28T02:51:39Z

Storing a Python Dict as JSON

import json

mapper = {
    "ramona roderigo": "Ramon Rodriguez"
}

# Write the mapping to disk so it can be loaded later
with open("test_json.json", "w") as out_json:
    json.dump(mapper, out_json)

print(mapper["ramona roderigo"])  # Output => Ramon Rodriguez
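Reading the mappings back is the other half of keeping them in JSON. The sketch below round-trips a mapping through a temporary file; save_mapping, load_mapping, and the file name stt_mapping.json are hypothetical, and lowercasing keys on load assumes DeepSpeech's output is lowercase.

```python
import json
import os
import tempfile


def save_mapping(mapping, path):
    # indent/sort_keys keep the file readable and diff-friendly as the mapping grows
    with open(path, "w", encoding="utf-8") as out_json:
        json.dump(mapping, out_json, indent=2, sort_keys=True)


def load_mapping(path):
    # Lowercase keys so they can be matched against lowercase transcripts (assumption)
    with open(path, "r", encoding="utf-8") as in_json:
        return {key.lower(): value for key, value in json.load(in_json).items()}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stt_mapping.json")
    save_mapping({"ramona roderigo": "Ramon Rodriguez"}, path)
    mapper = load_mapping(path)

print(mapper["ramona roderigo"])  # Ramon Rodriguez
```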

Python Sketch of the Substring Mapper

from stream_deepspeech import run_stt

def stt_mapper(predicted):
    # Known DeepSpeech mis-transcriptions -> expected text
    mapper = {
        "romona roderigo": "ramon rodriguez"
    }

    # Check each known error against the prediction; str.replace returns
    # a new string, so the result must be reassigned
    for wrong, right in mapper.items():
        if wrong in predicted:
            predicted = predicted.replace(wrong, right)

    return predicted

predicted = run_stt(5)
# predicted => "romona roderigo is an operator at the ranch"
correct = stt_mapper(predicted)
# correct => "ramon rodriguez is an operator at the ranch"

chidiewenike · 2020-12-15T17:22:32Z

@akimminavarro @taylor-nguyen-987 Can you folks try out Swanton, Swanton Pacific, and Swanton Pacific Ranch? Maybe start with those?


chidiewenike added the enhancement, good first issue, help wanted, and good-new-member-issue labels Oct 17, 2020

chidiewenike assigned akimmi Oct 28, 2020

chidiewenike assigned taylor-nguyen-987 Dec 15, 2020

taylor-nguyen-987 added a commit that referenced this issue Apr 23, 2021

Merge pull request #46 from calpoly-csai/tsolution

8f5aed7

tsolution branch for issue #17
