enable_word_time_offsets=False to a few examples, and added enable_wor… #1046

Closed
3 changes: 2 additions & 1 deletion speech/cloud-client/quickstart.py
@@ -46,7 +46,8 @@ def run_quickstart():
     config = types.RecognitionConfig(
         encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
         sample_rate_hertz=16000,
-        language_code='en-US')
+        language_code='en-US',
+        enable_word_time_offsets=False)
Contributor:
If this defaults to false, then we should just leave it out.

Member:
Agreeing with @jonparrott on this: this is definitely something we want to make clear in the API/client library reference, but adding it to the samples would make the API seem harder to use than it actually is. (Note that the other parameters, encoding, language_code, and sample_rate_hertz, are all required, whereas enable_word_time_offsets is optional and defaults to false.)

What are some possible use cases where not showing enable_word_time_offsets=False could cause friction for users?


     # Detects speech in the audio file
     response = client.recognize(config, audio)
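The reviewers' argument rests on the field being optional with a false default, so leaving it out produces the same request as passing False explicitly. A minimal sketch of that reasoning, using a hypothetical dataclass stand-in for `types.RecognitionConfig` (it is not the google-cloud-speech class, and the plain string encoding value is a simplification):

```python
from dataclasses import dataclass


# Hypothetical stand-in for types.RecognitionConfig, used only to illustrate
# the review discussion; it is NOT the google-cloud-speech class.
@dataclass(frozen=True)
class RecognitionConfig:
    encoding: str
    sample_rate_hertz: int
    language_code: str
    # Optional field: like a proto3 bool, it is False when omitted.
    enable_word_time_offsets: bool = False


explicit = RecognitionConfig(
    encoding='LINEAR16',
    sample_rate_hertz=16000,
    language_code='en-US',
    enable_word_time_offsets=False)

implicit = RecognitionConfig(
    encoding='LINEAR16',
    sample_rate_hertz=16000,
    language_code='en-US')

# Omitting the optional flag yields an identical config.
print(explicit == implicit)  # True
```

Because the two configs compare equal, spelling out the default adds noise to the sample without changing behavior, which is why the reviewers suggest leaving it out.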
12 changes: 11 additions & 1 deletion speech/cloud-client/transcribe_async.py
@@ -43,7 +43,8 @@ def transcribe_file(speech_file):
     config = types.RecognitionConfig(
         encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
         sample_rate_hertz=16000,
-        language_code='en-US')
+        language_code='en-US',
+        enable_word_time_offsets=True)

     # [START migration_async_response]
     operation = client.long_running_recognize(config, audio)
@@ -63,6 +64,15 @@ def transcribe_file(speech_file):
     for alternative in alternatives:
         print('Transcript: {}'.format(alternative.transcript))
         print('Confidence: {}'.format(alternative.confidence))
+
+        for word_info in alternative.words:
+            word = word_info.word
+            start_time = word_info.start_time
+            end_time = word_info.end_time
+            print('Word: {}, start_time: {}, end_time: {}'.format(
+                word,
+                start_time.seconds + start_time.nanos * 1e-9,
+                end_time.seconds + end_time.nanos * 1e-9))
     # [END migration_async_response]
 # [END def_transcribe_file]

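The new loop in transcribe_async.py converts each word's start and end offsets from a seconds-plus-nanoseconds pair into a single float number of seconds. A minimal sketch of that conversion that runs without the client library, assuming hypothetical namedtuple stand-ins for the Duration and word-info messages; the helper names `duration_to_seconds` and `format_word_offsets` and the sample words are illustrative, not part of the sample:

```python
from collections import namedtuple

# Hypothetical stand-ins for the messages the API returns: Duration mirrors
# a protobuf Duration (seconds + nanos), and WordInfo mirrors the
# word/start_time/end_time fields used in the diff.
Duration = namedtuple('Duration', ['seconds', 'nanos'])
WordInfo = namedtuple('WordInfo', ['word', 'start_time', 'end_time'])


def duration_to_seconds(duration):
    """Combine whole seconds and nanoseconds into float seconds."""
    return duration.seconds + duration.nanos * 1e-9


def format_word_offsets(words):
    """Return one line per word, in the format the sample prints."""
    lines = []
    for word_info in words:
        lines.append('Word: {}, start_time: {}, end_time: {}'.format(
            word_info.word,
            duration_to_seconds(word_info.start_time),
            duration_to_seconds(word_info.end_time)))
    return lines


# Illustrative input only; real values come from alternative.words.
words = [
    WordInfo('how', Duration(0, 0), Duration(0, 300000000)),
    WordInfo('old', Duration(0, 300000000), Duration(1, 0)),
]
for line in format_word_offsets(words):
    print(line)
```

Factoring the arithmetic into a helper avoids repeating `seconds + nanos * 1e-9` for every offset; because the result is a float, printed values may show small rounding artifacts.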