
Provide guidance on creating a header for streaming_synthesize in streaming_tts_quickstart.py #13080

Open
parthea opened this issue Jan 21, 2025 · 1 comment
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. samples Issues that are directly related to samples. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@parthea
Collaborator

parthea commented Jan 21, 2025

From googleapis/google-cloud-python#13405, the response to streaming_synthesize is headerless LINEAR16 audio with a sample rate of 24000 Hz. The code sample below prints the size of the audio content but does not include the header needed to actually play the audio.

streaming_responses = client.streaming_synthesize(itertools.chain([config_request], request_generator()))
for response in streaming_responses:
    print(f"Audio content size in bytes is: {len(response.audio_content)}")

This may not be the purpose of the code sample; however, having this extra information in the sample would help with debugging customer issues such as googleapis/google-cloud-python#13405.

I added code which includes the raw audio header; however, there is likely an easier way to achieve this. We should provide guidance on how folks should create the audio header.

import struct

# This is a raw header based on the spec at https://docs.fileformat.com/audio/wav/
# It describes PCM, 1 channel, 24000 Hz, 16 bits per sample. The two size fields
# are left as zero here and patched once the whole stream has been written.
header = b'RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\xc0]\x00\x00\x80\xbb\x00\x00\x02\x00\x10\x00data\x00\x00\x00\x00'

total_length = 0

with open("output.wav", "wb") as out:
    out.write(header)
    for response in streaming_responses:
        # Accumulate the length of the audio content
        total_length += len(response.audio_content)
        out.write(response.audio_content)
    # Position 4-7: Size of the overall file - 8 bytes, in bytes (little-endian 32-bit integer).
    out.seek(4)
    out.write(struct.pack("<I", len(header) + total_length - 8))
    # Position 40-43: Size of the data section
    out.seek(40)
    out.write(struct.pack("<I", total_length))
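
One possibly easier way, sketched here as a suggestion rather than as official guidance: Python's standard-library wave module can write the header and patch the size fields itself when the file is closed. The helper name write_linear16_wav and its parameters are hypothetical. The sketch assumes the stream is 16-bit mono PCM at 24000 Hz, matching the raw header above, and takes any iterable of byte chunks so that it does not depend on the client library.

```python
import wave
from typing import Iterable


def write_linear16_wav(path: str, chunks: Iterable[bytes], sample_rate_hz: int = 24000) -> int:
    """Write headerless 16-bit mono PCM chunks to a playable WAV file.

    Hypothetical helper for illustration. Returns the number of audio bytes written.
    """
    total_length = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)               # mono, as in the raw header above
        wav_file.setsampwidth(2)               # 2 bytes per sample (LINEAR16)
        wav_file.setframerate(sample_rate_hz)  # assumed 24000 Hz
        for chunk in chunks:
            wav_file.writeframes(chunk)
            total_length += len(chunk)
    # On close, the wave module fills in the RIFF and data size fields.
    return total_length
```

With the sample in this issue, the call might look like write_linear16_wav("output.wav", (response.audio_content for response in streaming_responses)).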
@glasnt
Contributor

glasnt commented Jan 21, 2025

This part of this WIP PR might be similar to what you need here (possibly)

https://github.com/GoogleCloudPlatform/python-docs-samples/pull/13053/files#diff-5d664c635b2f6262b57f11d8b4d2016da17a18a41a8f57efd60d69b39c37365dR254-R272
