Add text generation stream status to shared module, use for better TTS with auto-play #192
- Adds "still_streaming" to shared module for extensions to know if generation is complete
- Changed TTS extension with new options:
  - Show text under the audio widget
  - Automatically play the audio once text generation finishes
  - manage the generated wav files (only keep files for finished generations, optional max file limit)
  - [wip] ability to change voice pitch and speed
- added 'tensorboard' to requirements, since python sent "tensorboard not found" errors after a fresh installation.
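As a rough illustration of the "optional max file limit" idea, here is a minimal sketch that deletes the oldest wav files once a folder holds more than a set number. The function name `prune_wav_files`, the use of modification time for ordering, and treating `0` as "unlimited" are all assumptions made for this example, not details taken from the extension.

```python
from pathlib import Path
from typing import List


def prune_wav_files(output_dir: Path, max_files: int) -> List[Path]:
    """Delete the oldest .wav files so that at most max_files remain.

    Hypothetical helper: max_files <= 0 is treated as "unlimited".
    Returns the paths that were deleted.
    """
    if max_files <= 0:
        return []
    # Oldest first by modification time; the name breaks ties deterministically.
    wavs = sorted(output_dir.glob("*.wav"),
                  key=lambda p: (p.stat().st_mtime, p.name))
    excess = wavs[:-max_files] if len(wavs) > max_files else []
    for path in excess:
        path.unlink()
    return excess
```

Calling it after each finished generation would keep the folder bounded without touching the newest files.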
You beat me to it :) Though your implementation has a lot more functionality. Seeing as this should be able to run as a server I don't think
Using the audio block autoplay is definitely the better method, thanks for pointing it out! And I might as well disable audio generation during the stream like you too. I'll test out adding some of your changes and report back.
…block autoplay
- Keeping simpleaudio until audio block "autoplay" doesn't play previous messages
- Only generate audio for finished messages
- Better name for autoplay, clean up comments
- set default to unlimited wav files. Still a few bugs when wav id resets

Co-Authored-By: Christoph Hess <9931495+ChristophHess@users.noreply.github.com>
Indeed,
- New autoplay using html tag, removed from old message when new input provided
- Add voice pitch and speed control
- Group settings together
- Use name + conversation history to match wavs to messages, minimize problems when changing characters

Current minor bugs:
- Gradio seems to cache the audio files, so using "clear history" and generating new messages will play the old audio (the new messages are saving correctly). Gradio will clear cache and use correct audio after a few messages or after a page refresh.
- Switching characters does not immediately update the message ID used for the audio. ID is updated after the first new message, but that message will use the wrong ID
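A minimal sketch of the "autoplay using html tag" approach: the newest reply gets an `<audio>` element with the standard `autoplay` attribute, and the attribute is stripped from earlier replies when new input arrives so they do not play again. The helper names, the `file/` URL prefix, and the `[user, bot]` history layout are assumptions for illustration; the extension's actual code may differ.

```python
import html
from pathlib import Path


def audio_tag(wav_path: Path, autoplay: bool) -> str:
    """Build an HTML audio element for a wav file (hypothetical helper)."""
    # Assumption: the UI serves local files under a "file/" prefix.
    src = html.escape(f"file/{wav_path.as_posix()}", quote=True)
    attrs = "controls autoplay" if autoplay else "controls"
    return f'<audio src="{src}" {attrs}></audio>'


def strip_autoplay(message_html: str) -> str:
    """Remove autoplay from a tag produced by audio_tag."""
    return message_html.replace("controls autoplay>", "controls>")


def remove_old_autoplay(history):
    """Return a copy of [user, bot] pairs with autoplay removed from bot replies."""
    return [[user, strip_autoplay(bot)] for user, bot in history]
```

Running `remove_old_autoplay` on the history before appending a new reply leaves only the newest message marked for autoplay.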
All right, things should be ready. I've removed I did a bunch of testing (on Windows), and found two minor bugs affecting the audio history, which I've changed to handle switching between characters, but it looks like the source of the bugs is outside of the extension. I'm pretty sure these bugs are also present in the current version of silero_tts on the main branch. Anyway, let me know if there are any issues on Linux.
needing to manually install tensorboard might be a windows-only problem. Can be easily solved manually.
Two questions:
This is looking very good now!
- Change wav naming to be completely unique using timestamp instead of message ID, stops browser using cached audio when new audio is made with the same file name (eg after regenerate or clear history).
- Make the autoplay setting actually disable autoplay.
- Make Settings panel a bit more compact.
- Hide html errors when audio file of chat history is missing.
- Add button to permanently convert TTS history to normal text messages
- Changed the "show message text" toggle to affect the chat history.
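The timestamp-based naming can be sketched roughly as below: because every generation gets a new file name, the browser has no cached entry to reuse after a regenerate or a cleared history. The function name, the `character_timestamp.wav` pattern, and the nanosecond resolution are assumptions for this example, not the extension's exact scheme.

```python
import time
from pathlib import Path
from typing import Optional


def unique_wav_path(output_dir: Path, character: str,
                    now_ns: Optional[int] = None) -> Path:
    """Return a wav path whose name embeds a timestamp (hypothetical helper).

    now_ns can be passed explicitly to make the result reproducible in tests.
    """
    stamp = time.time_ns() if now_ns is None else now_ns
    # Keep the character name file-system friendly.
    safe_name = "".join(c if c.isalnum() or c in "-_" else "_" for c in character)
    return output_dir / f"{safe_name}_{stamp}.wav"
```

For instance, `unique_wav_path(Path("outputs"), "Example Bot", now_ns=123)` yields a file named `Example_Bot_123.wav`.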
I also added an option to remove all the audio blocks from chat history (the errors in the terminal from deleted audio files were annoying, but I couldn't find a way to mute them. This fixes that haha). Finally, I used some similar code so that changing the "show message text" would also affect the chat history. But it looks like the new streaming method broke the audio generation, so there is still a bit of work to do.
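Removing audio blocks from the history could look something like the sketch below. It assumes each bot reply stores the message text next to its `<audio>` element and that the history is a list of `[user, bot]` pairs; both are assumptions for illustration, and the names are hypothetical.

```python
import re

# Matches an <audio ...></audio> element plus any whitespace that follows it.
AUDIO_TAG = re.compile(r"<audio\b[^>]*>\s*</audio>\s*", re.IGNORECASE)


def strip_audio(reply_html: str) -> str:
    """Drop audio elements from one reply, keeping the remaining text."""
    return AUDIO_TAG.sub("", reply_html).strip()


def history_to_text(history):
    """Return a copy of [user, bot] pairs with audio removed from bot replies."""
    return [[user, strip_audio(bot)] for user, bot in history]
```

If a reply holds only the audio element and no text, this sketch leaves an empty string, so the text would need to be kept somewhere for the conversion to be useful.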
- Need to update `shared.still_streaming = False` before the final `yield formatted_outputs`, shifted the position of some yields.
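Why the ordering matters can be shown with a small stand-in: code that runs after a generator's last `yield` only executes when the consumer asks for another item, so a flag cleared there is still set while the final output is being handled. The `shared` object and `stream_reply` below are hypothetical stand-ins, not the webui's actual module or generation function.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the shared module; the real one holds much more.
shared = SimpleNamespace(still_streaming=False)


def stream_reply(chunks):
    """Yield the growing reply, clearing the flag before the final yield."""
    chunks = list(chunks)
    shared.still_streaming = True
    text = ""
    for i, chunk in enumerate(chunks):
        text += chunk
        if i == len(chunks) - 1:
            # Must happen before the last yield so consumers see "finished".
            shared.still_streaming = False
        yield text
    shared.still_streaming = False  # also covers an empty stream


# Each consumer step observes the flag as it was when the value was yielded.
seen = [(out, shared.still_streaming) for out in stream_reply(["Hel", "lo"])]
```

Here `seen` ends up as `[("Hel", True), ("Hello", False)]`, so an extension checking the flag would only generate audio for the completed message.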
Everything seems to be working on my end now. If how I'm changing
Looks good to me too. Thanks again for submitting this PR, this is a massive improvement to the silero extension and I really liked it. When combined with the whisper extension, it should allow for a very immersive chat experience. I will merge now. Credits have been added to https://github.com/oobabooga/text-generation-webui/wiki/Extensions.
I am attempting to port this function to elevenlabs, but I have come up against a similar issue as described above. When there are cached messages, the bot will not generate new audio and only plays old messages. If I delete the cache, then the bot only generates audio sporadically. I am not a dev, I'm just a guy trying to get this to work, but maybe you could take a look at my code and recommend a fix?
Were you trying to port the stream status or the autoplay? There were some changes to the sileroTTS extension after this pull that removed the "stream status" variable, since it wasn't using the output_modifier function as intended. Instead it now sets If you're talking about the autoplay feature and how it sometimes plays old messages after regeneration, the fix was to make every audio file's name unique using a timestamp. That way the UI is forced to not use the cache. To do this we Great work with the extension, good luck with the fix!
I have made those fixes and gotten everything sorted to the best of my abilities, but now upon generation of text, it fails to generate audio and the terminal shows the following error. I know this isn't your problem and I'm sorry if I'm bothering you, but I feel like I'm one step away from getting this to work.
Output generated in 5.74 seconds (1.92 tokens/s, 11 tokens, context 30, seed 1352445741)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
No worries, the code is easy enough to read and it really does look like one step away haha. I haven't done any testing myself, but it looks like you just need to replace line 128 from If that doesn't fix things then the issue might have to do with how long elevenlabs takes to generate the audio, which I would have no idea how to handle.
THANK YOU SO MUCH! This works perfectly! All I need to do now is clone Scarlett Johansson's voice and I've got a full "Her" situation going.
Hey oobabooga, thanks for this webui! I added a simple way for extensions to know when text generation is finished, so that I could auto-play the TTS audio. I also made some other quality-of-life changes to the TTS extension, see the commit description for the details.
Feel free to do whatever you like with this pull request.