
Add text generation stream status to shared module, use for better TTS with auto-play #192

Merged: 12 commits into oobabooga:main on Mar 12, 2023

Conversation

xanthousm
Contributor

Hey oobabooga, thanks for this webui! I added a simple way for extensions to know when text generation is finished, so that I could auto-play the TTS audio. I also made some other quality-of-life changes to the TTS extension, see the commit description for the details.

Feel free to do whatever you like with this pull request.

- Adds "still_streaming" to shared module for extensions to know if generation is complete
- Changed TTS extension with new options:
   - Show text under the audio widget
   - Automatically play the audio once text generation finishes
   - manage the generated wav files (only keep files for finished generations, optional max file limit)
   - [wip] ability to change voice pitch and speed
- added 'tensorboard' to requirements, since python sent "tensorboard not found" errors after a fresh installation.
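
(For context, a minimal sketch of how an extension might consume the new flag. The `still_streaming` name and the `output_modifier` hook are taken from this PR and the discussion below; the placeholder TTS call is purely illustrative, and the snippet assumes it runs inside the webui where `modules.shared` exists.)

```python
import modules.shared as shared  # the webui's shared module, extended by this PR

def output_modifier(string):
    """Called by the webui on each streamed chunk of the bot reply."""
    if shared.still_streaming:
        # Generation is still in progress: skip the expensive TTS call for now.
        return string
    # Generation finished: safe to run TTS once on the complete message.
    return synthesize_placeholder(string)

def synthesize_placeholder(string):
    # Stand-in so the sketch runs; the real extension would build a wav here.
    return string
```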
@CypherNaught-0x
Contributor

You beat me to it :) Though your implementation has a lot more functionality. Seeing as this should be able to run as a server, I don't think simpleaudio is the best solution, since it would only play the sounds on the host computer. May I suggest you or oobabooga have a peek at my PR? I used the native autoplay functionality of the audio block.
Perhaps a merge of our solutions would be best?
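
(Illustration only, not CypherNaught-0x's actual PR code: the "native autoplay" idea amounts to returning an HTML `<audio>` element with the autoplay attribute so the browser, rather than the server, plays the file. The helper name and example path below are made up, and the `file/` prefix assumes gradio's usual way of serving local files.)

```python
import html

def audio_html(wav_path: str, autoplay: bool = True) -> str:
    # The browser plays this automatically when the attribute is present,
    # which also works for remote clients (unlike simpleaudio on the host).
    attr = " autoplay" if autoplay else ""
    return f'<audio src="file/{html.escape(wav_path)}" controls{attr}></audio>'

# Example: append the element to the bot reply in output_modifier.
print(audio_html("extensions/silero_tts/outputs/test_0.wav"))
```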

@xanthousm
Contributor Author

Using the audio block autoplay is definitely the better method, thanks for pointing it out! And I might as well disable audio generation during the stream like you did. I'll test out adding some of your changes and report back.

…block autoplay

- Keeping simpleaudio until audio block "autoplay" doesn't play previous messages
- Only generate audio for finished messages
- Better name for autoplay, clean up comments
- Set default to unlimited wav files. Still a few bugs when the wav ID resets

Co-Authored-By: Christoph Hess <9931495+ChristophHess@users.noreply.github.com>
@oobabooga
Owner

Indeed, simpleaudio seems a bit troublesome. Trying to install it with pip on Linux caused

      c_src/simpleaudio_alsa.c:8:10: fatal error: alsa/asoundlib.h: No such file or directory
          8 | #include <alsa/asoundlib.h>

- New autoplay using html tag, removed from old message when new input provided
- Add voice pitch and speed control
- Group settings together
- Use name + conversation history to match wavs to messages, minimize problems when changing characters

Current minor bugs:
- Gradio seems to cache the audio files, so using "clear history" and generating new messages will play the old audio (the new messages are saving correctly). Gradio will clear cache and use correct audio after a few messages or after a page refresh.
- Switching characters does not immediately update the message ID used for the audio. ID is updated after the first new message, but that message will use the wrong ID
@xanthousm
Contributor Author

All right, things should be ready. I've removed simpleaudio and am using the HTML tags like Christoph. To handle the old messages, I remove the autoplay tag from the previous message in input_modifier by accessing the internal and visible shared.history. I also finished the pitch and speed control and cleaned up the settings for the extension.
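
(A rough sketch of what that input_modifier change could look like. The shape of shared.history, a dict with 'internal' and 'visible' lists of [input, reply] pairs, is assumed from the description above, and the snippet assumes it runs inside the webui.)

```python
import modules.shared as shared

def input_modifier(string):
    # Before the new reply is generated, strip autoplay from the previous
    # bot message so only the latest audio element ever autoplays.
    # (The 'internal' list would be updated the same way.)
    if shared.history.get('visible'):
        last_reply = shared.history['visible'][-1][1]
        shared.history['visible'][-1][1] = last_reply.replace(
            'controls autoplay', 'controls')
    return string
```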

I did a bunch of testing (on Windows) and found two minor bugs affecting the audio history. I've changed the extension to handle switching between characters, but it looks like the source of the bugs lies outside the extension. I'm pretty sure these bugs are also present in the current version of silero_tts on the main branch.

Anyway, let me know if there are any issues on Linux.

xanthousm and others added 3 commits March 11, 2023 17:05
Needing to manually install tensorboard might be a Windows-only problem; it can easily be solved manually.
@oobabooga
Owner

Two questions:

  1. About the bug you mentioned when switching characters, I could not understand/reproduce it. Can you explain it in more detail?
  2. When I click on Regenerate, it seems like the audio is not updated, at least if I had already played part of the previous audio.

This is looking very good now!

- Change wav naming to be completely unique using a timestamp instead of the message ID, which stops the browser from using cached audio when new audio is made with the same file name (e.g. after Regenerate or Clear history).
- Make the autoplay setting actually disable autoplay.
- Make the settings panel a bit more compact.
- Hide HTML errors when the audio file for a chat history message is missing.
- Add a button to permanently convert TTS history to normal text messages.
- Change the "show message text" toggle to affect the chat history.
@xanthousm
Contributor Author

xanthousm commented Mar 12, 2023

  1. The bug had to do with using message_id = len(shared.history['visible']), where the value of message_id was not updated if shared.history was changed by choosing another character or clearing the history.
  2. The other bug, where regenerated messages were using old audio, was a problem with the browser caching the old audio. Both of those problems are now fixed by using timestamps instead of message_id.

I also added an option to remove all the audio blocks from chat history (the errors in the terminal from deleted audio files were annoying, but I couldn't find a way to mute them. This fixes that haha)

Finally, I used some similar code so that changing the "show message text" option also affects the chat history.

But it looks like the new streaming method broke the audio generation, so there is still a bit of work to do.

- Need to update `shared.still_streaming = False` before the final `yield formatted_outputs`, so I shifted the position of some yields.
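
(Illustrative only, not the actual generation code: the point of the commit above is that the flag must be cleared before the last yield, otherwise extensions never see the "finished" state for the final chunk.)

```python
def stream_reply(chunks, shared):
    """Toy generator showing the yield ordering; `chunks` stands in for the
    model's streamed output and `shared` for the webui's shared module."""
    shared.still_streaming = True
    reply = ""
    for chunk in chunks:
        reply += chunk
        yield reply                 # intermediate yields: still streaming
    shared.still_streaming = False  # clear the flag *before* the final yield
    yield reply                     # extensions now see the finished reply

# Example:
# import types
# list(stream_reply(["Hel", "lo"], types.SimpleNamespace()))
```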
@xanthousm
Contributor Author

Everything seems to be working on my end now. If how I'm changing shared.history doesn't feel right, I'm happy to scrap those parts.

@oobabooga
Owner

Looks good to me too. Thanks again for submitting this PR; this is a massive improvement to the silero extension and I really liked it.

When combined with the whisper extension, it should allow for a very immersive chat experience.

I will merge now. Credits have been added to https://github.com/oobabooga/text-generation-webui/wiki/Extensions.

@oobabooga oobabooga merged commit d8bea76 into oobabooga:main Mar 12, 2023
Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023
Add text generation stream status to shared module, use for better TTS with auto-play
@bubbabug

I am attempting to port this function to elevenlabs, but I have come up against a similar issue to the one described above. When there are cached messages, the bot will not generate new audio and only plays old messages. If I delete the cache, the bot only generates audio sporadically. I am not a dev, I'm just a guy trying to get this to work, but maybe you could take a look at my code and recommend a fix?

script.txt

@xanthousm
Contributor Author

xanthousm commented Apr 23, 2023

Were you trying to port the stream status or the autoplay?

There were some changes to the silero_tts extension after this pull request that removed the "stream status" variable, since it wasn't using the output_modifier function as intended. Instead, it now temporarily sets `shared.args.no_stream = True` in the input_modifier function so that it works when streaming is enabled.
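
(A hedged sketch of that approach: the `shared.args.no_stream` flag is the one mentioned above, while the save/restore pattern around it is an assumption about how the extension handles things, and the snippet assumes it runs inside the webui.)

```python
import modules.shared as shared

streaming_state = True  # remembers the user's original no_stream setting

def input_modifier(string):
    global streaming_state
    streaming_state = shared.args.no_stream  # save the current setting
    shared.args.no_stream = True             # force a single, complete reply
    return string

def output_modifier(string):
    shared.args.no_stream = streaming_state  # restore streaming afterwards
    return string                            # (TTS generation would happen here)
```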

If you're talking about the autoplay feature and how it sometimes plays old messages after regeneration, the fix was to make every audio file's name unique using a timestamp. That way the UI is forced not to use the cache. To do this, we import time at the start and set the audio file name using a string like `output_file = Path(f'extensions/silero_tts/outputs/{shared.character}_{int(time.time())}.wav')`. In your file it looks like you'd need to change the file path names on lines 123 and 126.
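
(The naming scheme from the comment above, spelled out as a small helper; everything beyond the quoted path format, e.g. running inside the webui so `modules.shared` exists, is an assumption.)

```python
import time
from pathlib import Path

import modules.shared as shared

def new_output_file() -> Path:
    # int(time.time()) makes every file name unique, so the browser can never
    # reuse a cached wav after "Regenerate" or "Clear history".
    return Path(f'extensions/silero_tts/outputs/{shared.character}_{int(time.time())}.wav')
```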

Great work with the extension, good luck with the fix!

@bubbabug

bubbabug commented Apr 23, 2023

I have made those fixes and gotten everything sorted to the best of my abilities, but now, upon generation of text, it fails to generate audio and the terminal shows the following error. I know this isn't your problem and I'm sorry if I'm bothering you, but I feel like I'm one step away from getting this to work.

Output generated in 5.74 seconds (1.92 tokens/s, 11 tokens, context 30, seed 1352445741)
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 335, in __call__
    stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 429, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 69, in app
    await response(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 338, in __call__
    raise RuntimeError(f"File at path {self.path} does not exist.")
RuntimeError: File at path D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav does not exist.

script.txt

@xanthousm
Contributor Author

No worries, the code is easy enough to read and it really does look like one step away haha. I haven't done any testing myself, but it looks like you just need to change line 128 from `save_bytes_to_path(Path((f'extensions/elevenlabs_tts/{shared.character}_{int(time.time())}.wav')), audio_data)` to `save_bytes_to_path(output_file, audio_data)`, so that the save function actually uses the output_file variable that the string on line 131 is looking for.
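
(In other words, something along these lines. Line numbers refer to the attached script.txt, which isn't reproduced here; the save_bytes_to_path import location and the helper wrapper are assumptions based on that script, and the snippet assumes it runs inside the webui.)

```python
import time
from pathlib import Path

import modules.shared as shared
from elevenlabslib.helpers import save_bytes_to_path  # assumed to match the script's import

def save_reply_audio(audio_data: bytes) -> Path:
    # Build the timestamped path once...
    output_file = Path(f'extensions/elevenlabs_tts/outputs/{shared.character}_{int(time.time())}.wav')
    # ...and save to that exact path, so the <audio> tag built from
    # output_file later in the script points at a file that really exists.
    save_bytes_to_path(str(output_file), audio_data)
    return output_file
```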

If that doesn't fix things then the issue might have to do with how long elevenlabs takes to generate the audio, which I would have no idea on how to handle.

@bubbabug

THANK YOU SO MUCH! This works perfectly! All I need to do now is clone Scarlett Johansson's voice and I've got a full "Her" situation going.
