
Add text generation stream status to shared module, use for better TTS with auto-play #192

Merged: 12 commits into oobabooga:main on Mar 12, 2023

Conversation

xanthousm
Contributor

Hey oobabooga, thanks for this webui! I added a simple way for extensions to know when text generation is finished, so that I could auto-play the TTS audio. I also made some other quality-of-life changes to the TTS extension, see the commit description for the details.

Feel free to do whatever you like with this pull request.

- Adds "still_streaming" to shared module for extensions to know if generation is complete
- Changed TTS extension with new options:
   - Show text under the audio widget
   - Automatically play the audio once text generation finishes
   - manage the generated wav files (only keep files for finished generations, optional max file limit)
   - [wip] ability to change voice pitch and speed
- added 'tensorboard' to requirements, since python sent "tensorboard not found" errors after a fresh installation.
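
(For context, a minimal sketch of how an extension might consume the new flag. The `still_streaming` name and the `output_modifier` hook are taken from this PR and the discussion below; the placeholder TTS call is purely illustrative, and the snippet assumes it runs inside the webui where `modules.shared` exists.)

```python
import modules.shared as shared  # the webui's shared module, extended by this PR

def output_modifier(string):
    """Called by the webui on each streamed chunk of the bot reply."""
    if shared.still_streaming:
        # Generation is still in progress: skip the expensive TTS call for now.
        return string
    # Generation finished: safe to run TTS once on the complete message.
    return synthesize_placeholder(string)

def synthesize_placeholder(string):
    # Stand-in so the sketch runs; the real extension would build a wav here.
    return string
```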
@CypherNaught-0x
Contributor

You beat me to it :) Though your implementation has a lot more functionality. Seeing as this should be able to run as a server, I don't think simpleaudio is the best solution, since it would only play the sounds on the host computer. May I suggest you or oobabooga have a peek at my PR? I used the native autoplay functionality of the audio block.
Perhaps a merge of our solutions would be best?
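
(Illustration only, not CypherNaught-0x's actual PR code: the "native autoplay" idea amounts to returning an HTML `<audio>` element with the autoplay attribute so the browser, rather than the server, plays the file. The helper name and example path below are made up, and the `file/` prefix assumes gradio's usual way of serving local files.)

```python
import html

def audio_html(wav_path: str, autoplay: bool = True) -> str:
    # The browser plays this automatically when the attribute is present,
    # which also works for remote clients (unlike simpleaudio on the host).
    attr = " autoplay" if autoplay else ""
    return f'<audio src="file/{html.escape(wav_path)}" controls{attr}></audio>'

# Example: append the element to the bot reply in output_modifier.
print(audio_html("extensions/silero_tts/outputs/test_0.wav"))
```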

@xanthousm
Contributor Author

Using the audio block autoplay is definitely the better method, thanks for pointing it out! And I might as well disable audio generation during the stream like you did. I'll test out adding some of your changes and report back.

…block autoplay

- Keeping simpleaudio until audio block "autoplay" doesn't play previous messages
- Only generate audio for finished messages
- Better name for autoplay, clean up comments
- Set default to unlimited wav files. Still a few bugs when the wav ID resets

Co-Authored-By: Christoph Hess <9931495+ChristophHess@users.noreply.github.com>
@oobabooga
Owner

Indeed, simpleaudio seems a bit troublesome. Trying to install it with pip on Linux caused

      c_src/simpleaudio_alsa.c:8:10: fatal error: alsa/asoundlib.h: No such file or directory
          8 | #include <alsa/asoundlib.h>

- New autoplay using html tag, removed from old message when new input provided
- Add voice pitch and speed control
- Group settings together
- Use name + conversation history to match wavs to messages, minimize problems when changing characters

Current minor bugs:
- Gradio seems to cache the audio files, so using "clear history" and generating new messages will play the old audio (the new messages are saving correctly). Gradio will clear cache and use correct audio after a few messages or after a page refresh.
- Switching characters does not immediately update the message ID used for the audio. ID is updated after the first new message, but that message will use the wrong ID
@xanthousm
Contributor Author

All right, things should be ready. I've removed simpleaudio and am using the HTML tags like Christoph. To handle the old messages, I remove the autoplay tag from the previous message in input_modifier by accessing the internal and visible shared.history. I also finished the pitch and speed control and cleaned up the settings for the extension.
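
(A rough sketch of what that input_modifier change could look like. The shape of shared.history, a dict with 'internal' and 'visible' lists of [input, reply] pairs, is assumed from the description above, and the snippet assumes it runs inside the webui.)

```python
import modules.shared as shared

def input_modifier(string):
    # Before the new reply is generated, strip autoplay from the previous
    # bot message so only the latest audio element ever autoplays.
    # (The 'internal' list would be updated the same way.)
    if shared.history.get('visible'):
        last_reply = shared.history['visible'][-1][1]
        shared.history['visible'][-1][1] = last_reply.replace(
            'controls autoplay', 'controls')
    return string
```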

I did a bunch of testing (on Windows) and found two minor bugs affecting the audio history. I've changed the extension to handle switching between characters, but it looks like the source of the bugs lies outside the extension. I'm pretty sure these bugs are also present in the current version of silero_tts on the main branch.

Anyway, let me know if there are any issues on Linux.

xanthousm and others added 3 commits March 11, 2023 17:05
Needing to manually install tensorboard might be a Windows-only problem; it can easily be solved manually.
@oobabooga
Owner

Two questions:

  1. About the bug you mentioned when switching characters, I could not understand/reproduce it. Can you explain it in more detail?
  2. When I click on Regenerate, it seems like the audio is not updated, at least if I had already played part of the previous audio.

This is looking very good now!

- Change wav naming to be completely unique using a timestamp instead of the message ID, which stops the browser from using cached audio when new audio is made with the same file name (e.g. after Regenerate or Clear history).
- Make the autoplay setting actually disable autoplay.
- Make the settings panel a bit more compact.
- Hide HTML errors when the audio file for a chat history message is missing.
- Add a button to permanently convert TTS history to normal text messages.
- Change the "show message text" toggle to affect the chat history.
@xanthousm
Contributor Author

xanthousm commented Mar 12, 2023

  1. The bug had to do with using message_id = len(shared.history['visible']), where the value of message_id was not updated if shared.history was changed by choosing another character or clearing the history.
  2. The other bug, where regenerated messages were using old audio, was a problem with the browser caching the old audio. Both of those problems are now fixed by using timestamps instead of message_id.

I also added an option to remove all the audio blocks from chat history (the errors in the terminal from deleted audio files were annoying, but I couldn't find a way to mute them. This fixes that haha)

Finally, I used some similar code so that changing the "show message text" option also affects the chat history.

But it looks like the new streaming method broke the audio generation, so there is still a bit of work to do.

- Need to update `shared.still_streaming = False` before the final `yield formatted_outputs`, so I shifted the position of some yields.
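
(Illustrative only, not the actual generation code: the point of the commit above is that the flag must be cleared before the last yield, otherwise extensions never see the "finished" state for the final chunk.)

```python
def stream_reply(chunks, shared):
    """Toy generator showing the yield ordering; `chunks` stands in for the
    model's streamed output and `shared` for the webui's shared module."""
    shared.still_streaming = True
    reply = ""
    for chunk in chunks:
        reply += chunk
        yield reply                 # intermediate yields: still streaming
    shared.still_streaming = False  # clear the flag *before* the final yield
    yield reply                     # extensions now see the finished reply

# Example:
# import types
# list(stream_reply(["Hel", "lo"], types.SimpleNamespace()))
```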
@xanthousm
Contributor Author

Everything seems to be working on my end now. If how I'm changing shared.history doesn't feel right, I'm happy to scrap those parts.

@oobabooga
Owner

Looks good to me too. Thanks again for submitting this PR; this is a massive improvement to the silero extension and I really liked it.

When combined with the whisper extension, it should allow for a very immersive chat experience.

I will merge now. Credits have been added to https://github.com/oobabooga/text-generation-webui/wiki/Extensions.

@oobabooga oobabooga merged commit d8bea76 into oobabooga:main Mar 12, 2023
Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023
Add text generation stream status to shared module, use for better TTS with auto-play
@bubbabug

I am attempting to port this function to elevenlabs, but I have come up against a similar issue to the one described above. When there are cached messages, the bot will not generate new audio and only plays old messages. If I delete the cache, the bot only generates audio sporadically. I am not a dev, I'm just a guy trying to get this to work, but maybe you could take a look at my code and recommend a fix?

script.txt

@xanthousm
Contributor Author

xanthousm commented Apr 23, 2023

Were you trying to port the stream status or the autoplay?

There were some changes to the silero_tts extension after this pull request that removed the "stream status" variable, since it wasn't using the output_modifier function as intended. Instead, it now temporarily sets `shared.args.no_stream = True` in the input_modifier function so that it works when streaming is enabled.
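
(A hedged sketch of that approach: the `shared.args.no_stream` flag is the one mentioned above, while the save/restore pattern around it is an assumption about how the extension handles things, and the snippet assumes it runs inside the webui.)

```python
import modules.shared as shared

streaming_state = True  # remembers the user's original no_stream setting

def input_modifier(string):
    global streaming_state
    streaming_state = shared.args.no_stream  # save the current setting
    shared.args.no_stream = True             # force a single, complete reply
    return string

def output_modifier(string):
    shared.args.no_stream = streaming_state  # restore streaming afterwards
    return string                            # (TTS generation would happen here)
```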

If you're talking about the autoplay feature and how it sometimes plays old messages after regeneration, the fix was to make every audio file's name unique using a timestamp. That way the UI is forced not to use the cache. To do this, we import time at the start and set the audio file name using a string like `output_file = Path(f'extensions/silero_tts/outputs/{shared.character}_{int(time.time())}.wav')`. In your file it looks like you'd need to change the file path names on lines 123 and 126.
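
(The naming scheme from the comment above, spelled out as a small helper; everything beyond the quoted path format, e.g. running inside the webui so `modules.shared` exists, is an assumption.)

```python
import time
from pathlib import Path

import modules.shared as shared

def new_output_file() -> Path:
    # int(time.time()) makes every file name unique, so the browser can never
    # reuse a cached wav after "Regenerate" or "Clear history".
    return Path(f'extensions/silero_tts/outputs/{shared.character}_{int(time.time())}.wav')
```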

Great work with the extension, good luck with the fix!

@bubbabug

bubbabug commented Apr 23, 2023

I have made those fixes and gotten everything sorted to the best of my abilities, but now, upon generation of text, it fails to generate audio and the terminal shows the following error. I know this isn't your problem and I'm sorry if I'm bothering you, but I feel like I'm one step away from getting this to work.

Output generated in 5.74 seconds (1.92 tokens/s, 11 tokens, context 30, seed 1352445741)
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 335, in __call__
    stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 429, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 69, in app
    await response(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 338, in __call__
    raise RuntimeError(f"File at path {self.path} does not exist.")
RuntimeError: File at path D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav does not exist.

script.txt

@xanthousm
Contributor Author

No worries, the code is easy enough to read and it really does look like one step away haha. I haven't done any testing myself, but it looks like you just need to change line 128 from `save_bytes_to_path(Path((f'extensions/elevenlabs_tts/{shared.character}_{int(time.time())}.wav')), audio_data)` to `save_bytes_to_path(output_file, audio_data)`, so that the save function actually uses the output_file variable that the string on line 131 is looking for.
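
(In other words, something along these lines. Line numbers refer to the attached script.txt, which isn't reproduced here; the save_bytes_to_path import location and the helper wrapper are assumptions based on that script, and the snippet assumes it runs inside the webui.)

```python
import time
from pathlib import Path

import modules.shared as shared
from elevenlabslib.helpers import save_bytes_to_path  # assumed to match the script's import

def save_reply_audio(audio_data: bytes) -> Path:
    # Build the timestamped path once...
    output_file = Path(f'extensions/elevenlabs_tts/outputs/{shared.character}_{int(time.time())}.wav')
    # ...and save to that exact path, so the <audio> tag built from
    # output_file later in the script points at a file that really exists.
    save_bytes_to_path(str(output_file), audio_data)
    return output_file
```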

If that doesn't fix things then the issue might have to do with how long elevenlabs takes to generate the audio, which I would have no idea on how to handle.

@bubbabug

THANK YOU SO MUCH! This works perfectly! All I need to do now is clone Scarlett Johansson's voice and I've got a full "Her" situation going.
