feat: add tts to rai core #419

Open
wants to merge 30 commits into
base: development

Contributor

@rachwalk rachwalk commented Feb 12, 2025

Purpose

With the new design based on agents and connectors, rai tts needed to be refactored to reflect these changes.

Proposed Changes

Adds:

  • TextToSpeech models
  • TextToSpeech agent
  • Agent runners

Issues

#399
#309

Testing

CI

Manual tests performed using the following sound devices:

  • "default" (ALSA - system default)
  • "Jabra Speak2 40 MS: USB Audio" (conference speaker/microphone)
  • "Sennheiser USB headset: Audio" (headset)

To run and test this PR, start the OpenTTS container:

docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak

and then in 3 separate terminal windows run:
python examples/s2s/asr.py
python examples/s2s/tts.py
python examples/s2s/conversational.py

If you want to specify a particular device (example):
python ./examples/s2s/asr.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/tts.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/conversational.py
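
The `--device-name` flag used above could be wired up roughly as follows. This is a minimal sketch, not the code from this PR: the `build_parser` helper is hypothetical, and the fallback value `"default"` is an assumption based on the ALSA device name in the test list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --device-name flag shown above."""
    parser = argparse.ArgumentParser(description="s2s example (sketch)")
    parser.add_argument(
        "--device-name",
        type=str,
        default="default",  # assumption: fall back to the system default device
        help="Full sound device name",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(
    ["--device-name", "Jabra Speak2 40 MS: USB Audio (hw:2,0)"]
)
print(args.device_name)
```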

Also note: The default agent configuration values should work, but this is not guaranteed. The sound profile of a particular device can significantly affect the accuracy of voice activity and wake word detection, so for best results it is recommended to experiment with the values until the desired accuracy is achieved. A commonly changed option is VAD_DETECTION_THRESHOLD in examples/s2s/asr.py, as VAD performance can differ a lot between devices.
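
To illustrate why a threshold such as VAD_DETECTION_THRESHOLD is worth tuning per device, here is a small sketch. It does not use the VAD model from this PR: the `count_voiced_chunks` helper is hypothetical and the probabilities are made-up numbers standing in for per-chunk speech probabilities.

```python
from typing import List


def count_voiced_chunks(speech_probabilities: List[float], threshold: float) -> int:
    """Count chunks whose speech probability reaches the threshold."""
    return sum(1 for p in speech_probabilities if p >= threshold)


# Made-up per-chunk probabilities, standing in for a VAD model's output.
probabilities = [0.05, 0.30, 0.55, 0.80, 0.45, 0.10]

for threshold in (0.3, 0.5, 0.7):
    voiced = count_voiced_chunks(probabilities, threshold)
    print(f"threshold={threshold}: {voiced} of {len(probabilities)} chunks voiced")
```

With the same audio, a lower threshold marks more chunks as speech (more false triggers on a noisy microphone), while a higher one risks missing quiet speech.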

@rachwalk rachwalk requested a review from maciejmajek February 12, 2025 11:00
consumer_sampling_rate=vad.sampling_rate,
is_input=True,
)
asr_agent = VoiceRecognitionAgent(
Member

I would make this more explicit so that the user can tell that ROS 2 is enabled. Please use keyword arguments, e.g. ros2_name="automatic_speech_recognition"

Member

We need to keep in mind that the default run method of the s2s should not use ROS2 to communicate.
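
One way to read the two comments above together is a constructor where the ROS 2 name is keyword-only and optional. The sketch below is illustrative only: `SketchAgent` is a hypothetical stand-in, not the `VoiceRecognitionAgent` from this PR, and the rule "ROS 2 is enabled only when a name is given" is an assumption.

```python
from typing import Optional


class SketchAgent:
    """Hypothetical stand-in, not the VoiceRecognitionAgent from this PR."""

    def __init__(self, device_name: str, *, ros2_name: Optional[str] = None):
        # Everything after `*` must be passed by keyword.
        self.device_name = device_name
        self.ros2_name = ros2_name

    @property
    def uses_ros2(self) -> bool:
        # Assumption for illustration: ROS 2 is enabled only when a name is given.
        return self.ros2_name is not None


local_agent = SketchAgent("default")
ros2_agent = SketchAgent("default", ros2_name="automatic_speech_recognition")
print(local_agent.uses_ros2, ros2_agent.uses_ros2)
```

Passing the name positionally raises a `TypeError`, so a call site always spells out `ros2_name=...` when ROS 2 is involved.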

@MagdalenaKotynia

This comment was marked as outdated.

@MagdalenaKotynia

This comment was marked as outdated.

@rachwalk rachwalk changed the title from "Add TTS to RAI Core" to "feat: add tts to rai core" Feb 12, 2025
@maciejmajek
Member

maciejmajek commented Feb 26, 2025

Unfortunately, it's still lagging on my setup: the TTS lags (mic on/mic off), and on the first TTS launch a loud buzzing could be heard.
As we have found out, the oww is problematic in this implementation, so I've turned it off:
asr.py line 103 # agent.add_detection_model(vad, pipeline="record")
voice_agent.py line 195 should_record = voice_detected# self._should_record(indata, output_parameters)

Even though the TTS lags, ASR and response stopping work very well with the oww commented out. Well done.

Member

@maciejmajek maciejmajek left a comment

Well done,
lots of random comments, please clean up the code.
Did you test FasterWhisper and OpenAIWhisper?
Please add ElevenLabsTTS, you can use my commit 33e2a45

```bash
python -c 'import sounddevice as sd; print(sd.query_devices())'
The Agent utilises sounddevice module to access user's microphone, by default the `"default"` sound device is used.
To get information about available sounddeives use:
Member

typo

```

keep_speaker_busy: some speakers may go into low power mode, which may result in truncated speech beginnings. Set to true to play low frequency, low volume noise to prevent sleep mode.
The device can be identifed by name and passed to the configuration.
Member

Which part of the name?

Contributor Author

What do you mean? All of it
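
The reply above suggests the full device name is expected. A minimal sketch of exact-name lookup is shown below; `find_device_index` is a hypothetical helper, the device list is hand-written with names from the PR description, and whether the agent in this PR matches names exactly is an assumption.

```python
from typing import Dict, List


def find_device_index(devices: List[Dict[str, object]], name: str) -> int:
    """Return the index of the device whose name matches exactly."""
    for index, device in enumerate(devices):
        if device["name"] == name:
            return index
    raise ValueError(f"No sound device named {name!r}")


# Hand-written stand-in for a device listing; names taken from the PR description.
devices = [
    {"name": "default"},
    {"name": "Jabra Speak2 40 MS: USB Audio (hw:2,0)"},
    {"name": "Sennheiser USB headset: Audio"},
]

print(find_device_index(devices, "Jabra Speak2 40 MS: USB Audio (hw:2,0)"))
```

With exact matching, a partial name such as "Jabra" raises a `ValueError` instead of silently picking a device.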


## TextToSpeechAgent

See `examples/s2s/wtts.py` for an example usage.
Member

typo

Contributor Author

Applied in 3579e8e

Comment on lines 62 to 64
$ python ./examples/s2s/asr.py
$ python examples/s2s/tts.py
$ python examples/s2s/conversational.py
Member

Suggested change
- $ python ./examples/s2s/asr.py
- $ python examples/s2s/tts.py
- $ python examples/s2s/conversational.py
+ $ python examples/s2s/asr.py
+ $ python examples/s2s/tts.py
+ $ python examples/s2s/conversational.py
Suggested change
- $ python ./examples/s2s/asr.py
- $ python examples/s2s/tts.py
- $ python examples/s2s/conversational.py
+ $ python ./examples/s2s/asr.py
+ $ python ./examples/s2s/tts.py
+ $ python ./examples/s2s/conversational.py

Contributor Author

Applied in 3579e8e

vad = SileroVAD(args.vad_sampling_rate, args.vad_threshold)
oww = OpenWakeWord("hey jarvis", args.oww_threshold)
whisper = LocalWhisper("tiny", args.vad_sampling_rate)
# whisper = OpenAIWhisper("whisper-1", args.vad_sampling_rate, "en")
Member

style
Did you test OpenAIWhisper?

Contributor Author

Added info in 6123d5c

ros2_name = "rai_asr_agent"

agent = VoiceRecognitionAgent(microphone_configuration, ros2_name, whisper, vad)
# agent.add_detection_model(oww, pipeline="record")
Member

Please add more information on why this is commented out.

Contributor Author

Added info in 6123d5c

Comment on lines 203 to +216
  _targets = [
-     target
-     if isinstance(target, tuple)
-     else (target, TopicConfig(is_subscriber=False))
+     (
+         target
+         if isinstance(target, tuple)
+         else (target, TopicConfig(is_subscriber=False))
+     )
      for target in targets
  ]
  _sources = [
-     source
-     if isinstance(source, tuple)
-     else (source, TopicConfig(is_subscriber=True))
+     (
+         source
+         if isinstance(source, tuple)
+         else (source, TopicConfig(is_subscriber=True))
+     )
Member

Just curious, what's going on here?

Contributor Author

For some reason, Pyright prefers this to the simple multiline form. It's ruff-compatible and valid Python syntax. Do you want this changed?
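
As a quick check that the parentheses change only the layout, the sketch below builds the same list both ways. The `TopicConfig` dataclass here is a hypothetical stand-in for the class in the diff, and the topic names are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TopicConfig:
    """Hypothetical stand-in for the TopicConfig used in the diff."""

    is_subscriber: bool = False


Target = Union[str, Tuple[str, TopicConfig]]
targets: List[Target] = ["/to_human", ("/status", TopicConfig(is_subscriber=False))]

# Conditional expression without surrounding parentheses.
plain = [
    target if isinstance(target, tuple) else (target, TopicConfig(is_subscriber=False))
    for target in targets
]

# Same conditional expression wrapped in parentheses, as in the diff.
parenthesized = [
    (
        target
        if isinstance(target, tuple)
        else (target, TopicConfig(is_subscriber=False))
    )
    for target in targets
]

print(plain == parenthesized)
```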

Comment on lines 235 to 237
# def run(self):
# self._executor.spin()

Member

.

Contributor Author

Applied in e69437e

4 participants