feat: add tts to rai core #419

rachwalk · 2025-02-12T10:59:54Z

Purpose

The new design is based on agents and connectors, so rai tts needed to be refactored to match it.

Proposed Changes

Adds:

TextToSpeech models
TextToSpeech agent
Agent runners

Issues

#399
#309

Testing

CI

Manual tests were performed using the following sound devices:

"default" (ALSA - system default)
"Jabra Speak2 40 MS: USB Audio" (conference speaker/microphone)
"Sennheiser USB headset: Audio" (headset)

To run and test this PR, first start the OpenTTS container:

docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak

Then, in three separate terminal windows, run:
python examples/s2s/asr.py
python examples/s2s/tts.py
python examples/s2s/conversational.py

If you want to specify a particular device (example):
python ./examples/s2s/asr.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/tts.py --device-name "Jabra Speak2 40 MS: USB Audio (hw:2,0)"
python examples/s2s/conversational.py
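
Before starting the three scripts, it can help to confirm that something is accepting connections on port 5500, the port published by the docker command above. This is a minimal sketch using only the standard library; the helper name `port_is_open` is made up for illustration and is not part of this PR.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 5500)` only shows that the port accepts TCP connections; it does not verify that the TTS service itself is healthy.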

Also note: the default values of the agent configurations should work, but this is not guaranteed. The sound profile of a particular device can have a significant effect on the accuracy of voice activity and wake word detection, so for best results it is recommended to experiment with the values until the desired accuracy is achieved. A commonly changed option is VAD_DETECTION_THRESHOLD in examples/s2s/asr.py, as VAD performance can differ a lot between devices.
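
To illustrate why the threshold matters, the sketch below counts how many frames would be treated as speech at several thresholds. The per-frame probabilities are made-up numbers standing in for VAD model output, `frames_above` is a hypothetical helper that is not part of `rai`, and the strict `>` comparison is an assumption about how a threshold might be applied.

```python
from typing import Sequence


def frames_above(probabilities: Sequence[float], threshold: float) -> int:
    """Count frames whose speech probability exceeds the threshold."""
    return sum(1 for p in probabilities if p > threshold)


# Made-up per-frame speech probabilities, e.g. from a noisy microphone.
probs = [0.05, 0.20, 0.35, 0.55, 0.80, 0.90, 0.40, 0.10]

for threshold in (0.3, 0.5, 0.7):
    # A lower threshold flags more frames as speech; a higher one flags fewer.
    print(threshold, frames_above(probs, threshold))
```

With the same audio, a lower threshold makes detection more sensitive (more false triggers) and a higher one makes it stricter (more missed speech), which is why the best value depends on the device.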

src/rai_core/rai/runners/base.py

boczekbartek · 2025-02-12T12:42:15Z

src/rai_core/rai/runners/s2s.py

+            consumer_sampling_rate=vad.sampling_rate,
+            is_input=True,
+        )
+        asr_agent = VoiceRecognitionAgent(


I would make it more explicit so that the user can guess that ros2 is enabled. Please use keyword arguments e.g. ros2_name="automatic_speech_recognition"

We need to keep in mind that the default run method of the s2s should not use ROS2 to communicate.
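
One way to read this suggestion, as a sketch only: make the ROS 2 name a keyword-only, optional argument, so that the call site shows when ROS 2 is in use and the default stays ROS 2-free. `ExampleAgent` below is a hypothetical stand-in and does not reflect the real `VoiceRecognitionAgent` signature.

```python
from typing import Optional


class ExampleAgent:
    """Hypothetical agent used only to illustrate keyword-only arguments."""

    def __init__(self, model: str, *, ros2_name: Optional[str] = None) -> None:
        self.model = model
        # ROS 2 communication is opt-in: enabled only when a name is given.
        self.ros2_name = ros2_name

    @property
    def uses_ros2(self) -> bool:
        return self.ros2_name is not None


# Default construction does not involve ROS 2.
local_agent = ExampleAgent("tiny")
# The keyword makes the ROS 2 dependency visible at the call site.
ros2_agent = ExampleAgent("tiny", ros2_name="automatic_speech_recognition")
```

Because of the bare `*` in the signature, passing the name positionally raises a `TypeError`, so a reader never has to guess what the second argument means.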

maciejmajek · 2025-02-26T20:07:34Z

Unfortunately it's still lagging on my setup. The tts is lagging (mic on/mic off). On the first tts launch a loud buzzing could be heard.
As we have found out, the oww is problematic in this implementation so I've turned it off.
asr.py line 103: # agent.add_detection_model(vad, pipeline="record")
voice_agent.py line 195: should_record = voice_detected  # self._should_record(indata, output_parameters)

Even though the TTS lags, ASR and response stopping work very well with the oww commented out. Well done.

maciejmajek

Well done,
lots of random comments, please clean up the code.
Did you test FasterWhisper and OpenAIWhisper?
Please add ElevenLabsTTS, you can use my commit 33e2a45

maciejmajek · 2025-02-28T10:47:13Z

docs/human_robot_interface/voice_interface.md

-```bash
-python -c 'import sounddevice as sd; print(sd.query_devices())'
+The Agent utilises sounddevice module to access user's microphone, by default the `"default"` sound device is used.
+To get information about available sounddeives use:


maciejmajek · 2025-02-28T10:47:32Z

docs/human_robot_interface/voice_interface.md

 ```

-keep_speaker_busy: some speakers may go into low power mode, which may result in truncated speech beginnings. Set to true to play low frequency, low volume noise to prevent sleep mode.
+The device can be identifed by name and passed to the configuration.


Which part of the name?

What do you mean? All of it
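
A sketch of matching a device by its full name, as the reply suggests. It works on a list of dictionaries shaped like the entries returned by `sd.query_devices()` (each has a `name` key). The sample entries and the helper `find_device_index` are made up for illustration, and exact matching is an assumption about how the name is compared.

```python
from typing import Mapping, Optional, Sequence


def find_device_index(
    devices: Sequence[Mapping[str, object]], name: str
) -> Optional[int]:
    """Return the position of the device whose name matches exactly, else None."""
    for index, device in enumerate(devices):
        if device["name"] == name:
            return index
    return None


# Made-up entries; on a real system use list(sd.query_devices()) instead.
devices = [
    {"name": "default", "max_input_channels": 32},
    {"name": "Jabra Speak2 40 MS: USB Audio (hw:2,0)", "max_input_channels": 1},
]

# The full name matches; a prefix of the name does not.
full_match = find_device_index(devices, "Jabra Speak2 40 MS: USB Audio (hw:2,0)")
partial_match = find_device_index(devices, "Jabra Speak2 40 MS")
```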

maciejmajek · 2025-02-28T10:48:02Z

docs/human_robot_interface/voice_interface.md

+
+## TextToSpeechAgent
+
+See `examples/s2s/wtts.py` for an example usage.


Applied in 3579e8e

maciejmajek · 2025-02-28T10:49:50Z

docs/human_robot_interface/voice_interface.md

+$ python ./examples/s2s/asr.py
+$ python examples/s2s/tts.py
+$ python examples/s2s/conversational.py


Suggested change

-$ python ./examples/s2s/asr.py
-$ python examples/s2s/tts.py
-$ python examples/s2s/conversational.py
+$ python examples/s2s/asr.py
+$ python examples/s2s/tts.py
+$ python examples/s2s/conversational.py

Suggested change

-$ python ./examples/s2s/asr.py
-$ python examples/s2s/tts.py
-$ python examples/s2s/conversational.py
+$ python ./examples/s2s/asr.py
+$ python ./examples/s2s/tts.py
+$ python ./examples/s2s/conversational.py

Applied in 3579e8e

maciejmajek · 2025-02-28T10:50:46Z

examples/s2s/asr.py

+    vad = SileroVAD(args.vad_sampling_rate, args.vad_threshold)
+    oww = OpenWakeWord("hey jarvis", args.oww_threshold)
+    whisper = LocalWhisper("tiny", args.vad_sampling_rate)
+    # whisper = OpenAIWhisper("whisper-1", args.vad_sampling_rate, "en")


style
Did you test OpenAIWhisper?

Added info in 6123d5c

maciejmajek · 2025-02-28T10:51:13Z

examples/s2s/asr.py

+    ros2_name = "rai_asr_agent"
+
+    agent = VoiceRecognitionAgent(microphone_configuration, ros2_name, whisper, vad)
+    # agent.add_detection_model(oww, pipeline="record")


Please add more information on why this is commented out.

Added info in 6123d5c

maciejmajek · 2025-02-28T10:56:07Z

src/rai_core/rai/communication/ros2/connectors.py

        _targets = [
-            target
-            if isinstance(target, tuple)
-            else (target, TopicConfig(is_subscriber=False))
+            (
+                target
+                if isinstance(target, tuple)
+                else (target, TopicConfig(is_subscriber=False))
+            )
            for target in targets
        ]
        _sources = [
-            source
-            if isinstance(source, tuple)
-            else (source, TopicConfig(is_subscriber=True))
+            (
+                source
+                if isinstance(source, tuple)
+                else (source, TopicConfig(is_subscriber=True))
+            )


Just curious, what's going on here?

For some reason Pyright prefers this to the plain multiline form. It is ruff-compatible and valid Python syntax. Do you want this changed?
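
Both layouts build the same list; the parentheses only group the conditional expression visually. Here is a self-contained check, using a stand-in `TopicConfig` dataclass that models only the `is_subscriber` field seen in the diff (the real class may carry more).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class TopicConfig:
    """Stand-in for the real TopicConfig; only is_subscriber is modelled."""

    is_subscriber: bool


Target = Union[str, Tuple[str, TopicConfig]]
targets: List[Target] = ["/a", ("/b", TopicConfig(is_subscriber=True))]

# Original layout: bare conditional expression inside the comprehension.
plain = [
    target
    if isinstance(target, tuple)
    else (target, TopicConfig(is_subscriber=False))
    for target in targets
]

# New layout: the same expression wrapped in parentheses.
wrapped = [
    (
        target
        if isinstance(target, tuple)
        else (target, TopicConfig(is_subscriber=False))
    )
    for target in targets
]
```

Since `plain == wrapped` holds, the change in the diff is purely cosmetic and keeping either form is a style decision.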

maciejmajek · 2025-02-28T10:56:13Z

src/rai_core/rai/communication/ros2/connectors.py

+    # def run(self):
+    #     self._executor.spin()
+


Applied in e69437e

rachwalk requested a review from maciejmajek February 12, 2025 11:00

boczekbartek reviewed Feb 12, 2025

src/rai_core/rai/runners/base.py (outdated)

boczekbartek reviewed Feb 12, 2025

rachwalk changed the title from "Add TTS to RAI Core" to "feat: add tts to rai core" Feb 12, 2025

rachwalk requested a review from boczekbartek February 12, 2025 17:00

rachwalk added 17 commits February 26, 2025 16:16

feat: add base impl of tts agent and start moving tts models into new…

03a0e6f

… format

feat: change connector api to support AudioSegment

ef39181

feat: working TTS, with pausing

983d148

feat: working S2S

87adc65

feat: add agent runner

f2eef35

chore: add runner to __init__

1591a59

fix: working demo after rebase

b018bb5

feat: add runners to create configurable, multi-agent deployments

8644b5b

diocs: add docstrings for affected classes

64b55b8

chore: rename runner main method to run

81a252a

fix: tts agent support HRI msg

b82f7e4

fix: s2s migrate to HRIMessage

aacf6c7

fix: end to end working runner with HRI

d41053f

test: update tests to support AudioSegment api

237a279

feat: working multiterminal version

ca88199

feat: working singleterminal setup

5905697

feat: remove runner

18957ba

rachwalk force-pushed the refactor/rai_tts branch from 3c2d44e to 18957ba Compare February 26, 2025 15:16

rachwalk added 3 commits February 26, 2025 16:25

chore: remove trash file

7e7aa40

fix: race condition on cancelling speech task

6e02636

fix: race condition on single transcribe queue

92396a8

fix: send voice commands only on changes

cb60dbc

maciejmajek and others added 4 commits February 27, 2025 14:36

Revert "fix: send voice commands only on changes"

d4340d5

This reverts commit cb60dbc.

fix: minimise ros2 traffic

161fe5b

docs: add S2S docs

e2d08d4

fix: minimise ros2 traffic -- add missing if

7d2e24d

maciejmajek requested changes Feb 28, 2025

rachwalk and others added 5 commits February 28, 2025 12:25

fix: conversational example use history

3d5a1ea

docs: fix typos

3579e8e

chore: add comments on example

6123d5c

chore: remove useless comment

e69437e

feat: add ElevenLabsTTS

582b944
