Simplify bot structures (#28)
ZanSara authored Nov 29, 2024
1 parent 21f6578 commit 76d36fd
Showing 37 changed files with 553 additions and 517 deletions.
Binary file removed answer.wav
55 changes: 26 additions & 29 deletions docs/config-file.md
@@ -6,13 +6,13 @@ The configuration file is the core of Intentional bots. They are YAML files that
Here is an example of a configuration file. Don't feel overwhelmed just yet! Each part will be explained separately.

```yaml
interface: textual_ui
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
interface: textual_ui
modality: text_messages

plugins:
- intentional_textual_ui
@@ -97,61 +97,58 @@ conversation:
### Bot configuration
```yaml
interface: textual_ui
modality: text_messages
bot:
type: text_chat
llm:
client: openai
name: gpt-4o
interface: textual_ui
modality: text_messages
```
Intentional supports several styles of bots, so the configuration file must first of all specify what sort of bot we're building. The `bot` section takes care of this definition.
Intentional supports several styles of bots, so the configuration file must first of all specify what sort of bot we're building. The `bot` section and a few other related fields take care of this definition.

#### Bot type
#### Interface

First, we need to specify the `type`. Right now Intentional supports a few types of bots:
`interface` lets you configure the user interface the bot will use to communicate. If you want the bot to show its replies in the command line, use `interface: terminal`. Do you prefer to use a chat application? Intentional can spin up a Telegram bot for you if you specify `interface: telegram`. Need a FastAPI endpoint? `interface: fastapi`. And so on.

- `text_chat`: the bot and the user take turns exchanging text messages, as in a regular chat application. To each message from the user, the bot responds with one or more messages.
!!! note

- `audio/text`: the bot and the user each communicate by audio. They may either take turns explicitly (such as in a chat conversation where both parties exchange audio messages) or they may both talk continuously and be able to interrupt each other. The audio messages are converted to text and vice versa, so that text-only LLMs can be used for voice conversations.
Interfaces are always provided by a plugin: `intentional` will install the `intentional-terminal` plugin to help you get started, but `intentional-core` comes with no interfaces by default. Make sure to install the plugins you need for your interface to work.

- `websocket`: the bot and the user each communicate by publishing audio messages on a websocket continuously, without taking turns. There is no transcription to text happening within Intentional. This modality mirrors how OpenAI's Realtime API works.
You can find a list of available plugins in the API Reference sidebar. Better documentation of the available plugins is coming soon.
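For example, pairing the `terminal` interface with its plugin might look like the fragment below. The module name `intentional_terminal` is an assumption based on the `intentional-terminal` plugin mentioned above; check the plugin's documentation for the exact name to list:

```yaml
plugins:
  - intentional_terminal  # assumed module name of the intentional-terminal plugin
interface: terminal
```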

!!! note
#### Modality

More documentation on the `type` field coming soon!
Last, let's specify the `modality`. The modality is the medium the bot uses to communicate with the user, such as text messages, audio messages, an audio stream, or even a video stream (not supported yet).

#### LLM
Some bot interfaces support more than one modality, so we need to specify what our bot is supposed to use as its primary modality.

Next, we need to specify what LLM we want to use. The `llm` field takes two parameters:
Right now, most bots support one of these modalities:

- `client`: which client to use to connect to the model. For example, `openai` (provided by the `intentional-openai` plugin, see below) will tell Intentional to use the OpenAI SDK to connect to the model.
- `text_messages`: classic chat-style messages.
- `audio_stream`: telephone-like interaction where bot and user freely talk together.
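Concretely, the example configurations in this repository pair the two modalities with different LLM clients. A sketch combining both, as two YAML documents:

```yaml
# Text chat over the terminal (see examples/cli_text_chat.yml):
interface: terminal
modality: text_messages
bot:
  type: direct_to_llm
  llm:
    client: openai
    name: gpt-4o
---
# Telephone-style audio over the terminal (see examples/cli_audio_realtime_api.yml):
interface: terminal
modality: audio_stream
bot:
  type: direct_to_llm
  llm:
    client: openai_realtime
    name: gpt-4o-realtime-preview-2024-10-01
```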

- `name`: the name of the model (if required by the specified client). In this case, we specify `gpt-4o`.

If the client you specified requires any other parameters, they can be listed in this section.
#### Bot type

#### Interface
First, we need to specify the `type`, which defines the implementation style of the bot, i.e. any intermediate steps needed to make the user's input understandable to the LLM. Right now Intentional supports a few types of bots:

`interface` lets you configure the user interface the bot will use to communicate. If you want the bot to show its replies in the command line, use `interface: terminal`. Do you prefer to use a chat application? Intentional can spin up a Telegram bot for you if you specify `interface: telegram`. Need a FastAPI endpoint? `interface: fastapi`. And so on.
- `direct_to_llm`: the LLM handles the user's messages directly. For example, if the user is typing text, the LLM reads the messages as they are; if the user is talking, the LLM understands their voice without an intermediate transcription step.

!!! note

Interfaces are always provided by a plugin: Intentional comes with no interfaces by default. Make sure to install the plugins you need.

You can find a list of available plugins in the API Reference sidebar. Better documentation of the available plugins is coming soon.
More documentation on the `type` field coming soon!

#### Modality
#### LLM

Last, let's specify the `modality`. The modality is the medium the bot uses to communicate with the user, such as text messages, audio messages, an audio stream, or even a video stream (not supported yet).
Next, we need to specify what LLM we want to use. The `llm` field takes two parameters:

Some bot interfaces support more than one modality, so we need to specify what our bot is supposed to use as its primary modality.
- `client`: which client to use to connect to the LLM. For example, `openai` (provided by the `intentional-openai` plugin, see below) will tell Intentional to use the OpenAI SDK to connect to the LLM.

Right now, most bots support one of these modalities:
- `name`: the name of the LLM (if required by the specified client). In this case, we specify `gpt-4o`.

- `text_messages`: classic chat-style messages.
- `audio_stream`: telephone-like interaction where bot and user freely talk together.
If the client you specified requires any other parameters, they can be listed in this section.
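For instance, extra client parameters would sit alongside `client` and `name`. The parameter names below are hypothetical; check your client plugin's documentation for what it actually accepts:

```yaml
bot:
  type: direct_to_llm
  llm:
    client: openai
    name: gpt-4o
    # Hypothetical extra parameters, listed in the llm section:
    temperature: 0.7
    max_output_tokens: 512
```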

### Plugins

2 changes: 1 addition & 1 deletion docs/home.md
@@ -76,7 +76,7 @@ if __name__ == "__main__":
main()
```

There are other methods to load only parts of your Intentional bot, such as skipping the model interface entirely if you want to interact with it using the Python API. To find out which methods you can use, have a look at the [API Reference](/docs/core-reference.md).
There are other methods to load only parts of your Intentional bot, such as skipping the bot interface entirely if you want to interact with it using the Python API. To find out which methods you can use, have a look at the [API Reference](/docs/core-reference.md).

## What next?

2 changes: 1 addition & 1 deletion docs/plugins.md
@@ -4,7 +4,7 @@

Work in progress

Every layer of an Intentional bot that is configurable from the configuration file (interfaces, bot structures, model client, tools etc) can be expanded through a plugin.
Every layer of an Intentional bot that is configurable from the configuration file (interfaces, bot structures, LLM client, tools etc) can be expanded through a plugin.
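The real loaders live in `intentional_core`, but the general pattern is worth sketching: a config key such as `type` is resolved against the classes that plugins register, here assumed (per the `BotStructure` docstring in this commit) to be subclasses carrying a unique `name` class attribute. A simplified, self-contained sketch, not the library's actual code:

```python
from typing import Any, Dict


class BotStructure:
    """Tiny stand-in for intentional_core's BotStructure base class."""

    name: str = ""


class DirectToLLMBotStructure(BotStructure):
    """Stand-in for the structure registered under 'direct_to_llm'."""

    name = "direct_to_llm"

    def __init__(self, config: Dict[str, Any]):
        # The real class builds an LLM client here; we just keep the config.
        self.llm_config = config.pop("llm", None)


def load_bot_structure_from_dict(config: Dict[str, Any]) -> BotStructure:
    """Resolve the 'type' key against all known BotStructure subclasses."""
    bot_type = config.pop("type")
    # Subclasses act as a registry: each plugin declares a unique `name`.
    registry = {cls.name: cls for cls in BotStructure.__subclasses__()}
    if bot_type not in registry:
        raise ValueError(f"Unknown bot structure type: {bot_type!r}")
    return registry[bot_type](config)


bot = load_bot_structure_from_dict(
    {"type": "direct_to_llm", "llm": {"client": "openai", "name": "gpt-4o"}}
)
print(type(bot).__name__)  # prints: DirectToLLMBotStructure
```

Because the registry is rebuilt from `__subclasses__()` at load time, simply importing a plugin module that defines a new subclass makes its `name` available in the configuration file.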

## Writing your own plugins

2 changes: 1 addition & 1 deletion examples/cli_audio_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: terminal
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/cli_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: terminal
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
2 changes: 1 addition & 1 deletion examples/fastapi_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: fastapi
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/fastapi_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: fastapi
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
2 changes: 1 addition & 1 deletion examples/telegram_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: telegram
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
Binary file removed examples/textualui_realtime_api.png
2 changes: 1 addition & 1 deletion examples/textualui_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: textual_ui
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/textualui_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: textual_ui
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
33 changes: 10 additions & 23 deletions intentional-core/src/intentional_core/__init__.py
@@ -5,25 +5,17 @@
"""

from intentional_core.events import EventEmitter, EventListener

from intentional_core.bot_interface import BotInterface, load_bot_interface_from_dict, load_configuration_file

from intentional_core.bot_interface import (
BotInterface,
load_bot_interface_from_dict,
load_configuration_file,
)
from intentional_core.bot_structures.bot_structure import (
BotStructure,
ContinuousStreamBotStructure,
TurnBasedBotStructure,
load_bot_structure_from_dict,
)
from intentional_core.bot_structures.text_chat import TextChatBotStructure
from intentional_core.bot_structures.websocket import WebsocketBotStructure

from intentional_core.model_client import (
ModelClient,
ContinuousStreamModelClient,
TurnBasedModelClient,
load_model_client_from_dict,
)

from intentional_core.bot_structures.direct_to_llm import DirectToLLMBotStructure
from intentional_core.llm_client import LLMClient, load_llm_client_from_dict
from intentional_core.tools import Tool, load_tools_from_dict
from intentional_core.intent_routing import IntentRouter

@@ -34,15 +34,10 @@
"load_bot_interface_from_dict",
"load_configuration_file",
"BotStructure",
"ContinuousStreamBotStructure",
"TurnBasedBotStructure",
"load_bot_structure_from_dict",
"TextChatBotStructure",
"WebsocketBotStructure",
"ModelClient",
"ContinuousStreamModelClient",
"TurnBasedModelClient",
"load_model_client_from_dict",
"DirectToLLMBotStructure",
"LLMClient",
"load_llm_client_from_dict",
"Tool",
"IntentRouter",
"load_tools_from_dict",
5 changes: 4 additions & 1 deletion intentional-core/src/intentional_core/bot_interface.py
@@ -82,7 +82,10 @@ def load_bot_interface_from_dict(config: Dict[str, Any]) -> BotInterface:
Returns:
The bot interface instance.
"""
log.debug("Loading bot interface from configuration:", bot_interface_config=json.dumps(config, indent=4))
log.debug(
"Loading bot interface from configuration:",
bot_interface_config=json.dumps(config, indent=4),
)

# Import all the necessary plugins
plugins = config.pop("plugins")
17 changes: 4 additions & 13 deletions intentional-core/src/intentional_core/bot_structures/__init__.py
@@ -4,21 +4,12 @@
Bot structures supported by Intentional.
"""

from intentional_core.bot_structures.bot_structure import (
BotStructure,
ContinuousStreamBotStructure,
TurnBasedBotStructure,
load_bot_structure_from_dict,
)
from intentional_core.bot_structures.text_chat import TextChatBotStructure
from intentional_core.bot_structures.websocket import WebsocketBotStructure

from intentional_core.bot_structures.bot_structure import BotStructure, load_bot_structure_from_dict
from intentional_core.bot_structures.direct_to_llm import DirectToLLMBotStructure

__all__ = [
"BotStructure",
"load_bot_structure_from_dict",
"ContinuousStreamBotStructure",
"TurnBasedBotStructure",
"TextChatBotStructure",
"WebsocketBotStructure",
"BotStructure",
"DirectToLLMBotStructure",
]
@@ -27,7 +27,7 @@ class BotStructure(EventListener):
Tiny base class used to recognize Intentional bot structure classes.
The bot structure's name is meant to represent the **structure** of the bot. For example a bot that uses a direct
WebSocket connection to a model such as OpenAI's Realtime API could be called "RealtimeAPIBotStructure", one that
WebSocket connection to an LLM such as OpenAI's Realtime API could be called "RealtimeAPIBotStructure", one that
uses a VAD-STT-LLM-TTS stack could be called "AudioToTextBotStructure", and so on
In order for your bot structure to be usable, you need to assign a value to the `name` class variable in the bot
@@ -102,7 +102,7 @@ def add_event_handler(self, event_name: str, handler: Callable) -> None:

async def handle_event(self, event_name: str, event: Dict[str, Any]) -> None:
"""
Handle different types of events that the model may generate.
Handle different types of events that the LLM may generate.
"""
if "*" in self.event_handlers:
log.debug("Calling wildcard event handler", event_name=event_name)
@@ -115,18 +115,6 @@ async def handle_event(self, event_name: str, event: Dict[str, Any]) -> None:
log.debug("No event handler for event", event_name=event_name)


class ContinuousStreamBotStructure(BotStructure):
"""
Base class for structures that support continuous streaming of data, as opposed to turn-based message exchanges.
"""


class TurnBasedBotStructure(BotStructure):
"""
Base class for structures that support turn-based message exchanges, as opposed to continuous streaming of data.
"""


def load_bot_structure_from_dict(intent_router: IntentRouter, config: Dict[str, Any]) -> BotStructure:
"""
Load a bot structure from a dictionary configuration.
@@ -6,20 +6,20 @@
from typing import Any, Dict, AsyncGenerator

import structlog
from intentional_core.bot_structures.bot_structure import TurnBasedBotStructure
from intentional_core.model_client import TurnBasedModelClient, load_model_client_from_dict
from intentional_core.bot_structures.bot_structure import BotStructure
from intentional_core.llm_client import LLMClient, load_llm_client_from_dict
from intentional_core.intent_routing import IntentRouter


log = structlog.get_logger(logger_name=__name__)


class TextChatBotStructure(TurnBasedBotStructure):
class DirectToLLMBotStructure(BotStructure):
"""
Bot structure implementation for text chat.
"""

name = "text_chat"
name = "direct_to_llm"

def __init__(self, config: Dict[str, Any], intent_router: IntentRouter):
"""
@@ -33,24 +33,26 @@ def __init__(self, config: Dict[str, Any], intent_router: IntentRouter):
# Init the model client
llm_config = config.pop("llm", None)
if not llm_config:
raise ValueError(
f"{self.__class__.__name__} requires a 'llm' configuration key to know which model to use."
)
self.model: TurnBasedModelClient = load_model_client_from_dict(
parent=self, intent_router=intent_router, config=llm_config
)
raise ValueError(f"{self.__class__.__name__} requires a 'llm' configuration key.")
self.llm: LLMClient = load_llm_client_from_dict(parent=self, intent_router=intent_router, config=llm_config)

async def connect(self) -> None:
await self.model.connect()
"""
Initializes the model and connects to it as/if necessary.
"""
await self.llm.connect()

async def disconnect(self) -> None:
await self.model.disconnect()
"""
Disconnects from the model and unloads/closes it as/if necessary.
"""
await self.llm.disconnect()

async def run(self) -> None:
"""
Main loop for the bot.
"""
await self.model.run()
await self.llm.run()

async def send(self, data: Dict[str, Any]) -> AsyncGenerator[Dict[str, Any], None]:
"""
@@ -59,7 +61,7 @@ async def send(self, data: Dict[str, Any]) -> AsyncGenerator[Dict[str, Any], Non
Args:
data: The message to send to the model in OpenAI format, like {"role": "user", "content": "Hello!"}
"""
await self.model.send({"text_message": data})
await self.llm.send(data)

async def handle_interruption(self, lenght_to_interruption: int) -> None:
"""
@@ -70,4 +72,4 @@ async def handle_interruption(self, lenght_to_interruption: int) -> None:
This value could be number of characters, number of words, milliseconds, number of audio frames, etc.
depending on the bot structure that implements it.
"""
log.warning("TODO! Interruption not yet supported in text chat bot structure.")
await self.llm.handle_interruption(lenght_to_interruption)
