Simplify bot structures (#28)
ZanSara authored Nov 29, 2024
1 parent 21f6578 commit 76d36fd
Showing 37 changed files with 553 additions and 517 deletions.
Binary file removed answer.wav
55 changes: 26 additions & 29 deletions docs/config-file.md
@@ -6,13 +6,13 @@ The configuration file is the core of Intentional bots. They are YAML files that
Here is an example of a configuration file. Don't feel overwhelmed just yet! Each part will be explained separately.

```yaml
interface: textual_ui
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
interface: textual_ui
modality: text_messages

plugins:
- intentional_textual_ui
@@ -97,61 +97,58 @@ conversation:
### Bot configuration
```yaml
interface: textual_ui
modality: text_messages
bot:
type: text_chat
llm:
client: openai
name: gpt-4o
interface: textual_ui
modality: text_messages
```
Intentional supports several styles of bots, so the configuration file must first of all specify what sort of bot we're building. The `bot` section takes care of this definition.
Intentional supports several styles of bots, so the configuration file must first of all specify what sort of bot we're building. The `bot` section and a few other related fields take care of this definition.

#### Bot type
#### Interface

First, we need to specify the `type`. Right now Intentional supports a few types of bots:
`interface` lets you configure the user interface the bot will use to communicate. If you want the bot to show its replies in the command line, use `interface: terminal`. Do you prefer to use a chat application? Intentional can spin up a Telegram bot for you if you specify `interface: telegram`. Need a FastAPI endpoint? `interface: fastapi`. And so on.

- `text_chat`: the bot and the user take turns exchanging text messages, as in a regular chat application. To each message from the user, the bot responds with one or more messages.
!!! note

- `audio/text`: the bot and the user each communicate by audio. They may either take turns explicitly (such as in a chat conversation where both parties exchange audio messages) or they may both talk continuously and be able to interrupt each other. The audio messages are converted to text and vice versa, so that text-only LLMs can be used for voice conversations.
Interfaces are always provided by a plugin: `intentional` will install the `intentional-terminal` plugin to help you get started, but `intentional-core` comes with no interfaces by default. Make sure to install the plugins you need for your interface to work.

- `websocket`: the bot and the user each communicate by publishing audio messages on a websocket continuously, without taking turns. There is no transcription to text happening within Intentional. This modality mirrors how OpenAI's Realtime API works.
You can find a list of available plugins in the API Reference sidebar. Better documentation of the available plugins is coming soon.
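For example, pairing the `terminal` interface with its plugin might look like the fragment below. The module name `intentional_terminal` is an assumption based on the `intentional-terminal` plugin mentioned above; check the plugin's documentation for the exact name to list:

```yaml
plugins:
  - intentional_terminal  # assumed module name of the intentional-terminal plugin
interface: terminal
```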

!!! note
#### Modality

More documentation on the `type` field coming soon!
Last, let's specify the `modality`. The modality is the medium the bot uses to communicate with the user, such as text messages, audio messages, an audio stream, or even a video stream (not supported yet).

#### LLM
Some bot interfaces support more than one modality, so we need to specify what our bot is supposed to use as its primary modality.

Next, we need to specify what LLM we want to use. The `llm` field takes two parameters:
Right now, most bots support one of these modalities:

- `client`: which client to use to connect to the model. For example, `openai` (provided by the `intentional-openai` plugin, see below) will tell Intentional to use the OpenAI SDK to connect to the model.
- `text_messages`: classic chat-style messages.
- `audio_stream`: telephone-like interaction where bot and user freely talk together.
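Concretely, the example configurations in this repository pair the two modalities with different LLM clients. A sketch combining both, as two YAML documents:

```yaml
# Text chat over the terminal (see examples/cli_text_chat.yml):
interface: terminal
modality: text_messages
bot:
  type: direct_to_llm
  llm:
    client: openai
    name: gpt-4o
---
# Telephone-style audio over the terminal (see examples/cli_audio_realtime_api.yml):
interface: terminal
modality: audio_stream
bot:
  type: direct_to_llm
  llm:
    client: openai_realtime
    name: gpt-4o-realtime-preview-2024-10-01
```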

- `name`: the name of the model (if required by the specified client). In this case, we specify `gpt-4o`.

If the client you specified requires any other parameters, they can be listed in this section.
#### Bot type

#### Interface
First, we need to specify the `type`, which defines the implementation style of the bot, i.e. any intermediate steps needed to make the user's input understandable to the LLM. Right now Intentional supports a few types of bots:

`interface` lets you configure the user interface the bot will use to communicate. If you want the bot to show its replies in the command line, use `interface: terminal`. Do you prefer to use a chat application? Intentional can spin up a Telegram bot for you if you specify `interface: telegram`. Need a FastAPI endpoint? `interface: fastapi`. And so on.
- `direct_to_llm`: the LLM handles the user's messages directly. For example, if the user is typing text, the LLM reads the messages as they are; if the user is talking, the LLM understands their voice without an intermediate transcription step.

!!! note

Interfaces are always provided by a plugin: Intentional comes with no interfaces by default. Make sure to install the plugins you need.

You can find a list of available plugins in the API Reference sidebar. Better documentation of the available plugins is coming soon.
More documentation on the `type` field coming soon!

#### Modality
#### LLM

Last, let's specify the `modality`. The modality is the medium the bot uses to communicate with the user, such as text messages, audio messages, an audio stream, or even a video stream (not supported yet).
Next, we need to specify what LLM we want to use. The `llm` field takes two parameters:

Some bot interfaces support more than one modality, so we need to specify what our bot is supposed to use as its primary modality.
- `client`: which client to use to connect to the LLM. For example, `openai` (provided by the `intentional-openai` plugin, see below) will tell Intentional to use the OpenAI SDK to connect to the LLM.

Right now, most bots support one of these modalities:
- `name`: the name of the LLM (if required by the specified client). In this case, we specify `gpt-4o`.

- `text_messages`: classic chat-style messages.
- `audio_stream`: telephone-like interaction where bot and user freely talk together.
If the client you specified requires any other parameters, they can be listed in this section.
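For instance, extra client parameters would sit alongside `client` and `name`. The parameter names below are hypothetical; check your client plugin's documentation for what it actually accepts:

```yaml
bot:
  type: direct_to_llm
  llm:
    client: openai
    name: gpt-4o
    # Hypothetical extra parameters, listed in the llm section:
    temperature: 0.7
    max_output_tokens: 512
```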

### Plugins

2 changes: 1 addition & 1 deletion docs/home.md
@@ -76,7 +76,7 @@ if __name__ == "__main__":
main()
```

There are other methods to load only parts of your Intentional bot, such as skipping the model interface entirely if you want to interact with it using the Python API. To find out which methods you can use, have a look at the [API Reference](/docs/core-reference.md).
There are other methods to load only parts of your Intentional bot, such as skipping the bot interface entirely if you want to interact with it using the Python API. To find out which methods you can use, have a look at the [API Reference](/docs/core-reference.md).

## What next?

2 changes: 1 addition & 1 deletion docs/plugins.md
@@ -4,7 +4,7 @@

Work in progress

Every layer of an Intentional bot that is configurable from the configuration file (interfaces, bot structures, model client, tools etc) can be expanded through a plugin.
Every layer of an Intentional bot that is configurable from the configuration file (interfaces, bot structures, LLM client, tools etc) can be expanded through a plugin.
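The real loaders live in `intentional_core`, but the general pattern is worth sketching: a config key such as `type` is resolved against the classes that plugins register, here assumed (per the `BotStructure` docstring in this commit) to be subclasses carrying a unique `name` class attribute. A simplified, self-contained sketch, not the library's actual code:

```python
from typing import Any, Dict


class BotStructure:
    """Tiny stand-in for intentional_core's BotStructure base class."""

    name: str = ""


class DirectToLLMBotStructure(BotStructure):
    """Stand-in for the structure registered under 'direct_to_llm'."""

    name = "direct_to_llm"

    def __init__(self, config: Dict[str, Any]):
        # The real class builds an LLM client here; we just keep the config.
        self.llm_config = config.pop("llm", None)


def load_bot_structure_from_dict(config: Dict[str, Any]) -> BotStructure:
    """Resolve the 'type' key against all known BotStructure subclasses."""
    bot_type = config.pop("type")
    # Subclasses act as a registry: each plugin declares a unique `name`.
    registry = {cls.name: cls for cls in BotStructure.__subclasses__()}
    if bot_type not in registry:
        raise ValueError(f"Unknown bot structure type: {bot_type!r}")
    return registry[bot_type](config)


bot = load_bot_structure_from_dict(
    {"type": "direct_to_llm", "llm": {"client": "openai", "name": "gpt-4o"}}
)
print(type(bot).__name__)  # prints: DirectToLLMBotStructure
```

Because the registry is rebuilt from `__subclasses__()` at load time, simply importing a plugin module that defines a new subclass makes its `name` available in the configuration file.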

## Writing your own plugins

2 changes: 1 addition & 1 deletion examples/cli_audio_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: terminal
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/cli_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: terminal
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
2 changes: 1 addition & 1 deletion examples/fastapi_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: fastapi
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/fastapi_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: fastapi
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
2 changes: 1 addition & 1 deletion examples/telegram_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: telegram
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
Binary file removed examples/textualui_realtime_api.png
2 changes: 1 addition & 1 deletion examples/textualui_realtime_api.yml
@@ -5,7 +5,7 @@ plugins:
interface: textual_ui
modality: audio_stream
bot:
type: websocket
type: direct_to_llm
llm:
client: openai_realtime
name: gpt-4o-realtime-preview-2024-10-01
2 changes: 1 addition & 1 deletion examples/textualui_text_chat.yml
@@ -5,7 +5,7 @@ plugins:
interface: textual_ui
modality: text_messages
bot:
type: text_chat
type: direct_to_llm
llm:
client: openai
name: gpt-4o
33 changes: 10 additions & 23 deletions intentional-core/src/intentional_core/__init__.py
@@ -5,25 +5,17 @@
"""

from intentional_core.events import EventEmitter, EventListener

from intentional_core.bot_interface import BotInterface, load_bot_interface_from_dict, load_configuration_file

from intentional_core.bot_interface import (
BotInterface,
load_bot_interface_from_dict,
load_configuration_file,
)
from intentional_core.bot_structures.bot_structure import (
BotStructure,
ContinuousStreamBotStructure,
TurnBasedBotStructure,
load_bot_structure_from_dict,
)
from intentional_core.bot_structures.text_chat import TextChatBotStructure
from intentional_core.bot_structures.websocket import WebsocketBotStructure

from intentional_core.model_client import (
ModelClient,
ContinuousStreamModelClient,
TurnBasedModelClient,
load_model_client_from_dict,
)

from intentional_core.bot_structures.direct_to_llm import DirectToLLMBotStructure
from intentional_core.llm_client import LLMClient, load_llm_client_from_dict
from intentional_core.tools import Tool, load_tools_from_dict
from intentional_core.intent_routing import IntentRouter

@@ -34,15 +34,10 @@
"load_bot_interface_from_dict",
"load_configuration_file",
"BotStructure",
"ContinuousStreamBotStructure",
"TurnBasedBotStructure",
"load_bot_structure_from_dict",
"TextChatBotStructure",
"WebsocketBotStructure",
"ModelClient",
"ContinuousStreamModelClient",
"TurnBasedModelClient",
"load_model_client_from_dict",
"DirectToLLMBotStructure",
"LLMClient",
"load_llm_client_from_dict",
"Tool",
"IntentRouter",
"load_tools_from_dict",
5 changes: 4 additions & 1 deletion intentional-core/src/intentional_core/bot_interface.py
@@ -82,7 +82,10 @@ def load_bot_interface_from_dict(config: Dict[str, Any]) -> BotInterface:
Returns:
The bot interface instance.
"""
log.debug("Loading bot interface from configuration:", bot_interface_config=json.dumps(config, indent=4))
log.debug(
"Loading bot interface from configuration:",
bot_interface_config=json.dumps(config, indent=4),
)

# Import all the necessary plugins
plugins = config.pop("plugins")
17 changes: 4 additions & 13 deletions intentional-core/src/intentional_core/bot_structures/__init__.py
@@ -4,21 +4,12 @@
Bot structures supported by Intentional.
"""

from intentional_core.bot_structures.bot_structure import (
BotStructure,
ContinuousStreamBotStructure,
TurnBasedBotStructure,
load_bot_structure_from_dict,
)
from intentional_core.bot_structures.text_chat import TextChatBotStructure
from intentional_core.bot_structures.websocket import WebsocketBotStructure

from intentional_core.bot_structures.bot_structure import BotStructure, load_bot_structure_from_dict
from intentional_core.bot_structures.direct_to_llm import DirectToLLMBotStructure

__all__ = [
"BotStructure",
"load_bot_structure_from_dict",
"ContinuousStreamBotStructure",
"TurnBasedBotStructure",
"TextChatBotStructure",
"WebsocketBotStructure",
"BotStructure",
"DirectToLLMBotStructure",
]
@@ -27,7 +27,7 @@ class BotStructure(EventListener):
Tiny base class used to recognize Intentional bot structure classes.
The bot structure's name is meant to represent the **structure** of the bot. For example a bot that uses a direct
WebSocket connection to a model such as OpenAI's Realtime API could be called "RealtimeAPIBotStructure", one that
WebSocket connection to an LLM such as OpenAI's Realtime API could be called "RealtimeAPIBotStructure", one that
uses a VAD-STT-LLM-TTS stack could be called "AudioToTextBotStructure", and so on
In order for your bot structure to be usable, you need to assign a value to the `name` class variable in the bot
@@ -102,7 +102,7 @@ def add_event_handler(self, event_name: str, handler: Callable) -> None:

async def handle_event(self, event_name: str, event: Dict[str, Any]) -> None:
"""
Handle different types of events that the model may generate.
Handle different types of events that the LLM may generate.
"""
if "*" in self.event_handlers:
log.debug("Calling wildcard event handler", event_name=event_name)
@@ -115,18 +115,6 @@ async def handle_event(self, event_name: str, event: Dict[str, Any]) -> None:
log.debug("No event handler for event", event_name=event_name)


class ContinuousStreamBotStructure(BotStructure):
"""
Base class for structures that support continuous streaming of data, as opposed to turn-based message exchanges.
"""


class TurnBasedBotStructure(BotStructure):
"""
Base class for structures that support turn-based message exchanges, as opposed to continuous streaming of data.
"""


def load_bot_structure_from_dict(intent_router: IntentRouter, config: Dict[str, Any]) -> BotStructure:
"""
Load a bot structure from a dictionary configuration.
@@ -6,20 +6,20 @@
from typing import Any, Dict, AsyncGenerator

import structlog
from intentional_core.bot_structures.bot_structure import TurnBasedBotStructure
from intentional_core.model_client import TurnBasedModelClient, load_model_client_from_dict
from intentional_core.bot_structures.bot_structure import BotStructure
from intentional_core.llm_client import LLMClient, load_llm_client_from_dict
from intentional_core.intent_routing import IntentRouter


log = structlog.get_logger(logger_name=__name__)


class TextChatBotStructure(TurnBasedBotStructure):
class DirectToLLMBotStructure(BotStructure):
"""
Bot structure implementation for text chat.
"""

name = "text_chat"
name = "direct_to_llm"

def __init__(self, config: Dict[str, Any], intent_router: IntentRouter):
"""
@@ -33,24 +33,26 @@ def __init__(self, config: Dict[str, Any], intent_router: IntentRouter):
# Init the model client
llm_config = config.pop("llm", None)
if not llm_config:
raise ValueError(
f"{self.__class__.__name__} requires a 'llm' configuration key to know which model to use."
)
self.model: TurnBasedModelClient = load_model_client_from_dict(
parent=self, intent_router=intent_router, config=llm_config
)
raise ValueError(f"{self.__class__.__name__} requires a 'llm' configuration key.")
self.llm: LLMClient = load_llm_client_from_dict(parent=self, intent_router=intent_router, config=llm_config)

async def connect(self) -> None:
await self.model.connect()
"""
Initializes the model and connects to it as/if necessary.
"""
await self.llm.connect()

async def disconnect(self) -> None:
await self.model.disconnect()
"""
Disconnects from the model and unloads/closes it as/if necessary.
"""
await self.llm.disconnect()

async def run(self) -> None:
"""
Main loop for the bot.
"""
await self.model.run()
await self.llm.run()

async def send(self, data: Dict[str, Any]) -> AsyncGenerator[Dict[str, Any], None]:
"""
@@ -59,7 +61,7 @@ async def send(self, data: Dict[str, Any]) -> AsyncGenerator[Dict[str, Any], Non
Args:
data: The message to send to the model in OpenAI format, like {"role": "user", "content": "Hello!"}
"""
await self.model.send({"text_message": data})
await self.llm.send(data)

async def handle_interruption(self, lenght_to_interruption: int) -> None:
"""
@@ -70,4 +72,4 @@ async def handle_interruption(self, lenght_to_interruption: int) -> None:
This value could be number of characters, number of words, milliseconds, number of audio frames, etc.
depending on the bot structure that implements it.
"""
log.warning("TODO! Interruption not yet supported in text chat bot structure.")
await self.llm.handle_interruption(lenght_to_interruption)
