Basic implementation of a plugin system for OA #2765

Merged · 19 commits · May 2, 2023 · Changes shown from 5 commits
@@ -79,12 +79,15 @@ async def abort_work(self, message_id: str, reason: str) -> models.DbMessage:
await self.session.refresh(message)
return message

async def complete_work(self, message_id: str, content: str) -> models.DbMessage:
async def complete_work(
self, message_id: str, content: str, work_parameters: inference.WorkParameters
) -> models.DbMessage:
logger.debug(f"Completing work on message {message_id}")
message = await self.get_assistant_message_by_id(message_id)
message.state = inference.MessageState.complete
message.work_end_at = datetime.datetime.utcnow()
message.content = content
message.work_parameters = work_parameters
await self.session.commit()
logger.debug(f"Completed work on message {message_id}")
await self.session.refresh(message)
1 change: 1 addition & 0 deletions inference/server/oasst_inference_server/routes/chats.py
@@ -102,6 +102,7 @@ async def create_assistant_message(
work_parameters = inference.WorkParameters(
model_config=model_config,
sampling_parameters=request.sampling_parameters,
plugins=request.plugins,
)
assistant_message = await ucr.initiate_assistant_message(
parent_id=request.parent_id,
74 changes: 74 additions & 0 deletions inference/server/oasst_inference_server/routes/configs.py
@@ -1,9 +1,23 @@
import json

import fastapi
import pydantic
import requests
import yaml
from loguru import logger
from oasst_inference_server.settings import settings
from oasst_shared import model_configs
from oasst_shared.schemas import inference

# NOTE: Replace this with plugins that we will provide out of the box
DUMMY_PLUGINS = [
inference.PluginEntry(
url="http://192.168.0.35:8085/ai-plugin.json",
enabled=False,
trusted=True,
),
]

router = fastapi.APIRouter(
prefix="/configs",
tags=["configs"],
@@ -73,3 +87,63 @@ async def get_model_configs() -> list[ModelConfigInfo]:
for model_config_name in model_configs.MODEL_CONFIGS
if (settings.allowed_model_config_names == "*" or model_config_name in settings.allowed_model_config_names_list)
]


@router.post("/plugin_config")
async def get_plugin_config(plugin: inference.PluginEntry) -> inference.PluginEntry | fastapi.HTTPException:
plugin_config = None
try:
response = requests.get(plugin.url)
response.raise_for_status()
except requests.exceptions.RequestException:
return fastapi.HTTPException(status_code=404, detail="Plugin not found")

config = {}
try:
content_type = response.headers.get("Content-Type")
if "application/json" in content_type or plugin.url.endswith(".json"):
config = json.loads(response.text)
elif (
"application/yaml" in content_type
or "application/x-yaml" in content_type
or plugin.url.endswith(".yaml")
or plugin.url.endswith(".yml")
):
config = yaml.safe_load(response.text)
else:
raise Exception(f"Unsupported content type: {content_type}. Only JSON and YAML are supported.")

plugin_config = inference.PluginConfig(**config)
except Exception as e:
return fastapi.HTTPException(status_code=404, detail="Failed to parse plugin config, error: " + str(e))

return inference.PluginEntry(url=plugin.url, enabled=plugin.enabled, plugin_config=plugin_config)
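
For reference, here is a minimal sketch of how a client might call this endpoint; the base URL and the manifest URL are assumptions for illustration, not part of this PR:

```python
import requests

BASE_URL = "http://localhost:8000"  # hypothetical inference server address

# PluginEntry payload as defined in this PR: url, enabled, trusted.
payload = {
    "url": "https://www.klarna.com/.well-known/ai-plugin.json",
    "enabled": True,
    "trusted": False,
}

# The router is mounted under the /configs prefix, so the full path is /configs/plugin_config.
resp = requests.post(f"{BASE_URL}/configs/plugin_config", json=payload)
print(resp.json())  # on success: the PluginEntry echoed back with its parsed plugin_config attached
```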


@router.get("/builtin_plugins")
async def get_builtin_plugins() -> list[inference.PluginEntry] | fastapi.HTTPException:
plugins = []

for plugin in DUMMY_PLUGINS:
try:
response = requests.get(plugin.url)
response.raise_for_status()
except requests.exceptions.RequestException:
logger.warning(f"Failed to fetch plugin config from {plugin.url}")
continue

try:
plugin_config = inference.PluginConfig(**response.json())
except ValueError:
logger.warning(f"Failed to parse plugin config from {plugin.url}")
continue

final_plugin: inference.PluginEntry = inference.PluginEntry(
url=plugin.url,
enabled=plugin.enabled,
trusted=plugin.trusted,
plugin_config=plugin_config,
)
plugins.append(final_plugin)

return plugins
3 changes: 3 additions & 0 deletions inference/server/oasst_inference_server/routes/workers.py
@@ -340,9 +340,12 @@ async def handle_generated_text_response(
message_id = work_response_container.message_id
async with deps.manual_create_session() as session:
cr = chat_repository.ChatRepository(session=session)
work_parameters = work_response_container.work_request.parameters
work_parameters = work_parameters.copy(update={"used_plugin": response.used_plugin})
**Collaborator:** Does "used" here mean that the model actually chose to use it? If so, could we store this somewhere other than in the work parameters? It is more of a response value than a request value.

**Collaborator (author):** The user turns the plugin on in the frontend, and the model/system may or may not use it. But we always want `used_plugin` back at the frontend, because it contains all the important data, such as the inner monologue, so users/devs can see the full details of how the plugin was used. For example, even if a plugin was turned on but the model didn't use it, `used_plugin` still holds valuable info for the user/dev. Currently that is not rendered in the UI, but it could be.

**Collaborator (author):** Yes, it is a response value. I saw that a contributor (abdbarho, if I remember correctly) added `work_parameters` to the `MessageRead` type so all info can be rendered in the final message, and I just added two more fields to the work parameters for plugins. I could move it elsewhere if you have a suggestion.

**Collaborator (author):** Just to add, we only want that value on the final message; it's not needed before the stream ends.

**Collaborator:** Yes, the work parameters are meant as input, i.e. if I send them somewhere else I should get the same output, so we don't want response values in here. I see that this makes it easy to show in the frontend, but we'll have to find another way of doing that. I think we could store it directly in the message object and default to None, similar to the safety properties that have been added recently (a sketch of this idea follows the diff below).

message = await cr.complete_work(
message_id=message_id,
content=response.text,
work_parameters=work_parameters,
)
logger.info(f"Completed work for {message_id=}")
message_packet = inference.InternalFinishedMessageResponse(
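Following up on the review thread above, here is a minimal sketch of the suggested alternative: keep `used_plugin` out of the work parameters and store it on the message read model instead, defaulting to None. The schema and field names below are hypothetical and not part of this PR.

```python
import pydantic

from oasst_shared.schemas import inference


class MessageRead(pydantic.BaseModel):
    """Hypothetical read model: work_parameters stays a pure request-side input,
    while the response-side plugin result lives in its own nullable field."""

    id: str
    content: str | None = None
    work_parameters: inference.WorkParameters | None = None
    used_plugin: inference.PluginUsed | None = None  # populated only when work completes


# complete_work would then take the plugin result separately, e.g.:
# await cr.complete_work(message_id=message_id, content=response.text, used_plugin=response.used_plugin)
```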
2 changes: 2 additions & 0 deletions inference/server/oasst_inference_server/schemas/chat.py
@@ -14,6 +14,8 @@ class CreateAssistantMessageRequest(pydantic.BaseModel):
parent_id: str
model_config_name: str
sampling_parameters: inference.SamplingParameters = pydantic.Field(default_factory=inference.SamplingParameters)
plugins: list[inference.PluginEntry] = pydantic.Field(default_factory=list[inference.PluginEntry])
used_plugin: inference.PluginUsed | None = None


class PendingResponseEvent(pydantic.BaseModel):
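For illustration, a minimal sketch of building a request with the new plugins field; the IDs, URLs, and model config name are placeholders, not part of this PR:

```python
from oasst_inference_server.schemas.chat import CreateAssistantMessageRequest
from oasst_shared.schemas import inference

# Hypothetical example: one enabled, untrusted third-party plugin attached to the request.
plugin = inference.PluginEntry(
    url="https://example.com/.well-known/ai-plugin.json",  # placeholder manifest URL
    enabled=True,
    trusted=False,
)

request = CreateAssistantMessageRequest(
    parent_id="00000000-0000-0000-0000-000000000000",  # placeholder parent message id
    model_config_name="distilgpt2",  # placeholder model config name
    plugins=[plugin],
)
```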
68 changes: 68 additions & 0 deletions inference/worker/PLUGINS.md
@@ -0,0 +1,68 @@
# Plugin system for OA

This is a basic implementation of support for external augmentation and OpenAI/ChatGPT
plugins in Open-Assistant. In its current state it is more of a proof of concept and
should be used behind an experimental flag.

## Architecture

There is now a middleware layer between work.py (the worker) and the final prompt that
is passed to the inference server for generation and streaming. This middleware checks
whether a plugin is enabled in the UI; if so, it takes over the job of creating curated
pre-prompts for plugin usage and of issuing the subsequent LLM calls (inner monologues)
needed to produce the final, externally **augmented** prompt, which is passed back to
the worker and then on to the inference server for the final LLM generation/streaming
of tokens to the frontend.
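As a rough illustration of that flow, a minimal sketch of the decision the middleware makes is shown below; the helper names are hypothetical, and the real logic lives in chat_chain.py:

```python
from oasst_shared.schemas import inference


def build_final_prompt(parameters: inference.WorkParameters, original_prompt: str) -> str:
    """Hypothetical sketch of the plugin middleware between the worker and inference."""
    enabled = [p for p in parameters.plugins if p.enabled]
    if not enabled:
        # No plugin enabled in the UI: the prompt goes to the inference server unchanged.
        return original_prompt
    # A plugin is enabled: build curated pre-prompts from the plugin's OpenAPI description,
    # run the intermediate LLM calls (inner monologue) to pick and call the external API,
    # and fold the result into the augmented prompt handed back to the worker.
    return run_chat_chain(enabled, original_prompt)


def run_chat_chain(plugins: list[inference.PluginEntry], prompt: str) -> str:
    # Placeholder for the real inner-monologue loop implemented in chat_chain.py.
    return prompt
```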

## Plugins

Plugins are, in essence, thin wrappers around APIs that help the LLM use those APIs
more precisely and reliably, which makes them quite useful and powerful augmentation
tools for Open-Assistant. A plugin has two main parts: the ai-plugin.json file, which
is the plugin's main descriptor, and the OpenAPI specification of the plugin's APIs.
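
For illustration, an abridged ai-plugin.json descriptor is shown below as a Python dict; the values are made up, but the field names follow the OpenAI plugin manifest, and api.url is what points the system at the plugin's OpenAPI specification:

```python
# Abridged, illustrative ai-plugin.json manifest (values are placeholders).
example_manifest = {
    "schema_version": "v1",
    "name_for_human": "Calculator",
    "name_for_model": "calculator",
    "description_for_human": "Evaluate basic math expressions.",
    "description_for_model": "Use this tool to evaluate arithmetic expressions.",
    "auth": {"type": "none"},  # only non-authenticated plugins are supported for now
    "api": {
        "type": "openapi",
        "url": "http://localhost:8085/openapi.json",  # the plugin's OpenAPI specification
    },
    "logo_url": "http://localhost:8085/logo.png",
    "contact_email": "contact@example.com",
    "legal_info_url": "http://example.com/legal",
}
```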

The OpenAI plugin
[specification](https://platform.openai.com/docs/plugins/getting-started) is
currently partially supported by this system.

For now, only non-authentication-based plugins are supported. Some of them are:

- https://www.klarna.com/.well-known/ai-plugin.json
- https://www.joinmilo.com/.well-known/ai-plugin.json

Quite a few more can be found at the
[wellknown.ai plugin "store"](https://www.wellknown.ai/).

One of the ideas behind the plugin system is that we can provide some internal OA
plugins out of the box, while an unlimited number of third-party, community-developed
plugins can be added as well.

One Python-based plugin, the **calculator**, is included for now as a proof of concept
and as learning material showing how to create your own plugins.

### Notes on reliability, performance, and limitations of the plugin system

Performance can vary a lot depending on the models and plugins used; some work better
than others, and this should improve as better models become available. The biggest
limitations at the moment are context size and instruction-following capability. These
are mitigated with prompt tricks, truncation of the plugin OpenAPI descriptions, and
dynamic inclusion/exclusion of parts of the prompts during the internal processing that
generates the intermediate texts (inner monologues). More of the limitations and
possible alternatives are explained in code comments.
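
As one concrete example of the truncation mentioned above, here is a minimal sketch; the character budget and helper name are assumptions, not the actual implementation in chat_chain_utils.py:

```python
def truncate_description(description: str, max_chars: int = 512) -> str:
    """Hypothetical helper: clip a plugin/endpoint description so the assembled
    pre-prompt stays within the model's limited context window."""
    if len(description) <= max_chars:
        return description
    return description[: max_chars - 3].rstrip() + "..."
```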

The current approach is somewhat hybrid and relies on the zero-shot capabilities of a
model. There will be another branch of the plugin system that takes a somewhat
different approach, utilizing smaller embedding transformer models and vector stores,
so that we can A/B test the system alongside new OA model releases.

## Relevant files for the inference side of the plugin system

- chat_chain.py
- chat_chain_utils.py _(tweaking the tools/plugin description string generation can
  help for some models)_
- chat_chain_prompts.py _(tweaking the prompts can also help)_