Commit
Commit 62947b6 (1 parent: adde807)
Showing 47 changed files with 2,434 additions and 2 deletions.
@@ -0,0 +1 @@
.databrickscfg
@@ -0,0 +1,68 @@
---
title: Project Legion - SQL Migration Assistant
language: python
author: Robert Whiffin
date: 2024-08-28

tags:
- SQL
- Migration
- copilot
- GenAi
---
# Project Legion - SQL Migration Assistant
Legion is a Databricks field project to accelerate migrations onto Databricks by leveraging the platform's generative AI capabilities. It uses an LLM for code conversion and intent summarisation, presented to users in a front-end web application.

Legion provides a chatbot interface for translating input code (for example, T-SQL to Databricks SQL) and for summarising the intent and business purpose of the code. Each intent is then embedded and served in a Vector Search index for finding similar pieces of code. This creates opportunities for collaboration (find out who is working on similar projects), rationalisation (identify duplicates based on intent) and discoverability (semantic search).
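The similarity lookup can be pictured with a toy sketch. This is an illustration only: the embeddings and intent strings below are made up, and the real index is served by Databricks Vector Search, not computed in-process like this.

```python
import math

# Toy stand-in for the Vector Search index: each entry pairs an intent
# embedding with its intent text. The vectors are invented for illustration.
index = [
    ([0.9, 0.1, 0.0], "monthly revenue aggregation"),
    ([0.1, 0.9, 0.2], "customer address deduplication"),
]

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec):
    # Return the stored intent whose embedding is closest to the query.
    return max(index, key=lambda entry: cosine(entry[0], query_vec))[1]

print(most_similar([0.8, 0.2, 0.1]))  # → monthly revenue aggregation
```

The same nearest-neighbour idea, at scale and with managed embeddings, is what the Vector Search index provides.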
Legion is a solution accelerator - it is *not* a fully baked solution. It is something for you, the customer, to take on and own. This lets you run a project to upskill your employees, leverage GenAI for a real use case, customise the application to your needs and entirely own the IP.
## Installation Videos

https://github.com/user-attachments/assets/b43372fb-95ea-49cd-9a2c-aec8e0d6700f

https://github.com/user-attachments/assets/fa622f96-a78c-40b8-9eb9-f6671c4d7b47

https://github.com/user-attachments/assets/1a58a1b5-2dcf-4624-b93f-214735162584
Setting Legion up is a simple and automated process. Ensure you have the [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html) installed and configured with the correct workspace. Install the [Databricks Labs Sandbox](https://github.com/databrickslabs/sandbox).

First, navigate to where you have installed the Databricks Labs Sandbox. For example

```bash
cd /Documents/sandbox
```

You'll need to install the Python requirements in the `requirements.txt` file in the root of the project. You may wish to do this in a virtual environment.

```bash
pip install -r sql-migration-assistant/requirements.txt -q
```

Run the following command to start the installation process, creating all the necessary resources in your workspace.

```bash
databricks labs sandbox sql-migration-assistant
```
|
||
### What Legion needs - during setup above you will create or choose existing resources for the following: | ||
|
||
- A no-isolation shared cluster running the ML runtime (tested on DBR 15.0 ML) to host the front end application. | ||
- A catalog and schema in Unity Catalog. | ||
- A table to store the code intent statements and their embeddings. | ||
- A vector search endpoint and an embedding model: see docs | ||
https://docs.databricks.com/en/generative-ai/vector-search.html#how-to-set-up-vector-search | ||
- A chat LLM. Pay Per Token is recomended where available, but the set up will also allow for creation of | ||
a provisioned throughput endpoint. | ||
- A PAT stored in a secret scope chosen by you, under the key `sql-migration-pat`. |
@@ -0,0 +1,15 @@
from sql_migration_assistant.utils.initialsetup import SetUpMigrationAssistant
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.tui import Prompts
import yaml


def hello():
    w = WorkspaceClient(product="sql_migration_assistant", product_version="0.0.1")
    p = Prompts()
    setter_upper = SetUpMigrationAssistant()
    final_config = setter_upper.setup_migration_assistant(w, p)
    with open("sql_migration_assistant/config.yml", "w") as f:
        yaml.dump(final_config, f)
    setter_upper.upload_files(w)
    setter_upper.launch_review_app(w, final_config)
Empty file.
@@ -0,0 +1,84 @@
import logging

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

# Module-level example values; the class below takes its own endpoint name
# and token limit via the constructor.
w = WorkspaceClient()
foundation_llm_name = "databricks-meta-llama-3-1-405b-instruct"
max_token = 4096
messages = [
    ChatMessage(role=ChatMessageRole.SYSTEM, content="You are an unhelpful assistant"),
    ChatMessage(role=ChatMessageRole.USER, content="What is RAG?"),
]


class LLMCalls:
    def __init__(self, foundation_llm_name, max_tokens):
        self.w = WorkspaceClient()
        self.foundation_llm_name = foundation_llm_name
        self.max_tokens = int(max_tokens)

    def call_llm(self, messages):
        """
        Call the LLM model and return the response.
        :param messages: list of messages like
            messages=[
                ChatMessage(role=ChatMessageRole.SYSTEM, content="You are an unhelpful assistant"),
                ChatMessage(role=ChatMessageRole.USER, content="What is RAG?"),
                ChatMessage(role=ChatMessageRole.ASSISTANT, content="A type of cloth?")
            ]
        :return: the response from the model
        """
        # Use the instance attributes, not the module-level globals, so each
        # LLMCalls instance queries the endpoint it was configured with.
        response = self.w.serving_endpoints.query(
            name=self.foundation_llm_name, max_tokens=self.max_tokens, messages=messages
        )
        message = response.choices[0].message.content
        return message

    def convert_chat_to_llm_input(self, system_prompt, chat):
        # Convert the chat list of [question, answer] pairs into the message
        # format required by the LLM.
        messages = [ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt)]
        for q, a in chat:
            messages.extend(
                [
                    ChatMessage(role=ChatMessageRole.USER, content=q),
                    ChatMessage(role=ChatMessageRole.ASSISTANT, content=a),
                ]
            )
        return messages

    ################################################################################
    # FUNCTION FOR TRANSLATING CODE
    ################################################################################

    # This is called to send a request to the LLM endpoint and receive its response.

    def llm_translate(self, system_prompt, input_code):
        messages = [
            ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
            ChatMessage(role=ChatMessageRole.USER, content=input_code),
        ]

        # Call the LLM endpoint. The raw answer is currently returned as-is;
        # stripping the triple-backtick fences and the 'sql' prefix the LLM
        # often wraps code in is left commented out below.
        llm_answer = self.call_llm(messages=messages)
        translation = llm_answer  # .split("Final answer:\n")[1].replace(">>", "").replace("<<", "")
        return translation

    def llm_chat(self, system_prompt, query, chat_history):
        messages = self.convert_chat_to_llm_input(system_prompt, chat_history)
        messages.append(ChatMessage(role=ChatMessageRole.USER, content=query))
        # Call the LLM endpoint.
        llm_answer = self.call_llm(messages=messages)
        return llm_answer

    def llm_intent(self, system_prompt, input_code):
        messages = [
            ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
            ChatMessage(role=ChatMessageRole.USER, content=input_code),
        ]

        # Call the LLM endpoint.
        llm_answer = self.call_llm(messages=messages)
        return llm_answer
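The role interleaving performed by `convert_chat_to_llm_input` can be sketched without a workspace. The snippet below is a stand-in for illustration: plain `(role, content)` tuples replace the SDK's `ChatMessage`, so the logic runs anywhere.

```python
# Stand-in sketch of convert_chat_to_llm_input using plain tuples instead of
# the Databricks SDK's ChatMessage objects.
def convert_chat(system_prompt, chat):
    messages = [("system", system_prompt)]
    for q, a in chat:  # chat is a list of [question, answer] pairs
        messages.extend([("user", q), ("assistant", a)])
    return messages

history = [["What is RAG?", "Retrieval-augmented generation."]]
msgs = convert_chat("You are a helpful assistant", history)
print([role for role, _ in msgs])  # → ['system', 'user', 'assistant']
```

Each prior question/answer pair becomes a user/assistant message pair after the system prompt, which is the shape the serving endpoint expects.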
@@ -0,0 +1,40 @@
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.core import StatementExecutionExt


class SimilarCode:

    def __init__(
        self,
        workspace_client: WorkspaceClient,
        see: StatementExecutionExt,
        catalog,
        schema,
        code_intent_table_name,
        VS_index_name,
        VS_endpoint_name,
    ):
        self.w = workspace_client
        self.see = see
        self.catalog = catalog
        self.schema = schema
        self.code_intent_table_name = code_intent_table_name
        self.vs_index_name = VS_index_name
        self.vs_endpoint_name = VS_endpoint_name

    def save_intent(self, code, intent):
        # Note: Python's built-in hash() is salted per process for strings, so
        # the same code hashes to different ids across runs. Note also that the
        # interpolated SQL below assumes code and intent contain no double quotes.
        code_hash = hash(code)
        _ = self.see.execute(
            f'INSERT INTO {self.catalog}.{self.schema}.{self.code_intent_table_name} VALUES ({code_hash}, "{code}", "{intent}")',
        )

    def get_similar_code(self, chat_history):
        # The intent generated for the latest chat turn is used as the query text.
        intent = chat_history[-1][1]
        results = self.w.vector_search_indexes.query_index(
            index_name=f"{self.catalog}.{self.schema}.{self.vs_index_name}",
            columns=["code", "intent"],
            query_text=intent,
            num_results=1,
        )
        docs = results.result.data_array
        return (docs[0][0], docs[0][1])
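Because `save_intent` keys rows on the built-in `hash()`, which Python salts per process for strings, the same code produces different ids in different runs. A stable digest, sketched below as an alternative (this is not the committed code), would make the id reproducible:

```python
import hashlib

def stable_code_hash(code: str) -> int:
    # sha256 is deterministic across processes, unlike the built-in hash(),
    # which is salted per interpreter run for str inputs.
    digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
    # Truncate to 15 hex digits (60 bits) so the value fits a BIGINT column.
    return int(digest[:15], 16)

h1 = stable_code_hash("SELECT 1")
h2 = stable_code_hash("SELECT 1")
print(h1 == h2)  # → True, in every process and on every run
```

A stable id matters if the table is ever used to deduplicate intents or join across runs.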
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)