-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text rewriter #149
base: Develop
Are you sure you want to change the base?
Text rewriter #149
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work so far. Please, solve the given comments.
For better guidance in standardization, you can check how the Lesson Plan Generator was made: https://github.com/marvelai-org/marvel-ai-backend/tree/Develop/app/tools/lesson_plan_generator
Hey @timothydrew92, please let me know when you update your Notes Generator proposal in base of the given guidelines. So far from that, good work in the approach. |
unnecessary file
removed unnecessary files
removed unnecessary files
@AaronSosaRamos Good Morning and Happy New Year! I've made changes based on your feedback and I'm interested to receive more. Please let me know if I utilized your previous feedback in the way you intended. Thank you for your help! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @timothydrew92, good work so far. You have improved a lot with the given guidelines. Please, solve the given comments.
@@ -0,0 +1,31 @@ | |||
# Use Python 3.12 slim image | |||
FROM python:3.12-slim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timothydrew92 Double check if the Docker Container is working when you use that Python version. If the container is working appropriately, then there is no dependency issue and is a good proposal for updating our Docker Image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use the current metadata.json
structure for implementing the Text Rewriter inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AaronSosaRamos Is there a metedata.json file provided like the router.py file? I can't find a provided file... or do you just mean make my metadata file actually contribute to the text rewriter inputs?
except Exception as e: | ||
raise ValueError(f"Error in rewrite_tool_handler: {str(e)}") | ||
|
||
def load_metadata(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed as it exists in the tools_config.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AaronSosaRamos Where is the tools_config.py file located?
with open(METADATA_FILE, "r") as f: | ||
return json.load(f) | ||
|
||
def validate_inputs(inputs: dict, metadata: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed as it exists in the tools_config.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the tools_config.py file located?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @timothydrew92 :), you have improved a lot.
The tools_config.py file is located at /app/tools/utils/tools_config.py
. It is not required to change that file, just load the correct inputs at metadata.json
and in core.py
. Just remove those metadata loading functions as they are already configured in our backend.
Thank you :)
|
||
logger = setup_logger("executor") | ||
|
||
def execute_text_rewriter(text: str, instructions: str) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please rename this function as executor
?
from typing import Dict | ||
from app.services.logger import setup_logger | ||
|
||
logger = setup_logger("executor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be: logger = setup_logger()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be refactored with the given structure from other metadata.json
files of other tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add more options and dimensions for these few shot examples? For improving the model's inference in different scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another FastAPI instance is not required for testing your core.py
. You can guide and refactor this file with any of the existent test_core.py
files from other tools.
api_key = os.getenv("GOOGLE_API_KEY") | ||
project_id = os.getenv("PROJECT_ID") | ||
|
||
if not api_key or not project_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API key validation is not required as the env. variables are loaded in main.py
by using load_dotenv(find_dotenv())
if not api_key or not project_id: | ||
raise ValueError("API key or project ID is missing in environment variables.") | ||
|
||
def create_input_schema(metadata: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace this function by creating a Pydantic schema and locating it in /app/services/schemas.py
} | ||
return create_model(metadata["name"] + "InputSchema", **fields) | ||
|
||
def create_output_schema(metadata: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace this function by creating a Pydantic schema and locating it in /app/services/schemas.py
model="gemini-1.5-pro", | ||
temperature=0.7, | ||
max_output_tokens=1024, | ||
api_key=api_key # Using the globally loaded API key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed as the env. variables are actually loaded.
model = GoogleGenerativeAI( | ||
model="gemini-1.5-pro", | ||
temperature=0.7, | ||
max_output_tokens=1024, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this max_output_tokens
required? What can happen if the user's text is greater than the limit?
@yunusj this is ready for your review |
Description
Provide a concise summary of the changes in this PR. Include the motivation behind these changes and any relevant context.
Related Issue
If this PR addresses an issue, link to it here.
fixes issue #158 - EPIC 2.10 Text Rewriter
Type of Change
Please select the type(s) of change that apply and delete those that do not.
Proposed Solution
Describe your code changes in detail. Explain how you implemented your solution and any design decisions you made.
I've attached a working pipeline using Google Gemini as the model for the text rewriting. I will add langchain abilities and continue testing and researching benchmark information for optimizing the model but I didn't want to over engineer. I'm looking for some feedback on what I have so far. I would love feedback on the organization of the files, I feel like my information isn't well organized into layers/pathways, but it may be on my local machine only and I'm curious to know what the PR looks like formatting wise. Please provide feedback on initial working pipeline and coding, file management for further development. Thank you! -Tim
How to Test
Provide instructions on how to test these changes. Include details on test configurations, test cases, and expected outcomes.
input text and instructions ({
"text": "banana, orange, apple, french toast, carrots",
"instructions": "please reorder items into alphabetical order"
}
rewritten text provided as output ({
"rewritten_text": "apple, banana, carrots, french toast, orange\n"
})
Unit Tests
List the unit tests added or modified to verify your changes.
Documentation Updates
Indicate whether documentation needs to be updated due to this PR.
If yes, describe what documentation updates are needed and link to the relevant documentation.
Checklist
Additional Information
Add any other information that might be useful for the reviewers.
Pipeline works in the docker file when I run it. Please let me know areas to improve coding and file management. I have ideas for increasing performance of model using T5 model and langchain.