Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text rewriter #149

Open
wants to merge 16 commits into
base: Develop
Choose a base branch
from

Conversation

timothydrew92
Copy link

Description

Provide a concise summary of the changes in this PR. Include the motivation behind these changes and any relevant context.

  • PR includes docker file to run Fast API with working pipeline for text rewriting. Looking for feedback on initial pipeline before continuing to test for comparative benchmark tools and models.

Related Issue

If this PR addresses an issue, link to it here.

fixes issue #158 - EPIC 2.10 Text Rewriter

Type of Change

Please select the type(s) of change that apply and delete those that do not.

  • Bug fix: A non-breaking change that fixes an issue.
  • New feature: A non-breaking change that adds functionality.
  • Breaking change: A change that causes existing functionality to not work as expected.
  • Documentation update: Changes or updates to documentation.
  • Code style update: Changes that do not affect the meaning of the code (e.g., formatting).
  • Refactoring: A code change that neither fixes a bug nor adds a feature.
  • Performance improvement: A change that improves performance.
  • Test enhancement: Adding or updating tests; no production code change.
  • Chore: Changes to the build process or auxiliary tools; no production code change.
  • Other: (please describe)

Proposed Solution

Describe your code changes in detail. Explain how you implemented your solution and any design decisions you made.

I've attached a working pipeline using Google Gemini as the model for the text rewriting. I will add langchain abilities and continue testing and researching benchmark information for optimizing the model but I didn't want to over engineer. I'm looking for some feedback on what I have so far. I would love feedback on the organization of the files, I feel like my information isn't well organized into layers/pathways, but it may be on my local machine only and I'm curious to know what the PR looks like formatting wise. Please provide feedback on initial working pipeline and coding, file management for further development. Thank you! -Tim

How to Test

Provide instructions on how to test these changes. Include details on test configurations, test cases, and expected outcomes.

input text and instructions ({
"text": "banana, orange, apple, french toast, carrots",
"instructions": "please reorder items into alphabetical order"
}

rewritten text provided as output ({
"rewritten_text": "apple, banana, carrots, french toast, orange\n"
})

Unit Tests

List the unit tests added or modified to verify your changes.

Documentation Updates

Indicate whether documentation needs to be updated due to this PR.

  • Yes
  • [ x] No

If yes, describe what documentation updates are needed and link to the relevant documentation.

Checklist

  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • [x ] New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

Additional Information

Add any other information that might be useful for the reviewers.

Pipeline works in the docker file when I run it. Please let me know areas to improve coding and file management. I have ideas for increasing performance of model using T5 model and langchain.

@AaronSosaRamos AaronSosaRamos self-requested a review December 27, 2024 14:50
@AaronSosaRamos AaronSosaRamos added type:enhancement For minor updates or changes that improve an existing feature or process. TOOL This is a tool that is currently being worked on Text Rewriter For the Text Rewriter tool labels Dec 27, 2024
Copy link
Contributor

@AaronSosaRamos AaronSosaRamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far. Please, solve the given comments.
For better guidance in standardization, you can check how the Lesson Plan Generator was made: https://github.com/marvelai-org/marvel-ai-backend/tree/Develop/app/tools/lesson_plan_generator

app/features/text_rewriter/.gitignore Outdated Show resolved Hide resolved
app/features/text_rewriter/.gitignore Outdated Show resolved Hide resolved
app/features/text_rewriter/Dockerfile Outdated Show resolved Hide resolved
app/features/text_rewriter/Dockerfile Outdated Show resolved Hide resolved
app/features/text_rewriter/Dockerfile Outdated Show resolved Hide resolved
app/features/text_rewriter/requirements.txt Show resolved Hide resolved
app/features/text_rewriter/router.py Outdated Show resolved Hide resolved
app/features/text_rewriter/text_rewriter.py Outdated Show resolved Hide resolved
app/features/text_rewriter/tools.py Show resolved Hide resolved
output.txt Outdated Show resolved Hide resolved
@AaronSosaRamos
Copy link
Contributor

Hey @timothydrew92, please let me know when you update your Notes Generator proposal in base of the given guidelines. So far from that, good work in the approach.

@timothydrew92
Copy link
Author

@AaronSosaRamos Good Morning and Happy New Year! I've made changes based on your feedback and I'm interested to receive more. Please let me know if I utilized your previous feedback in the way you intended. Thank you for your help!

Copy link
Contributor

@AaronSosaRamos AaronSosaRamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @timothydrew92, good work so far. You have improved a lot with the given guidelines. Please, solve the given comments.

@@ -0,0 +1,31 @@
# Use Python 3.12 slim image
FROM python:3.12-slim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothydrew92 Double check if the Docker Container is working when you use that Python version. If the container is working appropriately, then there is no dependency issue and is a good proposal for updating our Docker Image.

app/features/text_rewriter/core.py Outdated Show resolved Hide resolved
app/features/text_rewriter/core.py Outdated Show resolved Hide resolved
app/features/text_rewriter/core.py Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use the current metadata.json structure for implementing the Text Rewriter inputs.

Copy link
Author

@timothydrew92 timothydrew92 Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronSosaRamos Is there a metedata.json file provided like the router.py file? I can't find a provided file... or do you just mean make my metadata file actually contribute to the text rewriter inputs?

app/features/text_rewriter/text_rewriter.py Outdated Show resolved Hide resolved
app/features/text_rewriter/tools.py Outdated Show resolved Hide resolved
app/features/text_rewriter/tools.py Show resolved Hide resolved
except Exception as e:
raise ValueError(f"Error in rewrite_tool_handler: {str(e)}")

def load_metadata():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed as it exists in the tools_config.py

Copy link
Author

@timothydrew92 timothydrew92 Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronSosaRamos Where is the tools_config.py file located?

with open(METADATA_FILE, "r") as f:
return json.load(f)

def validate_inputs(inputs: dict, metadata: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed as it exists in the tools_config.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the tools_config.py file located?

Copy link
Contributor

@AaronSosaRamos AaronSosaRamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @timothydrew92 :), you have improved a lot.
The tools_config.py file is located at /app/tools/utils/tools_config.py. It is not required to change that file, just load the correct inputs at metadata.json and in core.py. Just remove those metadata loading functions as they are already configured in our backend.
Thank you :)


logger = setup_logger("executor")

def execute_text_rewriter(text: str, instructions: str) -> Dict[str, str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename this function as executor?

from typing import Dict
from app.services.logger import setup_logger

logger = setup_logger("executor")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be: logger = setup_logger()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be refactored with the given structure from other metadata.json files of other tools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add more options and dimensions for these few shot examples? For improving the model's inference in different scenarios.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another FastAPI instance is not required for testing your core.py. You can guide and refactor this file with any of the existent test_core.py files from other tools.

api_key = os.getenv("GOOGLE_API_KEY")
project_id = os.getenv("PROJECT_ID")

if not api_key or not project_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API key validation is not required as the env. variables are loaded in main.py by using load_dotenv(find_dotenv())

if not api_key or not project_id:
raise ValueError("API key or project ID is missing in environment variables.")

def create_input_schema(metadata: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace this function by creating a Pydantic schema and locating it in /app/services/schemas.py

}
return create_model(metadata["name"] + "InputSchema", **fields)

def create_output_schema(metadata: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace this function by creating a Pydantic schema and locating it in /app/services/schemas.py

model="gemini-1.5-pro",
temperature=0.7,
max_output_tokens=1024,
api_key=api_key # Using the globally loaded API key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed as the env. variables are actually loaded.

model = GoogleGenerativeAI(
model="gemini-1.5-pro",
temperature=0.7,
max_output_tokens=1024,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this max_output_tokens required? What can happen if the user's text is greater than the limit?

@stevenrayhinojosa-gmail-com

@yunusj this is ready for your review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Text Rewriter For the Text Rewriter tool TOOL This is a tool that is currently being worked on type:enhancement For minor updates or changes that improve an existing feature or process.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants