Text rewriter #149

timothydrew92 · 2024-12-26T13:01:19Z

Description

Provide a concise summary of the changes in this PR. Include the motivation behind these changes and any relevant context.

This PR includes a Dockerfile to run FastAPI with a working pipeline for text rewriting. I am looking for feedback on the initial pipeline before I continue testing comparative benchmark tools and models.

Related Issue

If this PR addresses an issue, link to it here.

fixes issue #158 - EPIC 2.10 Text Rewriter

Type of Change

Please select the type(s) of change that apply and delete those that do not.

Proposed Solution

Describe your code changes in detail. Explain how you implemented your solution and any design decisions you made.

I've attached a working pipeline that uses Google Gemini as the model for text rewriting. I plan to add LangChain support and to keep testing and researching benchmarks for optimizing the model, but I didn't want to over-engineer at this stage, so I'm looking for feedback on what I have so far.

I would especially like feedback on the organization of the files. I feel my work isn't well organized into layers/pathways, but that may only be the case on my local machine, and I'm curious what the PR looks like formatting-wise. Please provide feedback on the initial working pipeline, the code, and the file management to guide further development. Thank you! -Tim

How to Test

Provide instructions on how to test these changes. Include details on test configurations, test cases, and expected outcomes.

Input text and instructions:
{
"text": "banana, orange, apple, french toast, carrots",
"instructions": "please reorder items into alphabetical order"
}

Rewritten text provided as output:
{
"rewritten_text": "apple, banana, carrots, french toast, orange\n"
}
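The request/response pair above can also be checked offline, without the container running. The following is a minimal sketch under stated assumptions: `check_response` is a hypothetical helper, not part of this PR, and the response body is copied from the example rather than fetched from the API.

```python
import json

# Request body from the example above.
payload = {
    "text": "banana, orange, apple, french toast, carrots",
    "instructions": "please reorder items into alphabetical order",
}

# Response body from the example above (the model output ends with a newline).
response_body = '{"rewritten_text": "apple, banana, carrots, french toast, orange\\n"}'


def check_response(body: str) -> bool:
    """Return True if the JSON body has a non-empty 'rewritten_text' string."""
    data = json.loads(body)
    value = data.get("rewritten_text")
    return isinstance(value, str) and value.strip() != ""


# Compare the rewritten items with a plain alphabetical sort of the input items.
expected = sorted(item.strip() for item in payload["text"].split(","))
rewritten = [
    item.strip()
    for item in json.loads(response_body)["rewritten_text"].split(",")
]
```

Stripping each item before comparing keeps the trailing newline in the model output from causing a false mismatch.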

Unit Tests

List the unit tests added or modified to verify your changes.

Documentation Updates

Indicate whether documentation needs to be updated due to this PR.

Yes
[x] No

If yes, describe what documentation updates are needed and link to the relevant documentation.

Checklist

I have performed a self-review of my code.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have added tests that prove my fix is effective or that my feature works.
[x] New and existing unit tests pass locally with my changes.
Any dependent changes have been merged and published in downstream modules.

Additional Information

Add any other information that might be useful for the reviewers.

The pipeline works when I run it via the Dockerfile. Please let me know which areas of the code and file management I should improve. I have ideas for increasing the model's performance using a T5 model and LangChain.

AaronSosaRamos

Good work so far. Please address the comments below.
For better guidance in standardization, you can check how the Lesson Plan Generator was made: https://github.com/marvelai-org/marvel-ai-backend/tree/Develop/app/tools/lesson_plan_generator

app/features/text_rewriter/.gitignore

app/features/text_rewriter/Dockerfile

app/features/text_rewriter/requirements.txt

app/features/text_rewriter/router.py

app/features/text_rewriter/text_rewriter.py

app/features/text_rewriter/tools.py

output.txt

AaronSosaRamos · 2025-01-01T18:01:00Z

Hey @timothydrew92, please let me know when you have updated your Notes Generator proposal based on the given guidelines. Apart from that, good work on the approach.

timothydrew92 · 2025-01-02T12:43:18Z

@AaronSosaRamos Good Morning and Happy New Year! I've made changes based on your feedback and I'm interested to receive more. Please let me know if I utilized your previous feedback in the way you intended. Thank you for your help!

AaronSosaRamos

Hey @timothydrew92, good work so far. You have improved a lot by following the given guidelines. Please address the comments below.

AaronSosaRamos · 2025-01-02T14:13:51Z

app/features/text_rewriter/Dockerfile

@@ -0,0 +1,31 @@
+# Use Python 3.12 slim image
+FROM python:3.12-slim


@timothydrew92 Double-check that the Docker container works with that Python version. If the container works as expected, then there is no dependency issue and this is a good proposal for updating our Docker image.

app/features/text_rewriter/core.py

AaronSosaRamos · 2025-01-02T14:16:25Z

app/features/text_rewriter/metadata.json

Please use the current metadata.json structure to define the Text Rewriter inputs.

@AaronSosaRamos Is there a metadata.json file provided, like the router.py file? I can't find one. Or do you just mean that my metadata file should actually define the Text Rewriter inputs?

app/features/text_rewriter/text_rewriter.py

app/features/text_rewriter/tools.py

AaronSosaRamos · 2025-01-02T14:19:40Z

app/features/text_rewriter/tools.py

+    except Exception as e:
+        raise ValueError(f"Error in rewrite_tool_handler: {str(e)}")
+
+def load_metadata():


This is not needed because it already exists in tools_config.py.

@AaronSosaRamos Where is the tools_config.py file located?

AaronSosaRamos · 2025-01-02T14:19:46Z

app/features/text_rewriter/tools.py

+    with open(METADATA_FILE, "r") as f:
+        return json.load(f)
+
+def validate_inputs(inputs: dict, metadata: dict):


This is not needed because it already exists in tools_config.py.

Where is the tools_config.py file located?

AaronSosaRamos

Good work @timothydrew92 :), you have improved a lot.
The tools_config.py file is located at /app/tools/utils/tools_config.py. You do not need to change that file; just define the correct inputs in metadata.json and in core.py, and remove the metadata loading functions, since they are already configured in our backend.
Thank you :)

AaronSosaRamos · 2025-01-08T14:40:18Z

app/features/text_rewriter/core.py

+
+logger = setup_logger("executor")
+
+def execute_text_rewriter(text: str, instructions: str) -> Dict[str, str]:


Could you please rename this function to executor?

AaronSosaRamos · 2025-01-08T14:40:38Z

app/features/text_rewriter/core.py

+from typing import Dict
+from app.services.logger import setup_logger 
+
+logger = setup_logger("executor")


This must be: logger = setup_logger()

AaronSosaRamos · 2025-01-08T14:42:23Z

app/features/text_rewriter/metadata.json

This must be refactored to follow the structure used by the metadata.json files of other tools.

AaronSosaRamos · 2025-01-08T14:43:26Z

app/features/text_rewriter/prompt/few_shot_examples.txt

Could you please add more options and dimensions to these few-shot examples, to improve the model's inference in different scenarios?

AaronSosaRamos · 2025-01-08T14:44:42Z

app/features/text_rewriter/test_core.py

A separate FastAPI instance is not required for testing your core.py. You can use any of the existing test_core.py files from other tools as a guide when refactoring this file.
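To illustrate the point about testing core.py directly, here is a minimal sketch of a test that calls the core function with a mocked model instead of going through a FastAPI app. It is not taken from any existing test_core.py in the repository: the `executor` below is a simplified stand-in defined inline so the example is self-contained, and its `model` parameter and the `invoke` call are assumptions made for illustration.

```python
import unittest
from unittest.mock import MagicMock


def executor(text: str, instructions: str, model) -> dict:
    """Simplified stand-in for the tool's core function (assumption).

    In the real project this would be imported from the tool's core module.
    """
    prompt = f"{instructions}\n\n{text}"
    return {"rewritten_text": model.invoke(prompt)}


class TestExecutor(unittest.TestCase):
    def test_returns_rewritten_text(self):
        # The mock replaces the LLM, so no API key or network is needed.
        model = MagicMock()
        model.invoke.return_value = "apple, banana"

        result = executor("banana, apple", "sort alphabetically", model)

        self.assertEqual(result, {"rewritten_text": "apple, banana"})
        model.invoke.assert_called_once()


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExecutor)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the model is mocked, the test exercises only the function's own logic and runs without a server, credentials, or network access.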

AaronSosaRamos · 2025-01-08T14:45:57Z

app/features/text_rewriter/tools.py

+api_key = os.getenv("GOOGLE_API_KEY")
+project_id = os.getenv("PROJECT_ID")
+
+if not api_key or not project_id:


This API key validation is not required because the environment variables are loaded in main.py with load_dotenv(find_dotenv()).

AaronSosaRamos · 2025-01-08T14:46:38Z

app/features/text_rewriter/tools.py

+if not api_key or not project_id:
+    raise ValueError("API key or project ID is missing in environment variables.")
+
+def create_input_schema(metadata: dict):


You can replace this function by creating a Pydantic schema and placing it in /app/services/schemas.py

AaronSosaRamos · 2025-01-08T14:46:56Z

app/features/text_rewriter/tools.py

+    }
+    return create_model(metadata["name"] + "InputSchema", **fields)
+
+def create_output_schema(metadata: dict):


You can replace this function by creating a Pydantic schema and placing it in /app/services/schemas.py
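As a rough illustration of the suggestion above, static Pydantic models could stand in for the dynamically built input and output schemas. This is a sketch only: the class names `TextRewriterInput` and `TextRewriterOutput` and the helper `parse_request` are hypothetical, the field names are taken from the How to Test example, and the actual conventions in /app/services/schemas.py may differ.

```python
from pydantic import BaseModel, Field, ValidationError


class TextRewriterInput(BaseModel):
    """Hypothetical request schema; field names follow the How to Test example."""

    text: str = Field(..., min_length=1, description="Text to rewrite")
    instructions: str = Field(
        ..., min_length=1, description="How the text should be rewritten"
    )


class TextRewriterOutput(BaseModel):
    """Hypothetical response schema."""

    rewritten_text: str


def parse_request(raw: dict) -> TextRewriterInput:
    """Validate a raw request dict, raising ValueError with a short message."""
    try:
        return TextRewriterInput(**raw)
    except ValidationError as exc:
        raise ValueError(f"Invalid text rewriter input: {exc}") from exc
```

Declaring the fields statically lets Pydantic handle the validation that create_input_schema and create_output_schema currently assemble at runtime from metadata.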

AaronSosaRamos · 2025-01-08T14:47:20Z

app/features/text_rewriter/tools.py

+            model="gemini-1.5-pro",
+            temperature=0.7,
+            max_output_tokens=1024,
+            api_key=api_key  # Using the globally loaded API key


This is not needed because the environment variables are already loaded.

AaronSosaRamos · 2025-01-08T14:50:31Z

app/features/text_rewriter/tools.py

+        model = GoogleGenerativeAI(
+            model="gemini-1.5-pro",
+            temperature=0.7,
+            max_output_tokens=1024,


Why is this max_output_tokens required? What happens if the user's text is longer than the limit?
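One possible way to handle the concern raised above is to split long input into smaller chunks and rewrite each chunk separately, so that no single response has to exceed the output cap. The sketch below is not what the PR does. `split_into_chunks` is a hypothetical helper, and a character budget is used only as a rough proxy for tokens; the default of 2000 characters is an arbitrary assumption.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Split text on paragraph boundaries into chunks of at most max_chars.

    Paragraphs are separated by blank lines. A single paragraph longer than
    max_chars is hard-split into fixed-size pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split paragraphs that cannot fit into one chunk on their own.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # The piece still fits, so keep growing the current chunk.
                current = candidate
            else:
                # The chunk is full: store it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent through the rewriter on its own and the results joined. Whether that preserves enough context for a given instruction, such as reordering a whole list, would need to be tested.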

stevenrayhinojosa-gmail-com · 2025-01-23T17:21:15Z

@yunusj this is ready for your review

timothydrew92 added 7 commits December 22, 2024 08:46

added all required files to project

5347c9c

added .gitignore to ignore sensitive and unnecessary files

942cfbf

cleaned staging and added updates to text writer functionality

e7383b9

stop tracking .env file

a0c1cee

Removed text_rewriter.env from tracking and updated .gitignore

3de02b9

update files with working piplinegit status

a1c721c

adding dockerfile for PR and updated requirements

8f22a79

AaronSosaRamos self-requested a review December 27, 2024 14:50

AaronSosaRamos added the type:enhancement, TOOL, and Text Rewriter labels Dec 27, 2024

AaronSosaRamos suggested changes Dec 27, 2024

timothydrew92 added 2 commits December 30, 2024 08:50

reviewed and updated files based on PR feedback

baa4d9a

updated Dockerfile

24c200a

timothydrew92 added 5 commits January 2, 2025 06:18

updated files with suggested changes

9bb88f5

updated core and tools file per feedback

7751fc0

Delete app/features/text_rewriter/document_loaders.py

f76835b

unnecessary file

Delete app/features/text_rewriter/prompts/rewrite_prompt.txt

5113133

removed unnecessary files

Delete output.txt

47a3acb

removed unnecessary files

AaronSosaRamos suggested changes Jan 2, 2025

timothydrew92 added 2 commits January 3, 2025 08:25

updated files with lab manager feedback

5e97b34

added prompt file and worked on other files based on feedback

4ec5ecf

AaronSosaRamos suggested changes Jan 8, 2025

stevenrayhinojosa-gmail-com approved these changes Jan 23, 2025
