feat: allow users to configure a base_url for the vectorizer OpenAI embedder #351

smoya · 2025-01-09T15:24:02Z

This PR introduces the ability to configure the base_url in the OpenAI embedder for vectorizers, which gives more flexibility for connecting to custom OpenAI API endpoints. It supports private API deployments and testing environments, and it paves the way for future embedders that use the same API as OpenAI, such as Azure OpenAI.

The changes include a new BaseURLMixin class, which can be reused by any other compatible embedder.
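The PR page does not show the body of BaseURLMixin, so the following is only a minimal sketch of the idea, assuming a mixin that carries an optional base_url and falls back to a default when none is configured. The `resolved_base_url` helper, the `OpenAIEmbedderSketch` class, and its `model` field are hypothetical names used for illustration; the real class in projects/pgai/pgai/vectorizer/embeddings.py may look quite different.

```python
from dataclasses import dataclass
from typing import Optional

# Default endpoint used by the official OpenAI SDK when no base_url is given.
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


@dataclass
class BaseURLMixin:
    """Hypothetical mixin: adds an optional base_url to an embedder config."""

    base_url: Optional[str] = None

    def resolved_base_url(self, default: str) -> str:
        # Use the configured URL if present, otherwise the provider default.
        # Trailing slashes are stripped so paths can be appended safely.
        return (self.base_url or default).rstrip("/")


@dataclass
class OpenAIEmbedderSketch(BaseURLMixin):
    """Illustrative embedder config; not the class shipped in pgai."""

    model: str = "text-embedding-3-small"

    def embeddings_endpoint(self) -> str:
        # The embeddings route lives under the API base URL.
        return f"{self.resolved_base_url(DEFAULT_OPENAI_BASE_URL)}/embeddings"


if __name__ == "__main__":
    print(OpenAIEmbedderSketch().embeddings_endpoint())
    print(OpenAIEmbedderSketch(base_url="http://localhost:8000/v1").embeddings_endpoint())
```

Keeping the field in a mixin means any embedder that talks to an OpenAI-compatible API can inherit it without repeating the fallback logic.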

projects/extension/sql/idempotent/008-embedding.sql

projects/pgai/tests/vectorizer/test_vectorizer_cli.py

JamesGuthrie · 2025-01-09T16:45:49Z

...ts/vectorizer/cassettes/openai-character_text_splitter-chunk_value-items=1-batch_size=1.yaml

How did you get this content into the cassette file? I don't see how I would easily be able to reproduce this test if I need to at a later date.

I changed it back to what I used when creating this file. I ran mitmproxy in my local environment (localhost:8000), forwarding requests to the OpenAI API endpoint, with this one-liner:

mitmproxy --listen-port 8000 --mode reverse:https://api.openai.com

Do you think it's worth adding a comment in the test with this? Or do you think there is a better alternative? I thought about creating the reverse proxy in Python code instead, but since this is a one-time step for creating the cassette, I discarded the idea.

I think at the very least we should comment it, but ideally we would programmatically set up the proxy in the test, so that we don't need manual setup when regenerating the cassette.

I tried writing a simple reverse proxy in conftest.py and, even though it works, VCR intercepts the call, so the cassette ends up recording the OpenAI URL as the final URL. It also entered an infinite loop, but that's most probably because I'm running the proxy in a separate thread (I didn't spend more time on it).

Unless you know a simpler alternative, I will go with adding a comment with the mitmproxy command as the way to do it.

I managed to make it work by integrating mitmproxy into the test code. It now runs a proxy for that test, which makes the cassette reproducible again.
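The thread does not show how mitmproxy was wired into the test code, so here is one possible sketch, assuming the proxy is launched as a subprocess through `mitmdump` (the non-interactive mitmproxy command) with the same flags as the one-liner above. The `reverse_proxy_command` and `reverse_proxy` helpers are hypothetical names, and the actual conftest.py may instead use mitmproxy's Python API.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, List


def reverse_proxy_command(port: int, upstream: str) -> List[str]:
    """Build the mitmdump invocation for a reverse proxy to `upstream`."""
    return ["mitmdump", "--listen-port", str(port), "--mode", f"reverse:{upstream}"]


@contextmanager
def reverse_proxy(
    port: int = 8000, upstream: str = "https://api.openai.com"
) -> Iterator[str]:
    """Run the proxy for the duration of a test and yield its local URL.

    Assumption: mitmdump is installed and on PATH. This sketch does not wait
    for the proxy to start listening; a real fixture would poll the port first.
    """
    proc = subprocess.Popen(reverse_proxy_command(port, upstream))
    try:
        yield f"http://localhost:{port}"
    finally:
        # Stop the proxy even if the test body raised.
        proc.terminate()
        proc.wait(timeout=10)
```

A pytest fixture could wrap `reverse_proxy()` and pass the yielded URL, plus the API version path, as the embedder's base_url, so that regenerating the cassette needs no manual setup.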

projects/pgai/tests/vectorizer/conftest.py

projects/pgai/tests/vectorizer/test_vectorizer_cli.py

adolsalamanca

Looks good, left a couple of comments

adolsalamanca · 2025-01-13T14:58:12Z

projects/extension/sql/incremental/010-drop-embedding-openai-outdated-function.sql

@@ -0,0 +1,3 @@
+
+-- dropping in favour of the new signature (adding base_url param)
+drop function if exists ai.embedding_openai(text,integer,text,text);


nit: missing newline at end of file

projects/pgai/tests/vectorizer/test_vectorizer_cli.py

projects/pgai/pgai/vectorizer/embeddings.py

AldoFusterTurpin

Just a couple of questions/comments in case those make sense/can help 🙂

projects/pgai/pgai/vectorizer/embeddings.py

projects/pgai/tests/vectorizer/conftest.py

AldoFusterTurpin

LGTM! Thanks!
(Not a formal "approve", as I am missing context)

projects/pgai/pgai/vectorizer/embeddings.py

smoya requested a review from a team as a code owner January 9, 2025 15:24

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch 4 times, most recently from dddc80b to a7d3164 Compare January 9, 2025 16:17

JamesGuthrie reviewed Jan 9, 2025

projects/extension/sql/idempotent/008-embedding.sql Outdated

JamesGuthrie reviewed Jan 9, 2025

projects/pgai/tests/vectorizer/test_vectorizer_cli.py

JamesGuthrie reviewed Jan 9, 2025

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch from a7d3164 to bf9dd5a Compare January 9, 2025 19:00

JamesGuthrie mentioned this pull request Jan 10, 2025

[Feature]: Support host changes for openai calls #352

Closed

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch 5 times, most recently from 4cff968 to 58f0a43 Compare January 10, 2025 12:01

feat: allow users to configure a base_url for the vectorizer OpenAI embedder

7931136

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch from 58f0a43 to 7931136 Compare January 10, 2025 12:56

smoya requested review from jgpruitt, alejandrodnm, adolsalamanca and JamesGuthrie January 10, 2025 13:02

chore: use BaseURLMixin in Ollama embedder class

cdfc49e

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch 3 times, most recently from a1e6ce3 to 98f0537 Compare January 13, 2025 12:49

jgpruitt reviewed Jan 13, 2025

projects/pgai/tests/vectorizer/conftest.py

jgpruitt approved these changes Jan 13, 2025

smoya commented Jan 13, 2025

projects/pgai/tests/vectorizer/test_vectorizer_cli.py Outdated

ci: create a reverse proxy for the OpenAI tests

999a70a

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch from 98f0537 to 999a70a Compare January 13, 2025 15:00

jgpruitt approved these changes Jan 13, 2025

adolsalamanca approved these changes Jan 13, 2025

JamesGuthrie approved these changes Jan 13, 2025

AldoFusterTurpin reviewed Jan 13, 2025

projects/pgai/pgai/vectorizer/embeddings.py Outdated

projects/pgai/tests/vectorizer/conftest.py Outdated

projects/pgai/tests/vectorizer/conftest.py

ci: minor review fixes

d26603b

smoya force-pushed the smoya/ai-232-support-base_url-in-vectorizer branch from 5e1efae to d26603b Compare January 13, 2025 17:35

AldoFusterTurpin reviewed Jan 13, 2025

projects/pgai/pgai/vectorizer/embeddings.py

smoya merged commit 66ceb3d into main Jan 14, 2025
5 checks passed

smoya deleted the smoya/ai-232-support-base_url-in-vectorizer branch January 14, 2025 09:38

github-actions bot mentioned this pull request Jan 13, 2025

chore(main): release pgai 0.5.0 #355

Merged
