
How do I run the Ollama [Bug]: #2777

Open
FaizanMunsaf opened this issue Jun 28, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@FaizanMunsaf

What happened?

I am trying to run Ollama, but I couldn't find a specific command that runs an Ollama private model.
Is there any further guidance available in the current repo?

Relevant log output

I'm just looking to get Ollama working in my bot; I want to generate answers using the Ollama llama3-8b model!

Twitter / LinkedIn details

No response

FaizanMunsaf added the bug (Something isn't working) label on Jun 28, 2024

dosubot bot commented Jun 28, 2024

Hey @FaizanMunsaf! I'm here to help you with the Ollama issue you're facing. I'm a bot designed to assist with bugs, questions, and contributions. Let's work on resolving this together.

To run the Ollama private model using the current repository, follow these steps:

  1. Install Ollama:
    Install Ollama from their website.

  2. Run Ollama in the background:
    Execute the following command to run Ollama:

    ollama run llama2
  3. Run Migrations:
    Move the migration file and reset the database:

    mv supabase/migrations/local_20240107152745_ollama.sql supabase/migrations/20240107152745_ollama.sql
    supabase db reset
  4. Add Ollama Model to Quivr:
    Update the user_settings table in Supabase to include the Ollama models:

    [
      "ollama/llama2",
      "ollama/mistral"
    ]

    If you want to add it to your user only, you can add it to the models column in the user_settings table. If you want to set it as the default for all new users, clear the table with the following command so the defaults are re-applied:

    DELETE FROM user_settings;
  5. Configure .env File:
    Update your .env file to activate the Ollama API by uncommenting and setting the OLLAMA_API_BASE_URL:

    # LOCAL
    OLLAMA_API_BASE_URL=http://host.docker.internal:11434
  6. Integrate with Quivr:
    Ensure that the QuivrRAG class in quivr_rag.py is properly configured to use the Ollama model. The ChatLiteLLM class should be instantiated with the api_base parameter set to the OLLAMA_API_BASE_URL if the model starts with "ollama" (a standalone sketch of this call follows these steps). This is handled in the _create_llm method of the QuivrRAG class:

    def _create_llm(
        self,
        callbacks,
        model,
        streaming=False,
        temperature=0,
    ) -> BaseLLM:
        if streaming and callbacks is None:
            raise ValueError(
                "Callbacks must be provided when using streaming language models"
            )
    
        api_base = None
        if self.brain_settings.ollama_api_base_url and model.startswith("ollama"):
            api_base = self.brain_settings.ollama_api_base_url
    
        return ChatLiteLLM(
            temperature=temperature,
            max_tokens=self.max_tokens,
            model=model,
            streaming=streaming,
            verbose=False,
            callbacks=callbacks,
            api_base=api_base,
        )

By following these steps, you will have the Ollama model running and integrated with Quivr [1][2][3].
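
For reference, here is a minimal standalone sketch of what that ChatLiteLLM call does with an Ollama model. It is not taken from the Quivr codebase; the URL and model name are assumptions for a local install:

    from langchain_community.chat_models import ChatLiteLLM

    # litellm routes "ollama/..." model names to the Ollama provider, and
    # api_base points it at the server configured in OLLAMA_API_BASE_URL.
    llm = ChatLiteLLM(
        model="ollama/llama2",
        api_base="http://host.docker.internal:11434",  # or http://localhost:11434 outside Docker
        temperature=0,
        max_tokens=1000,
    )

    print(llm.invoke("Say hello in one sentence.").content)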

To continue talking to Dosu, mention @dosu.

@roaidev

roaidev commented Jun 30, 2024

The real problem is getting it to work with llama3. How do we do that?

@tomgeorgekap

How do I run it with llama3?

@c4801725870

c4801725870 commented Jul 28, 2024

If you look at the "brains" table, there is a "model" column, and they don't give you a way to change the model initially; the code will default to gpt-3.5-turbo-0125 if this value is NULL. So put "ollama/llama2" in the row for your brain under the model column. Unfortunately, I get a new error, openai.NotFoundError: 404 page not found... it looks like it is still trying to use OpenAI for some reason.

Edit: the reason was that the API base URL is different for actually chatting with the bot; look at my post below for the solution. I don't know if this was a coincidence, because the Ollama API is now compatible with the OpenAI format. And at first glance it looks like the code is still hardcoded to use ChatOpenAI, see backend/core/quivr_core/llm/llm_endpoint.py:

    _llm = ChatOpenAI(
        model=config.model,
        api_key=SecretStr(config.llm_api_key) if config.llm_api_key else None,
        base_url=config.llm_base_url,
    )

It works anyway; hopefully the authors don't intentionally keep breaking the Ollama functionality.
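
Since Ollama exposes an OpenAI-compatible API under /v1, that hardcoded ChatOpenAI client can still reach a local model if its base_url points there. A minimal sketch, assuming langchain_openai is installed, Ollama is running locally, and llama2 has been pulled (this is not the exact Quivr code):

    from langchain_openai import ChatOpenAI
    from pydantic import SecretStr

    llm = ChatOpenAI(
        model="llama2",                        # bare model name, no "ollama/" prefix
        api_key=SecretStr("ollama"),           # Ollama ignores the key, but the client requires one
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    )

    print(llm.invoke("Say hello in one sentence.").content)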

@c4801725870

I got further along. The 404 error comes from quivr_api/modules/rag_service/rag_service.py: change line 92 and append + "/v1" to ollama_url in the llm_base_url assignment:

            llm_base_url=ollama_url + "/v1",

Now I get a different error; it just says the model is not found:

| openai.NotFoundError: Error code: 404 - {'error': {'message': 'model "ollama/llama2" not found, try pulling it first', 'type': 'api_error', 'param': None, 'code': None}}

So the API is working; we just sent it a bad model name. It probably expects "llama2".

So change line 91:

            model="llama2",

Hardcode it; who cares for now.

Thus, it is now working; it is complete.

(screenshot)

@c4801725870

The real problem is getting it to work with llama3. How do we do that?

As long as the API works the same, you literally just type llama3 wherever you put llama2 before.

@PFLigthart

@c4801725870 I have tried and still cannot seem to get it to work. Here is my complete process I just followed:

  • Already have ollama installed and have llama3.1 locally which works and is running in a terminal

  • Download docker desktop (I got 4.32.0)

  • git clone https://github.com/quivrhq/quivr.git && cd quivr

  • cp .env.example .env

  • vim .env

  • Comment OPENAI_API_KEY, uncomment Ollama

  • vim docker-compose.yml

  • Change line 53 from backend-base:latest to stangirard/quivr-backend-prebuilt:latest

  • cd backend && supabase start

  • cd ../

  • docker compose pull

  • docker compose up

  • Go to localhost:3000

  • Sign in with admin@quivr.app -> password: admin

  • Create username and fill in the details on the Welcome to quivr popup

  • Go to localhost:54323

  • Update user_settings table models column to read ["gpt-3.5-turbo-0125", "gpt-4-0125-preview", "ollama/llama3.1"]

  • Go to quivr/api/quivr_api/modules/rag_service/rag_service.py, change line 92 and add + "/v1" to ollama_url in the llm_base_url assignment. (as indicated above)

  • Also change line 91 to read model="llama3.1"

  • Create a new brain using the auto-popup after the first sign-in

  • Brain seems to be created correctly with a green popup confirmation (did change the brain to use ollama/llama3.1 from drop down)

  • Data upload also seems to work correctly with a green popup

  • Asking a question results in a Red popup "TypeError: network error"

  • Start of terminal error:
    backend-core | [INFO] quivr_api.modules.chat.controller.chat_routes [chat_routes.py:212]: Creating question for chat 9c1eb*814b with brain 926c6a70 of type <class 'uuid.UUID'>
    backend-core | [INFO] quivr_api.modules.chat.controller.chat.utils [utils.py:79]: Model to use: name='gpt-3.5-turbo-0125' price=1 max_input=4000 max_output=1000
    backend-core | [INFO] quivr_api.modules.user.service.user_usage [user_usage.py:97]: User admin@quivr.app request count updated to 5

Seems to still want to use gpt-3.5. Any help is appreciated.
Cheers,

@c4801725870

(quoting @PFLigthart's full setup process above)

My user_settings only includes ollama:

(screenshot)

Also, in quivr_api/modules/chat/controller/chat/utils.py, change line 45:

    default_model = "ollama/llama2"

or whatever model you want.

As you can see, they hardcoded it to gpt-3.5, and there is no consideration for other models.

@c4801725870

@PFLigthart I took another look to see if I can make it possible to switch models. Unfortunately, we run into the issue that the "default model" referenced in quivr_api/modules/chat/controller/chat/utils.py is used at brain creation, because the row in the Brains table defaults "model" to NULL. Also, I thought it would be worth mentioning that you need to add the model to the models table; just add a row for the new model.

I proved this by changing the code to use "brain_model" rather than "default_model". I created a new brain, but it submitted the data to the wrong model (it used the default model).

# If brain.model is None, set it to the default_model
if brain_model is None:
    brain_model = default_model

model_to_use = LLMModel(  # TODO Implement default models in database
    name=brain_model, price=1, max_input=4000, max_output=1000
)

logger.debug("Brain model: %s", brain_model)


is_brain_model_available = any(
    brain_model == model_dict.get("name") for model_dict in models_settings
)

is_user_allowed_model = brain_model in user_settings.get(
    "models", [brain_model]
)  # Checks if the model is available in the list of models

logger.debug(f"Brain model: {brain_model}")
logger.debug(f"User models: {user_settings.get('models', [])}")
logger.debug(f"Model available: {is_brain_model_available}")
logger.debug(f"User allowed model: {is_user_allowed_model}")

if is_brain_model_available and is_user_allowed_model:
    # Use the model from the brain
    model_to_use.name = brain_model
    for model_dict in models_settings:
        if model_dict.get("name") == model_to_use.name:
            model_to_use.price = model_dict.get("price")
            model_to_use.max_input = model_dict.get("max_input")
            model_to_use.max_output = model_dict.get("max_output")
            break

logger.info(f"Model to use: {model_to_use}")

return model_to_use

For quivr_api/modules/rag_service/rag_service.py, change line 91 to:

    model=self.model_to_use.name.split('/')[1],

This last part eliminates having to modify this code in the future; now you just need to edit the default model when creating a new brain.

The bad news so far is that the new code seems really bad at tokenizing the data input when the brain is created. This project needs a lot of work.

@PFLigthart

PFLigthart commented Aug 5, 2024

@c4801725870 this worked. Thank you so much. I was also able to get it to work with llama3 and llama3.1 by hardcoding the relevant entries. Appreciate it.
Cheers.

@caishanli

(quoting @c4801725870's full reply above)

The latest version has no models column in the user_settings table, but it has a new table named models. So I added the Ollama mistral model to this table, but when I ask a question in Quivr, the error is "TypeError: network error". In the Ollama log, it received a POST request with the URI "/chat/completions"; the post data looks OK, with model name "mistral". But as the Ollama README says (https://github.com/ollama/ollama?tab=readme-ov-file#chat-with-a-model), Quivr should post to the URI "/api/chat". So is the URI "/chat/completions" correct?
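
For context on the two URIs: Ollama serves both its native chat endpoint at /api/chat and an OpenAI-compatible one at /v1/chat/completions, so a POST to /chat/completions is only correct when the base URL already ends in /v1. A minimal sketch of the difference (the model name is illustrative and assumed to be pulled already):

    import requests

    BASE = "http://localhost:11434"
    payload = {
        "model": "mistral",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    }

    # Native Ollama API
    native = requests.post(f"{BASE}/api/chat", json=payload)
    print(native.json()["message"]["content"])

    # OpenAI-compatible API
    compat = requests.post(f"{BASE}/v1/chat/completions", json=payload)
    print(compat.json()["choices"][0]["message"]["content"])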

@c4801725870

@caishanli You need to make sure the model in the models table has ollama/ as part of the string, literally ollama/mistral for example. While reading the code, I noticed it only detects Ollama if it actually finds "ollama" in the model string; simply putting mistral as the model will be detected as GPT and will probably use the wrong API, which is what you experienced. This is also why line 91 needs to use the string split method to extract the model name: if I had just put llama3 instead of ollama/llama3, it would not hit that branch of the code. Again, this project needs a lot of work and you need to carefully examine the code to get it functioning. It is not in the authors' interest to produce a repo that allows full offline functionality out of the box for free.
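
A minimal sketch of the routing behavior described above (illustrative only; the real checks live in Quivr's rag_service and chat utils code):

    def resolve_model(model_name: str) -> tuple[bool, str]:
        """Return (uses_ollama_branch, name_to_send_to_the_api)."""
        uses_ollama = model_name.startswith("ollama")  # only "ollama/..." hits the Ollama code path
        bare_name = model_name.split("/", 1)[1] if "/" in model_name else model_name
        return uses_ollama, bare_name

    print(resolve_model("ollama/mistral"))  # (True, 'mistral')  -> routed to the Ollama base URL
    print(resolve_model("mistral"))         # (False, 'mistral') -> treated as an OpenAI/GPT model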

@caishanli

(quoting @c4801725870's reply above)

Thank you for your reply! I tested ollama/mistral, and the result is the same. The main problem is that Quivr sends the wrong post data to Ollama; in short, Quivr uses the wrong Ollama REST API, so Ollama can't reply correctly.
Also, the code is not the same as the , so how did the pull request author test this code?
