
How do I run the Ollama [Bug]: #2777

Open
FaizanMunsaf opened this issue Jun 28, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@FaizanMunsaf

What happened?

I am trying to run Ollama, but I couldn't find a specific command that runs an Ollama private model.
Is there any further guidance available in the current repo?

Relevant log output

I'm just looking to get Ollama working in my bot; I want to generate answers using the Ollama llama3-8b model!

Twitter / LinkedIn details

No response

FaizanMunsaf added the bug (Something isn't working) label on Jun 28, 2024

dosubot bot commented Jun 28, 2024

Hey @FaizanMunsaf! I'm here to help you with the Ollama issue you're facing. I'm a bot designed to assist with bugs, questions, and contributions. Let's work on resolving this together.

To run the Ollama private model using the current repository, follow these steps:

  1. Install Ollama:
    Install Ollama from their website.

  2. Run Ollama in the background:
    Execute the following command to run Ollama:

    ollama run llama2
  3. Run Migrations:
    Move the migration file and reset the database:

    mv supabase/migrations/local_20240107152745_ollama.sql supabase/migrations/20240107152745_ollama.sql
    supabase db reset
  4. Add Ollama Model to Quivr:
    Update the user_settings table in Supabase to include the Ollama models:

    [
      "ollama/llama2",
      "ollama/mistral"
    ]

    If you want to add it to your user only, you can add it to the models column in the user_settings table. If you want to set it as the default for all new users, clear the table with the following command so the defaults are re-applied:

    DELETE FROM user_settings;
  5. Configure .env File:
    Update your .env file to activate the Ollama API by uncommenting and setting the OLLAMA_API_BASE_URL:

    # LOCAL
    OLLAMA_API_BASE_URL=http://host.docker.internal:11434
  6. Integrate with Quivr:
    Ensure that the QuivrRAG class in quivr_rag.py is properly configured to use the Ollama model. The ChatLiteLLM class should be instantiated with the api_base parameter set to the OLLAMA_API_BASE_URL if the model starts with "ollama" (a standalone sketch of this call follows these steps). This is handled in the _create_llm method of the QuivrRAG class:

    def _create_llm(
        self,
        callbacks,
        model,
        streaming=False,
        temperature=0,
    ) -> BaseLLM:
        if streaming and callbacks is None:
            raise ValueError(
                "Callbacks must be provided when using streaming language models"
            )
    
        api_base = None
        if self.brain_settings.ollama_api_base_url and model.startswith("ollama"):
            api_base = self.brain_settings.ollama_api_base_url
    
        return ChatLiteLLM(
            temperature=temperature,
            max_tokens=self.max_tokens,
            model=model,
            streaming=streaming,
            verbose=False,
            callbacks=callbacks,
            api_base=api_base,
        )

By following these steps, you will have the Ollama model running and integrated with Quivr [1][2][3].
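
For reference, here is a minimal standalone sketch of what that ChatLiteLLM call does with an Ollama model. It is not taken from the Quivr codebase; the URL and model name are assumptions for a local install:

    from langchain_community.chat_models import ChatLiteLLM

    # litellm routes "ollama/..." model names to the Ollama provider, and
    # api_base points it at the server configured in OLLAMA_API_BASE_URL.
    llm = ChatLiteLLM(
        model="ollama/llama2",
        api_base="http://host.docker.internal:11434",  # or http://localhost:11434 outside Docker
        temperature=0,
        max_tokens=1000,
    )

    print(llm.invoke("Say hello in one sentence.").content)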

To continue talking to Dosu, mention @dosu.

@roaidev

roaidev commented Jun 30, 2024

The real problem is getting it to work with llama3. How do we do that?

@tomgeorgekap

How do I run it with llama3?

@c4801725870

c4801725870 commented Jul 28, 2024

If you look at the "brains" table, there is a "model" column, and they don't give you a way to change the model initially; the code will default to gpt-3.5-turbo-0125 if this value is NULL. So put "ollama/llama2" in the row for your brain under the model column. Unfortunately, I get a new error, openai.NotFoundError: 404 page not found... it looks like it is still trying to use OpenAI for some reason.

Edit: the reason was that the API base URL is different for actually chatting with the bot; look at my post below for the solution. I don't know if this was a coincidence, because the Ollama API is now compatible with the OpenAI format. And at first glance it looks like the code is still hardcoded to use ChatOpenAI, see backend/core/quivr_core/llm/llm_endpoint.py:

    _llm = ChatOpenAI(
        model=config.model,
        api_key=SecretStr(config.llm_api_key) if config.llm_api_key else None,
        base_url=config.llm_base_url,
    )

It works anyway; hopefully the authors don't intentionally keep breaking the Ollama functionality.
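
Since Ollama exposes an OpenAI-compatible API under /v1, that hardcoded ChatOpenAI client can still reach a local model if its base_url points there. A minimal sketch, assuming langchain_openai is installed, Ollama is running locally, and llama2 has been pulled (this is not the exact Quivr code):

    from langchain_openai import ChatOpenAI
    from pydantic import SecretStr

    llm = ChatOpenAI(
        model="llama2",                        # bare model name, no "ollama/" prefix
        api_key=SecretStr("ollama"),           # Ollama ignores the key, but the client requires one
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    )

    print(llm.invoke("Say hello in one sentence.").content)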

@c4801725870

I got further along. The 404 error comes from quivr_api/modules/rag_service/rag_service.py: change line 92 and append + "/v1" to ollama_url in the llm_base_url assignment:

            llm_base_url=ollama_url + "/v1",

Now I get a different error; it just says the model is not found:

| openai.NotFoundError: Error code: 404 - {'error': {'message': 'model "ollama/llama2" not found, try pulling it first', 'type': 'api_error', 'param': None, 'code': None}}

So the API is working; we just sent it a bad model name. It probably expects "llama2".

So change line 91:

            model="llama2",

Hardcode it; who cares for now.

Thus, it is now working; it is complete.

(screenshot)

@c4801725870

The real problem is getting it to work with llama3. How do we do that?

As long as the API works the same, you literally just type llama3 wherever you put llama2 before.

@PFLigthart

@c4801725870 I have tried and still cannot seem to get it to work. Here is my complete process I just followed:

  • Already have ollama installed and have llama3.1 locally which works and is running in a terminal

  • Download docker desktop (I got 4.32.0)

  • git clone https://github.com/quivrhq/quivr.git && cd quivr

  • cp .env.example .env

  • vim .env

  • Comment OPENAI_API_KEY, uncomment Ollama

  • vim docker-compose.yml

  • Change line 53 from backend-base:latest to stangirard/quivr-backend-prebuilt:latest

  • cd backend && supabase start

  • cd ../

  • docker compose pull

  • docker compose up

  • Go to localhost:3000

  • Sign in with admin@quivr.app -> password: admin

  • Create username and fill in the details on the Welcome to quivr popup

  • Go to localhost:54323

  • Update user_settings table models column to read ["gpt-3.5-turbo-0125", "gpt-4-0125-preview", "ollama/llama3.1"]

  • Go to quivr/api/quivr_api/modules/rag_service/rag_service.py, change line 92 and add + "/v1" to ollama_url in the llm_base_url assignment. (as indicated above)

  • Also change line 91 to read model="llama3.1"

  • Create a new brain using the auto-popup after the first sign-in

  • Brain seems to be created correctly with a green popup confirmation (did change the brain to use ollama/llama3.1 from drop down)

  • Data upload also seems to work correctly with a green popup

  • Asking a question results in a Red popup "TypeError: network error"

  • Start of terminal error:
    backend-core | [INFO] quivr_api.modules.chat.controller.chat_routes [chat_routes.py:212]: Creating question for chat 9c1eb*814b with brain 926c6a70 of type <class 'uuid.UUID'>
    backend-core | [INFO] quivr_api.modules.chat.controller.chat.utils [utils.py:79]: Model to use: name='gpt-3.5-turbo-0125' price=1 max_input=4000 max_output=1000
    backend-core | [INFO] quivr_api.modules.user.service.user_usage [user_usage.py:97]: User admin@quivr.app request count updated to 5

Seems to still want to use gpt-3.5. Any help is appreciated.
Cheers,

@c4801725870

(quoting @PFLigthart's full setup process above)

My user_settings only includes ollama:

(screenshot)

Also, in quivr_api/modules/chat/controller/chat/utils.py, change line 45:

    default_model = "ollama/llama2"

or whatever model you want.

As you can see, they hardcoded it to gpt-3.5, and there is no consideration for other models.

@c4801725870

@PFLigthart I took another look to see if I can make it possible to switch models. Unfortunately, we run into the issue that the "default model" referenced in quivr_api/modules/chat/controller/chat/utils.py is used at brain creation, because the row in the Brains table defaults "model" to NULL. Also, I thought it would be worth mentioning that you need to add the model to the models table; just add a row for the new model.

I proved this by changing the code to use "brain_model" rather than "default_model". I created a new brain, but it submitted the data to the wrong model (it used the default model).

# If brain.model is None, set it to the default_model
if brain_model is None:
    brain_model = default_model

model_to_use = LLMModel(  # TODO Implement default models in database
    name=brain_model, price=1, max_input=4000, max_output=1000
)

logger.debug("Brain model: %s", brain_model)


is_brain_model_available = any(
    brain_model == model_dict.get("name") for model_dict in models_settings
)

is_user_allowed_model = brain_model in user_settings.get(
    "models", [brain_model]
)  # Checks if the model is available in the list of models

logger.debug(f"Brain model: {brain_model}")
logger.debug(f"User models: {user_settings.get('models', [])}")
logger.debug(f"Model available: {is_brain_model_available}")
logger.debug(f"User allowed model: {is_user_allowed_model}")

if is_brain_model_available and is_user_allowed_model:
    # Use the model from the brain
    model_to_use.name = brain_model
    for model_dict in models_settings:
        if model_dict.get("name") == model_to_use.name:
            model_to_use.price = model_dict.get("price")
            model_to_use.max_input = model_dict.get("max_input")
            model_to_use.max_output = model_dict.get("max_output")
            break

logger.info(f"Model to use: {model_to_use}")

return model_to_use

For quivr_api/modules/rag_service/rag_service.py, change line 91 to:

    model=self.model_to_use.name.split('/')[1],

This last part eliminates having to modify this code in the future; now you just need to edit the default model when creating a new brain.

The bad news so far is that the new code seems really bad at tokenizing the data input when the brain is created. This project needs a lot of work.

@PFLigthart

PFLigthart commented Aug 5, 2024

@c4801725870 this worked. Thank you so much. I was also able to get it to work with llama3 and llama3.1 by hardcoding the relevant entries. Appreciate it.
Cheers.

@caishanli

(quoting @c4801725870's full reply above)

The latest version has no models column in the user_settings table, but it has a new table named models. So I added the Ollama mistral model to this table, but when I ask a question in Quivr, the error is "TypeError: network error". In the Ollama log, it received a POST request with the URI "/chat/completions"; the post data looks OK, with model name "mistral". But as the Ollama README says (https://github.com/ollama/ollama?tab=readme-ov-file#chat-with-a-model), Quivr should post to the URI "/api/chat". So is the URI "/chat/completions" correct?
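
For context on the two URIs: Ollama serves both its native chat endpoint at /api/chat and an OpenAI-compatible one at /v1/chat/completions, so a POST to /chat/completions is only correct when the base URL already ends in /v1. A minimal sketch of the difference (the model name is illustrative and assumed to be pulled already):

    import requests

    BASE = "http://localhost:11434"
    payload = {
        "model": "mistral",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    }

    # Native Ollama API
    native = requests.post(f"{BASE}/api/chat", json=payload)
    print(native.json()["message"]["content"])

    # OpenAI-compatible API
    compat = requests.post(f"{BASE}/v1/chat/completions", json=payload)
    print(compat.json()["choices"][0]["message"]["content"])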

@c4801725870

@caishanli You need to make sure the model in the models table has ollama/ as part of the string, literally ollama/mistral for example. While reading the code, I noticed it only detects Ollama if it actually finds "ollama" in the model string; simply putting mistral as the model will be detected as GPT and will probably use the wrong API, which is what you experienced. This is also why line 91 needs to use the string split method to extract the model name: if I had just put llama3 instead of ollama/llama3, it would not hit that branch of the code. Again, this project needs a lot of work and you need to carefully examine the code to get it functioning. It is not in the authors' interest to produce a repo that allows full offline functionality out of the box for free.
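
A minimal sketch of the routing behavior described above (illustrative only; the real checks live in Quivr's rag_service and chat utils code):

    def resolve_model(model_name: str) -> tuple[bool, str]:
        """Return (uses_ollama_branch, name_to_send_to_the_api)."""
        uses_ollama = model_name.startswith("ollama")  # only "ollama/..." hits the Ollama code path
        bare_name = model_name.split("/", 1)[1] if "/" in model_name else model_name
        return uses_ollama, bare_name

    print(resolve_model("ollama/mistral"))  # (True, 'mistral')  -> routed to the Ollama base URL
    print(resolve_model("mistral"))         # (False, 'mistral') -> treated as an OpenAI/GPT model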

@caishanli

(quoting @c4801725870's reply above)

Thank you for your reply! I tested ollama/mistral, and the result is the same. The main problem is that Quivr sends the wrong post data to Ollama; in short, Quivr uses the wrong Ollama REST API, so Ollama can't reply correctly.
Also, the code is not the same as the , so how did the pull request author test this code?
