Spring AI RAG implementation and related projects

This project is the core project where we will be testing various features of the Spring AI framework. We will be listening to an AWS SQS queue to consume messages. These will be unpacked and saved into our vector database, which will allow us to use RAG to enhance our customer query data.
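As a rough sketch of that flow, the listener below consumes a message from SQS and stores it in the vector store. It assumes Spring Cloud AWS for the @SqsListener support and the Spring AI VectorStore abstraction; the class name and queue name are illustrative, not the project's actual code:

import io.awspring.cloud.sqs.annotation.SqsListener;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Map;

@Component
public class CustomerQueryListener {

    private final VectorStore vectorStore;

    public CustomerQueryListener(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Consume a raw customer message from SQS (queue name is illustrative),
    // wrap it in a Spring AI Document and persist it to the vector store,
    // where it becomes available for RAG similarity searches.
    @SqsListener("customer-query-queue")
    public void onMessage(String payload) {
        vectorStore.add(List.of(new Document(payload, Map.of("source", "sqs"))));
    }
}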

Getting Your Development Environment Set Up

Recommended Versions

| Recommended | Reference | Notes |
| ----------- | --------- | ----- |
| Java 23 JDK | sdk install java 23-zulu | Java 23 will be used in these projects |
| IntelliJ 2024 or higher | Download | Ultimate Edition recommended. Students can get a free 120-day trial license here |
| Maven 3.9.6 or higher | Download | Installation Instructions |
| Docker | Installation Instructions | |
| Ollama | Download | Installation Instructions |
| Weather Service | Installation Instructions | |

Set up the local environment to run this application

Start Ollama with the llama3.2 local LLM:

 ollama run llama3.2

Once Ollama is running, you can start the Spring Boot application. Also, the ~/.ollama/logs folder contains a server.log file. Open it and look for the following entries:

llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 3B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 3B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 28
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 3072

Here we have the embedding_length. The same value must be used when the vector database properties are set in the application.yaml file, otherwise you will have chunk sizing issues when reading the data back. Take note that this value is fixed when the vector_store database table is created! Another way to get these parameters is to use a curl command:

curl http://localhost:11434/api/show -d '{
  "name": "llama3.2"
}'

Format the response as proper JSON and you will find these properties:

    "general.type": "model",
    "llama.attention.head_count": 24,
    "llama.attention.head_count_kv": 8,
    "llama.attention.key_length": 128,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.attention.value_length": 128,
    "llama.block_count": 28,
    "llama.context_length": 131072,
    "llama.embedding_length": 3072,
    "llama.feed_forward_length": 8192,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,

Choosing the right model is important. One of the limitations is the embedding dimension supported by the PGVector DB, so we need the right embedding model. In this app we will be using the mxbai-embed-large model. Reference on the following page: Ollama Embedding Models

The embedding size is now 1024. This is well within the 2000-dimension PGVector limit, and the model is supported by Ollama. To pull the model for use in Ollama, use the following command:

ollama pull mxbai-embed-large

You can test the embedding model by using the following command:

curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Llamas are members of the camelid family"
}'
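With the embedding model pulled, the Ollama and PGVector pieces can be wired together in application.yaml roughly as follows. The property names follow the Spring AI Ollama and PGVector starters, but exact keys have varied between Spring AI versions, so treat this as a sketch to verify against the version in use:

spring:
  ai:
    ollama:
      chat:
        options:
          model: llama3.2
      embedding:
        options:
          model: mxbai-embed-large
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024   # must match the embedding model's embedding_length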

Start Open WebUI so we can get a UI for Ollama:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Start the PGVector DB:

docker run -d --name postgres -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres pgvector/pgvector:0.7.4-pg16

Configure the PGVector DB indexes: PGVector details

We also have a JPA entity that maps onto the vector table, so run the following to initialize the table so that JPA is happy (a sketch of such an entity follows the SQL below).

-- The vector type comes from the pgvector extension and
-- uuid_generate_v4() from uuid-ossp, so enable both first.
create extension if not exists vector;
create extension if not exists "uuid-ossp";

create table public.vector_store
(
    id        uuid default uuid_generate_v4() not null primary key,
    content   text,
    metadata  json,
    embedding vector(1024)
);

alter table public.vector_store
    owner to customerai;

create index spring_ai_vector_index
    on public.vector_store using hnsw (embedding public.vector_cosine_ops);
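For reference, such an entity could look roughly like the sketch below. It assumes Hibernate 6.4+ with its vector support for the embedding column mapping; the class and field names are illustrative and the project's actual entity may differ:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import org.hibernate.annotations.Array;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

import java.util.Map;
import java.util.UUID;

@Entity
@Table(name = "vector_store")
public class VectorStoreEntry {

    @Id
    @GeneratedValue
    private UUID id;

    @Column(columnDefinition = "text")
    private String content;

    // maps the json column via Hibernate's JSON JdbcType
    @JdbcTypeCode(SqlTypes.JSON)
    private Map<String, Object> metadata;

    // maps the vector(1024) column; requires Hibernate's pgvector support
    @JdbcTypeCode(SqlTypes.VECTOR)
    @Array(length = 1024)
    private float[] embedding;

    // getters and setters omitted for brevity
}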

Configure customerai user in PostgreSQL

For this we are simply using a local PostgreSQL instance, so we are not using encrypted passwords or anything fancy.
The following commands will set up the datasource for us:

CREATE DATABASE customerai;
CREATE USER customerai WITH PASSWORD 'customerai';
CREATE SCHEMA IF NOT EXISTS customerai AUTHORIZATION customerai;
GRANT ALL PRIVILEGES ON SCHEMA customerai TO customerai;
ALTER ROLE customerai WITH LOGIN;
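The matching datasource entries in application.yaml would then be along these lines:

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/customerai
    username: customerai
    password: customerai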

https://docs.spring.io/spring-ai/reference/api/vectordbs/pgvector.html

Setting up SpringAI with Ollama

Setting up Functions to be used by the LLM

The first function will use the weather API to retrieve current weather information. This is a free service for which we can get an API key.
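A sketch of how such a function can be registered so the LLM can call it is shown below. The request/response records and the hard-coded response are illustrative placeholders; a real implementation would call the weather API with the configured API key:

import java.util.function.Function;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;

@Configuration
public class WeatherFunctionConfig {

    public record WeatherRequest(String city) {}
    public record WeatherResponse(double temperatureCelsius, String conditions) {}

    // The @Description text is what the LLM uses to decide whether
    // this function is relevant to a given prompt.
    @Bean
    @Description("Get the current weather for a city")
    public Function<WeatherRequest, WeatherResponse> currentWeather() {
        // Illustrative placeholder: call the real weather API here.
        return request -> new WeatherResponse(21.0, "Partly cloudy");
    }
}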

Advisors in Spring AI

Spring AI has the concept of advisors. Advisors are used to provide additional information to the LLM, and to transform the input and output of the LLM. One big use case is sharing context across multiple calls.
Below is a very good article on advisors:
Advisor Implementation
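As a small illustration of the pattern, the sketch below attaches a QuestionAnswerAdvisor to a ChatClient so that each prompt is enriched with similar documents from the vector store (the RAG step). Advisor APIs have shifted between Spring AI milestones, so treat the exact class and builder names as assumptions to verify:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.VectorStore;

public class RagChat {

    private final ChatClient chatClient;

    public RagChat(ChatModel chatModel, VectorStore vectorStore) {
        // The advisor retrieves documents similar to the user question
        // and appends them to the prompt before it reaches the LLM.
        this.chatClient = ChatClient.builder(chatModel)
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
                .build();
    }

    public String ask(String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}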

Reference Documentation

For further reference, please consider the following sections:

Testcontainers support

This project uses Testcontainers at development time.

Testcontainers has been configured to use the following Docker images:

Maven Parent overrides

Due to Maven's design, elements are inherited from the parent POM to the project POM. While most of the inheritance is fine, it also inherits unwanted elements like <license> and <developers> from the parent. To prevent this, the project POM contains empty overrides for these elements. If you manually switch to a different parent and actually want the inheritance, you need to remove those overrides.
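For illustration, the empty overrides in a generated POM typically look like this (the exact elements depend on the Spring Initializr version):

<licenses>
    <license/>
</licenses>
<developers>
    <developer/>
</developers>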
