Can't deploy on multi-node cluster #5

Open
mausch opened this issue Apr 26, 2024 · 2 comments

mausch commented Apr 26, 2024

When deploying on a multi-node cluster (EKS in my case, but I guess it could be any other), there's a PVC clash between the model store and the model pod.
The model pod gets this error and cannot start:

Multi-Attach error for volume "pvc-63d894e9-1945-4ec7-988f-0fc6a08adc1a" Volume is already used by pod(s) ollama-models-store-0-x-ollama-operator-system-x-vcl-4497a69570
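
For context, this error follows from the volume's access mode: a `ReadWriteOnce` (RWO) volume can only be attached to one node at a time, so the clash appears as soon as the store pod and the model pod are scheduled onto different nodes. A minimal sketch of that rule in Python; the function name and the pod-to-node mapping are hypothetical and only illustrate the reasoning, they are not operator code:

```python
def multi_attach_conflict(access_modes, pod_nodes):
    """Return True if pods sharing one volume would hit a Multi-Attach error.

    access_modes: access modes declared on the PVC, e.g. ["ReadWriteOnce"].
    pod_nodes: mapping of pod name -> node name for every pod mounting the PVC.
    (ReadWriteOncePod is not modeled in this sketch.)
    """
    # ReadWriteMany / ReadOnlyMany volumes may be mounted from several nodes.
    if "ReadWriteMany" in access_modes or "ReadOnlyMany" in access_modes:
        return False
    # ReadWriteOnce allows several pods, but only when they share one node.
    return len(set(pod_nodes.values())) > 1


# Store and model pod on different nodes with an RWO volume: conflict.
print(multi_attach_conflict(
    ["ReadWriteOnce"],
    {"ollama-models-store-0": "node-a", "model-pod": "node-b"},
))
```

On a single-node cluster both pods always share the node, which would explain why the problem only shows up on multi-node clusters.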
@nekomeowww nekomeowww self-assigned this Apr 26, 2024
@nekomeowww nekomeowww added the bug Something isn't working label Apr 26, 2024

aep commented Jun 28, 2024

As far as I understand, the shared storage is required because one pod downloads the models and the other runs them.

RWX storage is commonly NFS, which is slow and buggy.

A quick and easy solution might be to make the model storage a DaemonSet,
and have the model pod contact the node-local instance.
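
A rough sketch of what the DaemonSet idea could look like, written as Python dicts that mirror Kubernetes manifests. Everything specific here is an assumption made for illustration: the `ollama/ollama` image, port 11434, the `hostPath` location, and reaching the store through `hostPort` plus the `status.hostIP` downward-API field. The operator may well wire this differently.

```python
STORE_PORT = 11434  # assumed: Ollama's default port


def store_daemonset(image="ollama/ollama"):
    """One store pod per node, keeping models on node-local disk (assumed layout)."""
    labels = {"app": "ollama-models-store"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "ollama-models-store"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "store",
                        "image": image,
                        # hostPort exposes the store on the node's own IP.
                        "ports": [{"containerPort": STORE_PORT, "hostPort": STORE_PORT}],
                        "volumeMounts": [{"name": "models", "mountPath": "/root/.ollama"}],
                    }],
                    "volumes": [{
                        "name": "models",
                        # Hypothetical node-local path; no PVC is shared across nodes.
                        "hostPath": {"path": "/var/lib/ollama", "type": "DirectoryOrCreate"},
                    }],
                },
            },
        },
    }


def node_local_store_env():
    """Env vars for the model pod so it talks to the store on its own node."""
    return [
        # Downward API: IP of the node this pod is scheduled on.
        {"name": "NODE_IP", "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}},
        # $(NODE_IP) is expanded by Kubernetes from the variable defined above.
        {"name": "OLLAMA_HOST", "value": f"$(NODE_IP):{STORE_PORT}"},
    ]
```

The trade-off in this sketch is that each node downloads its own copy of a model, in exchange for not needing RWX storage at all.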

ilyapaff commented

Same problem here.

Either the use of RWO should be prohibited, or the documentation should state that this works only within a single node.

Sharing an RWO volume is a mistake from the outset, since a Kubernetes cluster usually consists of several nodes.

One solution may be to fetch the model from the store over the network, without a shared disk.

Another solution is to store the model in the Model workload itself, without deploying a separate store.
Each Model would have its own PVC, download the model into it on first launch, and keep the PVC after the Model CR is deleted (or not keep it; it's unclear why a cached model is needed once the Model has been deleted).
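
A minimal sketch of the per-Model PVC idea, again as a Python dict mirroring a Kubernetes manifest. The naming scheme, the default size, and the helper itself are hypothetical; nothing here is taken from the operator's code.

```python
import re


def model_pvc_manifest(model_name, namespace="default", size="10Gi"):
    """Build a PersistentVolumeClaim manifest dedicated to a single Model."""
    # Model tags such as "llama3:8b" contain characters that are not valid
    # in Kubernetes object names, so reduce them to lowercase and dashes.
    safe_name = re.sub(r"[^a-z0-9-]+", "-", model_name.lower()).strip("-")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"ollama-model-{safe_name}",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            # RWO suffices when only this Model's own pod mounts the claim.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


pvc = model_pvc_manifest("llama3:8b")
print(pvc["metadata"]["name"])
```

Note that this only avoids the clash while a Model runs as a single pod; several replicas spread across nodes would hit the same RWO limit on the per-Model claim.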
