This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

feat: multiple sessions with thread-local loaded models #134

Closed
jon-chuang opened this issue Apr 12, 2023 · 0 comments

Comments


jon-chuang commented Apr 12, 2023

This is best supported with an mmapped ggjt file, which gives multiple sessions shared read-only access to the model weights. Will be enabled by: #129

Use case:

  1. Low-resource server-side inference of multiple user contexts
    • I think this could save people a lot of money if they don't have to run server-side inference on GPUs.

See: #95 (comment)

@jon-chuang closed this as not planned on Apr 12, 2023