This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

feat: multiple sessions with thread-local loaded models #134

Closed
jon-chuang opened this issue Apr 12, 2023 · 0 comments

Comments


jon-chuang commented Apr 12, 2023

This is best supported with an mmapped ggjt file, which gives multiple sessions shared read-only access to the model weights. Will be enabled by: #129

Use case:

  1. Low-resource server-side inference of multiple user contexts
    • I think this could save people a lot of money if they don't have to run server-side inference on GPUs.

See: #95 (comment)

@jon-chuang closed this as not planned on Apr 12, 2023