
Switch to cache_resource for Document Index #54

Merged
merged 1 commit into from
May 20, 2024

Conversation

JoepdeJong
Contributor

This PR fixes the issue of a missing _model attribute on the Index after it is loaded from cache.

It seems that a different caching decorator is better suited here.

As described in the Streamlit caching docs (https://docs.streamlit.io/develop/concepts/architecture/caching):

> st.cache_resource is the recommended way to cache global resources like ML models or database connections – unserializable objects that you don't want to load multiple times. Using it, you can share these resources across all reruns and sessions of an app without copying or duplication. Note that any mutations to the cached return value directly mutate the object in the cache (more details below).
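The difference the docs describe can be sketched in plain Python. The two decorators below are simplified stand-ins for Streamlit's behavior, not its actual implementation: `cache_data` hands back a deserialized copy on every rerun, while `cache_resource` hands back the same object every time.

```python
import pickle

def cache_data(fn):
    """Stand-in for st.cache_data: store a pickled snapshot, return a fresh copy per call."""
    store = {}
    def wrapper(*args):
        if args not in store:
            store[args] = pickle.dumps(fn(*args))
        return pickle.loads(store[args])  # deserialization produces a new object
    return wrapper

def cache_resource(fn):
    """Stand-in for st.cache_resource: store the object itself, return the same one every call."""
    store = {}
    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]
    return wrapper

@cache_data
def load_copy(name):
    return {"name": name}

@cache_resource
def load_shared(name):
    return {"name": name}

assert load_copy("idx") is not load_copy("idx")  # a new copy on each rerun
assert load_shared("idx") is load_shared("idx")  # one shared instance across reruns
```

Under the copy-based semantics, anything that does not survive serialization (such as a private handle to a loaded model) is lost in the returned copy, which is the failure mode this PR addresses.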

Closes #53

@JoepdeJong JoepdeJong changed the title Swich to cache_resource for Document Index Switch to cache_resource for Document Index May 20, 2024
@jonfairbanks jonfairbanks self-assigned this May 20, 2024
@jonfairbanks
Owner

Thank you for this PR! Caching has definitely been a headache here.

@jonfairbanks jonfairbanks merged commit a7a6808 into jonfairbanks:develop May 20, 2024
@jonfairbanks
Owner

jonfairbanks commented May 24, 2024

Actually since we are using _documents here, the underscore tells Streamlit to not cache that particular resource. Removing the underscore will result in an error from Streamlit.

I'll merge this up to the main branch but technically nothing is being cached in this function.

@JoepdeJong
Contributor Author

> Actually since we are using _documents here, the underscore tells Streamlit to not cache that particular resource. Removing the underscore will result in an error from Streamlit.
>
> I'll merge this up to the main branch but technically nothing is being cached in this function.

Placing an underscore in front of a parameter to exclude it from caching is, as far as I know, only needed because other parameters must be hashable (https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters).

Since cache_resource does not create a copy but returns the same value every time, no hashing of the return value is required for this decorator. From the docs (https://docs.streamlit.io/develop/concepts/architecture/caching#behavior-1):

> Not creating a copy means there's just one global instance of the cached return object, which saves memory, e.g. when using a large ML model. In computer science terms, we create a singleton.

This should also explain why _model is missing when using @st.cache_data.
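The underscore rule itself can also be sketched in plain Python. This is a simplified model of Streamlit's behavior (not its actual implementation): parameters whose names start with `_` are left out of the cache key, so they never need to be hashable. The function name `build_index` and its parameters are hypothetical.

```python
import functools
import inspect

def cache_resource(fn):
    """Simplified model of Streamlit's rule: parameters whose names start
    with "_" are excluded from the cache key, so they may be unhashable."""
    store = {}
    sig = inspect.signature(fn)
    keyed = [name for name in sig.parameters if not name.startswith("_")]

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        # Only non-underscore parameters contribute to the cache key.
        key = tuple(bound.arguments.get(name) for name in keyed)
        if key not in store:
            store[key] = fn(*args, **kwargs)
        return store[key]

    return wrapper

@cache_resource
def build_index(name, _documents):
    # _documents (e.g. a list of unhashable dicts) is ignored for the key
    return {"name": name, "count": len(_documents)}

first = build_index("docs", [{"a": 1}])
second = build_index("docs", [{"a": 1}, {"b": 2}])  # same key ("docs",) -> cached value
assert first is second
```

Passing the unhashable `_documents` list works precisely because it never reaches the hashing step, which matches the documented behavior of underscore-prefixed parameters.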

Successfully merging this pull request may close these issues.

AttributeError: 'HuggingFaceEmbedding' object has no attribute '_model'