Plugin: cache hook with diskcache
#684
Merged
Commits (21 total; this view shows changes from 19 commits)
37c4fb4
diskcache hook added
9d15a7a
tests added
51be118
examples/cache_hook added
961aa5f
moved hash_callable to graph_utils
ed46976
sf-hamilton[diskcache] added
5dfe25d
pre-commit hooks applied
d92e78e
added test requirements
f68ba55
test import fixed
162babe
pre-commit fixed
dc330c3
strips comments and docstring before hashing source code
d9e5205
add dynamic module to linecache; inspect can be used
47cf088
strip feature for hashing source code
e195e1f
strip function indent fixed
4040708
pre-commit hook
e48f085
fixed type annotations for 3.8
f1fac0a
docstring remover now uses AST; added tests; added documentation
c0c3747
compare CacheHook to CachingGraphAdapter
1604247
strip whitespaces in attempt to fix 3.8 errors
315c4e7
pre-commit fix (again)
894de21
skip tests in Python 3.8
68ff666
pytest version notation fixed
@@ -0,0 +1,62 @@
# Cache hook
This hook uses the [diskcache](https://grantjenks.com/docs/diskcache/tutorial.html) library to cache node execution results on disk. The cache key is a tuple of the function's `(source code, input a, ..., input n)`, so a node is re-executed whenever its code or any of its inputs changes.

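As a rough illustration of the cache key described above, the sketch below builds a `(source code hash, input a, ..., input n)` tuple with the standard library only. The helpers `normalize_source` and `make_cache_key` are hypothetical names, not part of `h_diskcache`. Stripping comments and docstrings through `ast` follows what this PR's commit messages describe, but the plugin's actual implementation may differ. `ast.unparse` requires Python 3.9+.

```python
import ast
import hashlib
from typing import Any, Tuple


def normalize_source(source: str) -> str:
    """Return `source` without comments or docstrings (hypothetical helper)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(
            node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
        ):
            continue
        first = node.body[0] if node.body else None
        is_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        if is_docstring:
            # Keep the body non-empty so the tree still unparses to valid code.
            node.body = node.body[1:] or [ast.Pass()]
    # Comments are not part of the AST, so unparsing drops them.
    return ast.unparse(tree)


def make_cache_key(source: str, **inputs: Any) -> Tuple[Any, ...]:
    """Build a (source hash, input a, ..., input n) tuple (hypothetical helper)."""
    digest = hashlib.sha256(normalize_source(source).encode()).hexdigest()
    # Sort by input name so the key does not depend on keyword order.
    return (digest, *(inputs[name] for name in sorted(inputs)))


documented = 'def B(A: int) -> float:\n    """Quarter of A."""\n    return A / 4  # note\n'
bare = "def B(A: int) -> float:\n    return A / 4\n"
edited = "def B(A: int) -> float:\n    return A / 5\n"

same_logic = make_cache_key(documented, A=4) == make_cache_key(bare, A=4)
changed_logic = make_cache_key(documented, A=4) == make_cache_key(edited, A=4)
```

Under these assumptions, editing only a docstring or comment leaves the key unchanged (`same_logic` is `True`), while changing the function body or an input produces a different key (`changed_logic` is `False`).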
> 💡 This can be a great tool for developing inside a Jupyter notebook or other interactive environments.

`diskcache` has useful features to:
- set a maximum cache size
- set an automated eviction policy that applies once the maximum size is reached
- allow custom `Disk` implementations to change the serialization protocol (e.g., pickle, JSON)

> ⚠ The default `Disk` serializes objects using the `pickle` module. Changing Python or library versions could break your cache (both keys and values). Learn more about [caveats](https://grantjenks.com/docs/diskcache/tutorial.html#caveats).

> ❓ To store artifacts robustly, please use Hamilton materializers or the [CachingGraphAdapter](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes) instead. The `CachingGraphAdapter` stores tagged nodes directly on the file system using common formats (JSON, CSV, Parquet, etc.). However, it isn't aware of your function version and requires you to manually manage your disk space.


# How to use it
## Use the hook
Find it under plugins at `hamilton.plugins.h_diskcache` and add it to your Driver definition.

```python
from hamilton import driver
from hamilton.plugins import h_diskcache
import functions

dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.CacheHook())
    .build()
)
```

## Inspect the hook
To inspect the caching behavior in real-time, you can get the logger:

```python
import logging

logger = logging.getLogger("hamilton.plugins.h_diskcache")
logger.setLevel(logging.DEBUG)  # or logging.INFO
logger.addHandler(logging.StreamHandler())
```
- INFO logs only the total cache size after the Driver executes
- DEBUG also logs the inputs for each node and specifies whether the value is `from cache` or `executed`

## Clear cache
The utility function `h_diskcache.evict_except_driver` allows you to clear cached values for all nodes except those in the passed driver. This is an efficient tool to clear old artifacts as your project evolves.

```python
from hamilton import driver
from hamilton.plugins import h_diskcache
import functions

dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.CacheHook())
    .build()
)
h_diskcache.evict_except_driver(dr)
```

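To make the eviction idea concrete, here is a minimal sketch that uses a plain `dict` as a stand-in for the on-disk cache. It assumes each key starts with a hash of the node's source code; `evict_except` and the placeholder hash strings are hypothetical, and the real `evict_except_driver` works on the `diskcache` store and may be implemented differently.

```python
from typing import Any, Dict, Iterable, Tuple


def evict_except(
    cache: Dict[Tuple[Any, ...], Any], keep_hashes: Iterable[str]
) -> int:
    """Delete entries whose source hash is not in `keep_hashes`; return the count."""
    keep = set(keep_hashes)
    # Collect stale keys first so the dict is not mutated while iterating over it.
    stale = [key for key in cache if key[0] not in keep]
    for key in stale:
        del cache[key]
    return len(stale)


# Keys follow the (source hash, input a, ..., input n) layout.
cache = {
    ("hash-of-current-B", 4): 1.0,
    ("hash-of-old-B", 4): 0.8,
    ("hash-of-removed-node", 10): 3,
}
evicted = evict_except(cache, keep_hashes=["hash-of-current-B"])
```

After the call, only the entry produced by the current version of the node remains, which is the behavior the section above describes for nodes in the passed driver.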
## Cache settings
Find all the cache settings in the [diskcache docs](https://grantjenks.com/docs/diskcache/api.html#constants).
@@ -0,0 +1,10 @@
def A(external: int) -> int:
    return external % 7 + 1


def B(A: int) -> float:
    return A / 4


def C(A: int, B: float) -> float:
    return A**2 + B
@@ -0,0 +1,86 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hamilton import driver\n",
    "from hamilton.plugins import h_diskcache\n",
    "\n",
    "import functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "\n",
    "# get the plugin logger\n",
    "logger = logging.getLogger(\"hamilton.plugins.h_diskcache\")\n",
    "logger.setLevel(logging.DEBUG)  # set logging.INFO for less info\n",
    "logger.addHandler(logging.StreamHandler())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "A {'external': 10}: from cache\n",
      "B {'A': 4}: from cache\n",
      "C {'A': 4, 'B': 1.0}: from cache\n",
      "Cache size: 0.03 MB\n"
     ]
    }
   ],
   "source": [
    "dr = (\n",
    "    driver.Builder()\n",
    "    .with_modules(functions)\n",
    "    .with_adapters(h_diskcache.CacheHook())\n",
    "    .build()\n",
    ")\n",
    "# if you ran `run.py`, you should see the nodes being\n",
    "# read from cache\n",
    "results = dr.execute([\"C\"], inputs=dict(external=10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -0,0 +1 @@
sf-hamilton[diskcache] |
@@ -0,0 +1,19 @@
import logging

import functions

from hamilton import driver
from hamilton.plugins import h_diskcache


def main():
    dr = driver.Builder().with_modules(functions).with_adapters(h_diskcache.CacheHook()).build()
    results = dr.execute(["C"], inputs=dict(external=10))
    print(results)


if __name__ == "__main__":
    logger = logging.getLogger("hamilton.plugins.h_diskcache")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    main()
Is this supposed to be in this PR?
Without this function, the `CacheHook` doesn't work in notebooks because it won't be able to use `inspect` to get the source code and create a hash.
The alternative would be to use the function's `__code__` attribute, but this would be dependent on the Python version. It might not be an issue because the cache's pickling is already Python version dependent.
Where is this used? Think I'm missing it... But that makes sense I think. You're talking about temporary modules, right?
With temporary modules you'd have to special-case them, go through every function and use `inspect` on them.
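For context on the discussion above, the following sketch shows one standard-library way to make `inspect.getsource` work for a module created at runtime: register its source in `linecache` under the same synthetic filename used at compile time. The helper `load_dynamic_module`, the filename pattern, and the module name are assumptions for illustration; the function added in this PR is not shown in this excerpt and may differ.

```python
import inspect
import linecache
import types


def load_dynamic_module(name: str, source: str) -> types.ModuleType:
    """Create a module from a source string and keep that source inspectable."""
    filename = f"<dynamic {name}>"  # synthetic filename, nothing exists on disk
    module = types.ModuleType(name)
    exec(compile(source, filename, "exec"), module.__dict__)
    # linecache entries are (size, mtime, lines, fullname). With mtime=None,
    # linecache.checkcache() leaves the entry alone even though the file is absent.
    linecache.cache[filename] = (
        len(source),
        None,
        source.splitlines(keepends=True),
        filename,
    )
    return module


SOURCE = "def A(external: int) -> int:\n    return external % 7 + 1\n"
module = load_dynamic_module("dynamic_functions", SOURCE)

# Without the linecache entry, this call would raise OSError.
recovered = inspect.getsource(module.A)
```

With the source recoverable this way, a hash can be computed from the text rather than from `__code__`, which avoids the bytecode's dependence on the Python version mentioned in the review.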