Measure actual spilled bytes, not output of sizeof() #5805
@crusaderky it looks like we are having a related failure in a

@crusaderky With the changes in this PR, does this documentation need extra information?

Updated.

This is... a major pain. The problem is that the Python 3.7/3.8 CI installs neither lz4 nor snappy, so

Edit: worked around for now. Tracked at #5807.
All green except for issues already spotted in the parent PR. Ready for review (but not merge!)

I think we still need to rename some things in the dashboard to be consistent with the changes in this PR, unless we think it is clear enough that "managed" means "managed_in_memory".

I've renamed the variables for clarity. Note that the measure being shown was already managed_in_memory. The choice to just write "managed" in the UI is a conscious one to avoid confusing users.

@shwina given your recent work on visualizing RMM data, this PR may be of interest, as it illuminates issues with how spilled data is visualized.
All the comments/requests were addressed in 905defa
IMPORTANT NOTE 🚨 🚨 🚨 🚨 🚨 🚨
Closes #5364
#5543 changed the measure of spilled memory from the output of `sizeof()` to the actual number of bytes written to disk. This introduced a substantial difference in case of compression*, or when `sizeof()` returns an inaccurate output.

*Compression upon spill requires lz4 or snappy to be installed. Notably, in our CI they are installed only on Python 3.9. See #5807.
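To make the size of that difference concrete, here is a minimal sketch using the example value from this PR. It assumes that `sys.getsizeof` is a fair stand-in for `sizeof()` on a plain `str`, and it swaps in the standard library's `zlib` for lz4/snappy, so the exact on-disk figure will differ from what a real spill produces (a real spill also writes serialization overhead).

```python
import sys
import zlib

MiB = 2**20

# The example value from the PR description: ~50 MiB in RAM.
value = "x" * (50 * MiB)

# Stand-in for sizeof(): the in-memory size of the str object.
in_ram = sys.getsizeof(value)

# Stand-in for the bytes written on spill; zlib replaces lz4/snappy here.
on_disk = len(zlib.compress(value.encode()))

print(f"sizeof: {in_ram / MiB:.1f} MiB, on disk: {on_disk / 2**10:.1f} kiB")
```

A highly repetitive value like this compresses by orders of magnitude, so any accounting that mixes the two measures can be off by almost the full 50 MiB.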
The same PR introduced a regression: this new difference would be added to (or subtracted from) the `managed_in_memory` measure and, in turn, subtracted from (or added to) the `unmanaged` memory.

Example
`"x" * (50 * 2**20)` occupies 50 MiB in RAM, but only 200 kiB on disk.

Before #5543, you would see 50 MiB on the dashboard whether the value was in RAM or spilled to disk.
After #5543, when the value was spilled to disk you would read on the dashboard:
This PR removes the above artifact.
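The arithmetic behind the artifact can be sketched as follows. This is a hypothetical model, not the code in `distributed`. The function names, the parameters, and the 120 MiB process memory are all illustrative assumptions. It shows how subtracting a disk-byte measure from a `sizeof()`-based total leaves a phantom `managed_in_memory` and shrinks `unmanaged`. It also shows the invariant the fix is meant to restore: the two measures are kept apart.

```python
MiB = 2**20
KiB = 2**10


def breakdown_buggy(process_memory, managed_sizeof_total, spilled_disk_bytes):
    # Hypothetical regression model: the managed total is the sum of sizeof()
    # over all keys (in RAM or spilled), but spilled memory is now measured in
    # bytes on disk, and the two are subtracted from each other.
    managed_in_memory = managed_sizeof_total - spilled_disk_bytes
    unmanaged = process_memory - managed_in_memory
    return managed_in_memory, unmanaged


def breakdown_fixed(process_memory, in_memory_sizeof, spilled_disk_bytes):
    # Hypothetical fixed model: managed_in_memory is the sizeof() of the keys
    # actually held in RAM; spilled bytes are reported on their own and never
    # mixed into the in-memory measure.
    managed_in_memory = in_memory_sizeof
    unmanaged = process_memory - managed_in_memory
    return managed_in_memory, unmanaged, spilled_disk_bytes


# One 50 MiB value, spilled and compressed to 200 kiB; nothing else is managed.
# The 120 MiB process memory is a made-up figure.
process_memory = 120 * MiB
buggy = breakdown_buggy(process_memory, 50 * MiB, 200 * KiB)
fixed = breakdown_fixed(process_memory, 0, 200 * KiB)

print(f"buggy: managed={buggy[0] / MiB:.1f} MiB, unmanaged={buggy[1] / MiB:.1f} MiB")
# buggy: managed=49.8 MiB, unmanaged=70.2 MiB
print(f"fixed: managed={fixed[0] / MiB:.1f} MiB, unmanaged={fixed[1] / MiB:.1f} MiB, spilled={fixed[2] / KiB:.0f} kiB")
# fixed: managed=0.0 MiB, unmanaged=120.0 MiB, spilled=200 kiB
```

In the buggy model, nearly the whole 50 MiB shows up as managed memory even though nothing is in RAM, and the same amount disappears from unmanaged memory.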
Demo
Before #5543:
After both PRs: