-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: offset indices in sparse tensor #3725
base: main
Are you sure you want to change the base?
feat: offset indices in sparse tensor #3725
Conversation
…s-indices-to-offsets
CodSpeed Performance ReportMerging #3725 will improve performances by 82.23%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3725 +/- ##
==========================================
+ Coverage 77.31% 77.86% +0.54%
==========================================
Files 734 735 +1
Lines 93737 93128 -609
==========================================
+ Hits 72475 72513 +38
+ Misses 21262 20615 -647
|
…ithub.com/itzhakstern/Daft into itzhaks/sparse-tensors-indices-to-offsets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still combing through the PR, but just have a minor nit that I'd like to get out of the way first.
src/daft-core/src/array/ops/cast.rs
Outdated
non_zero_values.push(data); | ||
non_zero_indices.push(indices); | ||
non_zero_indices.push(ofssets_indices_arr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: typo here. ofssets_indices_arr
should be offsets_indices_arr
.
@itzhakstern I know the PR is still in draft, but if you don't mind, could you please add a description on what this PR is trying to achieve? I can gather that it's enabling registering index offsets in sparse tensors, but having a succinct description would be helpful for readability into the PR for other reviewers as well. Thanks in advance! |
…s-indices-to-offsets
Change the indices of sparse tensors from positions of non-zero elements to offsets.
For example, if we have a tensor:
[0, 29, 0, 0, 17, 6]
its sparse representation would be:
Instead, we want to store the differences (offsets) between consecutive indices:
This approach seems to significantly reduce the size of compressed Parquet files, as the indices diffs contain repetitive patterns, which compress well.