RowFeatureIndex Optimization #531

Merged: 13 commits, Dec 17, 2024

polinabinder1 (Collaborator)

With this PR, if a dataframe appended to RowFeatureIndex is identical to the last one stored, it is not stored again; instead, the counter corresponding to that previous dataframe is incremented. Storing the same dataframe repeatedly becomes an issue for very large datasets.

Previously, if we had dataframe A corresponding to rows [0, 2) and wanted to add the same dataframe A for 4 more rows, we would store dataframe A twice: the first copy corresponding to rows [0, 2) and the second to rows [2, 6).

Now, we store dataframe A once, and it corresponds to rows [0, 6).
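The bookkeeping described above can be sketched as follows. This is a minimal, hypothetical illustration, not the RowFeatureIndex code from this PR: the class name `ToyRowFeatureIndex`, its methods, and the use of a plain dict of column lists in place of a pandas DataFrame are all assumptions made to keep the example dependency-free. Each stored table records the exclusive end row of its block, and an append whose table equals the last stored one only moves that end forward.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical stand-in for a pandas DataFrame: column name -> column values.
FeatureTable = Dict[str, List[str]]


class ToyRowFeatureIndex:
    """Hypothetical, simplified row -> feature-table index (not the bionemo API)."""

    def __init__(self) -> None:
        self._tables: List[FeatureTable] = []
        # _ends[i] is the exclusive end row of the block served by _tables[i].
        self._ends: List[int] = []

    def append_features(self, n_rows: int, table: FeatureTable) -> None:
        if n_rows <= 0:
            raise ValueError("n_rows must be positive")
        start = self._ends[-1] if self._ends else 0
        if self._tables and self._tables[-1] == table:
            # Same content as the LAST stored table: extend its row range
            # instead of storing a second copy.
            self._ends[-1] = start + n_rows
        else:
            # New (or non-consecutive) table: store it with its own block.
            self._tables.append(table)
            self._ends.append(start + n_rows)

    def lookup(self, row: int) -> FeatureTable:
        if not 0 <= row < self.number_of_rows():
            raise IndexError(row)
        # Binary search for the first block whose exclusive end is > row.
        return self._tables[bisect_right(self._ends, row)]

    def number_of_rows(self) -> int:
        return self._ends[-1] if self._ends else 0

    def number_of_tables(self) -> int:
        return len(self._tables)


if __name__ == "__main__":
    genes = {"feature_name": ["GENE1", "GENE2"]}
    index = ToyRowFeatureIndex()
    index.append_features(2, genes)        # rows [0, 2)
    index.append_features(4, dict(genes))  # identical content: block becomes rows [0, 6)
    print(index.number_of_tables(), index.number_of_rows())  # prints: 1 6
```

Only the most recently stored table is compared, which keeps each append cheap; an identical table that reappears after a different one is still stored again. A pandas-based version would compare frames with `DataFrame.equals` rather than `==`.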

polinabinder1 (Collaborator, Author)

/build-ci

skothenhill-nv (Collaborator)

Is the Megatron-LM change intentional, or do we need to run the recursive git submodule update? It shows a diff against main, which seems weird.

polinabinder1 and others added 3 commits December 16, 2024 13:45
…ndex.py

Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 enabled auto-merge (squash) December 16, 2024 23:30
polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 (Collaborator, Author)

/build-ci

polinabinder1 merged commit ff5ce98 into main Dec 17, 2024
4 checks passed
polinabinder1 deleted the polinabinder/small_row_feat_index branch December 17, 2024 21:18
4 participants