set min seq len by default #621
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
TY!
Can you also add this change to the fine-tuning datamodule here:
https://github.com/NVIDIA/bionemo-framework/blob/main/sub-packages/bionemo-esm2/src/bionemo/esm2/model/finetune/datamodule.py
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

Coverage Diff

| | main | #621 | +/- |
|---|---|---|---|
| Coverage | 86.62% | 86.62% | |
| Files | 116 | 116 | |
| Lines | 6961 | 6961 | |
| Hits | 6030 | 6030 | |
| Misses | 931 | 931 | |

View full report in Codecov by Sentry.
### Description

In https://nvbugspro.nvidia.com/bug/5060664, the reporters noticed a performance warning when pretraining with variable sequence lengths. This is largely an oversight, since our test scripts didn't set both the minimum and maximum sequence lengths. When `min_seq_length` is omitted, the default should be to pad to the maximum sequence length for performance reasons.

### Type of changes

<!-- Mark the relevant option with an [x] -->

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest

> [!NOTE]
> By default, the notebooks validation tests are skipped unless explicitly enabled.

### Usage

<!--- How does a user interact with the changed code -->

```python
TODO: Add code snippet
```

### Pre-submit Checklist

<!--- Ensure all items are completed before submitting -->

- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [x] I have added/updated tests as needed
- [x] All existing tests pass successfully

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
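
The default described in this PR can be illustrated with a minimal sketch. This is not the BioNeMo implementation: `resolve_min_seq_length` and `pad_batch` are hypothetical helpers written only for illustration, and the padding token id of `0` is an assumption. The sketch shows that when `min_seq_length` is omitted, every batch is padded to `max_seq_length`, so all batches share one fixed shape.

```python
from typing import List, Optional


def resolve_min_seq_length(min_seq_length: Optional[int], max_seq_length: int) -> int:
    """Hypothetical helper: fall back to max_seq_length when min_seq_length is omitted."""
    if min_seq_length is None:
        return max_seq_length
    if min_seq_length > max_seq_length:
        raise ValueError("min_seq_length must not exceed max_seq_length")
    return min_seq_length


def pad_batch(
    sequences: List[List[int]],
    min_seq_length: Optional[int],
    max_seq_length: int,
    pad_id: int = 0,  # assumed padding token id
) -> List[List[int]]:
    """Hypothetical collate step: truncate to max_seq_length, then pad the batch.

    The padded length is the longest sequence in the batch, but never less
    than the resolved minimum sequence length.
    """
    floor = resolve_min_seq_length(min_seq_length, max_seq_length)
    truncated = [seq[:max_seq_length] for seq in sequences]
    longest = max((len(seq) for seq in truncated), default=0)
    target = max(floor, longest)
    return [seq + [pad_id] * (target - len(seq)) for seq in truncated]


# With min_seq_length omitted, every batch is padded to max_seq_length.
fixed = pad_batch([[1, 2, 3], [4]], min_seq_length=None, max_seq_length=5)

# With an explicit smaller minimum, the padded length varies with the batch.
variable = pad_batch([[1, 2, 3], [4]], min_seq_length=2, max_seq_length=5)
```

In this sketch, only an explicitly smaller `min_seq_length` produces batch shapes that vary from step to step, which is the variable-sequence-length situation the performance warning refers to.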