set min seq len by default #621

Merged 1 commit on Jan 18, 2025

@pstjohn pstjohn commented Jan 17, 2025

Description

In https://nvbugspro.nvidia.com/bug/5060664, a performance warning was reported when pretraining with variable sequence lengths. This was largely an oversight: our test scripts didn't set both the minimum and maximum sequence lengths. When min_seq_length is omitted, the default should be to pad to the maximum sequence length, for performance reasons.
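
The intended default can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: `resolve_min_seq_length`, `pad_batch`, `max_seq_length`, and `pad_id` are hypothetical names chosen for the example; only `min_seq_length` appears in the description above.

```python
from typing import List, Optional


def resolve_min_seq_length(min_seq_length: Optional[int], max_seq_length: int) -> int:
    """Hypothetical helper: fall back to max_seq_length when min_seq_length is omitted."""
    if min_seq_length is None:
        return max_seq_length
    if min_seq_length > max_seq_length:
        raise ValueError("min_seq_length must not exceed max_seq_length")
    return min_seq_length


def pad_batch(
    batch: List[List[int]],
    min_seq_length: Optional[int],
    max_seq_length: int,
    pad_id: int = 0,
) -> List[List[int]]:
    """Pad (and truncate) every sequence in the batch to a common length.

    The common length is the longest sequence in the batch, clamped to
    the range [min_seq_length, max_seq_length].
    """
    floor = resolve_min_seq_length(min_seq_length, max_seq_length)
    longest = min(max((len(seq) for seq in batch), default=0), max_seq_length)
    padded_len = max(longest, floor)
    padded = []
    for seq in batch:
        seq = seq[:padded_len]  # truncate anything beyond the target length
        padded.append(seq + [pad_id] * (padded_len - len(seq)))
    return padded


batch = [[1, 2, 3], [4]]
fixed = pad_batch(batch, None, 8)    # min omitted: every row padded to 8
variable = pad_batch(batch, 2, 8)    # min set: rows padded to the batch's longest (3)
```

In this sketch, omitting `min_seq_length` gives every batch the same padded length, while setting it lower lets the padded length vary from batch to batch.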

Type of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Refactor
  • [ ] Documentation update
  • [ ] Other (please describe):

CI Pipeline Configuration

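Configure CI behavior by applying the relevant labels:

  • SKIP_CI (https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
  • INCLUDE_NOTEBOOKS_TESTS (https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest

Note: By default, the notebooks validation tests are skipped unless explicitly enabled.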

Usage

TODO: Add code snippet

Pre-submit Checklist

  • [x] I have tested these changes locally
  • [x] I have updated the documentation accordingly
  • [x] I have added/updated tests as needed
  • [x] All existing tests pass successfully

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

@sichu2023 sichu2023 left a comment

TY!


@farhadrgh farhadrgh left a comment

@pstjohn pstjohn enabled auto-merge January 17, 2025 20:58
@pstjohn pstjohn disabled auto-merge January 17, 2025 20:58
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.62%. Comparing base (7f9dd97) to head (addc0ad).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #621   +/-   ##
=======================================
  Coverage   86.62%   86.62%           
=======================================
  Files         116      116           
  Lines        6961     6961           
=======================================
  Hits         6030     6030           
  Misses        931      931           

@pstjohn pstjohn added this pull request to the merge queue Jan 18, 2025
Merged via the queue into NVIDIA:main with commit 0c990a7 Jan 18, 2025
7 of 15 checks passed
@pstjohn pstjohn deleted the pstjohn/set-min-seq-len branch January 18, 2025 02:36
polinabinder1 pushed a commit that referenced this pull request Jan 22, 2025
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>