Hotfix: Expect more possible metadata columns when parsing ES&S CVRs #1954
Merged
arsalansufi approved these changes on Aug 13, 2024
Great catch!
Previously, we expected exactly 3 metadata columns. When given a file with more metadata columns, the extra columns were treated like contest columns. This bloated the contest metadata and caused a major performance slowdown.
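To illustrate the failure mode described above, here is a minimal sketch of a fixed-count split. The header names and the `split_fixed` helper are hypothetical and are not taken from the actual parser; only the count of 3 comes from the description.

```python
# Hypothetical header row: four metadata columns followed by two contest columns.
# These names are illustrative assumptions, not the real ES&S column names.
headers = ["Cast Vote Record", "Batch", "Ballot Style", "Precinct", "Mayor", "Sheriff"]

# The description says the old parser expected exactly 3 metadata columns.
EXPECTED_METADATA_COUNT = 3


def split_fixed(header_row, metadata_count=EXPECTED_METADATA_COUNT):
    """Split a header row at a fixed index (hypothetical helper)."""
    return header_row[:metadata_count], header_row[metadata_count:]


metadata, contests = split_fixed(headers)

# "Precinct" is the fourth metadata column, so it lands on the contest side.
print(contests)
```

With a fourth metadata column present, `Precinct` is misclassified as a contest, which is the kind of contest-metadata bloat the description mentions.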
jonahkagan force-pushed the hotfix-ess-cvr-parsing branch from 97c5113 to 07292d7 on August 13, 2024 18:54
jonahkagan added a commit that referenced this pull request on Aug 14, 2024
We added a hotfix in #1954 to allow different sets of metadata columns in the CVR file. However, we may still see metadata columns we haven't seen before. Since there's no good way to differentiate those columns from contest columns, our current approach may silently fail in that case: the unknown metadata columns would be treated as contests, which may or may not cause downstream issues. To reduce the likelihood of that happening, we change our method of searching for the dividing line between metadata columns and contest columns to look for the _last_ known metadata header. That way, we'll get it right in every case except the one where the dividing line is a header we haven't seen before, which is much less likely to occur.
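The "last known metadata header" search can be sketched as follows. This is a rough illustration under stated assumptions, not the code from the commit: `KNOWN_METADATA_HEADERS`, `split_at_last_known`, and all header names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical set of metadata headers the parser already knows about.
KNOWN_METADATA_HEADERS = {"Cast Vote Record", "Batch", "Ballot Style", "Precinct"}


def split_at_last_known(header_row: List[str]) -> Tuple[List[str], List[str]]:
    """Split headers just after the last known metadata header (hypothetical helper)."""
    known_indexes = [
        i for i, header in enumerate(header_row) if header in KNOWN_METADATA_HEADERS
    ]
    if not known_indexes:
        raise ValueError("No known metadata header found")
    divider = known_indexes[-1] + 1
    # Everything up to and including the last known header counts as metadata,
    # even unknown headers that appear in between.
    return header_row[:divider], header_row[divider:]


# An unknown header in the middle of the metadata columns is handled correctly.
ok_metadata, ok_contests = split_at_last_known(
    ["Cast Vote Record", "Mystery Column", "Precinct", "Mayor", "Sheriff"]
)

# An unknown header at the dividing line is still misclassified as a contest.
bad_metadata, bad_contests = split_at_last_known(
    ["Cast Vote Record", "Precinct", "Mystery Column", "Mayor"]
)
```

In the first call the unknown `Mystery Column` sits between two known headers and stays on the metadata side. In the second it is the final metadata column, so it ends up among the contests, which is the remaining failure case the commit message describes.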
Manually tested this hotfix, will add regression test separately.