Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hotfix: Expect more possible metadata columns when parsing ES&S CVRs #1954

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

jonahkagan
Copy link
Contributor

Previously, we expected exactly 3 metadata columns. When given a file with more metadata columns, they were treated like contest columns. This caused a bloating of the contest metadata and major performance slowdown.

Manually tested this hotfix, will add regression test separately.

Copy link
Contributor

@arsalansufi arsalansufi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@jonahkagan jonahkagan enabled auto-merge (squash) August 13, 2024 18:45
Previously, we expected exactly 3 metadata columns. When given a file
with more metadata columns, they were treated like contest columns. This
caused a bloating of the contest metadata and major performance
slowdown.
@jonahkagan jonahkagan enabled auto-merge (squash) August 13, 2024 18:54
@jonahkagan jonahkagan merged commit 0e87ef2 into main Aug 13, 2024
4 of 5 checks passed
@jonahkagan jonahkagan deleted the hotfix-ess-cvr-parsing branch August 13, 2024 19:02
jonahkagan added a commit that referenced this pull request Aug 14, 2024
We added a hotfix in #1954 to allow different sets of metadata columsn
in the CVR file. However, we still may see unknown columns we haven't
seen before. Since there's no good way to differentiate those columsn
from contest columns, our current approach may silently fail in that
case. The consequence would be that metadata columns are treated as
contests. This may or may not cause downstream issues.

To reduce the likelihood of that happening, we change our method of
searching for the dividing line between metadata columns and contest
columns to look for the _last_ known metadata header. That way, we'll
get it right in every case except the case where the dividing line is a
header we haven't seen before, which is much less likely to occur.
jonahkagan added a commit that referenced this pull request Aug 14, 2024
We added a hotfix in #1954 to allow different sets of metadata columsn
in the CVR file. However, we still may see unknown columns we haven't
seen before. Since there's no good way to differentiate those columsn
from contest columns, our current approach may silently fail in that
case. The consequence would be that metadata columns are treated as
contests. This may or may not cause downstream issues.

To reduce the likelihood of that happening, we change our method of
searching for the dividing line between metadata columns and contest
columns to look for the _last_ known metadata header. That way, we'll
get it right in every case except the case where the dividing line is a
header we haven't seen before, which is much less likely to occur.
jonahkagan added a commit that referenced this pull request Aug 14, 2024
We added a hotfix in #1954 to allow different sets of metadata columsn
in the CVR file. However, we still may see unknown columns we haven't
seen before. Since there's no good way to differentiate those columsn
from contest columns, our current approach may silently fail in that
case. The consequence would be that metadata columns are treated as
contests. This may or may not cause downstream issues.

To reduce the likelihood of that happening, we change our method of
searching for the dividing line between metadata columns and contest
columns to look for the _last_ known metadata header. That way, we'll
get it right in every case except the case where the dividing line is a
header we haven't seen before, which is much less likely to occur.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants