refactor: gnomad vcf is classified as delins/sub/ref agree #489
Merged
korikuzma
added
priority:medium
Medium priority
technical debt
A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup"
labels
Aug 10, 2023
jsstevenson
approved these changes
Aug 10, 2023
korikuzma
added a commit
that referenced
this pull request
Aug 25, 2023
- Refactor app (#474)
  - Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
  - Remove to canonical variation (no longer supported)
  - Combined tests for tokenizers/classifiers/validators/translators into one module
  - Removed amino_acids.csv (accidentally left in)
  - Name changes
    - Coding DNA → cDNA
    - Polypeptide truncation → Protein Stop Gain
    - Silent Mutation → Reference Agree
    - Uncertain/Range → Ambiguous
    - HGVSDupDelModeEnum → HGVSDupDelModeOption
  - Validators no longer do any kind of translations to VRS representations. Translators will do this work
  - Classifier only returns exact matches and only returns a single classification rather than a list
  - Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  - Remove unused code
  - Create variation schemas for supported variation types. Uses consistent field naming
  - Cleaning up instance variables in classes
  - Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
  - Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root
  - baseline_copies is required in /hgvs_to_copy_number_count
  - cool-seq-tool update
    - Removes file path params from QueryHandler, can set these via environment variables
    - QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
  - New dependencies for linting
    - ruff (replaced flake8)
    - black
- Add more support for gnomad vcf expressions in normalize (#479, #489)
- Remove pyliftover from deps (covered by cool-seq-tool) (#480)
- Fix default mode for hgvs dup del mode wrt rse (#482)
- Fix default HGVS dup del mode - dels should be allele w lse (#484)
- Use cool-seq-tool AnnotationLayer and rm CoordinateType (#485)
- Remove structural type from variation descriptor (#487)
korikuzma
added a commit
that referenced
this pull request
Sep 22, 2023
#231 will remove this temporary work, but we need to think more about the approach. For now, we only classify gnomAD VCF expressions as delins, substitution, or reference agree. I added a TODO to clean up the gnomAD VCF to protein work, since I didn't really look at it. I'm not sure whether that will happen in this refactor or afterward; it depends on how much time I have. We may end up cleaning it up once we add support for more complex deletions and insertions.
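
For readers unfamiliar with the three categories, here is a minimal sketch of how a gnomAD VCF expression (`chromosome-position-ref-alt`) could be sorted into reference agree, substitution, or delins. It is not the code from this PR: the regex, the `GnomadVcfClass` enum, and `classify_gnomad_vcf` are hypothetical names, and the rules (identical ref and alt means reference agree, single base to single base means substitution, everything else means delins) are an assumption about the intended behavior.

```python
import re
from enum import Enum
from typing import Optional

# Hypothetical pattern for a gnomAD VCF expression: chrom-pos-ref-alt.
# The project's real pattern (in variation/regex.py) may differ.
GNOMAD_VCF_PATTERN = re.compile(
    r"^(?:chr)?(?P<chrom>[1-9]|1\d|2[0-2]|X|Y)"
    r"-(?P<pos>[1-9]\d*)"
    r"-(?P<ref>[ACGTN]+)"
    r"-(?P<alt>[ACGTN]+)$",
    re.IGNORECASE,
)


class GnomadVcfClass(str, Enum):
    """Hypothetical labels for the three supported classifications."""

    REFERENCE_AGREE = "reference agree"
    SUBSTITUTION = "substitution"
    DELINS = "delins"


def classify_gnomad_vcf(expr: str) -> Optional[GnomadVcfClass]:
    """Return a single classification, or None if expr is not gnomAD VCF."""
    match = GNOMAD_VCF_PATTERN.match(expr.strip())
    if match is None:
        return None

    ref = match["ref"].upper()
    alt = match["alt"].upper()

    # Assumed rule: ref and alt are identical, so nothing changes.
    if ref == alt:
        return GnomadVcfClass.REFERENCE_AGREE
    # Assumed rule: one base replaced by one different base.
    if len(ref) == 1 and len(alt) == 1:
        return GnomadVcfClass.SUBSTITUTION
    # Everything else (including plain insertions and deletions) is
    # treated as delins for now.
    return GnomadVcfClass.DELINS
```

Under these assumptions, `classify_gnomad_vcf("7-140753336-A-T")` gives a substitution, while a plain insertion such as `X-100-G-GT` falls into delins until more specific insertion and deletion handling exists.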