refactor: gnomad vcf is classified as delins/sub/ref agree #489

korikuzma · 2023-08-10T13:28:29Z

#231 will remove this temporary work, but we need to think more about it. For now, we only classify gnomAD VCF expressions as delins, substitution, or reference agree. I added a TODO to clean up the gnomAD VCF to protein work, since I didn't really look at it. I'm not sure whether that will happen in the refactor or afterwards; it depends on how much time I have. We may end up cleaning it up once we add support for more complex deletions and insertions.
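To make the three-way classification concrete, here is a minimal sketch. It is not the code from this PR: the names `GnomadVcfClassification`, `GNOMAD_VCF_PATTERN`, and `classify_gnomad_vcf` are hypothetical, and the rules (identical ref and alt is reference agree, single-base change is substitution, everything else is delins) are an assumption based on the description above.

```python
import re
from enum import Enum
from typing import Optional


class GnomadVcfClassification(str, Enum):
    """Hypothetical labels for the three supported classifications."""

    SUBSTITUTION = "substitution"
    REFERENCE_AGREE = "reference agree"
    DELINS = "delins"


# Assumed expression shape: <chromosome>-<position>-<ref>-<alt>, optional "chr" prefix
GNOMAD_VCF_PATTERN = re.compile(
    r"^(?:chr)?(?P<chromosome>[1-9]|1\d|2[0-2]|X|Y)"
    r"-(?P<pos>[1-9]\d*)-(?P<ref>[ACGTN]+)-(?P<alt>[ACGTN]+)$",
    re.IGNORECASE,
)


def classify_gnomad_vcf(expression: str) -> Optional[GnomadVcfClassification]:
    """Return a single classification, or None if the expression does not match."""
    match = GNOMAD_VCF_PATTERN.match(expression.strip())
    if match is None:
        return None

    ref = match["ref"].upper()
    alt = match["alt"].upper()

    if ref == alt:
        return GnomadVcfClassification.REFERENCE_AGREE
    if len(ref) == 1 and len(alt) == 1:
        return GnomadVcfClassification.SUBSTITUTION
    # Deletions, insertions, and multi-base changes all fall through to delins
    return GnomadVcfClassification.DELINS
```

Under these assumptions, a pure deletion such as `7-55174771-GGAATTAAGAGAAGCAACATC-G` is reported as delins rather than as a separate deletion type, which matches the temporary behavior described above.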


- Refactor app (#474)
  - Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
  - Remove to canonical variation (no longer supported)
  - Combined tests for tokenizers/classifiers/validators/translators into one module
  - Removed amino_acids.csv (accidentally left in)
  - Name changes
    - Coding DNA → cDNA
    - Polypeptide truncation → Protein Stop Gain
    - Silent Mutation → Reference Agree
    - Uncertain/Range → Ambiguous
    - HGVSDupDelModeEnum → HGVSDupDelModeOption
  - Validators no longer do any kind of translations to VRS representations. Translators will do this work
  - Classifier only returns exact matches and only returns a single classification rather than a list
    - Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  - Remove unused code
  - Create variation schemas for supported variation types. Uses consistent field naming
  - Cleaning up instance variables in classes
  - Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
  - Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root
  - baseline_copies is required in /hgvs_to_copy_number_count
  - cool-seq-tool update
    - Removes file path params from QueryHandler, can set these via environment variables
    - QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
  - New dependencies for linting
    - ruff (replaced flake8)
    - black
- Add more support for gnomad vcf expressions in normalize (#479, #489)
- Remove pyliftover from deps (covered by cool-seq-tool) (#480)
- Fix default mode for hgvs dup del mode wrt rse (#482)
- Fix default HGVS dup del mode - dels should be allele w lse (#484)
- Use cool-seq-tool AnnotationLayer and rm CoordinateType (#485)
- Remove structural type from variation descriptor (#487)

korikuzma added 6 commits August 8, 2023 20:40

wip: rm gnomad vcf deletions

68545b6

consider them delins

wip: just use delins + sub

5f34a28

Merge branch 'refactor' into rm-gnomad-vcf-dels

847ef81

fix gnomad vcf to protein

5ca4e7f

Merge branch 'refactor' into rm-gnomad-vcf-dels

6097143

more cleanup

000c448

korikuzma added the priority:medium and technical debt labels Aug 10, 2023

korikuzma requested a review from jsstevenson August 10, 2023 13:28

korikuzma self-assigned this Aug 10, 2023

jsstevenson approved these changes Aug 10, 2023

korikuzma merged commit dd7a785 into refactor Aug 17, 2023

korikuzma deleted the rm-gnomad-vcf-dels branch August 17, 2023 12:41

korikuzma mentioned this pull request Aug 21, 2023

refactor!: clean up app (metaschema) #494

Merged

korikuzma added a commit that referenced this pull request Sep 22, 2023

refactor: gnomad vcf is classified as delins/sub/ref agree (#489)

39e6841
