refactor: gnomad vcf is classified as delins/sub/ref agree #489

Merged
merged 6 commits into from
Aug 17, 2023

Conversation

korikuzma
Member

#231 will remove this temporary work, but we need to think more about it. For now, we only classify gnomAD VCF expressions as delins, substitution, or reference agree. I added a TODO to clean up the gnomAD VCF to protein work, since I didn't really look at it. I'm not sure whether that will happen during the refactor or afterward; it depends on how much time I have. We may end up cleaning it up once we add support for more complex deletions and insertions.
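As a rough illustration of the three-way classification described above, here is a minimal sketch. It is not the code in this PR: the `Classification` enum, the `GNOMAD_VCF` pattern, and `classify_gnomad_vcf` are hypothetical names, and the rules (identical ref/alt means reference agree, single-base ref and alt means substitution, everything else means delins) are an assumption about how such a classifier could behave.

```python
import re
from enum import Enum
from typing import Optional


class Classification(str, Enum):
    """Hypothetical labels mirroring the three classes named in the PR."""

    REFERENCE_AGREE = "reference agree"
    SUBSTITUTION = "substitution"
    DELINS = "delins"


# Assumed shape of a gnomAD VCF expression: chrom-pos-ref-alt, e.g. 7-140753336-A-T
GNOMAD_VCF = re.compile(
    r"(?:chr)?(?P<chrom>[1-9]|1\d|2[0-2]|X|Y)-(?P<pos>[1-9]\d*)"
    r"-(?P<ref>[ACGTN]+)-(?P<alt>[ACGTN]+)",
    re.IGNORECASE,
)


def classify_gnomad_vcf(expr: str) -> Optional[Classification]:
    """Return a single classification, or None if expr is not gnomAD VCF."""
    match = GNOMAD_VCF.fullmatch(expr.strip())
    if match is None:
        return None
    ref = match["ref"].upper()
    alt = match["alt"].upper()
    if ref == alt:
        # Alternate allele equals the reference allele
        return Classification.REFERENCE_AGREE
    if len(ref) == 1 and len(alt) == 1:
        # One base replaced by a different base
        return Classification.SUBSTITUTION
    # Everything else (including plain deletions and insertions) is treated
    # as delins for now
    return Classification.DELINS
```

Under these assumptions, an expression such as `7-140753336-AT-A`, which would otherwise be a deletion, falls into the delins bucket until more specific deletion and insertion handling is added.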

@korikuzma korikuzma added the labels priority:medium (Medium priority) and technical debt (A feature/requirement implemented in a sub-optimal way & must be re-written. Contrast to "cleanup") Aug 10, 2023
@korikuzma korikuzma self-assigned this Aug 10, 2023
@korikuzma korikuzma merged commit dd7a785 into refactor Aug 17, 2023
10 checks passed
@korikuzma korikuzma deleted the rm-gnomad-vcf-dels branch August 17, 2023 12:41
korikuzma added a commit that referenced this pull request Aug 25, 2023
- Refactor app (#474)
  - Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
  - Remove to canonical variation (no longer supported)
  - Combined tests for tokenizers/classifiers/validators/translators into one module
  - Removed amino_acids.csv (accidentally left in)
  - Name changes
    - Coding DNA → cDNA
    - Polypeptide truncation → Protein Stop Gain
    - Silent Mutation → Reference Agree
    - Uncertain/Range → Ambiguous
    - HGVSDupDelModeEnum → HGVSDupDelModeOption
  - Validators no longer do any kind of translations to VRS representations. Translators will do this work
  - Classifier only returns exact matches and only returns a single classification rather than a list
  - Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  - Remove unused code
  - Create variation schemas for supported variation types. Uses consistent field naming
  - Cleaning up instance variables in classes
  - Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
  - Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root 
  - baseline_copies is required in /hgvs_to_copy_number_count
  - cool-seq-tool update
    - Removes file path params from QueryHandler, can set these via environment variables
    - QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
  - new dependencies for linting
    - ruff (replaced flake8)
    - black
- Add more support for gnomad vcf expressions in normalize (#479, #489)
- Remove pyliftover from deps (covered by cool-seq-tool) (#480) 
- Fix default mode for hgvs dup del mode wrt rse (#482)
- Fix default HGVS dup del mode - dels should be allele w lse (#484)
- Use cool-seq-tool AnnotationLayer and rm CoordinateType (#485)
- Remove structural type from variation descriptor (#487)