Study import QC (ref genome validation + reported trait cleaning) #331
(will be imported/used by the study import pipeline)
curation/imports/scoring_file.py
Outdated
else:
    match_rate: float = float(match) / (match + mismatch)
    report_func(f'Match rate: {match_rate}')
    if match_rate < 0.9:
Should we store the threshold (i.e. 0.9) in a variable, preferably in the config file?
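A minimal sketch of what this suggestion could look like. The names `MIN_MATCH_RATE`, `compute_match_rate` and `passes_ref_genome_check` are hypothetical and do not come from the PR; in the project the default would presumably be read from its config file rather than defined inline.

```python
# Hypothetical default threshold; assumed here to stand in for a value
# that the project would load from its configuration file.
MIN_MATCH_RATE = 0.9


def compute_match_rate(match: int, mismatch: int) -> float:
    """Return the fraction of checked variants that matched."""
    total = match + mismatch
    if total == 0:
        # The zero case appears to be handled before the `else:` branch
        # in the reviewed code; it is guarded here to stay self-contained.
        raise ValueError('No variants were checked')
    return match / total


def passes_ref_genome_check(match: int, mismatch: int,
                            min_match_rate: float = MIN_MATCH_RATE) -> bool:
    """Return True when the match rate reaches the configured threshold."""
    return compute_match_rate(match, mismatch) >= min_match_rate
```

With the threshold passed as a parameter, a caller can override it per import without touching the comparison itself.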
curation/scripts/qc_ref_genome.py
Outdated
def get_variation_from_ensembl(rsids: list[str], ref_genome):
    url = 'https://grch37.rest.ensembl.org/variation/human/' if ref_genome == '37' else 'https://rest.ensembl.org/variation/human/'
Should we store the two Ensembl URLs in a dictionary instead?
e.g.:
servers = {
    '37': 'https://grch37.rest.ensembl.org',
    '38': 'https://rest.ensembl.org'
}
The root URL is also stored in server (line 12).
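A possible shape for the dictionary-based lookup, as a sketch only. `ENSEMBL_SERVERS` and `get_variation_url` are hypothetical names, not taken from the PR. Note one behavioural difference from the reviewed line: the original falls back to the GRCh38 server for any value other than '37', whereas this sketch assumes an unknown build should be rejected.

```python
# Root URLs of the Ensembl REST servers, keyed by reference genome build.
ENSEMBL_SERVERS = {
    '37': 'https://grch37.rest.ensembl.org',
    '38': 'https://rest.ensembl.org',
}


def get_variation_url(ref_genome: str) -> str:
    """Build the human variation endpoint URL for a genome build."""
    try:
        server = ENSEMBL_SERVERS[ref_genome]
    except KeyError:
        # Assumption: failing loudly is preferable to silently using GRCh38.
        raise ValueError(f'Unsupported reference genome: {ref_genome!r}') from None
    return f'{server}/variation/human/'
```

Keeping only the root URLs in the dictionary means the same mapping could be reused wherever the root URL is needed, with the endpoint path appended at the call site.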
Do we have any README or flowchart describing how it works?