-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jbeck/ag 1144/transform gene info testing #121
Jbeck/ag 1144/transform gene info testing #121
Conversation
… code to test the passing scenario
…tep were both false
…ow errors, created failure case test code
…ins transform to conform to the function definition
…ead of a dict, biodomains now drops rows with NaN biodomain names, added check for boolean values in tep_info. Updated test to work with these fixes
…ed tests for duplicate Ensembl IDs, fixed issue with null ensembl_info in the gene_info transform
drop_columns=["ensembl_gene_id"], | ||
nested_field_is_list=False, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI The above chunk of code has been moved further down the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple NITs
…to jbeck/AG-1144/transform_gene_info_testing
Co-authored-by: Brad Macdonald <52762200+BWMac@users.noreply.github.com>
…to jbeck/AG-1144/transform_gene_info_testing
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR adds integration testing for the
gene_info
transform. I also fixed several bugs/missing handling in the transform that I discovered while writing the tests.There are 2 'passing' cases with good data, using two sets of values for
adjusted_p_value_threshold
andprotein_level_threshold
, which are specified in the config file.There are 10 'failure' cases where we expect the code to throw errors:
tep_adi_info
tep_adi_info
I tried to document a description of each of the data sets and what conditions would pass/fail for each one, at the top of the test_gene_info.py file.
Unfortunately, the
gene_metadata
test files have to be in feather format, not csv, due to having lists of items inside thealias
andensembl_possible_replacements
fields. So those files are not human-readable.Changes made to the transform itself:
is_adi
andis_tep
fields are boolean (or missing, which is also ok)ensembl_info
ends up asnull
instead of a dictionary of null values. Moving thenest_fields
call for this field from before the merge withgene_info
to after all other data sets were merged, and ensuring that allensembl_possible_replacements
values were a list or empty list and notnull
, solves the issue.