Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jbeck/ag 1144/transform gene info testing #121

Merged
merged 15 commits into from
Feb 26, 2024

Conversation

jaclynbeck-sage
Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage commented Feb 10, 2024

This PR adds integration testing for the gene_info transform. I also fixed several bugs/missing handling in the transform that I discovered while writing the tests.
There are 2 'passing' cases with good data, using two sets of values for adjusted_p_value_threshold and protein_level_threshold, which are specified in the config file.
There are 10 'failure' cases where we expect the code to throw errors:

  • (3) tests for duplicate Ensembl IDs in 3 of the input datasets
  • (4) tests for non-numeric values in key columns from 4 input data sets
  • (1) test for missing HGNC symbol in tep_adi_info
  • (2) tests for non-boolean values in 2 fields of tep_adi_info

I tried to document a description of each of the data sets and what conditions would pass/fail for each one, at the top of the test_gene_info.py file.

Unfortunately, the gene_metadata test files have to be in feather format, not csv, due to having lists of items inside the alias and ensembl_possible_replacements fields. So those files are not human-readable.

Changes made to the transform itself:

  • biodomains data set drops rows with NA "biodomain" values to avoid "nan" ending up in the list of biodomains for that Ensembl ID
  • tep_adi_info data set now has a check to make sure all the values in the is_adi and is_tep fields are boolean (or missing, which is also ok)
  • added handling for edge case where ensembl_info ends up as null instead of a dictionary of null values. Moving the nest_fields call for this field from before the merge with gene_info to after all other data sets were merged, and ensuring that all ensembl_possible_replacements values were a list or empty list and not null, solves the issue.
  • some reformatting by Black

drop_columns=["ensembl_gene_id"],
nested_field_is_list=False,
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI The above chunk of code has been moved further down the file.

@jaclynbeck-sage jaclynbeck-sage requested review from JessterB, thomasyu888 and BWMac and removed request for JessterB and thomasyu888 February 13, 2024 00:08
@jaclynbeck-sage jaclynbeck-sage marked this pull request as ready for review February 13, 2024 00:09
Copy link
Contributor

@BWMac BWMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a couple NITs

src/agoradatatools/etl/transform/gene_info.py Outdated Show resolved Hide resolved
tests/transform/test_gene_info.py Outdated Show resolved Hide resolved
tests/transform/test_gene_info.py Outdated Show resolved Hide resolved
Copy link

sonarqubecloud bot commented Feb 23, 2024

Quality Gate Passed Quality Gate passed

Issues
2 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link
Contributor

@JessterB JessterB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jaclynbeck-sage jaclynbeck-sage merged commit efe6eec into dev Feb 26, 2024
9 checks passed
@jaclynbeck-sage jaclynbeck-sage deleted the jbeck/AG-1144/transform_gene_info_testing branch February 26, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants