feat(gold_standard): add traitFromSourceMappedId to schema #924

Merged
merged 6 commits into from
Nov 27, 2024

@ireneisdoomed ireneisdoomed commented Nov 22, 2024

✨ Context

This PR adds the EFO ID (traitFromSourceMappedId) to the gold standard so that we can split the training set by EFO/gene during cross-validation.
As discussed, the schema change has no implications for the FE/BE.
Bear in mind that this will cause some explosion in the gold standard.
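
To illustrate the kind of split this enables, here is a minimal sketch of group-aware cross-validation in plain Python. It is not the project's implementation: the helper `group_k_fold` and all identifiers in the toy rows are hypothetical. It only shows that grouping on a column such as `traitFromSourceMappedId` keeps every trait entirely in either the train or the test side of a fold.

```python
from collections import defaultdict


def group_k_fold(rows, group_key, n_splits):
    """Yield (train, test) row lists so that no group value appears in both.

    `rows` is a list of dicts; `group_key` names the column to group on
    (for example "geneId" or "traitFromSourceMappedId").
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Assign the largest groups first, each to the currently smallest fold.
    folds = [[] for _ in range(n_splits)]
    for _, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        min(folds, key=len).extend(members)

    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


# Toy gold standard rows; every identifier here is made up for illustration.
gold_standard = [
    {"studyLocusId": "sl1", "geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_1"},
    {"studyLocusId": "sl2", "geneId": "ENSG_B", "traitFromSourceMappedId": "EFO_1"},
    {"studyLocusId": "sl3", "geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_2"},
    {"studyLocusId": "sl4", "geneId": "ENSG_C", "traitFromSourceMappedId": "EFO_3"},
]

splits = list(group_k_fold(gold_standard, "traitFromSourceMappedId", n_splits=2))
```

Passing `"geneId"` as `group_key` instead gives the gene-level split mentioned above.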

🛠 What does this PR implement

🙈 Missing

A follow-up PR will make more thorough changes to the gold standard schema, and to the code, to bring this dataset in line with the definition of an effector gene list.
The new field is set to nullable = True for the time being so that tests don't fail. It should become mandatory in the future.
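
As a rough illustration of what the nullable flag changes, the sketch below writes the new field as a Python dict in the JSON layout Spark uses for a `StructField` (`name`, `type`, `nullable`, `metadata`). The `"string"` type and the `null_violations` helper are assumptions made for this example, not taken from the PR.

```python
# Hypothetical schema entry in Spark's JSON field layout.
# The "string" type is an assumption; the PR text does not state it.
trait_field = {
    "name": "traitFromSourceMappedId",
    "type": "string",
    "nullable": True,  # temporary, per the PR description
    "metadata": {},
}


def null_violations(rows, field):
    """Return the rows that break the field's nullability rule."""
    if field["nullable"]:
        return []
    return [row for row in rows if row.get(field["name"]) is None]


# Toy rows; identifiers are made up for illustration.
rows = [
    {"geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_1"},
    {"geneId": "ENSG_B", "traitFromSourceMappedId": None},
]

# With nullable = True nothing is flagged; making the field mandatory
# would flag the second row.
lenient = null_violations(rows, trait_field)
strict = null_violations(rows, {**trait_field, "nullable": False})
```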

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@@ -36,7 +36,7 @@ def __init__(
with_gold_standard (bool): Whether to include the gold standard set in the feature matrix.
"""
self.with_gold_standard = with_gold_standard
- self.fixed_cols = ["studyLocusId", "geneId"]
+ self.fixed_cols = ["studyLocusId", "geneId", "traitFromSourceMappedId"]
Review comment by @ireneisdoomed (Contributor, Author):

The feature matrix doesn't have a schema, but I need to declare this column as one of the fixed columns to pivot on.
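
A minimal sketch of why the column has to be listed as fixed, assuming a long-format input with one feature per row. The `featureName` and `featureValue` column names, the `pivot_features` helper, and the toy values are hypothetical. The fixed columns form the row key of the wide matrix, so a column left out of that list could not be carried through the pivot.

```python
from collections import defaultdict

FIXED_COLS = ["studyLocusId", "geneId", "traitFromSourceMappedId"]


def pivot_features(long_rows, fixed_cols=FIXED_COLS):
    """Turn long rows (one feature per row) into one wide row per fixed-column key."""
    wide = defaultdict(dict)
    for row in long_rows:
        key = tuple(row[col] for col in fixed_cols)
        wide[key][row["featureName"]] = row["featureValue"]
    return [dict(zip(fixed_cols, key), **features) for key, features in wide.items()]


# Toy long-format rows; identifiers and feature names are made up.
long_rows = [
    {"studyLocusId": "sl1", "geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_1",
     "featureName": "feature_a", "featureValue": 0.8},
    {"studyLocusId": "sl1", "geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_1",
     "featureName": "feature_b", "featureValue": 0.3},
    {"studyLocusId": "sl1", "geneId": "ENSG_A", "traitFromSourceMappedId": "EFO_2",
     "featureName": "feature_a", "featureValue": 0.8},
]

wide_rows = pivot_features(long_rows)
```

In this toy input the same locus and gene appear under two traits, so the wide matrix has two rows rather than one.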

@ireneisdoomed ireneisdoomed merged commit 4837a4b into dev Nov 27, 2024
5 checks passed
@ireneisdoomed ireneisdoomed deleted the il-gold-standard-schema-simple branch November 27, 2024 10:07
2 participants