Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ New sample-relationships API #665

Merged
merged 15 commits into from
May 29, 2024
Merged

✨ New sample-relationships API #665

merged 15 commits into from
May 29, 2024

Conversation

znatty22
Copy link
Member

@znatty22 znatty22 commented May 15, 2024

Motivation

CBTN has complex sample data where a group of samples actually have a hierarchical or tree like structure (Blood -> White Blood Cells -> DNA, RNA). There is currently no way to capture this information in Dataservice and therefore no way to get the information into the FHIR service and the Portal.

Approach

Add a SampleRelationship table to capture the sample tree. The table has the following columns:

  • parent_id (FK to Sample)
  • child_id (FK to Sample)
  • external_parent_id (string)
  • external_child_id (string)
  • notes (string)

Add validation rules in the API:

  • Cannot create a relationship where the parent or child do not exist in the Sample table
  • Cannot create a relationship where parent = child
  • Cannot create a duplicate relationship
  • A child sample cannot have more than 1 parent
  • If a sample is a parent of a child sample, then the reverse relationship cannot exist (e.g. if the relationship SA1 -> SA2 exists, then SA2 -> SA1 cannot be created)

API
The endpoint and search parameters follow the same pattern as all other endpoints. Here are the examples

# Get all samples
/sample-relationships

# Get all samples in a study
/sample-relationships?study_id=foo

@znatty22 znatty22 added the feature New functionality label May 15, 2024
@znatty22 znatty22 self-assigned this May 15, 2024
@znatty22 znatty22 force-pushed the sample-relationships branch from 7de38c2 to 156384e Compare May 15, 2024 19:23
@znatty22 znatty22 force-pushed the sample-relationships branch from 156384e to 46811f5 Compare May 16, 2024 15:10
@znatty22 znatty22 force-pushed the sample-relationships branch from 46811f5 to 91dfae2 Compare May 16, 2024 15:11
@znatty22 znatty22 force-pushed the sample-relationships branch from 2044070 to 2ab07cc Compare May 21, 2024 16:49
@znatty22 znatty22 force-pushed the sample-relationships branch from cabe3df to 5c0b60a Compare May 22, 2024 16:44
@znatty22 znatty22 force-pushed the sample-relationships branch from 9fa7fdd to 957e135 Compare May 23, 2024 13:39
Copy link
Contributor

@chris-s-friedman chris-s-friedman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things I'm curious about:

  1. can a sample have multiple parents? (or are we explicitly allowing/ disallowing that?)
  2. if the answer to the multiple parent question above is "No" then why is this not just a column in the sample table itself?

Comment on lines +22 to +23
:param external_parent_id: Name given to parent sample by contributor
:param external_child_id: Name given to child sample by contributor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being populated in the relationship table? shouldn't this just be coming from sample?

Copy link
Member Author

@znatty22 znatty22 May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit redundant but I think its a tradeoff with ease of implementation + complexity of the query when fetching/searching for relationships with the external sample IDs.

If we didn't have these additional columns on the relationship table, then in order to get all relationships or search for specific relationships using the external parent or child sample IDs, we'd have to join the relationships table with the sample table on both the parent_id and child_id FK columns.

We'd also have to break the pattern we follow in the schema layer of the API in order to return the parent Sample.external_id and the child Sample.external_id along with each sample relationship. We could do this but I didn't think it was worth figuring this out with the old documentation and time constraints

dataservice/api/sample_relationship/models.py Show resolved Hide resolved
@chris-s-friedman
Copy link
Contributor

chris-s-friedman commented May 23, 2024

If a sample is a parent of a child sample, then the reverse relationship cannot exist (e.g. if the relationship SA1 -> SA2 exists, then SA2 -> SA1 cannot be created)

Just to clarify, the -> directionality of the arrow is meaningful here, correct? The arrow means parent -> child. So what this validation is validating is that:

If

  • SA1 is the parent
  • SA2 is the child
  • there is a record in sample_relationship where SA1 is parent and SA2 is child

Then we are validating that a new record cannot be SA2 is parent and SA1 is child, correct?

@znatty22 znatty22 marked this pull request as ready for review May 23, 2024 13:57
@znatty22 znatty22 requested a review from a team as a code owner May 23, 2024 13:57
@znatty22
Copy link
Member Author

If a sample is a parent of a child sample, then the reverse relationship cannot exist (e.g. if the relationship SA1 -> SA2 exists, then SA2 -> SA1 cannot be created)

Just to clarify, the -> directionality of the arrow is meaningful here, correct? The arrow means parent -> child. So what this validation is validating is that:

If

  • SA1 is the parent
  • SA2 is the child
  • there is a record in sample_relationship where SA1 is parent and SA2 is child

Then we are validating that a new record cannot be SA2 is parent and SA1 is child, correct?

Correct, we're validating that a child sample cannot also be the parent of its own parent sample, and a parent sample cannot also be the child of its own child sample. Basically, we're avoiding cycles

@znatty22
Copy link
Member Author

znatty22 commented May 23, 2024

Some things I'm curious about:

  1. can a sample have multiple parents? (or are we explicitly allowing/ disallowing that?)
  2. if the answer to the multiple parent question above is "No" then why is this not just a column in the sample table itself?
  1. No a sample cannot have multiple parents - good catch. I might need to add a validation rule for this or change the unique constraint to be on the child_id column

  2. I don't have great answer for this, but the short answer is that we could add the parent_id to the sample table itself but other things get complicated and break our current Dataservice patterns. The the easiest way to implement self-referential data structure is with the secondary many-to-many table. There are definitely other approaches including adding the foreign key to sample to the sample table.

@znatty22 znatty22 force-pushed the sample-relationships branch from 1702b56 to 935f9cd Compare May 29, 2024 15:34
@znatty22 znatty22 merged commit 51995c3 into master May 29, 2024
5 checks passed
znatty22 added a commit that referenced this pull request Jun 5, 2024
## Release 1.28.0

### Summary

- Emojis: ✨ x2, ? x1
- Categories: Additions x2, Other Changes x1

### New features and changes

- [#665](#665) - ✨ New sample-relationships API - [51995c3](51995c3) by [znatty22](https://github.com/znatty22)
- [#661](#661) -  🗃️ Migration for Biospecimen.specimen_status - [83e2ed4](83e2ed4) by [znatty22](https://github.com/znatty22)
- [#660](#660) - ✨ Add Biospecimen.specimen_status - [0d3f1aa](0d3f1aa) by [znatty22](https://github.com/znatty22)
@znatty22 znatty22 mentioned this pull request Jun 5, 2024
znatty22 added a commit that referenced this pull request Jun 5, 2024
## Release 1.28.0

### Summary

- Emojis: ♻️ x1, ✨ x2, ? x1
- Categories: Additions x3, Other Changes x1

### New features and changes

- [#667](#667) - ♻️ Don't delete samples with 0 biospecimens - [6eddc7c](6eddc7c) by [znatty22](https://github.com/znatty22)
- [#665](#665) - ✨ New sample-relationships API - [51995c3](51995c3) by [znatty22](https://github.com/znatty22)
- [#661](#661) -  🗃️ Migration for Biospecimen.specimen_status - [83e2ed4](83e2ed4) by [znatty22](https://github.com/znatty22)
- [#660](#660) - ✨ Add Biospecimen.specimen_status - [0d3f1aa](0d3f1aa) by [znatty22](https://github.com/znatty22)
@znatty22 znatty22 deleted the sample-relationships branch June 12, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants