✨ Auto-populate Sample, Container on Biospecimen create/update #645

Closed
wants to merge 7 commits into from

znatty22 (Member) commented Jan 23, 2024

Motivation

#643 introduced the Sample and Container tables to address the shortcomings of the Biospecimen table. Now we need a way to populate these tables, and since a sample and container can be derived from a biospecimen, we can auto-populate them.

Approach

Each time a biospecimen is created or updated via an HTTP POST or PATCH, derive the sample and container from the input biospecimen and update the sample and container tables accordingly.

Sample, Container Management

[Diagram: sample-container-flow]

  1. Find Sample - check if a sample already exists for this biospecimen

    • Use a specific set of biospecimen attributes to uniquely identify the sample:
      • sample_event_key = concat(participant_id, external_sample_id, age_at_event_days)
      • analyte_type
      • composition
      • source_text_tissue_type
      • source_text_anatomic_site
      • preservation_method
      • method_of_sample_procurement
      • concentration_mg_per_ml
  2. Create Sample - if the sample does not exist - create it using the relevant subset of biospecimen attributes

    • All identifying attributes above, plus:
      • participant_id
      • external_sample_id
      • volume_ul
  3. Update Sample - if the sample exists - update it using the relevant subset of biospecimen attributes

  4. Find Container - check if a container already exists for this biospecimen

    • Use the following biospecimen attribute to uniquely identify the container:
      • biospecimen_id
  5. Create Container - if the container does not exist - create it using the relevant subset of biospecimen attributes

    • The identifying attribute above, plus:
      • sample_id
      • specimen_status
      • volume_ul
  6. Update Container - if the container exists - update it using the relevant subset of biospecimen attributes

  7. Sum Volume - update the sample's volume_ul field with the sum of its containers' volumes
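The seven steps above can be sketched in Python as follows. This is a minimal in-memory sketch, not the PR's code. It assumes biospecimens are plain dicts and stands in for the Sample and Container tables with two dicts. `sync_biospecimen`, `sample_identifier`, `samples`, and `containers` are hypothetical names.

```python
from itertools import count

# Identifying attributes from step 1, besides sample_event_key (built below).
SAMPLE_ID_FIELDS = (
    "analyte_type", "composition", "source_text_tissue_type",
    "source_text_anatomic_site", "preservation_method",
    "method_of_sample_procurement", "concentration_mg_per_ml",
)

samples = {}     # hypothetical Sample "table": identifier tuple -> row dict
containers = {}  # hypothetical Container "table": biospecimen_id -> row dict
_sample_ids = count(1)  # hypothetical surrogate primary key for samples


def sample_identifier(bs):
    """Step 1: build the tuple that uniquely identifies a sample."""
    event_key = f"{bs['participant_id']}{bs['external_sample_id']}{bs['age_at_event_days']}"
    return (event_key,) + tuple(bs[f] for f in SAMPLE_ID_FIELDS)


def sync_biospecimen(bs):
    """Run steps 1-7 for one created or updated biospecimen (a dict)."""
    key = sample_identifier(bs)
    sample = samples.get(key)                        # 1. find sample
    if sample is None:                               # 2. create sample
        sample = {"id": next(_sample_ids)}
        samples[key] = sample
    # 3. update sample (this also fills in a row that was just created)
    sample.update(zip(SAMPLE_ID_FIELDS, key[1:]))
    sample.update(sample_event_key=key[0],
                  participant_id=bs["participant_id"],
                  external_sample_id=bs["external_sample_id"])

    # 4./5. find the container by biospecimen_id, or create an empty one
    container = containers.setdefault(bs["biospecimen_id"], {})
    container.update(                                # 6. update container
        biospecimen_id=bs["biospecimen_id"],
        sample_id=sample["id"],
        specimen_status=bs["specimen_status"],
        volume_ul=bs["volume_ul"],
    )

    # 7. sample volume_ul = sum over this sample's containers (None counts as 0)
    sample["volume_ul"] = sum(
        c["volume_ul"] or 0
        for c in containers.values()
        if c["sample_id"] == sample["id"]
    )
    return sample, container
```

Because the container is looked up by biospecimen_id, a PATCH to an existing biospecimen updates its container in place rather than adding a second one. If a PATCH changes an identifying attribute, the container moves to a different sample, and this sketch does not recompute the previous sample's total.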

znatty22 added the feature New functionality label Jan 23, 2024
znatty22 self-assigned this Jan 23, 2024
znatty22 force-pushed the populate-sample-container branch 2 times, most recently from 9200f36 to a3f4ca4 January 23, 2024 19:22
znatty22 force-pushed the populate-sample-container branch from a3f4ca4 to f342275 January 23, 2024 19:25
znatty22 changed the title from "✨ Auto-populate Sample and Container on Biospecimen create/update" to "✨ Auto-populate Sample, Container on Biospecimen create/update" Jan 23, 2024
znatty22 mentioned this pull request Jan 23, 2024
znatty22 marked this pull request as ready for review January 23, 2024 21:21
znatty22 requested a review from a team as a code owner January 23, 2024 21:21
znatty22 marked this pull request as draft January 23, 2024 21:21
znatty22 requested a review from calkinsh January 23, 2024 21:22
params = _get_sample_identifier(biospecimen)
# Add remaining sample attributes
params.update(
{

Contributor

Everything in this PR makes sense to me except the idea that we might update a participant ID. I think participant ID should be one of the defining characteristics of a sample, so I'm struggling to understand how we could both identify an existing sample (which implies the participant ID on the sample matches the one on the specimen being registered) and then update the sample's participant ID field (which implies the participant ID does not match the specimen being registered).

Contributor

Oh, hold on... is this related to participant.kf_id really being the primary ID for participant, and participant_id being a sort of secondary/external ID? So we are updating the external ID if it changes, but relying on the kf_id/PK to confirm that the sample already exists?

znatty22 (Member Author)

@calkinsh Yep, the primary key for participant is participant.kf_id, and the sample has a foreign key to it (sample.participant_id), so I think that does make it a defining characteristic of the sample.

znatty22 (Member Author), Jan 24, 2024

Technically we may not need Sample.participant_id or Sample.external_sample_id, because they are captured in Sample.sample_event_key. I included them in the Sample table for two reasons. First, we may want to populate the sample event key with something else. Second, some redundancy seemed acceptable in exchange for clarity about which participant the sample came from and what the original biospecimen's external sample ID was.

znatty22 force-pushed the populate-sample-container branch from dfa88d7 to 1074988 January 25, 2024 19:49
znatty22 force-pushed the populate-sample-container branch from 1074988 to b9aa356 January 25, 2024 21:21
return container


def _upsert_sample(biospecimen):

znatty22 (Member Author)

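Need to change this approach. Read-modify-write is an anti-pattern here and is not safe under concurrent requests: two requests can both find no existing sample and both try to create one. Use PostgreSQL's native upsert (INSERT ... ON CONFLICT ... DO UPDATE) instead.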
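The single-statement upsert suggested above can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, because SQLite (3.24 and later) accepts the same `INSERT ... ON CONFLICT (...) DO UPDATE SET ... excluded.<column>` form. The `sample` table and `upsert_sample` function are hypothetical, and the table keeps only three of the identifying attributes for brevity. In both databases the conflict target must match a unique constraint or unique index on the identifying columns.

```python
import sqlite3

# SQLite stands in for PostgreSQL; both accept INSERT ... ON CONFLICT ... DO UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        sample_event_key TEXT NOT NULL,
        analyte_type TEXT NOT NULL,
        preservation_method TEXT NOT NULL,
        participant_id TEXT,
        external_sample_id TEXT,
        volume_ul REAL,
        -- ON CONFLICT can only target columns covered by a unique constraint
        UNIQUE (sample_event_key, analyte_type, preservation_method)
    )
    """
)

# Insert a new sample, or update the non-identifying columns of the existing one.
UPSERT_SAMPLE = """
INSERT INTO sample (sample_event_key, analyte_type, preservation_method,
                    participant_id, external_sample_id, volume_ul)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (sample_event_key, analyte_type, preservation_method)
DO UPDATE SET participant_id = excluded.participant_id,
              external_sample_id = excluded.external_sample_id,
              volume_ul = excluded.volume_ul
"""


def upsert_sample(conn, row):
    """Find-or-create plus update in one statement (no separate read)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT_SAMPLE, row)
```

Because the existence check and the write happen in one statement backed by a unique constraint, two concurrent requests for the same sample cannot both insert it. One caveat: by default both PostgreSQL and SQLite treat NULLs as distinct in unique constraints. An identifying attribute that is NULL would therefore never trigger the conflict, which is why the sketch declares those columns NOT NULL.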

znatty22 (Member Author) commented Feb 1, 2024

Closing for now. The new approach is to implement only the Sample table, as an MVP to meet the Portal Beta requirements. I will try to auto-populate the Sample table from biospecimens using an approach similar to the one here.

znatty22 closed this Feb 1, 2024
Labels
feature New functionality
2 participants