-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Auto-populate Sample, Container on Biospecimen create/update #645
Closed
Closed
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
c390603
:sparkles: Add Biospecimen manager to create/update samp/ctn in API
znatty22 f342275
:recycle: Create/update sample, container on biosp create/update
znatty22 e3de58f
:recycle: Only use concentration if analyte_type is DNA, RNA
znatty22 08c07da
:bug: Check for orphaned samples on ctner delete and update
znatty22 96bf11a
:bug: Return updated sample from bs smp cnter manager
znatty22 0553e37
:white_check_mark: Test sample/container management on bs create/update
znatty22 b9aa356
:white_check_mark: Fix broken tests - use sample/container mgr to cre…
znatty22 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,314 @@ | ||
""" | ||
Module to help manage the create/update lifecycle of Samples and Containers | ||
|
||
The main method in this module is the `manage_sample_containers` which is | ||
responsible for create/updating Samples and Containers every time a Biospecimen | ||
is created or updated. It gets called in | ||
dataservice.api.biospecimen.resources.py | ||
|
||
Background: | ||
The current Biospecimen table does not adequately model the hierarchical | ||
relationship between specimen groups and specimens. The Sample and | ||
Container tables have been created to fill in this gap. | ||
|
||
A Sample is a biologically equivalent group of specimens. A Sample has | ||
one or more Containers and a Container essentially mirrors the Biospecimen. | ||
|
||
The Sample and Container tables were created in order to minimize any | ||
changes to the existing Biospecimen table. | ||
""" | ||
|
||
from marshmallow import ValidationError | ||
from flask import abort | ||
from dataservice.extensions import db | ||
from dataservice.api.sample.models import Sample | ||
from dataservice.api.container.models import Container | ||
from dataservice.api.sample.schemas import ( | ||
SampleSchema, | ||
) | ||
from dataservice.api.container.schemas import ( | ||
ContainerSchema, | ||
) | ||
|
||
|
||
def _create_sample_event_key(biospecimen): | ||
""" | ||
Create a sample event identifier from specific fields on the Biospecimen | ||
|
||
Use: | ||
participant_id | ||
external_sample_id | ||
age_at_event_days | ||
|
||
Key format: <participant_id>-<external_sample_id>-<age_at_event_days> | ||
|
||
If age_at_event_days is null, then use the value "Not Reported" | ||
""" | ||
components = [ | ||
biospecimen.participant_id, | ||
biospecimen.external_sample_id, | ||
biospecimen.age_at_event_days | ||
] | ||
|
||
return "-".join([str(c) if c else "Not Reported" for c in components]) | ||
|
||
|
||
def _create_concentration(biospecimen): | ||
""" | ||
Create the sample concentration given the biospecimen concentration | ||
|
||
Only use the concentration value when the analyte_type is DNA or RNA | ||
""" | ||
if biospecimen.analyte_type in ["DNA", "RNA"]: | ||
return biospecimen.concentration_mg_per_ml | ||
else: | ||
return None | ||
|
||
|
||
def _get_sample_identifier(biospecimen): | ||
""" | ||
Helper to extract specific Biospecimen attributes to uniquely | ||
identify a Sample | ||
""" | ||
return { | ||
"sample_event_key": _create_sample_event_key(biospecimen), | ||
"composition": biospecimen.composition, | ||
"tissue_type": biospecimen.source_text_tissue_type, | ||
"analyte_type": biospecimen.analyte_type, | ||
"anatomical_location": biospecimen.source_text_anatomical_site, | ||
"method_of_sample_procurement": | ||
biospecimen.method_of_sample_procurement, | ||
"preservation_method": biospecimen.preservation_method, | ||
"concentration_mg_per_ml": _create_concentration(biospecimen) | ||
} | ||
|
||
|
||
def _get_container_identifier(biospecimen): | ||
""" | ||
Helper to extract specific Biospecimen attributes to uniquely identify | ||
a Container | ||
""" | ||
return { | ||
"biospecimen_id": biospecimen.kf_id, | ||
} | ||
|
||
|
||
def _get_visibility_params(biospecimen): | ||
""" | ||
Helper method to get dict of visibility parameters from the Biospecimen | ||
""" | ||
return { | ||
"visible": biospecimen.visible, | ||
"visibility_reason": biospecimen.visibility_reason, | ||
"visibility_comment": biospecimen.visibility_comment | ||
} | ||
|
||
|
||
def _create_sample(biospecimen): | ||
""" | ||
Create Sample from specific Biospecimen attributes. Validate Sample | ||
""" | ||
# Extract the parameters that uniquely identify a sample | ||
params = _get_sample_identifier(biospecimen) | ||
# Add remaining sample attributes | ||
params.update( | ||
{ | ||
"participant_id": biospecimen.participant_id, | ||
"external_id": biospecimen.external_sample_id, | ||
"volume_ul": biospecimen.volume_ul, | ||
} | ||
) | ||
# Set visibility params based on Biospecimen which represents both | ||
# sample and containers | ||
params.update( | ||
_get_visibility_params(biospecimen) | ||
) | ||
# Validate sample parameters and create sample | ||
try: | ||
sample = SampleSchema(strict=True).load(params).data | ||
# Params not valid | ||
except ValidationError as e: | ||
abort(400, 'could not create sample: {}'.format(e.messages)) | ||
|
||
return sample | ||
|
||
|
||
def _update_sample(current_sample, biospecimen): | ||
""" | ||
Update Sample using specific Biospecimen attributes. Validate Sample | ||
""" | ||
# Extract the parameters that uniquely identify a sample | ||
params = _get_sample_identifier(biospecimen) | ||
# Add remaining sample attributes | ||
params.update( | ||
{ | ||
"participant_id": biospecimen.participant_id, | ||
"external_id": biospecimen.external_sample_id, | ||
"volume_ul": biospecimen.volume_ul, | ||
} | ||
) | ||
# Set visibility params based on Biospecimen which represents both | ||
# sample and containers | ||
params.update( | ||
_get_visibility_params(biospecimen) | ||
) | ||
try: | ||
sample = SampleSchema(strict=True).load( | ||
params, instance=current_sample, partial=True | ||
).data | ||
except ValidationError as e: | ||
abort(400, 'could not update sample: {}'.format(e.messages)) | ||
|
||
return sample | ||
|
||
|
||
def _create_container(biospecimen, sample): | ||
""" | ||
Create Container using specific Biospecimen attributes. | ||
Link Container to its associated biospecimen and sample | ||
Validate Container | ||
""" | ||
# Extract the parameters that uniquely identify a sample | ||
params = _get_container_identifier(biospecimen) | ||
# Add remaining sample attributes | ||
params.update( | ||
{ | ||
"biospecimen_id": biospecimen.kf_id, | ||
"sample_id": sample.kf_id, | ||
"volume_ul": biospecimen.volume_ul, | ||
"external_id": biospecimen.external_aliquot_id, | ||
} | ||
) | ||
# Set visibility params based on Biospecimen which represents both | ||
# sample and containers | ||
params.update( | ||
_get_visibility_params(biospecimen) | ||
) | ||
try: | ||
container = ContainerSchema(strict=True).load(params).data | ||
except ValidationError as e: | ||
abort(400, 'could not create container: {}'.format(e.messages)) | ||
|
||
return container | ||
|
||
|
||
def _update_container(current_container, biospecimen, sample): | ||
""" | ||
Update Container using specific Biospecimen attributes. | ||
Link Container to its associated biospecimen and sample | ||
Validate Container | ||
""" | ||
# Extract the parameters that uniquely identify a container | ||
params = _get_container_identifier(biospecimen) | ||
# Add remaining container attributes | ||
params.update( | ||
{ | ||
"biospecimen_id": biospecimen.kf_id, | ||
"sample_id": sample.kf_id, | ||
"volume_ul": biospecimen.volume_ul, | ||
"external_id": biospecimen.external_aliquot_id, | ||
} | ||
) | ||
# Set visibility params based on Biospecimen which represents both | ||
# sample and containers | ||
params.update( | ||
_get_visibility_params(biospecimen) | ||
) | ||
try: | ||
container = ContainerSchema(strict=True).load( | ||
params, instance=current_container, partial=True | ||
).data | ||
except ValidationError as e: | ||
abort(400, 'could not update container: {}'.format(e.messages)) | ||
|
||
return container | ||
|
||
|
||
def _upsert_sample(biospecimen): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need to change this approach. Read, modify, write is an anti-pattern and doesn't work with concurrent requests. Use postgresql internal upsert (update on conflict) |
||
""" | ||
Upsert Sample from specific Biospecimen attributes | ||
|
||
Try to find exisiting Sample first | ||
If it exists, update it using the Biospecimen attributes | ||
If it does not exist, create Sample using the Biospecimen attributes | ||
""" | ||
# Extract biospecimen attributes that uniquely identify a sample | ||
sample_query_params = _get_sample_identifier(biospecimen) | ||
|
||
# Find sample if it exists | ||
sample = Sample.query.filter_by(**sample_query_params).first() | ||
|
||
# Sample does not exist, create it | ||
if not sample: | ||
sample = _create_sample(biospecimen) | ||
# Sample exists, update it | ||
else: | ||
sample = _update_sample(sample, biospecimen) | ||
|
||
db.session.add(sample) | ||
db.session.commit() | ||
|
||
return sample | ||
|
||
|
||
def _upsert_container(biospecimen, sample): | ||
""" | ||
Upsert Container from specific Biospecimen attributes and link Container | ||
to its associated Sample | ||
|
||
Try to find existing Container first | ||
If it exists, update it using the Biospecimen attributes | ||
If it does not exist, create Container using the Biospecimen attributes | ||
""" | ||
# Extract biospecimen attributes that uniquely identify a container | ||
container_query_params = _get_container_identifier(biospecimen) | ||
|
||
# Find sample if it exists | ||
container = Container.query.filter_by(**container_query_params).first() | ||
|
||
# Container does not exist - create it | ||
if not container: | ||
container = _create_container(biospecimen, sample) | ||
# Container exists - update it | ||
else: | ||
container = _update_container(container, biospecimen, sample) | ||
|
||
db.session.add(container) | ||
db.session.commit() | ||
|
||
return container | ||
|
||
|
||
def _update_sample_volume(sample_id): | ||
""" | ||
Update Sample's volume with the sum of all of its container volumes | ||
""" | ||
# Accumulate container volumes and update sample volume | ||
sample_with_containers = Sample.query.get(sample_id) | ||
total_volume = None | ||
for ct in sample_with_containers.containers: | ||
if ct.volume_ul is None: | ||
continue | ||
if total_volume is None: | ||
total_volume = ct.volume_ul | ||
else: | ||
total_volume += ct.volume_ul | ||
|
||
sample_with_containers.volume_ul = total_volume | ||
|
||
db.session.add(sample_with_containers) | ||
db.session.commit() | ||
|
||
return sample_with_containers | ||
|
||
|
||
def manage_sample_containers(biospecimen): | ||
""" | ||
Upsert a Sample and Container from the input Biospecimen | ||
Update the sample's volume with the sum of the container volumes | ||
""" | ||
sample = _upsert_sample(biospecimen) | ||
_upsert_container(biospecimen, sample) | ||
sample = _update_sample_volume(sample.kf_id) | ||
|
||
return sample |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything in this PR makes sense to me except the idea that we might update a participant ID. I think participant ID should be part of the defining characteristics of a sample so I'm struggling to understand how we could both identify an existing sample (which implies the participant ID on the sample matches that on the specimen being registered) but then update the sample participant ID field (which implies the participant ID does not match the specimen being registered).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hold on... is this related to participant.kf_id really being the primary ID for particpant and participant_id being a sort of secondary/external ID? So we are updating the external ID if it changes but relying on the kf_id/PK for confirming the sample already exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@calkinsh Yep, the primary key for participant is
participant.kf_id
and the sample has a foreign key to itsample.participant_id
so I think that does make it a defining characteristic of the sample.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we may not need the
Sample.participant_id
orSample.external_id
bc they are captured in theSample.sample_event_key
but I included them in the Sample table in case we want to populate the sample event key with something else and bc I felt it would be ok to have some redundancy to gain some clarity on which participant the sample came from and what the original biospecimen's external sample ID was