This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Metadata integration - bio_characteristics and location, mostly #875

Closed
wants to merge 62 commits into from
62 commits
beeeb2f
skeleton implementation of BioCharacteristics objects
Oct 7, 2016
c6df995
inline documentation for BioCharacteristics objects
mbaudis Oct 7, 2016
50ebd00
cleanup
mbaudis Oct 7, 2016
bb3a832
documentation
mbaudis Oct 7, 2016
a50b5da
whitespace
mbaudis Oct 7, 2016
b4ebfa2
Merge branch 'master' into metadata-bio-characteristics
Oct 16, 2016
0bc6659
BioCharacteristics update
Oct 17, 2016
678fd50
BioSample consistency
Oct 17, 2016
0f96a31
character fix
Oct 17, 2016
053c685
Merge pull request #725 from ga4gh/metadata-bio-characteristics
mbaudis Oct 18, 2016
8b490ca
Merge branch 'master' into metadata-integration
mbaudis Oct 27, 2016
2a105c4
Merge branch 'master' into metadata-integration
mbaudis Nov 16, 2016
ed01f6f
Merge branch 'master' into metadata-integration
mbaudis Jan 13, 2017
65267bb
Merge branch 'master' into metadata-integration
mbaudis Jan 13, 2017
077c2c7
Merge branch 'master' into metadata-integration
mbaudis Jan 24, 2017
0972225
Merge branch 'master' into metadata-integration
Feb 3, 2017
8c62893
Merge fix
Feb 3, 2017
c0c7db6
Integrating external identifiers
Feb 6, 2017
a8ce630
fixing numbering SNAFU
mbaudis Feb 6, 2017
0537b24
Merge pull request #807 from ga4gh/metadata-external_identifiers
mbaudis Feb 18, 2017
a2ed1c6
Merge branch 'master' into metadata-integration
mbaudis Mar 2, 2017
080ffaf
refactoring BioCharacteristics
mbaudis Mar 2, 2017
daef5a8
snake_case
mbaudis Mar 2, 2017
e374be9
Merge branch 'metadata-integration' into metadata-modify-biocharacter…
mbaudis Mar 2, 2017
8e00293
Merge branch 'master' into metadata-integration
mbaudis Mar 14, 2017
6e7a1ef
Merge branch 'metadata-integration' into metadata-modify-biocharacter…
mbaudis Mar 14, 2017
a57f8c5
geodata object and implementations
mbaudis Mar 14, 2017
4c52659
attribute type fix
Mar 21, 2017
a0715db
fixing example address
mbaudis Mar 24, 2017
146308f
geodata object and implementations (#864)
mbaudis Mar 29, 2017
4262078
Merge branch 'master' into metadata-integration
Mar 31, 2017
a67c815
Commit cleanup
Mar 31, 2017
990aa1a
renaming label
mbaudis Apr 11, 2017
25cd87c
Merge branch 'master' into metadata-geoobjects
mbaudis Apr 11, 2017
6ae798f
Adding location to Biosample
mbaudis Apr 11, 2017
f4c1d69
Merge branch 'metadata-integration' into metadata-geoobjects
mbaudis Apr 11, 2017
3a4ecda
fixed documentation
mbaudis Apr 11, 2017
7ee2f11
added "source" to doc.
mbaudis Apr 11, 2017
423af58
BioCharacteristic scope
mbaudis Apr 12, 2017
432e8e4
duplicate location attribute fix
mbaudis Apr 12, 2017
284a6cc
numbering re-alignment
mbaudis Apr 12, 2017
6b4e23c
Merge pull request #872 from ga4gh/metadata-geoobjects
mbaudis Apr 12, 2017
542daeb
Merge pull request #836 from ga4gh/metadata-modify-biocharacteristics
mbaudis Apr 12, 2017
a3824a0
documentation adjustment
mbaudis Apr 12, 2017
4b6d127
Merge branch 'master' into metadata-modify-biocharacteristics
mbaudis Apr 12, 2017
c52c5d1
consistency of BioCharacteristic use
mbaudis Apr 12, 2017
0a4f454
documentation change
mbaudis Apr 12, 2017
e8ecedd
remove broken link
mbaudis Apr 12, 2017
c518b76
fixed BioCharacteristic use
mbaudis Apr 21, 2017
eeb62a8
Merge branch 'metadata-integration' into metadata-modify-biocharacter…
mbaudis Apr 21, 2017
4ae1b62
Merge pull request #873 from ga4gh/metadata-modify-biocharacteristics
mbaudis Apr 21, 2017
6424014
Updating the "characteristics" description in "Individual"
mbaudis Apr 21, 2017
a9c8f35
Numbering fix
Apr 25, 2017
12d6e73
Merge branch 'master' into metadata-integration
Apr 25, 2017
3e3d1a8
bio_characteristics instead of characteristics
Apr 26, 2017
6e413de
Rectifying ExternalIdentifier notes
May 1, 2017
705e0ff
Documentation fix
May 1, 2017
d48d936
documentation
mbaudis May 17, 2017
b4fb718
Merge branch 'master' into metadata-integration
mbaudis Sep 4, 2017
e907f2e
GeoLocation modification
mbaudis Sep 5, 2017
70bed83
some metagenomic notes
mbaudis Sep 7, 2017
d2b75b0
Merge branch 'master' into metadata-integration
mbaudis Sep 8, 2017
2 changes: 2 additions & 0 deletions .gitignore
@@ -84,3 +84,5 @@ target/

#********* Java artifacts *******
*.class

*.py
27 changes: 22 additions & 5 deletions doc/source/api/biometadata.rst
@@ -1,9 +1,5 @@
.. _biometadata:

.. image:: /_static/biometadata_schema.svg
:width: 184 px
:align: right

.. _biometadata_biosample:

*******************************
@@ -37,7 +33,7 @@ Attribute Notes
*name* * a human readable object label/identifier
* not to be used for referencing
*description* * additional, unstructured information about this Biosample
*disease* * OntologyTerm annotating the disease of the sample
*bio_characteristics* * contains lists of phenotypes, diseases and other information associated with this Biosample, in the form of BioCharacteristic objects
*individualId* * the *id* of the *Individual* this Biosample was derived from
*created* * the time the record was created, in ISO8601
*updated* * the time the record was updated, in ISO8601
@@ -57,6 +53,11 @@ An *Individual* is a GA4GH data object representing a biological instance
(most commonly a human being or other individual organism) on whose *Biosamples*
experimental analyses are performed.

In the case of metagenome analyses (i.e. when the biosample consists of
material like patient derived sputum, analyzed for its microbial content), the
"species" context would still be the host; identified guest species would
be described in analysis results.

Individual attributes
=====================

@@ -69,9 +70,25 @@ Attribute Notes
*name* * a human readable object label/identifier
* not to be used for referencing
*description* * additional, unstructured information about this Individual
*bio_characteristics* * contains lists of phenotypes, diseases and other information associated with this Individual, in the form of BioCharacteristic objects
*species* * OntologyTerm representing the species (NCBITaxon:9606)
*sex* * OntologyTerm for the genetic sex of this individual.
*created* * the time the record was created, in ISO8601
*updated* * the time the record was updated, in ISO8601
*attributes* * additional, structured information
===================== ==========================================================

.. _biometadata_BioCharacteristic:

***************************************
BioMetadata: *BioCharacteristic* Object
***************************************

BioCharacteristic in the GA4GH Schema
-------------------------------------

A BioCharacteristic is an object defining a single phenotype or diagnosis
through a free text description and a representation by one or
more "ontologyTerms" objects as well as zero or more "negatedOntologyTerms".
An additional "scope" attribute allows queries to be limited, e.g. to "disease"
type objects.
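
As an illustration of the object described above, here is a minimal Python sketch of a BioCharacteristic and a scope filter. The class layout mirrors the attribute names used in this pull request; the `by_scope` helper and the dataclass representation are assumptions made for the example and are not part of the GA4GH schema or any reference implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyTerm:
    term_id: str  # CURIE, e.g. "HP:0001711"
    term: str     # human readable label


@dataclass
class BioCharacteristic:
    description: str = ""
    ontology_terms: List[OntologyTerm] = field(default_factory=list)
    negated_ontology_terms: List[OntologyTerm] = field(default_factory=list)
    scope: str = ""  # free text for now, e.g. "phenotype" or "disease"


def by_scope(characteristics, scope):
    """Hypothetical helper: keep only characteristics with this exact scope."""
    return [c for c in characteristics if c.scope == scope]


records = [
    BioCharacteristic(
        description="Bilateral ventricle anomalies (but not hypertrophy)",
        ontology_terms=[
            OntologyTerm("HP:0001711", "Abnormality of the left ventricle"),
            OntologyTerm("HP:0001707", "Abnormality of the right ventricle"),
        ],
        negated_ontology_terms=[
            OntologyTerm("HP:0001714", "Ventricular hypertrophy"),
        ],
        scope="phenotype",
    ),
    BioCharacteristic(
        description="squamous cell carcinoma, base of tongue, stage 2",
        ontology_terms=[
            OntologyTerm("DOID:0050865", "tongue squamous cell carcinoma"),
        ],
        scope="disease",
    ),
]

diseases = by_scope(records, "disease")
print([c.description for c in diseases])
```

Because "scope" is a plain string in this proposal, the filter relies on exact string matches; an enumeration or OntologyTerm would make such queries less fragile.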
33 changes: 13 additions & 20 deletions doc/source/appendix/ontologies.rst
@@ -38,7 +38,7 @@ What is the minimum attribute requirement for OntologyTerm in GA4GH?

Conceptually (and consistent with the metadata branch)

:termId:
:term_id:
required and implemented as CURIE
we assume this resolves to a meaningful document, e.g. http://purl.obolibrary.org/obo/SO_0000147, using a prefix mapper, e.g. SO: <=> http://purl.obolibrary.org/obo/SO_
:term:
@@ -80,45 +80,41 @@ Examples
Genotypic sex
=============

:termId:
:term_id:
"PATO:0020001",
:term:
"male genotypic sex" ,



Sequence Ontology
=================

:termId:
:term_id:
"SO:0001583",
:term:
"missense_variant",



Human Phenotype ontology
========================

:termId:
:term_id:
"HP:0000819",
:term:
"Diabetes mellitus",

"Diabetes mellitus",

----

:termId:
"HP:0012059",
:term_id:
"HP:0012059",
:term:
"Lentigo maligna melanoma",

"Lentigo maligna melanoma",


Body part (Uberon)
==================

:termId:
:term_id:
"UBERON:0003403",
:term:
"skin of forearm",
@@ -127,7 +123,7 @@ Body part (Uberon)
Human disease ontology
======================

:termId:
:term_id:
"DOID:9351",
:term:
"diabetes mellitus",
@@ -136,26 +132,23 @@ Human disease ontology
Experimental factor ontology
============================

:termId:
:term_id:
"EFO:0000400",
:term:
"diabetes mellitus",


----

:termId:
:term_id:
"EFO:0004422",
:term:
"exome",



Unit Ontology
=============

:termId:
:term_id:
"UO:0000016",
:term:
"millimetre",

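The ontologies document above asks for `term_id` to be a CURIE that resolves to a document through a prefix mapper. A minimal sketch of such a mapper, assuming a plain dictionary lookup: the `SO` and `HP` base URLs are the ones quoted in this pull request, while `PREFIX_MAP` and `expand_curie` are hypothetical names introduced only for the example.

```python
# Prefix map built from the two OBO PURL bases that appear in this pull request.
PREFIX_MAP = {
    "SO": "http://purl.obolibrary.org/obo/SO_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE such as "SO:0000147" into a resolvable URL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"no mapping for prefix {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("SO:0000147"))
# prints http://purl.obolibrary.org/obo/SO_0000147
```

Prefixes that are absent from the map raise an error instead of being guessed, since not every ontology in the examples is necessarily served from the same base URL.
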
107 changes: 100 additions & 7 deletions src/main/proto/ga4gh/bio_metadata.proto
@@ -23,6 +23,17 @@ message Individual {
// The "description" attributes should not contain any structured data.
string description = 4;

// The bio_characteristics attribute uses lists of BioCharacteristic objects to describe
// diseases, phenotypes ... associated with this Individual. The values here may
// overlap with values recorded in related records. As a general rule, "germline"
// or non-organ specific afflictions ("Li-Fraumeni Syndrome", "Diabetes mellitus")
// should be recorded here, whereas "Infiltrating duct carcinoma" may be
// recorded both here (as known disease history)
// and in the "Biosample.bio_characteristics" attribute.
// To query all disease contexts related to an individual, queries should
// therefore cover ($in) [Individual.bio_characteristics OR Biosample.bio_characteristics].
repeated BioCharacteristic bio_characteristics = 9;

// The :ref:`ISO 8601<metadata_date_time>` time at which this Individual's record
// was created.
string created = 5;
@@ -46,9 +57,19 @@ message Individual {
// sex : { term_id: "PATO:0020000", term : "female genetic sex" }
OntologyTerm sex = 8;

// The address, coded as a GeoLocation, where this individual originated.
// It is recommended that this reflects the place of birth or main place of
// living, not necessarily a current address.
GeoLocation location = 13;

// A map of additional information regarding the Individual.
Attributes attributes = 10;

// External identifiers representing this individual. These are considered
// different representations of the same record, not records which are in some
// other relation with the record at hand.
repeated ExternalIdentifier external_identifiers = 11;

}


@@ -77,14 +98,15 @@ message Biosample {
// The "description" attributes should not contain any structured data.
string description = 4;

// OntologyTerm describing the primary disease associated with this Biosample.
OntologyTerm disease = 5;
// The bio_characteristics attribute uses lists of BioCharacteristic objects to
// describe diseases, phenotypes, source ... associated with this Biosample.
repeated BioCharacteristic bio_characteristics = 5;

// The :ref:`ISO 8601<metadata_date_time>` time at which this Biosample record
// was created.
string created = 6;

// The :ref:`ISO 8601<metadata_date_time>` time at which this Biosample record was
// The :ref:`ISO 8601<metadata_date_time>` time at which this Biosample record was
// updated.
string updated = 7;

@@ -94,21 +116,36 @@ message Biosample {
// A map of additional information about the Biosample.
Attributes attributes = 10;

// External identifiers representing this biosample. These are considered
// different representations of the same record, not records which are in some
// other relation with the record at hand.
repeated ExternalIdentifier external_identifiers = 11;

// An age object describing the age of the individual this biosample was
// derived from at the time of collection. The Age object allows the encoding
// of the age either as ISO8601 duration or time interval (preferred), or
// as an ontology term object.
Age individual_age_at_collection = 11;
// Example:
// "individual_age_at_collection": {
// "age": "P12Y0M",
// "age_class": {
// "term": "Juvenile onset",
// "term_id": "HP:0003621"
// }
// },
Age individual_age_at_collection = 12;

// The address coded as GeoLocation where the biosample was collected.
GeoLocation location = 13;
}


// The age object permits both the (considered default) encoding of an age
// value in ISO8601, with arbitrary granularity, and the representation of an
// "age class" as a qualitative ontology term.
// If available, a quantitative value should be used and take precedence over
// the age class, and class assignment should be performed at user/API level.

message Age {

// The :ref:`ISO 8601<metadata_date_time>` age of this object as ISO8601
// duration or time intervals. The use of time intervals makes an additional
// anchor unnecessary (i.e. DOB and age can be represented as start-anchored
@@ -117,9 +154,65 @@ message Age {
string age = 1;

// An age class, e.g. corresponding to the use of "age of onset" in HPO.
// HPO is recommended, for example, subclasses of
// HPO is recommended, for example, subclasses of "Onset":
// http://purl.obolibrary.org/obo/HP_0003674
// Example:
// age_class : { term_id : "HP:0003596", term : "Middle age onset" }
OntologyTerm age_class = 2;
}

// BioCharacteristic is a prototype wrapper object for single instances
// of phenotypes, diseases ... which may be described through one or several
// ontology terms
message BioCharacteristic {
// A free text description of the specific disease diagnosis or phenotype
// here, which is then characterized by zero or more OntologyTerm objects.
// The description should be concise and should not include data points
// better expressed through specific attributes elsewhere in the schema.
// Example (for a single disease item):
// "squamous cell carcinoma, base of tongue, stage 2"
string description = 1;

// The ontology_terms attribute contains a list of zero (discouraged) or more
// OntologyTerm objects covering the characteristic (e.g. disease diagnosis,
// phenotype) reported here.
// Example (for a single diagnosis "squamous cell carcinoma, base of tongue"):
//
// term_id: "DOID:0050865",
// term: "tongue squamous cell carcinoma",
//
// term_id: "UBERON:0006919",
// term: "tongue squamous epithelium",
//
// term_id: "UBERON:0010033",
// term: "posterior part of tongue",
//
repeated OntologyTerm ontology_terms = 2;

// negated_ontology_terms are used to describe features which are explicitly
// not part of the BioCharacteristic.
// Example: For a phenotype
//
// description: "Bilateral ventricle anomalies (but not hypertrophy)"
//
// ... one could use the ontologyTerms
//
// term_id: "HP:0001711"
// term: "Abnormality of the left ventricle"
//
// term_id: "HP:0001707"
// term: "Abnormality of the right ventricle"
//
// ... and add to negatedOntologyTerms
//
// term_id: "HP:0001714"
// term: "Ventricular hypertrophy"
//
repeated OntologyTerm negated_ontology_terms = 3;

// Logical scope of this BioCharacteristic. Typical examples
// could be "phenotype", "disease", "observation", "source".
// TODO: This may be modified into an enumeration or expressed through
// an OntologyTerm.
string scope = 4;
}
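
The comment on `Individual.bio_characteristics` recommends that disease queries cover both the Individual and its Biosamples. The following Python sketch shows one way such a query could be combined on the client side, using plain dictionaries in place of the protobuf messages. The helper names, the `individual_id` key on the biosample and the choice to skip negated terms are assumptions for illustration, not behaviour defined by this pull request.

```python
def disease_term_ids(record):
    """Collect term_ids from all "disease"-scoped BioCharacteristic entries.

    Terms listed under negated_ontology_terms are skipped (an assumption).
    """
    ids = set()
    for bc in record.get("bio_characteristics", []):
        if bc.get("scope") != "disease":
            continue
        negated = {t["term_id"] for t in bc.get("negated_ontology_terms", [])}
        for term in bc.get("ontology_terms", []):
            if term["term_id"] not in negated:
                ids.add(term["term_id"])
    return ids


def disease_contexts(individual, biosamples):
    """Union of disease terms on the Individual and on its own Biosamples."""
    ids = disease_term_ids(individual)
    for biosample in biosamples:
        # "individual_id" is a hypothetical key linking a sample to its donor.
        if biosample.get("individual_id") == individual["id"]:
            ids |= disease_term_ids(biosample)
    return ids


individual = {
    "id": "ind-1",
    "bio_characteristics": [{
        "scope": "disease",
        "ontology_terms": [{"term_id": "DOID:9351", "term": "diabetes mellitus"}],
    }],
}
biosamples = [
    {"id": "bs-1", "individual_id": "ind-1", "bio_characteristics": [{
        "scope": "disease",
        "ontology_terms": [
            {"term_id": "DOID:0050865", "term": "tongue squamous cell carcinoma"},
        ],
    }]},
    {"id": "bs-2", "individual_id": "ind-2", "bio_characteristics": [{
        "scope": "disease",
        "ontology_terms": [{"term_id": "EFO:0000400", "term": "diabetes mellitus"}],
    }]},
]

print(sorted(disease_contexts(individual, biosamples)))
```

A server-side implementation would presumably push this union into the query itself; the sketch only shows which records have to be consulted.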