evidence-of-identifier-pain.md


This is a list of identifier issues encountered in the real world; it aims to be representative rather than exhaustive. The list could be used to:

  • Convince funders of the problem
  • Provide a set of references for a paper or specification
  • See what can be done to improve informatics/tooling around identifiers

We warmly welcome anyone to contribute.

| Reported by | Reported about | Problems referenced | Problem category |
| --- | --- | --- | --- |
| EBI-Ontology Lookup Service (OLS) | Various ontologies | Underscore-delimited vs colon-delimited forms, case sensitivity | search, delimiters |
| Not clear | Darwin Core Triples | Institutional code collisions amongst Darwin Core triples | collisions, institution identifiers |
| PrefixCommons | NCBI | Number of shortform and HTTP URI permutations found in the wild for a single identifier in NCBI Gene | data integration, text mining |
| General (Wikipedia entry) | Web-at-large | 17 different ways in which URLs could be determined to be equivalent; some of these are lossy | data integration |
| biostars | HGNC | Mapping between similar entities across databases | mapping |
| Human Phenotype Ontology | OMIM | Prefix heterogeneity: OMIM vs MIM. Special processors have to be built to collapse them | prefix variation, data integration |
| Monarch Initiative | TAIR | TAIR prefix variation difficult to resolve | type-specificity |
| Stian | EU grants | No obvious documentation for permalinks in EU grants, nor any correlation between destination URL and project ID | documentation |
| H pylori paper | HP protein identifiers | Naming problems that result from embedded meaning in identifiers and evolving scientific knowledge | embedded meaning |
| PrefixCommons | HGNC | Co-occurring identifier complexities in HGNC (multiple entity types, multiple identifier types, prefixed/unprefixed versions, type-specific URLs without type-specific determinism in local IDs) | type-specificity |
| WebProNews | eBay | Need for location-independent IDs | data integration |
| PrefixCommons | Zenodo | No rollup to impact for all DOI versions | DOI versions |
| Monarch Initiative | Monarch's ingest of FlyBase | Faulty ingest process resulted in fly and human genes being considered equivalents instead of orthologs | data integration |
| Monarch Initiative | EBI-OLS | Tricky to support searches of identifiers because of the standard query-parsing behavior of Solr | data applications |
| Ziemann et al | Several journals | Gene name corruption in supplementary data affects 20% of papers | data quality |
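
Several rows above (underscore vs colon delimiters, case sensitivity, OMIM vs MIM) describe the kind of "special processor" that integrators end up writing. As an illustration only, here is a minimal Python sketch of such a normalizer. The `PREFIX_SYNONYMS` table and the `normalize_identifier` function are hypothetical names made up for this example; they are not taken from PrefixCommons, OLS, or any other registry, and a real synonym table would be far larger.

```python
import re

# Hypothetical synonym table (an assumption for this sketch, not a real registry):
# maps lowercased prefix variants to one preferred spelling.
PREFIX_SYNONYMS = {
    "mim": "OMIM",
    "omim": "OMIM",
    "hp": "HP",
    "go": "GO",
}

# A prefix, then a colon or underscore delimiter, then a non-empty local ID.
_CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9.]*)[:_](\S+)$")


def normalize_identifier(raw: str) -> str:
    """Return a colon-delimited identifier with a canonical prefix.

    Strings that do not look like prefix + delimiter + local ID are
    returned with surrounding whitespace stripped, otherwise unchanged.
    Prefixes missing from PREFIX_SYNONYMS keep their original spelling.
    """
    text = raw.strip()
    match = _CURIE_PATTERN.match(text)
    if match is None:
        return text
    prefix, local_id = match.groups()
    canonical = PREFIX_SYNONYMS.get(prefix.lower(), prefix)
    return f"{canonical}:{local_id}"


if __name__ == "__main__":
    for example in ["MIM:154700", "omim_154700", "HP_0001250", "hp:0001250"]:
        print(example, "->", normalize_identifier(example))
```

Even this small sketch shows why the problem is painful: every consumer has to maintain its own synonym table, and the choice of canonical form (here, uppercase prefix plus colon) is itself just a local convention.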