
PublicSequenceResource

Pjotr Prins edited this page Mar 24, 2020 · 23 revisions

Deliverable: a public sequence resource

One recurring idea is to create an uploader where raw data from a sequencer (long reads and short reads) is uploaded to a backend and mapped using traditional tools as well as variation graph/pangenome tools. Next, a visualization of the viral strain is generated in comparison with the data we already have in the database. Phenotypes and metadata that we have can be presented at the same time, to show how this viral strain relates to other strains, geo info, clinical info, treatment info: anything that we have and that can be linked out. The uploaded data then becomes part of the whole.
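As a starting point for discussion, the flow described above could be sketched roughly as follows. This is only an illustration: the stage names, the `Submission` class and the `run_pipeline` function are hypothetical, and no tools or services have been chosen yet.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, following the order described in the text above.
# None of these correspond to an existing tool or service.
STAGES = [
    "upload",            # raw long/short reads arrive on the backend
    "map_traditional",   # mapping with traditional tools
    "map_pangenome",     # mapping with variation graph/pangenome tools
    "store_results",     # uploaded data becomes part of the whole
    "link_metadata",     # phenotypes, geo, clinical and treatment info
    "visualise",         # compare the strain with existing data
]


@dataclass
class Submission:
    """One uploaded sample plus whatever metadata the submitter provides."""
    sample_id: str
    reads_path: str
    metadata: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)


def run_pipeline(submission, handlers):
    """Run every stage in order, recording which stages have finished."""
    for stage in STAGES:
        handlers[stage](submission)
        submission.completed.append(stage)
    return submission


def placeholder(submission):
    """Stand-in for real work; each group would supply its own handler."""


# Example run with placeholder handlers for every stage.
handlers = {stage: placeholder for stage in STAGES}
result = run_pipeline(
    Submission("sample-1", "reads.fastq", {"geo": "unknown"}), handlers
)
print(result.completed)
```

Keeping each stage behind its own handler would let each group own one part and swap in a real implementation without touching the others.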

The justification for such an uploader is straightforward: currently there is no system that handles ontologies well, and no system that allows for on-the-fly analysis of raw data.

Mind, this is a pretty large project! But if we split it into small parts where each group owns a subsection, we should be able to put it together and make a working prototype. Once the full application works, we can improve it after the BioHackathon and encourage data providers to add their material. As a BioHackathon we could get a high-impact paper out of such a project, though that is not the primary goal.

We can discuss subtasks here and ask for a group coordinator for each subtask to work out what needs to be done. Subtasks we have identified:

  1. Uploader with authentication that accepts FASTQ or BAM and lets users add known (clinical) phenotypes
  2. Create workflow for traditional analysis (coordinator Michael R. Crusoe)
  3. Create workflow for vgtools (coordinator Michael R. Crusoe)
  4. Run workflows in cloud/HPC (coordinator Michael R. Crusoe)
  5. Store results in persistent storage (coordinator Michael R. Crusoe)
  6. Metadata and ontologies
  7. Define and query linked data (Wikidata)
  8. Create visualisation
  9. Create output website

Does that sound reasonable? Other tasks may be:

  1. Deploy graph store, database, IPFS (coordinator Pjotr Prins)
  2. Deploy cloud/HPC workflow runner (coordinator Pjotr Prins)
  3. Deploy web interfaces (coordinator Pjotr Prins)

The Galaxy team has already put some things in place, and we may be able to collaborate on this. Galaxy team, what do you think?
