PublicSequenceResource
Coordinator: Thomas Liener
One recurring idea is to create an uploader where raw data from a sequencer (long reads and short reads) is loaded onto a backend and mapped using traditional tools as well as variation graph/pangenome tools. Next, a visualization is generated of the viral strain in comparison with data we already have in the database. The phenotypes and metadata we have can be presented at the same time, to show how this viral strain relates to other strains, geo info, clinical info, treatment info: anything that we have and that can be linked out. The uploaded data then becomes part of the whole.
The justification for such an uploader is simple: currently there is no system that handles ontologies well, and none that allows for on-the-fly analysis of raw data.
Mind, this is a pretty large project! But if we split it into small parts, with each group owning a subsection, we should be able to put it together and make a working prototype. Once the full application works, we can improve it after the BioHackathon and encourage data providers to add their material. As a BioHackathon we can get a high-impact paper out of such a project, though that is not the primary goal.
We can discuss subtasks here and ask for a group coordinator for each subtask to work out what needs to be done. Subtasks we have identified:
- Uploader with authentication, for uploading FASTQ or BAM files and adding known (clinical) phenotypes. Study the use of Phenopackets as a standard for phenotypic data submission. Going the other direction, this may be useful: omopomics
- Create workflow for traditional analysis (coordinator Michael R. Crusoe)
- Create workflow for vgtools (coordinator Michael R. Crusoe)
- Run workflows in cloud/HPC (coordinator Michael R. Crusoe)
- Store results in persistent storage (coordinator Michael R. Crusoe)
- Metadata and ontologies (interim? coordinator Thomas Liener)
- Define and query linked data (wikidata) (interim? coordinator Thomas Liener)
- Create visualization (interim? coordinators Simon Heumos and Josiah Seaman - We are two PhD students familiar with JavaScript, React and MobX-State-Tree, but far from JavaScript champions. Any help and tips highly appreciated!)
- Create output website
- Coordinate with existing efforts (e.g. NextStrain, ELIXIR, others) to be able to port data back and forth! Ben Busby -- if anyone has contacts that they want to share -- please do so in slack and tag me!
All items (1-9) Vanessasaurus
Does that sound reasonable? Other tasks may be:
- Deploy graph store, database, IPFS (coordinator Pjotr Prins)
- Deploy cloud/HPC workflow runner (coordinator Pjotr Prins)
- Deploy web interfaces (coordinator Pjotr Prins)
The Galaxy team has already put some things in place and we may be able to collaborate on this. Galaxy team, wdyt?
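To make the uploader subtask a bit more concrete, here is a minimal sketch of how a submission (a fastq or BAM file plus known phenotypes) might be represented and routed to the analysis subtasks listed above. Everything in it is an assumption for discussion only: the `Submission` class, its fields, the step names, and the rule that BAM uploads skip read mapping are all hypothetical, and the phenotype dictionary is a stand-in rather than a Phenopackets structure.

```python
from dataclasses import dataclass, field


def detect_format(filename: str) -> str:
    """Guess the upload format from the file name (hypothetical rule)."""
    name = filename.lower()
    if name.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
        return "fastq"
    if name.endswith(".bam"):
        return "bam"
    raise ValueError(f"unsupported upload format: {filename}")


@dataclass
class Submission:
    """One uploaded sample plus optional phenotype metadata (illustrative fields)."""
    filename: str
    submitter: str
    # Placeholder for (clinical) phenotypes; not a Phenopackets schema.
    phenotypes: dict = field(default_factory=dict)

    @property
    def data_format(self) -> str:
        return detect_format(self.filename)


def plan_workflows(submission: Submission) -> list:
    """Return the ordered, hypothetical step names a submission is queued for."""
    steps = []
    if submission.data_format == "fastq":
        # Assumption: BAM uploads are already mapped, raw reads are not.
        steps.append("map_reads")
    steps += ["traditional_analysis", "pangenome_analysis", "store_results"]
    return steps


if __name__ == "__main__":
    sample = Submission("sample1.fastq.gz", "alice", {"fever": True})
    print(sample.data_format, plan_workflows(sample))
```

Keeping the routing decision in one small function like `plan_workflows` would let each subtask group plug in its own workflow behind a step name without touching the uploader itself.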