Warning: This is a work in progress, probably not yet ready for use!
Varda is an application for storing genomic variation data obtained from next-generation sequencing experiments, such as full-genome or exome sequencing of individuals or populations. Variants can be imported from standard formats such as VCF files and annotated with their frequencies in previously imported datasets.
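As a rough illustration of what frequency annotation means here, the sketch below imports a few samples and then annotates new variants with the fraction of imported samples in which each was observed. It is not Varda's implementation: `FrequencyStore` and `parse_vcf_variants` are hypothetical names, the store lives in memory rather than in a database, and the assumed frequency definition ignores zygosity, coverage, and variant normalization.

```python
from collections import Counter


def parse_vcf_variants(lines):
    """Yield one (chrom, pos, ref, alt) key per ALT allele in VCF data lines."""
    for line in lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield (chrom, int(pos), ref, allele)


class FrequencyStore:
    """Hypothetical in-memory stand-in for a database of imported samples."""

    def __init__(self):
        self.sample_count = 0
        self.observations = Counter()

    def import_sample(self, vcf_lines):
        # set() ensures a variant is counted at most once per sample.
        self.observations.update(set(parse_vcf_variants(vcf_lines)))
        self.sample_count += 1

    def annotate(self, vcf_lines):
        """Return (variant, frequency) pairs for the given VCF data lines."""
        if self.sample_count == 0:
            raise ValueError("no samples imported yet")
        return [
            (variant, self.observations[variant] / self.sample_count)
            for variant in parse_vcf_variants(vcf_lines)
        ]


store = FrequencyStore()
store.import_sample(["#CHROM\tPOS\tID\tREF\tALT", "1\t100\t.\tA\tG", "1\t200\t.\tC\tT"])
store.import_sample(["1\t100\t.\tA\tG"])
print(store.annotate(["1\t100\t.\tA\tG", "2\t50\t.\tT\tC"]))
```

Only aggregate counts are kept per variant, which is also why frequencies can be shared without exposing which sample carried which variant.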
Varda is implemented as a service exposing a RESTful HTTP interface. Two clients for this interface are under development.
The following are some example use cases which Varda is designed to support.
Private exome variant database for a sequencing lab
Installed on the local network, Varda can be used to import and annotate variants from all exome sequencing experiments at a sequencing lab. Additionally, the database could contain public datasets from population studies (e.g., 1000 Genomes, Genome of the Netherlands), such that all exome experiments are also annotated with frequencies in those studies.
Shared database between several groups
Several sequencing centers can import their variants into a central Varda installation, which the same centers can subsequently use for frequency annotation. The system can be set up such that annotation is only possible on previously imported data (to encourage sharing).
Other groups can access data from one center only in anonymized form, since only the frequencies over the entire database are available. To accommodate even stricter anonymity, samples can be pooled before import.
Publicly sharing variant frequencies from a population study
Variation data from a population study can be imported into a Varda installation accessible over the internet, so that others can annotate their data with frequencies in the study.
For contrast, consider the following examples of what Varda is not designed to do.
Sharing and browsing genomic variants
Varda is focussed on sharing variant frequencies only, and as such is not designed for direct browsing. Other systems, such as LOVD, are much more suitable for sharing and browsing genomic variants and additionally store phenotypes and other metadata.
Ad-hoc exploration of genomic variation
Again, Varda is focussed on sharing variant frequencies only, and does not store additional metadata nor does it allow for effective exploration of variants. If you have variation data from a disease or population study which you want to analyse in a flexible way, have a look at gemini.
The server is implemented in Python using the Flask framework and interfaces directly with the PostgreSQL (or MySQL) database backend using SQLAlchemy. It exposes a RESTful API over HTTP where response payloads are JSON-encoded.
Long-running actions are executed asynchronously through the Celery distributed task queue.
The latest documentation, including installation instructions, a user guide, and the REST server API reference, is hosted at Read the Docs.
You can also compile the documentation directly from the source code by running make html from the doc/ subdirectory. This requires Sphinx to be installed.
Varda is licensed under the MIT License; see the LICENSE file for details. See the AUTHORS file for a list of authors.
The profile picture for the Varda GitHub organisation was cropped from an artist's rendition of Varda Elentári, Queen of the Stars by Dominik Matus and is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.