2.2.1 Tutorial: Start your own instances of LAPIS and SILO
Every LAPIS instance needs to be backed by a SILO instance that acts as its data source. SILO can also be operated stand-alone; LAPIS is meant as a layer of convenience and abstraction around SILO.
We provide ready-to-use Docker images of SILO and LAPIS and recommend using them. This tutorial explains how.
- You have Docker installed.
- You have some knowledge of how to use Docker and Docker Compose.
- Make sure you have the latest Docker images:
docker pull ghcr.io/genspectrum/lapis-v2 && docker pull ghcr.io/genspectrum/lapis-silo
- Create a directory for the example:
mkdir ~/lapisExample
Both LAPIS and SILO need to know which metadata columns are available in the dataset. Furthermore, you need to define which column acts as the primary key and which column SILO should use to generate partitions. We also configure LAPIS as an open instance, meaning that the underlying data requires no visibility restrictions.
~/lapisExample/config/databaseConfig.yaml:
schema:
  instanceName: testInstance
  metadata:
    - name: gisaid_epi_isl
      type: string
    - name: date
      type: date
    - name: region
      type: string
      generateIndex: true
    - name: country
      type: string
      generateIndex: true
    - name: division
      type: string
      generateIndex: true
    - name: pango_lineage
      type: pango_lineage
    - name: age
      type: int
    - name: qc_value
      type: float
  opennessLevel: OPEN
  primaryKey: gisaid_epi_isl
  dateToSortBy: date
  partitionBy: pango_lineage
SILO currently supports the following metadata types:
- int
- float
- string: String columns support indexing (configured via generateIndex: true). SILO internally stores precomputed bitmaps for those columns to speed up queries. Generating an index makes most sense for columns with many equal values.
- pango_lineage: Systematic classification of lineage with inheritance structure that can be computed for some pathogens. Also see https://github.com/GenSpectrum/LAPIS/wiki/4.6-Pango-lineage-query.
- date: Values must be valid dates in the form YYYY-MM-DD.
- insertion: A comma-separated list of insertions. Each insertion has the form <position>:<symbols>. Example value: 123:CCG,501:AAAGGG.
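Before handing a metadata file to SILO, it can help to check individual values against the formats listed above. The following is a minimal sketch and not part of SILO or LAPIS: the function names is_date_shaped and is_insertion_list are hypothetical, the checks only look at the textual shape of a value (so 2021-02-30 passes the date check), and the assumption that insertion symbols are uppercase letters is inferred from the example value only.

```shell
#!/bin/sh
# Hypothetical helpers; they check only the textual shape of a value.

# Succeeds if the argument looks like YYYY-MM-DD (no calendar validation).
is_date_shaped() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

# Succeeds if the argument is a comma-separated list of <position>:<symbols>.
# Assumption: positions are digits and symbols are uppercase letters.
is_insertion_list() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+:[A-Z]+(,[0-9]+:[A-Z]+)*$'
}

is_date_shaped "2021-03-04" && echo "date ok"
is_insertion_list "123:CCG,501:AAAGGG" && echo "insertions ok"
```

Both functions communicate the result through their exit status, so they can be used directly in if conditions or in a loop over the columns of a TSV file.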
This section might change soon, as we are currently reworking how SILO is started.
Download the example dataset from the SILO repository:
- pangolineage_alias.json
- reference_genomes.json
- small_metadata_set.tsv
- FASTA files for the sequences
SILO expects FASTA files (optionally compressed with zstandard or xz) in the same directory, named nuc_<sequence_name>.fasta for nucleotide sequences or gene_<sequence_name>.fasta for amino acid sequences. Each <sequence_name> has to match a name defined in reference_genomes.json.
Put those files into the folder ~/lapisExample/data/.
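Because SILO locates sequence files purely by name, a quick check that the expected files exist can save a failed start. The helper below is a hypothetical sketch, not a SILO tool. It assumes that compressed files carry a .zst or .xz suffix after .fasta, which the naming scheme above does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper: prints the first matching file for a sequence, or fails.
# Usage: find_fasta <data_dir> <nuc|gene> <sequence_name>
# Assumption: compressed variants are named *.fasta.zst or *.fasta.xz.
find_fasta() {
    for suffix in "" ".zst" ".xz"; do
        candidate="${1}/${2}_${3}.fasta${suffix}"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "missing: ${1}/${2}_${3}.fasta (or a compressed variant)" >&2
    return 1
}
```

A call such as find_fasta ~/lapisExample/data nuc main would then report the file for a nucleotide sequence named main, where main is only a placeholder for a name taken from your reference_genomes.json.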
Now SILO needs to know where it can find those files, so you have to provide a config for that. Note that the paths must be given as they appear inside the Docker container, not as they appear on the host.
~/lapisExample/config/siloConfig.yaml:
inputDirectory: "/data/"
outputDirectory: "/data/output/"
metadataFilename: "small_metadata_set.tsv"
pangoLineageDefinitionFilename: "pangolineage_alias.json"
referenceGenomeFilename: "reference_genomes.json"
Start the SILO Docker container with the options:
- expose port 8081 to the host.
- mount the config into the container.
- mount the data into the container.
- provide the path to the SILO config.
- provide the path to the database config.
The following command puts it all together:
docker run --detach \
    --publish 8081:8081 \
    --volume ~/lapisExample/config:/app/config \
    --volume ~/lapisExample/data:/data \
    ghcr.io/genspectrum/lapis-silo \
    --api \
    --preprocessingConfig=/app/config/siloConfig.yaml \
    --databaseConfig=/app/config/databaseConfig.yaml
Now SILO should be available at http://localhost:8081.
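Since the container is started with --detach, the command returns before SILO is necessarily ready to answer requests. One way to wait for it is to poll the published port, sketched below under a few assumptions: wait_for_http is a hypothetical helper name, curl is installed on the host, and no specific SILO endpoint is assumed, so the check only shows that something answers HTTP on that port.

```shell
#!/bin/sh
# Hypothetical helper: polls a URL until a server answers or attempts run out.
# Usage: wait_for_http <url> [attempts] [delay_seconds]
# Without --fail, curl exits with 0 for any HTTP response (even a 404),
# so success only means that something is listening and speaking HTTP.
wait_for_http() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_http http://localhost:8081 && echo "SILO is reachable" blocks until the port responds or the default 30 attempts are used up.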
Now you can start LAPIS. You have to:
- expose port 8080 to the host.
- mount the previously created database configuration into the Docker container.
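- provide LAPIS with the SILO URL. Note that inside the LAPIS container, localhost refers to the container itself; if LAPIS cannot reach SILO this way, run both containers on a shared Docker network and use the SILO container's name as host, as the Docker Compose setup below does.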
- tell LAPIS where to find the database configuration.
The following command puts it all together:
docker run --detach \
    --publish 8080:8080 \
    --volume ~/lapisExample/config/databaseConfig.yaml:/workspace/databaseConfig.yaml \
    ghcr.io/genspectrum/lapis-v2 \
    --silo.url=http://localhost:8081 \
    --lapis.databaseConfig.path=/workspace/databaseConfig.yaml
Now LAPIS should be available at http://localhost:8080. LAPIS offers a Swagger UI that serves as a good starting point for exploring its functionality.
We recommend using Docker Compose to start LAPIS and SILO. The above docker run commands can be combined into a docker-compose.yaml file:
~/lapisExample/docker-compose.yaml:
version: "3.9"
services:
  lapis:
    image: ghcr.io/genspectrum/lapis-v2
    ports:
      - "8080:8080"
    command: --silo.url=http://silo:8081 --lapis.databaseConfig.path=/workspace/databaseConfig.yaml
    volumes:
      - type: bind
        source: ~/lapisExample/config/databaseConfig.yaml
        target: /workspace/databaseConfig.yaml
        read_only: true
      - type: bind
        source: ~/lapisExample/logs
        target: /workspace/log
  silo:
    image: ghcr.io/genspectrum/lapis-silo
    ports:
      - "8081:8081"
    command:
      - "--api"
      - "--preprocessingConfig=/app/config/siloConfig.yaml"
      - "--databaseConfig=/app/config/databaseConfig.yaml"
    volumes:
      - type: bind
        source: ~/lapisExample/logs
        target: /data/logs
      - type: bind
        source: ~/lapisExample/config
        target: /app/config
      - type: bind
        source: ~/lapisExample/data
        target: /data
This requires a logs directory, which you can create with mkdir ~/lapisExample/logs. Then LAPIS and SILO can be started via:
cd ~/lapisExample
docker compose up -d
Logs from LAPIS and SILO will be available in the previously created logs directory.
- Documentation of SILO in its GitHub repository
- Our tests provide a working example: https://github.com/GenSpectrum/LAPIS/tree/main/siloLapisTests. It might not cover all details, but the CI makes sure that it works.