OMEinfo is an open-source bioinformatics tool designed to automate the retrieval of consistent geographical metadata for microbiome research. It provides an easy-to-use interface for researchers to obtain geographical metadata, including Köppen-Geiger climate classification, degree of rurality, population density, and fossil fuel CO2 emissions from user-provided location data. The tool aims to facilitate cross-study comparisons and promote reproducibility in microbiome research by adhering to the principles of FAIR and Open data.
Publication available now at Bioinformatics Advances: OMEinfo: Global Geographic Metadata for -omics Experiments (Note: Due to issues with the rendering of regex in the article, the regular expressions printed in the paper do not reflect the intended regular expressions for latitude and longitude. The correct expressions are: Latitude: (^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?)$) and Longitude: (^[-+]?((1[0-7]\d(\.\d+)?)|([1-9]?\d(\.\d+)?)|180(\.0+)?)$)
)
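As a minimal sketch of how the corrected expressions behave, the snippet below applies them to a few coordinate strings in Python. The function name `is_valid_coordinate` and the example coordinates are illustrative assumptions. The snippet does not claim to reproduce how OMEinfo itself validates input.

```python
import re

# Patterns copied verbatim from the note above (decimal degrees only).
LATITUDE_RE = re.compile(r"(^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?)$)")
LONGITUDE_RE = re.compile(
    r"(^[-+]?((1[0-7]\d(\.\d+)?)|([1-9]?\d(\.\d+)?)|180(\.0+)?)$)"
)


def is_valid_coordinate(latitude: str, longitude: str) -> bool:
    """Return True when both strings match the corrected expressions.

    Hypothetical helper for illustration; not part of OMEinfo.
    """
    lat_ok = LATITUDE_RE.match(latitude.strip()) is not None
    lon_ok = LONGITUDE_RE.match(longitude.strip()) is not None
    return lat_ok and lon_ok


if __name__ == "__main__":
    # Example values chosen for illustration only.
    examples = [("54.9783", "-1.6178"), ("90.0", "180"), ("90.5", "10"), ("45", "181")]
    for lat, lon in examples:
        print(lat, lon, is_valid_coordinate(lat, lon))
```

Values beyond the poles or the antimeridian (for example a latitude of 90.5 or a longitude of 181) are rejected, while the exact limits 90 and 180 are accepted only with an all-zero fractional part.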
See here for a walkthrough of using OMEinfo with test data.
- Dash web application for user-friendly data upload and visualization
- Custom Cloud Optimized GeoTIFF file hosted on Figshare for efficient data access
- Integration with open data sources, such as Global Human Settlement Layer (GHSL)
- Portable and lightweight Docker container for easy deployment
- Adheres to FAIR and Open data principles for better reproducibility and collaboration
OMEinfo is provided as a Docker container and command line tool, which can be easily set up in a local environment or on cloud-based platforms. OMEinfo has been tested to work using Rocky Linux 8.8, Windows 10 22H2 (via WSL) and MacOS 13.2.
- Install Docker on your machine following the official installation guide. NOTE: If running on Windows, Docker will also require Windows Subsystem for Linux to be installed - see the documentation here. You may also need to disable or allow WSL access to the internet in your firewall.
- Pull the Docker image from Docker Hub:
docker pull mattcrown/omeinfo:latest
or
docker pull mattcrown/omeinfo:1.1.0
- Run the Docker container:
docker run -p 8050:8050 mattcrown/omeinfo:latest
or
docker run -p 8050:8050 mattcrown/omeinfo:1.1.0
(see Usage for more parameters when running the Docker container).
- Install Docker on your machine following the official installation guide. NOTE: If running on Windows, Docker will also require Windows Subsystem for Linux to be installed - see the documentation here. You may also need to disable or allow WSL access to the internet in your firewall.
- Clone this repository:
git clone https://github.com/m-crown/OMEinfo.git
- Navigate to the project app directory:
cd OMEinfo/OMEinfo
- Build the Docker image:
docker build -t omeinfo .
Note: you may need to prefix this command with sudo.
- Run the Docker container:
docker run -p 8050:8050 omeinfo
(see Usage for more details)
- Install mamba.
- Clone this repository:
git clone https://github.com/m-crown/OMEinfo.git
cd OMEinfo/OMEinfo
- Create a mamba environment using the .yml file conda_cli_requirements.yml:
mamba env create --file conda_cli_requirements.yml
Note: The file conda_requirements.yml is used in the Docker container and writes the base environment. It is not recommended to use this file for CLI usage.
- Activate the conda environment:
mamba activate omeinfo
- Copy OMEinfo to the environment bin:
cp omeinfo.py $CONDA_PREFIX/bin/
- Copy Rurality and Koppen-Geiger legends to bin:
cp *.txt $CONDA_PREFIX/bin/
(see Usage for more details)
- Run the Docker container:
  - For default mode:
docker run -p 8050:8050 omeinfo
or
docker run -p 8050:8050 mattcrown/omeinfo:latest
if you pulled the image from Docker Hub.
  - To specify a specific OMEinfo data version:
docker run -p 8050:8050 -e OMEINFO_VERSION=data_version omeinfo
where data_version may be 1.0.0 or 2.0.0.
- Open the OMEinfo web application in your browser at http://localhost:8050.
- Upload a CSV or TSV file containing geolocation data (latitude and longitude) using the provided interface. A test addresses file is distributed with the OMEinfo GitHub repo, OMEinfo/test_data/test_addresses.tsv, which provides example locations covering a variety of possible annotations. Download this file or clone the repo to use it within the Docker app (or CLI). NOTE: if downloading the file, use this link for the raw file, and be aware that some browsers may add a .txt suffix to the file. Be sure to upload CSV or TSV files with a .csv or .tsv extension for compatibility.
- The application will retrieve the geographical metadata for the uploaded locations and display the results on a map and in a histogram.
- You can choose to display metadata features as the colour coding on the map and as the histogram's x-axis.
- A table with the processed data is also provided for further analysis.
- When finished using the OMEinfo app, stop the Docker container using
docker stop <container_id_or_name>
where <container_id_or_name> is the ID or name of your running container. You can list running containers, together with their IDs, names and source images (e.g. omeinfo if built locally or mattcrown/omeinfo:latest if running an image from Docker Hub), using docker ps.
Running the command line tool requires only a single command. Assuming you want to analyse the test addresses file from the GitHub repo, and are currently in the directory containing this file, run the following command:
omeinfo.py --location_file test_addresses.tsv
Upon running the command, a summary of the samples to be processed, and the versions of CLI tool and data packet being used will be presented. Upon completion, a table of the first 10 samples analysed will be shown, and a file with annotated metadata will be saved to the current directory, together with a BibTeX citation file of all citations necessary for crediting data authors.
The full command line parameters are presented below:
usage: omeinfo.py [-h] [--location_file LOCATION_FILE] [--location LOCATION] [--data_version DATA_VERSION] [--source_data SOURCE_DATA] [--output_file OUTPUT_FILE] [--n_samples N_SAMPLES] [--quiet QUIET]
The OMEinfo command-line tool enables users to annotate geographical metadata, including Koppen climate classification, degree of rurality, population density, and fossil fuel CO2 emissions, from user-provided location data. The tool
offers options for selecting the data version and the data source. Annotations are stored in a specified output file in TSV format.
options:
-h, --help show this help message and exit
--location_file LOCATION_FILE
file containing locations
--location LOCATION location in latitude,longitude EPSG:4326 format, input string in format 'sample,latitude,longitude'
--data_version DATA_VERSION
version of data to use
--source_data SOURCE_DATA
url to data or filepath to local version
--output_file OUTPUT_FILE
name of output file
--n_samples N_SAMPLES
number of output summary table samples to show in command line
--quiet QUIET suppress console output
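The `--location` option takes a single 'sample,latitude,longitude' string, so when calling the tool from a script it can help to assemble the arguments programmatically. The following is a minimal sketch under stated assumptions: the helper `build_omeinfo_command`, the sample name and the output file name are hypothetical, and combining `--location` with `--output_file` is assumed from the usage line above rather than tested here.

```python
import shlex
from typing import List, Optional


def build_omeinfo_command(
    sample: str,
    latitude: float,
    longitude: float,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble an omeinfo.py argument list for a single location.

    Hypothetical helper: only flags listed in the help text above are used.
    """
    # A comma inside the sample name would make the
    # 'sample,latitude,longitude' string ambiguous, so reject it here.
    if "," in sample:
        raise ValueError("sample name must not contain a comma")
    command = ["omeinfo.py", "--location", f"{sample},{latitude},{longitude}"]
    if output_file is not None:
        command += ["--output_file", output_file]
    return command


if __name__ == "__main__":
    # Illustrative values only.
    cmd = build_omeinfo_command("sample_1", 54.9783, -1.6178, "annotated.tsv")
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` once omeinfo.py is on the PATH, which avoids shell quoting issues with negative coordinates.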
By default, OMEinfo runs analyses with a version of the data packet stored in the cloud (currently, via Figshare). It is also possible to run OMEinfo using a locally stored version of the data packet, should the remote version become unavailable.
For the Dash app, build as normal, or download from Docker Hub, and change directory to the location where the local version of the data packet is stored. On execution add the following parameters, placing them before the image name:
docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/[DATA_PACKET_HERE] mattcrown/omeinfo:latest
$PWD can also be replaced with the fully resolved path to the directory in which the data packet is stored on your machine. Replace [DATA_PACKET_HERE] with the filename(s) of the data packet files necessary for analysis. For example, if running the OMEinfo v2 data packet locally (a single file) the command would look like this:
docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/omeinfo_v2.tif -e OMEINFO_VERSION=2.0.0 mattcrown/omeinfo:latest
and from the CLI tool:
omeinfo.py --data_version 2.0.0 --source_data omeinfo_v2.tif --location_file test_addresses.tsv
If running the OMEinfo v1 data packet, it would instead look like this:
docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/rurpopkop_v1_cog.tif,/data/co2_v1_cog.tif,/data/no2_v1_cog.tif -e OMEINFO_VERSION=1.0.0 mattcrown/omeinfo:latest
and from the CLI tool (assuming you are currently in the directory with the data files and test data file):
omeinfo.py --data_version 1.0.0 --source_data rurpopkop_v1_cog.tif,co2_v1_cog.tif,no2_v1_cog.tif --location_file test_addresses.tsv
With the v1 data packet, it is important to specify files as a single comma-separated string, in the order RurPopKop file, CO2 file, NO2 file.
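Because the order of the v1 files matters, one way to avoid mistakes is to build the comma-separated string from named entries. The sketch below assumes nothing about OMEinfo internals; the helper `v1_source_data` and the dictionary keys are hypothetical names introduced for illustration.

```python
from typing import Dict

# Required order for the v1 data packet: RurPopKop, CO2, NO2.
V1_ORDER = ("rurpopkop", "co2", "no2")


def v1_source_data(files: Dict[str, str], prefix: str = "") -> str:
    """Join the three v1 file names in the required order.

    `prefix` is prepended to each name, e.g. "/data/" for the Docker
    OMEINFO_URL variable, or "" for the CLI --source_data option.
    """
    missing = [key for key in V1_ORDER if key not in files]
    if missing:
        raise ValueError(f"missing v1 files: {', '.join(missing)}")
    return ",".join(prefix + files[key] for key in V1_ORDER)


if __name__ == "__main__":
    files = {
        "no2": "no2_v1_cog.tif",
        "co2": "co2_v1_cog.tif",
        "rurpopkop": "rurpopkop_v1_cog.tif",
    }
    print(v1_source_data(files))            # value for --source_data
    print(v1_source_data(files, "/data/"))  # value for OMEINFO_URL
```

The dictionary can be filled in any order; the helper always emits the RurPopKop file first, then CO2, then NO2.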
File Name | File URL | Description |
---|---|---|
omeinfo_v2.tif | Figshare | All data sources unified in a single WGS84 COG. Additionally includes relative deprivation on top of V1 data sources. |
File Name | File URL | Description |
---|---|---|
co2_v1_cog.tif | Figshare | Fossil Fuel CO2 Emissions |
rurpopkop_v1_cog.tif | Figshare | Rurality, Population Density, and Koppen-Geiger Climate Classification |
no2_v1_cog.tif | Figshare | Tropospheric NO2 Emissions |
For details on the process used to create the current data sources, see the explanation here.
Data Type | Spatial Extents |
---|---|
Rurality | Upper Left: -179.999, 89.091 Lower Left: -179.999, -89.094 Upper Right: 179.997, 89.091 Lower Right: 179.997, -89.094 |
Population Density | Upper Left: -179.999, 89.091 Lower Left: -179.999, -89.094 Upper Right: 179.997, 89.091 Lower Right: 179.997, -89.094 |
Koppen Geiger Climate Classification | Upper Left: -180.00, 90.00 Lower Left: -180.00, -90.00 Upper Right: 180.00, 90.00 Lower Right: 180.00, -90.00 |
Fossil Fuel CO2 Emissions | Upper Left: -180.00, 90.00 Lower Left: -180.00, -90.00 Upper Right: 180.00, 90.00 Lower Right: 180.00, -90.00 |
Tropospheric NO2 Emissions | Upper Left: -180.00, 90.00 Lower Left: -180.00, -90.00 Upper Right: 180.00, 90.00 Lower Right: 180.00, -90.00 |
Relative Deprivation | Upper Left: -180.00, 82.183 Lower Left: -180.00, -55.983 Upper Right: 179.816, 82.183 Lower Right: 179.816, -55.983 |
OMEinfo v2 Data Packet Combined | Upper Left: -180.00, 90.00 Lower Left: -180.00, -89.998 Upper Right: 179.996, 90.00 Lower Right: 179.996, -89.998 |
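The extents in the table differ between layers, so a location near the poles may fall inside some layers and outside others. As a rough illustration, the sketch below encodes the table as bounding boxes and lists which layers cover a given coordinate. The names `Extent`, `EXTENTS` and `layers_covering` are hypothetical, and how OMEinfo itself reports locations outside an extent is not described here.

```python
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


# Bounding boxes transcribed from the spatial extents table above.
EXTENTS: Dict[str, Extent] = {
    "Rurality": Extent(-179.999, -89.094, 179.997, 89.091),
    "Population Density": Extent(-179.999, -89.094, 179.997, 89.091),
    "Koppen Geiger Climate Classification": Extent(-180.0, -90.0, 180.0, 90.0),
    "Fossil Fuel CO2 Emissions": Extent(-180.0, -90.0, 180.0, 90.0),
    "Tropospheric NO2 Emissions": Extent(-180.0, -90.0, 180.0, 90.0),
    "Relative Deprivation": Extent(-180.0, -55.983, 179.816, 82.183),
    "OMEinfo v2 Data Packet Combined": Extent(-180.0, -89.998, 179.996, 90.0),
}


def layers_covering(latitude: float, longitude: float) -> List[str]:
    """Return the names of layers whose extent contains the point."""
    return [
        name
        for name, ext in EXTENTS.items()
        if ext.min_lat <= latitude <= ext.max_lat
        and ext.min_lon <= longitude <= ext.max_lon
    ]


if __name__ == "__main__":
    # Illustrative point south of the Relative Deprivation extent.
    print(layers_covering(-70.0, 0.0))
```

A point at latitude -70, for example, lies within the Rurality extent but south of the Relative Deprivation extent, so an annotation for that layer should not be expected there.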
Data Source | Citation | DOI |
---|---|---|
Fossil Fuel CO2 emissions data | Tomohiro Oda, Shamil Maksyutov (2015), ODIAC Fossil Fuel CO2 Emissions Dataset (Version name: ODIAC2020b), Center for Global Environmental Research, National Institute for Environmental Studies | 10.17595/20170411.001 |
Köppen-Geiger Climate Classification | Beck, H., Zimmermann, N., McVicar, T. et al. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci Data 5, 180214 (2018) | 10.1038/sdata.2018.214 |
Population Density | Schiavina, Marcello; Freire, Sergio; MacManus, Kytt (2019): GHS population grid multitemporal (1975, 1990, 2000, 2015) R2019A. European Commission, Joint Research Centre (JRC) | European Commission |
Rurality | Pesaresi, Martino; Florczyk, Aneta; Schiavina, Marcello; Melchiorri, Michele; Maffenini, Luca (2019): GHS settlement grid, updated and refined REGIO model 2014 in application to GHS-BUILT R2018A and GHS-POP R2019A, multitemporal (1975-1990-2000-2015), R2019A. European Commission, Joint Research Centre (JRC) | European Commission |
Tropospheric NO2 Emissions data | Romahn, Pedergnana, Loyola, Apituley, Sneep and Veefkind (2022): Sentinel-5 Precursor/TROPOMI Level 2 Product User Manual: Cloud Properties | ESA Sentinel 5P |
Relative Deprivation Index | NASA Socioeconomic Data and Applications Center (SEDAC) (2022) | SEDAC |
Download the current citations in BibTeX format.
Past citations can be found in BibTeX format in the citations directory of OMEinfo.
OMEinfo is released under the MIT License. By using OMEinfo, you agree to the terms and conditions of this license. See the LICENSE file in this repo for more information.
If you encounter any issues or have questions about using OMEinfo, please create an issue on this repo.