metaquest is a command-line tool that searches all Sequence Read Archive (SRA) datasets for containment of specified genomes. By analyzing the associated metadata, it provides insight into where different species may be found.
- Clone the repository:
git clone https://github.com/FOI-Bioinformatics/MetaQuest.git
cd MetaQuest
- Install the requirements:
pip install -r requirements.txt
- Install MetaQuest:
python setup.py install
First, visit https://branchwater.jgi.doe.gov/ to search and download containment files for your genomes of interest. Save these CSV files to a designated folder.
Process the downloaded files to prepare them for the MetaQuest pipeline:
metaquest use_branchwater --branchwater-folder /path/to/branchwater/files --matches-folder matches
- branchwater-folder: The directory where the Branchwater CSV files are located.
- matches-folder: The directory where the processed files will be saved.
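When this step is driven from a script, it can help to confirm that the Branchwater folder actually contains CSV files before starting. The following is a minimal sketch, not part of metaquest: the helper name `build_use_branchwater_cmd` is hypothetical, and the only flags it uses are the two shown in the command above.

```python
import tempfile
from pathlib import Path


def build_use_branchwater_cmd(branchwater_folder, matches_folder="matches"):
    """Return the argument list for `metaquest use_branchwater`.

    Hypothetical helper: raises FileNotFoundError when the folder holds
    no .csv files, so the problem surfaces before the pipeline starts.
    """
    folder = Path(branchwater_folder)
    csv_files = sorted(folder.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    return [
        "metaquest", "use_branchwater",
        "--branchwater-folder", str(folder),
        "--matches-folder", str(matches_folder),
    ]


if __name__ == "__main__":
    # Demonstrate with a throwaway folder holding one placeholder CSV.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example.csv").write_text("placeholder\n")
        print(build_use_branchwater_cmd(tmp))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the step.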
You can extract basic metadata directly from Branchwater CSV files without downloading from NCBI:
metaquest extract_branchwater_metadata --branchwater-folder /path/to/branchwater/files --metadata-folder metadata
After processing the Branchwater files, you can summarize the results:
metaquest parse_containment --matches_folder matches --parsed_containment_file parsed_containment.txt --summary_containment_file summary_containment.txt --step_size 0.05 --file_format branchwater
Example output: summary.txt and containment.txt
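To give an intuition for what a step size of 0.05 could mean in a containment summary, here is a small sketch. It assumes (this is not confirmed by the metaquest source) that the summary counts how many samples reach each containment threshold, with thresholds spaced `step_size` apart. The function name and the input values are made up for illustration.

```python
def count_above_thresholds(max_containments, step_size=0.05):
    """Count samples at or above each threshold, from 1.0 down to step_size.

    Assumption: thresholds are evenly spaced by step_size. This mirrors
    the idea of a stepped summary, not metaquest's actual implementation.
    """
    steps = round(1 / step_size)
    summary = []
    for i in range(steps, 0, -1):
        # Round to avoid floating-point noise such as 0.9500000000000001.
        threshold = round(i * step_size, 10)
        n = sum(1 for c in max_containments if c >= threshold)
        summary.append((threshold, n))
    return summary


if __name__ == "__main__":
    # Made-up maximum containment values for three samples.
    for threshold, n in count_above_thresholds([0.97, 0.50, 0.051]):
        print(f"{threshold:.2f}\t{n}")
```

A smaller step size gives a finer-grained table at the cost of more rows.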
For more comprehensive metadata, you can download it from NCBI:
metaquest download_metadata --matches_folder matches --metadata_folder metadata --threshold 0.95 --email [EMAIL]
- matches_folder: Directory containing match files.
- metadata_folder: Directory where the metadata files will be saved.
- threshold: Only consider matches with containment above this threshold.
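The effect of the threshold can be sketched as a simple filter over match rows. This example is illustrative only: the column names `acc` and `containment` are assumptions about the match file layout, and the comparison is strict to follow the wording "above this threshold" (the tool itself may treat equality differently).

```python
import csv
import io


def accessions_above(csv_text, threshold=0.95,
                     acc_col="acc", containment_col="containment"):
    """Return accessions whose containment is strictly above the threshold.

    Column names are assumptions; adjust them to the real match files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:
        if float(row[containment_col]) > threshold:
            selected.append(row[acc_col])
    return selected


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = (
        "acc,containment\n"
        "SRR0000001,0.99\n"
        "SRR0000002,0.95\n"
        "SRR0000003,0.40\n"
    )
    print(accessions_above(example))
```

A high threshold such as 0.95 keeps only datasets that contain nearly the whole query genome, which also limits how many metadata records need to be fetched.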
Once the metadata is downloaded, you can parse it to generate a more concise and readable format:
metaquest parse_metadata --metadata_folder metadata --metadata_table_file parsed_metadata.txt
Example output: parsed_metadata.txt
Check the metadata attributes to see which ones are present and how they are distributed:
metaquest check_metadata_attributes --file-path parsed_metadata.txt --output-file parsed_metadata_overview.txt
Example output: parsed_metadata_overview.txt
Count metadata values to see how genomes are distributed across different datasets:
metaquest count_metadata --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --metadata-column Sample_Scientific_Name --threshold 0.95 --output-file genome_counts.txt
Example output: genome_counts.txt
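Conceptually, counting metadata combines two tables: containment per sample and metadata per sample. The sketch below shows that idea with plain dictionaries. It is an assumption about the logic, not metaquest's code, and the accessions and metadata values are invented placeholders.

```python
from collections import Counter


def count_metadata_values(max_containment, metadata, column, threshold=0.95):
    """Count values of `column` among samples above the containment threshold.

    max_containment: dict mapping accession -> containment score.
    metadata: dict mapping accession -> dict of metadata fields.
    Samples without metadata for `column` are skipped.
    """
    counts = Counter()
    for accession, containment in max_containment.items():
        if containment <= threshold:
            continue
        value = metadata.get(accession, {}).get(column)
        if value is not None:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Invented example data; real values come from the parsed files.
    containment = {"SRR1": 0.99, "SRR2": 0.97, "SRR3": 0.60, "SRR4": 0.98}
    metadata = {
        "SRR1": {"Sample_Scientific_Name": "soil metagenome"},
        "SRR2": {"Sample_Scientific_Name": "soil metagenome"},
        "SRR3": {"Sample_Scientific_Name": "gut metagenome"},
    }
    result = count_metadata_values(
        containment, metadata, "Sample_Scientific_Name")
    for name, n in result.most_common():
        print(f"{name}\t{n}")
```

Changing the column name gives the same kind of count for any other attribute present in the parsed metadata.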
To analyze a single sample from the summary, use the single_sample command:
metaquest single_sample --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --summary-column GCF_000008985.1 --metadata-column Sample_Scientific_Name --threshold 0.95
To download the raw SRA data for accessions that match your criteria:
metaquest download_sra --accessions_file accessions.txt --fastq_folder fastq --num_threads 8 --max_workers 4
The accessions file should contain one SRA accession per line.
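An accessions file in this format can be written with a few lines of Python. The sketch below assumes the file lists run-level accessions (prefixes SRR, ERR, or DRR followed by digits); whether metaquest accepts other accession types is not stated here. The helper name `write_accessions` is hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Run accessions from SRA, ENA, and DDBJ: SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def write_accessions(accessions, path="accessions.txt"):
    """Write unique, well-formed run accessions to `path`, one per line.

    Keeps first-seen order and raises ValueError on malformed entries.
    """
    seen = set()
    kept = []
    for acc in accessions:
        acc = acc.strip()
        if not RUN_ACCESSION.fullmatch(acc):
            raise ValueError(f"Not a run accession: {acc!r}")
        if acc not in seen:
            seen.add(acc)
            kept.append(acc)
    Path(path).write_text("".join(f"{acc}\n" for acc in kept))
    return kept


if __name__ == "__main__":
    # Placeholder accessions, written to a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "accessions.txt"
        write_accessions(["SRR0000001", "ERR0000002", "SRR0000001"], target)
        print(target.read_text(), end="")
```

Validating before download avoids spending time on entries that cannot be resolved.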
Plot the distribution of containment scores:
metaquest plot_containment --file_path parsed_containment.txt --column max_containment --plot_type rank --save_format png --threshold 0.05
Available plot types: rank, histogram, box, violin
Visualize the distribution of metadata attributes:
metaquest plot_metadata_counts --file_path counts_Sample_Scientific_Name.txt --plot_type bar --save_format png
Available plot types: bar, pie, radar
We welcome contributions to metaquest! Whether you want to report a bug, suggest a feature, or contribute code, your input is valuable. Here's how to get started:
- Fork the Repository: Create your own fork of the metaquest repository.
- Clone Your Fork: Clone your fork to your local machine and set the upstream repository.
- Create a New Branch: Make a new branch for your feature or bugfix.
- Make Your Changes: Implement your feature or fix the bug and commit your changes.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Create a Pull Request: From your fork, open a new pull request in the metaquest repository.