Cross-reference of Genomic Taxonomy
xgTaxonomy is a new method for metagenomic classification that uses data compression algorithms, known as compressors, to classify genomic sequences. Our two-step evaluation process shows that this approach outperforms existing methods in terms of accuracy and reliability. Additionally, combining features from multiple compressors improves classification accuracy by 26.22%. This method offers a promising strategy for improving the accuracy and reliability of metagenomic classification and provides insights into the statistical and algorithmic nature of genomic data.
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- University of A Coruña, A Coruña, Spain
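The idea of turning compressors into classification features can be illustrated with a minimal sketch. It assumes Python's standard-library compressors (zlib, bz2, lzma) as stand-ins; these are not necessarily the compressors the project benchmarks, and the function name `compression_features` is hypothetical. Each compressor contributes one feature, the compressed size in bits per symbol, and the per-compressor values together form a combined feature vector.

```python
import bz2
import lzma
import random
import zlib
from typing import Dict

# Stand-in compressors from the standard library. This is an assumption for
# illustration only, not the set of compressors used by xgTaxonomy.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def compression_features(sequence: str) -> Dict[str, float]:
    """Return the compressed size in bits per symbol for each compressor."""
    data = sequence.encode("ascii")
    if not data:
        raise ValueError("sequence must not be empty")
    return {
        name: 8 * len(compress(data)) / len(data)
        for name, compress in COMPRESSORS.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    repetitive = "ACGT" * 500                       # highly redundant sequence
    shuffled = "".join(rng.choices("ACGT", k=2000))  # little exploitable structure
    print("repetitive:", compression_features(repetitive))
    print("shuffled:  ", compression_features(shuffled))
```

A redundant sequence compresses to fewer bits per symbol than a shuffled one of the same length, which is the kind of signal a downstream classifier could use.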
- Git
- Docker and Docker-compose (if using the Docker option)
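Before installing, it can help to confirm that the tools listed above are available. The following is a small sketch, not part of the project; the helper name `find_tools` is hypothetical, and it assumes the tools are invoked as `git`, `docker`, and `docker-compose`.

```python
import shutil

# Executable names are assumptions based on the requirements listed above.
# Docker and docker-compose are only needed for the Docker option.
REQUIRED_TOOLS = ("git", "docker", "docker-compose")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```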
Get the xgTaxonomy project using:
git clone https://github.com/bioinformatics-ua/xgTaxonomy.git
cd xgTaxonomy/
For the Docker option, Docker and Docker Compose must be installed on the system (see https://docs.docker.com/engine/install/ubuntu/).
Then, follow these instructions:
git clone https://github.com/bioinformatics-ua/xgTaxonomy.git
cd xgTaxonomy
docker-compose build
docker-compose up -d && docker exec -it xgTaxonomy bash && docker-compose down
Install the compressors for the benchmark:
bash install_compressors.sh;
To run the pipeline and obtain all the reports in the reports folder, use the following commands.
To obtain random sequences for the baseline performance test, run:
cd src/
python3 getSampleSequences.py
To run the baseline compression test:
cd src/
python3 compress_baseline.py
To obtain random sequences for taxonomic classification, run:
cd src/
python3 getDatabaseSequences.py
cd src/
python3 classifier.py -b > ../results/f1score_accuracy_single.txt
cd src/
python3 classifier.py -cr > ../results/classification_reports_single.txt
cd src/
python3 classifier.py -ag -b > ../results/f1_score_accuracy_all_genome_features.txt
python3 classifier.py -ag -cr > ../results/classification_report_all_genome_features.txt
cd src/
python3 classifier.py -ap -b > ../results/f1_score_accuracy_all_proteome_features.txt
python3 classifier.py -ap -cr > ../results/classification_report_all_proteome_features.txt
cd src/
python3 classifier.py -cr -ac > ../results/classification_report_all_columns.txt
cd src/
python3 classifier.py -ac -b > ../results/f1score_accuracy_all_columns.txt
cd src/
python3 classifier.py -fs -ac -b > ../results/feature_selection.txt
cd src/
python3 classifier.py -bf -b > ../results/f1score_accuracy_all_combinations.txt
cd src/
python3 classifier.py -bf -cr > ../results/classification_report_all_combinations.txt
cd src/
python3 correlateTable.py
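The `classifier.py` invocations above can also be driven from a single script. The sketch below is not part of the project: the flag combinations and output file names are copied from the commands above, while the function name `run_classifier_reports`, the use of `sys.executable` in place of `python3`, and the creation of the `results` folder when it is missing are assumptions.

```python
import subprocess
import sys
from pathlib import Path

# (flags, output file) pairs copied from the classifier.py commands above.
CLASSIFIER_RUNS = [
    (["-b"], "f1score_accuracy_single.txt"),
    (["-cr"], "classification_reports_single.txt"),
    (["-ag", "-b"], "f1_score_accuracy_all_genome_features.txt"),
    (["-ag", "-cr"], "classification_report_all_genome_features.txt"),
    (["-ap", "-b"], "f1_score_accuracy_all_proteome_features.txt"),
    (["-ap", "-cr"], "classification_report_all_proteome_features.txt"),
    (["-cr", "-ac"], "classification_report_all_columns.txt"),
    (["-ac", "-b"], "f1score_accuracy_all_columns.txt"),
    (["-fs", "-ac", "-b"], "feature_selection.txt"),
    (["-bf", "-b"], "f1score_accuracy_all_combinations.txt"),
    (["-bf", "-cr"], "classification_report_all_combinations.txt"),
]


def run_classifier_reports(repo_root: Path) -> None:
    """Run every classifier.py configuration from src/ and save its stdout."""
    src = repo_root / "src"
    results = repo_root / "results"
    results.mkdir(exist_ok=True)  # assumption: create the folder if missing
    for flags, filename in CLASSIFIER_RUNS:
        with open(results / filename, "w") as out:
            # Equivalent to: cd src/ && python3 classifier.py <flags> > ../results/<file>
            subprocess.run(
                [sys.executable, "classifier.py", *flags],
                cwd=src,
                stdout=out,
                check=True,  # stop at the first failing run
            )
```

Calling `run_classifier_reports(Path("."))` from the repository root would execute the runs in the order listed; with `check=True`, a failing run raises `subprocess.CalledProcessError` instead of silently leaving an incomplete report.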
Please cite the following if you use xgTaxonomy in your work:
in progress
Please let us know if there are any issues.
xgTaxonomy is released under the MIT License.