This repository contains a machine learning model, based on a decision tree algorithm, for validating Trypanosoma epitopes. The model was trained and evaluated using data obtained from the IEDB and UniProtKB databases.
The dataset used for model training and validation is available at:
The data were preprocessed using EpiBuilder and complementary Java code, available in the /assets folder. The dataset was evaluated against key algorithms with the Orange Data Mining tool; the corresponding file is also available in the /assets folder.
To use this project in Google Colab, follow these steps:
- Open the .ipynb file with Google Colab
- Install the required dependencies with the following command:
!pip install pandas matplotlib seaborn numpy scikit-learn
- Upload the dataset file to your workspace
- Run all cells with Ctrl+F9 (Runtime > Run all)
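For orientation, the kind of workflow the steps above lead to can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper `train_decision_tree`, the column names `feature_a`, `feature_b`, and `label`, and the toy values are all hypothetical placeholders, and the split ratio and default tree settings are assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_decision_tree(df, label_col, test_size=0.25, seed=0):
    """Fit a decision tree on the numeric columns of df and return it with its hold-out accuracy."""
    # Use every numeric column except the label as a feature.
    X = df.drop(columns=[label_col]).select_dtypes("number")
    y = df[label_col]
    # Stratify so both classes appear in the training and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Placeholder data so the sketch runs on its own (hypothetical columns and values).
# In the notebook, load the uploaded dataset with pd.read_csv(...) instead.
toy = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.15, 0.3, 0.9, 0.8, 0.85, 0.95],
    "feature_b": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "label": [0, 0, 0, 0, 1, 1, 1, 1],
})
model, accuracy = train_decision_tree(toy, label_col="label")
print(f"hold-out accuracy: {accuracy:.2f}")
```

To try this on the real dataset, replace the placeholder frame with the uploaded file and pass the name of its label column; the notebook in this repository remains the reference for the actual features and settings.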
Contributions are welcome! To suggest improvements or corrections, open an issue or submit a pull request.
Russi, B. C., Moreira, R. S., Cabrera, P. D. C., & Cazella, S. C. (2024). Trypanosoma Epitope Dataset: Valid Epitopes and Randomly Generated Peptides with Biochemical Metrics and AI-Generated Scores (Version 1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.
Russi, B. C., Moreira, R. S., Cabrera, P. D. C., & Cazella, S. C. (2024). Concepção de dataset de epítopos lineares de células B de organismos do gênero Trypanosoma para treinamento de algoritmos baseados em aprendizado de máquina. In C. S. Dias, E. T. Albergaria, & Z. S. N. Reis (Eds.), Anais do II Simpósio CI-IA Saúde da UFMG: Inteligência artificial responsável no ensino, pesquisa e práticas em saúde (pp. 70–72). Universidade Federal de Minas Gerais. http://hdl.handle.net/1843/78914