Searches organism genomes, loaded from GenBank files, for a submitted DNA sequence. When the sequence matches part of a protein's CDS (coding sequence) region, the function returns the organism, the protein, and the location at which the sequence was found.
Organism genomes were obtained as GenBank files from NCBI,
e.g. the complete genome of Paramecium bursaria Chlorella virus
For example, submitting 'CGCAGGCGCT'
will return a match
in protein 'YP_004678872.1'
in organism 'NC_000852.5'
at location '1370..1380'
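
The search described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `search_record` and `search_genbank` are hypothetical, and the location is assumed to be a zero-based, half-open `start..end` offset within the extracted CDS (which is consistent with a 10-base query spanning 1370..1380, though the project's real convention is not stated here). Biopython is imported inside `search_genbank` so that the matching helper can be tried without it.

```python
def search_record(record, query):
    """Yield (organism_id, protein_id, location) for each CDS that contains query.

    `record` is expected to look like a Biopython SeqRecord: it has `.id`,
    `.seq`, and `.features`, and each feature has `.type`, `.qualifiers`,
    and `.extract()`.
    """
    query = query.upper()
    for feature in record.features:
        if feature.type != "CDS":
            continue  # only coding sequences are searched
        cds_seq = str(feature.extract(record.seq)).upper()
        start = cds_seq.find(query)
        if start != -1:
            # GenBank qualifiers are lists; fall back if protein_id is absent.
            protein_id = feature.qualifiers.get("protein_id", ["unknown"])[0]
            # Assumed location format: offsets within the CDS, end exclusive.
            yield record.id, protein_id, f"{start}..{start + len(query)}"


def search_genbank(path, query):
    """Collect all hits for query across the records in one GenBank file."""
    from Bio import SeqIO  # requires the Biopython package

    hits = []
    for record in SeqIO.parse(path, "genbank"):
        hits.extend(search_record(record, query))
    return hits
```

Calling `search_genbank("genome.gb", "CGCAGGCGCT")` on a downloaded GenBank file (the file name is a placeholder) would return a list of `(organism, protein, location)` tuples, one for each CDS in which the query is found.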
Celery asynchronous architecture
Uses the Biopython library and its SeqIO module to parse GenBank files, which store nucleotide sequence data.
Asynchronous search with Celery, using Redis as both the message broker and the result store.
React frontend uses local storage to persist searched sequences and generated results.
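
The Celery and Redis setup listed above might be wired up along these lines. This is only a sketch under stated assumptions: the Redis URL, the task name `count_matches`, and the stand-in search function are all hypothetical, and the real project most likely configures Celery through its Django settings instead of a factory function.

```python
def count_matches(query, sequences):
    """Stand-in for the real search: count the sequences that contain query."""
    query = query.upper()
    return sum(1 for seq in sequences if query in seq.upper())


def make_app(redis_url="redis://localhost:6379/0"):
    """Build a Celery app that uses Redis as both broker and result backend.

    The URL is an assumed local default, not taken from the project.
    """
    from celery import Celery  # requires the celery package

    app = Celery("sequences_api", broker=redis_url, backend=redis_url)
    # Register the plain function as a task under an explicit name.
    app.task(name="count_matches")(count_matches)
    return app
```

With Redis running, a module that exposes `app = make_app()` could be handed to `celery -A <module> worker -l info`. A caller could then queue a search with `app.send_task("count_matches", args=[query, sequences])` and read the outcome from the returned result object with `.get(timeout=...)`. Keeping the search logic in a plain function means it can be exercised without a broker.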
Run one instance each of the React app, the server, and the Celery worker:
cd ./frontend
npm start
cd ./backend
pipenv shell
celery -A sequences_api worker -l info
cd ./backend
pipenv shell
pip install -r requirements.txt
python manage.py runserver
The UI is served at http://localhost:3000/
and the API at http://localhost:8000/api/:DNASequence
To get results from the Celery TaskResult table, use
http://localhost:8000/api/results/tasks
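
For a quick check from a script, the two endpoints above could be called as in the sketch below. It assumes both endpoints accept plain GET requests and return JSON, which the README does not state; the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "http://localhost:8000/api"


def search_url(sequence, base=API_BASE):
    """Build the search URL for a DNA sequence (the :DNASequence path segment)."""
    return f"{base}/{quote(sequence)}"


def results_url(base=API_BASE):
    """Build the URL that lists stored task results."""
    return f"{base}/results/tasks"


def get_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server and worker running, `get_json(search_url("CGCAGGCGCT"))` would submit a search and `get_json(results_url())` would list the stored task results.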