This project is designed and tailored to assist Destiny Pharma PLC in scraping recently posted scientific literature and assembling it into monthly literature reviews.
This repo comprises 6 scripts, which provide 2 distinct functions:
- X_main.py: web-scraping bots designed to extract relevant PubMed literature, rank the literature based on search category and importance, and send a push email if any high-priority literature is identified. These scripts are automated to run daily, using GitHub Actions.
- review_X.py: scripts that assemble all of the literature gathered in the previous month (for a target bot) into a literature review, which is then emailed to the target recipients. These scripts are automated to run monthly, using GitHub Actions.
The bots rely on their respective data folders which contain doi_db.txt, log.txt, rank.txt and queries.txt files:
- doi_db.txt: Database containing all the DOIs the bot has identified; this prevents duplicates
- log.txt: Simple log file recording when the bot was run and a breakdown of the papers per rank found
- rank.txt: File containing strings ready for inclusion in the monthly literature review
- queries.txt: PubMed queries, one per line, run in order
*The bot also uses the image.JPG in the main repo and attaches it to the email
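The duplicate check that doi_db.txt enables can be sketched as follows. This is a minimal illustration, not the code used by the bots: the function names load_seen_dois and record_new_dois are hypothetical, and it assumes doi_db.txt holds one DOI per line and ends with a newline (or is blank).

```python
from pathlib import Path


def load_seen_dois(db_path: Path) -> set[str]:
    """Return the DOIs already recorded (assumed one per line); empty if no file."""
    if not db_path.exists():
        return set()
    lines = db_path.read_text(encoding="utf-8").splitlines()
    # DOIs are case-insensitive, so compare them in lower case.
    return {line.strip().lower() for line in lines if line.strip()}


def record_new_dois(db_path: Path, candidates: list[str]) -> list[str]:
    """Append unseen DOIs to the database and return them in their original order."""
    seen = load_seen_dois(db_path)
    new = []
    for doi in candidates:
        key = doi.strip().lower()
        if key and key not in seen:
            seen.add(key)  # also guards against repeats within one run
            new.append(key)
    if new:
        with db_path.open("a", encoding="utf-8") as fh:
            fh.writelines(f"{doi}\n" for doi in new)
    return new
```

Only the DOIs returned by record_new_dois would then be ranked and considered for the push email, so a paper found on two different days is reported once.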
The review script does not have its own folder but requires _template.docx(s), start_date.txt, and review_log.txt files:
- template.docx: Formatted .docx file which is the template for the respective literature review that is generated
- start_date.txt: Accessed by the script to record the start date of the literature scraping
- review_log.txt: Simple log file that documents the searches run and their respective date
*The review script will automatically update the start_date and clear the rank.txt files once run
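The reset step described in the note above could look roughly like this. It is a sketch under stated assumptions rather than the review script's actual code: the function name reset_after_review is hypothetical, and storing the date in ISO format (YYYY-MM-DD) is an assumption about how start_date.txt is written.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, Optional


def reset_after_review(
    start_date_file: Path,
    rank_files: Iterable[Path],
    today: Optional[date] = None,
) -> str:
    """Record a new start date and empty each rank file; return the date written."""
    # Assumption: the date is stored as a single ISO-formatted line.
    new_start = (today or date.today()).isoformat()
    start_date_file.write_text(new_start + "\n", encoding="utf-8")
    for rank_file in rank_files:
        # Truncate rather than delete, so the daily bot can keep appending.
        rank_file.write_text("", encoding="utf-8")
    return new_start
```

Running this only after the review email has been sent successfully avoids losing a month of ranked entries if the review fails part-way.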
This repo is designed to be automated using GitHub Actions. Please see the workflows folder and ensure the Actions bot has permission to commit to the repo.
Although this bot network has been tailored for specific use, it can be adapted by any user using the following steps:
- Create blank log, database, and rank files, then create your own queries.txt, image.JPG, and template.docx files
- Within the html_formatting() function change the search_queries variable to be a readable string of the rank 1 queries. Also change the 'url' variable
- Alter the bot_email, project name, and directory in the main() function
- Change the email_password and email_receiver variables in main()
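As a convenience for the first adaptation step, the blank files for a new bot can be created with a few lines of Python. This is a minimal sketch: the function name scaffold_bot_folder and the example folder name are hypothetical, and queries.txt, image.JPG, and template.docx still need to be written by hand.

```python
from pathlib import Path

# File names taken from the data folder description in this README.
BLANK_FILES = ("doi_db.txt", "log.txt", "rank.txt")


def scaffold_bot_folder(data_dir: Path) -> list[Path]:
    """Create the data folder and any missing blank files; return those created."""
    data_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in BLANK_FILES:
        path = data_dir / name
        if not path.exists():  # never overwrite an existing database or log
            path.touch()
            created.append(path)
    return created
```

For example, scaffold_bot_folder(Path("my_bot_data")) creates the folder if needed and leaves any existing files untouched, so it is safe to run more than once.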