Enhance your knowledge in medical research.
AItrika (formerly PubGPT) is a tool that makes it easy to extract relevant information from medical papers:
- Abstract
- Full text (when available)
- Genes
- Diseases
- Mutations
- Associations between genes and diseases
- MeSH terms
- Other terms
- Results
- Bibliography
And so on!
You can try AItrika with the Streamlit app by running:
streamlit run app.py
Or you can use it as a script by running:
python main.py
To install everything, you need uv.
First of all, install uv with the command:
pip install uv
After that, create a virtual environment with the command:
uv venv venv_name
Activate the virtual env:
source venv_name/bin/activate
And install dependencies:
uv pip install -r requirements.in
To set API keys, insert your keys into the env.example file and rename it to .env.
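A quick way to confirm that the keys are visible to Python is to load the .env file and check for the variables you expect. The sketch below uses only the standard library. `load_env_file` and `missing_keys` are hypothetical helpers written for this illustration, not part of AItrika, and `GROQ_API_KEY` is the only key name taken from this README (your env.example may list others).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Hypothetical helper for illustration; AItrika may load .env differently.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return loaded


def missing_keys(required: list) -> list:
    """Return the names in `required` that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


if __name__ == "__main__":
    load_env_file()
    print("Missing keys:", missing_keys(["GROQ_API_KEY"]))
```

An empty list after "Missing keys:" means every name you passed in is set.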
You can easily get information about a paper by passing a PubMed ID:
from aitrika.engine.aitrika import OnlineAItrika
aitrika_engine = OnlineAItrika(pubmed_id=pubmed_id)
title = aitrika_engine.get_title()
print(title)
Or you can parse a local PDF:
from aitrika.engine.aitrika import LocalAItrika
aitrika_engine = LocalAItrika(pdf_path=pdf_path)
title = aitrika_engine.get_title()
print(title)
Breast cancer genes: beyond BRCA1 and BRCA2.
You can get other information, such as the associations between genes and diseases:
associations = aitrika_engine.get_associations()
[
{
"gene": "BRIP1",
"disease": "Breast Neoplasms"
},
{
"gene": "PTEN",
"disease": "Breast Neoplasms"
},
{
"gene": "CHEK2",
"disease": "Breast Neoplasms"
},
]
...
Or you can get a nicely formatted DataFrame:
associations = aitrika_engine.associations(dataframe=True)
gene disease
0 BRIP1 Breast Neoplasms
1 PTEN Breast Neoplasms
2 CHEK2 Breast Neoplasms
...
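The list-of-dictionaries form of the associations is easy to post-process with plain Python. As a small sketch, the snippet below groups genes by disease. The sample records are copied from the example output above, and `genes_by_disease` is a hypothetical helper, not an AItrika function.

```python
from collections import defaultdict

# Sample data copied from the example output in this README.
associations = [
    {"gene": "BRIP1", "disease": "Breast Neoplasms"},
    {"gene": "PTEN", "disease": "Breast Neoplasms"},
    {"gene": "CHEK2", "disease": "Breast Neoplasms"},
]


def genes_by_disease(records):
    """Map each disease name to a sorted list of its unique associated genes."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["disease"]].add(record["gene"])
    return {disease: sorted(genes) for disease, genes in grouped.items()}


print(genes_by_disease(associations))
# {'Breast Neoplasms': ['BRIP1', 'CHEK2', 'PTEN']}
```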
With the power of RAG, you can query your document:
import os
# Prepare the documents
documents = generate_documents(content=abstract)
# Set the LLM
llm = GroqLLM(documents=documents, api_key=os.getenv("GROQ_API_KEY"))
# Query your document
query = "Is BRCA1 associated with breast cancer?"
print(llm.query(query=query))
The provided text suggests that BRCA1 is associated with breast cancer, as it is listed among the high-penetrance genes identified in family linkage studies as responsible for inherited syndromes of breast cancer.
Or you can extract other information:
results = aitrika_engine.extract_results(llm=llm)
print(results)
** RESULTS **
- High-penetrance genes - BRCA1, BRCA2, PTEN, TP53 - responsible for inherited syndromes
- Moderate-penetrance genes - CHEK2, ATM, BRIP1, PALB2, RAD51C - associated with moderate BC risk
- Low-penetrance alleles - common alleles - associated with slightly increased or decreased risk of BC
- Current clinical practice - high-penetrance genes - widely used
- Future prospect - all familial breast cancer genes - to be included in genetic test
- Research need - clinical management - of moderate and low-risk variants
To run the AItrika API, follow these steps:
- Ensure you have set up your environment and installed all dependencies as described in the Installation section.
- Run the API server using the following command:
python api.py
The API will start running on http://0.0.0.0:8000. You can now make requests to the various endpoints:
- /associations: Get associations from a PubMed article
- /abstract: Get abstract of a PubMed article
- /query: Query a PubMed article
- /results: Get results from a PubMed article
- /participants: Get number of participants from a PubMed article
- /outcomes: Get outcomes from a PubMed article
You can use tools like curl, Postman, or any HTTP client to interact with the API. For example:
curl -X POST "http://localhost:8000/abstract" -H "Content-Type: application/json" -d '{"pubmed_id": 12345678}'
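The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl call above: `build_request` and `call_api` are hypothetical helper names, the `{"pubmed_id": ...}` payload is taken from the curl example, and it is an assumption that the server answers with JSON. Other endpoints (for example /query) may expect additional fields not shown here.

```python
import json
import urllib.request


def build_request(endpoint: str, pubmed_id: int, base_url: str = "http://localhost:8000"):
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps({"pubmed_id": pubmed_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint: str, pubmed_id: int, timeout: float = 30.0):
    """Send the request and decode the response body as JSON (assumed format)."""
    request = build_request(endpoint, pubmed_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call_api("abstract", 12345678)` corresponds to the curl command above.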
The API documentation is automatically generated and saved to docs/api-reference/openapi.json.
You can use this file with tools like Swagger UI for a more interactive API exploration experience.
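If you only want a quick overview of what the specification contains, you can also read it directly. The sketch below relies on the standard OpenAPI layout, where a top-level `paths` object maps each route to its HTTP methods. `list_operations` and `load_spec` are hypothetical helpers, and the default file location is the one mentioned above.

```python
import json
from pathlib import Path

# HTTP method names that OpenAPI allows as keys inside a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list:
    """Return sorted (METHOD, path) pairs found under the spec's `paths` object."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method in path_item:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


def load_spec(path: str = "docs/api-reference/openapi.json") -> dict:
    """Read the generated OpenAPI document from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `for method, path in list_operations(load_spec()): print(method, path)` prints one line per endpoint once the file has been generated.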
If you find this project useful, please consider supporting it:
- 🌟 Star the project on GitHub
- 🐛 Report bugs or suggest new features
- 🤝 Contribute with pull requests
- ☕️ Buy me a coffee or consider becoming a sponsor.
If you're using this project in a business or commercial context, please contact me.
I'm available for consulting, custom development, or commercial licensing.
Your support helps keep this project active and continuously improving. Thank you!
AItrika is licensed under the Apache 2.0 License. See the LICENSE file for more details.