Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/talk2scholars docker update #118

Conversation

ansh-info
Copy link
Contributor

@ansh-info ansh-info commented Feb 28, 2025

For authors

image
Both Build and tested locally

Description

This pull request introduces Docker support for the talk2scholars module, aligning it with the existing talk2biomodels setup. The changes include:

  1. Added a Dockerfile for talk2scholars
    • Uses python:3.12-slim as the base image
    • Installs required dependencies, including g++ and build-essential for compiling pcst_fast
    • Exposes port 8501 for Streamlit
    • Configures environment variables to be passed at runtime for security
    • Preventing API key exposure by passing them at runtime instead of storing them in the Dockerfile

Additional features on request of @gurdeep330 -> Please check the conversation thread below

  1. Enhanced GitHub Actions (ci.yml) for improved CI/CD
    • Added trigger to automatically run after the RELEASE workflow completes

      • This ensures Docker images are built whenever a new version is released
      • Maintains backward compatibility with existing triggers (push, PR, manual)
    • Implemented module-specific build jobs with path filtering

      • Separate build jobs for talk2biomodels and talk2scholars
      • Only builds Docker images for modules that have been modified
      • Uses dorny/paths-filter to detect changes in specific directories
    • Added automatic versioning for Docker images

      • Uses semantic versioning from git tags for release builds
      • Adds both version tags and latest tags to all images
      • Includes git commit SHA in image tags for traceability

  1. Updated README.md
    • Documented setup instructions for both talk2biomodels and talk2scholars
    • Running containers persistent and detached(-d) mode
    • Provided example commands for pulling and running the images

Context and Motivation

The goal of this update is to streamline the deployment process for both talk2biomodels and talk2scholars, ensuring a consistent and scalable setup. By integrating talk2scholars into the CI/CD pipeline, the module can now be built and deployed automatically alongside talk2biomodels.

The enhanced CI workflow improves efficiency by only building Docker images for modules that have changed and automatically handling versioning, reducing maintenance overhead and ensuring consistency across releases.

Dependencies

  • Environment variables (OPENAI_API_KEY, ZOTERO_API_KEY, ZOTERO_USER_ID, NVIDIA_API_KEY) must be set when running the container.

Fixes # (issue) Mention the issue number.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.

  • Added new test(s) in the tests folder
  • Added new function(s) to an existing test(s) (e.g.: tests/testX.py)
  • No new tests added (Please explain the rationale in this case)

Checklist

  • My code follows the style guidelines mentioned in the Code/DevOps guides
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. MkDocs)
  • My changes generate no new warnings
  • I have added or updated tests (in the tests folder) that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

For reviewers

Checklist pre-approval

  • Is there enough documentation?
  • If a new feature has been added, or a bug fixed, has a test been added to confirm good behavior?
  • Does the test(s) successfully test edge/corner cases?
  • Does the PR pass the tests? (if the repository has continuous integration)

Checklist post-approval

  • Does this PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
  • Does this PR close an issue? If so, please make sure to descriptively close this issue when the PR is merged.

Checklist post-merge

  • When you approve of the PR, merge and close it (Read this article to know about different merge methods on GitHub)
  • Did this PR merge develop into main and is it suppose to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.

Copy link
Member

@gurdeep330 gurdeep330 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @ansh-info for setting this up. Looks very good 👍

I don't have any comments but I have a request in the workflow file. Let me know if it (or a subset) would be possible in this PR? Thx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to:

  1. Have this workflow automatically run after the RELEASE workflow was successful.
  2. Separate it into different jobs based on the agent, and invoke the jobs corresponding to the changes made. For example, if the changes were made to T2S, then only the docker job of T2S is invoked.
  3. Automatically increment the version (and not use v1 always) :-)

Copy link
Contributor Author

@ansh-info ansh-info Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gurdeep330, Thank you for the review -> Yes we can add these changes, but I want your review first
Please review the below, also regarding the last point we can use GitHub tags, please let me know if this solution will work about tags(will be fetched from the GitHub environment variable)

1. Trigger This Workflow After the RELEASE Workflow Completes

Solution: Modify the workflow trigger to run after release.yml completes successfully using workflow_run:

Update ci.yml:

on:
  workflow_run:
    workflows: ["RELEASE"]
    types:
      - completed
  workflow_dispatch:

2. Separate Jobs by Agent and Run Only for Relevant Changes

Solution:
Modify the workflow to conditionally run jobs based on changes in talk2biomodels or talk2scholars.

Update ci.yml:

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      build_t2b: ${{ steps.filter.outputs.talk2biomodels }}
      build_t2s: ${{ steps.filter.outputs.talk2scholars }}
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Detect Changes
        id: filter
        uses: dorny/paths-filter@v2
        with:
          filters: |
            talk2biomodels:
              - 'aiagents4pharma/talk2biomodels/**'
            talk2scholars:
              - 'aiagents4pharma/talk2scholars/**'

  docker-build-t2b:
    needs: detect-changes
    if: needs.detect-changes.outputs.build_t2b == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and Push Talk2Biomodels
        uses: docker/build-push-action@v6
        with:
          file: aiagents4pharma/talk2biomodels/Dockerfile
          push: true
          tags: virtualpatientengine/talk2biomodels:latest

  docker-build-t2s:
    needs: detect-changes
    if: needs.detect-changes.outputs.build_t2s == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and Push Talk2Scholars
        uses: docker/build-push-action@v6
        with:
          file: aiagents4pharma/talk2scholars/Dockerfile
          push: true
          tags: virtualpatientengine/talk2scholars:latest

3. Automatically Increment the Version Instead of Always Using v1

Solution:
Modify the workflow to auto-increment the version using the latest GitHub tag.

Update the Build and Push steps in both jobs:

      - name: Get Version
        id: version
        run: echo "VERSION=$(git describe --tags --abbrev=0)" >> $GITHUB_ENV

      - name: Build and Push
        uses: docker/build-push-action@v6
        with:
          file: aiagents4pharma/talk2biomodels/Dockerfile
          push: true
          tags: |
            virtualpatientengine/talk2biomodels:${{ env.VERSION }}
            virtualpatientengine/talk2biomodels:latest
  • Uses the latest Git tag as the version number instead of hardcoding v1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks fine to me. But could you double-check these tests for different conditions by running them in your forked repo? Thx

@ansh-info
Copy link
Contributor Author

ansh-info commented Mar 1, 2025

HI @gurdeep330 please check the screenshot, The first point you mentioned is completed
"1. Have this workflow automatically run after the RELEASE workflow was successful."
image

Let me explain what's happening:

  • The workflow is correctly executing and successfully building the Docker image
  • It correctly determined the version tag as "v1.10.2" the repository(for my fork)
  • It's correctly attempting to push to Docker Hub using the account "virtualpatientengine"

The error "denied: requested access to the resource is denied" means that my GitHub Actions workflow doesn't have permission to push to that Docker Hub repository. This is expected in a fork scenario

@ansh-info
Copy link
Contributor Author

ansh-info commented Mar 1, 2025

@gurdeep330 Please check the screenshot, I am able to achive the second point you mentioned as well:
"2. Separate it into different jobs based on the agent, and invoke the jobs corresponding to the changes made. For example, if the changes were made to T2S, then only the docker job of T2S is invoked."

image

Previously, with the matrix strategy, when the first job failed (talk2biomodels), the second job (talk2scholars) was automatically canceled. This was because the matrix jobs were part of a single logical job.
Now, with our changes:

  • Both modules are running as separate, independent jobs
  • Both jobs ran to completion (even though both failed at the same point)
  • Each job has its own build summary and record
  • Both jobs failed with the same error message about Docker Hub access permissions

This confirms that:

  • The path filtering for detecting changes is working (both modules were detected as changed)
  • The separation of jobs is working correctly (they run independently)
  • The versioning is being properly applied to both jobs

The only error happening now is still the Docker Hub permission issue, which is expected in my fork scenario and doesn't indicate any problem with the workflow structure itself.

@ansh-info
Copy link
Contributor Author

@gurdeep330 Your 3rd point is also completed, Please check the attached screenshots
"3. Automatically increment the version (and not use v1 always) :-)"

We have a version check workflow added now
image

Checks for the latest tag from RELEASE workflow

image
Build summary -> both got build
image

Implementation of step 3: automatically incrementing the version instead of using a static version. Since our project is already using semantic-release in the RELEASE workflow, we leveraged that work and ensure the Docker images use the correctly incremented versions.

I've implemented step 3, adding automatic version incrementing with a sophisticated approach:

Key Improvements:

  1. Centralized Version Generation:

    • Added a new version job that computes version information once
    • Other jobs depend on this job and use its outputs
    • This ensures consistent versioning across all Docker images
  2. Smart Version Types:

    • For releases triggered by the RELEASE workflow, uses the proper semantic version from git tags
    • For development builds (PRs, manual runs), creates a development version like v1.2.3-dev.5 where 5 is the number of commits since the last tag
    • Also generates a short Git SHA for an additional tag option
  3. Multiple Docker Tags:

    • Each image now gets tagged with:
      • The specific version number (v1.2.3 or v1.2.3-dev.5)
      • The Git short SHA for easy traceability (a1b2c3d)
      • The latest tag for convenience

This approach gives you several benefits:

  • Images built from release workflows get clean semantic version tags
  • Images built from PRs or development builds get clearly marked development version tags
  • Using the Git SHA as a tag makes it easy to trace builds back to specific commits
  • The version calculation is centralized, ensuring consistent versioning across modules

This implementation completes the third and final step requested in the PR review. The workflow now:

  1. Runs after the RELEASE workflow
  2. Separates jobs based on which modules have changed
  3. Automatically increments version numbers appropriately

Let me know if you'd like any further refinements to this implementation!

@ansh-info ansh-info requested a review from gurdeep330 March 1, 2025 13:09
@ansh-info
Copy link
Contributor Author

ansh-info commented Mar 1, 2025

Hello @gurdeep330, Just to make sure, I change the username to my GitHub credentials on my test fork/test brach, and everything ran successfully. Please check the screenshots bellow(-> This was just to test I have removed the images from my docker hub😊):

Github Actions Overview

image

Docker Hub preview

image image

@gurdeep330
Copy link
Member

Thanks very much @ansh-info This is very nicely done 💯 I'll run a few tests after the merge is complete. Great work 🥇

You just need an approval from @dmccloskey :-)

Copy link
Member

@dmccloskey dmccloskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work with the CI and DockerHub build and push 👍.

@dmccloskey dmccloskey merged commit 9f662ed into VirtualPatientEngine:main Mar 2, 2025
8 of 16 checks passed
Copy link
Contributor

github-actions bot commented Mar 2, 2025

🎉 This PR is included in version 1.24.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@ansh-info ansh-info deleted the feat/talk2scholars-docker-update branch March 2, 2025 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants