Authors: Oscar Almqvist and Eric Vickström
Katacoda is an interactive way of conducting the tutorial in the browser, found here.
This tutorial aims to automate the evaluation of a Machine Learning repository using webhooks on GitHub. Evaluating the effect of particular changes on the training set is tedious, especially if you weren't the author. This tutorial will teach you how to automate the testing process for a pull request that has been assigned a certain label, and how to comment the results on said pull request.
- Tutorial: MLOps - Automation of Model Evaluation
- Appendix A - Example payload
- A GitHub account
- An ngrok account
- A public GitHub repository containing a machine learning model. We have included our own example project which you can add to your repository
All of the code in this tutorial can be found inside the code folder, which includes code for both the server and the small machine learning project, all written in Python 3.9.4.
The general steps to complete our goal are described below. Later sections give more detailed instructions.
A server that listens on pull request events.
For every event on GitHub, you have the option to specify an HTTP endpoint where you want to receive data regarding the event. In our case, we want to listen to the event of setting a label on a pull request. To listen, we need to create a web server that listens for a POST request on a specific endpoint. To be able to specify which endpoint and later on access the GitHub API, we need to install our own GitHub App on the specific repository.
Evaluate the model inside the pull request.
The data in the event contains the necessary information to evaluate the changes in the pull request. With this, we will clone the repository and check out both the HEAD and the BASE of the pull request. Then we train and test the model against the test set for each version of the code and compare the results. This will be done by executing shell commands via Python.
Comment the results
The result of the evaluation will be sent as a comment on the pull request via the GitHub API. To do this, we need to authenticate via our GitHub App.
In this tutorial we're using Python along with the Keras and TensorFlow libraries to create a simple model that classifies digits from the MNIST dataset. This can of course be modified to your own setup, but for the sake of the tutorial, we have included a folder containing a model located here. This includes instructions on how to run it. For simplicity's sake, we download the MNIST dataset using keras.datasets
Essentially, the included code does the following steps:
- Downloads the dataset,
- Preprocesses the data,
- Creates the model,
- Compiles the model,
- Trains the model,
- Evaluates it.
After all of these steps are done, it saves the result to a file. In our case, it is saved as result.txt with a JSON object containing the loss and accuracy.
# result.txt
{"loss": 1, "accuracy": 1}
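As a minimal sketch of that last step, assuming the evaluation yields a loss and an accuracy as plain numbers, the results file could be written with the json module so that the server can parse it later. The helper names write_result and read_result are hypothetical and not part of the example project.

```python
import json


def write_result(loss, accuracy, path="result.txt"):
    """Write the evaluation metrics to `path` as a JSON object."""
    # float() turns framework-specific number types into plain Python floats,
    # which the json module knows how to serialize.
    result = {"loss": float(loss), "accuracy": float(accuracy)}
    with open(path, "w") as f:
        json.dump(result, f)
    return result


def read_result(path="result.txt"):
    """Read the metrics back, the way the server side would."""
    with open(path, "r") as f:
        return json.load(f)
```

Note that JSON requires double quotes around keys, so writing the file through json.dump avoids parse errors on the server side.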
⚠️ Remember that the results file produced from the model must match the file that the server is supposed to read from!⚠️ If you don't have a repository where you want to implement this, create a new repository and copy the content of our small demo. The
should be at the root of the repository.
Additionally, add an
label to the repository as this will be our flag for showing when to run our evaluation.
An example of what our project looks like:
This has been tested on ubuntu-20.04
with the following Python modules:
For the server:
For the machine learning project:
To install the latest versions of the dependencies, either use pip3 install <module> for each module, or use the included requirements.txt for the server and the machine learning project. This can be installed via pip3 install -r <requirements file>. We recommend using some form of Python environment manager, for instance, Conda.
Before telling GitHub which endpoint we expect the webhook events to be sent to, we need to start listening on that endpoint. To listen, we will make use of the web framework Flask. First, we create a web server in a file called
that listens on an arbitrary port, let's say 1337, and that expects a POST request on the endpoint /mlops-server. This is the content:
from flask import Flask, request

app = Flask(__name__)

@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    # Parse the JSON body that GitHub sends with the event
    request_data = request.get_json()
    return 'Awaiting POST'

if __name__ == '__main__':
    # Listen on port 1337
    app.run(port=1337)
By running python3
, you start the webserver. Now the server awaits a POST request at http://localhost:1337/mlops-server
. The POST request will contain JSON data in its body, which holds all the data belonging to a pull request event. Let's get GitHub to send these events to our endpoint.
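Before involving GitHub, it can help to deliver a hand-made event to the local endpoint. The following is a sketch using only the standard library; the helper name build_test_delivery and the sample payload are made up for illustration, and the URL assumes the server is running locally on port 1337.

```python
import json
import urllib.request


def build_test_delivery(url, payload):
    """Build a POST request that carries `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Flask's request.get_json() expects a JSON content type by default.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical, heavily trimmed pull request event.
sample_event = {"action": "labeled", "pull_request": {"labels": [{"name": "evaluate"}]}}
req = build_test_delivery("http://localhost:1337/mlops-server", sample_event)

# With the server running, the request could be sent like this:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode())
```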
The port 1337 is probably not open to the public (otherwise we recommend reviewing your security settings). Proper network configuration is beyond the scope of this tutorial as it depends on your server, so for debugging purposes, both we and the GitHub Docs recommend ngrok. The idea is that ngrok forwards the request to your local server, so opening ports becomes a non-issue. The installation depends on your system, so we recommend following the instructions on ngrok download. You need to register, get and link your authentication token, and forward the port 1337 via the command ./ngrok http 1337 (for Linux).
In our case, the endpoint our GitHub App will send its requests to, is
Communicating with an API requires authentication and IDs specific to your project. The various keys and IDs we will collect in the coming parts should not be uploaded to any public repository. We will handle these by creating an environment file called .env. The data we need to store there is the app ID, the install ID, and the path to the private key.
# .env
To then access these variables, we will make use of the dotenv module:
import os
from dotenv import load_dotenv

# Read the variables from .env into the process environment
load_dotenv()
INSTALL_ID = os.getenv('INSTALL_ID')
APP_ID = os.getenv('APP_ID')
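os.getenv returns None when a variable is missing, which would only surface later as a confusing authentication error. A small check at startup, sketched below, fails early instead. The function name require_settings is hypothetical; APP_ID and INSTALL_ID are the names used above, while PRIVATE_KEY_PATH is only assumed to be the name under which the key path is stored.

```python
import os


def require_settings(names, environ=None):
    """Return the requested settings, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

It would be called once after the environment file has been loaded, for example as require_settings(["APP_ID", "INSTALL_ID", "PRIVATE_KEY_PATH"]).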
We need to create a GitHub App and install it on our machine learning repository. But why? 🧐 Webhooks can be configured without an app; however, commenting on our pull request requires us to authenticate with the GitHub API. So let's create an app. First, navigate to their app page.
Click New GitHub App.
We need to specify a name for our app. GitHub also requires you to specify a website. No website? Here we could simply input the ngrok URL. We are using the default settings for all the options (Identifying and authorizing users, Post installation, etc.), except for the following:
Here, we will input the URL we received from ngrok with the added /mlops-server
Tell GitHub that we need access to every pull request where we install our App.
For the webhook, we want to listen to specific requests regarding pull requests.
We do not plan to install this everywhere, only on our repository.
Click create. Voilà, we have our first App. 🥳
Save the App ID inside .env
# .env
In the future, we need to authenticate as the app through our server, so generate a private-key! This is found at the bottom of the same page.
Download it and keep it safe! Save the path to the key to .env
# .env
Scroll up, and click on Install App.
Choose that you want to install the app for one of your repositories.
Here, select your machine learning repository. Hopefully, your project name is more exciting than your-ml-project
Click on Install.
Look 👀 See the installation ID in the url? This ID is used when we want to specify which repository we want to add a comment to. Save that as well.
# .env
Wow, that was a lot of steps. 😮‍💨 The good news is that we're done with registering the GitHub App. Let's continue with our server!
Okay, now we have our server that tells us when someone has created a pull request and we have a model. Let's combine the two; train & evaluate the model when someone has created a pull request!
The plan is to have the server clone the repository, check out the latest commit, install all the dependencies, and evaluate the model against the base. To do this, we run shell commands from Python via the run() method of the subprocess module. First, we clone the repository into a folder with an arbitrary name, project_dst, generated with the help of the uuid module; the reason for this is to avoid collisions with folders that already exist. Next, we checkout a specific version of the cloned repository (with cwd we run the command inside the project_dst folder). Then, we install the dependencies of the machine learning project and test the model, expecting a result.txt. For future pull requests, we can't reuse the same cloned directory, so we move the directory to the 🗑️.
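import json
import subprocess
import uuid

def evaluate_pull_request(commit_sha, html_url):
    # Random folder name to avoid collisions with existing folders
    project_dst = uuid.uuid4().hex
    subprocess.run(["git", "clone", html_url, project_dst])
    subprocess.run(["git", "checkout", commit_sha], cwd=f"./{project_dst}")
    subprocess.run(["pip", "install", "-r", f"{project_dst}/requirements.txt"])
    subprocess.run(["python3", f"{project_dst}/"])
    # Remove the clone so the next evaluation starts from scratch
    subprocess.run(["rm", "-rf", project_dst])
    with open("result.txt", "r") as f:
        return json.load(f)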
Notice how we depend on the project layout via the paths here (
). If you use your own project, make sure you call the correct files!
If you want to suppress the output from
you can redirect the stdout
. See for how we did it!
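One way to silence the commands, sketched below, is to pass subprocess.DEVNULL as stdout to subprocess.run. The helper name run_quiet is hypothetical, and check=True (which raises when a command fails) is an addition in this sketch rather than something the tutorial code is known to do.

```python
import subprocess
import sys


def run_quiet(command, cwd=None):
    """Run `command`, discarding its stdout and raising if it fails."""
    return subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.DEVNULL,  # drop normal output
        check=True,                 # raise CalledProcessError on non-zero exit
    )


# Example: the command prints something, but nothing reaches the console.
completed = run_quiet([sys.executable, "-c", "print('training...')"])
```

Error output (stderr) is left untouched here, so failures of git or pip would still be visible in the server log.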
What data from the pull request do we need? Well, we need the html_url
of the repository in order to clone it and then commit_sha
in order to checkout the latest changes. Note that we need the html_url
and commit_sha
of both the head and base as we want to compare the changes. Additionally, we need the comments_url
to have our application comment the results on the pull request. Let's create two utility functions for this:
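def get_commits(data):
    # SHAs of the head (incoming changes) and the base (target branch)
    return data["pull_request"]["head"]["sha"], \
           data["pull_request"]["base"]["sha"]

def get_urls(data):
    # URL for posting comments, plus the repository URLs of head and base
    return data["pull_request"]["comments_url"], \
           data["pull_request"]["head"]["repo"]["html_url"], \
           data["pull_request"]["base"]["repo"]["html_url"]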
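To see what these helpers extract, here is a sketch that runs them on a trimmed-down payload. The SHA values and example.com URLs are placeholders invented for illustration, and the base lookups mirror the head lookups on the assumption that both sides of the pull request share the same structure.

```python
def get_commits(data):
    """Return the commit SHAs of the head and the base of the pull request."""
    pr = data["pull_request"]
    return pr["head"]["sha"], pr["base"]["sha"]


def get_urls(data):
    """Return the comments URL and the repository URLs of head and base."""
    pr = data["pull_request"]
    return (
        pr["comments_url"],
        pr["head"]["repo"]["html_url"],
        pr["base"]["repo"]["html_url"],
    )


# Placeholder payload: every value below is made up.
event = {
    "pull_request": {
        "comments_url": "https://example.com/comments",
        "head": {"sha": "aaa111", "repo": {"html_url": "https://example.com/fork"}},
        "base": {"sha": "bbb222", "repo": {"html_url": "https://example.com/upstream"}},
    }
}

sha_head, sha_base = get_commits(event)
comments_url, url_head, url_base = get_urls(event)
```

Head and base can point to different repositories, for instance when the pull request comes from a fork, which is why both URLs are collected.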
We also need to specify when this should be run. For instance, if we just made a pull request that updated documentation or similar, we don't need to run all of this testing as the model hasn't been changed. To solve this, we will create a label in our repository which is treated as a flag for letting our server know when to evaluate it. We will call this label evaluate. This also means that we need some form of validation function that checks whether an event is intended for testing and comparing the model. In the case of GitHub webhooks, we want to listen for the labeled action of a pull_request event. We also need to check whether it contains our label.
def is_valid_response(data):
    is_valid = False
    keys = data.keys()
    if 'pull_request' in keys and 'action' in keys:
        if data['action'] != 'labeled':
            return False
        for label in data['pull_request']['labels']:
            if 'evaluate' == label['name']:  # 'evaluate' corresponds to said label
                is_valid = True
    return is_valid
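The check can also be written more compactly. The sketch below is a variant, not the tutorial's function: it uses .get() so that unrelated events without a pull_request key are simply rejected, and it compares label names case-insensitively, which is an assumption about the desired behaviour (the function above matches 'evaluate' exactly). The name has_evaluate_label is hypothetical.

```python
def has_evaluate_label(data, label_name="evaluate"):
    """Return True for a 'labeled' pull request event carrying `label_name`."""
    if data.get("action") != "labeled":
        return False
    labels = data.get("pull_request", {}).get("labels", [])
    wanted = label_name.lower()
    return any(label.get("name", "").lower() == wanted for label in labels)
```

With this variant, a label spelled "Evaluate" in the repository would also trigger the evaluation.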
Now, let's glue everything together inside mlops_server_endpoint():
from flask import Flask, request

@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    response = request.get_json()
    if is_valid_response(response):
        sha_head, sha_base = get_commits(response)
        comments_url, url_head, url_base = get_urls(response)
        head_result = evaluate_pull_request(sha_head, url_head)
        base_result = evaluate_pull_request(sha_base, url_base)
        # TODO: send comment with results to pull request
    return 'Awaiting POST'
In section 1, the communication with the repository was rather one-sided; the server could only listen to webhook events. In order to send requests to our repository, we need to add functionality. First of all, we need to fetch an access token, whose purpose is to authenticate against GitHub. This is done by constructing a JSON Web Token (JWT) based on the app ID and private key from section 2.
Let's generate our JWT. The time fields iat (issued at) and exp (expiration) are Unix timestamps in seconds; together they define the window in which the token is valid. Here, the token is backdated by 60 seconds and expires just under 10 minutes from now.
import jwt
import time

def generate_jwt():
    # Read the private key (PEM) that was downloaded from the app page
    with open(PRIVATE_KEY_PATH, 'r') as pemfile:
        key = pemfile.read()
    payload = {
        "iat": int(time.time() - 60),
        "exp": int(time.time() + (10 * 60)) - 10,
        "iss": APP_ID
    }
    return jwt.encode(payload, key, algorithm="RS256")
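To make the time window concrete, the sketch below builds only the claims, without signing them. The function name build_claims and the app ID 12345 are placeholders; the 60-second backdating and the expiry of just under ten minutes follow the values used above.

```python
import time


def build_claims(app_id, now=None):
    """Build the JWT claims: issued 60 s in the past, expiring in under 10 min."""
    now = int(time.time()) if now is None else int(now)
    return {
        "iat": now - 60,            # issued-at, backdated to tolerate clock drift
        "exp": now + 10 * 60 - 10,  # expiry, 10 s short of ten minutes from now
        "iss": app_id,              # issuer: the GitHub App ID
    }


# A fixed timestamp makes the example reproducible.
claims = build_claims(app_id=12345, now=1_700_000_000)
lifetime = claims["exp"] - claims["iat"]  # seconds between iat and exp
```

Passing the current time in as a parameter is a design choice of this sketch; it keeps the function easy to test.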
Using generate_jwt()
we create a function for fetching the access token, which essentially sends a POST
request to
in order to fetch an access token for a certain GitHub app.
import json
import requests

def get_token():
    headers = {
        "Authorization": f"Bearer {generate_jwt()}",
        "Accept": "application/vnd.github.v3+json"
    }
    # Exchange the JWT for an installation access token
    r = requests.post(f"{GITHUB_APP_URL}/installations/{INSTALL_ID}/access_tokens", headers=headers)
    return r.json()["token"]
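The two pieces that matter in this exchange are the request headers and the token field of the response. The sketch below isolates them so they can be tried without network access; the helper names and the sample response body are hypothetical, with only the token key and the header values taken from the code above.

```python
import json


def build_app_headers(app_jwt):
    """Headers for authenticating as the GitHub App with a signed JWT."""
    return {
        "Authorization": f"Bearer {app_jwt}",
        "Accept": "application/vnd.github.v3+json",
    }


def extract_token(response_text):
    """Pull the access token out of the JSON response body."""
    data = json.loads(response_text)
    if "token" not in data:
        # Surface the server's answer instead of failing with a bare KeyError.
        raise ValueError(f"No token in response: {data}")
    return data["token"]


# Made-up values for illustration only.
headers = build_app_headers("header.payload.signature")
token = extract_token('{"token": "example-token", "expires_at": "2030-01-01T00:00:00Z"}')
```

Checking for the token key explicitly is a suggestion of this sketch: when authentication fails, the error message then shows what the server actually returned.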
To later post a message on a certain pull request, we use the access token to send a POST
request with a body containing our message.
def post_message_on_pull_request(comments_url, token, message):
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json"
    }
    payload = {
        "body": message
    }
    # The comment text is sent as the JSON body of the POST request
    requests.post(comments_url, headers=headers, data=json.dumps(payload))
Finally, we can assemble everything to evaluate a model from a certain pull request and send the results! If we would like, we could also format our message to make it more readable, kind of like this:
| Source | Loss | Accuracy |
| ------ | ---- | -------- |
| Head | 1.07 | 78.0% |
| Base | 1.06 | 80.0% |
| Diff | 0.01 | -2.0% |
Let's create a utility function that creates a Markdown table with the data based on the loss
and accuracy
from what the pull request contains (head) and to where it's going (base).
def format_markdown_comment(head, base):
    # Difference between head and base, rounded to two decimals
    diff_loss = round(head['loss'] - base['loss'], 2)
    diff_acc = round(head['accuracy'] - base['accuracy'], 2)
    rows = [
        "| Source| Loss | Accuracy |",
        "| ------| ---------------| ------------------- |",
        f"| Head | {head['loss']} | {head['accuracy']}% |",
        f"| Base | {base['loss']} | {base['accuracy']}% |",
        f"| Diff | {diff_loss} | {diff_acc}% |"
    ]
    return "\n".join(rows)
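As a quick check of the formatting, the sketch below feeds the numbers from the example table into a self-contained copy of the function. It assumes the accuracy values are already expressed in percent; if the model reports a fraction between 0 and 1, it would need to be multiplied by 100 first.

```python
def format_markdown_comment(head, base):
    """Render head, base and their difference as a Markdown table."""
    diff_loss = round(head["loss"] - base["loss"], 2)
    diff_acc = round(head["accuracy"] - base["accuracy"], 2)
    rows = [
        "| Source | Loss | Accuracy |",
        "| ------ | ---- | -------- |",
        f"| Head | {head['loss']} | {head['accuracy']}% |",
        f"| Base | {base['loss']} | {base['accuracy']}% |",
        f"| Diff | {diff_loss} | {diff_acc}% |",
    ]
    return "\n".join(rows)


# Values taken from the example table; accuracy is given in percent.
comment = format_markdown_comment(
    {"loss": 1.07, "accuracy": 78.0},
    {"loss": 1.06, "accuracy": 80.0},
)
print(comment)
```

A positive loss difference together with a negative accuracy difference, as in this example, means the head performs slightly worse than the base.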
Now, add these to the mlops_server_endpoint():
@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    response = request.get_json()
    if is_valid_response(response):
        sha_head, sha_base = get_commits(response)
        comments_url, url_head, url_base = get_urls(response)
        head_result = evaluate_pull_request(sha_head, url_head)
        base_result = evaluate_pull_request(sha_base, url_base)
        message = format_markdown_comment(head_result, base_result)
        token = get_token()
        post_message_on_pull_request(comments_url, token, message)
    return 'Awaiting POST'
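Because the endpoint triggers cloning, training and a network call, it can be awkward to try out. One option, sketched here under the assumption that the helpers are passed in as arguments rather than called directly, is to separate the flow from its side effects. The function handle_event and the stub helpers are hypothetical, and the payload values are placeholders.

```python
def handle_event(data, is_valid, evaluate, format_comment, post_comment):
    """Evaluate head and base for a valid event and post the comparison."""
    if not is_valid(data):
        return False
    pr = data["pull_request"]
    head_result = evaluate(pr["head"]["sha"], pr["head"]["repo"]["html_url"])
    base_result = evaluate(pr["base"]["sha"], pr["base"]["repo"]["html_url"])
    post_comment(pr["comments_url"], format_comment(head_result, base_result))
    return True


# Stubs standing in for the real helpers; nothing is cloned or sent.
posted = []
event = {
    "action": "labeled",
    "pull_request": {
        "comments_url": "https://example.com/comments",
        "head": {"sha": "aaa111", "repo": {"html_url": "https://example.com/fork"}},
        "base": {"sha": "bbb222", "repo": {"html_url": "https://example.com/upstream"}},
    },
}
handled = handle_event(
    event,
    is_valid=lambda d: d.get("action") == "labeled",
    evaluate=lambda sha, url: {"loss": 1.0, "accuracy": 80.0},
    format_comment=lambda head, base: f"head={head['loss']} base={base['loss']}",
    post_comment=lambda url, message: posted.append((url, message)),
)
```

In the real server, the stubs would be replaced by the functions defined throughout this tutorial.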
Start your server python3
Now, if you branch out from your own repository and make some change (either to some parameter in the model or just some text), you can open a pull request and add the evaluate label. You should see a bunch of output on the server for every command that it's running, but finally you should see a comment on your pull request!
This tutorial is a proof of concept of how one could build a server to evaluate pull requests containing ML models. It can hopefully be adapted to your needs and inspire you to automate more parts of your development process!
If you get a
on the server, it's most likely due to wrong URLs specified in the app settings. To solve this, click Edit on your app. The Webhook URL must have this form: http://[[HOST_SUBDOMAIN]]-1337-[[KATACODA_HOST]]
An example subset of what a pull request event can contain. If you can't wait and are interested in seeing a full payload from a request, you may visit the GitHub docs. You can also read more about webhooks here.
{
  "action": "opened",
  "number": 3,
  "pull_request": {
    "comments_url": "",
    "head": {
      "sha": "89243d3490e9c0djk3as8791b84bc05a42837a363a",
      "repo": {
        "html_url": ""
      }
    },
    "base": {
      "sha": "45as43d3490e9c0djk3as8791b84bc05a42837a363a",
      "repo": {
        "html_url": ""
      }
    },
    "labels": [{
      "name": "Evaluate"
    }]
  }
}