This repository has been archived by the owner on Dec 9, 2022. It is now read-only.

Setup new staging and prod environments #57

Closed
jlewi opened this issue Jan 5, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@jlewi
Collaborator

jlewi commented Jan 5, 2020

Opening this issue to document the setup of new staging and prod environments.

We'd like to roll out some changes to the label bot frontend (see kubeflow/code-intelligence#90).
Trying to figure out how to roll things out safely is revealing some areas for improvement in the way
our staging and prod clusters are set up.

It looks like the prod instance is currently running in

  • project: github-probots
  • cluster: kf-ci-ml
  • namespace: mlapp

It looks like there are two separate ingresses in this namespace

kubectl get ingress 
NAME        HOSTS               ADDRESS         PORTS     AGE
ml-gh-app   predict.mlbot.net   35.190.23.225   80, 443   264d
mlbot-net   mlbot.net           34.95.77.230    80        264d
  • The predict.mlbot.net endpoint is handling the webhook for the issue-label-bot GitHub App
  • mlbot.net is handling the web app
  • They are both pointing at the same Flask app / K8s service
    • It looks like there are two ingresses just to allow provisioning two SSL certificates corresponding to the two domains

Here's my plan

  • Create namespace label-bot-dev with a dev instance of the label bot
  • Configure the domain label-bot-dev.mlbot.net to use this server
  • Configure namespace label-bot-prod with a prod instance
  • Configure the domain label-bot-prod.mlbot.net to use this server
  • Update the label bot webhook to use label-bot-prod.mlbot.net
  • Update mlbot.net to point to the service in label-bot-prod namespace
@issue-label-bot issue-label-bot bot added the enhancement label Jan 5, 2020
@issue-label-bot

Issue-Label Bot is automatically applying the label enhancement to this issue, with a confidence of 0.78. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

jlewi pushed a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 5, 2020
  * create_secrets.py creates secrets needed for dev instance

* Created a kustomize package for deploying the app.

* Need to add in the ingress resources

Related to: machine-learning-apps#57 setup a dev instance
@jlewi
Collaborator Author

jlewi commented Jan 6, 2020

Created static IP resources in github-probots named

  • label-bot-dev
  • label-bot-prod

I created CNAME records corresponding to those addresses as well.

@jlewi
Collaborator Author

jlewi commented Jan 6, 2020

https://label-bot-dev.mlbot.net/ is now up.

@jlewi
Collaborator Author

jlewi commented Jan 6, 2020

For the dev instance the webhooks are failing with 405 Method Not Allowed errors.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>

Maybe I have the wrong URL for the webhook; maybe there's a missing path?
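
A quick way to check whether the path is the problem is to POST a dummy payload straight at the suspected endpoint and compare the status codes. Below is a minimal sketch, assuming the handler lives at /event_handler (the path used by the prod webhook URL later in this thread); the hostname and payload are illustrative only.

# Hypothetical smoke test for the dev webhook endpoint; the /event_handler
# path is an assumption based on the prod URL mentioned later in this issue.
import json
import requests

url = "https://label-bot-dev.mlbot.net/event_handler"
payload = {"action": "opened", "issue": {"number": 1, "title": "test", "body": "test"}}

resp = requests.post(
    url,
    data=json.dumps(payload),
    headers={
        "Content-Type": "application/json",
        "X-GitHub-Event": "issues",  # GitHub sets this header on real deliveries
    },
)
print(resp.status_code, resp.text)

A 405 on a GET, or on a POST to the wrong path, would point at a routing problem rather than a crash in the handler; note the app may still reject an unsigned payload if it validates GitHub's webhook signature.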

jlewi pushed a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 17, 2020
* machine-learning-apps#57 is tracking setting up new staging and prod environments

  * This PR sets up a new staging (or dev) environment
  * We create a kustomize manifest for deploying the front end into that
    namespace
  * The staging environment is configured to use the dev instance of the
    issue label bot backend microservice (i.e. the pubsub workers)
  * I created some Python scripts to make it easier to set up the secrets.
  * The motivation for doing this was to test the changes to the front end

* Front end now forwards all issues for the kubeflow org to the backend

  * This is needed because we want to use multiple models for all Kubeflow
    repos kubeflow/code-intelligence#70

  * The backend should also be configured with logging to measure the impact
    of the predictions.

kubeflow/code-intelligence#104 is a test issue showing that the bot is
working.

* Fix how keys are handled

  * For GOOGLE_APPLICATION_CREDENTIALS, depend on that environment variable
    being set and pointing to the file containing the private key;
    don't get the private key from an environment variable and then write it
    to a file.

* For the GitHub App private key, use an environment variable to point to
  the file containing the PEM key (a sketch of this convention follows below).

* Create a script to create the secrets.

* Flask app is running in dev namespace

  * create_secrets.py creates secrets needed for dev instance
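
The key-handling convention described in the commit above can be sketched roughly as follows; the GITHUB_APP_PEM_KEY variable name is an assumption for illustration, not necessarily the name the app actually uses.

# Rough sketch of the "env var points at a key file" convention; variable
# names other than GOOGLE_APPLICATION_CREDENTIALS are illustrative assumptions.
import os

def load_github_app_key() -> str:
    """Read the GitHub App private key from the PEM file named by an env var."""
    pem_path = os.environ["GITHUB_APP_PEM_KEY"]  # assumed variable name
    with open(pem_path, "r") as f:
        return f.read()

def check_google_credentials() -> str:
    """GOOGLE_APPLICATION_CREDENTIALS should already point at the service
    account key file; the Google client libraries pick it up automatically,
    so we only verify it rather than writing the key out ourselves."""
    cred_path = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    if not os.path.isfile(cred_path):
        raise FileNotFoundError(f"Credentials file not found: {cred_path}")
    return cred_path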
@jlewi
Collaborator Author

jlewi commented Jan 18, 2020

Everything is now deployed in prod. It looks like issues were labeled correctly when I manually sent a webhook.

Updating the app

Old URL
http://predict.mlbot.net/event_handler

New URL
https://label-bot-prod.mlbot.net/event_handler

@jlewi
Collaborator Author

jlewi commented Jan 18, 2020

Delivery of webhooks is returning 502s, but I don't see any errors in my web logs.

@jlewi
Collaborator Author

jlewi commented Jan 18, 2020

Success: an issue was labeled by the new bot; see
kubeflow/code-intelligence#108

@jlewi
Collaborator Author

jlewi commented Jan 18, 2020

We need to tear down the old namespace mlapp

@jlewi
Collaborator Author

jlewi commented May 3, 2020

Going to delete the old mlapp namespace. Here's a list of the things currently running in that namespace.

NAME                    TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
service/ml-github-app   NodePort   10.7.248.83   <none>        3000:31195/TCP   392d

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ml-github-app   3/9     9            3           392d

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/ml-github-app-549bd9d59c   0         0         0       253d
replicaset.apps/ml-github-app-59456b86f9   0         0         0       380d
replicaset.apps/ml-github-app-67fb558bd7   0         0         0       240d
replicaset.apps/ml-github-app-68c85996c7   0         0         0       358d
replicaset.apps/ml-github-app-6b8458fdf8   9         9         3       236d
replicaset.apps/ml-github-app-74d65cb87f   0         0         0       240d
replicaset.apps/ml-github-app-75994457bb   0         0         0       382d
replicaset.apps/ml-github-app-78cbfb4cb9   0         0         0       383d
replicaset.apps/ml-github-app-dc6cc8c7b    0         0         0       358d
replicaset.apps/ml-github-app-ff776479d    0         0         0       253d
replicaset.apps/ml-github-app-ffb596777    0         0         0       236d

@jlewi
Collaborator Author

jlewi commented May 3, 2020

namespace deleted.

@jlewi jlewi closed this as completed May 3, 2020
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue May 3, 2020
* As described in kubeflow#133, as people comment on
  an issue, the label bot should take these additional comments into
  account when predicting labels.

  * Hopefully these additional comments will lead to better predictions
    as they will contain valuable information.

To support this:

* get_issue should get all comments (not just the body)

* We also need to get any labels that have been explicitly removed as well as
  any labels already on the issue.
  We need this because we want to take into account multiple comments
  and not just the first one when predicting labels.

  * Since we are going to add additional labels based on additional comments
    we want to be sure not to add back labels which were explicitly removed.

* issue_label_predictor should filter out labels which have already
  been applied or any labels which have been explicitly removed.

  This is necessary to ensure we don't spam the issue when we allow
  the bot to comment not just in response to the first comment but
  additional comments.

* Likewise, we only want to apply the comment about not being able to
  label an issue once. So we need to check if the label bot has
  already commented on the issue.

* Update the readme to account for the new staging and prod environments
  for the front end as described in machine-learning-apps/Issue-Label-Bot#57
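
The filtering described in the commit message above can be sketched roughly like this; the function and argument names are illustrative, not the actual ones in issue_label_predictor.

# Rough sketch of filtering predicted labels against labels already applied
# or explicitly removed, so predictions on later comments don't spam the issue.
from typing import Dict, Iterable, Set

def filter_predictions(
    predictions: Dict[str, float],
    current_labels: Iterable[str],
    removed_labels: Iterable[str],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Drop predicted labels that are already on the issue or that a human
    explicitly removed, so re-running predictions on new comments neither
    spams the issue nor re-adds labels someone took off."""
    skip: Set[str] = set(current_labels) | set(removed_labels)
    return {
        label: confidence
        for label, confidence in predictions.items()
        if confidence >= threshold and label not in skip
    }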
k8s-ci-robot pushed a commit to kubeflow/code-intelligence that referenced this issue May 4, 2020
* LabelBot should take into account all comments on an issue.

* As described in #133, as people comment on
  an issue, the label bot should take these additional comments into
  account when predicting labels.

  * Hopefully these additional comments will lead to better predictions
    as they will contain valuable information.

To support this:

* get_issue should get all comments (not just the body)

* We also need to get any labels that have been explicitly removed as well as
  any labels already on the issue.
  We need this because we want to take into account multiple comments
  and not just the first one when predicting labels.

  * Since we are going to add additional labels based on additional comments
    we want to be sure not to add back labels which were explicitly removed.

* issue_label_predictor should filter out labels which have already
  been applied or any labels which have been explicitly removed.

  This is necessary to ensure we don't spam the issue when we allow
  the bot to comment not just in response to the first comment but
  additional comments.

* Likewise, we only want to apply the comment about not being able to
  label an issue once. So we need to check if the label bot has
  already commented on the issue.

* Update the readme to account for the new staging and prod environments
  for the front end as described in machine-learning-apps/Issue-Label-Bot#57

* Fix log messages.

* Update prod to use a newly built image.
@jlewi jlewi mentioned this issue May 28, 2020