This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Will the new version of the code affect the execution logic of the old version? #5465

Closed
siaimes opened this issue May 1, 2021 · 17 comments

Comments

@siaimes
Contributor

siaimes commented May 1, 2021

Organization Name:

Short summary about the issue/question: I encountered a version problem. The OpenPAI I deployed is v1.6.0, but on the k8s dashboard I see it pulling the cert-expiration-checker image, which is a feature introduced in v1.7.0.
image

Brief what process you are following:

How to reproduce it:

OpenPAI Environment: Ubuntu 16.04

  • OpenPAI version: v1.6.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:

@Binyang2014
Contributor

Please make sure your code is checked out at the v1.6.0 tag. It seems you are using the latest code.

@siaimes
Contributor Author

siaimes commented May 2, 2021

image

@siaimes
Contributor Author

siaimes commented May 4, 2021

csip@csip-dev-box:~$ sudo docker run -itd \
        -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v ${HOME}/pai-deploy/cluster-cfg:/cluster-configuration \
        -v ${HOME}/pai-deploy/kube:/root/.kube \
        --pid=host \
        --privileged=true \
        --net=host \
        --name=dev-box \
        openpai/dev-box:v1.6.0
915741461e98630ddadb9a29c6061778f18059906c0bc484ba83a2c68249ac9c
csip@csip-dev-box:~$ sudo docker exec -it dev-box bash
root@csip-dev-box:/# ls
bin  boot  cluster-configuration  dev  etc  home  lib  lib64  media  mnt  opt  pai  proc  root  run  sbin  srv  sys  tmp  usr  var
root@csip-dev-box:/# cd pai
root@csip-dev-box:/pai# git status
HEAD detached at v1.7.0
nothing to commit, working directory clean
root@csip-dev-box:/pai# 

I started a v1.6.0 dev-box, but found that the pai repo inside it was checked out at v1.7.0.
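
The manual check in the transcript above can be scripted. The following is a minimal sketch, assuming git is available where the repo lives; `checkout_matches_tag` is a hypothetical helper name and is not part of OpenPAI.

```shell
# Sketch only: report whether a repo is checked out at the expected tag.
# checkout_matches_tag is a hypothetical helper, not part of OpenPAI.
checkout_matches_tag() {
    local repo_dir="$1" expected="$2" actual
    # Use the exact tag at HEAD if there is one; otherwise fall back to
    # the current branch name (for example "master").
    actual="$(git -C "$repo_dir" describe --tags --exact-match 2>/dev/null)" \
        || actual="$(git -C "$repo_dir" rev-parse --abbrev-ref HEAD 2>/dev/null)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $repo_dir is at $expected"
    else
        echo "MISMATCH: $repo_dir is at ${actual:-unknown}, expected $expected"
        return 1
    fi
}
```

Inside the dev-box from the transcript, `checkout_matches_tag /pai v1.6.0` would be expected to report a mismatch, since `git status` there shows HEAD detached at v1.7.0.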

@siaimes
Contributor Author

siaimes commented May 4, 2021

Manually checking out v1.6.0 solved the problem. This issue is also caused by the version problem.

@Binyang2014
Contributor

@suiguoxin Could you please take a look?

@siaimes
Contributor Author

siaimes commented May 6, 2021

The Dockerfile that builds the images does not seem to be open source, so I cannot do any further analysis.

@Binyang2014
Contributor

The code is here: https://github.com/microsoft/pai/blob/master/src/dev-box/build/dev-box.common.dockerfile. Further analysis is welcome.

@siaimes
Contributor Author

siaimes commented May 6, 2021

Following the dev-box start script, I ran some checks:

csip@csip-dev-box:~$ sudo docker run -itd \
>         -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
>         -v /var/run/docker.sock:/var/run/docker.sock \
>         -v ${HOME}/pai-deploy/cluster-cfg:/cluster-configuration  \
>         -v ${HOME}/pai-deploy/kube:/root/.kube \
>         --pid=host \
>         --privileged=true \
>         --net=host \
>         --name=dev-box_1 \
>         openpai/dev-box:v1.6.0
bc633ff4766ad05cdce4bd658ba1b15ddc4988ae143b6de6ac908a6f67674852
csip@csip-dev-box:~$ sudo docker logs dev-box_1
Checkout to latest release tag
csip@csip-dev-box:~$ sudo docker exec -it dev-box_1 bash
root@csip-dev-box:/# echo $OPENPAI_BRANCH_NAME

root@csip-dev-box:/# cd pai
root@csip-dev-box:/pai# git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

So the environment variable OPENPAI_BRANCH_NAME is not set correctly (it is empty).

@suiguoxin
Member

@siaimes To install OpenPAI v1.6, please:

  • check out your local pai repo to the pai-1.6.y branch
  • set docker_image_tag: v1.6.0 in config.yaml

For detailed documentation, please refer to here.

We do download the pai repo in the dev-box and reset the code to the latest tag, but we don't use that copy.
Instead, we mount the local pai repo into the dev-box and use the mounted copy.

We made this change in v1.6 so that only the local source code matters and the admin user doesn't need to specify OPENPAI_BRANCH_NAME anymore.

I'll close this issue. Feel free to re-open it if you have further questions.
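
The second step in the list above (pinning docker_image_tag) might be scripted as follows. This is a minimal sketch: the key name and the tag value come from the comment above, while the helper name `set_image_tag` and the use of sed to edit the file are assumptions for illustration. The location of config.yaml is passed in as an argument because it is not stated in this thread.

```shell
# Sketch only: rewrite the docker_image_tag line in a config.yaml file.
# set_image_tag is a hypothetical helper, not part of OpenPAI.
# Assumes the key sits at the start of a line and that the tag contains
# no characters special to sed (plain tags such as v1.6.0 are fine).
set_image_tag() {
    local config_file="$1" tag="$2" tmp rc
    if ! grep -q '^docker_image_tag:' "$config_file"; then
        echo "docker_image_tag not found in $config_file" >&2
        return 1
    fi
    tmp="$(mktemp)" || return 1
    # Write to a temp file first, then copy back, so the original file
    # keeps its permissions and is only touched if sed succeeds.
    sed "s|^docker_image_tag:.*|docker_image_tag: ${tag}|" "$config_file" > "$tmp" \
        && cat "$tmp" > "$config_file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

With a local checkout, the first step would be `git checkout pai-1.6.y` inside the repo, followed by `set_image_tag <path to config.yaml> v1.6.0`.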

@siaimes
Contributor Author

siaimes commented May 8, 2021

sudo docker run -itd \
        -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
        -v /var/run/docker.sock:/var/run/docker.sock \
        --pid=host \
        --privileged=true \
        --net=host \
        --name=dev-box \
        openpai/dev-box:<openpai version tag>

So the dev-box start script needs to be updated, right?

@siaimes
Contributor Author

siaimes commented May 8, 2021

sudo docker run -itd \
        -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
        -v /var/run/docker.sock:/var/run/docker.sock \
        --pid=host \
        --privileged=true \
        --net=host \
        --name=dev-box \
        openpai/dev-box:<openpai version tag>

So the dev-box start script needs to be updated, right?

And this statement:
image

@suiguoxin
Member

@siaimes Do you mean basic management operations? Users are free to mount anything they want. The existing pai repo should also work.

@siaimes
Contributor Author

siaimes commented May 8, 2021

Judging by the latest version of the source code, this doc seems to need updating. I have only just figured out how to do it: mount the local pai repo at /mnt/pai, and change the relevant commands in the dev-box to use /mnt/pai/paictl.py.

Thank you for your reply.
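
The workaround described in this comment amounts to adding one bind mount to the documented start command. The sketch below only prints such a command so it can be reviewed before running. The flags are the ones shown earlier in this thread and /mnt/pai is the mount point named in the comment; the helper name `print_devbox_cmd` and the example local path are assumptions.

```shell
# Sketch only: print (not run) a dev-box start command that also mounts a
# local pai checkout at /mnt/pai. print_devbox_cmd is a hypothetical helper.
print_devbox_cmd() {
    local tag="$1" pai_dir="$2"
    # Single-quoted lines are emitted verbatim, so $COLUMNS and friends are
    # expanded later, by the shell that actually runs the printed command.
    printf '%s\n' \
        'sudo docker run -itd \' \
        '        -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \' \
        '        -v /var/run/docker.sock:/var/run/docker.sock \' \
        "        -v ${pai_dir}:/mnt/pai \\" \
        '        --pid=host \' \
        '        --privileged=true \' \
        '        --net=host \' \
        '        --name=dev-box \' \
        "        openpai/dev-box:${tag}"
}
```

For example, `print_devbox_cmd v1.6.0 "$HOME/pai"` prints a command that mounts `$HOME/pai` (a placeholder path) into the container. Management commands inside the dev-box would then be run from the mounted copy, for instance through /mnt/pai/paictl.py.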

@Binyang2014
Contributor

Reopening this. It seems we need to update the doc.

@siaimes
Contributor Author

siaimes commented Jun 7, 2021

#4675

I found that this issue was introduced in v1.0.0, but it did not get the developers' attention at that time.

@siaimes
Contributor Author

siaimes commented Aug 10, 2021

PR #5567 conflicts with PR #5240, so I will close it. A better solution needs further discussion.

@hlyf-xs

hlyf-xs commented Sep 21, 2021

I also want to deploy the dev-box inside an Ubuntu Docker container, but I can't install Docker successfully. It shows "docker: Cannot connect to Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" Do you have any suggestions?

@siaimes siaimes closed this as completed Jun 12, 2022