Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI/CD setup #1137

Merged
merged 42 commits into from
Jun 22, 2022
Merged

CI/CD setup #1137

merged 42 commits into from
Jun 22, 2022

Conversation

frank-wei
Copy link
Contributor

Description

This is PR is not be ready for merge yet. There are some points we need to improve together with NV.

This PR is for setting up the CI/CD. Pytorch collaborates with circle CI to provide CI/CD. I am new to circle CI so I spent some efforts to read their doc and come up with our 1st version. Since we need to use GPU machine, we actually can only choose their machine executor with choice of 3 types of card P4,T4,V100
Their available GPU images come with cuda driver and their supporting libs installed. We can not use our own docker image as it is not supported (error msg below).

Build-agent version 1.0.129077-32fb56c8 (2022-06-17T08:48:08+0000)
Creating a dedicated VM with nvcr.io/nvidia/tensorrt:22.01-py3 image
failed to create host: Image nvcr.io/nvidia/tensorrt:22.01-py3 is not supported

So based on the vanilla cuda environment, here are the steps that I set up in config.yaml (recipe to set up the CI/CD)

  1. install cudnn, tensorRT lib
  2. install python required libs and tensorRT python lib
  3. python setup.py install
  4. run the fx2trt tests ( passed )
  5. tried to run torch_tensorrt tests (failed and need help from @narendasan. Below is the command I used in ssh mode. The circle CI supports ssh model to debug on their machine)
# modules
circleci@default---us-central1-a---005145ed-e543-4a5c-bb30-119e0ba80409:~/project$ bazel test //tests/modules:test_modules --compilation_mode=dbg --test_output=errors
ERROR: Skipping '//tests/modules:test_modules': no such target '//tests/modules:test_modules': target 'test_modules' not declared in package 'tests/modules' defined by /home/circleci/project/tests/modules/BUILD
ERROR: no such target '//tests/modules:test_modules': target 'test_modules' not declared in package 'tests/modules' defined by /home/circleci/project/tests/modules/BUILD
 

# core
circleci@default---us-central1-a---005145ed-e543-4a5c-bb30-119e0ba80409:~/project$ bazel test //tests/core:core_tests  --compilation_mode=dbg --test_output=errors --jobs=1
ERROR: /home/circleci/project/tests/core/BUILD:8:10: no such target '//tests/modules:mobilenet_v2_scripted.jit.pt': target 'mobilenet_v2_scripted.jit.pt' not declared in package 'tests/modules' defined by /home/circleci/project/tests/modules/BUILD and referenced by '//tests/core:jit_models'
ERROR: Analysis of target '//tests/core:test_detecting_input_type' failed; build aborted: 
 
# py
circleci@default---us-central1-a---005145ed-e543-4a5c-bb30-119e0ba80409:~/project$ bazel test //tests/py:test_api  --compilation_mode=dbg --test_output=errors --jobs=1
ERROR: error loading package 'tests/py': Unable to find package for @py_test_deps//:requirements.bzl: The repository '@py_test_deps' could not be resolved: Repository '@py_test_deps' is not defined.

Overall, circle CI is good to use and can satisfy our needs for general CI/CD.
Pros:

  1. meets general needs
  2. speed is ok

Cons:

  1. 3 types of machines P4,T4,V100
  2. GPU images are limited
  3. trigger mode supports new PR trigger and fixed time trigger only. Not flexible for certain folder.

I want to discuss with NV folks in general, if we could deploy circle CI for our CI/CD. What else you think we can possibly improve?

@yinghai
Copy link

yinghai commented Jun 21, 2022

@narendasan For documentation changes may add [skip ci] in PR title and it should just work?

https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/

@narendasan
Copy link
Collaborator

Hmm, ok, we still need some jobs to run to build the docs but we dont need everything to run i guess.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

@frank-wei
Copy link
Contributor Author

@narendasan I addressed the comments before. If you are Ok with it, could we land it first and could you help to add the torchscript tests later? We can iterate based on this version as a beginning for our future test needs.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor thing about the name of the command other than that I am good to merge this and improve incrementally

py/setup.py Outdated

def get_git_revision_short_hash() -> str:
return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()

if "--fx2trt-only" in sys.argv:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should call this --fx-only or --fx-backend-only

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop references to fx2trt for the most part

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly. fx-only is a much better name here.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

@frank-wei frank-wei merged commit 30cd67c into master Jun 22, 2022
@frank-wei frank-wei deleted the circleci-editor/245/circleci-project-setup branch July 27, 2022 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants