Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AZP/AWS: EFA tests #10494

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Alexey-Rivkin
Copy link
Contributor

@Alexey-Rivkin Alexey-Rivkin commented Feb 16, 2025

What?

Add EFA tests in AWS.

Why?

Run tests on the real EFA environment.

How?

Use AWS batch to start a predefined AWS job.

Note

The scope of tests to run in AWS is TBD. Until then, the job will issue a warning instead of failing.

@Alexey-Rivkin Alexey-Rivkin added the WIP-DNM Work in progress / Do not review label Feb 16, 2025
@Alexey-Rivkin Alexey-Rivkin force-pushed the topic/efa_aws_tests branch 29 times, most recently from 12da037 to 5a9eac6 Compare February 19, 2025 17:29
@Alexey-Rivkin Alexey-Rivkin force-pushed the topic/efa_aws_tests branch 19 times, most recently from e3159d3 to 9fa1f06 Compare February 26, 2025 11:45
@Alexey-Rivkin Alexey-Rivkin removed the WIP-DNM Work in progress / Do not review label Feb 26, 2025
@MrBr-github
Copy link

test

RUN_TESTS: yes
TEST_PERF: 0
VALGRIND_CHECK: no
CMD: "yum groupinstall -y 'Development Tools' && yum install -y fuse-devel iproute hostname strace git wget environment-modules autoconf libtool python3 python3-pip pkg-config libnl3-devel curl valgrind valgrind-devel rdma-core-devel libibverbs libibverbs-utils librdmacm librdmacm-utils && git clone https://github.com/openucx/ucx.git && cd ucx && git fetch origin pull/${PR_ID}/head:pr && git checkout pr && ./contrib/test_jenkins.sh"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you always installing these tools? Why not create single image with all of preinstalled prerequisites?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We evaluated creating a prebuilt image but opted against it.
Installation takes under 20 seconds, and maintaining an additional image adds overhead in management and documentation.

RUN_TESTS: yes
TEST_PERF: 0
VALGRIND_CHECK: no
CMD: "yum groupinstall -y 'Development Tools' && yum install -y fuse-devel iproute hostname strace git wget environment-modules autoconf libtool python3 python3-pip pkg-config libnl3-devel curl valgrind valgrind-devel rdma-core-devel libibverbs libibverbs-utils librdmacm librdmacm-utils && git clone https://github.com/openucx/ucx.git && cd ucx && git fetch origin pull/${PR_ID}/head:pr && git checkout pr && ./contrib/test_jenkins.sh"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git clone https://github.com/openucx/ucx.git
why not shallow copy? this will speed up the checkout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks!
AZP/AWS: Shallow clone

clean: true
- bash: |
set -x
aws eks update-kubeconfig --name ucx-ci

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need it here? The conf should be already in place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate job spins another container, so the config is empty.

RUN_TESTS: yes
TEST_PERF: 0
VALGRIND_CHECK: no
CMD: "yum groupinstall -y 'Development Tools' && yum install -y fuse-devel iproute hostname strace git wget environment-modules autoconf libtool python3 python3-pip pkg-config libnl3-devel curl valgrind valgrind-devel rdma-core-devel libibverbs libibverbs-utils librdmacm librdmacm-utils && git clone https://github.com/openucx/ucx.git && cd ucx && git fetch origin pull/${PR_ID}/head:pr && git checkout pr && ./contrib/test_jenkins.sh"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's worth moving the contents of CMD command to dedicated script
It'll be much more readable and if there will be changes one will be able to see diffs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any script would be part of the codebase, but the container starts as a blank slate.
It can only run code after installing packages and cloning the repo - which is most of this CMD.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO moving CMD in to efa_aws_test.sh make more sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Alexey-Rivkin Alexey-Rivkin added the WIP-DNM Work in progress / Do not review label Mar 5, 2025
@Alexey-Rivkin Alexey-Rivkin added Ready for Review and removed WIP-DNM Work in progress / Do not review labels Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants