Ansible request for testing AWX to deploy to macos #1910

Closed
sxa opened this issue Feb 8, 2021 · 58 comments

Comments

@sxa
Member

sxa commented Feb 8, 2021

Our AWX server does not currently have a template for deploying to macOS systems. We should add one and ensure that it is "safe" to deploy across all machines.

@sxa sxa added the ansible label Feb 8, 2021
@Haroon-Khel
Contributor

We don't have a separate playbook for macOS. We run the UNIX playbook, which contains tasks that only run if it detects macOS. Would a separate template be redundant?

@sxa
Member Author

sxa commented Feb 9, 2021

As long as it works, probably not, if the hosts line at the top of the top-level main.yml is OK without any tweaks.

EDIT: A quick glance at the line suggests it'll be ok so I've changed the title to indicate we should test it ;-)

@sxa sxa changed the title Ansible request for AWX to deploy to macos Ansible request for testing AWX to deploy to macos Feb 9, 2021
@Haroon-Khel
Contributor

Haroon-Khel commented Mar 10, 2021

@sxa
test-macincloud-macos1010-x64-1 and test-macincloud-macos1010-x64-2 are unreachable

"msg": "Failed to connect to the host via ssh: Connection closed by 74.80.250.173 port 22",

Not sure if this is relevant, but ansible connects as Administrator

ansible_ssh_user: Administrator

But in inventory.yml, the user is given as admin

EDIT: I've tested it with both usernames. Still unreachable.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 10, 2021

test-macstadium-macos1012-x64-1 is unreachable

fatal: [test-macstadium-macos1012-x64-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '208.83.1.46' (ECDSA) to the list of known hosts.\r\nno such identity: /var/lib/awx/.ssh/id_rsa: No such file or directory\r\nAdministrator@208.83.1.46: Permission denied (publickey,password,keyboard-interactive).", "unreachable": true}

As is test-macstadium-macos1013-x64-1

@Haroon-Khel
Contributor

Unreachable:
test-macstadium-macos1014-x64-1
test-macstadium-macos1014-x64-2
test-macstadium-macos1014-x64-3
test-macstadium-macos1015-x64-1
build-macstadium-macos1010-x64-1
build-macstadium-macos1014-x64-2
build-macstadium-macos1014-x64-1

Hung:
test-macstadium-macos11-arm64-1 (also hung when I tried to ssh in)

Ran on (but had errors):
test-macstadium-macos11-arm64-2

@sxa
Member Author

sxa commented Mar 10, 2021

Hmmm - I thought @gdams had them all configured via Bastillion to have the AWX key and others. It looks like the following ones should be in there (I believe macstadium generally uses administrator, macincloud admin which is why you're seeing that difference):

  • build-macstadium-macos1010-x64-1
  • build-macstadium-macos1014-x64-1
  • build-macstadium-macos1014-x64-2
  • test-macincloud-macos1010-x64-1
  • test-macincloud-macos1010-x64-2
  • test-macstadium-macos1012-x64-1
  • test-macstadium-macos1013-x64-1
  • test-macstadium-macos1014-x64-1
  • test-macstadium-macos1014-x64-2
  • test-macstadium-macos1014-x64-3
  • test-macstadium-macos1015-x64-1
  • test-macstadium-macos11-arm64-1
  • test-macstadium-macos11-arm64-2

@Haroon-Khel
Contributor

AWX isn't able to connect to the nine machines, test-nine-macos1015-x64-1 and test-nine-macos1015-x64-2, but I suppose that's because they're behind a firewall? Either way, it's not an unreachable error that returns, it's a "does not match any hosts" error:

[WARNING]: Could not match supplied host pattern, ignoring: test-nine-
macos1015-x64-2
ERROR! Specified hosts and/or --limit does not match any hosts

@sxa
Member Author

sxa commented Mar 10, 2021

OK Looks like they hadn't been added to any of the profiles so they didn't have anything other than the default keys. All the test ones ending in -1 ought to be ok now (excluding nine ones)

@Haroon-Khel
Contributor

The condition in the Common role

  when:
    - not macos_version | regex_search("10.12")
  tags: build_tools

Should be

  when:
    - not macos_version.stdout | regex_search("10.12")
  tags: build_tools

since macos_version is a registered variable, so the string to search is in its stdout field. The current condition gives the following error:

fatal: [test-macstadium-macos1014-x64-1]: FAILED! => {"msg": "The conditional check 'not macos_version | regex_search(\"10.12\")' failed. The error was: Unexpected templating type error occurred on ({% if not macos_version | 
regex_search(\"10.12\") %} True {% else %} False {% endif %}): expected string or bytes-like object\n\nThe error appears to be in '/tmp/awx_763_m5aop2n8/project/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/Common/tasks/MacOSX.yml': 
line 93, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Install Build Tool Packages NOT macOS 10.12\n  ^ here\n"}

@Haroon-Khel
Contributor

@sxa
test-macincloud-macos1010-x64-2 and test-macincloud-macos1010-x64-1 are still unreachable. It would be ideal if I could test the playbook on either of these, since they are not currently being used by a jenkins job, hence I could bring them temporarily offline. Preferably -2

@sxa
Member Author

sxa commented Mar 11, 2021

> test-macincloud-macos1010-x64-2 and test-macincloud-macos1010-x64-1 are still unreachable.

-1 already seems to have yours and AWX's key on it (admin user) as per the earlier change

Can you post the error you're getting for -1 (and whether it's from AWX or a connection using your own machine)?

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 11, 2021

TASK [Gathering Facts] *********************************************************
fatal: [test-macincloud-macos1010-x64-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via
 ssh: Connection closed by 74.80.250.151 port 22", "unreachable": true}

Same on -2. I tested both users, admin and administrator. I've just run this now. Bear in mind I'm using the Deploy UNIX playbook to macos template, not the Deploy UNIX playbook template. I'll test it on the latter now, despite both having the same ssh admin key credential.

@sxa
Member Author

sxa commented Mar 11, 2021

Hmmm I put the verbosity level up to maximum on the AWX job and re-ran it (I've put it back now). After being somewhat confused, it looks like it fails when you try to connect from a RHEL8/CentOS8/Fedora33 system. The AWX docker images are based on CentOS8, but I've tried it on a couple of other RHEL8 systems and get the same problem. Either try with one of our other ones, or run docker run -it centos:8 /bin/bash, then dnf install openssh-clients, then ssh -vvvv 74.80.250.151, and you'll see it kicks you out without even attempting to prompt for a password or anything else - that's what we're seeing ...

@sxa
Member Author

sxa commented Mar 12, 2021

As far as I can tell this only affects the older macos 10.10 systems - the later ones seem to not suffer in the same way (https://awx.adoptopenjdk.net/#/jobs/playbook/797?job_search=page_size:20;order_by:-finished;not__launch_type:sync is a run on test-macstadium-macos1012-x64-1) although it doesn't run to completion.

@sxa
Member Author

sxa commented Mar 12, 2021

OK the problem was that the CentOS8 ssh client by default does not allow the aes128-ctr cipher. Can be verified by using ssh -c aes128-ctr. I've now enabled it in the ssh_config of the awx_task container and it now seems to work on the 10.10 boxes
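
For anyone hitting the same thing, a minimal sketch of that kind of client-side override follows. The helper name, the temporary file, and the host pattern are assumptions for illustration; the actual change was made in the ssh_config of the awx_task container and its exact contents are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: append a per-host cipher override to an ssh_config-style file.
# The file path and host pattern are illustrative, not the values used on the AWX server.
add_cipher_override() {
    config_file="$1"
    host_pattern="$2"
    cipher="$3"
    # "Ciphers +name" appends to the client's default cipher list instead of replacing it.
    printf 'Host %s\n    Ciphers +%s\n' "$host_pattern" "$cipher" >> "$config_file"
}

# Write the stanza to a scratch file so it can be inspected before touching a real config.
demo_config="$(mktemp)"
add_cipher_override "$demo_config" "test-macincloud-macos1010-x64-*" "aes128-ctr"
cat "$demo_config"
```

To check a client before and after, `ssh -Q cipher` lists the ciphers the binary supports and `ssh -G <host>` prints the effective configuration, including the ciphers it will offer. The Host pattern is matched against the name given to ssh, so a job that connects by IP address would need the pattern adjusted.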

@sxa
Member Author

sxa commented Mar 12, 2021

At least for now the macos11 aarch64 machines seem to be stalling - I would possibly ignore them for now as we only really want the x64 ones

@Haroon-Khel
Contributor

I see that test-macincloud-macos1010-x64-1 is no longer unreachable, so I've taken it down in jenkins to run the playbook on. -2 is still unreachable. I'm using the unix playbook template, so this should not disrupt the jobs that you have pending/running.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 15, 2021

While upgrading all packages

TASK [Common : Upgrade installed packages] *************************************
fatal: [test-macincloud-macos1010-x64-1]: FAILED! => {"changed": false, "msg": "Error: /usr/local/Cellar is not writable. You should change the\nownership and permissions of /usr/local/Cellar back to your\nuser account:\n  sudo chown -R $(whoami) /usr/local/Cellar\nWarning: You are using macOS 10.10.\n
We (and Apple) do not provide support for this old version.\n
You will encounter build failures with some formulae.\n
Please create pull requests instead of asking for help on Homebrew's GitHub,\nDiscourse, Twitter or IRC. You are responsible for resolving any issues you\nexperience while you are running this old version.\n\nError: The following directories are not writable by your user:\n
/usr/local/Cellar\n
/usr/local/Homebrew\n
/usr/local/include\n
/usr/local/lib/pkgconfig\n
/usr/local/share/aclocal\n
/usr/local/share/info\n
/usr/local/share/zsh\n/usr/local/share/zsh/site-functions\n
/usr/local/var/homebrew/linked\n
/usr/local/var/homebrew/locks\n
\nYou should change the ownership of these directories to your user.\n
  sudo chown -R $(whoami) /usr/local/Cellar /usr/local/Homebrew /usr/local/include /usr/local/lib/pkgconfig /usr/local/share/aclocal /usr/local/share/info /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var/homebrew/linked /usr/local/var/homebrew/locks\n
\nAnd make sure that your user has write permission.\n
  chmod u+w /usr/local/Cellar /usr/local/Homebrew /usr/local/include /usr/local/lib/pkgconfig /usr/local/share/aclocal /usr/local/share/info /usr/local/share/zsh /usr/local/share/zsh/site-functions /usr/local/var/homebrew/linked /usr/local/var/homebrew/locks"}

This machine (test-macincloud-macos1010-x64-1) uses the admin user, instead of the Administrator user used by other mac machines. I think this error is caused by the fact that the directories listed aren't owned by the admin user and belong to the admin group, which the admin user isn't a part of.
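
A quick way to confirm which directories are affected on a given machine is sketched below. The function name is hypothetical, and the directory list is just a subset of the ones in the Homebrew error above.

```shell
#!/bin/sh
# Print each given path that the current user cannot write to.
# Paths that do not exist are reported as well, since [ -w ] is false for them.
list_unwritable() {
    for path in "$@"; do
        if [ ! -w "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# A few of the directories named in the Homebrew error.
list_unwritable /usr/local/Cellar /usr/local/Homebrew /usr/local/var/homebrew/locks
```

Running it as the user Ansible connects with, alongside `id -Gn` to list that user's groups and `ls -ld` on the reported paths, should show whether the problem is ownership, group membership, or both.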

@sxa
Member Author

sxa commented Mar 15, 2021

My gut feel is that this is a reasonable thing to do. @gdams do you know if there's any reason why the macincloud systems are set up with an admin user that doesn't have these privileges, or any other reason why we should add it to the admin group?

@sxa
Member Author

sxa commented Mar 15, 2021

@gdams Why is e.g. /usr/local/Cellar on test-macincloud-macos1010-x64-1 full of files owned by the zeus user - did you install as that user instead of admin?

@Haroon-Khel
Contributor

Various errors found. I've documented them here #2042

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 18, 2021

The ansible version on awx

ansible-playbook 2.9.11
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/var/lib/awx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 3.6.8 (default, Apr 16 2020, 01:36:27) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]

@sxa Is it possible to get this updated to the latest version? (2.10+)

It will solve issues relating to the homebrew_cask module, but will hit errors relating to resolving the {{ ansible_user }} variable, but this can be fixed

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 4, 2022

> We've got two new machines via https://github.com/adoptium/infrastructure/pull/2494/files and have put a minimal installation on it (ant+contrib+/usr/local/bin symlink, jenkins user, boot JDK, Xcode license accept, and adding the hostname to /etc/hosts) but we should look at running a full playbook on at least one of these (based on an earlier comment it sounded like we hadn't used the playbooks from scratch yet)

I'll get started on deploying the playbook to one of these machines
https://awx.adoptopenjdk.net/#/jobs/playbook/1285

The locale issue (I think) is documented in a separate infra issue, I'll see if I can dig it out

Was this one resolved?

I must have been mistaken when I thought it was a separate infra issue since I cannot seem to find it. I'll rerun the jdk_util tests to see if the problem persists
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3795/console

@Haroon-Khel
Contributor

Anything under 10.13 is too old for the Upgrade installed packages task in the Common role. A simple condition to skip this task on those OS's seems like a good temp fix.

@sxa We still have test-macstadium-macos1012-x64-1 and test-macstadium-macos1013-x64-1 whose packages brew will not upgrade. Do we have any plans to decommission those and replace them with newer machines?
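
The skip condition suggested above needs a numeric comparison rather than a plain string match, since 10.9 sorts after 10.13 as text. A minimal shell sketch of that comparison follows; the function name is hypothetical, and in the playbook this would be expressed as an Ansible `when:` condition rather than shell.

```shell
#!/bin/sh
# Return 0 (true) when version $1 is older than version $2.
# Only the major and minor components are compared, as integers.
version_lt() {
    left_major=${1%%.*}
    right_major=${2%%.*}
    # A version with no dot (e.g. "11") is treated as minor version 0.
    case "$1" in
        *.*) left_rest=${1#*.}; left_minor=${left_rest%%.*} ;;
        *)   left_minor=0 ;;
    esac
    case "$2" in
        *.*) right_rest=${2#*.}; right_minor=${right_rest%%.*} ;;
        *)   right_minor=0 ;;
    esac
    if [ "$left_major" -ne "$right_major" ]; then
        [ "$left_major" -lt "$right_major" ]
        return
    fi
    [ "$left_minor" -lt "$right_minor" ]
}

# Decide per version whether the brew upgrade step would be skipped.
for v in 10.10 10.12 10.13 10.14.6 11.2; do
    if version_lt "$v" 10.13; then
        echo "$v: skip brew upgrade"
    else
        echo "$v: run brew upgrade"
    fi
done
```

On a real machine the first argument would come from `sw_vers -productVersion`.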

@sxa
Member Author

sxa commented Mar 4, 2022

Hmmm we have nothing planned at the moment, however since we only list 10.14 and later as supported on https://adoptium.net/supported_platforms.html that would definitely be an option. Is the error the same as the one you had on 10.10? If so we should definitely consider it as part of #2496

@sxa
Member Author

sxa commented Mar 4, 2022

Also, could you download and try and run JDK11 and 17 on 10.13 and check if they work at all? (Extract tarball, run java -version from it) I know 17 doesn't work on 10.10.

(EDIT: 17 is ok with 10.13 as https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.functional_x86-64_mac/123/console ran on it, assuming the name matches the real level on that machine - sounds like it does if you're getting that error from the playbooks)

@Haroon-Khel
Contributor

Both JDKs work on macos10.13.

test-macstadium-macos1013-x64-1:~ administrator$ ./jdk-11.0.14.1+1/Contents/Home/bin/java -version
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
test-macstadium-macos1013-x64-1:~ administrator$ ./jdk-17.0.2+8/Contents/Home/bin/java -version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8)
OpenJDK 64-Bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)

@Haroon-Khel
Contributor

> We've got two new machines via https://github.com/adoptium/infrastructure/pull/2494/files and have put a minimal installation on it (ant+contrib+/usr/local/bin symlink, jenkins user, boot JDK, Xcode license accept, and adding the hostname to /etc/hosts) but we should look at running a full playbook on at least one of these (based on an earlier comment it sounded like we hadn't used the playbooks from scratch yet)

@sxa Any reservations on having the second machine set up to completion?

@sxa
Member Author

sxa commented Mar 7, 2022

> @sxa Any reservations on having the second machine setup to completion?

While we have a "clean" machine I'd recommend trying to run a build pipeline on the one you've set up (say, JDK17 or 19) and making sure that the one that has had the playbook run on it does build properly first. That way if there is any remedial action required we can do that from scratch on the second one.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 7, 2022

One thing missing from the awx deployment of the playbook is the variable Apple_ID_User, here

TASK [Xcode : Install Xcode on Intel Macs] *************************************
fatal: [test-macincloud-macos1201-x64-1]: FAILED! => {"msg": "The field 'environment' has an invalid value, which includes an undefined variable. The error was: 'Apple_ID_User' is undefined\n\nThe error appears to be in '/tmp/awx_1285_7hdbqt40/project/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/Xcode/tasks/main.yml': line 69, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# with older macOS won't have Xcode installed.\n- name: Install Xcode on Intel Macs\n  ^ here\n"}

In the Xcode role, other variables (which awx will likewise not be able to find) are "{{ Apple_ID_Password }}" and "{{ FASTLANE_SESSION }}"

Awx is able to pull the vendor files/secrets from the secrets repo, which is where I assume these variables are. So I am not sure why it is not able to find these variables. @gdams Any ideas?

@Haroon-Khel
Contributor

> We've got two new machines via https://github.com/adoptium/infrastructure/pull/2494/files and have put a minimal installation on it (ant+contrib+/usr/local/bin symlink, jenkins user, boot JDK, Xcode license accept, and adding the hostname to /etc/hosts) but we should look at running a full playbook on at least one of these (based on an earlier comment it sounded like we hadn't used the playbooks from scratch yet)

@sxa I see that you've manually installed xcode on these machines already (which suggests that the xcode role is not idempotent?). I've kicked off a jdk17 build here https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-mac-x64-hotspot/74/console

@sxa
Member Author

sxa commented Mar 7, 2022

> @sxa I see that you've manually installed xcode on these machines

I did not install Xcode - it had been preinstalled on the systems. I did do a license accept for it that was a prereq to it being used.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 7, 2022

Build seems to be failing

13:03:49  [SUCCESS] Executing local file at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-x64-hotspot/build-farm/platform-specific-configurations/mac.sh
13:03:49  [WARNING] You may be asked for your su user password, attempting to switch Xcode version to /Applications/Xcode.app
13:03:49  sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
13:03:49  sudo: a password is required

Fails when executing this
https://github.com/adoptium/temurin-build/blob/c4842313fe56dacd4986e35d34d132db979bf8df/build-farm/platform-specific-configurations/mac.sh#L68

Possibly related to the fact that the xcode role did not fully run?

> I did not install Xcode - it had been preinstalled on the systems.

Even though it is preinstalled. Investigating.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 7, 2022

Did some testing in an ssh environment on test-macincloud-macos1201-x64-1 and build-macstadium-macos1014-x64-2. Running sudo xcode-select --switch "/Applications/Xcode.app" as the jenkins user does not require a password on the build machine, but it does on test-macincloud-macos1201-x64-1.

@Haroon-Khel
Contributor

@sxa
Member Author

sxa commented Mar 8, 2022

> sudo xcode-select --switch "/Applications/Xcode.app"

While I'm somewhat curious as to why we use a different version for 11 than all of the other releases, if we determine that this is required and there's no other way to do it, then we should add a suitably restricted entry (i.e. only allow xcode-select --switch) into the /etc/sudoers setup for the jenkins user.
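
A minimal sketch of what such a restricted entry could look like, generated into a scratch file so it can be checked before it goes anywhere near the real sudoers setup. The helper name and the scratch file are assumptions; the user and Xcode path are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a sudoers fragment that lets one user run only
# "xcode-select --switch <path>" as root without a password.
write_xcode_select_rule() {
    fragment_file="$1"
    user_name="$2"
    xcode_path="$3"
    # Listing the arguments after the command restricts the rule to exactly those arguments.
    printf '%s ALL=(root) NOPASSWD: /usr/bin/xcode-select --switch %s\n' \
        "$user_name" "$xcode_path" > "$fragment_file"
}

fragment="$(mktemp)"
write_xcode_select_rule "$fragment" jenkins /Applications/Xcode.app
cat "$fragment"
```

The fragment can be syntax-checked with `visudo -cf "$fragment"` before being merged, and `sudo -n xcode-select --switch /Applications/Xcode.app` run as the jenkins user fails immediately instead of prompting if the rule is not in effect. If the build scripts switch between several Xcode installs, each path would need its own line.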

@Haroon-Khel
Contributor

Since https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-mac-x64-hotspot/76/console passed, I will begin setting up the second machine 216.39.74.140. Any changes to its xcode-select config can be made after setup

@Haroon-Khel
Contributor

Running sanity tests on test-macincloud-macos1201-x64-1 and -2

@Haroon-Khel
Contributor

Haroon-Khel commented Apr 14, 2022

Since the move to awx2, I will close this issue once we get a MacOS specific deployment job. It currently uses the Unix job

@sxa
Member Author

sxa commented May 10, 2022

@Haroon-Khel Presumably you can create the job yourself since you have admin access to the server?

@sxa
Member Author

sxa commented May 30, 2022

@Haroon-Khel Are you able to progress this?

@Haroon-Khel
Contributor

I've created a MacOS template on awx2 and have tested it on one of our macos machines. It runs successfully. This issue can now be closed


3 participants