Most orka hosted macs offline in Jenkins #2961

Closed
richardlau opened this issue Jun 13, 2022 · 20 comments

@richardlau
Member

image

Probably linked to a series of service emails/notifications we've had from MacStadium today.

I've tried restarting the Jenkins agents via ansible.nodejs.org but that doesn't seem to have brought the agents back online (although the script in AWX succeeded). I am able to ssh into the machines, so they're at least reachable.

@richardlau
Member Author

/Users/iojs/jenkins_err.log on both test-orka-macos11-x64-1 and test-orka-macos11-x64-2 is full of

The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

messages.
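To gauge how often the agent hit this error, the occurrences in the log can be counted. This is a minimal sketch only: the function name `count_java_errors` is made up for illustration, and the log path in the usage line is the one quoted above.

```shell
# Hypothetical helper: print how many lines of a log file report a missing
# Java runtime.
count_java_errors() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # grep -c prints the number of matching lines. It exits 1 when that number
  # is 0, so the status is masked to keep a zero count from looking like an error.
  grep -c 'Unable to locate a Java Runtime' "$1" || true
}
```

On an affected machine this would be run as `count_java_errors /Users/iojs/jenkins_err.log`.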

@richardlau
Member Author

I'm rerunning the Ansible playbooks against the orka machines to see if that puts Java back on them.

@mhdawson
Member

It looks like the -1, -2 and -3 machines are all trying to connect as the same machine, test-orka-macos10.14-x64-1. It's possible they need to be re-ansibled?

@richardlau
Member Author

Looks like issues running Ansible against the majority of the machines -- I don't have time to look at those tonight.

PLAY RECAP *************************************************************************************************************************************************
release-orka-macos10.15-x64-1 : ok=7    changed=0    unreachable=1    failed=0    skipped=2    rescued=0    ignored=0
test-orka-macos10.14-x64-1 : ok=10   changed=1    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
test-orka-macos10.14-x64-2 : ok=10   changed=1    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
test-orka-macos10.14-x64-3 : ok=10   changed=1    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
test-orka-macos10.15-x64-1 : ok=10   changed=1    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
test-orka-macos10.15-x64-2 : ok=10   changed=1    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
test-orka-macos11-x64-1    : ok=39   changed=12   unreachable=0    failed=0    skipped=128  rescued=0    ignored=1
test-orka-macos11-x64-2    : ok=39   changed=12   unreachable=0    failed=0    skipped=128  rescued=0    ignored=1

The two machines that Ansible did run against successfully, test-orka-macos11-x64-1 and test-orka-macos11-x64-2, now look to be online again and have picked up jobs.
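Reading a recap like the one above by eye gets error-prone as the host list grows. As a rough sketch, the hosts with a non-zero `failed` or `unreachable` counter can be pulled out with awk. The function name `recap_problem_hosts` is hypothetical, and the sketch assumes recap lines in the `host : key=value ...` layout shown above.

```shell
# Hypothetical helper: read Ansible "PLAY RECAP" output on stdin and print
# the hosts whose failed= or unreachable= counter is non-zero.
recap_problem_hosts() {
  awk '
    # Recap lines look like: "<host> : ok=7 changed=0 unreachable=1 failed=0 ..."
    $2 == ":" {
      bad = 0
      for (i = 3; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) bad = 1
      }
      if (bad) print $1
    }
  '
}
```

Piping the saved playbook output through `recap_problem_hosts` would list only the machines that still need attention.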

@mhdawson
Member

Running ansible on test-orka-macos10.14-x64-2 to see if I have any better luck.

@mhdawson
Member

I don't know if it is stuck or just taking a long time, but it's still running the "Upgrade installed packages" step.

@mhdawson
Member

Still running, so I cancelled and tried again.

@mhdawson
Member

That then seemed to run through quickly.

@mhdawson
Member

But it seems to have failed with:

TASK [package-upgrade : Update Casks] *******************************************************************************************************************************************
fatal: [test-orka-macos10.14-x64-2]: FAILED! => {"changed": false, "msg": "Error: brew cask is no longer a brew command. Use brew <command> --cask instead."}
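The error reflects Homebrew dropping the `brew cask <command>` form in favour of `brew <command> --cask`, so anything that still issues the old form fails. The sketch below only illustrates that rewrite. The helper name `modern_cask_command` is hypothetical, and it prints the command rather than running it.

```shell
# Old form, rejected by current Homebrew with the error above:
#   brew cask upgrade
# Replacement form, as the error message suggests:
#   brew upgrade --cask

# Hypothetical helper: turn an old-style "brew cask <command> [args...]"
# invocation into "brew <command> --cask [args...]" and print the result.
# Anything that is not an old-style cask invocation is printed unchanged.
modern_cask_command() {
  if [ "${1:-}" = "brew" ] && [ "${2:-}" = "cask" ] && [ -n "${3:-}" ]; then
    cmd="$3"
    shift 3
    if [ "$#" -gt 0 ]; then
      echo "brew $cmd --cask $*"
    else
      echo "brew $cmd --cask"
    fi
  else
    echo "$*"
  fi
}
```

For example, `modern_cask_command brew cask upgrade` prints `brew upgrade --cask`.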

@richardlau
Member Author

I'm unable to ssh into release-orka-macos10.15-x64-1 with the release ssh key, but I can with the test key, and it looks like the VM is test-orka-macos10.15-x64-1.

I'm trying to figure out if the NAT has broken.

@richardlau
Member Author

I think the NAT is okay -- if I touch a file on release-orka-macos10.15-x64-1 it doesn't show up on test-orka-macos10.15-x64-1. I think the VMs have reset to some snapshotted state. For example, the script to start the Jenkins agent has SECRET as the agent secret instead of an actual secret corresponding to the Jenkins agent:

$ ssh test-orka-macos10.15-x64-1
Last login: Tue Jun 14 04:54:53 2022 from 172.16.44.16
administrator@test-orka-macos10 ~ % cat /Users/iojs/start.sh
#!/bin/bash
export HOME=/Users/iojs
export NODE_TEST_DIR="$HOME/tmp"
export JOBS="4"

export OSTYPE=osx
export ARCH=x64
export DESTCPU=x64

PATH="/usr/local/opt/ccache/libexec:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin" java -Xmx128m \
    -jar /Users/iojs/slave.jar -secret SECRET \
    -jnlpUrl https://ci.nodejs.org/computer/test-orka-macos10.15-x64-1/slave-agent.jnlp
administrator@test-orka-macos10 ~ %
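A quick way to spot a VM in this reset state is to check whether its start script still carries the placeholder. This is a minimal sketch: the function name `has_placeholder_secret` is hypothetical, and it assumes the placeholder is the literal word `SECRET` following `-secret`, as in the script above.

```shell
# Hypothetical helper: exit 0 if the given agent start script still passes
# the literal placeholder "SECRET" to -secret, and non-zero otherwise.
has_placeholder_secret() {
  # Require whitespace or end of line after SECRET so that a real secret
  # that merely begins with those letters does not match.
  grep -Eq -e '-secret[[:space:]]+SECRET([[:space:]]|$)' "$1"
}
```

For example, `has_placeholder_secret /Users/iojs/start.sh && echo "needs re-ansibling"` would flag a machine that has gone back to its base image.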

I'm going to try updating community.general (ansible-galaxy collection install -U community.general) to see if that fixes the "Error: brew cask is no longer a brew command. Use brew <command> --cask instead." failure in the homebrew_cask task and allows the Ansible scripts to complete.

@AshCripps
Member

FYI, the base images in ORKA all have the hostname set as test- because I used a test image as the base. The VMs have likely been turned off and reset to their base state, so they will need ansibling again.

@richardlau
Member Author

I've been able to get the two macOS 10.15 x64 test VMs reansibled and online. The 10.15 release machine and two of the 10.14 machines continue to have issues with the package upgrade task running brew.

@richardlau
Member Author

I got the release machine back online, but I'm fairly certain that if it has been reset to its base state, we'll need to manually get the full Xcode and signing certs on there.

@richardlau
Member Author

Currently stuck because I don't appear to be getting the verification texts/calls from Apple when trying to log into my developer account (needed to download the full xcode)😞.

@mhdawson
Member

@richardlau let me know if you want me to try to access the developer account.

@richardlau
Member Author

> @richardlau let me know if you want me to try to access the developer account.

@mhdawson I think I'm going to have to take you up on that. The calls from Apple are hanging up before I have a chance to answer. It's frustrating because I know I was able to receive the calls back in January on the same number. I tried dialling my number from my personal phone and that rang properly without hanging up so I can only imagine it's some incompatibility with Apple's system and the SIP my office phone is on.

What I'm trying to get onto release-orka-macos10.15-x64-1 is a full Xcode install, as per https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#macos-release-machines. Basically, that means the .xip file for Xcode 11 (https://github.com/nodejs/node/blob/master/BUILDING.md#official-binary-platforms-and-toolchains says Command Line Tools, but on the release machines we have the full Xcode as per the manual steps). According to the nearform machine, we have Xcode 11.7 installed.

Once the .xip is on the machine I should be able to follow the rest of the steps in https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#macos-release-machines.

@richardlau
Member Author

Thanks to @mhdawson for downloading the full xcode onto the release machine. I've completed the rest of https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#macos-release-machines to unpack and install the full xcode, accept the license and copy the signing keys onto the machine.

Test build: https://ci-release.nodejs.org/job/iojs+release/8537/

@richardlau
Member Author

I needed to install Xcode Command Line Tools, as per https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#install-command-line-tools-for-xcode, on the two orka macOS 10.15 x64 test machines for the older versions of gyp in npm 6 (Node.js 14).

@richardlau
Member Author

Re-ansibled the orka macOS 10.14 x64 VMs today and they're all back up now. I think we're done 😌.
