[SSO/Spot] Fix spot controller failing to launch spot cluster when using SSO #1817
A user reported that, when using an AWS SSO account, sky spot launch fails because the controller does not have any cloud access enabled. This is a bug in our current SSO implementation: the controller is assigned an assumed IAM role, which has permission to create the spot cluster but comes with no static credential files, and our cloud access check fails when it finds no credential files.

Another problem with the spot controller is that it keeps retrying even after the "all clouds disabled" exception is raised, which leaves the spot job in STARTING status forever.
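The two problems described above can be illustrated with a minimal sketch. This is not the code from this PR: `NoCloudAccessError`, `has_aws_access`, `launch_with_retry`, and the `has_assumed_role` probe are hypothetical names chosen for illustration, and the retry count is bounded here only to keep the example small. It assumes the access check should accept either static credential files or an assumed IAM role, and that an "all clouds disabled" error should abort the retry loop instead of being retried.

```python
import os
from typing import Callable, Optional


class NoCloudAccessError(Exception):
    """Hypothetical error: no cloud is enabled, so retrying cannot help."""


def has_aws_access(credentials_path: str,
                   has_assumed_role: Callable[[], bool]) -> bool:
    """Return True if static credentials exist OR an IAM role is assumed.

    `has_assumed_role` is a hypothetical probe supplied by the caller; how
    the real check detects an assumed role is not shown in this sketch.
    """
    if os.path.isfile(os.path.expanduser(credentials_path)):
        return True
    # No static credential file: fall back to the assumed-role probe
    # instead of failing the access check outright.
    return has_assumed_role()


def launch_with_retry(launch: Callable[[], str], max_attempts: int = 3) -> str:
    """Retry transient launch failures, but stop on a non-recoverable one."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return launch()
        except NoCloudAccessError:
            # Retrying cannot enable a cloud, so surface the error at once
            # rather than leaving the job waiting indefinitely.
            raise
        except RuntimeError as e:
            # Treated as transient in this sketch (e.g. no capacity).
            last_error = e
    raise RuntimeError(
        f'launch failed after {max_attempts} attempts') from last_error
```

In this sketch the non-recoverable case propagates to the caller on the first attempt, so the caller can mark the job as failed instead of leaving it in a starting state.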
To reproduce:
sky spot launch -n test echo hi --retry-until-up
This PR fixes:
Tested (run the relevant ones):