[Spot] Make get_job_timestamp fetching more robust #1148
Merged
Previously, `start_at` for each job defaulted to NULL. When skylet sets a job to FAILED, the `get_job_timestamp` call below fails because `None` cannot be converted to a float. `start_at` now defaults to -1, so a value is always available.
skypilot/sky/spot/recovery_strategy.py
Line 159 in b36bbcf
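A minimal sketch of the failure mode and the fix, assuming a SQLite-backed job table. The schema, the `jobs` table name, and the `fetch_start_at` helper are hypothetical and only illustrate the effect of the column default; they are not the actual skylet code.

```python
import sqlite3


def fetch_start_at(default_sql: str) -> float:
    """Insert a job without a start time and read start_at back as a float.

    `default_sql` is the SQL literal used as the column default
    (hypothetical schema; only the default matters for this sketch).
    """
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE jobs ('
        'job_id INTEGER PRIMARY KEY, '
        'status TEXT, '
        f'start_at FLOAT DEFAULT {default_sql})')
    # The job is marked FAILED without start_at ever being written.
    conn.execute("INSERT INTO jobs (status) VALUES ('FAILED')")
    (start_at,) = conn.execute('SELECT start_at FROM jobs').fetchone()
    conn.close()
    # float(None) raises TypeError; float(-1.0) succeeds.
    return float(start_at)


if __name__ == '__main__':
    print(fetch_start_at('-1'))  # sentinel default: conversion succeeds
    try:
        fetch_start_at('NULL')   # old default: conversion fails
    except TypeError as e:
        print(f'TypeError: {e}')
```

With a `-1` default the caller always gets a float back, at the cost of having to treat negative values as "not started" wherever the timestamp is consumed.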
The problem was caught by @concretevitamin in spot jobs:
Note: `log_utils.readable_time_duration` handles `start_at < 0` the same way as `start_at is None`.
skypilot/sky/utils/log_utils.py
Lines 86 to 87 in 2505fb1
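The sentinel handling described in the note can be sketched as follows. This is an illustration only: `format_duration`, the `'-'` placeholder, and the output format are assumptions, not the real `readable_time_duration` implementation.

```python
import time
from typing import Optional


def format_duration(start: Optional[float],
                    end: Optional[float] = None) -> str:
    """Format the time elapsed between start and end (hypothetical helper)."""
    # A missing start time and a negative sentinel are treated identically.
    if start is None or start < 0:
        return '-'
    if end is None:
        end = time.time()
    seconds = max(0, int(end - start))
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f'{hours}h {minutes}m {seconds}s'


if __name__ == '__main__':
    print(format_duration(None))     # no start time recorded
    print(format_duration(-1))       # sentinel default
    print(format_duration(0, 3661))  # a job that ran for 3661 seconds
```

Checking both conditions in one place means callers do not need to know whether a given row predates the default change.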