-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the file_mounts optimization and oslogin #1108
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice @Michaelvll! One question: is delaying the optimization necessary for correctness here (to fix the bug of oslogin)? Or is it just an implementation choice? It seems like if this PR does not add delaying, the oslogin bug would still have been fixed by using a separate path.
Results for for i in {1..3}; do time sky launch --cloud {cloud} -y ''; done
:
This PR
sky launch --cloud gcp -y '' 13.29s user 3.21s system 12% cpu 2:11.69 total
sky launch --cloud gcp -y '' 14.60s user 2.99s system 13% cpu 2:14.45 total
sky launch --cloud gcp -y '' 11.05s user 2.17s system 10% cpu 2:06.05 total
sky launch --cloud aws -y '' 8.31s user 1.59s system 5% cpu 2:53.42 total
sky launch --cloud aws -y '' 9.59s user 2.10s system 7% cpu 2:42.28 total
sky launch --cloud aws -y '' 8.17s user 1.54s system 6% cpu 2:31.57 total
Master
sky launch --cloud gcp -y '' 12.90s user 2.89s system 11% cpu 2:20.54 total
sky launch --cloud gcp -y '' 10.51s user 1.98s system 10% cpu 2:04.03 total
sky launch --cloud gcp -y '' 10.89s user 2.08s system 10% cpu 2:08.76 total
sky launch --cloud aws -y '' 8.47s user 1.77s system 6% cpu 2:48.24 total
sky launch --cloud aws -y '' 8.64s user 1.80s system 6% cpu 2:49.91 total
sky launch --cloud aws -y '' 8.37s user 1.73s system 6% cpu 2:42.54 total
AWS doesn't benefit that much, interestingly.
yes, it will be important for the correctness, since the authentication step will add the new file into the file mount, but in our previous implementation, the files were already cp to the local tmp directory in the step before the authentication, ie the newly added file will not be uploaded to the cloud. |
@Michaelvll I confirmed the new fix on a new spot controller. It works for me now! |
This PR refactors the file_mounts optimization to make it lazy until all the file mounts are added. The file_mounts optimization will be applied to all clouds now.
It should also fix the oslogin issue mentioned in #1106 (comment) , as previously we place the backup file in the same directory as
config_default
whose parent directory will be overwritten by ray autoscaler, instead we now place the backup file in~/.sky/.sky_gcp_config_default
on the remote instance.Hey @lhqing, would you like to switch to this branch
git checkout refactor-file-mount-optimization; pip install -e ".[gcp]"
and try if this solves the problem?Tested:
./tests/run_smoke_tests.sh