-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Reduce import time with lazy imports and exec time by avoiding script rsync #3394
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice @Michaelvll! Just did a pass.
Q: are pandas
/ networkx
/ boto3
.. all found to be very slow during import? Maybe worth documenting their loading speed in LazyImport/PR description for the record.
from typing import Optional | ||
|
||
|
||
class LazyImport: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Is there ever a call site where the same LazyImport is accessed in parallel inside the module? If that happened, will there be a race? (Regular import xx
doesn't have this problem.) If that never happened, shall we add a clear warning in this module/class docstr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine as importlib.import_module
is just a wrapper over __import__
. I could not think of a case that can cause a race condition, if the original import xx
works.
Added comments in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll, just a note on type errors from Pylance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @Michaelvll for this major optimization!
We use this for pandas and networkx, as they can be time-consuming to import | ||
(0.1-0.2 seconds). With this class, we can avoid the unnecessary import time | ||
when the module is not used (e.g., `networkx` should not be imported for | ||
`sky status and `pandas` should not be imported for `sky exec`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
This is to get rid of some time-consuming packages on the critical path, to reduce the import time, which will both improve the import time and some remote command execution.
The first step of #3157
Reduce import time by 2x
master:
python -c "import sky"
-- 1.002sThis PR:
python -c "import sky"
-- 0.494sReduce the
exec
timesky launch -c test --cloud kubernetes --cpus 1
for i in `seq 1 5`; do time sky exec test -d echo hi; done
Master: 17.89s
This PR: 13.91s
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py --aws
(except fortests/test_smoke.py::test_skyserve_new_autoscaler_update
,tests/test_smoke.py::test_skyserve_base_ondemand_fallback
,tests/test_smoke.py::test_skyserve_user_bug_restart
due to [AWS] Bucket on eu-south-1 fail to copy/mount #3405)pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh