[Core] Reduce import time with lazy imports and exec time by avoiding script rsync #3394

Merged
merged 33 commits into master from lazy-imports
Apr 3, 2024

Conversation

Michaelvll
Collaborator

@Michaelvll Michaelvll commented Mar 30, 2024

This PR takes some slow-to-import packages off the critical path by importing them lazily. That cuts the time of `import sky` and also speeds up some remote command execution.

The first step of #3157

Reduce import time by 2x

master: python -c "import sky" -- 1.002s
This PR: python -c "import sky" -- 0.494s
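A comparison like the one above can be reproduced with a small timing helper. This is a minimal sketch, not part of the PR: `time_import` is a hypothetical name, and `json` stands in for `sky` so the snippet runs anywhere. Each measurement uses a fresh interpreter, so the figure includes interpreter startup, just as `time python -c "import sky"` would.

```python
import subprocess
import sys
import time


def time_import(module: str, repeats: int = 3) -> float:
    """Best wall-clock seconds to run `import <module>` in a fresh interpreter.

    A new process is used for every run so that modules already loaded in
    this process cannot hide the real import cost.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # `json` is a stand-in; use `sky` where SkyPilot is installed.
    print(f"import json: {time_import('json'):.3f}s")
```

Taking the minimum over a few runs reduces noise from disk caching and other processes.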

Reduce the exec time

sky launch -c test --cloud kubernetes --cpus 1
for i in `seq 1 5`; do time sky exec test -d echo hi; done
Master: 17.89s
This PR: 13.91s

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • All smoke tests: pytest tests/test_smoke.py --aws (except for tests/test_smoke.py::test_skyserve_new_autoscaler_update, tests/test_smoke.py::test_skyserve_base_ondemand_fallback, tests/test_smoke.py::test_skyserve_user_bug_restart due to [AWS] Bucket on eu-south-1 fail to copy/mount #3405)
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll marked this pull request as ready for review April 1, 2024 04:07
Member

@concretevitamin concretevitamin left a comment

Very nice @Michaelvll! Just did a pass.

Q: are pandas / networkx / boto3 .. all found to be very slow during import? Maybe worth documenting their loading speed in LazyImport/PR description for the record.

from typing import Optional


class LazyImport:
Member

Nice! Is there ever a call site where the same LazyImport is accessed in parallel inside the module? If that happened, will there be a race? (Regular import xx doesn't have this problem.) If that never happened, shall we add a clear warning in this module/class docstr?

Collaborator Author

@Michaelvll Michaelvll Apr 1, 2024

It should be fine as importlib.import_module is just a wrapper over __import__. I could not think of a case that can cause a race condition, if the original import xx works.
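For readers following the race question, here is a minimal sketch of a lazy-import proxy with an explicit lock. It is an illustration under assumptions, not the code merged in this PR: the attribute names and the lock are hypothetical. Python's import system already serializes concurrent imports of the same module, so the lock here only guards the cached `_module` field and may well be unnecessary.

```python
import importlib
import threading
from types import ModuleType
from typing import Any, Optional


class LazyImport:
    """Hypothetical proxy that imports `module_name` on first attribute access."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module: Optional[ModuleType] = None
        self._lock = threading.Lock()

    def _load(self) -> ModuleType:
        # Double-checked locking: take the lock only while the module is unset.
        if self._module is None:
            with self._lock:
                if self._module is None:
                    self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, so the underscore-prefixed
        # fields set in __init__ never reach this method.
        return getattr(self._load(), name)


if __name__ == "__main__":
    lazy_json = LazyImport("json")
    print(lazy_json.dumps({"a": 1}))
```

Without the lock, two threads could both call `importlib.import_module`, but both would receive the same module object from `sys.modules`, which is consistent with the reply above.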

Collaborator Author

Michaelvll commented Apr 1, 2024

Q: are pandas / networkx / boto3 .. all found to be very slow during import? Maybe worth documenting their loading speed in LazyImport/PR description for the record.

pandas/networkx are the slow imports found; they add about 0.2 and 0.1 seconds to the import time, respectively.
For boto3, the change is mainly a refactor of the original import_packages function to simplify the cloud adaptors.
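Per-package costs like these can be found with CPython's `-X importtime` option (Python 3.7+), which prints one timing line per imported module to stderr. The sketch below is an assumption about how such a measurement could be done, not the author's method; `slowest_imports` is a hypothetical helper and `json` stands in for `sky`.

```python
import subprocess
import sys
from typing import List, Tuple


def slowest_imports(module: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return (module name, cumulative microseconds) for the slowest imports."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stderr=subprocess.PIPE, text=True, check=True)
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:   self_us | cumulative_us | name"
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # the header row has text instead of numbers
        rows.append((parts[2].strip(), cumulative_us))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, micros in slowest_imports("json"):
        print(f"{micros / 1e6:.4f}s  {name}")
```

The cumulative column includes the time of nested imports, so a package that pulls in many submodules shows up near the top.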

Added comments in the LazyImport class : )

Member

@concretevitamin concretevitamin left a comment

Thanks @Michaelvll, just a note on type errors from Pylance.

Member

@concretevitamin concretevitamin left a comment

LGTM, thanks @Michaelvll for this major optimization!

We use this for pandas and networkx, as they can be time-consuming to import
(0.1-0.2 seconds). With this class, we can avoid the unnecessary import time
when the module is not used (e.g., `networkx` should not be imported for
`sky status` and `pandas` should not be imported for `sky exec`).
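The deferral described in this docstring can be observed through `sys.modules`. The following is a minimal sketch: the compact `LazyImport` here is a stand-in written for the demo, not the class from `sky/adaptors/common.py`, and the small stdlib module `colorsys` plays the role of a heavy package such as pandas.

```python
import importlib
import sys


class LazyImport:
    """Stand-in proxy: imports the named module on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    def __getattr__(self, name):
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, name)


# Start from a clean state so the demo does not depend on earlier imports.
sys.modules.pop("colorsys", None)

colorsys = LazyImport("colorsys")
loaded_before = "colorsys" in sys.modules  # creating the proxy imports nothing
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # first attribute access triggers the import
loaded_after = "colorsys" in sys.modules

print(loaded_before, loaded_after)  # prints: False True
```

A command that never touches the proxied module therefore never pays its import cost.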
Member

Nice!

Michaelvll and others added 2 commits April 2, 2024 09:25
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
@Michaelvll Michaelvll merged commit bca709b into master Apr 3, 2024
20 checks passed
@Michaelvll Michaelvll deleted the lazy-imports branch April 3, 2024 03:42