[Storage][Spot] Resolving failure to exclude local files for managed spot jobs #1939
@concretevitamin @romilbhardwaj Ready for a look! One thing I was unsure was where to keep |
Nice! I tried running
sky spot launch --workdir=. --cloud gcp echo hi
with a fairly typical .gitignore:
*.egg-info
__pycache__
wandb
*.out
**/.ipynb_checkpoints
data/
exp/
exp-*/
.git
This throws an error
Task from command: echo hi
Launching a new spot task 'sky-da93-zongheng'. Proceed? [Y/n]:
I 05-12 09:50:09 execution.py:708] Translating file_mounts with local source paths to SkyPilot Storage...
I 05-12 09:50:09 execution.py:733] Workdir '.' will be synced to cloud storage 'skypilot-workdir-zongheng-4d3146cf'.
I 05-12 09:50:09 execution.py:805] Uploading sources to cloud storage. See: sky storage ls
I 05-12 09:50:12 storage.py:1653] Created GCS bucket skypilot-workdir-zongheng-4d3146cf in US-CENTRAL1 with storage class STANDARD
W 05-12 09:50:12 storage.py:793] '.git' directory under '.' is excluded during sync.
⠙ Syncing . to gs://skypilot-workdir-zongheng-4d3146cf/E 05-12 09:50:13 data_utils.py:224] CommandException: Invalid exclude filter ((.git/*|.gitignore|*.egg-info|__pycache__|wandb|*.out|**/.ipynb_checkpoints|data/|exp/|exp-*/|.git))
E 05-12 09:50:13 data_utils.py:224]
E 05-12 09:50:13 storage.py:805] Could not upload . to store name skypilot-workdir-zongheng-4d3146cf.
sky.exceptions.StorageUploadError: Upload to bucket failed for store skypilot-workdir-zongheng-4d3146cf. Please check the logs.
Looks like we need to robustify the parsing / handling?
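One plausible reading of the "Invalid exclude filter" error above: the -x option of gsutil rsync takes a Python regular expression, while the .gitignore entries are shell-style globs, so joining them verbatim with | yields an expression such as |*.egg-info that cannot be compiled. The sketch below only illustrates that mismatch. The helper names are hypothetical, and fnmatch.translate is a stand-in conversion that does not reproduce gitignore semantics (directory patterns, anchoring, negation).

```python
import fnmatch
import re

# Patterns taken from the failing exclude filter in the log above.
GITIGNORE_PATTERNS = [
    '.git/*', '.gitignore', '*.egg-info', '__pycache__', 'wandb', '*.out',
    '**/.ipynb_checkpoints', 'data/', 'exp/', 'exp-*/', '.git',
]


def raw_filter(patterns):
    """Join globs verbatim, as in the failing command (hypothetical helper)."""
    return '(' + '|'.join(patterns) + ')'


def translated_filter(patterns):
    """Convert each glob to a regex before joining (hypothetical helper)."""
    return '|'.join(fnmatch.translate(p) for p in patterns)


def is_valid_regex(expr):
    """Return True if Python's re module can compile the expression."""
    try:
        re.compile(expr)
        return True
    except re.error:
        return False


print(is_valid_regex(raw_filter(GITIGNORE_PATTERNS)))
print(is_valid_regex(translated_filter(GITIGNORE_PATTERNS)))
```

The first expression fails to compile because a bare * has nothing to repeat, whereas the translated one compiles. Whether the translated expression excludes the same files git would ignore is a separate question.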
@concretevitamin Ready for another look! Please refer to the updated comment at the top. The original |
Awesome work @landscapepainter! I'll take a look. This feature is likely quite involved since we're implementing our own gitignore parsing. Can you share a script/tar of the dir structure you used (the one in the screenshot above) for testing, and ideally add that to smoke tests? |
Note on !:
For .gitignore, git treats a leading ! as a negation that re-includes a previously ignored path. For example, if you have the following in the .gitignore,
*.log
!important/*.log
it ignores every file that ends with .log in the directory and its subdirectories, except the ones in the important subdirectory.
The behavior of ! also differs depending on whether it is prefixed to a file or a directory. Consider the following examples:
*.log
!important.log
The .gitignore file above ignores every file that ends with .log except important.log, and this works fine. On the other hand, for directories, consider the following example:
logs/
!logs/important.log
You might think every item in the logs directory would be ignored except important.log, but logs/important.log gets ignored as well, since a file cannot be re-included if its parent directory is excluded. In effect, a pattern that matches a directory takes priority over a pattern that matches a file inside it.
The following just repeats what I mentioned at the top.
*.log
!important/*.log
trace.*
The .gitignore file above ignores every file that ends with .log except the ones in the important directory. However, important/trace.log will be ignored, because trace.* was added after !important/*.log and overrides it.
I'm not sure how often people use !, but if we are going to implement this exactly as git interprets .gitignore, it seems necessary for data_utils.parallel_upload() to run the gsutil rsync and aws s3 sync commands a variable number of times, breaking the directories/files into several pieces. This is because --include from aws s3 sync overrides every --exclude item, and gsutil rsync has no option like --include; it only has an excluding option, -x.
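The three cases in this note can be checked against a small model of the matching rules. The following is a minimal sketch, not git's actual algorithm: the function names are hypothetical, and fnmatch is used as an approximation (its * also matches /, unlike git's). It applies the patterns in order so that the last matching one decides, and it treats a path as ignored whenever one of its parent directories is ignored.

```python
import fnmatch
from pathlib import PurePosixPath


def _matches(pattern, path, is_dir):
    """Simplified gitignore match for one pattern (hypothetical helper)."""
    if pattern.endswith('/'):
        if not is_dir:
            return False  # a trailing slash restricts the pattern to directories
        pattern = pattern.rstrip('/')
    if '/' in pattern:
        # A pattern containing a slash is matched against the whole relative path.
        return fnmatch.fnmatchcase(path, pattern.lstrip('/'))
    # Otherwise it is matched against the last path component at any depth.
    return fnmatch.fnmatchcase(PurePosixPath(path).name, pattern)


def _verdict(patterns, path, is_dir):
    """Apply patterns in order; the last matching pattern decides."""
    ignored = False
    for raw in patterns:
        raw = raw.strip()
        if not raw or raw.startswith('#'):
            continue  # skip blank lines and comments
        negated = raw.startswith('!')
        pattern = raw[1:] if negated else raw
        if _matches(pattern, path, is_dir):
            ignored = not negated
    return ignored


def is_ignored(patterns, path):
    """Return True if a file at the given relative path would be ignored."""
    parts = PurePosixPath(path).parts
    # A file cannot be re-included if one of its parent directories is excluded.
    for i in range(1, len(parts)):
        if _verdict(patterns, '/'.join(parts[:i]), is_dir=True):
            return True
    return _verdict(patterns, path, is_dir=False)


print(is_ignored(['*.log', '!important.log'], 'important.log'))
print(is_ignored(['logs/', '!logs/important.log'], 'logs/important.log'))
print(is_ignored(['*.log', '!important/*.log', 'trace.*'], 'important/trace.log'))
```

Under these simplifications the first call reports the file as kept, while the second and third report it as ignored, matching the behavior described in the note.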
}

@staticmethod
def create_dir_structure(base_path, structure):
Added a function and a structure that can be used to create a more robust test directory structure in a temporary location, using with tempfile.TemporaryDirectory() as tempdir:
Will update with a complete test soon.
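The body of create_dir_structure is not shown in this thread, so the following is only a guess at what such a helper might look like, written as a free function rather than a static method. It assumes the structure is a nested dict in which a dict value denotes a subdirectory and any other value denotes a file's text content. Both the dict layout and EXAMPLE_STRUCTURE are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical layout: dict values are directories, other values are file contents.
EXAMPLE_STRUCTURE = {
    'included.txt': 'keep me',
    'excluded.log': '',
    'logs': {
        'important.log': '',
    },
}


def create_dir_structure(base_path, structure):
    """Recursively create the files and directories described by structure."""
    for name, content in structure.items():
        path = os.path.join(base_path, name)
        if isinstance(content, dict):
            os.makedirs(path, exist_ok=True)
            create_dir_structure(path, content)
        else:
            with open(path, 'w') as f:
                f.write(content or '')


with tempfile.TemporaryDirectory() as tempdir:
    create_dir_structure(tempdir, EXAMPLE_STRUCTURE)
    print(sorted(os.listdir(tempdir)))
```

Building the tree inside tempfile.TemporaryDirectory() means it is removed automatically when the with block exits, so a test does not leave files behind.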
Need to add '.gitignore' corresponding to the structure as part of the test as well
@romilbhardwaj Mostly done with the test, which uses a complex directory structure to make the implementation robust. But we need to decide what to do with '!' to complete the test, so I can set the criteria for whether the test has passed.
implementing with git instead #2018 |
This PR closes #1898
Normally, when you launch a cluster with SkyPilot, workdir and local file_mounts sources are uploaded directly to the remote VM with rsync. However, for managed spot jobs, they are uploaded to cloud object storage with the aws or gsutil CLI. Due to this difference, managed spot jobs were failing to ignore the files listed in .gitignore or .git/info/exclude. This PR fixes the issue by excluding those files from being uploaded to cloud storage.
It also adds a test that checks whether uploading local sources (workdir and local file_mounts sources in this case) to cloud object storage in COPY mode respects .gitignore and .git/info/exclude.
Update:
There are some discrepancies between how git handles the strings in .gitignore or .git/info/exclude and how the --exclude filter of aws s3 sync or the -x filter of gsutil rsync handles the strings passed to it. To resolve the discrepancy, I followed how git handles strings read from .gitignore, based on this doc.
Following is the structure of the directories I used to test the updated implementation:
Following is the .gitignore file I included in the test-spot-exclusive directory:
The current update correctly handles the format given in the .gitignore above, with two exceptions relating to !:
For s3 and r2 (aws s3 sync): Items prefixed with ! in .gitignore override the other items to be excluded. For instance, using the example from the doc, git would interpret the order given in .gitignore so as to exclude important/trace.log, since trace.* appears after !important/*.log. However, in our implementation, any item prefixed with ! overrides the items to be excluded regardless of the order. In other words, important/trace.log will be included. This is due to how --include from aws s3 sync behaves: --include overrides every file passed to --exclude.
For gs (gsutil rsync): ! used to include (such as !included.log, as opposed to square_bracket_excla/excluded[!01].log) is not supported at all. For aws s3 sync, we can pass in --include to implement !, but gsutil rsync does not have any functionality to specify which files to include (ref). It only has -x, which specifies the files to be excluded.
With the file structure above and the .gitignore file, the current implementation for aws s3 sync uploads include.txt, included.log, and included/included.log. And gsutil rsync uploads included.txt.
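To make the aws s3 sync behavior described above concrete, here is a minimal sketch under two stated assumptions: that the implementation emits all --exclude filters first and all --include filters afterwards (the actual code is not shown in this description), and that fnmatch approximates the AWS CLI's pattern matching. The AWS CLI documentation states that filters appearing later in a command take precedence over earlier ones, which is what the simulation applies. Both function names are hypothetical.

```python
import fnmatch


def to_s3_filters(gitignore_lines):
    """Translate .gitignore lines into (flag, pattern) pairs.

    Assumed ordering: every --exclude first, then every --include.
    """
    excludes, includes = [], []
    for line in gitignore_lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('!'):
            includes.append(('--include', line[1:]))
        else:
            excludes.append(('--exclude', line))
    return excludes + includes


def is_uploaded(filters, path):
    """Simulate filter evaluation: the last matching filter decides."""
    uploaded = True  # files are included by default
    for flag, pattern in filters:
        if fnmatch.fnmatchcase(path, pattern):
            uploaded = (flag == '--include')
    return uploaded


filters = to_s3_filters(['*.log', '!important/*.log', 'trace.*'])
print(filters)
print(is_uploaded(filters, 'important/trace.log'))
```

With this ordering, important/trace.log ends up uploaded even though git itself would ignore it, which is the first of the two exceptions listed above.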
Tested (run the relevant ones):
sky spot launch and checked the workdir and local file_mounts src from the spot instance
pytest tests/test_smoke.py::TestStorageWithCredentials --cloudflare
pytest tests/test_smoke.py --managed-spot
test_smoke.py/test_excluded_file_cloud_storage_upload_copy