[feat] OSS flatten state dict #65
cc @mannatsingh
@@ -119,7 +119,7 @@ def closure():
     print(f"[{dist.get_rank()}] : Mean speed: {mean:.2f} +/- {std:.2f}")

     if use_oss and check_regression and dist.get_rank() == 0:
-        assert (mean - 3.0 * std) < reference_speed, "Speed regression detected"
+        assert (mean + 3.0 * std) > reference_speed, "Speed regression detected"
This was wrong. It is fixed in another PR, but might as well be fixed here too (I was bumping into it locally).
(We want the speed to increase, not decrease. The test initially made sense when comparing runtime; now that it compares frames per second, higher is better.)
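The direction of the check can be sketched in isolation. The helper below is hypothetical (the name `is_regression` and the `higher_is_better` flag are not part of the benchmark script); it only restates the two assertions from the hunk as a function, assuming the same 3-sigma tolerance.

```python
def is_regression(mean: float, std: float, reference: float,
                  higher_is_better: bool = True) -> bool:
    """Return True when the measurement is worse than the reference by more than 3 sigma.

    Hypothetical helper, not part of the benchmark script.
    """
    if higher_is_better:
        # Throughput: fail only if even the optimistic bound is below the reference.
        # This is the negation of `assert (mean + 3.0 * std) > reference`.
        return (mean + 3.0 * std) <= reference
    # Runtime: fail only if even the optimistic bound is above the reference.
    # This is the negation of `assert (mean - 3.0 * std) < reference`.
    return (mean - 3.0 * std) >= reference


# With throughput, a faster run must not trip the check.
print(is_regression(40.0, 1.0, 32.32))                          # False
# The old runtime-style comparison would have flagged the same faster run.
print(is_regression(40.0, 1.0, 32.32, higher_is_better=False))  # True
```

The second call illustrates the bug described in the comment: with a throughput metric, the runtime-style comparison reports a regression precisely when the run got faster.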
-parser.add_argument("--check_regression", action="store", default=True, type=bool)
-parser.add_argument("--reference_speed", action="store", default=39.82, type=float)
+parser.add_argument("--check_regression", action="store_true", default=False)
+parser.add_argument("--reference_speed", action="store", default=32.32, type=float)
39 was the default speed for SGD. I had changed that earlier to RMSProp when checking for memory pressure, but somehow this change was lost.
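As a side note on the `--check_regression` change in the same hunk: with standard `argparse`, `type=bool` calls `bool()` on the raw string, so any non-empty value (including the string "False") parses as `True`. A minimal sketch of the difference, using two throwaway parsers (`old` and `new` are illustrative names only):

```python
import argparse

# Old definition: the value is passed through bool(), so "False" becomes True.
old = argparse.ArgumentParser()
old.add_argument("--check_regression", action="store", default=True, type=bool)

# New definition: a plain flag that is off unless given on the command line.
new = argparse.ArgumentParser()
new.add_argument("--check_regression", action="store_true", default=False)

old_flag = old.parse_args(["--check_regression", "False"]).check_regression
new_default = new.parse_args([]).check_regression
new_flag = new.parse_args(["--check_regression"]).check_regression

print(old_flag, new_default, new_flag)  # True False True
```

With the old definition there was no practical way to disable the check from the command line short of passing an empty string.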
* add unit test pack/unpack kwargs
* added two more corner cases
* more doc and more tests
* more corner cases
* formatting
* Update fairscale/utils/containers.py
* with pytest.raises is awesome
* addressed comment
* add tuple to be tested

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Before submitting
What does this PR do?
Changes the structure of the returned state dict with respect to the param_groups, making it closer to what a vanilla optimizer would return (the param_groups are un-sharded). They are sharded again when loading.
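The idea can be sketched roughly as follows. This is not the OSS implementation: the function names, the `partition` bookkeeping, and the toy param_groups are all assumptions made for illustration, and the real state dict layout may differ.

```python
from typing import Any, Dict, List, Tuple

ParamGroup = Dict[str, Any]


def flatten_param_groups(
    per_rank_groups: List[List[ParamGroup]],
) -> Tuple[List[ParamGroup], List[Tuple[int, int]]]:
    """Concatenate per-rank param_groups into one flat list (hypothetical helper).

    Also returns, for each rank, the (start, end) slice it owns in the flat list,
    so the sharding can be rebuilt when loading.
    """
    flat: List[ParamGroup] = []
    partition: List[Tuple[int, int]] = []
    for groups in per_rank_groups:
        start = len(flat)
        flat.extend(groups)
        partition.append((start, len(flat)))
    return flat, partition


def shard_param_groups(
    flat: List[ParamGroup], partition: List[Tuple[int, int]]
) -> List[List[ParamGroup]]:
    """Split the flat list back into per-rank lists (hypothetical helper)."""
    return [flat[start:end] for start, end in partition]


# Toy example with two ranks; the contents are made up.
per_rank = [
    [{"lr": 0.1, "params": [0, 1]}],
    [{"lr": 0.1, "params": [2]}],
]
flat, partition = flatten_param_groups(per_rank)
restored = shard_param_groups(flat, partition)
```

Here `flat` has the shape a non-sharded optimizer would expose, while `partition` carries just enough information to shard again on load.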
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃