Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Add a memory usage regression test to the OSS benchmark #62

Merged
merged 10 commits into from
Sep 3, 2020

Conversation

blefaudeux
Copy link
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • [-] Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

  • Adds a check on the benchmark to make sure that the memory consumption of OSS does not regress
  • Adds a state consolidation step to the benchmark to make sure that this codepath is tested for regression, and future speedups on that front are visible

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

definitely 🙃

assert (mean - 3.0 * std) < reference_speed, "Regression detected"
print(f"[Regression Test] Mean speed: {mean:.2f} +/- {std:.2f}")
assert (mean - 3.0 * std) < reference_speed, "Speed regression detected"
assert max_memory < 1.05 * reference_memory, "Memory use regression detected"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 5% tolerance here, guessing that some CUDA or torch version changes could affect (I don't have any STD for this value, it's the max memory used over the whole run)

@blefaudeux blefaudeux marked this pull request as draft September 3, 2020 18:30
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 3, 2020
@blefaudeux blefaudeux marked this pull request as ready for review September 3, 2020 19:52
@blefaudeux blefaudeux merged commit ee38e1e into master Sep 3, 2020
@blefaudeux blefaudeux deleted the oss_benchmark_memory branch September 3, 2020 20:18
myleott added a commit that referenced this pull request Feb 22, 2021
* Do reduce-scatter in a separate CUDA stream

* Add _post_backward_stream to stubs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants