[feat] batch broadcast requests into a configurable buffer #43
Closed
51 commits
86628e1 first take at comms batching when broadcasting the state (blefaudeux)
4dfc71c sorting imports.. (blefaudeux)
5b00c3b nit (blefaudeux)
df15aa7 new machine means new linting.. (blefaudeux)
3565391 better unit testing (blefaudeux)
f0e6814 hotfix, dimension (blefaudeux)
b7f7802 better unit testing still, preemptive bugfix (blefaudeux)
8084f98 linting (blefaudeux)
510f773 unit test fix, a little prettier, should be gtg (blefaudeux)
582134f annoying, remove coverage check on the type hints (blefaudeux)
4ed074b initial commit, dummy training loop, pure pytorch but not DDP (blefaudeux)
a167289 probably slightly broken, but rough DDP benchmark run (blefaudeux)
20b981d adding the torchvision requirement for testing (blefaudeux)
8a2377c brainfart (blefaudeux)
41dcf69 reduce the loss, do something slightly distributed (blefaudeux)
b212dee Some cleanup, distributing the training on two GPUs (blefaudeux)
b149113 Merge remote-tracking branch 'upstream/master' into oss_benchmark (blefaudeux)
b5cacbd some cleanup + adding a vanilla run, still not good to go (blefaudeux)
928791e less silly defaults, gtg for a start I think (blefaudeux)
e6a4756 smaller batch to fit the smaller gpus used in the circleci rigs (blefaudeux)
e01a60a Merge commit 'c2d6f4b68e9c24d05a3eb5da4f60431d9e5c86d8' into oss_batc… (blefaudeux)
ab79ddc better device/buffer allocation (blefaudeux)
906d740 Merge commit 'e6a4756c1c2927d35af2148dbfb8d0e1f3bff797' into oss_batc… (blefaudeux)
d599f4c WIP, some type hint cleaning, speed deficit for now (blefaudeux)
78fc476 lint + double buffering setting (blefaudeux)
4560a0c fix some lazy programming when running on cpu (blefaudeux)
56974ed tighter OSS input type (blefaudeux)
8aa48f2 Merge branch 'master' into oss_batch_broadcast (blefaudeux)
24d619d fixing botched merge (blefaudeux)
11811f7 Merge remote-tracking branch 'upstream/master' into oss_batch_broadcast (blefaudeux)
2209cce default the buffer to None, check for device locality (blefaudeux)
47da0b5 linting + tweak the broadcast buffer settings (blefaudeux)
4d4b8cf minor tweak to the oss benchmark CLI, smaller param buffer (blefaudeux)
5cbe21e adjust speed for RMSProp (blefaudeux)
b3aad66 bugfix (blefaudeux)
07d6626 back to 100% code coverage, slightly cleaner unit test (blefaudeux)
b09d2a1 WIP (blefaudeux)
3739ee0 Merge branch 'master' into oss_batch_broadcast (blefaudeux)
5f78ccf better bucketing, across devices and ranks. credits to oss_ddp. WIP i… (blefaudeux)
c711b73 cosmetics (blefaudeux)
af9dc13 WIP (blefaudeux)
3dd3c27 Merge remote-tracking branch 'upstream/master' into oss_batch_broadcast (blefaudeux)
8916c50 merge fixes + tentative perf improvements (blefaudeux)
68a67fb allocate per-device broadcast buffer once and for all, at constructio… (blefaudeux)
4cadd58 deduplicate oss_ddp/oss (blefaudeux)
07c20a9 merge with upstream master, could still be optimized (blefaudeux)
5decde1 Merge remote-tracking branch 'upstream/master' into oss_batch_broadcast (blefaudeux)
a8f601c Merge remote-tracking branch 'upstream/master' into oss_batch_broadcast (blefaudeux)
b34bedd restoring working state, nccl deadlocking unfortunately (blefaudeux)
08ce45d wip (blefaudeux)
26308b4 in working order, but unbearably slow (blefaudeux)
What does optional mean in this context? The parameter does not look like an optional.
I meant that people are free to pass it in or not; a default is provided.
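To make the two readings of "optional" concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function names, the `buffer_size` parameter, and the default of 8 are all hypothetical, and the bucketing is a simple greedy grouping chosen only for illustration. The first function takes a parameter that callers may omit because it has a default; the second takes a parameter typed as `Optional[int]`, where `None` is itself a legal value.

```python
from typing import List, Optional


def plan_buckets(sizes: List[int], buffer_size: int = 8) -> List[List[int]]:
    """Greedily group element counts into buckets of at most `buffer_size`.

    `buffer_size` (hypothetical name) is optional to pass because it has a
    default, but its type is a plain int, not Optional[int].
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in sizes:
        # Close the current bucket when the next item would overflow it.
        if current and used + size > buffer_size:
            buckets.append(current)
            current, used = [], 0
        # An item larger than the buffer ends up alone in its own bucket.
        current.append(size)
        used += size
    if current:
        buckets.append(current)
    return buckets


def plan_buckets_nullable(
    sizes: List[int], buffer_size: Optional[int] = None
) -> List[List[int]]:
    """Variant where None is a legal value, here meaning 'no batching'."""
    if buffer_size is None:
        return [[size] for size in sizes]
    return plan_buckets(sizes, buffer_size)


print(plan_buckets([3, 3, 3, 10, 1]))
print(plan_buckets_nullable([3, 3, 3]))
print(plan_buckets_nullable([3, 3, 3], buffer_size=6))
```

With the first signature the docstring can say "defaults to 8" without calling the parameter an Optional; with the second, the annotation and the docstring agree that `None` is accepted.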