benchmarks/transformer.py doesn't specify chunks when constructing Pipe #51

Closed
froody opened this issue Aug 25, 2020 · 0 comments · Fixed by #52
froody commented Aug 25, 2020

🐛 Bug

The whole point of the Pipe module is to split a batch into `chunks` microbatches and process them through the stages of the pipeline, achieving parallelism by having multiple microbatches in flight on different GPUs at the same time. The benchmark in benchmarks/transformer.py doesn't specify chunks, so it defaults to chunks=1, which bypasses the microbatch logic entirely. Moreover, changing the benchmark to set chunks=2 or chunks=4 actually yields a slowdown, when I would expect more chunks -> more parallelism.
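For illustration, the splitting described above amounts to dividing the batch along the batch dimension into `chunks` roughly equal pieces. A plain-Python sketch of that logic (the function name is illustrative, not the fairscale API):

```python
def split_into_microbatches(batch, chunks):
    """Split a batch (a list of samples) into `chunks` roughly equal microbatches."""
    size, rem = divmod(len(batch), chunks)
    microbatches, start = [], 0
    for i in range(chunks):
        # The first `rem` microbatches absorb the remainder, one extra sample each.
        end = start + size + (1 if i < rem else 0)
        microbatches.append(batch[start:end])
        start = end
    return [m for m in microbatches if m]

# A batch of 8 samples with chunks=4 yields 4 microbatches of 2 samples each;
# with chunks=1 the "pipeline" degenerates to a single full-batch step.
print(split_into_microbatches(list(range(8)), 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With chunks=1 there is only one microbatch, so at any moment only one pipeline stage (one GPU) is busy; more chunks is what allows stages to overlap.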

Command

PYTHONPATH=$PWD python benchmarks/transformer.py

To Reproduce

Steps to reproduce the behavior:

  1. PYTHONPATH=$PWD python benchmarks/transformer.py
  2. Change L263 to specify chunks=2 and rerun the command, e.g. p = pipe.Pipe(model, balance, chunks=2)
  3. Change L263 to specify chunks=4 and rerun the command

chunks=1: test loss 5.57 | time: 30.72s | words: 2304870 | wps: 75028.93
chunks=2: test loss 5.58 | time: 53.51s | words: 2304870 | wps: 43077.41
chunks=4: test loss 5.57 | time: 81.93s | words: 2304870 | wps: 28133.60

Expected behavior

chunks=N should be faster than chunks=1 for some N > 1 when more than one device is available.

Environment

Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 418.116.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchtext==0.7.0
[pip3] torchvision==0.7.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0da4684_0 conda-forge
[conda] numpy 1.19.1 py37hbc911f0_0
[conda] numpy-base 1.19.1 py37hfa32c7d_0
[conda] pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchtext 0.7.0 pypi_0 pypi
[conda] torchvision 0.7.0 py37_cu101 pytorch
