Troubles updating to CUDA 8.0RC #306

Closed
3DTOPO opened this issue Aug 10, 2016 · 14 comments

@3DTOPO

3DTOPO commented Aug 10, 2016

I received a new Pascal Titan X today and haven't been able to get things running.

I installed the video driver, then CUDA 8.0RC and got a Torch error trying to run neural-style. So I reinstalled its dependencies, cloned the latest version of Torch, installed it and then got the error:

/home/prime/torch/install/share/lua/5.1/trepl/init.lua:384: module 'loadcaffe' not found:No LuaRocks module found for loadcaffe

Then I reinstalled loadcaffe but I am still getting the same error as above. Any suggestions?

@jcjohnson
Owner

I'm running a Pascal Titan X on a fresh install of Ubuntu 16.04 with the CUDA 8.0 RC and I didn't have any problems like that, so I don't think it has to do with the new GPU or the new version of CUDA.

Is it possible that you have multiple torch installs floating around, and that luarocks installed loadcaffe into one torch install, while th points at a different torch install? Check which th and which luarocks and make sure they point to binaries in the same directory.
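In concrete terms, the check might look like this (a minimal sketch; the hard-coded paths are the ones from this thread and will differ on your machine — in practice you'd use `$(which th)` and `$(which luarocks)`):

```shell
# Compare the install directories that th and luarocks resolve to.
# Example paths from this thread, used here for illustration.
th_path="/home/neural/torch/install/bin/th"
luarocks_path="/home/neural/torch/install/bin/luarocks"

if [ "$(dirname "$th_path")" = "$(dirname "$luarocks_path")" ]; then
  echo "same Torch install"
else
  echo "different Torch installs: rocks installed by this luarocks won't be visible to th"
fi
```

If they differ, either fix your PATH or run `luarocks install loadcaffe` using the luarocks binary from the same install that th points at.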

@3DTOPO
Author

3DTOPO commented Aug 10, 2016

Thanks for the info. I just checked and they are both in the same directory:

/home/neural/torch/install/bin/th
/home/neural/torch/install/bin/luarocks

Hmm, I guess I will have to do a clean install of Ubuntu. When you installed yours, did anything deviate from the instructions?

Just curious: how is the performance compared to your last Titan X?

@jcjohnson
Owner

Ubuntu 16.04 ships with a newer version of gcc than the CUDA installer expects, so you need to hack it a bit during CUDA installation:

https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775/

Apart from that I didn't have to do anything out of the ordinary to get it installed.
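For reference, one common shape of that hack (an assumption about the exact approach — the linked article has the details) is to install an older gcc/g++ that CUDA 8.0 RC accepts and expose it to nvcc via symlinks in the CUDA bin directory:

```shell
# Hedged sketch: give nvcc an older host compiler on Ubuntu 16.04.
# Package and path names are illustrative; adjust to your system.
sudo apt-get install gcc-4.9 g++-4.9
sudo ln -s /usr/bin/gcc-4.9 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/local/cuda/bin/g++
```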

Funny you should ask about speed; I just updated the neural-style benchmarks:

https://github.com/jcjohnson/neural-style#speed

and I did more expensive benchmarking of various CNNs here:

https://github.com/jcjohnson/cnn-benchmarks

Overall the new Titan X is about 1.5x faster than the old one!

@3DTOPO
Author

3DTOPO commented Aug 10, 2016

Ubuntu 16.04 ships with a newer version of gcc than the CUDA installer expects, so you need to hack it a bit during CUDA installation

Note that apparently they released a patch on August 8th to support gcc 5.4:

"This patch supports GCC 5.4 as one of the host compilers to use with CUDA 8 RC."

Overall the new Titan X is about 1.5x faster than the old one!

50% faster is definitely an improvement - but I guess I am a bit disappointed after buying into their marketing hype that deep learning could be accelerated 10-fold with the Pascal architecture. That's what I get for believing marketing, I guess!

;)

@jcjohnson
Owner

The 10x was absolutely a stretch. They arrived at that number by factoring in a 2x boost from moving to FP16 (which is supported on the P100 but not the Titan X), plus a further multiplier from better multi-GPU parallelism via NVLink (which might give more efficient data parallelism during training, but doesn't help neural-style). Overall I'm quite happy with a 1.5x boost; also note that cuDNN 5 is quite a bit more efficient than cuDNN 4 (thanks to Winograd convolutions, maybe?), so a Pascal Titan X with cuDNN 5 is almost 2x faster than a Maxwell Titan X with cuDNN 4.

Were you able to get your new card up and running?

@nthverse

Hey Jcj, how large of a picture can you process now with the 12GB?

@3DTOPO
Author

3DTOPO commented Aug 11, 2016

Ah, I didn't know they crippled that feature. So if it's 1.5x faster now, another 2x would have made it roughly 3x faster? That would have been really nice! I guess the good news is that a Pascal Quadro would be that much faster.

Dumb question. Does Torch utilize cuDNN?

Yes; I got it running on Ubuntu 16.04. Instead of using the hack you referenced, I used the --override flag to disable the GCC version check for the CUDA 8.0 install. Then I installed the patch I referenced above.
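For anyone else hitting this, the shape of that install was roughly the following (the runfile and patch filenames are illustrative; use the ones you actually downloaded from NVIDIA):

```shell
# Run the CUDA 8.0 RC installer with the gcc version check disabled,
# then apply NVIDIA's patch runfile for gcc 5.4 support.
sudo sh cuda_8.0.27_linux.run --override
sudo sh cuda_8.0.27.1_linux.run
```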

@3DTOPO 3DTOPO closed this as completed Aug 11, 2016
@jcjohnson
Owner

Fast FP16 on Pascal requires special hardware that for now is only present in the Tesla P100, so it's not as if the new Titan X is artificially limited here.

Torch has cuDNN support through the cudnn.torch package. You can enable cuDNN in neural-style with the flag -backend cudnn; you should probably also add the -cudnn_autotune flag.
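Put together, a cuDNN-enabled neural-style run looks like this (the image filenames are placeholders; the flags are the ones named above):

```shell
# Run neural-style with the cuDNN backend and autotuning enabled.
# content.jpg / style.jpg are placeholder inputs.
th neural_style.lua \
   -content_image content.jpg \
   -style_image style.jpg \
   -backend cudnn \
   -cudnn_autotune
```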

@3DTOPO
Author

3DTOPO commented Aug 11, 2016

Torch has cuDNN support through the cudnn.torch package. You can enable cuDNN in neural-style with the flag -backend cudnn; you should probably also add the -cudnn_autotune flag.

Oh yeah duh, sorry I'm a bit rusty - I haven't used it without my shell scripts that add those flags for me in a while!

@htoyryla

htoyryla commented Aug 13, 2016

Has anyone tried using plain cunn with CUDA 8.0 RC on a Pascal card? I just installed my 1070 on a fresh Ubuntu 14.04 yesterday (only four hours of struggling... mostly before I found out that 7.5 will not work; the rest was getting around the login loop, as I want to use the onboard Intel adapter for display and the NVIDIA card for CUDA only).

Now, torch/cudnn/neural-style work fine, while leaving cuDNN out results in an error. Also, cunn.test() gives similar errors. I should probably post on the torch/cunn project page, but I just wanted to check first whether cunn works for the rest of you.

Update: I have now reported this as torch/cunn#321 because as far as I can see, my installation is ok otherwise and the problem consistently occurs with cunn.test().
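For anyone wanting to reproduce the check, a quick smoke test from the shell (assuming a working Torch install and a CUDA-capable GPU) looks like:

```shell
# Load plain cunn and run its test suite via the th interpreter.
th -e "require 'cunn'"               # should exit cleanly if cunn loads
th -e "require 'cunn'; cunn.test()"  # runs the cunn unit tests on the GPU
```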

@3DTOPO
Author

3DTOPO commented Aug 14, 2016

@jcjohnson do you have your older Titan X working in the same machine as your new Pascal X?

When I boot up, the "GeForce GTX" green name lights up on both cards (the last-gen Titan X and the Pascal Titan X), but as soon as I run nvidia-smi, the light on the older (non-Pascal) Titan X turns off. I tried reinstalling NVIDIA-Linux-x86_64-367.35.run and the same thing happens. Any ideas?

@jcjohnson
Owner

@htoyryla I've used plain cunn with CUDA 8.0 RC on both the Pascal Titan X and on the 1080 without any problems.

@3DTOPO My Pascal X is in a new machine where it is the only card; unfortunately I don't have access to my Maxwell X at the moment to see if I have the same issue as you. However I'll eventually want to run them both in the same machine, so I'd love to know of any workaround you find to make that work.

@3DTOPO
Author

3DTOPO commented Aug 15, 2016

@htoyryla same as jcjohnson - both are working here.

@jcjohnson I posted a question over at the NVIDIA forums; I will let you know what I learn.

@htoyryla

cunn now works for me too, after I reinstalled everything from nn onward.
