Troubles updating to CUDA 8.0RC #306
I'm running a Pascal Titan X on a fresh install of Ubuntu 16.04 with the CUDA 8.0 RC and I didn't have any problems like that, so I don't think it has to do with the new GPU or the new version of CUDA. Is it possible that you have multiple Torch installs floating around, and that luarocks installed loadcaffe into one Torch install while th points at a different one? Check …
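The multiple-install check suggested above can be sketched as a small shell snippet. This is a sketch, not anything from the thread itself: the `/home/neural/...` path is the one the next comment reports, and the `share/lua/5.1` layout is the usual luarocks convention, assumed here rather than confirmed.

```shell
#!/bin/sh
# Sketch: check whether `th` and the luarocks-installed loadcaffe live in the
# same Torch tree. The fallback path below is the one quoted in this thread.

# Resolve where `th` actually points (use the thread's path if th isn't on PATH).
TH_PATH="$(command -v th || echo /home/neural/torch/install/bin/th)"

# Derive the install root by stripping the trailing /bin/th.
TORCH_ROOT="${TH_PATH%/bin/th}"

echo "th resolves to:  $TH_PATH"
echo "install root is: $TORCH_ROOT"

# loadcaffe should have landed under the same root (layout assumed):
ls "$TORCH_ROOT/share/lua/5.1/loadcaffe" 2>/dev/null \
  || echo "loadcaffe not found under $TORCH_ROOT; possibly a second Torch install"
```

If the root printed here differs from where luarocks reports installing loadcaffe, that would explain the module-not-found error.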
Thanks for the info. I just checked and they are both in the same directory: /home/neural/torch/install/bin/th. Hmm, I guess I will have to do a clean install of Ubuntu. When you installed yours, did anything deviate from the instructions? Also, just curious: how is the performance compared to your last Titan X?
Ubuntu 16.04 ships with a newer version of gcc than the CUDA installer expects, so you have to hack around the installer's compiler check during CUDA installation. Apart from that I didn't have to do anything out of the ordinary to get it installed. Funny you should ask about speed; I just updated the neural-style benchmarks: https://github.com/jcjohnson/neural-style#speed and I did more extensive benchmarking of various CNNs here: https://github.com/jcjohnson/cnn-benchmarks Overall the new Titan X is about 1.5x faster than the old one!
Note that apparently they released a patch on August 8th to support gcc 5.4: "This patch supports GCC 5.4 as one of the host compilers to use with CUDA 8 RC."
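The compiler mismatch described above can be sketched as a quick pre-flight check. The "5.x" ceiling here reflects only what the thread says (the RC rejects Ubuntu 16.04's default gcc, and the August 8 patch adds gcc 5.4 support); it is not taken from NVIDIA documentation.

```shell
#!/bin/sh
# Sketch: approximate the host-compiler check the CUDA 8.0 RC installer
# performs. Version ceiling is assumed from this thread, not from NVIDIA docs.

# Fall back to 5.4.0 (Ubuntu 16.04's eventual default) if gcc is absent.
GCC_VERSION="$(gcc -dumpversion 2>/dev/null || echo 5.4.0)"
GCC_MAJOR="${GCC_VERSION%%.*}"

if [ "$GCC_MAJOR" -gt 5 ]; then
  echo "gcc $GCC_VERSION is newer than the CUDA 8.0 RC supports"
  echo "options: install gcc-5, or pass --override to the .run installer"
else
  echo "gcc $GCC_VERSION should work once the gcc 5.4 patch is applied"
fi
```

The `--override` route (skipping the check entirely) is the one a later comment in this thread reports using successfully.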
50% faster is definitely an improvement, but I guess I am a bit disappointed after buying into their marketing hype that deep learning could be accelerated 10-fold with the Pascal architecture. That's what I get for believing marketing, I guess! ;)
The 10x was absolutely a stretch: they arrived at that number by factoring in a 2x boost from moving to FP16 (which is supported on the P100 but not the Titan X) plus a multiplier from better multi-GPU parallelism via NVLink (which might give more efficient data parallelism during training, but doesn't help neural-style). Overall I'm quite happy with a 1.5x boost; also note that cuDNN 5 is quite a bit more efficient than cuDNN 4 (thanks to Winograd convolutions, maybe?), so a Pascal Titan X with cuDNN 5 is almost 2x faster than a Maxwell Titan X with cuDNN 4. Were you able to get your new card up and running?
Hey Jcj, how large of a picture can you process now with the 12GB? |
Ah, I didn't know they crippled that feature. So if it's 1.5x faster now, another 2x would have made it roughly 3x faster? That would have been really nice! I guess the good news is that time on a Pascal Quadro would be that much faster. Dumb question: does Torch utilize cuDNN? Yes, I got it running on Ubuntu 16.04. Instead of using the hack you referenced, I used the --override flag to disable the GCC version check for the CUDA 8.0 install, then installed the patch I referenced above.
Fast FP16 on Pascal requires special hardware that for now is only present in the Tesla P100, so it's not as if the new Titan X is artificially limited here. Torch has cuDNN support through the cudnn.torch package. You can enable cuDNN in neural-style with the corresponding flag.
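For reference, enabling cuDNN in neural-style looks like the sketch below. The `-backend cudnn` flag comes from the neural-style README rather than from this thread, and the image filenames are placeholders; `DRY_RUN=echo` just prints the command, since a machine reading this may not have Torch or CUDA installed.

```shell
#!/bin/sh
# Sketch: running neural-style with the cuDNN backend enabled.
# -backend cudnn is taken from the neural-style README (an assumption here);
# style.jpg / content.jpg are placeholder filenames.

DRY_RUN=echo   # set DRY_RUN="" to actually run on a machine with Torch + CUDA

$DRY_RUN th neural_style.lua \
  -style_image style.jpg \
  -content_image content.jpg \
  -backend cudnn
```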
Oh yeah, duh, sorry, I'm a bit rusty: I haven't used it in a while without my shell scripts that add those flags for me!
Has anyone tried using plain cunn with CUDA 8.0 RC on a Pascal card? I just installed my 1070 on a fresh Ubuntu 14.04 yesterday (only four hours of struggling, mostly before I found out that 7.5 will not work; the rest was getting around the login loop, as I want to use the onboard Intel adapter for display and the NVIDIA card for CUDA only). Now torch/cudnn/neural-style work fine, while leaving cuDNN out results in an error. cunn.test() gives similar errors. I should probably post on the torch/cunn project page, but I wanted to check first whether cunn works for the rest of you. Update: I have now reported this as torch/cunn#321 because, as far as I can see, my installation is otherwise fine and the problem consistently occurs with cunn.test().
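The two smoke tests this comment distinguishes between can be sketched as one-liners. The `th -e` invocations mirror the comment (cudnn works, plain cunn fails); `DRY_RUN=echo` prints the commands instead of executing them, since they need a working Torch/CUDA install.

```shell
#!/bin/sh
# Sketch: isolate whether the failure is in plain cunn or in the cuDNN path.

DRY_RUN=echo   # set DRY_RUN="" to actually run on a machine with Torch + CUDA

# 1. cuDNN path, which the comment reports as working:
$DRY_RUN th -e "require 'cudnn'"

# 2. plain cunn path, which failed on the 1070 (reported as torch/cunn#321):
$DRY_RUN th -e "require 'cunn'; cunn.test()"
```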
@jcjohnson do you have your older Titan X working in the same machine as your new Pascal Titan X? When I boot up, the "GeForce GTX" green name lights up on both cards (last-gen Titan X and Pascal Titan X), but as soon as I run nvidia-smi, the light turns off on the older (non-Pascal) Titan X. I tried reinstalling NVIDIA-Linux-x86_64-367.35.run and the same thing happens. Any ideas?
@htoyryla I've used plain cunn with CUDA 8.0 RC on both the Pascal Titan X and on the 1080 without any problems. @3DTOPO My Pascal X is in a new machine where it is the only card; unfortunately I don't have access to my Maxwell X at the moment to see if I have the same issue as you. However I'll eventually want to run them both in the same machine, so I'd love to know of any workaround you find to make that work. |
@htoyryla same as jcjohnson - both are working here. @jcjohnson I posted a question over at the NVIDIA forums; I will let you know what I learn.
cunn now works for me too, after I reinstalled everything from nn onward.
I received a new Pascal Titan X today and haven't been able to get things running.
I installed the video driver, then CUDA 8.0 RC, and got a Torch error trying to run neural-style. So I reinstalled its dependencies, cloned the latest version of Torch, installed it, and then got this error:
Then I reinstalled loadcaffe, but I am still getting the same error as above. Any suggestions?