Troubles updating to CUDA 8.0RC #306

Closed
3DTOPO opened this issue Aug 10, 2016 · 14 comments

@3DTOPO

3DTOPO commented Aug 10, 2016

I received a new Pascal Titan X today and haven't been able to get things running.

I installed the video driver, then CUDA 8.0RC and got a Torch error trying to run neural-style. So I reinstalled its dependencies, cloned the latest version of Torch, installed it and then got the error:

/home/prime/torch/install/share/lua/5.1/trepl/init.lua:384: module 'loadcaffe' not found:No LuaRocks module found for loadcaffe

Then I reinstalled loadcaffe but I am still getting the same error as above. Any suggestions?

@jcjohnson
Owner

I'm running a Pascal Titan X on a fresh install of Ubuntu 16.04 with the CUDA 8.0 RC and I didn't have any problems like that, so I don't think it has to do with the new GPU or the new version of CUDA.

Is it possible that you have multiple torch installs floating around, and that luarocks installed loadcaffe into one torch install, while th points at a different torch install? Check which th and which luarocks and make sure they point to binaries in the same directory.
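In concrete terms, the check might look like this (a minimal sketch; the hard-coded paths are the ones from this thread and will differ on your machine — in practice you'd use `$(which th)` and `$(which luarocks)`):

```shell
# Compare the install directories that th and luarocks resolve to.
# Example paths from this thread, used here for illustration.
th_path="/home/neural/torch/install/bin/th"
luarocks_path="/home/neural/torch/install/bin/luarocks"

if [ "$(dirname "$th_path")" = "$(dirname "$luarocks_path")" ]; then
  echo "same Torch install"
else
  echo "different Torch installs: rocks installed by this luarocks won't be visible to th"
fi
```

If they differ, either fix your PATH or run `luarocks install loadcaffe` using the luarocks binary from the same install that th points at.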

@3DTOPO
Author

3DTOPO commented Aug 10, 2016

Thanks for the info. I just checked and they are both in the same directory:

/home/neural/torch/install/bin/th
/home/neural/torch/install/bin/luarocks

Hmm, I guess I will have to do a clean install of Ubuntu. When you installed yours, did anything deviate from the instructions?

Just curious: how is the performance compared to your last Titan X?

@jcjohnson
Owner

Ubuntu 16.04 ships with a newer version of gcc than the CUDA installer expects, so you need to hack it a bit during CUDA installation:

https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775/

Apart from that I didn't have to do anything out of the ordinary to get it installed.
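For reference, one common shape of that hack (an assumption about the exact approach — the linked article has the details) is to install an older gcc/g++ that CUDA 8.0 RC accepts and expose it to nvcc via symlinks in the CUDA bin directory:

```shell
# Hedged sketch: give nvcc an older host compiler on Ubuntu 16.04.
# Package and path names are illustrative; adjust to your system.
sudo apt-get install gcc-4.9 g++-4.9
sudo ln -s /usr/bin/gcc-4.9 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/local/cuda/bin/g++
```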

Funny you should ask about speed; I just updated the neural-style benchmarks:

https://github.com/jcjohnson/neural-style#speed

and I did more expensive benchmarking of various CNNs here:

https://github.com/jcjohnson/cnn-benchmarks

Overall the new Titan X is about 1.5x faster than the old one!

@3DTOPO
Author

3DTOPO commented Aug 10, 2016

Ubuntu 16.04 ships with a newer version of gcc than the CUDA installer expects, so you need to hack it a bit during CUDA installation

Note that apparently they released a patch on August 8th to support gcc 5.4:

"This patch supports GCC 5.4 as one of the host compilers to use with CUDA 8 RC."

Overall the new Titan X is about 1.5x faster than the old one!

50% faster is definitely an improvement - but I guess I am a bit disappointed after buying into their marketing hype that deep learning could be accelerated 10-fold with the Pascal architecture. That's what I get for believing marketing, I guess!

;)

@jcjohnson
Owner

The 10x was absolutely a stretch. They arrived at that number by factoring in a 2x boost from moving to FP16 (which is supported on the P100 but not the Titan X), plus a further multiplier from better multi-GPU parallelism via NVLink (which might give more efficient data parallelism during training, but doesn't help neural-style). Overall I'm quite happy with a 1.5x boost; also note that cuDNN 5 is quite a bit more efficient than cuDNN 4 (thanks to Winograd convolutions, maybe?), so a Pascal Titan X with cuDNN 5 is almost 2x faster than a Maxwell Titan X with cuDNN 4.

Were you able to get your new card up and running?

@nthverse

Hey Jcj, how large of a picture can you process now with the 12GB?

@3DTOPO
Author

3DTOPO commented Aug 11, 2016

Ah, I didn't know they crippled that feature. So if it's 1.5x faster now, another 2x would have made it roughly 3x faster? That would have been really nice! I guess the good news is that a Pascal Quadro would be that much faster.

Dumb question. Does Torch utilize cuDNN?

Yes; I got it running on Ubuntu 16.04. Instead of using the hack you referenced, I used the --override flag to disable the GCC version check for the CUDA 8.0 install. Then I installed the patch I referenced above.
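For anyone else hitting this, the shape of that install was roughly the following (the runfile and patch filenames are illustrative; use the ones you actually downloaded from NVIDIA):

```shell
# Run the CUDA 8.0 RC installer with the gcc version check disabled,
# then apply NVIDIA's patch runfile for gcc 5.4 support.
sudo sh cuda_8.0.27_linux.run --override
sudo sh cuda_8.0.27.1_linux.run
```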

@3DTOPO 3DTOPO closed this as completed Aug 11, 2016
@jcjohnson
Owner

Fast FP16 on Pascal requires special hardware that for now is only present in the Tesla P100, so it's not as if the new Titan X is artificially limited here.

Torch has cuDNN support through the cudnn.torch package. You can enable cuDNN in neural-style with the flag -backend cudnn; you should probably also add the -cudnn_autotune flag.
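Put together, a cuDNN-enabled neural-style run looks like this (the image filenames are placeholders; the flags are the ones named above):

```shell
# Run neural-style with the cuDNN backend and autotuning enabled.
# content.jpg / style.jpg are placeholder inputs.
th neural_style.lua \
   -content_image content.jpg \
   -style_image style.jpg \
   -backend cudnn \
   -cudnn_autotune
```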

@3DTOPO
Author

3DTOPO commented Aug 11, 2016

Torch has cuDNN support through the cudnn.torch package. You can enable cuDNN in neural-style with the flag -backend cudnn; you should probably also add the -cudnn_autotune flag.

Oh yeah duh, sorry I'm a bit rusty - I haven't used it without my shell scripts that add those flags for me in a while!

@htoyryla

htoyryla commented Aug 13, 2016

Has anyone tried using plain cunn with CUDA 8.0 RC on a Pascal card? I just installed my 1070 on a fresh Ubuntu 14.04 yesterday (only four hours of struggling... mostly before I found out that 7.5 will not work; the rest was getting around the login loop, as I want to use the onboard Intel adapter for display and the NVIDIA card for CUDA only).

Now, torch/cudnn/neural-style work fine, while leaving cuDNN out results in an error. Also, cunn.test() gives similar errors. I should probably post on the torch/cunn project page, but I just wanted to check first whether cunn works for the rest of you.

Update: I have now reported this as torch/cunn#321 because as far as I can see, my installation is ok otherwise and the problem consistently occurs with cunn.test().
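For anyone wanting to reproduce the check, a quick smoke test from the shell (assuming a working Torch install and a CUDA-capable GPU) looks like:

```shell
# Load plain cunn and run its test suite via the th interpreter.
th -e "require 'cunn'"               # should exit cleanly if cunn loads
th -e "require 'cunn'; cunn.test()"  # runs the cunn unit tests on the GPU
```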

@3DTOPO
Author

3DTOPO commented Aug 14, 2016

@jcjohnson do you have your older Titan X working in the same machine as your new Pascal X?

When I boot up, the "GeForce GTX" green name lights up on both cards (the last-gen Titan X and the Pascal Titan X), but as soon as I run nvidia-smi, the light on the older (non-Pascal) Titan X turns off. I tried reinstalling NVIDIA-Linux-x86_64-367.35.run and the same thing happens. Any ideas?

@jcjohnson
Owner

@htoyryla I've used plain cunn with CUDA 8.0 RC on both the Pascal Titan X and on the 1080 without any problems.

@3DTOPO My Pascal X is in a new machine where it is the only card; unfortunately I don't have access to my Maxwell X at the moment to see if I have the same issue as you. However I'll eventually want to run them both in the same machine, so I'd love to know of any workaround you find to make that work.

@3DTOPO
Author

3DTOPO commented Aug 15, 2016

@htoyryla same as jcjohnson - both are working here.

@jcjohnson I posted a question over at the NVIDIA forums; I will let you know what I learn.

@htoyryla

cunn now works for me too, after I reinstalled everything from nn onward.
