70b models on rocm endless prompt processing / gpu errors #3896
Comments
Weird question, does |
It seems to work, but prompt processing also appears to get slower as more layers are offloaded to the GPU. The settings I tried:
--n-gpu-layers 13 --memory-f32
--n-gpu-layers 1 --memory-f32
--n-gpu-layers 0 --memory-f32
|
I've also had some odd issues with 70B models and long prompts, similar to what you describe. When I tried [...] I'm not completely sure this fixes it, since I haven't done extensive testing, but disabling "Resizable BAR" support and "Over 4GB PCI" support in my BIOS seemed to make a difference. If you're comfortable changing BIOS settings, you could try that. |
Today I'm actually not sure whether this is just a symptom of RAM filling up. Yesterday I tried koboldcpp-rocm with |
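One way to test the "RAM filling up" theory is to look at `/proc/meminfo` while the prompt is being processed. The following is a minimal sketch, assuming a Linux system; the helper names `parse_meminfo` and `memory_pressure` are made up for illustration and are not part of llama.cpp or koboldcpp.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(info):
    """Return (fraction of RAM still available, swap in use in kB)."""
    available_fraction = info["MemAvailable"] / info["MemTotal"]
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return available_fraction, swap_used_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `memory_pressure(read_meminfo())` a few times during prompt processing shows whether available RAM drops toward zero and swap usage grows. With zram enabled, swap usage counts compressed pages held in RAM, so the numbers are only a rough indicator.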
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Expected Behavior
I've had the same issues with the koboldcpp-rocm fork, so I tried llama.cpp to see if it's better, but the same thing happens here.
With koboldcpp and clblast, prompt processing takes a few seconds, and I get around 0.8-1 token/s on 70b models.
Current Behavior
I run a small model with
llama.cpp-server -m mistral-7b-instruct-v0.1.Q4_0.gguf --n-gpu-layers 15
go to the web UI, and type "Tell me a joke". Processing the prompt is pretty much instant, and I get lots of tokens/s, as expected.
Then I run a 70b model like
llama.cpp-server -m euryale-1.3-l2-70b.Q5_K_M.gguf --n-gpu-layers 15
(with koboldcpp-rocm I tried a few different 70b models and none worked). It loads fine and resource usage looks good: 13403/16247 MB VRAM used, and RAM seems fine too (I'm trying zram right now, so exact usage isn't very meaningful, but I know the model fits into my 64 GB). In the web UI, the same prompt "Tell me a joke" now processes for at least several minutes (I never waited long enough for it to start generating). CPU usage is high, around 1300%, the entire time, and htop is very slow for some reason.
After a while I get
Memory access fault by GPU node-1 (Agent handle: 0x55c2bca3a160) on address 0x7fe03c665000. Reason: Page not present or supervisor privilege.
and
Environment and Context
$ lscpu
$ uname -a
6.5.9-arch2-1
Archlinux with rocm-core 5.6.1
Radeon 6900 XT
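For picking a value for --n-gpu-layers on a 16 GB card, a back-of-the-envelope estimate can help. The sketch below assumes all layers are roughly the same size and that the weights dominate VRAM use; it ignores the KV cache and scratch buffers, so treat the result as an upper bound. The function names, the model file size, and the reserve are hypothetical placeholders, not values reported in this issue.

```python
def estimate_offload_mb(model_file_mb, total_layers, gpu_layers):
    """Rough VRAM (MB) for the offloaded weights, assuming equally sized layers."""
    if not 0 <= gpu_layers <= total_layers:
        raise ValueError("gpu_layers must be between 0 and total_layers")
    return model_file_mb * gpu_layers / total_layers


def max_layers_that_fit(model_file_mb, total_layers, vram_mb, reserve_mb):
    """Largest layer count whose estimated weights fit in vram_mb minus reserve_mb."""
    per_layer_mb = model_file_mb / total_layers
    usable_mb = max(vram_mb - reserve_mb, 0)
    return min(total_layers, int(usable_mb // per_layer_mb))


# Assumed inputs: an 80-layer 70b model whose GGUF file is about 48750 MB,
# and 4000 MB kept free for context buffers and the desktop.
# Only the 16247 MB VRAM total comes from the report above.
weights_mb = estimate_offload_mb(48750, 80, 15)
layer_limit = max_layers_that_fit(48750, 80, vram_mb=16247, reserve_mb=4000)
```

If the estimate for 15 layers comes out well below the 13403 MB actually in use, the difference is going to buffers or other processes on the GPU, which is worth checking before raising the layer count.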