
[Bug] dequantize_row_q4_0 segfaults #791

Closed

sha0coder opened this issue Apr 5, 2023 · 5 comments

@sha0coder

Environment and Context

Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
g++ (Debian 10.2.1-6) 10.2.1 20210110
GNU Make 4.3

Failure Information (for bugs)

main segfaults at dequantize_row_q4_0+48

Steps to Reproduce

./main -m models/ggml-vocab-q4_0.bin

~/s/llama.cpp ❯❯❯ gdb main
(gdb) r -m models/ggml-vocab-q4_0.bin
Starting program: /home/sha0/soft/llama.cpp/main -m models/ggml-vocab-q4_0.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680724006
llama_model_load: loading model from 'models/ggml-vocab-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 0.41 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 1792.49 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-vocab-q4_0.bin'
llama_model_load: model size = 0.00 MB / num tensors = 0
llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0

[New Thread 0x7fff77560700 (LWP 142639)]
[New Thread 0x7fff76d5f700 (LWP 142640)]
[New Thread 0x7fff7655e700 (LWP 142641)]
[New Thread 0x7fff75d5d700 (LWP 142642)]
[New Thread 0x7fff7555c700 (LWP 142643)]
[New Thread 0x7fff74d5b700 (LWP 142644)]
[New Thread 0x7fff7455a700 (LWP 142645)]
[New Thread 0x7fff73d59700 (LWP 142646)]
[New Thread 0x7fff73558700 (LWP 142647)]
[New Thread 0x7fff72d57700 (LWP 142648)]
[New Thread 0x7fff72556700 (LWP 142649)]
[New Thread 0x7fff71d55700 (LWP 142650)]
[New Thread 0x7fff71554700 (LWP 142651)]
[New Thread 0x7fff70d53700 (LWP 142652)]
[New Thread 0x7fff70552700 (LWP 142653)]

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x000055555555e430 in dequantize_row_q4_0 ()
(gdb) bt
#0 0x000055555555e430 in dequantize_row_q4_0 ()
#1 0x0000555555567585 in ggml_compute_forward_get_rows ()
#2 0x000055555556fba3 in ggml_graph_compute ()
#3 0x0000555555578eca in llama_eval_internal(llama_context&, int const*, int, int, int) ()
#4 0x000055555557919f in llama_eval ()
#5 0x000055555555c1aa in main ()
(gdb) x/i $pc
=> 0x55555555e430 <dequantize_row_q4_0+48>: vpmovzxbw 0x4(%rdi),%ymm1
(gdb) i r rdi
rdi 0xa00 2560
(gdb) i r ymm1
ymm1 {v16_bfloat16 = {0x180, 0x0, 0x0, 0x0, 0x180, 0x0 <repeats 11 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc0, 0x43, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0, 0x43, 0x0 <repeats 22 times>}, v16_int16 = {0x43c0, 0x0, 0x0, 0x0, 0x43c0, 0x0 <repeats 11 times>}, v8_int32 = {0x43c0, 0x0, 0x43c0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x43c0, 0x43c0, 0x0, 0x0}, v2_int128 = {0x43c000000000000043c0, 0x0}}
(gdb)
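
The faulting instruction, vpmovzxbw 0x4(%rdi),%ymm1, loads 16 bytes from rdi+4 and zero-extends each byte to a 16-bit word. But rdi holds 0xa00 (2560), a small integer rather than a mapped address, so the vx argument passed to dequantize_row_q4_0 is bogus. The +4 offset is consistent with the q4_0 block layout: a 4-byte float scale followed by the packed nibbles. A hedged sketch of that layout; QK == 32 and the qs field name are assumptions consistent with the source lines quoted below:

#include <stdint.h>

// Sketch of the q4_0 block layout in ggml.c at the time (hedged; QK == 32
// and the qs field name are assumptions consistent with the quoted source).
#define QK 32
typedef struct {
    float   d;          // scale factor: 4 bytes at offset 0
    uint8_t qs[QK / 2]; // 16 packed 4-bit quants, starting at offset 4
} block_q4_0;

Each 32-element block would then occupy 20 bytes; the AVX2 path skips the scale at offset 0 and starts reading quants at offset 4, which is exactly where the bad pointer is first dereferenced.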

@sha0coder
Author

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
dequantize_row_q4_0 (vx=<optimized out>, y=<optimized out>, k=k@entry=4096) at ggml.c:987
987 __m256i vx8 = bytesFromNibbles(pp+l/2);

and without AVX2 the crash is here:
Thread 1 "main" received signal SIGSEGV, Segmentation fault.
dequantize_row_q4_0 (vx=<optimized out>, y=<optimized out>, k=k@entry=4096) at ggml.c:1067
1067 const float d = x[i].d;
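
Both crash sites are the first dereference of vx for the current block: the AVX2 path reads the packed quants through pp (pointing into x[i].qs), and the scalar path reads the scale x[i].d. A minimal sketch of the scalar loop, approximating the ggml.c code of that era and reusing the block_q4_0 layout sketched above (the -8 recentering is an assumption about q4_0 semantics):

// Approximate scalar dequantization loop around ggml.c:1067 (hedged sketch).
// With a bogus vx, the very first read of x[i].d faults.
static void dequantize_row_q4_0_ref(const void * vx, float * y, int k) {
    const block_q4_0 * x = (const block_q4_0 *) vx;
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        const float d = x[i].d;        // crash site without AVX2 (ggml.c:1067)
        const uint8_t * pp = x[i].qs;  // the AVX2 path reads from here (ggml.c:987)

        for (int l = 0; l < QK; l += 2) {
            const uint8_t vi = pp[l/2];
            y[i*QK + l + 0] = ((vi & 0x0F) - 8) * d; // low nibble
            y[i*QK + l + 1] = ((vi >>   4) - 8) * d; // high nibble
        }
    }
}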


@sha0coder
Author

The vx pointer holds a wrong value:

[screenshot of the debugger session]

@slaren
Collaborator

slaren commented Apr 5, 2023

llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing

You cannot eval with a vocab-only model.
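
For context: models/ggml-vocab-q4_0.bin carries only the vocabulary (hence the warning above and model size = 0.00 MB / num tensors = 0 in the load log), so the token-embedding tensor that ggml_compute_forward_get_rows tries to dequantize has no data behind it, and eval dereferences an invalid pointer. A guard along these lines would turn the crash into an error message; this is a hypothetical sketch, and n_loaded is an assumed field, not actual llama.cpp API:

// Hypothetical guard before eval; model.n_loaded is an assumed field,
// not actual llama.cpp API.
if (model.n_loaded == 0) {
    fprintf(stderr, "%s: model contains no tensors, cannot eval\n", __func__);
    return false;
}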

@sha0coder
Author

Where can I get a proper model?

@slaren
Collaborator

slaren commented Apr 5, 2023

I cannot help you with that, but there are some details in the official repository: https://github.com/facebookresearch/llama/

@slaren closed this as not planned Apr 5, 2023