Segmentation fault during IQ3_XS generation. #6597

Closed
schmorp opened this issue Apr 10, 2024 · 1 comment

Comments


schmorp commented Apr 10, 2024

While quantizing this model:

https://huggingface.co/ibivibiv/hydra-moe-120b

Using the imatrix in this repository: https://huggingface.co/mradermacher/hydra-moe-120b-i1-GGUF

quantize segfaulted after successfully generating other quants (such as Q6_K and IQ3_XXS).

The quantize output and a GDB session on the core file are below; let me know if you want more information:

[ 144/1143] blk.7.ffn_down.1.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 145/1143] blk.7.ffn_up.1.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 146/1143] blk.7.ffn_gate.2.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 147/1143] blk.7.ffn_down.2.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 148/1143] blk.7.ffn_up.2.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 149/1143] blk.7.ffn_gate.3.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 150/1143] blk.7.ffn_down.3.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 151/1143] blk.7.ffn_up.3.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. /root/s2/quantize: line 166: 348502 Segmentation fault (core dumped) "$QUANTIZE" --allow-requantize $IMATRIX "$srcgguf" ./"$OUT.$HOSTNAME~" "$qmethod"
[Exit 139 (SEGV)]
kaos /tmp# gdb llama.cpp/build/bin/quantize core
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama.cpp/build/bin/quantize...
[New LWP 348502]
[...]
[New LWP 348512]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.debian.net
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /lib/x86_64-linux-gnu/libblas.so.3
Downloading separate debug info for system-supplied DSO at 0x7fff319f5000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `llama.cpp/build/bin/quantize --allow-requantize --imatrix hydra-moe-1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 quantize_row_iq3_xxs_impl.constprop.1 (x=&lt;optimized out&gt;, vy=&lt;optimized out&gt;, n=&lt;optimized out&gt;, quant_weights=0x558f469c74c0, grid_size=256) at llama.cpp/ggml-quants.c:11414
11414 int grid_index = kmap_q3xs[u];
[Current thread is 1 (Thread 0x7fb22bae7740 (LWP 348502))]
(gdb) bt
#0 quantize_row_iq3_xxs_impl.constprop.1 (x=&lt;optimized out&gt;, vy=&lt;optimized out&gt;, n=&lt;optimized out&gt;, quant_weights=0x558f469c74c0, grid_size=256) at llama.cpp/ggml-quants.c:11414
#1 0x0000558f44c1d6b3 in quantize_iq3_xxs (quant_weights=0x558f469c74c0, n_per_row=&lt;optimized out&gt;, nrow=&lt;optimized out&gt;, dst=&lt;optimized out&gt;, src=&lt;optimized out&gt;) at llama.cpp/ggml-quants.c:11463
#2 ggml_quantize_chunk (type=GGML_TYPE_IQ3_XXS, src=0x7f56556bc060, dst=0x7f479d7ff010, start=0, nrows=3, n_per_row=&lt;optimized out&gt;, imatrix=0x558f469c74c0) at llama.cpp/ggml.c:20367
#3 0x0000558f44bfd856 in operator() (__closure=&lt;optimized out&gt;) at llama.cpp/llama.cpp:13381
#4 llama_tensor_quantize_internal (nthread=8, workers=std::vector of length 7, capacity 8 = {...}, imatrix=&lt;optimized out&gt;, n_per_row=7168, nrows=20480, chunk_size=21504, new_data=0x7f479d7ff010,
f32_data=0x7f56556bc060, new_type=&lt;optimized out&gt;) at llama.cpp/llama.cpp:13387
#5 llama_model_quantize_internal (fname_inp="./hydra-moe-120b.gguf", fname_out="./hydra-moe-120b-i1-GGUF/hydra-moe-120b.i1-IQ3_XS.gguf.kaos~", params=params@entry=0x7fff319426c0)
at llama.cpp/llama.cpp:13698
#6 0x0000558f44bec19b in llama_model_quantize (params=0x7fff319426c0, fname_out=&lt;optimized out&gt;, fname_inp=0x558f45909c40 "./hydra-moe-120b.gguf") at llama.cpp/llama.cpp:14697
#7 main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at llama.cpp/examples/quantize/quantize.cpp:403
(gdb) inf thr
Id Target Id Frame

* 1 Thread 0x7fb22bae7740 (LWP 348502) quantize_row_iq3_xxs_impl.constprop.1 (x=&lt;optimized out&gt;, vy=&lt;optimized out&gt;, n=&lt;optimized out&gt;, quant_weights=0x558f469c74c0, grid_size=256)
    at llama.cpp/ggml-quants.c:11414
    2 Thread 0x7fb2247fc6c0 (LWP 348514) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ede0 <thread_status+480>) at ./nptl/futex-internal.c:57
    3 Thread 0x7f479a7fc6c0 (LWP 356080) 0x0000558f44c1a159 in iq3_find_best_neighbour(const uint16_t * restrict, const uint32_t * restrict, const float * restrict, const float * restrict, float, int8_t * restrict) (neighbours=&lt;optimized out&gt;, grid=0x7f478c005390, xval=&lt;optimized out&gt;, weight=&lt;optimized out&gt;, scale=0.00249774638, L=0x7f479a7ec790 "") at llama.cpp/ggml-quants.c:11243
    4 Thread 0x7fb225ffd6c0 (LWP 348513) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ed60 <thread_status+352>) at ./nptl/futex-internal.c:57
    5 Thread 0x7fb228fff6c0 (LWP 348511) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ec60 <thread_status+96>) at ./nptl/futex-internal.c:57
    6 Thread 0x7f47977fa6c0 (LWP 356082) iq3_find_best_neighbour(const uint16_t * restrict, const uint32_t * restrict, const float * restrict, const float * restrict, float, int8_t * restrict) (neighbours=, grid=0x7f478c005390, xval=, weight=, scale=0.00154302374, L=0x7f47977ea790 "\002") at llama.cpp/ggml-quants.c:11232
    7 Thread 0x7fb20fff96c0 (LWP 348518) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ef60 <thread_status+864>) at ./nptl/futex-internal.c:57
    8 Thread 0x7f4795ff96c0 (LWP 356077) clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
    9 Thread 0x7fb2117fa6c0 (LWP 348517) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60eee0 <thread_status+736>) at ./nptl/futex-internal.c:57
    10 Thread 0x7f47947f86c0 (LWP 356083) nearest_int (fval=&lt;optimized out&gt;) at llama.cpp/ggml-quants.c:1316
    11 Thread 0x7f479d7fe6c0 (LWP 356078) clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
    12 Thread 0x7f479bffd6c0 (LWP 356079) quantize_row_iq3_xxs_impl.constprop.1 (x=&lt;optimized out&gt;, vy=&lt;optimized out&gt;, n=&lt;optimized out&gt;, quant_weights=0x558f469c74c0, grid_size=256)
    at llama.cpp/ggml-quants.c:11367
    13 Thread 0x7fb212ffb6c0 (LWP 348516) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ee60 <thread_status+608>) at ./nptl/futex-internal.c:57
    14 Thread 0x7f4798ffb6c0 (LWP 356081) nearest_int (fval=0.0372944102) at llama.cpp/ggml-quants.c:1316
    15 Thread 0x7fb2277fe6c0 (LWP 348512) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ece0 <thread_status+224>) at ./nptl/futex-internal.c:57
    (gdb) l
    11409 for (int k = 0; k < 4; ++k) block_signs[k] = (~block_signs[k]) & 127;
    11410 }
    11411 for (int k = 0; k < 8; ++k) {
    11412 uint16_t u = 0;
    11413 for (int i = 0; i < 4; ++i) u |= (L[4*k+i] << 3*i);
    11414 int grid_index = kmap_q3xs[u];
    11415 if (grid_index < 0) {
    11416 printf("Oops: found point %u not on grid:", u);
    11417 for (int i = 0; i < 4; ++i) printf(" %d", L[4*k+i]);
    11418 printf("\n");
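
For context on the crash site: the loop above packs four 3-bit quantized values from L into a 12-bit index u and then reads kmap_q3xs[u]. Below is a small standalone sketch of that packing, not the actual ggml-quants.c code; the stand-in table and its 1 << 12 size are assumptions that simply follow from 4 values x 3 bits each. If a packed value ever leaves the 0..7 range (or the map pointer itself is invalid), the lookup at line 11414 reads out of bounds, which is one way to end up with the SIGSEGV shown in the backtrace.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch only: mirrors the index packing at ggml-quants.c:11413-11414.
 * kmap_stub is a hypothetical stand-in for kmap_q3xs; in the real code the
 * map returns a grid index, or a negative value when the point is not on
 * the grid (hence the "Oops: found point ... not on grid" branch). */
int main(void) {
    static int kmap_stub[1 << 12];    /* 4 values x 3 bits = 4096 possible packed points */
    for (int u = 0; u < (1 << 12); ++u) kmap_stub[u] = -1;

    uint8_t L[4] = { 1, 7, 3, 5 };    /* one group of quantized values, each expected in 0..7 */
    uint16_t u = 0;
    for (int i = 0; i < 4; ++i) u |= (uint16_t)(L[i] << 3*i);

    /* If some L[i] were > 7 (for example after a bad scale produced an
     * out-of-range quantized value), u could exceed 4095 and an unchecked
     * kmap lookup would read past the end of the table. */
    if (u < (1 << 12)) {
        printf("u = %u -> grid_index = %d\n", u, kmap_stub[u]);
    } else {
        printf("u = %u falls outside the 12-bit table\n", u);
    }
    return 0;
}
```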

This issue was closed because it has been inactive for 14 days since being marked as stale.
