[User] error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch #1655

Closed
Kangmo opened this issue May 31, 2023 · 5 comments

Kangmo commented May 31, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Compilation should succeed.

Current Behavior

Compilation fails with:
error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
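
For context: GCC generally reports "target specific option mismatch" when an always_inline intrinsic such as vfmaq_f16 needs a target feature that is not enabled for the calling function (for the f16 arithmetic intrinsics, that is the Armv8.2-A half-precision extension, +fp16). The sketch below is not the ggml code. It only illustrates one common way to guard such a call with the ACLE feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, so that a scalar fallback is compiled when the feature is absent. The helper name dot_f32 and the requirement that n be a multiple of 8 are assumptions made for illustration.

```c
/* Use the fp16 vector path only when the compiler says the target has it. */
#if defined(__aarch64__) && defined(__ARM_NEON) && \
    defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define HAVE_FP16_VEC 1
#else
#define HAVE_FP16_VEC 0
#endif

/* Hypothetical helper: dot product of two float arrays.
   Assumes n is a multiple of 8. */
static float dot_f32(const float *x, const float *y, int n) {
#if HAVE_FP16_VEC
    float16x8_t acc = vdupq_n_f16(0);
    for (int i = 0; i < n; i += 8) {
        /* Narrow 8 floats to half precision, then fused multiply-add. */
        float16x8_t a = vcombine_f16(vcvt_f16_f32(vld1q_f32(x + i)),
                                     vcvt_f16_f32(vld1q_f32(x + i + 4)));
        float16x8_t b = vcombine_f16(vcvt_f16_f32(vld1q_f32(y + i)),
                                     vcvt_f16_f32(vld1q_f32(y + i + 4)));
        acc = vfmaq_f16(acc, a, b); /* acc + a * b, lane by lane */
    }
    /* Widen back to float and sum the 8 lanes. */
    float32x4_t lo = vcvt_f32_f16(vget_low_f16(acc));
    float32x4_t hi = vcvt_f32_f16(vget_high_f16(acc));
    return vaddvq_f32(vaddq_f32(lo, hi));
#else
    /* Portable fallback: no fp16 intrinsics are referenced at all. */
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}
```

Both paths should agree when the inputs and products are exactly representable in half precision; with larger values the fp16 path may lose precision, since it accumulates in half precision.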

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Host PC : Mac M1 Max, Ventura 13.4
Docker container running in arm64v8/ubuntu:22.10

$ lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: 0x00
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 8
Socket(s): -
Cluster(s): 1
Stepping: 0x0
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected

  • Operating System, e.g. for Linux:

$ uname -a

Linux 6cd3c73970c8 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.7

$ make --version
GNU Make 4.3
Built for aarch64-unknown-linux-gnu

$ g++ --version
g++ (Ubuntu 12.2.0-3ubuntu1) 12.2.0

Failure Information (for bugs)

Compilation failed.

Steps to Reproduce

Just run make.

Failure Logs

# make
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  aarch64
I UNAME_M:  aarch64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -mcpu=native
I LDFLAGS:  
I CC:       cc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
I CXX:      g++ (Ubuntu 12.2.0-3ubuntu1) 12.2.0

cc  -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -mcpu=native   -c ggml.c -o ggml.o
ggml.c: In function 'ggml_graph_export_leaf':
ggml.c:14584:39: warning: format '%lld' expects argument of type 'long long int', but argument 6 has type 'int64_t' {aka 'long int'} [-Wformat=]
14584 |     fprintf(fout, "%-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %16p %16s\n",
      |                                   ~~~~^
      |                                       |
      |                                       long long int
      |                                   %8ld
......
14588 |             ne[0], ne[1], ne[2], ne[3],
      |             ~~~~~                      
      |               |
      |               int64_t {aka long int}
ggml.c:14584:45: warning: format '%lld' expects argument of type 'long long int', but argument 7 has type 'int64_t' {aka 'long int'} [-Wformat=]
14584 |     fprintf(fout, "%-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %16p %16s\n",
      |                                         ~~~~^
      |                                             |
      |                                             long long int
      |                                         %8ld
......
14588 |             ne[0], ne[1], ne[2], ne[3],
      |                    ~~~~~                     
      |                      |
      |                      int64_t {aka long int}
ggml.c:14584:51: warning: format '%lld' expects argument of type 'long long int', but argument 8 has type 'int64_t' {aka 'long int'} [-Wformat=]
14584 |     fprintf(fout, "%-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %16p %16s\n",
      |                                               ~~~~^
      |                                                   |
      |                                                   long long int
      |                                               %8ld
......
14588 |             ne[0], ne[1], ne[2], ne[3],
      |                           ~~~~~                    
      |                             |
      |                             int64_t {aka long int}
ggml.c:14584:57: warning: format '%lld' expects argument of type 'long long int', but argument 9 has type 'int64_t' {aka 'long int'} [-Wformat=]
14584 |     fprintf(fout, "%-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %16p %16s\n",
      |                                                     ~~~~^
      |                                                         |
      |                                                         long long int
      |                                                     %8ld
......
14588 |             ne[0], ne[1], ne[2], ne[3],
      |                                  ~~~~~                   
      |                                    |
      |                                    int64_t {aka long int}
ggml.c: In function 'ggml_graph_export_node':
ggml.c:14598:44: warning: format '%lld' expects argument of type 'long long int', but argument 7 has type 'int64_t' {aka 'long int'} [-Wformat=]
14598 |     fprintf(fout, "%-6s %-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %8d %16p %16s\n",
      |                                        ~~~~^
      |                                            |
      |                                            long long int
      |                                        %8ld
......
14603 |             ne[0], ne[1], ne[2], ne[3],
      |             ~~~~~                           
      |               |
      |               int64_t {aka long int}
ggml.c:14598:50: warning: format '%lld' expects argument of type 'long long int', but argument 8 has type 'int64_t' {aka 'long int'} [-Wformat=]
14598 |     fprintf(fout, "%-6s %-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %8d %16p %16s\n",
      |                                              ~~~~^
      |                                                  |
      |                                                  long long int
      |                                              %8ld
......
14603 |             ne[0], ne[1], ne[2], ne[3],
      |                    ~~~~~                          
      |                      |
      |                      int64_t {aka long int}
ggml.c:14598:56: warning: format '%lld' expects argument of type 'long long int', but argument 9 has type 'int64_t' {aka 'long int'} [-Wformat=]
14598 |     fprintf(fout, "%-6s %-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %8d %16p %16s\n",
      |                                                    ~~~~^
      |                                                        |
      |                                                        long long int
      |                                                    %8ld
......
14603 |             ne[0], ne[1], ne[2], ne[3],
      |                           ~~~~~                         
      |                             |
      |                             int64_t {aka long int}
ggml.c:14598:62: warning: format '%lld' expects argument of type 'long long int', but argument 10 has type 'int64_t' {aka 'long int'} [-Wformat=]
14598 |     fprintf(fout, "%-6s %-6s %-12s %8d %8lld %8lld %8lld %8lld %16zu %16zu %16zu %16zu %8d %16p %16s\n",
      |                                                          ~~~~^
      |                                                              |
      |                                                              long long int
      |                                                          %8ld
......
14603 |             ne[0], ne[1], ne[2], ne[3],
      |                                  ~~~~~                        
      |                                    |
      |                                    int64_t {aka long int}
ggml.c: In function 'ggml_graph_export':
ggml.c:14631:34: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'uint64_t' {aka 'long unsigned int'} [-Wformat=]
14631 |         fprintf(fout, "%-16s %8llu\n", "eval",    size_eval);
      |                              ~~~~^                ~~~~~~~~~
      |                                  |                |
      |                                  |                uint64_t {aka long unsigned int}
      |                                  long long unsigned int
      |                              %8lu
ggml.c: In function 'ggml_graph_import':
ggml.c:14865:9: warning: ignoring return value of 'fread' declared with attribute 'warn_unused_result' [-Wunused-result]
14865 |         fread(data->data, sizeof(char), fsize, fin);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ggml.c:168:
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h: In function 'ggml_vec_dot_f16':
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
      | ^~~~~~~~~
ggml.c:1645:37: note: called from here
 1645 |     #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
      |                                     ^~~~~~~~~~~~~~~~~~
ggml.c:1669:41: note: in expansion of macro 'GGML_F16x8_FMA'
 1669 |     #define GGML_F16_VEC_FMA            GGML_F16x8_FMA
      |                                         ^~~~~~~~~~~~~~
ggml.c:2151:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
 2151 |             sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
      |                      ^~~~~~~~~~~~~~~~
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
      | ^~~~~~~~~
ggml.c:1645:37: note: called from here
 1645 |     #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
      |                                     ^~~~~~~~~~~~~~~~~~
ggml.c:1669:41: note: in expansion of macro 'GGML_F16x8_FMA'
 1669 |     #define GGML_F16_VEC_FMA            GGML_F16x8_FMA
      |                                         ^~~~~~~~~~~~~~
ggml.c:2151:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
 2151 |             sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
      |                      ^~~~~~~~~~~~~~~~
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
      | ^~~~~~~~~
ggml.c:1651:22: note: called from here
 1651 |             x[2*i] = vaddq_f16(x[2*i], x[2*i+1]);                 \
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1672:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
 1672 |     #define GGML_F16_VEC_REDUCE         GGML_F16x8_REDUCE
      |                                         ^~~~~~~~~~~~~~~~~
ggml.c:2156:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
 2156 |     GGML_F16_VEC_REDUCE(sumf, sum);
      |     ^~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
      | ^~~~~~~~~
ggml.c:1654:22: note: called from here
 1654 |             x[4*i] = vaddq_f16(x[4*i], x[4*i+2]);                 \
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1672:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
 1672 |     #define GGML_F16_VEC_REDUCE         GGML_F16x8_REDUCE
      |                                         ^~~~~~~~~~~~~~~~~
ggml.c:2156:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
 2156 |     GGML_F16_VEC_REDUCE(sumf, sum);
      |     ^~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
      | ^~~~~~~~~
ggml.c:1657:22: note: called from here
 1657 |             x[8*i] = vaddq_f16(x[8*i], x[8*i+4]);                 \
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1672:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
 1672 |     #define GGML_F16_VEC_REDUCE         GGML_F16x8_REDUCE
      |                                         ^~~~~~~~~~~~~~~~~
ggml.c:2156:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
 2156 |     GGML_F16_VEC_REDUCE(sumf, sum);
      |     ^~~~~~~~~~~~~~~~~~~
make: *** [Makefile:211: ggml.o] Error 1
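
Separately from the build failure, the -Wformat warnings in the log above come from printing int64_t values with %lld: on this target int64_t is long int, not long long int. A minimal sketch of the portable alternative, using the PRId64 macro from <inttypes.h>, follows. format_dims is a hypothetical helper for illustration, not a function in ggml.c.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats four int64_t dimensions into buf.
   PRId64 expands to the correct length modifier for int64_t on the
   current platform ("ld" here, "lld" elsewhere). */
static int format_dims(char *buf, size_t size, const int64_t ne[4]) {
    return snprintf(buf, size,
                    "%8" PRId64 " %8" PRId64 " %8" PRId64 " %8" PRId64,
                    ne[0], ne[1], ne[2], ne[3]);
}
```

The same idea applies to the uint64_t warning, where PRIu64 would replace %llu.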


dcordi commented Jun 6, 2023

On my configuration with VirtualBox, I added the following lines to the source code to disable AVX2 support.

In the ggml.c file:

line 1528: #undef __AVX2__
just before the lines:
static void ggml_vec_dot_q4_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
static void ggml_vec_dot_q4_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
static void ggml_vec_dot_q5_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy);
...

In the ggml-quants-k.c file:

line 46: #undef __AVX2__
just before the lines:
//
// ===================== Helper functions
//
static inline int nearest_int(float fval) {
...

Then run make and away you go.
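
A minimal sketch of why this kind of workaround has an effect, assuming the macro in question is the compiler-predefined __AVX2__ and that the AVX2 code is selected with #if defined(__AVX2__) blocks. Once the macro is undefined, every later check in the same file takes the fallback branch. selected_path is a hypothetical helper, not part of ggml. Note that AVX2 is an x86 feature, so this would apply to x86 builds rather than to the NEON error reported above.

```c
/* Force the portable path even if the compiler predefined __AVX2__
   (for example via -mavx2 or -march=native). */
#undef __AVX2__

/* Hypothetical helper: reports which branch the preprocessor kept. */
static const char *selected_path(void) {
#if defined(__AVX2__)
    return "avx2";
#else
    return "scalar";
#endif
}
```

The #undef only affects code that follows it in the same translation unit; headers included earlier have already seen the macro, which is presumably why the line is placed just before the functions that use it.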


muxx commented Aug 21, 2023

I get the same error output in the same environment on the latest master.

MoonKraken added a commit to MoonKraken/rusty_llama that referenced this issue Sep 1, 2023
It doesn't work on macOS, due to an issue that seems to originate in
llama.cpp: ggerganov/llama.cpp#1655

AndreasKunar (Contributor) commented

@Kangmo, @muxx, @MoonKraken

I found a solution for using llama.cpp in Apple Silicon Linux VMs (and probably also Docker on Apple Silicon) without changing any code. Maybe this also solves your issue, and you can close it.

Just build with the following command for Apple Silicon Linux VMs:
UNAME_M=arm64 UNAME_p=arm LLAMA_NO_METAL=1 make

hkevicBB commented

> @Kangmo, @muxx, @MoonKraken
>
> I found a solution for using llama.cpp in Apple Silicon Linux VMs (and probably also Docker on Apple Silicon) without changing any code. Maybe this also solves your issue, and you can close it.
>
> Just build with the following command for Apple Silicon Linux VMs: UNAME_M=arm64 UNAME_p=arm LLAMA_NO_METAL=1 make

This also appears to work for a VirtualBox installation of Kali Linux.

github-actions bot added the stale label Mar 25, 2024

github-actions bot (Contributor) commented

This issue was closed because it has been inactive for 14 days since being marked as stale.
