llama : fix non-quantization of expert gating tensors #5754
This reverts a single line from #5475 and adds a comment.
Using `LLM_TN` here to get the tensor name can't work, because the layer number is not known, so the string compared to the actual tensor name contains `%d` instead of a layer number. It therefore never matches, and expert gating tensors are quantized anyway. I've added a comment so that it won't happen again by accident.
I noticed this when refactoring some Mamba-related code (in #5328): I changed a check that prevents some tensors from being quantized to use `LLM_TN` (since not using hard-coded strings seemed cleaner), and it no longer worked, while checking for the suffix (as was done before for the expert gating tensors) does work.

I've tested this with the smallest MoE model I could find, and it works (i.e. the `ffn_gate_inp.weight` tensors are not quantized, while on `master` they are, even though they should not be).