
llama : fix non-quantization of expert gating tensors #5754

Merged
1 commit merged into ggerganov:master on Feb 28, 2024

Conversation

compilade (Collaborator)

This reverts a single line from #5475 and adds a comment.

Using LLM_TN here to get the tensor name can't work because the layer number is not known: the string compared against the actual tensor name still contains %d instead of a layer number, so it never matches, and the expert gating tensors end up quantized anyway.

I've added a comment so that it won't happen again by accident.

I noticed this while refactoring some Mamba-related code (in #5328): I changed the checks that prevent some tensors from being quantized to use LLM_TN (since avoiding hard-coded strings seemed cleaner), and they stopped working, while checking for the name suffix (as was done before for the expert gating tensors) still works.

I've tested this with the smallest MoE model I could find, and it works: the ffn_gate_inp.weight tensors are no longer quantized, while on master they are, even though they should not be.
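A minimal sketch of the mismatch, assuming a tensor name of the form blk.0.ffn_gate_inp.weight and an LLM_TN-style pattern that keeps the %d placeholder (the names here are illustrative, not the exact strings built by llama.cpp):

```cpp
#include <cassert>
#include <string>

int main() {
    // Actual tensor name as it appears in the model (layer 0 here).
    const std::string name = "blk.0.ffn_gate_inp.weight";

    // What an LLM_TN-style lookup effectively compares against when the
    // layer number is not known: the "%d" placeholder is never substituted.
    // (Illustrative pattern, not the exact string built by llama.cpp.)
    const std::string pattern = "blk.%d.ffn_gate_inp.weight";

    // The exact comparison never matches, so the "do not quantize" rule is
    // silently skipped and the expert gating tensor gets quantized anyway.
    assert(name != pattern);

    // Checking for the suffix, as done before #5475 and restored by this PR,
    // matches regardless of the layer number.
    const bool skip_quantize = name.find("ffn_gate_inp.weight") != std::string::npos;
    assert(skip_quantize);

    return 0;
}
```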

@ggerganov (Owner) left a comment


Oh, sorry about that 🤦

@ggerganov ggerganov merged commit adcb12a into ggerganov:master Feb 28, 2024
59 checks passed
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024