
ollama loading glm-4-9b-chat outputs gibberish #521

Open
siegrainwong opened this issue Aug 30, 2024 · 2 comments

siegrainwong commented Aug 30, 2024

System Info

cuda: 12.6
transformers: 4.44.0
OS: win10
python: 3.11.4
ollama: 0.3.8 & 0.2.3
Hardware: RTX 3090, i7-12700KF

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Reproduction

  1. download the gguf model from https://www.modelscope.cn/models/llm-research/glm-4-9b-chat-gguf/files
  2. ollama create xxx
  3. ollama serve and open Open WebUI

Unless I click stop, it keeps generating forever. I haven't seen this on other models (gemma2-7b, yi-9b). Based on earlier reports I also downgraded to ollama 0.2.3, but the response was much the same.
(screenshot: gibberish output)
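For anyone trying to reproduce step 2, a minimal sketch of the `ollama create` flow is below. The model name `glm4-chat` and the `.gguf` filename are placeholders, not taken from the report; a mismatched or missing chat template in the Modelfile is one common cause of garbled output, so it is worth testing from the CLI before involving Open WebUI:

```shell
# Minimal Modelfile pointing at the downloaded GGUF file
# (filename is an assumption -- use the one you actually downloaded).
cat > Modelfile <<'EOF'
FROM ./glm-4-9b-chat.Q4_K_M.gguf
EOF

# Register the model under a local name, then serve and test it directly:
ollama create glm4-chat -f Modelfile
ollama serve &
ollama run glm4-chat "hello"
```

If the CLI output is already gibberish, the problem is in the GGUF/template or the ollama runtime, not in Open WebUI.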

Expected behavior

Running the original (non-GGUF) model works fine.
(screenshot: normal output from the original model)

@zhipuch
Collaborator

zhipuch commented Aug 30, 2024

#323
#333

@zhipuch zhipuch self-assigned this Aug 30, 2024
@siegrainwong
Author

I tried enabling flash attention; it didn't help.
(screenshots: flash attention enabled, output still garbled)
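For context, flash attention in ollama is toggled via an environment variable on the server process rather than a client flag; a sketch of how it is typically enabled (availability depends on the ollama version and backend):

```shell
# Linux/macOS: set the flag for the server process
OLLAMA_FLASH_ATTENTION=1 ollama serve

# Windows (PowerShell), as in the reporter's setup:
#   $env:OLLAMA_FLASH_ATTENTION = "1"
#   ollama serve
```

Since the gibberish persists with flash attention both on and off, the setting is unlikely to be the cause here.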
