Replies: 4 comments 1 reply
-
Hey, I'm the guy who made this sampler ^ I don't know the llama.cpp codebase at all but might be able to offer some pointers & possibly help with the coding. The antislop sampler requires these main things to function:
There aren't any other unusual requirements. Any caching is for optimisation and not strictly necessary. On transformers it performs well just using model.generate, without managing the cache. I was using key-cache values in the notebook you linked above, but only to mitigate a nasty VRAM leak, which has since been resolved by switching to the generate function; that usage of the key cache is being deprecated anyway. Long story short, you probably don't need to be messing with the cache to make this work.

From my brief skim, llama.cpp doesn't seem to have plugins or extensions for this kind of thing, so I suppose it's a matter of implementing it here:

Let me know if you have any questions about the sampler. Btw, the notebook you linked is an outdated/buggy implementation. These are the latest:
https://github.com/sam-paech/antislop-sampler/blob/main/antislop_generate.py
https://colab.research.google.com/drive/1Rd3V4AN31cDytfmY9u80rzHXPD_dS6x9?usp=sharing
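To make the core loop concrete, here is a toy sketch of the backtracking idea (this is illustrative only, not the actual antislop implementation; `toy_logits`, the two-token "phrase", and the greedy picker are all invented stand-ins): generate token by token, and when a banned phrase completes at the tail of the output, rewind to the position where it started and forbid its first token at that position.

```python
BANNED = [("a", "b")]  # hypothetical banned phrase, as a tuple of tokens

def toy_logits(context):
    # Stand-in for a model: prefers "a", and prefers "b" right after "a".
    if context and context[-1] == "a":
        return {"a": 0.0, "b": 2.0, "c": 1.0}
    return {"a": 2.0, "b": 1.0, "c": 0.0}

def generate(n_tokens):
    out = []
    banned_at = {}  # position -> set of tokens disallowed at that position
    while len(out) < n_tokens:
        scores = dict(toy_logits(out))
        for tok in banned_at.get(len(out), ()):  # apply positional bans
            scores[tok] = float("-inf")
        out.append(max(scores, key=scores.get))  # greedy pick
        for phrase in BANNED:  # did a banned phrase just complete?
            k = len(phrase)
            if tuple(out[-k:]) == phrase:
                start = len(out) - k
                banned_at.setdefault(start, set()).add(phrase[0])
                del out[start:]  # backtrack: discard the whole phrase
                break
    return out
```

The key point for a llama.cpp port is the `del out[start:]` step: the engine has to be able to rewind its state to `start` and re-decode from there, which is where the KV cache questions below come in.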
-
There's a logit bias option
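For context on why logit bias alone doesn't cover this: a logit bias is a fixed per-token offset applied before sampling, so it can suppress a single token everywhere, but it can't target a multi-token phrase at the position where it would begin. A minimal sketch (plain Python, illustrative names):

```python
import math

def apply_logit_bias(logits, bias):
    # Static per-token bias: the same offset is applied at every step.
    # logits: token -> score; bias: token -> additive offset.
    return {tok: score + bias.get(tok, 0.0) for tok, score in logits.items()}

logits = {"the": 1.2, "slop": 0.9, "a": 0.5}
biased = apply_logit_bias(logits, {"slop": -math.inf})  # hard-ban one token
assert max(biased, key=biased.get) == "the"
```

This bans "slop" unconditionally in every context, whereas the antislop approach only bans a token at positions where it would start a banned phrase, which is why it needs backtracking rather than a static bias.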
-
Currently I am using the LLAMA_API from Rust, and AFAIK the only way to decode is the llama_decode function, which automatically modifies the KV cache. Backtracking is implemented in llama-lookup, which uses llama_kv_cache_seq_rm. In my Rust program it works, more or less, but every time it backtracks it generates something different, which as far as I can tell should not happen. The sampler also does not provide any way to backtrack, since it has no access to the context and/or model; I guess those could be passed as parameters when registering the sampler, though.

As for antislop_generate.py, I played with it to the point where it started generating Vietnamese. I used the classic bomb test (I am not interested in bombs), and sometimes it even ignored sequences like "**Warning:".
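One likely (though unconfirmed) cause of the different output after each backtrack: with a stochastic sampler, rolling back the KV cache with llama_kv_cache_seq_rm does not roll back the sampler's RNG, so resampling the same position draws differently. The effect in miniature (plain Python, not llama.cpp):

```python
import math
import random

def sample(logits, rng):
    # Softmax sampling at temperature 1, using an explicit RNG instance.
    toks = list(logits)
    weights = [math.exp(logits[t]) for t in toks]
    return rng.choices(toks, weights=weights, k=1)[0]

logits = {"x": 1.0, "y": 1.0}

rng = random.Random(42)
first = sample(logits, rng)
second = sample(logits, rng)  # RNG has advanced; may differ from `first`

# Restoring the RNG state (here: reseeding) makes the retry reproducible:
rng = random.Random(42)
assert sample(logits, rng) == first
```

If reproducible backtracking is wanted, the sampler's RNG state would have to be snapshotted and restored alongside the KV cache rollback; with greedy sampling the issue would not appear at all.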
-
It seems that to get backtracking to work, the KV cache needs to be rewound using llama_kv_cache_seq_rm, and on top of that the logits and the sampler state need to be manually cached and restored. This means it probably can't be implemented as a regular sampler alongside the existing ones if it is to be implemented directly in llama.cpp.
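The pieces that have to be rolled back together could be bundled into one checkpointable state object. A sketch of that shape (hypothetical `GenState` class; the mapping to llama.cpp internals is an assumption, noted in the comments):

```python
import random

class GenState:
    # Hypothetical bundle of everything that must be rewound together on a
    # backtrack: token position (in llama.cpp this would mirror a
    # llama_kv_cache_seq_rm call), cached per-position logits, and the
    # sampler's RNG state.
    def __init__(self, seed=0):
        self.tokens = []
        self.logits_cache = []  # one logits snapshot per generated position
        self.rng = random.Random(seed)

    def checkpoint(self):
        return (len(self.tokens), list(self.logits_cache), self.rng.getstate())

    def restore(self, ckpt):
        n, logits_cache, rng_state = ckpt
        del self.tokens[n:]  # analogue of removing KV entries past position n
        self.logits_cache = logits_cache
        self.rng.setstate(rng_state)
```

The design point is that `checkpoint`/`restore` sit above the sampler, not inside it, which matches the observation that a regular llama.cpp sampler (no access to the context) can't do this on its own.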
-
After looking at this: https://github.com/sam-paech/antislop-sampler/blob/main/antislop_sampler.ipynb, I tried to achieve something similar using llama.cpp, but it does not work.
So far, to backtrack I've tried:
Any hints are welcome.