Replies: 4 comments 1 reply
-
Hey, I'm the guy who made this sampler ^ I don't know the llama.cpp codebase at all but might be able to offer some pointers & possibly help with the coding. The antislop sampler requires these main things to function:
There aren't any other unusual requirements. Any caching is for optimisation and not strictly necessary. On transformers it performs well just using model.generate, without managing the cache. I was using key-cache values in the notebook you linked above, but only to mitigate a nasty VRAM leak, which has since been resolved by switching to the generate function; that usage of the key cache is being deprecated anyway. Long story short, you probably don't need to be messing with the cache to make this work.

From my brief skim, llama.cpp doesn't seem to have plugins or extensions for this kind of thing, so I suppose it's a matter of implementing it here:

Let me know if you have any questions about the sampler. Btw, the notebook you linked is an outdated/buggy implementation. These are the latest:
https://github.com/sam-paech/antislop-sampler/blob/main/antislop_generate.py
https://colab.research.google.com/drive/1Rd3V4AN31cDytfmY9u80rzHXPD_dS6x9?usp=sharing
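To make the core loop concrete, here is a toy sketch of the backtracking idea (this is illustrative only, not the actual antislop implementation; `toy_logits`, the two-token "phrase", and the greedy picker are all invented stand-ins): generate token by token, and when a banned phrase completes at the tail of the output, rewind to the position where it started and forbid its first token at that position.

```python
BANNED = [("a", "b")]  # hypothetical banned phrase, as a tuple of tokens

def toy_logits(context):
    # Stand-in for a model: prefers "a", and prefers "b" right after "a".
    if context and context[-1] == "a":
        return {"a": 0.0, "b": 2.0, "c": 1.0}
    return {"a": 2.0, "b": 1.0, "c": 0.0}

def generate(n_tokens):
    out = []
    banned_at = {}  # position -> set of tokens disallowed at that position
    while len(out) < n_tokens:
        scores = dict(toy_logits(out))
        for tok in banned_at.get(len(out), ()):  # apply positional bans
            scores[tok] = float("-inf")
        out.append(max(scores, key=scores.get))  # greedy pick
        for phrase in BANNED:  # did a banned phrase just complete?
            k = len(phrase)
            if tuple(out[-k:]) == phrase:
                start = len(out) - k
                banned_at.setdefault(start, set()).add(phrase[0])
                del out[start:]  # backtrack: discard the whole phrase
                break
    return out
```

The key point for a llama.cpp port is the `del out[start:]` step: the engine has to be able to rewind its state to `start` and re-decode from there, which is where the KV cache questions below come in.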
-
There's a logit bias option
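For context on why logit bias alone doesn't cover this: a logit bias is a fixed per-token offset applied before sampling, so it can suppress a single token everywhere, but it can't target a multi-token phrase at the position where it would begin. A minimal sketch (plain Python, illustrative names):

```python
import math

def apply_logit_bias(logits, bias):
    # Static per-token bias: the same offset is applied at every step.
    # logits: token -> score; bias: token -> additive offset.
    return {tok: score + bias.get(tok, 0.0) for tok, score in logits.items()}

logits = {"the": 1.2, "slop": 0.9, "a": 0.5}
biased = apply_logit_bias(logits, {"slop": -math.inf})  # hard-ban one token
assert max(biased, key=biased.get) == "the"
```

This bans "slop" unconditionally in every context, whereas the antislop approach only bans a token at positions where it would start a banned phrase, which is why it needs backtracking rather than a static bias.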
-
Currently I am using the LLAMA_API from Rust, and AFAIK the only way to decode is the llama_decode function, which automatically modifies the KV cache. Backtracking is implemented in llama-lookup, which uses llama_kv_cache_seq_rm. In my Rust program it works, more or less, but every time it backtracks it generates something different, which as far as I can tell should not happen. The sampler also does not provide any way to backtrack, since it has no access to the context and/or model; I guess those could be passed as parameters when registering the sampler, though.

As for antislop_generate.py, I played with it to the point where it started generating Vietnamese. I used the classic bomb test (I am not interested in bombs), and sometimes it even ignored sequences like "**Warning:".
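One likely (though unconfirmed) cause of the different output after each backtrack: with a stochastic sampler, rolling back the KV cache with llama_kv_cache_seq_rm does not roll back the sampler's RNG, so resampling the same position draws differently. The effect in miniature (plain Python, not llama.cpp):

```python
import math
import random

def sample(logits, rng):
    # Softmax sampling at temperature 1, using an explicit RNG instance.
    toks = list(logits)
    weights = [math.exp(logits[t]) for t in toks]
    return rng.choices(toks, weights=weights, k=1)[0]

logits = {"x": 1.0, "y": 1.0}

rng = random.Random(42)
first = sample(logits, rng)
second = sample(logits, rng)  # RNG has advanced; may differ from `first`

# Restoring the RNG state (here: reseeding) makes the retry reproducible:
rng = random.Random(42)
assert sample(logits, rng) == first
```

If reproducible backtracking is wanted, the sampler's RNG state would have to be snapshotted and restored alongside the KV cache rollback; with greedy sampling the issue would not appear at all.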
-
It seems that to get backtracking to work, the KV cache needs to be rewound using llama_kv_cache_seq_rm, and on top of that the logits and the sampler state need to be manually cached and restored. This means it probably can't be implemented as a regular sampler alongside the existing ones if it is to be implemented directly in llama.cpp.
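The pieces that have to be rolled back together could be bundled into one checkpointable state object. A sketch of that shape (hypothetical `GenState` class; the mapping to llama.cpp internals is an assumption, noted in the comments):

```python
import random

class GenState:
    # Hypothetical bundle of everything that must be rewound together on a
    # backtrack: token position (in llama.cpp this would mirror a
    # llama_kv_cache_seq_rm call), cached per-position logits, and the
    # sampler's RNG state.
    def __init__(self, seed=0):
        self.tokens = []
        self.logits_cache = []  # one logits snapshot per generated position
        self.rng = random.Random(seed)

    def checkpoint(self):
        return (len(self.tokens), list(self.logits_cache), self.rng.getstate())

    def restore(self, ckpt):
        n, logits_cache, rng_state = ckpt
        del self.tokens[n:]  # analogue of removing KV entries past position n
        self.logits_cache = logits_cache
        self.rng.setstate(rng_state)
```

The design point is that `checkpoint`/`restore` sit above the sampler, not inside it, which matches the observation that a regular llama.cpp sampler (no access to the context) can't do this on its own.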
-
After looking at this: https://github.com/sam-paech/antislop-sampler/blob/main/antislop_sampler.ipynb, I tried to achieve something similar using llama.cpp, but it does not work.
So far, to backtrack I've tried:
Any hints are welcome.