
How to speed up model inference #1

Open · whyiug opened this issue Jun 14, 2024 · 2 comments

whyiug commented Jun 14, 2024

Hi guys, thanks for your work.
I have a question: the fixed policy templates are very long, which seriously slows down model inference. Have you considered ways to optimize this?
Would it be possible to store the KV cache? For LlamaGuard, prefix KV caching can be used because the policy is a fixed prefix. (This may not be possible with the LLaVA architecture, where the prefix is an image rather than a fixed template, and the image tokens are not fixed. I was just wondering what you were thinking.)
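
For reference, here is roughly what I mean by prefix KV caching (a minimal sketch assuming a plain Hugging Face causal LM; the model id, policy text, and `moderate` helper are placeholders, and the image-prefix problem described above still applies to the LLaVA case):

```python
# Minimal sketch of prefix KV caching for a fixed text prefix, assuming a
# plain Hugging Face causal LM (model id and policy text are placeholders).
# This only helps if the fixed policy text really comes before any image
# tokens in the prompt.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

policy_prompt = "..."  # the long, fixed policy template (placeholder)
prefix_ids = tokenizer(policy_prompt, return_tensors="pt").input_ids.to(model.device)

# Run the fixed prefix once and keep its key/value cache.
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

def moderate(user_text: str) -> str:
    suffix_ids = tokenizer(
        user_text, return_tensors="pt", add_special_tokens=False
    ).input_ids.to(model.device)
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
    with torch.no_grad():
        out = model.generate(
            input_ids,
            # generate() extends the cache in place, so reuse a copy per request
            past_key_values=copy.deepcopy(prefix_cache),
            max_new_tokens=64,
        )
    # Decode only the newly generated tokens.
    return tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
```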

whyiug (Author) commented Jul 2, 2024

Here's an idea: put the policy in the system prompt.
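
Something along these lines, as a hypothetical sketch (assuming a LLaVA-style processor whose chat template accepts a system role; the model id, policy text, and question are placeholders):

```python
# Hypothetical sketch: move the fixed policy into a system turn so the
# per-request user turn stays short. Whether the checkpoint's chat template
# actually supports a system role depends on the template.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")  # placeholder

policy_text = "..."  # the fixed safety policy (placeholder)
conversation = [
    {"role": "system", "content": [{"type": "text", "text": policy_text}]},
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Assess the image against the policy above."},
        ],
    },
]
# Renders the turns into the model's prompt format; the image placeholder is
# filled in later when the processor is called with the actual image.
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
```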

lukashelff (Contributor) commented

Thank you for the hint. Initially, we also thought about stating the policy in the system prompt. Unfortunately, the conversation templates are implemented rather statically in LLaVA's training code. So far, we haven't had the chance to implement this, but the idea is very sensible, and we will probably include it in the next iteration of LlavaGuard.
