
llama : rename n_ctx to kv_size #5568

Closed
wants to merge 4 commits into from

Conversation

ggerganov
Owner

The n_ctx name is causing some confusion since its actual meaning is the size of the KV cache, while n_ctx_train is the training context of the model.

This change fixes that, but since it is a big one and touches a lot of code, I'm not sure if it's worth merging. Maybe sometime in the future, when the time is right.

Original PR: #5546
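
To make the distinction concrete, here is a minimal sketch using the existing llama_n_ctx and llama_n_ctx_train accessors from llama.h; the helper function itself is illustrative and not part of this PR:

```cpp
// Illustrative sketch (not part of this PR) of the two values that get conflated:
// llama_n_ctx() reports the per-context value that is really the KV cache size,
// while llama_n_ctx_train() reports the model's training context.
#include "llama.h"

#include <cstdio>

static void report_ctx_sizes(const llama_model * model, const llama_context * ctx) {
    const int n_ctx_train = (int) llama_n_ctx_train(model); // training context of the model
    const int n_ctx       = (int) llama_n_ctx(ctx);         // what this PR would call kv_size

    printf("n_ctx_train = %d, n_ctx (KV cache size) = %d\n", n_ctx_train, n_ctx);
}
```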

@ggerganov added the breaking change and refactoring labels on Feb 18, 2024
@bobqianic
Contributor

Does n_ctx in whisper.cpp also refer to the size of the KV cache?

@ggerganov
Owner Author

In the decoder, yes.

@@ -1545,7 +1545,7 @@ struct llama_hparams {
     int32_t n_tokens;

     // llm_build_context
-    static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= n_ctx
+    static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= kv_size
Collaborator


Looks like this comment was missing a closing )

@compilade
Collaborator

compilade commented Feb 21, 2024

I do not agree with this change (but I like the underlying intention of making llama.cpp less confusing).

As I'm working on supporting Mamba in llama.cpp (see #5328), I'd like to warn that renaming n_ctx to kv_size would make it harder to support non-Transformer architectures in a straightforward way.

With Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not to the "context size". n_ctx is still used to limit the maximum number of processed tokens, which is fine, because some examples need a fixed size for the buffer of input tokens (e.g. in server, lookahead, lookup, parallel, and in perplexity when calling llama_batch_init).

What I propose instead (and this is what I've started doing in #5328) is to keep n_ctx, but use kv_self.size instead of n_ctx in the places where it's really the KV cache size that is meant, because Mamba breaks the equivalence of n_ctx with kv_self.size.

TL;DR: renaming n_ctx to kv_size makes it harder to decouple the context size from the KV cache size.
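
A rough sketch of what that decoupling looks like; the structs below are simplified stand-ins, not the real llama.cpp definitions:

```cpp
// Simplified stand-ins (not the actual llama.cpp structs) showing the decoupling:
// the cache carries its own size, and only the code that really means "cache size"
// reads kv_self.size, while n_ctx keeps bounding the token buffers.
#include <cstdint>

struct kv_cache_sketch {
    uint32_t size; // ~n_ctx for Transformers, ~max parallel sequences for Mamba
};

struct context_sketch {
    uint32_t        n_ctx;   // max number of tokens the examples may buffer/process
    kv_cache_sketch kv_self; // attention KV cache or recurrent (Mamba) state
};

// places that mean "how many cells does the cache have" read kv_self.size ...
static uint32_t cache_cells(const context_sketch & ctx) { return ctx.kv_self.size; }

// ... while places sizing token buffers (e.g. llama_batch_init in the examples) keep using n_ctx
static uint32_t max_batch_tokens(const context_sketch & ctx) { return ctx.n_ctx; }
```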

@ggerganov closed this on Jul 26, 2024