llama : greatly reduce output buffer memory usage #6122
Commits on Mar 17, 2024
- 1fd1918
- 98914c0
- 705d393
- 25981fc

17b45c9: perplexity : fix Winogrande, use correct logits for second choice start

The first logits used to evaluate the second choice were not from the end of the common prefix; they were the logits from the end of the first choice. This has been corrected. The previous implementation sometimes produced outlier scores on some tasks, and the logic to skip choice words in the log-likelihood evaluation was probably an attempt to reduce those, but it was complex and didn't quite seem right. The new approach is simpler, and the outlier scores are gone.
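
A rough sketch of the corrected scoring (hypothetical helper names, not the actual perplexity.cpp code): each choice is evaluated as "common prefix + choice", and its log-likelihood starts from the logits at the end of the common prefix rather than the end of the other choice.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// log p(token) from one row of raw logits (numerically stable log-softmax)
static float token_logprob(const std::vector<float> & logits, int token) {
    const float max = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (const float v : logits) {
        sum += std::exp(v - max);
    }
    return logits[token] - max - (float) std::log(sum);
}

// rows[p] holds the logits produced after evaluating token p of the
// sequence "common prefix + this choice". The first token of the choice
// must be scored from rows[n_prefix - 1], the end of the common prefix;
// the bug was scoring the second choice from the end of the first choice.
static float score_choice(const std::vector<std::vector<float>> & rows,
                          size_t n_prefix, const std::vector<int> & choice) {
    float logprob = 0.0f;
    size_t pos = n_prefix - 1;
    for (const int tok : choice) {
        logprob += token_logprob(rows[pos], tok);
        pos++; // later tokens are predicted from the choice's own positions
    }
    return logprob;
}
```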

- d0129e8
- 487f89e
- 408fcb0

e19cb3a: llama : fix wrong n_outputs in llama_set_inputs

A mismatch happened when using a smaller n_ubatch than n_batch and then using llama_batch_get_one(). What n_outputs should be is now decided almost entirely by how lctx.n_outputs is set in llama_decode_internal, which keeps the conditions simpler.

* llama : when saving the state, recalculate n_outputs

This ensures the correct number of outputs for the entire previous batch is stored in the session file, even when n_ubatch is smaller than n_batch.
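
A minimal sketch of the scenario that was mismatched, assuming a context created with n_ubatch smaller than n_batch and the llama_batch_get_one() signature from the time of this PR:

```cpp
#include "llama.h"
#include <vector>

// Assumes the context was created with n_ubatch < n_batch, so
// llama_decode() splits this batch into several micro-batches internally.
void decode_with_last_logits(llama_context * ctx, std::vector<llama_token> & tokens) {
    // llama_batch_get_one() leaves batch.logits == NULL, which means
    // "output only the last token". After the fix, n_outputs is 0 for
    // every micro-batch except the one containing that last token.
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size(),
                                            /*pos_0 =*/ 0, /*seq_id =*/ 0);
    llama_decode(ctx, batch);
}
```
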
Commits on Mar 18, 2024
- a57fa7f

711b0bc: llama : fix running a batch with n_outputs == 0

It previously appeared to work only because lctx.inp_out_ids was not initialized, so it pointed to a garbage address that happened to still be valid when I ran my tests.
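
For illustration, a hypothetical call that decodes a batch with no outputs at all (every logits flag cleared), which is the case this commit makes safe:

```cpp
#include "llama.h"

// Fills the KV cache without requesting any logits, so n_outputs == 0.
void warm_kv_cache(llama_context * ctx, const llama_token * tokens, int32_t n) {
    llama_batch batch = llama_batch_init(n, /*embd =*/ 0, /*n_seq_max =*/ 1);
    batch.n_tokens = n;
    for (int32_t i = 0; i < n; ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = 0; // no output requested for any token
    }
    llama_decode(ctx, batch); // previously relied on an uninitialized inp_out_ids
    llama_batch_free(batch);
}
```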

- d100502

99c37cc: ggml : saner ggml_can_repeat with empty tensors

* ggml : future-proof ggml_is_empty by using GGML_MAX_DIMS - 1
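
A minimal sketch of the idea (not the exact ggml source): a tensor counts as empty when any of its dimensions has zero elements, and looping over all dimensions keeps the check valid if GGML_MAX_DIMS ever changes.

```cpp
#include "ggml.h"

// A tensor is empty when any dimension has zero elements.
static bool is_empty(const struct ggml_tensor * t) {
    for (int i = 0; i < GGML_MAX_DIMS; ++i) {
        if (t->ne[i] == 0) {
            return true;
        }
    }
    return false;
}
```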

- 6bf7f3f

Commits on Mar 19, 2024
- 09bb15a

4551e7e: llama : use a vector for ctx->output_ids

* llama : rework reallocation logic for llama_output_reserve

The actual buffer size is now compared with the new total size of the output buffer, which will allow enabling and disabling the embeddings and/or logits output more efficiently in the future.
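
A simplified sketch of the reworked reservation logic (illustrative names, not the actual llama_output_reserve internals): the buffer is only reallocated when the newly required total exceeds what is already allocated.

```cpp
#include <cstddef>
#include <vector>

struct output_buffer {
    std::vector<float> data;  // backing storage for logits and/or embeddings
    size_t capacity = 0;      // current size, in floats
};

// Reserve room for n_outputs rows; reallocate only when the new total
// size exceeds the actual size of the existing buffer.
void output_reserve(output_buffer & out, size_t n_outputs,
                    size_t n_vocab, size_t n_embd,
                    bool want_logits, bool want_embd) {
    const size_t per_row  = (want_logits ? n_vocab : 0) + (want_embd ? n_embd : 0);
    const size_t new_size = n_outputs * per_row;
    if (new_size > out.capacity) {
        out.data.resize(new_size);
        out.capacity = new_size;
    }
}
```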

- 8b826c5
- d04cfaf

8f70dcb: perplexity : make Winogrande work as it does on master

The problems with the Winogrande implementation will need to be fixed in a separate PR to ease review.

615a3a4: llama : clearer error messages for invalid logits or embeddings ids

* llama : assert all models that can have inp_out_ids

Since the graph topology is now constant, this presence check can be done even when there are no outputs.

* llama : assert that the logits and embd buffers exist before writing to them
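
A sketch of the kind of validation this commit improves (illustrative, not the exact llama.cpp code): the requested index is mapped through output_ids, with a descriptive message for each failure mode instead of a silent out-of-bounds read.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Maps batch token index i through output_ids to a row of the logits
// buffer, failing loudly when that token has no computed output.
float * get_logits_ith(std::vector<float> & logits,
                       const std::vector<int32_t> & output_ids,
                       size_t n_vocab, int32_t i) {
    if (i < 0 || (size_t) i >= output_ids.size()) {
        fprintf(stderr, "invalid logits id %d, reason: out of range [0, %zu)\n",
                i, output_ids.size());
        abort();
    }
    const int32_t row = output_ids[i]; // -1 when no output was computed for i
    if (row < 0) {
        fprintf(stderr, "invalid logits id %d, reason: no logits were requested for this token\n", i);
        abort();
    }
    return logits.data() + (size_t) row * n_vocab;
}
```
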
Commits on Mar 21, 2024
- 7d8d6b5

5f33a67: perplexity : make hellaswag and multiple-choice outputs identical to master

Due to how the KV cache is updated, the logprobs for tokens in a batch are very slightly affected by the other tokens present in the batch, so to make hellaswag and multiple-choice return exactly the same results as on master, the last token of each sequence needs to be evaluated even though its output is not used at all. This will probably be changed back in the future to make these benchmarks a tiny bit faster.

* perplexity : fix division by zero when using fewer than 100 multiple-choice tasks
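
A minimal sketch of the guard, assuming the zero denominator came from reporting progress before any task had completed (the exact site of the bug is not shown in this listing):

```cpp
#include <cstdio>

// Print the running accuracy only once at least one task has completed,
// so the denominator can never be zero.
void print_running_accuracy(int n_correct, int n_done) {
    if (n_done == 0) {
        return;
    }
    printf("accuracy so far: %.4f (%d/%d)\n",
           (double) n_correct / n_done, n_correct, n_done);
}
```
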
Commits on Mar 25, 2024

ffa9abd: Merge branch 'master' into compilade/smaller-output-buffer

Notably includes the new repetition penalty default, support for grok-1, and support for split GGUF.

Commits on Mar 26, 2024

e9095ac: llama : allow loading state saved with a different ctx size

When loading a session file, the context size now only needs to be large enough to hold the KV cells contained in that session file, instead of requiring exactly the same context size as when saving. This enables extending or shrinking the context size of a saved session. It breaks existing session files because the meaning of kv_buf_size has changed slightly: previously it was the size of the whole KV cache, now it is only the size of the saved part of it. This allows for finer-grained sanity checks when loading, keeping kv_buf_size useful even when kv_size is changed.
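
A sketch of the finer-grained check this enables (illustrative names and a simplified per-cell size, not the actual session-loading code): the saved buffer size is validated against the cells being restored, and the current context merely has to fit them.

```cpp
#include <cstdint>
#include <cstdio>

// kv_buf_size now covers only the saved cells, so it can be checked
// against the number of cells actually being restored; the current
// context may be larger or smaller than the one used when saving.
bool can_load_session(uint32_t n_ctx_current, uint32_t n_cells_saved,
                      size_t kv_buf_size, size_t bytes_per_cell) {
    if (kv_buf_size != (size_t) n_cells_saved * bytes_per_cell) {
        fprintf(stderr, "session file KV buffer size mismatch\n");
        return false;
    }
    return n_cells_saved <= n_ctx_current;
}
```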

- 5027d81
- 20248e8