
change default temperature of OAI compat API from 0 to 1 #7226

Merged
merged 2 commits into ggerganov:master from oai-temp on May 13, 2024

Conversation

Kartoffelsaft
Contributor

This should make the API more similar to that of OpenAI's actual API

@mofosyne mofosyne added the `Review Complexity : Low` label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) on May 12, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 527 iterations 🚀

Details (for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8893.87ms p(95)=21900.26ms fails=, finish reason: stop=474 truncated=53
  • Prompt processing (pp): avg=102.31tk/s p(95)=434.58tk/s
  • Token generation (tg): avg=47.46tk/s p(95)=49.31tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=oai-temp commit=540d9b5970644896c1281bad56b2ae6ebeae5bd7

[Benchmark charts omitted — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

Collaborator

@mofosyne mofosyne left a comment


Double-checking your assertion, I can confirm that, at least for chat completion mode (which is what this PR deals with), the default is indeed temperature=1.0.

Source: https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature

temperature
number or null
The sampling temperature used for this run. If not set, defaults to 1.

Just a quick note that this is example code, not the actual llama.cpp endpoint itself, but it is still useful to maintain consistency.


Note that in transcription mode, creativity/temperature defaults to 0, so temperature defaults can differ between different API endpoints.
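
To illustrate what the change amounts to on the server side, here is a minimal sketch (not the actual llama.cpp server code; the helper and constant names are assumptions of this example) of an OAI-compatible handler falling back to a per-endpoint default when the client omits or nulls out `temperature`:

```cpp
// Sketch only: an OAI-compatible request handler falling back to a
// per-endpoint default when the client omits "temperature". Uses
// nlohmann::json; names here are assumptions, not the real server code.
#include <cstdio>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical per-endpoint default: 1.0 for chat completions (matching the
// OpenAI docs quoted above); a transcription-style endpoint would use 0.0.
constexpr double kChatDefaultTemperature = 1.0;

double resolve_temperature(const json & body, double default_temperature) {
    // Honor an explicit numeric value; treat a missing field or an explicit
    // null ("number or null" in the docs) as "use the endpoint default".
    if (body.contains("temperature") && body.at("temperature").is_number()) {
        return body.at("temperature").get<double>();
    }
    return default_temperature;
}

int main() {
    const json without_temp = json::parse(R"({"model": "phi-2", "messages": []})");
    const json with_temp    = json::parse(R"({"model": "phi-2", "temperature": 0.2})");

    printf("%g\n", resolve_temperature(without_temp, kChatDefaultTemperature)); // 1
    printf("%g\n", resolve_temperature(with_temp,    kChatDefaultTemperature)); // 0.2
}
```

Treating an explicit null the same as an absent field matches the "number or null" wording in the docs quoted above.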

@mofosyne mofosyne merged commit e586ee4 into ggerganov:master May 13, 2024
64 checks passed
@shibe2
Collaborator

shibe2 commented May 13, 2024

Different models can tolerate different temperatures. What if 1 is too high for most models that people run locally? Default in main is 0.8.

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request on May 17, 2024:

* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
@jukofyork
Contributor

jukofyork commented Jul 3, 2024

Different models can tolerate different temperatures. What if 1 is too high for most models that people run locally? Default in main is 0.8.

The value of 1 should work for any model assuming the logits weren't scaled whilst training.

A value of 1 actually corresponds to the model outputting "well calibrated" probability estimates, i.e. if you were to plot the post-softmax probability estimates against the empirical fraction of times the next token fell into the respective "bin" (or, more likely, the log of these values), then, assuming log-loss (aka "cross-entropy" loss) was used for training, you'd find that temperature=1 makes the plots line up best.

(The inverse of this is even used to calibrate the outputs of non-probabilistic models such as SVMs trained with "maximum margin" loss: https://en.m.wikipedia.org/wiki/Platt_scaling)

This doesn't necessarily mean that temperature=1 will be optimal for every use case, but it should definitely not be broken, and it is likely the best default IMO.
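
For anyone wanting to see the calibration argument concretely, below is a small self-contained sketch (my own illustration, not llama.cpp code) of temperature-scaled softmax: dividing the logits by temperature=1 is a no-op, so the model's trained distribution is returned unchanged, while smaller values sharpen it and larger values flatten it.

```cpp
// Sketch: temperature-scaled softmax over a vector of logits.
// temperature == 1.0 divides the logits by 1, i.e. it is a no-op, so the
// model's trained ("calibrated") distribution is returned unchanged;
// values below 1 sharpen the distribution and values above 1 flatten it.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> softmax_with_temperature(const std::vector<double> & logits, double temperature) {
    std::vector<double> probs(logits.size());
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        // Subtracting the max is only for numerical stability; the temperature
        // enters solely as a divisor on the logits.
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double & p : probs) {
        p /= sum;
    }
    return probs;
}

int main() {
    const std::vector<double> logits = {2.0, 1.0, 0.1}; // made-up example logits
    for (double t : {0.5, 1.0, 2.0}) {
        printf("T=%.1f:", t);
        for (double p : softmax_with_temperature(logits, t)) {
            printf(" %.3f", p);
        }
        printf("\n");
    }
}
```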

Labels: examples, Review Complexity : Low, server/api
4 participants