Server gets stuck after invalid request #5724

Closed
ggerganov opened this issue Feb 26, 2024 · 4 comments · Fixed by #5733

ggerganov (Owner) commented Feb 26, 2024

Repro:

./server -m models/bert-bge-small/ggml-model-f16.gguf --embedding
# send invalid request
curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -H "Authorization: Bearer no-key" -d '{ }'

# the next request makes the server hang
curl http://localhost:8080/v1/embeddings -H "Content-Type: application/json" -H "Authorization: Bearer no-key" -d '{ "input": "hello" }'

# need to kill it
killall server
ggerganov added the bug and server/webui labels on Feb 26, 2024
z80maniac (Contributor) commented:

Maybe it's also related to #5246

phymbert (Collaborator) commented Feb 26, 2024

I will add a test scenario later today; I guess @ngxson already fixed it in #5710.

ngxson (Collaborator) commented Feb 26, 2024

No, it's not fixed. The issue is that if "input" is missing from the request body, it defaults to an empty string. That prevents the slot from being cleaned up correctly (it never moves back to the IDLE state).

In short, you get the same bug with { "input": "" }.

I checked with OpenAI's API: it does return an embedding even when the input prompt is empty; I believe the output vector represents an "empty document".

The simple fix is to enforce that the input has at least one token. If it's empty, we set prompt = " " (a single space). Do you think that's enough, @ggerganov?
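For illustration, a minimal sketch of that fallback, assuming the request body is parsed with nlohmann::json as in the server example; the helper name is hypothetical and this is not the actual llama.cpp server code:

```cpp
#include <string>
#include "json.hpp" // nlohmann::json, as vendored by the server example

using json = nlohmann::json;

// Hypothetical helper: read the embedding input and make sure it is never
// empty, so tokenization yields at least one token and the slot can move
// back to IDLE after processing.
static std::string get_embedding_input(const json & body) {
    std::string prompt = body.value("input", std::string());
    if (prompt.empty()) {
        prompt = " "; // force at least one token, as suggested above
    }
    return prompt;
}
```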

ngxson (Collaborator) commented Feb 26, 2024

I think I pinpointed the bug: in the loop where the prompt is processed in chunks of n_batch tokens, if n_tokens = 0 then the loop body is skipped entirely:

// batch.n_tokens = 0
for (int32_t i = 0; i < (int32_t) batch.n_tokens; i += n_batch)

Also, the has_prompt condition checks whether the prompt is non-empty in order to decide when to release the slot:

const bool has_prompt = slot.prompt.is_array() || (slot.prompt.is_string() && !slot.prompt.get<std::string>().empty()) || !slot.images.empty();

Still, I'm not sure it's worth fixing this, because @ggerganov said that the new llama_decode will handle the batch size itself. Also, a sequence should always have at least a BOS token.
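As a rough sketch of the kind of guard described above, assuming a simplified slot structure; the names are illustrative and this is not the actual server code:

```cpp
#include <cstdint>

// Hypothetical, simplified slot: just enough state to illustrate the problem.
struct server_slot_sketch {
    int32_t n_tokens = 0; // tokens in the prompt
    bool    idle     = true;
};

// Hypothetical processing step: with n_tokens == 0 the batching loop never
// runs, so the slot must be released explicitly or it stays busy forever.
static void process_prompt(server_slot_sketch & slot, int32_t n_batch) {
    if (slot.n_tokens == 0) {
        slot.idle = true; // nothing to decode, release the slot immediately
        return;
    }
    for (int32_t i = 0; i < slot.n_tokens; i += n_batch) {
        // decode up to n_batch tokens starting at position i ...
    }
    slot.idle = true; // prompt fully processed, slot returns to IDLE
}
```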
