Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed to revoke entry (certificate not found) #9146

Closed
pellepelster opened this issue Jun 4, 2020 · 5 comments
Closed

failed to revoke entry (certificate not found) #9146

pellepelster opened this issue Jun 4, 2020 · 5 comments
Labels
bug Used to indicate a potential bug docs secret/pki

Comments

@pellepelster
Copy link

pellepelster commented Jun 4, 2020

Describe the bug

We are experiencing a lot of errors in the vault log around vault trying to expire leases:

2020-06-03T12:28:28.896+0200 [ERROR] expiration: failed to revoke lease: lease_id=ca-servers/issue/hivemq_broker/a477c3e2-3a03-f004-d6fd-c7d268da31a9 error="failed to revoke entry: resp: &logical.Response{Secret:<nil>, Auth:<nil>, Data:map[string]interface {}{"error":"certificate with serial 26:a2:67:e4:ed:56:aa:e8:96:40:8a:dc:d5:a2:89:15:5e:a4:b6:fb not found"}, Redirect:"", Warnings:[]string(nil), WrapInfo:(*wrapping.ResponseWrapInfo)(nil), Headers:map[string][]string(nil)} err: <nil>"
2020-06-03T12:28:29.154+0200 [ERROR] expiration: failed to revoke lease: lease_id=ca-servers/issue/identity_service/016e2c4c-1acf-69d4-b3a5-ee0334532915 error="failed to revoke entry: resp: &logical.Response{Secret:<nil>, Auth:<nil>, Data:map[string]interface {}{"error":"certificate with serial 03:1f:91:b9:c3:e5:67:e2:ad:4d:e2:33:26:1e:f8:78:37:f7:9f:b2 not found"}, Redirect:"", Warnings:[]string(nil), WrapInfo:(*wrapping.ResponseWrapInfo)(nil), Headers:map[string][]string(nil)} err: <nil>"
2020-06-03T12:28:29.555+0200 [DEBUG] expiration: leases loading: progress=1500

Especially when starting vault this leads to thousand of error log messages and during this time the whole CRL functionality seems to be unresponsive.

To Reproduce
We were not able to figure out how we came into this position, we figured out then when we change

return logical.ErrorResponse(fmt.Sprintf("certificate with serial %s not found", serial)), nil

to

return nil, nil

vault is eventually able to work through all the leases it wants to revoke. We guess we could also achieve this by issuing a forced revoke from the outside.

Expected behaviour
We are not sure if what we see is actually a problem but given the fact that this is logged as an error are questioning how we possibly came into the situation that vault wants to expire leases and is unable to find the accompanying certificates. could this be causes by vault tidy?

Additional context

We regularly run a vault tidy with very little buffer die clean up our PKIS. We are currently running vault 1.1.2, we also observed this behaviour with 1.4.2

@ncabatoff
Copy link
Contributor

Hi @pellepelster,

Thanks for the bug report. How are you invoking pki tidy (i.e. what arguments are you providing)?

@pellepelster
Copy link
Author

Hi,
see the call we use for tidy

curl -X POST --header "X-Vault-Token: ${VAULT_TOKEN}" --data '{ "tidy_cert_store": true, "tidy_revoked_certs": true, "safety_buffer": "1h" }' http://vault:8200/v1/${pki}/tidy

@ncabatoff
Copy link
Contributor

Ok, it looks like when use tidy_cert_store, that will delete any expired certs, but doesn't touch the corresponding lease. Then when the expiration manager tries to clean up the lease, it wants to revoke the cert, and it's surprised to find that it's been deleted.

This is a bug: we should make cert revocation handle this gracefully by tolerating a deleted cert when the lease entry shows it was expired. We should add warnings to the Tidy api docs as well: tidy_cert_store should probably not be used except to clean up specific kinds of problems, and because it doesn't go through the proper revocation path, will not populate the CRL for the certs it deletes.

@ncabatoff ncabatoff added bug Used to indicate a potential bug docs secret/pki and removed waiting-for-response labels Jun 8, 2020
@pellepelster
Copy link
Author

Do you think the return nil, nil approach from the initial bug is a feasible short-term solution. From our point of view it should be safe, as it tries do revoke any certs and ignores if the are not around anymore.

@cipherboy
Copy link
Contributor

cipherboy commented Aug 31, 2022

I believe the answer to the above question was yes, based on #9880. I agree with this behavior and believe this shouldn't be seen much any more in newer versions of Vault. As such, I'm inclined to close this PR as resolved; feel free to reopen if anyone disagrees.

Also regarding the above docs suggestion, tidy_cert_store=true should definitely be run when leases aren't used, so I'm inclined to believe fixing the interaction between revocation and tidy should be the correct approach (which I think is already done).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Used to indicate a potential bug docs secret/pki
Projects
None yet
Development

No branches or pull requests

3 participants