Vault Leader flapping in HA mode with DynamoDB #6572
Comments
Looking at CloudWatch metrics for DynamoDB shows that this otherwise unused Vault cluster is using 1.3 read units/sec and 2.3 write units/sec, which I'm assuming is the nodes fighting over the lock.
I've also added the
Hi. You may want to try Vault 1.1.1, which was just released, as it has some improvements for DynamoDB HA handling (#5828). The race condition in that issue is a bit different, but your setup may be triggering it in other ways.
Thanks! I just bumped to 1.1.1 and I'm still seeing the same behavior. One interesting thing I'm noticing: when I run
...and now I feel silly. The VAULT_ADDR I'm using can land on any of the three nodes, so of course it shows standby/active at different times. Running the same watch with VAULT_ADDR set to the local node's address works as expected, and leadership is stable. False alarm!
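For anyone hitting the same symptom, here is a minimal sketch of how to check each node directly instead of going through a shared address. It assumes the nodes are individually reachable and uses Vault's unauthenticated `/v1/sys/leader` endpoint; the node addresses (`http://vault-0:8200` and so on) and the helper names are hypothetical and not taken from this issue.

```python
import json
import urllib.request


def fetch_leader(addr, timeout=5.0):
    """GET /v1/sys/leader from a single Vault node and return the parsed JSON."""
    with urllib.request.urlopen(f"{addr}/v1/sys/leader", timeout=timeout) as resp:
        return json.load(resp)


def leader_views(node_addrs, fetch=fetch_leader):
    """Ask every node directly (bypassing any load balancer) who the leader is."""
    views = {}
    for addr in node_addrs:
        info = fetch(addr)
        views[addr] = {
            "is_self": info.get("is_self", False),
            "leader_address": info.get("leader_address", ""),
        }
    return views


def leadership_is_consistent(views):
    """True when all nodes report the same leader and exactly one node claims it."""
    leaders = {v["leader_address"] for v in views.values()}
    active = [addr for addr, v in views.items() if v["is_self"]]
    return len(leaders) == 1 and len(active) == 1


if __name__ == "__main__":
    # Hypothetical per-node addresses; replace with the real pod or host addresses.
    nodes = ["http://vault-0:8200", "http://vault-1:8200", "http://vault-2:8200"]
    views = leader_views(nodes)
    for addr, view in views.items():
        print(addr, "active" if view["is_self"] else "standby", view["leader_address"])
    print("consistent:", leadership_is_consistent(views))
```

If every node agrees on the leader across repeated runs, the apparent flapping is most likely the shared VAULT_ADDR routing each request to a different node, as described above.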
Describe the bug
I've set up a new Vault cluster with three nodes using KMS for auto-unseal and DynamoDB for the backing store. Watching vault status on a node shows that leadership is changing nearly every second between all three nodes.
To Reproduce
Steps to reproduce the behavior:
watch vault status
Expected behavior
The leader to be stable and change in response to a failed server or other unhealthy state.
Environment:
Vault Server Version (retrieve with vault status): 1.1.0
Vault CLI Version (retrieve with vault version): 1.1.0 ('36aa8c8dd1936e10ebd7a4c1d412ae0e6f7900bd')
Docker Image vault:1.1.0
The vault instances are running in Kubernetes
Vault server configuration file(s):
I am also using the following env vars:
Additional context
I'm running the logs in trace mode and don't see anything that would suggest an issue or that leaders are even changing.
Let me know if any other info would be useful.